Author Archives: Anthony Spiteri

The State of DRaaS…A Few Thoughts

Over the past week Gartner released the 2018 edition of the Magic Quadrant for DR as a Service. The first thing that I noticed was how sparse the quadrant was when comparing it to the 2017 quadrant. Though many hold it in high regard, the Gartner Quadrant isn’t the be all and end all source of information pertaining to those offering DRaaS and succeeding. But it got me thinking as to the state of the current DRaaS market.

Just before I talk about that, what does it mean to see fewer vendors in the Magic Quadrant this year? Probably not much, apart from the fact that the ones that dropped out probably don’t see value in undertaking the process. Though, as mentioned in this post, it could also be due to the criteria changing. As a comparison, from the past three years you can see above that only ten participants remain, down from twenty-three the previous year. There has been a shift in position and it’s great to see iLand leading the way, beating out global powerhouses like IBM and Microsoft.

But does the lack of participants in this year’s quadrant point to a declining market? Are companies skipping DRaaS for traditional workloads and looking to build availability and resilience into the application layer? Has network extension become so commonplace and reliable that companies are becoming less inclined to use DRaaS providers and just rely on inbuilt replication and mobility? There is an argument to be had that the push to cloud native applications, the use of public cloud and evolving network technologies have the potential to kill DRaaS…but not yet…and not any time soon!

Hybrid cloud and multi-platform services are here to stay…and while the use of the hyper-scale public clouds, serverless and containerisation has increased, there is still an absolute play to be had in the business of ensuring availability for “traditional” workloads. Those workloads that sit on-premises, in private or public cloud platforms still use the base unit of measurement as the VM.

This is where DRaaS still has the long game.

Depending on region, there is still a smattering of physical servers running workloads (some regions like Asia are 5-10 years behind the rest of the world in Virtualisation…let alone containerization or public cloud). It’s true that most Service Providers who have been successful with Infrastructure as a Service have spent the last few years developing their Backup, Replication and Disaster Recovery as a service offerings.

Underpinning these service offerings are vendors like Veeam, Zerto, VMware and other availability vendors that offer software that Service Providers can leverage to offer DR services both from on-premises locations to their cloud platforms, or between their cloud platforms. Traditional backup vendors offer replication features that can also be used for DR. There are also the likes of Azure, which offers DRaaS using technologies like Azure Site Recovery that look to offer an end to end service.

DRaaS still predominantly focuses on the availability of Virtual Machines and the services and applications they run. The end goal is to have critical line of business applications identified, replicated and then made available in the case of a disaster. The definition of a disaster varies depending on who you speak to and the industry loves to use geo-scale impact events when talking about disasters…but reality is that the failure of a single instance or application is much more likely than whole system failures.

Disaster avoidance has become paramount with DRaaS. Businesses accept that outages will happen but where possible the ramifications of downtime need to be kept to a minimum. Or better yet…not happen at all. In my experience, having worked in and with the service provider industry since 2002, all infrastructure/cloud providers will experience outages at some point…and as one of my work colleagues put it…

It’s an immutable truth that outages will occur! 

I’ve written about this topic before and even had a shirt for sale at one stage stating that Outages are like assholes…everyone has one!

There are those that might challenge my thoughts on the subject, however as I talk to service providers around the world, the one thing they all believe in is that DRaaS is worth investing in and will generate significant revenue streams. I would argue that DRaaS hasn’t even hit an inflection point yet, whereby it’s seen as a critically necessary service for businesses to consume. It’s true to say that Backup as a Service has nearly become a commodity…but DRaaS has serious runway.

References:

https://www.gartner.com/doc/3881865

What’s Changed: 2018 Gartner Magic Quadrant for Disaster Recovery as a Service

VMworld 2018 – #vGolf Las Vegas

#vGolf is back! Bigger and better than last year’s event. This is the third year of the event, having had the inaugural #vGolf at VMworld 2016.

Last year we had 34 participants and everyone who attended had a blast at the brilliant Bali Hai Golf complex. This year, Bali Hai is closed during the VMworld weekend so we are moving the event to the Royal Links Golf Club which is approximately 8 miles from the Las Vegas Strip.

This year the event will expand with more sponsors and a more structured golfing competition with prizes going out for the top 2 placed two ball teams. Yes, this year we will be competing between foursomes.

Details will be updated on this site and on the Eventbrite page once the day is finalised and sponsors confirmed. For the moment, if you are interested please reserve your spot by securing a ticket. At this stage there are 40 available…depending on popularity that could be extended.

Last year the golfing fees were heavily subsidised to $40 USD per person (green fees usually $130-150). Once registered, I will be reaching out to ask that an advance payment is made via PayPal so that the morning of the event is a cash free zone…always hard to count early on a Sunday morning in Vegas!

The cost will include green fees plus buggy and club hire. The clubs also come with 6 brand new Callaway Golf balls. Shoe hire is extra on the day for those that wish to wear proper footwear. My intention is to fund some cold drinks on the day depending on the final sponsorship numbers.

Registration Page

There is a password on the registration page to protect against people registering directly via the public page. The password is vGolf2018. I’m looking forward to seeing you all there bright and early on Sunday morning!

Take a look at what awaits you…don’t miss out!

Sponsorship Call:

If you or your company can offer some sponsorship for the event, please email [email protected] to discuss arrangements. I am looking to subsidise most of the green fees if possible and for that we would need four to five sponsors.

First Look – Zenko, Multi-Platform Data Replication and Management

A couple of weeks ago I stumbled upon Zenko via a LinkedIn post. I was interested in what it had to offer and decided to go and have a deeper look. With Veeam launching our vision to be the leader of intelligent data management at VeeamON this year, I have been on the lookout for solutions that do smart things with data and address the needs related to controlling the accelerated spread and sprawl of that data. Zenko looks to be on the right track with its notion of freedom to avoid being locked into a specific cloud platform, whether it’s private or public.

Having come from service provider land I have always been against the idea of a Hyper-Scaler Public Cloud monopoly that forces lock-in and diminishes choice. Because of that, I gravitated to Zenko’s mission statement:

We believe that everyone should be in control of their data. Zenko’s mission is to allow everyone to be in control of their data, while leveraging the efficiency of private and public clouds.

This platform looks to do data mobility across multiple cloud platforms through common communication protocols and by sharing a common set of APIs to manage its data sets. Zenko is focused on achieving this multi-cloud capability through a unified AWS S3 API based service, with data management and federated search capabilities driving its use cases. Data mobility between clouds, whether private or public cloud services, is what Zenko is aimed at.

Zenko Orbit:

Zenko Orbit is the cloud portal for data placement, workflows and global search. Aimed at application developers and “DevOps”, the premise of Zenko Orbit is that they can spend less time learning multiple interfaces for different clouds while leveraging the power of cloud storage and data management services without needing to be an expert across different platforms.

Orbit provides an easy way to create replication workflows between different cloud storage platforms…whether it be Amazon S3, Azure Blob, GCP Storage or others. You then have the ability to search across a global namespace for system and user-defined metadata.

Quick Walkthrough:

Given this is open source you have the option to download and install a Zenko instance which will then be registered against the Orbit cloud portal or you can pull the whole stack from GitHub. They also have a sandboxed instance hosted by them that can be used to take the system for a test drive.

Once done, you are presented with a Dashboard that gives you an overview of the amount of data and other metrics contained in your instance. Looking at the Settings area you are given details about the instance, account details and the endpoints to connect to. They also offer the ability to download pre-generated Cyberduck Profiles.

You need to create a storage management account to be able to browse your buckets in the Orbit portal.

Once that’s been done you can create a bucket and select a location which in the sandbox defaults to AWS us-east-1.

From here, you can add a new storage location and configure the replication policy. For this, I created a new Azure Blob Storage account as shown below.

From the Orbit menu, I then added a New Storage Location.

Once the location has been added you can configure the bucket replication. This is the cool part that is the premise of the platform: being able to set up policies to replicate data across multiple cloud platforms. From the sandbox, the policy is one way, meaning there is no bi-directional replication. Simply select the source and destination and the bucket from the menu.
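To make the one-way behaviour concrete, here is a minimal in-memory sketch. It does not use Zenko or any cloud API; the `Bucket` class and the bucket names are purely hypothetical stand-ins, and the only thing it tries to show is that objects flow from source to destination and never back.

```python
class Bucket:
    """In-memory stand-in for a bucket at one storage location (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.objects = {}
        self.targets = []  # buckets that receive copies of new objects

    def replicate_to(self, target):
        """Register a one-way replication rule: self -> target."""
        self.targets.append(target)

    def put(self, key, data):
        """Store an object locally, then copy it to every replication target."""
        self.objects[key] = data
        for target in self.targets:
            # Write straight into the target's store so the copy never
            # triggers replication rules the target itself might have.
            target.objects[key] = data


source = Bucket("aws-us-east-1")
destination = Bucket("azure-blob")
source.replicate_to(destination)

source.put("report.pdf", b"quarterly numbers")
destination.put("local-only.txt", b"written at the destination")

print(sorted(source.objects))       # ['report.pdf']
print(sorted(destination.objects))  # ['local-only.txt', 'report.pdf']
```

The file written at the destination stays there, which is what a one-way policy implies.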

Once that has been done you can connect to the endpoint and upload files. I tested this out with the setup above and it worked as advertised. Using the Cyberduck profile I connected in, uploaded some files and monitored the Azure Blob storage end for the files to replicate.

Conclusion: 

While you could say that Zenko feels like DFS-R for the multi-platform storage world, the solution has impressed me. Many would know that it’s not easy to orchestrate the replication of data between different platforms. They are also talking up their capabilities around extensibility of the platform as it relates to data management, backend storage plugins and search.

I think about this sort of technology and how it could be extended to cloud based backups. Customers could have the option to tier into cheaper cloud based storage and then further protect that data by replicating it to another cloud platform which could be cheaper yet. This could achieve added resiliency while offering cost benefits. However there is also the risk that the more spread out the data is, the harder it is to control. That’s where intelligent data management comes into play…interesting times!

References:

Zenko Orbit – Multi-Cloud Data Management Simplified

 

Workaround – VCSA 6.7 Upgrade Fails with CURL Error: Couldn’t resolve host name

It’s never an issue with DNS! Even when DNS looks right…it’s still DNS! I came across an issue today trying to upgrade a 6.5 VCSA to 6.7. The new VCSA appliance deployment was failing with an OVFTool error suggesting that DNS was incorrectly configured.

Initially I used the FQDN for the source and target vCenters and let the installer choose the underlying host to deploy the new VCSA appliance to. Even though everything checked out fine in terms of DNS resolution across all systems I kept on getting the failure. I triple checked name resolution on the machine running the update, both vCenters and the target hosts. I even tried using IP addresses for the source and target vCenter but the error remained, as it still tried to connect to the vCenter controlled host via its FQDN, resulting in the error.

After doing a quick Google search and finding nothing, I changed the target to be an ESXi host directly and used its IP address instead of its FQDN. This time the OVFTool was able to do its thing and deploy the new VCSA appliance.

The one caveat when deploying directly to a host rather than a vCenter is that you need to have the target PortGroup configured as ephemeral…but that’s a general rule of bootstrapping a VCSA in any case, and it’s the only type that will show up in the drop down list.

While very strange given all DNS checked out as per my testing, the workaround did its thing and allowed me to continue with the upgrade. This didn’t find the root cause…however when you need to motor on with an upgrade, a workaround is just as good!
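For anyone wanting to repeat the name resolution checks described above, here is a minimal Python sketch using only the standard library. It only proves what the machine running it can resolve, so it would not have caught this particular problem, and the `resolver` parameter is simply my own addition to make the function easy to test.

```python
import socket


def check_resolution(hostnames, resolver=socket.getaddrinfo):
    """Return {hostname: sorted list of addresses, or None if lookup failed}."""
    results = {}
    for name in hostnames:
        try:
            infos = resolver(name, None)
        except socket.gaierror:
            results[name] = None
            continue
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the IP address for both IPv4 and IPv6.
        results[name] = sorted({info[4][0] for info in infos})
    return results
```

Calling `check_resolution(["vcsa-old.lab.example", "esxi01.lab.example"])` (hypothetical names) from the installer machine returns a dictionary where any `None` value flags a name that failed to resolve.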

Veeam 9.5 Update 3a – What’s in it for Service Providers

Earlier this week Update 3a (Build 9.5.1922) for Veeam Backup & Replication was made generally available. This release doesn’t contain any major new features or enhancements but does add support for a number of key platforms. Importantly for our Cloud and Service Providers, Update 3a extends our support to vSphere 6.7, vSphere 6.5 Update 2 (with a small caveat) and vCloud Director 9.1. We also have support for the April update of Windows 10 and the 1803 versions of Windows Server and Hyper-V.

vSphere 6.7 support (VSAN 6.7 validation is pending) is something that our customers and partners have been asking for since it was released in late April and it’s a credit to our R&D and QC teams to reach supportability within 90 days given the amount of underlying changes that came with vSphere 6.7. The performance of DirectSAN and Hot Add transport modes has been improved for backup infrastructure configurations through optimizing system memory interaction.

As mentioned, the recently released vCloud Director 9.1 is supported and maintains our lead in the availability of vCloud Director environments. Storage snapshot-only vCloud Director backup jobs are now supported for all storage integrations that support storage snapshot-only jobs. Update 3a also fully supports the VMware Cloud on AWS version 1.3 release without the requirement for the patch.

One of the new features in Update 3a is a new look Veeam vSphere Client Plug-in based on VMware’s Clarity UX. This is more a port, however with the announcement that the Flex based Web Client will be retired it was important to make the switch.

In terms of key fixes for Cloud and Service Providers, I’ve listed them below from the VeeamKB.

  • User interface performance has been improved for large environments, including faster VM search and lower CPU consumption while browsing through job sessions history.
  • Incremental backup runs should no longer keep setting ctkEnabled VM setting to “true”, resulting in unwanted events logged by vCenter Server.
  • Windows file level recovery (FLR) should now process large numbers of NTFS reparse points faster and more reliably.

Veeam Cloud Connect
Update 3a also includes enhancements and bug fixes for cloud and service providers who are offering Veeam Cloud Connect services. For more information relating to that, please head to this thread on the Veeam Cloud & Service Provider forum. A reminder as well that if you are running Cloud Connect Replication, you need to be aware that clients replicating in on higher VMware VM Hardware versions will error out. This means you need to either let the customer know that the replication cluster is at a certain level…or upgrade to the latest version…which is now vSphere 6.7, which gives Version 14.
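The hardware version rule above boils down to a simple comparison. As a rough sketch in Python: the 6.7 to Version 14 mapping is from this post, while the other entries are the maximum hardware versions at initial release as I understand them from VMware's compatibility documentation, so verify them against your own environment. The function name is hypothetical.

```python
# Highest VM hardware version each vSphere release supports at initial release.
# Only the 6.7 -> 14 entry comes from the post; treat the rest as assumptions.
MAX_HW_VERSION = {
    "5.5": 10,
    "6.0": 11,
    "6.5": 13,
    "6.7": 14,
}


def can_replicate(source_vm_hw_version, target_vsphere_release):
    """True if the target cluster can host a replica with this hardware version."""
    return source_vm_hw_version <= MAX_HW_VERSION[target_vsphere_release]


print(can_replicate(14, "6.5"))  # False: a Version 14 VM cannot land on 6.5
print(can_replicate(13, "6.7"))  # True
```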

For a full list check out the release notes below and download the update here. You can also download the update package without backup agents here.

References:

https://www.veeam.com/kb2646

Adding Let’s Encrypt SSL Certificate to vCloud Director Keystore

For the longest time the configuring of vCloud Director’s SSL certificate keystore has been the thing that makes vCD admins shudder. There are lots of posts on the process…some good…some not so good. I even have a post from way back in 2012 about fronting vCD with a Citrix NetScaler and if I am honest, I cheated in having HTTPS at the load balancer deal with the SSL certificate while leaving vCD configured with the self signed cert. With the changes to the way the HTML5 Tenant Portal deals with certs and DNS I’m not sure that method would even work today.

I wanted to try and update the self signed certs in both my lab environments to assist in resolving the No Datacenters are available issue that cropped up in vCD 9.1. Instead of generating and using self signed certs I decided to try to use Let’s Encrypt signed certs. Most of the process below is courtesy of blog posts from Luca Dell’Oca, and it’s worth looking at this blog post from Tom Fojta, who has a PowerShell script to automate Let’s Encrypt SSL certs for use on NSX Edge load balancers.

In my case, I wanted to install the cert directly into the vCD Cell Keystore. The manual end to end process is listed below. I intend to try and automate this process so as to overcome the one constraint with using Let’s Encrypt…that is the 90 day lifespan of the certs. I think that is acceptable as it ensures validity of the SSL cert, and it’s a fair caveat given the main use case for this is in lab environments.
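Any automation around the 90 day lifespan starts with knowing when a renewal is due. Here is a minimal Python sketch of that date arithmetic; the 30 day safety margin and the function name are my own assumptions rather than anything prescribed by Let’s Encrypt or used in this post.

```python
from datetime import datetime, timedelta, timezone

CERT_LIFETIME = timedelta(days=90)  # Let's Encrypt certificate lifespan
RENEW_BEFORE = timedelta(days=30)   # assumed safety margin, adjust to taste


def renewal_due(issued_at, now=None):
    """Return (expires_at, due) for a certificate issued at `issued_at` (UTC)."""
    now = now or datetime.now(timezone.utc)
    expires_at = issued_at + CERT_LIFETIME
    return expires_at, now >= expires_at - RENEW_BEFORE


issued = datetime(2018, 7, 1, tzinfo=timezone.utc)
expires, due = renewal_due(issued, now=datetime(2018, 8, 30, tzinfo=timezone.utc))
print(expires.date(), due)  # 2018-09-29 True
```

A scheduled task could run a check like this and only kick off the certificate request and keystore import when `due` is true.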

Generating the Signed SSL Cert from Let’s Encrypt:

To complete this process you need the ACMESharp PowerShell module. There are a couple of steps to follow, which include registering the domain you want to create the SSL cert against and triggering a verification challenge, which can be completed by creating a DNS TXT record as shown in the output of the challenge command. Once submitted, you need to look out for a Valid Status response.
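For context on what goes into that TXT record: under the ACME DNS challenge the record lives at `_acme-challenge.<domain>` and its value is the unpadded base64url encoding of the SHA-256 digest of the key authorization. The ACMESharp module works this out for you, so the Python sketch below is only illustrative; the function name and the sample key authorization string are hypothetical.

```python
import base64
import hashlib


def dns01_record(domain, key_authorization):
    """Return (record_name, txt_value) for an ACME DNS challenge."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # base64url without '=' padding, as used throughout ACME
    txt_value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "_acme-challenge." + domain, txt_value


# Hypothetical inputs: a real key authorization is "<token>.<account key thumbprint>".
name, value = dns01_record("lab.example", "sample-token.sample-thumbprint")
print(name)        # _acme-challenge.lab.example
print(len(value))  # 43
```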

Once complete, there is a script that can be run as shown on Luca’s Blog. I’ve added to the script to automatically import the newly created SSL cert into the Local Computer certificate store.

From here, I exported the certificate with the private key so that you are left with a PFX file. I also saved to Base-64 X.509 format the Root and Intermediate certs that form the whole chain. This is required to help resolve the No Datacenters are available error mentioned above. Upload the three files to the vCD cell and continue as shown below.

Importing Signed SSL from Let’s Encrypt into vCD Keystore:

Next, the steps to take on the vCD Cell can be the most complex steps to follow and this is where I have seen different posts do different things. Below shows the commands from start to finish that worked for me…see inline for comments on what each command is doing.

Once that has been done and the vCD services have restarted, the SSL cert has been applied, we are all green and the Let’s Encrypt SSL cert is in play.

Quick Tip: Let’s Encrypt ACME Powershell Ownership Challenge Can’t see Challenge Data

I’m currently going through the process of acquiring a new Let’s Encrypt free SSL Certificate against a new domain I registered. For a great overview of what Let’s Encrypt is and what it can do for you, head over to Luca Dell’Oca’s blog here. I was following Luca’s instructions for getting the new domain authorised for use with the Let’s Encrypt service via a DNS challenge when I ran into the following.

After running the PowerShell command to generate the challenge, it was not returning the Handler Message as expected from the direct output…well, not obviously anyway.

After scratching my head for a bit, I checked to see if the data was contained within the object returned by the PowerShell command.

From here I was able to create the DNS TXT entry and complete the challenge.

Just in case it wasn’t obvious, this very quick post should save you a bit of time.

Released: vCloud Director 9.1.0.1 – API Tweaks and Resolved Issues

There was a point release of vCloud Director 9.1 (9.1.0.1 Build 8825802) released last week, bringing with it an updated Java Runtime plus new API functions that allow additional configuration of advanced settings for virtual machines. There were also a number of bug fixes from the initial 9.1 release earlier in the year. Some of the issues that are resolved are significant and worth looking into if you have 9.1 GA deployed.

I haven’t been able to find an exact list of the new API functions, however traversing the Org Admin rights API call I did spot something new relating to Latency as shown below.

And when I granted this right through the API mechanism I was able to allocate the right to the Org Admin via the administrator web interface.

I’m trying to get a list of all the new API rights that were added as part of this release and will update this post when I have them.

Some of the bigger issues that were resolved are listed below:

  • In vCloud Director Tenant Portal, the Configure Services tab is disabled for Advanced Edge Gateway. In vCloud Director Tenant Portal, you cannot configure Advanced Edge Gateway settings as an administrator with any of the Gateway Advanced Services rights.
  • When importing a virtual machine from vCenter Server, vCloud Director relocates it to the primary resource pool. When you import a virtual machine created on a non-primary cluster in vCenter Server to vCloud Director, the machine is always relocated to the primary cluster.
  • In the vCloud Director Tenant Portal, the administrator of one organization can see virtual machines that belong to other vCloud Director organizations. When you configure the organizations in vCloud Director to use an LDAP server for authentication, an administrator of one organization, who is logged in vCloud Director Tenant Portal, can see virtual machines that belong to other organizations.
  • Importing a virtual machine from the vCenter Server deletes the original virtual machine after cloning it. When importing a virtual machine from the vCenter Server to vCloud Director involves changing its datastore, the process consists in cloning the source virtual machine and deleting it, while effectively changing its Managed Object Reference (MoRef).
  • Enabling High Availability for existing edge gateways in a data center with installed NSX Edge 6.4.0 fails.  In a data center with installed NSX Edge 6.4.0, you cannot enable High Availability for existing edge gateways that belong to a datastore cluster with enabled Storage Distributed Resource Scheduler (SDRS).
  • vCloud Director Tenant Portal does not display existing organization virtual data centers. When you use a self-signed SSL certificate for vCloud Director and you log in to vCloud Director Tenant Portal, you do not see a list of the existing organization virtual data centers.

The rest can be found here.

Just to finish up, there is still a lingering issue from the GA release that changed the behaviour of the HTML5 Tenant UI in scenarios where self signed SSL certificates are used, which is covered in this VMwareKB. Even though (as shown above) it’s been listed as resolved…I have run into it again in two different installs.

Obviously, if you are using legit SSL certificates you won’t have the issue, however the workaround is not doing its thing for me. Hopefully I can resolve this ASAP as I am about to start some validation testing for Veeam and vCloud Director as well as start to test out our new functionality coming in Update 4 of Backup & Replication for Cloud Connect Replication.

For those with the correct entitlements…download here.

#LongLivevCD

References:

https://docs.vmware.com/en/vCloud-Director/9.1/rn/rel_notes_vcloud_director_9-1-0-1.html

Released: Veeam Availability Console Update 1

Today, Veeam Availability Console Update 1 (Build 2.0.2.1750) was released. This update improves on our multi-tenant service provider management and reporting platform that is provided free to VCSPs. VAC acts as a central portal for Veeam Cloud and Service Providers to remotely manage and monitor customer instances of Backup & Replication including the ability to monitor Cloud Connect Backup and Replication jobs and failover plans. It also is the central mechanism to deploy and manage our Agent for Windows which includes the ability to install agents onto on-premises machines and apply policies to those agents once deployed.

What’s new in Update 1:

If you want to get the lowdown, the What’s New document can be accessed here. I’ve summarised the new features and enhancements and expanded on the key ones below.

  • Enhanced support for Veeam Agents
  • New Operator Role
  • ConnectWise Manage Plugin
  • Improved Veeam Backup & Replication monitoring
  • New backup policy types
  • Sub-tenant Accounts and Sub-tenant Management
  • Alarm for tracking VMs stored in cloud repositories
  • RESTful APIs enhancements

RESTful APIs enhancements: VAC’s API first approach gets a number of enhancements in Update 1, with more information stored in the VAC configuration database accessible via new RESTful API calls that include:

  • Managed backup server licenses
  • Tenant descriptions
  • References to the parent object for users, discovery rules and computers

As with the GA, this is all accessible via the built in Swagger Interface.

Enhanced support for Veeam Agents: VAC Update 1 introduces support for Veeam Agents that are managed by Veeam Backup & Replication. This adds monitoring and alarms for Veeam Agent for Microsoft Windows and Veeam Agent for Linux instances that are managed by Veeam Backup & Replication. One of the great features of this is the search functionality, which allows you to more efficiently search for agent instances that exist in Backup & Replication and see their statuses.

New Operator Role: While not the Reseller role most VCSPs are after, this new role allows VCSPs to delegate VAC access to their own IT staff without granting complete administrative access. This role allows access to everything essential to remotely monitor and manage customer environments, but restricts access to VAC configuration settings.

ConnectWise Manage Plugin: ConnectWise Manage is a very popular platform used by MSPs all over the world. VAC Update 1 includes native integration with ConnectWise Manage. The integration allows VCSPs to synchronize and map company accounts between the two platforms, provides integrated billing so that you can use ConnectWise Manage to generate tenant invoices based on their usage, and lets you create tickets based on triggered alarms in VAC. The integration is solid and based on VAC’s strong underlying API driven approach. More importantly, this is the first extensibility feature of VAC using a Plugin framework…the idea is for it to just be the start.

Alarm for tracking VMs stored in cloud repositories:  A smaller enhancement, but one that is important for those running Cloud Connect is the new alarm that allows you to be notified when the number of customer VMs stored in the cloud repository exceeds a certain threshold.

Scalability enhancements: Finally there has been a significant improvement in VAC scalability limits when it comes to the number of managed Backup & Replication servers for each VAC instance. This ensures stable operation and performance when managing up to 10,000 Veeam Agents and up to 600 Backup & Replication servers, protecting 150-200 VMs or Veeam Agents each.
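Taking the figures above at face value, a quick back of the envelope calculation in Python gives a feel for the scale per VAC instance. This is only my own multiplication of the quoted numbers and should not be read as an officially stated limit.

```python
MAX_VBR_SERVERS = 600              # managed Backup & Replication servers
WORKLOADS_PER_SERVER = (150, 200)  # VMs or Veeam Agents protected by each

low, high = (MAX_VBR_SERVERS * n for n in WORKLOADS_PER_SERVER)
print(f"{low:,} to {high:,} protected workloads per VAC instance")
# 90,000 to 120,000 protected workloads per VAC instance
```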

References and Product Guides:

https://www.veeam.com/vac_2_0_u1_release_notes_rn.pdf

https://www.veeam.com/documentation-guides-datasheets.html

https://www.veeam.com/availability-console-service-providers-faq.html

https://www.veeam.com/vac_2_0_u1_whats_new_wn.pdf

Installing and Managing Veeam Agent for Linux with Backup & Replication

With the release of Update 3 of Veeam Backup & Replication we introduced the ability to manage agents from within the console. This was for both our Windows and Linux agents and aimed to add increased levels of manageability and control when deploying agents in larger enterprise type environments. For an overview of the features there is a veeam.com blog post here that goes through the different components, and the online help documentation is also helpful in providing a detailed look at the ins and outs.

Scouring the web, there has been a lot written about the Windows Agent and how that’s managed from the Backup & Replication console, but not a lot written about managing Linux Agents. The theory is exactly the same…add a Protection Group, add the machines you want to include in the Protection Group, scan the group and then install the agent. From there you can add the agents to a new or existing backup job and manage licenses.

In terms of how that looks and the steps you need to take, head to the Inventory menu section and right click on Physical & Cloud Infrastructure to Add Protection Group. Give the group a meaningful name and then, to add Linux machines, select the Individual or CSV method under Type. In my example I chose to add the Linux machines individually and then added the machines via their Host Name or IP Address with the right credentials.

Under Options, you can select the Distribution Server which is where the agent will be deployed from and choose to set a schedule to Rescan the Protection Group.

Once this part is complete the first Discovery is run and, all things being equal, the Linux Agent will be installed to the machines that were added as part of the first step. I actually ran into an issue upon first run where the agent didn’t install due to the error shown below.

The fix was as simple as installing the DKMS package on the servers via apt-get. Asking around, this was not a normal occurrence and the agent should deploy and install without issue. Maybe this was due to my Linux servers being TurnKey Linux appliances…in any case, once the package was installed I re-triggered the install by right clicking the machine and selecting Install Agent.

Once that job has finished we are able to assign the Linux agent machines to new or existing backup jobs.

As with the Windows Agent you have two different Job modes. In my example I created a job of each type. The result is one agent that is in lock down mode, meaning reduced functionality from the GUI or Command line, while the other has more functionality but is still managed by the system administrator. The differences between the two GUIs are shown below.

From the Jobs list under the Home menu this is represented by the job type being Linux Agent Backup vs Linux Agent Policy.

Finally, when looking at the licensing aspect, once a license that contains agent licenses has been applied to a Backup & Replication server, an additional view will appear under the License view in the console where you can assign or remove agent licenses.

From within Enterprise Manager (if the VBR instance is managed), you also see additional tab views for the Windows and Linux Agents as shown below.

References:

https://helpcenter.veeam.com/docs/backup/agents/introduction.html?ver=95

https://helpcenter.veeam.com/docs/agentforlinux/userguide/license_vbr_revoke.html?ver=20

https://helpcenter.veeam.com/docs/backup/agents/agent_policy.html?ver=95
