Author Archives: Anthony Spiteri

Kubernetes Everywhere…Time to Take off the Blinkers!

This is more or less a follow-up to the post I wrote back in 2015 about the state of containers in the IT world as I saw it at the time. I started off that post talking about the freight train that was containerization, along with a cheeky meme… fast forward four years and the narrative around containers has changed significantly, and now there is new cargo on that freight train… and it’s all about Kubernetes!

In my previous role working at a Cloud Provider, shortly after writing that 2015 post I started looking at ways to offer containers as a service. At the time there wasn’t much on offer, but I dabbled a bit in Docker and, if you remember it, VMware’s AppCatalyst… which I used to deploy basic Docker images on my MBP (I think it’s still installed, actually), with the biggest highlight for me being able to play Docker Doom!

I was also involved in some of the very early alphas for what was at the time vSphere Integrated Containers (Docker containers as VMs on vCenter), which didn’t catch on compared to what is currently out there for the mass deployment and management of containers. VMware did evolve its container strategy with Pivotal Container Service, however those outside the VMware world were already looking elsewhere as the reality of containerised development, along with serverless and cloud, took hold and became accepted as mainstream IT practice.

Even four or five years ago I was hearing the word Kubernetes often. I remember sitting in my last VMware vChampion session where Kit Colbert was talking about Kuuuuuuuurbenites (the American pronunciation stuck in my mind) and how we all should be ready to understand how it works, as it was about to take over the tech world. I didn’t listen… and now I have the realisation that I should have started looking into Kubernetes and container management in general more seriously, sooner.

Not because it’s fundamental to my career path… not because I feel like I was lagging technically, and not because there have been those saying for years that Kubernetes will win the race. There is an opportunity to take off the blinkers and learn something that is being widely adopted by understanding the fundamentals of what makes it tick. In terms of discovery and learning, I see this much like what I have done over the past eighteen months with automation and orchestration.

From a backup and recovery point of view, we have been seeing an increase in customers and partners in the field asking how they back up containers and Kubernetes. For a long time the standard response was “why?”. But it’s becoming more obvious that the initially stateless nature of containers is making way for more stateful, persistent workloads. So now, it’s not only about backing up the management plane… but also understanding that we need to protect the data that sits within the persistent volumes.

What I’ll Be Doing:

I’ve been superficially interested in Kubernetes for a long time, reading blogs here and there and trying to absorb information where possible. But as with most things in life, you learn best by doing! My intention is to create a series of blog posts that describe my experiences with different Kubernetes platforms to ultimately deploy a simple web application with persistent storage.

These posts will not be how-tos on setting up a Kubernetes cluster etc. Rather, I’ll look at general config, application deployment, usability, cost and whatever else becomes relevant as I go through the process of getting the web application online.

Off the top of my head, I’ll look to work with these platforms:

  • Google Kubernetes Engine (GKE)
  • Amazon Elastic Container Service for Kubernetes (EKS)
  • Azure Kubernetes Service (AKS)
  • Docker
  • Pivotal Container Service (PKS)
  • vCloud Director CSE
  • Platform9

The usual suspects are there in terms of the major public cloud providers. From a Cloud and Service Provider point of view, the ability to offer Kubernetes via vCloud Director is very exciting, and if I were still in my previous role I would be looking to productize that ASAP. For a different approach, I have always liked what Platform9 has done and I was also an early tester of their initial managed vSphere support, which has now evolved into managed OpenStack and Kubernetes. They also recently announced Managed Applications through the platform, which I’ve been playing with today.

Wrapping Up:

This follow-up post isn’t really about the state of containers today, or what I think about how and where they are being used in IT. The reality is that we live in a hybrid world and workloads are created for specific platforms on a need-by-need basis. At the moment there is nothing to say that virtualization in the form of Virtual Machines running on hypervisors on-premises is being replaced by containers. Between on-premises, public clouds and everything in between… workloads are being deployed in a variety of fashions… Kubernetes seems to have come to the fore and has reached a level of maturity that makes it a viable option… something that could not be said four years ago!

It’s time for me (maybe you) to dig underneath the surface!

Link:

https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/

Kubernetes is mentioned 18 times in this post and on this page.

Mapping vCloud Director Backup Jobs to Self Service Portal Tenants

Since version 7 of Backup & Replication, Veeam has led the way in regard to the protection of workloads running in vCloud Director. In version 7 Veeam first released deep integration into vCD that talked directly to the vCD APIs to facilitate the backup and recovery of vCD workloads and their constructs. More recently, in version 9.5, the vCD Self Service Portal was released, which also taps into vCD for tenant authentication.

The portal leverages Enterprise Manager and allows service providers to grant their tenants self-service backup for their vCD workloads. More recently we have seen some VCSPs integrate the portal into the new vCD UI via the extensibility plugin which is a great example of the power that Veeam has with vCD today while we wait for deeper, native integration.

It’s possible that some providers don’t even know that this portal exists, let alone the value it offers. I’ve covered the basics of the portal here… but in this post, I am going to quickly mention an extension to a project I released last year for the vCD Self Service Portal, which automatically enables a tenant, creates default backup jobs based on policies, ties backup copy jobs to the default jobs for longer retention and finally imports the jobs into the vCD Self Service Portal ready for use.

Standalone Map and Unmap PowerShell Script:

From the above project, the job import part has been expanded into its own standalone PowerShell script that can also be used to map or unmap existing vCD Veeam Backup jobs to a tenant to manage from the vCD Self Service Portal. This is done using the Set-VBRvCloudOrganizationJobMapping cmdlet.

As shown below, this tenant has already configured a number of jobs in the Portal.

There was another historical job that was created outside of the portal directly from the Veeam console. Seen below as TEST IMPORT.

To map the job, run the PowerShell script with the -map parameter. All existing vCloud Director backup jobs will be listed. Once the corresponding number has been entered, the cmdlet within the script will be run and the job mapped to the tenant linked to it.
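As a rough illustration of that flow (not the actual published script), a minimal PowerShell sketch might look like the following. The job filter and the exact parameter set of Set-VBRvCloudOrganizationJobMapping are assumptions here, so verify them against the cmdlet reference linked below and use the full script on GitHub for the real logic.

```powershell
# Minimal sketch of the -map flow. The TypeToString filter and the cmdlet
# parameters are assumptions -- check Get-Help Set-VBRvCloudOrganizationJobMapping
# on your own Veeam server before relying on this.
Add-PSSnapin VeeamPSSnapin -ErrorAction SilentlyContinue

# List the vCloud Director backup jobs configured on this Veeam server.
$vcdJobs = Get-VBRJob | Where-Object { $_.TypeToString -like "*vCloud*" }

for ($i = 0; $i -lt $vcdJobs.Count; $i++) {
    Write-Host ("[{0}] {1}" -f $i, $vcdJobs[$i].Name)
}

$selection = [int](Read-Host "Enter the number of the job to map")

# Map the selected job so it appears for the owning tenant in the
# vCD Self Service Portal (the script's -unmap switch reverses this).
Set-VBRvCloudOrganizationJobMapping -Job $vcdJobs[$selection]
```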

Once that has been run, the tenant now has that job listed in the vCD Self Service Portal.

There is a little bit of error checking built into the script, so that it exits nicely on an exception as shown below.

Finally, if you want to unmap a job from the vCD Self Service Portal, run the PowerShell script with the -unmap parameter.

Conclusion:

Like most things I work on and then publish for general consumption, this came from a request from a service provider partner to wrap some logic around the Set-VBRvCloudOrganizationJobMapping cmdlet. The script can be taken and improved, but as-is it provides an easy way to retrieve all vCloud jobs belonging to a Veeam server, select the desired job and have it mapped to a tenant using the vCD Self Service Portal.

References:

https://github.com/anthonyspiteri/powershell/blob/master/vCD-Create-SelfServiceTenantandPolicyJobs/vCD_job.ps1

https://helpcenter.veeam.com/docs/backup/powershell/set-vbrvcloudorganizationjobmapping.html?ver=95u4

First Look: On Demand Recovery with Cloud Tier and VMware Cloud on AWS

Since Veeam Cloud Tier was released as part of Backup & Replication 9.5 Update 4, I’ve written a lot about how it works and what it offers in terms of offloading data from more expensive local storage to what is fundamentally cheaper remote Object Storage. As with most innovative technologies, if you dig a little deeper… different use cases start to present themselves and unintended use cases find their way to the surface.

Such was the case when, together with AWS and VMware, we looked at how Cloud Tier could be used to allow on-demand recovery into a cloud platform like VMware Cloud on AWS. By way of a quick overview, the solution shown below has Veeam backing up to a Scale-out Backup Repository (SOBR) which has a Capacity Tier backed by an Object Storage repository in Amazon S3. A minimal operational restore window is set, which means data is offloaded to the Capacity Tier more quickly.

Once there, if disaster happens on-premises, an SDDC is spun up and a Backup & Replication server is deployed and configured in that SDDC. From there, a SOBR is configured with the same Amazon S3 credentials, connecting to the Object Storage bucket; the backup data is detected and a resync of the metadata back to the local Performance Tier begins (as described here). Once the resync has finished, workloads can be recovered, streamed directly from the Capacity Tier.
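To make that SDDC-side step a little more concrete, here is a rough PowerShell sketch of re-creating the SOBR against the existing bucket. The cmdlet and parameter names are assumptions based on the 9.5 Update 4 PowerShell reference, and the account, bucket, folder and repository names are placeholders only.

```powershell
# Assumed cmdlet/parameter names -- verify against the 9.5 U4 PowerShell reference.
Add-PSSnapin VeeamPSSnapin -ErrorAction SilentlyContinue

# Register the same AWS credentials used by the on-premises SOBR.
$account = Add-VBRAmazonAccount -AccessKey "AKIA..." -SecretKey "..."

# Connect to Amazon S3 and locate the existing Capacity Tier bucket and folder.
$conn   = Connect-VBRAmazonS3Service -Account $account -RegionType Global -ServiceType CapacityTier
$bucket = Get-VBRAmazonS3Bucket -Connection $conn | Where-Object Name -eq "veeam-capacity-tier"
$folder = Get-VBRAmazonS3Folder -Connection $conn -Bucket $bucket -Name "offload"

# Add the Object Storage repository pointing at the offloaded data.
$objRepo = Add-VBRAmazonS3Repository -Connection $conn -AmazonS3Folder $folder -Name "S3 Capacity Tier"

# Create a SOBR with a local performance extent and attach the Capacity Tier.
# Once created, the offloaded backups are detected and the metadata resync begins.
$extent = Get-VBRBackupRepository -Name "Local Performance Tier"
Add-VBRScaleOutBackupRepository -Name "SDDC SOBR" -Extent $extent -PolicyType DataLocality `
    -EnableCapacityTier -ObjectStorageRepository $objRepo -OperationalRestorePeriod 0
```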

The diagram above has been published on the AWS Reference Architecture page, and while this post has been brief, there is more to come by way of an official AWS blog post co-authored by myself and Frank Fan from AWS around this solution. We will also look to automate the process as much as possible to make this a truly on-demand solution that can be actioned with the click of a button.

For now, the concept has been validated, and the hope is that people looking to leverage VMware Cloud on AWS as a disaster recovery target will look to Veeam and the Cloud Tier to make that happen.

References: AWS Reference Architecture

Quick Fix: Unable to Login to WordPress Site

I’ve just had a mild scare in that I was unable to log into this WordPress site even after trying a number of different ways to gain access by resetting the password via the methods listed on a number of WordPress help sites. The standard reset my password via email option was also not working. I have access directly to the web server and also have access to the backend MySQL database via PHPMyAdmin. Even with all that access, and having apparently changed the password value successfully, I was still getting failed logins.


I had recently enabled Two Factor Authentication with Google Authenticator, using the WordPress plugin of the same name. I suspected that this might be the issue, as one of the suggestions on the troubleshooting pages was to disable all plugins.

Luckily, I remembered that through the WordPress.com website you have administrative access back to your blog site. So rather than go down a more complex and intrusive route, I went in and remotely disabled the plugin in question.

Disabling that plugin worked and I was able to log in. I’m not sure yet if there were general issues with Google Authenticator, or if the plugin had some sort of issue, however the end result was that I could log in and my slight panic was over.

An interesting note is that most things can be done through the WordPress.com website, including publishing blog posts and general site administration. In this case it saved me a lot of time trying to work out why I wasn’t able to log in. So if you do have issues with your login, and you suspect it’s a plugin, make sure you have access to WordPress.com so you can remotely handle the activation status of the plugin.

Veeam Availability Console v3 Important Patch Release

Today, a new patch was released for Veeam Availability Console v3, bringing the build to 3.0.0.2725. Contained in this patch are a number of fixes covering reporting and licensing, the server and agents, along with a number of other resolved issues including RESTful API fixes and fixes for those using the ConnectWise plugin. All VCSPs running Veeam Availability Console v3 in production are advised to deploy the patch.

To apply the patch, head to the Veeam KB here and follow the instructions. You need to be running at least VAC v3 build 3.0.0.2647 prior to installing, as shown below.

From there, make sure you have a backup of the database, close down the Web UI and execute both MSI packages as administrator on the server.
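If you prefer to script that step, something along these lines works. The MSI file names below are hypothetical, so substitute the names that ship in the patch package from the KB.

```powershell
# Hypothetical file names -- use the actual MSI names from the extracted KB2960 package.
$patch = "C:\Temp\VAC3-Patch"

# Run each package elevated and keep a verbose install log for support.
Start-Process msiexec.exe -Verb RunAs -Wait `
    -ArgumentList "/i `"$patch\VAC.Server.Patch.msi`" /l*v `"$patch\vac_server_patch.log`""

Start-Process msiexec.exe -Verb RunAs -Wait `
    -ArgumentList "/i `"$patch\VAC.WebUI.Patch.msi`" /l*v `"$patch\vac_webui_patch.log`""
```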

The first one updates the VAC server. The second one updates the Web UI. Once completed, the patches are applied and VAC v3 is up to date, running server version 3.0.0.2725.

References:

https://www.veeam.com/kb2960

Orchestration of NSX by Terraform for Cloud Connect Replication with vCloud Director

That is probably the longest title I’ve ever had on this blog, however I wanted to highlight everything that is contained in this solution. Everything above works together to get the job done. The job, in this case, is to configure an NSX Edge automatically using the vCloud Director Terraform provider to allow network connectivity for VMs that have been replicated into a vCloud Director tenant organization with Cloud Connect Replication.

With the release of Update 4 for Veeam Backup & Replication, we enhanced Cloud Connect Replication to finally replicate into a Service Provider’s vCloud Director platform. In doing this we enabled tenants to take advantage of the advanced networking features of the NSX Edge Services Gateway. The only caveat was that, unlike the existing Hardware Plan mechanism, where tenants were able to configure basic networking on the Network Extension Appliance (NEA), the configuration of the NSX Edge had to be done directly through the vCloud Director Tenant UI.

The Scenario:

When VMs are replicated into a vCD organisation with Cloud Connect Replication, the expectation in a full failover is that if a disaster happened on-premises, workloads would be powered on in the service provider cloud and work exactly as if they were still on-premises. Access to services needs to be configured through the edge gateway, which is connected to the replica VMs via the vOrg Network in vCD.

In this example, we have a LAMP based web server that is publishing a WordPress site over HTTP and HTTPs.

The VM is being replicated to a Veeam Cloud Service Provider vCloud Director backed Cloud Connect Replication service.

During a disaster event at the on-premises end, we want to enact a failover of the replica living in the vCloud Director Virtual Datacenter.

The VM replica will be fired up and the NSX Edge (the Network Extension Appliance pictured is used for partial failovers) associated with the vDC will allow HTTP and HTTPS to be accessed from the outside world. The internal IP and subnet of the VM remain as they were on-premises; Cloud Connect Replication handles the mapping of the networks as part of the replication job.

Even during the early development days of this feature I was thinking about how this process could be automated somehow. With our previous Cloud Connect Replication networking, we would use the NEA as the edge device and allow basic configuration through the Failover Plan from the Backup & Replication console. That functionality still exists in Update 4, but only for non-vCD-backed replication.

The obvious way would be to tap into the vCloud Director APIs and configure the Edge directly. Taking that further, we could wrap that up in PowerShell and invoke the APIs from there, which would allow a simpler way to pass through variables and deal with payloads. However, with the power that exists in the Terraform vCloud Director provider, it became a no-brainer to leverage it to get the job done.

Configuring NSX Edge with Terraform:

In my previous post around Infrastructure as Code vs APIs I went through a specific example where I configured an NSX Edge using Terraform. I’m not going to go over that again, but what I have done is published that Terraform plan with all the code to GitHub.

The GitHub Project can be found here.

The end result after running the Terraform Plan is:

  • Allow HTTP, HTTPS, SSH and ICMP access to a VM in a vDC
    • The external IP is defined as a variable
    • The internal IP is defined as a variable
    • The vOrg subnet is defined as a variable
  • Configure DNAT rules to allow HTTP, HTTPS and SSH
  • Configure an SNAT rule to allow outbound traffic from the vOrg subnet

The variables that align with the VM and vOrg network are set in the terraform.tfvars file and need to be modified to match the on-premises network configuration. The variables themselves are declared in the variables.tf file.

To add additional VMs and/or vOrg networks you will need to define additional variables in both files and add additional entries under firewall_rules.tf and nat_rules.tf. I will look at ways to make this more elegant using Terraform arrays/lists and programmatic constructs in the future.

Creating PowerShell for Execution:

The Terraform plan can obviously be run standalone and the NSX Edge configuration can be actioned at any time, but the idea here is to take advantage of the script functionality that exists with Veeam backup and replication jobs and have the Terraform plan run upon completion of the Cloud Connect Replication job every time it is run.

To achieve this we need to create a PowerShell script:

GitHub – configure_vCD_VCCR_NSX_Edge.ps1

The PowerShell script initializes Terraform, downloads the provider and allows it to be upgraded on future runs, and then executes the Terraform plan. Remember that the variables are changed within the Terraform plan itself, meaning the script remains unchanged.
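As a simple illustration (not the published script itself), the wrapper essentially boils down to something like this, with the plan path being an example only:

```powershell
# Example wrapper along the lines of configure_vCD_VCCR_NSX_Edge.ps1.
# The plan path is a placeholder -- point it at your local copy of the repo.
$planPath = "C:\Terraform\vccr_vcd_configure_nsx_edge"
Set-Location $planPath

# Initialise Terraform and download the vCloud Director provider;
# -upgrade allows newer provider versions to be picked up on later runs.
terraform init -input=false -upgrade

# Apply the plan non-interactively. All environment-specific values come from
# terraform.tfvars, so this wrapper never needs to change.
terraform apply -input=false -auto-approve
```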

Adding Post Script to Cloud Connect Replication Job:

The final step is to configure the PowerShell script to execute once the Cloud Connect Replication job has been run. This is done via the post-script settings found under Job Settings -> Advanced -> Scripts. Select ps1 files from the drop-down and choose the location of the script.

That’s all that is required to have the PowerShell script executed once the replication job completes.

End Result:

Once the replication component of the job is complete, the post job script will be executed by the job.

This triggers the PowerShell script, which runs the Terraform plan. It will check the existing state of the NSX Edge configuration and work out what configuration needs to be added. From the vCD Tenant UI, you should see the recent tasks list showing modifications to the NSX Edge Gateway made by the user configured to access the vCD APIs via the provider.

Taking a look at the NSX Edge Firewall and NAT configuration you should see that it has been configured as specified in the Terraform plan.

This will match the current state of the Terraform plan.

Conclusion:

At the end of the day, what we have done is achieved the orchestration of Veeam Cloud Connect Replication together with vCloud Director and NSX… facilitated by Terraform. This is something that Service Providers offering Cloud Connect Replication can provide to their clients as a way for them to define, control and manage the configuration of the NSX edge networking for their replicated infrastructure so that there is access to key services during a DR event.

While it might seem like there is a lot happening, this is a great example of leveraging Infrastructure as Code to automate an otherwise manual task. Once the Terraform is understood and the variables applied, the configuration of the NSX Edge will be consistent and in a desired state, with the config checked and applied on every run of the replication job. The configuration will not fall out of line with what is required during a full failover and will ensure that services are available if a disaster occurs.

References:

https://github.com/anthonyspiteri/automation/tree/master/vccr_vcd_configure_nsx_edge

The Reality of Disaster Recovery Planning and Testing

As recent events have shown, outages and disasters are a fact of life in this modern world. Given the number of different platforms that data sits on today, we know that disasters can equally come in many shapes and sizes and lead to data loss and impact business continuity. Because major wide scale disasters occur way less often than smaller disasters from within a datacenter, it’s important to plan and test cloud disaster recovery models for smaller disasters that can happen at different levels of the platform stack.

Because disasters can lead to revenue, productivity and reputation loss, it’s important to understand that having cloud based backup is just one piece of the data protection puzzle. Here at Veeam, we empower our cloud and service providers to offer services based on Veeam Cloud Connect Backup and Replication. However, the planning and testing of what happens once disaster strikes is ultimately up to either the organizations purchasing the services or the services company offering Disaster Recovery as a Service (DRaaS) that is wrapped around backup and replication offerings.

Why it’s Important to Plan:

In theory, planning for a disaster should be completed before selecting a product or solution. In reality, it’s common for organizations to purchase cloud DR services without an understanding of what needs to be put in place prior to workloads being backed up or replicated to a cloud provider or platform. Concepts like recovery time and recovery point objectives (RTPO) need to be understood and planned so that, if a disaster strikes and failover occurs, applications will not only be recovered within SLAs, but the data on those recovered workloads will also be useful in terms of its age.

Smaller RTPO values go hand-in-hand with increased complexity and administrative services overhead. When planning ahead, it’s important to size your cloud disaster platform and build the right disaster recovery model that’s tailored to your needs. When designing your DR plan, you will want to target strategies that relate to your core line of business applications and data.

A staged approach to recovery means that you recover tier-one applications first so the business can still function. A common tier-one application example is the mail server. Another is the payroll system, where an outage could leave an organization unable to pay its staff. Once your key applications and services are recovered, you can move on to recovering data, keeping in mind that archival data generally doesn’t need to be recovered first. Again, being able to categorize the systems where your data sits and then working those categories into your recovery plan is important.

Planning should also include specific tasks and controls that need to be followed up on and adhered to during a disaster. It’s important to have specific run books executed by specific people for a smoother failover. Finally, it is critical to make sure that all IT staff know how to access applications and services after failover.

Why it’s Important to Test:

When talking about cloud based disaster recovery models, there are a number of factors to consider before a final sign-off and validation of the testing process. Once your plan is in place, test it regularly and make adjustments if issues arise from your tests. Partial failover testing should be treated with the same level of criticality as full failover testing.

Testing your DR plan ensures that business continuity can be achieved in a partial or full disaster. Beyond core backup and replication services testing, you should also test networking, server and application performances. Testing should even include situational testing with staff to be sure that they are able to efficiently access key business applications.

Cloud Disaster Recovery Models:

There are a number of different cloud disaster recovery models that can be broken down into three main categories:

  • Private cloud
  • Hybrid cloud
  • Public cloud

Veeam Cloud Connect technology works for hybrid and public cloud models, while Veeam Backup & Replication works across all three models. The Veeam Cloud & Service Provider (VCSP) program offers Veeam Cloud Connect backup and replication through hybrid clouds delivering recovery-as-a-service (RaaS). Public clouds, such as AWS and Azure, can be used with Veeam Backup & Replication to restore VM workloads. Private clouds are generally internal to organizations and leverage Veeam Backup & Replication to replicate, back up or create backup copies of VMs between datacenter locations.

The ultimate goal here is to choose a cloud recovery model that best suits your organization. Each of the models above offers technological diversity and different price points, and each requires different planning and testing in order to, ultimately, execute a disaster recovery plan.

When a partial or full disaster strikes, a thoroughly planned and well-tested DR plan, backed by the right disaster recovery model, will help you avoid a negative impact on your organization’s bottom line. Veeam and its cloud partners, service-provider partners and public cloud partners can help you build a solution that’s right for you.

First published on veeam.com by me – modified and updated for republishing today

Veeam Powered Network v2 Azure Marketplace Deployment

Last month Veeam PN v2 went GA and became available for download and install from the veeam.com download page. As an update to that, we have published v2 to the Azure Marketplace, where it is now available for deployment. As a quick refresher, Veeam PN was initially released as part of Direct Recovery to Azure and was marketed through the Azure Marketplace. In addition to that, for the initial release I went through a number of use cases for Veeam PN, which are all still relevant with the release of v2.

With WireGuard replacing OpenVPN for site-to-site connectivity, the list of use cases will be expanded and the use cases above enhanced. For most of my own use of Veeam PN, I have the Hub living in an Azure region, which I connect into from wherever I am around the world.

Now that Veeam PN v2 is available from the Azure Marketplace, I have created a quick deployment video that can be viewed below. For those that want a more step-by-step guide as a working example, you can reference this post from v1… essentially the process is the same.

  • Deploy Veeam PN Appliance from Azure Marketplace
  • Perform Initial Veeam PN Configuration to connect Azure
  • Configure SiteGateway and Clients

NOTE: One of the challenges introduced by shifting over to WireGuard is that there is no direct upgrade path from v1 to v2. With that, there needs to be a side-by-side stand-up of v2 and v1 to enable a configuration migration… which at the moment is a manual process.

References:

https://anthonyspiteri.net/veeam-powered-network-azure-and-remote-site-configuration/

Cloud Tier Deep Dive Super Session On Demand!

Last week at VeeamON 2019, Dustin Albertson and I delivered a two-part deep dive session on Cloud Tier, which was released in Update 4 of Veeam Backup & Replication 9.5 in January. I’ve blogged about how Cloud Tier is one of the most innovative features I’ve seen in recent times, and I have been able to dig under the covers of the technology from early in the development cycle. I have presented everything from basic overviews to more complex deep dives over the past six or so months, however at VeeamON 2019 Dustin and I took it a step further and went even deeper.

Part I:

The first part of the deep dive was presented as the first session of the event, just after the opening keynote. It was on the main stage and was all slide-driven content that introduces Cloud Tier, talks about the architecture and then dives deeper into its inner workings, as well as covering some of the caveats.

Part II:

From the first session to the last session slot of the event… to finish up, Dustin and I presented a demo-only super session which, I have to admit, was one of the best sessions I’ve ever been a part of in terms of flow, audience participation and what we were able to actually show. We were even able to show off some of the new COPY functionality coming in v10.

There are a few scripts that we used in that session that I will look to release on GitHub over the next week or so… so stay tuned for those! But for now, enjoy the session recordings embedded above.
