Orchestration of NSX by Terraform for Cloud Connect Replication with vCloud Director

That is probably the longest title i’ve ever had on this blog, however I wanted to highlight everything that is contained in this solution. Everything above works together to get the job done. The job in this case, is to configure an NSX Edge automatically using the vCloud Director Terraform provider to allow network connectivity for VMs that have been replicated into a vCloud Director tenant organization with Cloud Connect Replication. With the release of Update 4 for Veeam Backup & Replication we enhanced Cloud Connect Replication to finally replicate into a Service Providers vCloud Director platform. In doing this we enabled tenants to take advantage of the advanced networking features of the NSX Edge Services Gateway. The only caveat to this was that unlike the existing Hardware Plan mechanism, where tenants where able to configure basic networking on the Network Extension Appliance (NEA), the configuration of the NSX Edge had to be done directly through the vCloud Director Tenant UI.

The Scenario:

When VMs are replicated into a vCD organisation with Cloud Connect Replication the expectation in a full failover is that if a disaster happened on-premises, workloads would be powered on in the service provider cloud and work exactly as if they where still on-premises. Access to services needs to be configured through the edge gateway. The edge gateway is then connected to the replica VMs via the vOrg Network in vCD. In this example, we have a LAMP based web server that is publishing a Wordpress site over HTTP and HTTPs. The VM is being replicated to a Veeam Cloud Service Provider vCloud Director backed Cloud Connect Replication service. During a disaster event at the on-premises end, we want to enact a failover of the replica living at in the vCloud Director Virtual Datacenter. The VM replica will be fired up and the NSX Edge (the Network Extension Appliance pictured is used for (http://anthonyspiteri.net/cloud-connect-replication-partial-failover-example/)) associated to the vDC will allow the HTTP and HTTPS to be accessed from the outside world. The internal IP and Subnet of the VM is as it was on-premises. Cloud Connect Replication handles the mapping of the networks as part of the replication job.

Even during the early development days of this feature I was thinking about how this process could be automated somehow. With our previous Cloud Connect Replication networking, we would use the NEA as the edge device and allow basic configuration through the Failover Plan from the Backup & Replication console. That functionality still exists in Update 4, but only for non vCD backed replication.

The obvious way would be to tap into the vCloud Director APIs and configure the Edge directly. Taking that further, we could wrap that up in PowerShell and invoke the APIs from PowerShell, which would allow a simpler way to pass through variables and deal with payloads. However with the power that exists with the Terraform vCloud Director provider, it became a no brainer to leverage this to get the job done.

Configuring NSX Edge with Terraform:

In my previous post around Infrastructure as Code vs APIs I went through a specific example where I configured an NSX Edge using Terraform. I’m not going to go over that again, but what I have done is published that Terraform plan with all the code to GitHub. The GitHub Project can be found (https://github.com/anthonyspiteri/automation/tree/master/vccr_vcd_configure_nsx_edge). The end result after running the Terraform Plan is:

Allowed HTTP, HTTPS, SSH and ICMP access to a VM in a vDC
Defined as a variable as the External IP
Defined as a variable as the Internal IP
Defined as a variable as the vOrg Subnet
Configure DNAT rules to allow HTTP, HTTPS and SSH
Configure SNAT rule to allow outbound from the vOrg subnet

The variables that align with the VM and vORG network are defined in the terraform.tfvars file and need to be modified to match the on-premises network configuration. The variables are defined in the variables.tf file. To add additional VMs and/or vOrg networks you will need to define additional variables in both files and add additional entires under the firewall_rules.tf and nat_fules.tf. I will look at ways to make this more elegant using Terraform arrays/lists and programatic constructs in future.

Creating PowerShell for Execution:

The Terraform plan can obviously be run standalone and the NSX Edge configuration can be actioned at any time, but the idea here is to take advantage of the (https://helpcenter.veeam.com/docs/backup/vsphere/backup_job_advanced_scripts_vm.html?ver=95u4) and have the Terraform plan run upon completion of the Cloud Connect Replication job every time it is run. To achieve this we need to create a PowerShell script:

<#
.SYNOPSIS
configure_vCD_VCCR_NSX_EDGE.ps1
.DESCRIPTION
#>

function Run-Terraform 
 {
 $host.ui.RawUI.WindowTitle = "Configuring NSX Edge for VCCR Failover"
 & .\terraform.exe init
 & .\terraform.exe init -upgrade
 & .\terraform.exe apply -auto-approve
 }

Run-Terraform

GitHub - configure_vCD_VCCR_NSX_Edge.ps1 The PowerShell script initializes Terraform and downloads the Provider, ensures there is an upgrade in the future and then executes the Terraform plan. Remembering that that variables will change within the Terraform Plan its self, meaning these scripts remain unchanged.

Adding Post Script to Cloud Connect Replication Job:

The final step is to configure the PowerShell script to execute once the Cloud Connect Replication job has been run. This is done via a post script settings that can be found in Job Settings -> Advanced -> Scripts. Drop down to selected ps1 files and choose the location of the script. That’s all that is required to have the PowerShell script executed once the replication job completes.

End Result:

Once the replication component of the job is complete, the post job script will be executed by the job. This triggers the PowerShell, which runs the Terraform plan. It will check the existing state of the NSX Edge configuration and work out what configuration needs to be added. From the vCD Tenant UI, you should see the recent tasks list modifications to the NSX Edge Gateway by the user configured to access the vCD APIs via the Provider. Taking a look at the NSX Edge Firewall and NAT configuration you should see that it has been configured as specified in the Terraform plan. Which will match the current state of the Terraform plan

Conclusion:

At the end of the day, what we have done is achieved the orchestration of Veeam Cloud Connect Replication together with vCloud Director and NSX… facilitated by Terraform. This is something that Service Providers offering Cloud Connect Replication can provide to their clients as a way for them to define, control and manage the configuration of the NSX edge networking for their replicated infrastructure so that there is access to key services during a DR event. While there might seem like a lot happening, this is a great example of leveraging Infrastructure as Code to automated as otherwise manual task. Once the Terraform is understood and the variables applied, the configuration of the NSX Edge will be consistent and in a desired state with the config checked and applied on every run of the replication job. The configuration will not fall out of line with what is required during a full failover and will ensure that services are available if a disaster occurs. References: https://github.com/anthonyspiteri/automation/tree/master/vccr_vcd_configure_nsx_edge