Tag Archives: Terraform

Orchestration of NSX by Terraform for Cloud Connect Replication with vCloud Director

That is probably the longest title I’ve ever had on this blog; however, I wanted to highlight everything that is contained in this solution. Everything above works together to get the job done. The job, in this case, is to automatically configure an NSX Edge using the vCloud Director Terraform provider to allow network connectivity for VMs that have been replicated into a vCloud Director tenant organization with Cloud Connect Replication.

With the release of Update 4 for Veeam Backup & Replication we enhanced Cloud Connect Replication to finally replicate into a Service Provider’s vCloud Director platform. In doing this we enabled tenants to take advantage of the advanced networking features of the NSX Edge Services Gateway. The only caveat was that, unlike the existing Hardware Plan mechanism, where tenants were able to configure basic networking on the Network Extension Appliance (NEA), the configuration of the NSX Edge had to be done directly through the vCloud Director Tenant UI.

The Scenario:

When VMs are replicated into a vCD organisation with Cloud Connect Replication, the expectation in a full failover is that if a disaster happened on-premises, workloads would be powered on in the service provider cloud and work exactly as if they were still on-premises. Access to services needs to be configured through the edge gateway. The edge gateway is then connected to the replica VMs via the vOrg Network in vCD.

In this example, we have a LAMP-based web server that is publishing a WordPress site over HTTP and HTTPS.

The VM is being replicated to a Veeam Cloud Service Provider’s vCloud Director-backed Cloud Connect Replication service.

During a disaster event at the on-premises end, we want to enact a failover of the replica living in the vCloud Director Virtual Datacenter.

The VM replica will be fired up and the NSX Edge (the Network Extension Appliance pictured is used for partial failovers) associated with the vDC will allow HTTP and HTTPS to be accessed from the outside world. The internal IP and subnet of the VM are as they were on-premises. Cloud Connect Replication handles the mapping of the networks as part of the replication job.

Even during the early development days of this feature I was thinking about how this process could be automated somehow. With our previous Cloud Connect Replication networking, we would use the NEA as the edge device and allow basic configuration through the Failover Plan from the Backup & Replication console. That functionality still exists in Update 4, but only for non-vCD-backed replication.

The obvious way would be to tap into the vCloud Director APIs and configure the Edge directly. Taking that further, we could wrap the API calls up in PowerShell and invoke them from there, which would allow a simpler way to pass through variables and deal with payloads. However, with the power of the Terraform vCloud Director provider, it became a no-brainer to leverage it to get the job done.

Configuring NSX Edge with Terraform:

In my previous post on Infrastructure as Code vs APIs I went through a specific example where I configured an NSX Edge using Terraform. I’m not going to go over that again, but what I have done is publish that Terraform plan, with all the code, to GitHub.

The GitHub Project can be found here.

The end result after running the Terraform plan is as follows (a sketch of the core resources follows the list):

  • Allow HTTP, HTTPS, SSH and ICMP access to a VM in a vDC
    • The external IP is defined as a variable
    • The internal IP is defined as a variable
    • The vOrg subnet is defined as a variable
  • Configure DNAT rules to allow HTTP, HTTPS and SSH
  • Configure SNAT rule to allow outbound traffic from the vOrg subnet
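
To give a feel for what the plan contains, below is a minimal sketch of the core resources, based on the provider’s vcd_firewall_rules, vcd_dnat and vcd_snat resources. The variable names are illustrative rather than lifted verbatim from the repo; see GitHub for the real plan.

```hcl
# Sketch only - variable names are illustrative; see the GitHub repo for the real plan.
resource "vcd_firewall_rules" "web_vm" {
  edge_gateway   = var.edge_gateway
  default_action = "drop"

  # One rule per service; HTTPS (443) and SSH (22) follow the same pattern
  rule {
    description      = "allow-http"
    policy           = "allow"
    protocol         = "tcp"
    destination_port = "80"
    destination_ip   = var.external_ip
    source_port      = "any"
    source_ip        = "any"
  }

  rule {
    description      = "allow-icmp"
    policy           = "allow"
    protocol         = "icmp"
    destination_port = "any"
    destination_ip   = var.external_ip
    source_port      = "any"
    source_ip        = "any"
  }
}

# One DNAT rule per published service (HTTP shown; HTTPS and SSH follow the same pattern)
resource "vcd_dnat" "http" {
  edge_gateway = var.edge_gateway
  external_ip  = var.external_ip
  port         = 80
  internal_ip  = var.internal_ip
}

# SNAT so the vOrg subnet can reach the outside world
resource "vcd_snat" "outbound" {
  edge_gateway = var.edge_gateway
  external_ip  = var.external_ip
  internal_ip  = var.vorg_subnet
}
```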

The variables that align with the VM and vOrg network are defined in the terraform.tfvars file and need to be modified to match the on-premises network configuration. The variables themselves are declared in the variables.tf file.
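
As a rough illustration of the split between the two files (again with illustrative names, and placeholder addressing):

```hcl
# variables.tf - declarations
variable "edge_gateway" {}
variable "external_ip" {}
variable "internal_ip" {}
variable "vorg_subnet" {}
```

```hcl
# terraform.tfvars - values to match the on-premises network (placeholders shown)
edge_gateway = "TenantEdge01"
external_ip  = "203.0.113.10"
internal_ip  = "192.168.1.50"
vorg_subnet  = "192.168.1.0/24"
```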

To add additional VMs and/or vOrg networks you will need to define additional variables in both files and add additional entries under firewall_rules.tf and nat_rules.tf. I will look at ways to make this more elegant using Terraform lists/maps and programmatic constructs in the future; one possible direction is sketched below.
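
For what it’s worth, here is a hypothetical sketch of that direction… it is not what the published plan does today. A map of external to internal IPs could drive one DNAT rule per VM instead of hand-written entries (for_each on resources requires Terraform 0.12.6 or later):

```hcl
# Hypothetical - not in the published plan. Placeholder addressing shown.
variable "vm_ips" {
  type = map(string) # external IP => internal IP
  default = {
    "203.0.113.10" = "192.168.1.50"
    "203.0.113.11" = "192.168.1.51"
  }
}

resource "vcd_dnat" "http" {
  for_each     = var.vm_ips
  edge_gateway = var.edge_gateway
  external_ip  = each.key
  port         = 80
  internal_ip  = each.value
}
```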

Creating PowerShell for Execution:

The Terraform plan can obviously be run standalone and the NSX Edge configuration actioned at any time, but the idea here is to take advantage of the script functionality that exists in Veeam Backup & Replication jobs and have the Terraform plan run upon completion of the Cloud Connect Replication job, every time it is run.

To achieve this we need to create a PowerShell script:

GitHub – configure_vCD_VCCR_NSX_Edge.ps1

The PowerShell script initializes Terraform, downloads the provider (upgrading it on future runs if required) and then executes the Terraform plan. Remember that the variables live within the Terraform plan itself, meaning the script remains unchanged.

Adding Post Script to Cloud Connect Replication Job:

The final step is to configure the PowerShell script to execute once the Cloud Connect Replication job has run. This is done via the post-script settings that can be found under Job Settings -> Advanced -> Scripts. Drop down to select ps1 files and choose the location of the script.

That’s all that is required to have the PowerShell script executed once the replication job completes.

End Result:

Once the replication component of the job is complete, the post-job script will be executed by the job.

This triggers the PowerShell script, which runs the Terraform plan. It will check the existing state of the NSX Edge configuration and work out what configuration needs to be added. From the vCD Tenant UI, you should see the Recent Tasks pane list modifications to the NSX Edge Gateway by the user configured to access the vCD APIs via the provider.

Taking a look at the NSX Edge Firewall and NAT configuration, you should see that it has been configured as specified in the Terraform plan and matches the plan’s current state.

Conclusion:

At the end of the day, what we have achieved is the orchestration of Veeam Cloud Connect Replication together with vCloud Director and NSX… facilitated by Terraform. This is something that Service Providers offering Cloud Connect Replication can provide to their clients as a way for them to define, control and manage the configuration of the NSX Edge networking for their replicated infrastructure, so that key services are accessible during a DR event.

While it might seem like there is a lot happening, this is a great example of leveraging Infrastructure as Code to automate an otherwise manual task. Once the Terraform is understood and the variables applied, the configuration of the NSX Edge will be consistent and in a desired state, with the config checked and applied on every run of the replication job. The configuration will not fall out of line with what is required during a full failover, ensuring that services are available if a disaster occurs.

References:

https://github.com/anthonyspiteri/automation/tree/master/vccr_vcd_configure_nsx_edge

Infrastructure as Code vs RESTful APIs … A Working Example with Terraform and vCloud Director

Last week I wrote an opinion piece on Infrastructure as Code vs RESTful APIs. In a nutshell, I talked about how leveraging IaC instead of trying to code against APIs directly can be more palatable for IT professionals, as it acts as a middle-man interpreter between yourself and the infrastructure endpoints. IaC can be considered a black box that does the complicated lifting for you without you having to deal with APIs directly.

As a follow-up to that post I wanted to show an example of the differences between using direct APIs versus using an IaC tool like Terraform. Not surprisingly, the example below features vCloud Director… but I think it speaks volumes to the message I was trying to get across in the introduction post.

The vCloud Director Terraform Provider was recently upgraded alongside the release of vCloud Director 9.7 and now sits at version 2.1.

The Terraform provider has been developed using Python and Go. Under the hood it uses a client-server model, where the client has been written in Go and the server in Python. The core reason for using two different languages is to build a bridge between Terraform and the Pyvcloud API. Pyvcloud is the SDK developed by VMware and provides a medium to talk to vCloud Director. Terraform uses Go to communicate, whereas Pyvcloud is written in Python 3.

The above explanation of how this provider does its thing highlights my previous points about IaC tools. The abstraction of the infrastructure endpoint is easy to see… and in the examples below you will see its benefit for those who haven’t got the inclination to hit the APIs directly.

The assumption for both examples is that we are starting without any configured firewall or NAT rules on the NSX Edge Services Gateway. Both methods connect as tenants of the vCD infrastructure and authenticate with Organisation-level access.

The end result will be:

  • Allow HTTP, HTTPS and ICMP access to a VM living in a vDC
    • External IP is 82.221.98.109
    • Internal IP of VM is 172.17.0.240
    • VM Subnet is 172.17.0.0/24
  • Configure DNAT rules to allow HTTP and HTTPS
  • Configure SNAT rule to allow outbound from the VM subnet

Configuring Firewall and NAT Rules with RESTful API:

Firstly, to understand what vCD API operations need to be hit, we need to be familiar with the API Documentation. This will cover initial authentication as either a SYSTEM or Organizational admin and then what calls need to be made to get information relating to the current configuration and schema. Further to this, we need to also be familiar with the NSX API for vCD Documentation which covers how to interact with the network specific API operations possible from the vCD API endpoint.

We are going to use Postman to execute calls against the vCD API. Postman is great because you can save your call history and reuse it at a later date. You can also save variables into Workspaces and insert specific code to assist with things like authentication.

The first step is to authenticate against the API and get a session authorization key that can be fed back into subsequent requests. This authorization key will only last a finite amount of time and will need to be regenerated.

Because we are using a decent RESTful API client like Postman, there is a better way to programmatically authenticate when talking to the vCD API: using a bearer access token, as described in Tom Fojta’s post here.

Once that is done we are authenticated as a vCD Organizational Admin and can query the NSX Edge Services Gateway (ESG) settings for Firewall and NAT rules. I’ll walk through configuring a NAT rule for the purposes of the example, but the same method is used to configure the Firewall as well.

A summary of the NAT requests can be seen below and found here.

Below we are querying the existing NAT rules using a GET request against the NSX ESG. What we get back is an empty config in XML.

What needs to be done is to turn that request into a POST and craft an XML payload into the Body of the request so that we can configure the NAT rules as desired.

Redoing the GET request will now show that the NAT rules have been created.

The new rules are also shown in the vCD Tenant UI.

From here we can update, append, reset or delete the NAT rules as per the API documentation. Each one of those actions will require a new call to the API and the same process followed as above.

Configuring Firewall and NAT Rules with Terraform:

For a primer on the vCloud Director Terraform Provider, read this post and also head over to Luca’s post on Terraform with vCD. As with the RESTful API example above, I will use Terraform to configure the same tenant NSX Edge Gateway’s Firewall and NAT rules. What will become clear using Terraform is that it is a lot more efficient and elegant than going at it directly against the APIs.

Initially we need to set up the required configuration items so that the Terraform provider can talk to the vCD API endpoint. To do this we create a number of Terraform files that declare the variables required to connect to the vCD Organization, and then configure the terraform.tfvars file that contains the specific values.

We also create a provider .tf file to specifically call out the required Terraform provider and set the main variables.
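
As a sketch, and assuming variable names along these lines, the provider block ends up looking something like this:

```hcl
# provider.tf - wire the vCD provider up to the tenant organization
provider "vcd" {
  user                 = var.vcd_user
  password             = var.vcd_password
  org                  = var.vcd_org
  vdc                  = var.vcd_vdc
  url                  = var.vcd_url # e.g. https://vcd.example.com/api
  max_retry_timeout    = 30
  allow_unverified_ssl = true
}
```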

We contain all of this in a single folder (seen in the left pane above) for organization and portability… These folders can be consumed as Terraform Modules if desired in more complex, reusable plans.

We then create two more .tf files for the Firewall and NAT rules. The format is dictated by the provider documentation, which gives examples. We can make things more portable by incorporating some of the variables we declared elsewhere in the code, as shown below for the Edge Gateway name and destination IP address.
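
A trimmed sketch of the two files is below, using the example addressing from earlier and interpolating the Edge Gateway name and destination IP from variables (the variable names are illustrative):

```hcl
# firewall_rules.tf (sketch)
resource "vcd_firewall_rules" "web" {
  edge_gateway   = var.edge_gateway # interpolated rather than hard-coded
  default_action = "drop"

  # HTTPS (443) follows the same pattern
  rule {
    description      = "allow-http"
    policy           = "allow"
    protocol         = "tcp"
    destination_port = "80"
    destination_ip   = var.destination_ip # 82.221.98.109 in this example
    source_port      = "any"
    source_ip        = "any"
  }
}

# nat_rules.tf (sketch)
resource "vcd_dnat" "http" {
  edge_gateway = var.edge_gateway
  external_ip  = var.destination_ip
  port         = 80
  internal_ip  = "172.17.0.240"
}

resource "vcd_snat" "outbound" {
  edge_gateway = var.edge_gateway
  external_ip  = var.destination_ip
  internal_ip  = "172.17.0.0/24"
}
```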

Once the initial configuration work is done, all that’s required to apply the configuration is to initialize the Terraform provider, make sure the Terraform plan is as expected… and then apply the plan against the tenant’s Organization.

As the video shows… in less than a minute we have the NSX Firewall and NAT rules configured. More importantly, we now have a desired state which can be modified at any time by simple additions or subtractions to the Terraform code.

Wrapping it up:

Looking at both examples, it’s clear that both methods of configuration do the trick, and it really depends on what sort of IT professional you are as to which method is more suited to your day-to-day. For those working as automation engineers, working with APIs directly and/or integrating them into provisioning engines or applications is going to be the preferred method. For those who want to be able to deploy, configure and manage their own infrastructure in a more consumable way, using a Terraform provider is probably the better option.

The great thing about Terraform in my eyes is that you declare the state you want configured, and once that has been actioned you can easily check that state and modify it by changing the configuration items in the .tf files and reapplying the plan. For me it’s a much more efficient way to programmatically configure vCD than doing the same configuration directly against the API.

Ohhh… and don’t forget… you are still allowed to use the UI as well… there is no shame in that!

Infrastructure as Code vs RESTful APIs …Terraform and Everything in Between!

While I was a little late to the game in understanding the power of Infrastructure as Code, I’ve spent a lot of the last twelve months working with Terraform specifically, to help deploy and manage various types of my lab and cloud-based infrastructure. Appreciating how IaC can fundamentally change the way in which you deploy and configure infrastructure, workloads and applications is not an easy thing to grasp… there can be a steep learning curve and lots of tools to choose from.

In terms of a definition as to what is IaC:

Infrastructure as code (IaC) is the process of managing and provisioning computer data centers through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools. The IT infrastructure managed by this comprises both physical equipment such as bare-metal servers as well as virtual machines and associated configuration resources. The definitions may be in a version control system. It can use either scripts or declarative definitions, rather than manual processes, but the term is more often used to promote declarative approaches.

As represented above, there are many tools in the IaC space and everyone will gravitate towards their own favourite. The post I borrowed that graphic from actually does a great job of talking about the differences, and also why Terraform has become my standout for IT admins and why HashiCorp is on the up. I love how the article talks about the main differences between each tool, specifically in the Procedural vs Declarative comparison, where it states that a declarative approach is one where “you write code that specifies your desired end state, and the IaC tool itself is responsible for figuring out how to achieve that state.”

You Don’t Need to Know APIs to Survive!:

The statement above is fairly controversial… especially for those who have been preaching about IT professionals having to code in order to remain viable. A lot of that mindshare is centred around the API and DevOps world… but not everyone needs to be a DevOps engineer! IT is all about trying to solve problems and achieve outcomes… it doesn’t matter how you solve it, as long as the problem/outcome is solved/attained. Being as efficient as possible when achieving that outcome is also important.

My background prior to working with IaC tools like Terraform was working with, and actioning outcomes directly against, RESTful APIs. I spent a lot of time with the vCloud Director and NSX APIs specifically, in order to help productise services in my last two roles, so I feel like I know my way around a cURL command or Postman window. Let me point out that there is nothing wrong with having knowledge of APIs: it is important for IT professionals to understand the fundamentals of APIs and how they are accessed and used for programmatic management of infrastructure and for creating applications.

I’m also not understating the skill involved in being able to understand and manipulate APIs directly, or in taking those resources and creating automated provisioning or actual applications that interact directly with APIs and create an outcome of their own. Remember though that everyone’s skill set and level is different, and no one should feel any less of an IT practitioner if they can’t code at a perceived higher level.

How IaC Tools Bridge the Gap:

In my VMUG UserCon sessions last month in Melbourne and Sydney I went through the Veeam SDDC Deployment Toolkit, which was built with various IaC tooling (Terraform and Chef) as well as PowerShell, PowerCLI and some Bash scripting. Ultimately, putting all that together got us to a point where we could declaratively deploy a Veeam Backup & Replication server and fully configure it, ready for action, on any vSphere platform.

That aside, the other main point of the session was taking the audience through a very quick Terraform 101 introduction and demo. In both cities I asked the crowd how much time they spent working with APIs to do “stuff” on their infrastructure… in both cities almost no one raised their hand. After I went through the basic Terraform demo, where I provisioned and then modified a VM from scratch, I asked the audience if something like this would help them in their day-to-day roles… in both cities almost everyone put their hands up.

Therein lies the power of IaC tools like Terraform. I described it to the audience as a way to code without having to know the APIs directly. Terraform Providers act as the middle man or interpreter between yourself and the infrastructure endpoints. Consider it a black box that does the complicated lifting for you… this is the essence of Infrastructure as Code!

There are some who may disagree with me (and that’s fine), but I believe that for the majority of IT professionals who haven’t yet gotten around to transitioning away from “traditional” infrastructure management, configuration and deployment, looking at IaC tools like Terraform can help you not only survive… but also thrive!

References:

https://blog.gruntwork.io/why-we-use-terraform-and-not-chef-puppet-ansible-saltstack-or-cloudformation-7989dad2865c

https://en.wikipedia.org/wiki/Infrastructure_as_code

Quick Fix: Terraform Plan Fails on Guest Customizations and VMware Tools

Last week I was looking to add the deployment of a local CentOS virtual machine to the Deploy Veeam SDDC Toolkit project so that it included the option to deploy and configure a local Linux repository, which can then be added to the Backup & Replication server. As part of the deployment I call the Terraform vSphere provider to clone and configure the virtual machine from a pre-loaded CentOS template.

As shown below, I am using the Terraform customization options to configure the VM name and domain details as well as the network configuration.
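
For context, here is a cut-down sketch of the clone-and-customize pattern; the inventory names, credentials and IPs are placeholders rather than the toolkit’s actual values:

```hcl
variable "vsphere_user" {}
variable "vsphere_password" {}
variable "vsphere_server" {}

provider "vsphere" {
  user                 = var.vsphere_user
  password             = var.vsphere_password
  vsphere_server       = var.vsphere_server
  allow_unverified_ssl = true
}

data "vsphere_datacenter" "dc" {
  name = "LAB-DC" # placeholder
}

data "vsphere_datastore" "ds" {
  name          = "vsanDatastore" # placeholder
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_resource_pool" "pool" {
  name          = "LAB-Cluster/Resources" # placeholder
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_network" "net" {
  name          = "VM Network" # placeholder
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_virtual_machine" "template" {
  name          = "CentOS7-Template" # the pre-loaded template
  datacenter_id = data.vsphere_datacenter.dc.id
}

resource "vsphere_virtual_machine" "linux_repo" {
  name             = "veeam-linux-repo"
  resource_pool_id = data.vsphere_resource_pool.pool.id
  datastore_id     = data.vsphere_datastore.ds.id
  num_cpus         = 2
  memory           = 4096
  guest_id         = data.vsphere_virtual_machine.template.guest_id

  network_interface {
    network_id = data.vsphere_network.net.id
  }

  disk {
    label = "disk0"
    size  = data.vsphere_virtual_machine.template.disks[0].size
  }

  clone {
    template_uuid = data.vsphere_virtual_machine.template.id

    # This is the part that fails without working VMware Tools (and perl) in the guest
    customize {
      linux_options {
        host_name = "veeam-repo"
        domain    = "lab.local"
      }

      network_interface {
        ipv4_address = "192.168.1.50"
        ipv4_netmask = 24
      }

      ipv4_gateway = "192.168.1.1"
    }
  }
}
```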

In configuring the CentOS template I did my usual install of Open VM Tools. When the Terraform plan was applied, the VM was cloned without issue, but it failed at the Guest Customization part.

The error is pretty clear, and to test the error and fix I tried applying the plan without any VMware Tools installed. In fact, without VMware Tools the VM will not finish the initial deployment after the clone and is deleted by Terraform. I next installed open-vm-tools but ended up with the same scenario of the plan failing and the VM not being deployed. For some reason it does not like this version of the package being deployed.

The next test was to deploy the open-vm-tools-deploypkg as described in this VMware KB. Now the Terraform plan executed to the point of cloning the VM and setting up the desired VM hardware and virtual network port group settings, but it still failed on the custom IP and hostname components of the customisation, this time with a slightly different error.

The final requirement is to pre-install the perl package onto the template; this allows the in-guest customizations to take place together with VMware Tools. Once I added that to the template, the Terraform plan succeeded without issue.

References:

https://kb.vmware.com/s/article/2075048

Automating the Creation of AWS VPC and Subnets for VMware Cloud on AWS

Yesterday I wrote about how to deploy a Single Host SDDC through the VMware Cloud on AWS web console. I mentioned some prerequisites that were required for the deployment to be successful. Part of that is to set up an AWS VPC with networking in place so that the VMC components can be deployed. While it’s not too hard a task to perform through the AWS console, in the spirit of the work I’m doing around automation I have gotten this done via a Terraform plan.

The max lifetime for a Single Instance deployment is 30 days from creation, but the reality is most people will (or should) be using this to test the waters, and may only want to spin the SDDC up for a couple of hours a day, run some tests and then destroy it. That obviously has its disadvantages as well, the main one being that you have to start from scratch every time. Given the nature of the VMworld session around the automation and orchestration of Veeam and VMC, starting from scratch is not an issue; however, it was desirable to look for efficiencies during redeployment.

For those looking to save time and automate parts of the deployment beyond the AWS VPC, there are a number of PowerShell code examples and modules available that, along with the Terraform plan, reduce the time to get a new SDDC firing.

I’m using a combination of the above scripts to deploy a new SDDC once the AWS VPC has been created. The first actually deploys the SDDC through PowerShell, while the second is a module that allows some interactivity via cmdlets to do things such as export and import firewall rules.

Using Terraform to Create AWS VPC for VMware Cloud on AWS:

The Terraform plan linked here on GitHub does a couple of things (a sketch of the resources follows the list):

  • Creates a new VPC
  • Creates a VPC network
  • Creates three VPC subnets across different Availability Zones
  • Associates the three VPC subnets with the main route table
  • Creates the desired security group rules
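
In outline, the plan boils down to a handful of resources. The sketch below uses a placeholder region and placeholder CIDRs; the real values live in the repo’s variables:

```hcl
variable "region" {
  default = "us-west-2" # placeholder
}

provider "aws" {
  region = var.region
}

data "aws_availability_zones" "available" {}

# The VPC that the VMC SDDC will be cross-linked to
resource "aws_vpc" "vmc" {
  cidr_block = "10.2.0.0/16" # placeholder CIDR

  tags = {
    Name = "VMC-Connected-VPC"
  }
}

# Three subnets spread across the first three Availability Zones
resource "aws_subnet" "vmc" {
  count             = 3
  vpc_id            = aws_vpc.vmc.id
  cidr_block        = cidrsubnet(aws_vpc.vmc.cidr_block, 8, count.index)
  availability_zone = data.aws_availability_zones.available.names[count.index]
}

# Associate each subnet with the VPC's main route table
resource "aws_route_table_association" "vmc" {
  count          = 3
  subnet_id      = aws_subnet.vmc[count.index].id
  route_table_id = aws_vpc.vmc.main_route_table_id
}

# Minimal security group example - inbound HTTPS from anywhere
resource "aws_security_group" "vmc" {
  name   = "vmc-sddc-sg"
  vpc_id = aws_vpc.vmc.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

Carving the three subnets out of the VPC CIDR with cidrsubnet() keeps the addressing consistent no matter what CIDR is fed in.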

https://github.com/anthonyspiteri/vmc_vpc_subnet_create

[Note] Even for the Single Instance Node SDDC it will take about 120 minutes to deploy… so that needs to be factored in when planning the window to work on the instance.

Using Terraform to Deploy and Configure a Ready to use Backup Repo into an AWS VPC

A month or so ago I wrote a post on deploying Veeam Powered Network (Veeam PN) into an AWS VPC as a way to extend the VPC network to a remote site, to leverage a Veeam Linux repository running as an EC2 instance. During the course of deploying that solution I came across a lot of little checkboxes and settings that needed to be tweaked in order to get things working. After that, I set myself the goal of trying to automate and orchestrate the deployment end to end.

For an overview of the intended purpose behind the solution, head to the original blog post here. That post was mainly focused on the Veeam PN component; however, I was using that as a mechanism to create a site-to-site connection to allow Veeam Backup & Replication to talk to the other EC2 instance, which was the Veeam Linux repository.

Terraform by HashiCorp:

In order to automate the deployment into AWS I looked at CloudFormation first… but found the learning curve to be a little steep… so I went back to HashiCorp’s Terraform, which I had been familiar with for a number of years but had never gotten my hands dirty with. HashiCorp specialise in cloud infrastructure automation and their provisioning product is called Terraform.

Terraform is used to create, manage, and update infrastructure resources such as physical machines, VMs, network switches, containers, and more. Almost any infrastructure type can be represented as a resource in Terraform.

A provider is responsible for understanding API interactions and exposing resources. Providers generally are an IaaS (e.g. AWS, GCP, Microsoft Azure, OpenStack), PaaS (e.g. Heroku), or SaaS services (e.g. Terraform Enterprise, DNSimple, CloudFlare).

Terraform supports a host of providers, and once you wrap your head around the basics and view some example code, provisioning Infrastructure as Code can be achieved with relatively little coding experience… however, as I found out, you need to be careful in this world and not make the same initial mistake I did, as explained in this post.
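
To make the provider/resource split concrete, a minimal plan can be as small as this:

```hcl
# A provider exposes an infrastructure API; a resource declares what should exist
provider "aws" {
  region = "us-east-1"
}

resource "aws_vpc" "example" {
  cidr_block = "10.0.0.0/16"
}
```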

Going from Manual to Orchestrated with Automation:

The Terraform AWS provider is what I used to create the code required to deploy the required components. Like everything that’s automated, you need to understand the manual process first, and that is where the previous experience came in handy. I knew what the end result was… I just needed to work backwards and make sure that the Terraform provider had all the instructions it needed to orchestrate the build.

The basic flow is as follows (a cut-down sketch of the repository instance follows the list):

  • Fetch AWS Access Key and Secret
  • Fetch AWS Key Pair
  • Create AWS VPC
    • Configure Networking and Routing for VPC
  • Create CentOS EC2 Instance for Veeam Linux Repo
    • Add new disk and set size
    • Execute configuration script
      • Install PERL modules
  • Create Ubuntu EC2 Instance for Veeam PN
    • Execute configuration script
      • Install VeeamPN modules from repo
  • Log in to the Veeam PN Web Console and import the site configuration
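
To give a feel for the EC2 side of that flow, here is a cut-down sketch of the repository instance; the AMI, key pair, subnet and script names are placeholders, and the full plan is in the GitHub project below:

```hcl
variable "centos_ami" {}       # placeholder - CentOS AMI ID for your region
variable "key_pair_name" {}    # existing AWS key pair name
variable "private_key_path" {} # local path to the matching private key
variable "subnet_id" {}        # subnet created earlier in the plan

provider "aws" {
  region = "us-west-2" # placeholder
}

resource "aws_instance" "veeam_repo" {
  ami                         = var.centos_ami
  instance_type               = "t2.medium"
  key_name                    = var.key_pair_name
  subnet_id                   = var.subnet_id
  associate_public_ip_address = true

  # Second disk to act as the backup repository volume
  ebs_block_device {
    device_name = "/dev/sdb"
    volume_size = 250
  }

  # Remote execution - run the configuration script once the instance is up
  provisioner "remote-exec" {
    script = "scripts/configure_repo.sh" # hypothetical script name

    connection {
      type        = "ssh"
      user        = "centos"
      private_key = file(var.private_key_path)
      host        = self.public_ip
    }
  }
}
```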

I’ve uploaded the code to a GitHub project; an overview and instructions can be found here. I’ve also posted a video to YouTube showing the end-to-end process, which I’ve embedded below (best watched at 2x speed):

In order to get the Terraform plan to work there are some variables that need modifying in the GitHub project, and you will need to download, install and initialise Terraform. I intend to continue tweaking the project and complete the provisioning end to end, including the Veeam PN site configuration at the end. The remote execution feature of Terraform allows some pretty cool things by way of script initiation.

References:

https://github.com/anthonyspiteri/automation/aws_create_veeamrepo_veeampn

https://www.terraform.io/intro/getting-started/install.html


Public Cloud and Infrastructure as Code…The Good and the Bad all in One Day!

I’m ok admitting that I am still learning as I progress through my career, and I’m ok admitting when things go wrong. Learning from mistakes is a crucial part of learning… and I learnt a harsh lesson today! Infrastructure as Code is as dangerous as it is awesome… and the public cloud is an unforgiving place!

Earlier today I created a new GitHub repository for a project I’ve been working on. Before I realised my mistake, I had uploaded a Terraform variables file containing my AWS Access and Secret Keys. I picked up on this probably two minutes after I pushed the contents up to the public repository. Roughly five minutes later I deleted the repository and was about to start fresh without the credentials, but then realised that my Terraform plan was failing with a credential error.

I logged into the AWS Console and saw that my main VPC and EC2 instances had been terminated and that there were 20 new instances in their place. I knew exactly at that point what had happened! I’d been compromised, and I had handed over the keys on a silver web-scraper platter.

My access key had been deleted and new ones created, along with VPCs and Key Pairs in every single AWS region across the world. I deleted the new access key the malicious user had created, locking them out from doing any more damage; however, in the space of ten minutes 240 EC2 instances in total were spun up. This was a little more than the twenty I thought I had dealt with initially… costing only $4.50… Amazing!

I contacted AWS support and let them know what happened. To their credit (and to my surprise) I had a call back within a few hours. Meanwhile, they automatically restricted my account until I had satisfied a series of clean-up steps, so as to limit any more potential damage. The billing will be reversed as well, so I am a little less in a panic when I see my current month’s breakdown.

The Bad Side of Infrastructure as Code and Public Cloud:

This example shows how dangerous the world we live in can be. With AWS and the like providing brilliant API access into their provisioning platforms, malicious users have seen an opportunity to use Infrastructure as Code as a way to spin up cloud resources in a matter of seconds. All they need is an in. And in my case, that in was a moment of stupidity… and even though I realised what I had done, all it took was less than five minutes for them to take advantage of my lapse in concentration and exploit my security mistake. They also exploited the fact that I am new to this space and had not learnt best practice for storing credentials.

I was lucky that everything I had in AWS was there for demo purposes and I had nothing of real importance there. However, if this happened to someone running business-critical applications they would be in for a very, very bad day. Everything was wiped! Even the backup software I had running in there using local snapshots… if ever there was a case for offsite copies, this is it! (Ergo – Veeam Agents and N2WS)

The Good Side of Infrastructure as Code and Public Cloud:

What good could come of this? Well, apart from learning a little more about Terraform and how to store credentials, the awesome part was that, thanks to all the work I had put in over the past couple of weeks getting started with Infrastructure as Code and Terraform, I was able to reprovision everything I lost within 5 minutes… once my account restriction was lifted.

That’s the power of APIs and the applications that take advantage of them. And even though I copped a slap in the face today… I’m converted. This stuff is cool! We just need to be aware of the dangers that come with it, and the fact that the coolness can be exploited in the wrong way as well.