
The Separation of Dev and Ops is Upon Us!

Apart from the K word, there was one other enduring message that I think a lot of people took home from VMworld 2019: that Dev and Ops should be considered separate entities again. For the best part of the last five or so years, the concepts of DevOps, SecOps and other X-Ops have been perpetuated, mainly due to the rise of consumable platforms outside the traditional control of IT operations people.

The pressure to DevOp has become very real in the IT communities that I am involved with. These circles are mainly made up of traditional infrastructure guys. I’ve written a few pieces around how the industry trend of trying to turn everyone into developers isn’t one that needs to be followed. Automation doesn’t equal development, and there are a number of Infrastructure as Code tools that look to bridge the gap between the developer and the infrastructure guy.

That isn’t to say that traditional IT guys shouldn’t be looking to push themselves to learn new things, improve and evolve. In fact, IT Ops needs to be able to code in slightly abstracted ways to work with APIs or leverage IaC tooling. However, my view is that the number one role of IT Ops is to understand fundamentally what is happening within a platform, and to be able to support infrastructure that developers can consume.

I had a bit of an aha moment this week while working on some Kubernetes (that word again!) automation work with Terraform, which I’ll release later this week. The moment came when I was trying to get the Sock Shop demo working on my fresh Kubernetes cluster. I finally understood why Kubernetes had been created. Everything about the application was defined in the JSON files and deployed as is, holistically, through one command. It’s actually rather elegant compared to how I worked with developers back in the early days of web hosting on Windows and Linux web servers with their database backends and whatnot.

Regardless of the ease of deployment, I still had to understand the underlying networking and get the application to listen on external IPs and different ports. At this point I was doing dev and doing IT Ops in one. However this is all contained within my lab environment that has no bearing on the availability of the application, security or otherwise. This is where separation is required.

Developers want to consume services and take advantage of the constructs of a containerised platform like Docker, paired with the orchestration and management of those resources that Kubernetes provides. They don’t care what’s under the hood and shouldn’t be concerned about what their application runs on.

IT Operations want to be able to manage the supporting platforms as they did previously. The compute, the networking, the storage… this is all still relevant in a hybrid world. They should absolutely still care about what’s under the hood and the impact applications can have on infrastructure.

VMware has introduced that (re)split of dev and ops with Project Pacific, and I applaud them for going against the grain and endorsing the separation of roles and responsibilities. Kubernetes and ESXi in one vSphere platform is where that vision lies. Outside of vSphere, it is still very true that devs can consume public clouds without a care about underlying infrastructure… but for me… it all comes back down to this…

Let devs be devs… and let IT Ops be IT Ops! They need to work together in this hybrid, multi-cloud world!

Orchestration of NSX by Terraform for Cloud Connect Replication with vCloud Director

That is probably the longest title I’ve ever had on this blog, however I wanted to highlight everything that is contained in this solution. Everything above works together to get the job done. The job, in this case, is to configure an NSX Edge automatically using the vCloud Director Terraform provider to allow network connectivity for VMs that have been replicated into a vCloud Director tenant organization with Cloud Connect Replication.

With the release of Update 4 for Veeam Backup & Replication we enhanced Cloud Connect Replication to finally replicate into a Service Provider’s vCloud Director platform. In doing this we enabled tenants to take advantage of the advanced networking features of the NSX Edge Services Gateway. The only caveat was that, unlike the existing Hardware Plan mechanism, where tenants were able to configure basic networking on the Network Extension Appliance (NEA), the configuration of the NSX Edge had to be done directly through the vCloud Director Tenant UI.

The Scenario:

When VMs are replicated into a vCD organisation with Cloud Connect Replication, the expectation in a full failover is that if a disaster happened on-premises, workloads would be powered on in the service provider cloud and work exactly as if they were still on-premises. Access to services needs to be configured through the edge gateway. The edge gateway is then connected to the replica VMs via the vOrg Network in vCD.

In this example, we have a LAMP-based web server that is publishing a WordPress site over HTTP and HTTPS.

The VM is being replicated to a Veeam Cloud Service Provider vCloud Director backed Cloud Connect Replication service.

During a disaster event at the on-premises end, we want to enact a failover of the replica living in the vCloud Director Virtual Datacenter.

The VM replica will be fired up and the NSX Edge associated with the vDC (the Network Extension Appliance pictured is used for partial failovers) will allow HTTP and HTTPS to be accessed from the outside world. The internal IP and subnet of the VM are as they were on-premises. Cloud Connect Replication handles the mapping of the networks as part of the replication job.

Even during the early development days of this feature I was thinking about how this process could be automated somehow. With our previous Cloud Connect Replication networking, we would use the NEA as the edge device and allow basic configuration through the Failover Plan from the Backup & Replication console. That functionality still exists in Update 4, but only for non vCD backed replication.

The obvious way would be to tap into the vCloud Director APIs and configure the Edge directly. Taking that further, we could wrap that up in PowerShell and invoke the APIs from PowerShell, which would allow a simpler way to pass through variables and deal with payloads. However with the power that exists with the Terraform vCloud Director provider, it became a no brainer to leverage this to get the job done.

Configuring NSX Edge with Terraform:

In my previous post around Infrastructure as Code vs APIs I went through a specific example where I configured an NSX Edge using Terraform. I’m not going to go over that again, but what I have done is published that Terraform plan with all the code to GitHub.

The GitHub Project can be found here.

The end result after running the Terraform Plan is:

  • Allow HTTP, HTTPS, SSH and ICMP access to a VM in a vDC
    • External IP defined as a variable
    • Internal IP defined as a variable
    • vOrg Subnet defined as a variable
  • Configure DNAT rules to allow HTTP, HTTPS and SSH
  • Configure SNAT rule to allow outbound from the vOrg subnet

The values that align with the VM and vOrg network are set in the terraform.tfvars file and need to be modified to match the on-premises network configuration. The variables themselves are declared in the variables.tf file.

To add additional VMs and/or vOrg networks you will need to define additional variables in both files and add additional entries under firewall_rules.tf and nat_rules.tf. I will look at ways to make this more elegant using Terraform arrays/lists and programmatic constructs in future.

Creating PowerShell for Execution:

The Terraform plan can obviously be run standalone and the NSX Edge configuration can be actioned at any time, but the idea here is to take advantage of the script functionality that exists with Veeam backup and replication jobs and have the Terraform plan run upon completion of the Cloud Connect Replication job every time it is run.

To achieve this we need to create a PowerShell script:

GitHub – configure_vCD_VCCR_NSX_Edge.ps1

The PowerShell script initializes Terraform and downloads the Provider, checks for any Provider upgrades on future runs and then executes the Terraform plan. Remember that the variables change within the Terraform plan itself, meaning the script remains unchanged.
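The actual script lives in the GitHub project. As a rough sketch of the same sequence (not the author's script), the steps could look like this in Python. The `-auto-approve` and `-input=false` flags are my assumption for an unattended run, and the folder passed to `run_plan` is hypothetical.

```python
import subprocess
from pathlib import Path


def terraform_commands() -> list[list[str]]:
    """Return the Terraform CLI calls in the order a post-job script would run them."""
    return [
        # Initialize the working directory and pick up newer provider versions.
        ["terraform", "init", "-upgrade", "-input=false"],
        # Apply without an interactive prompt, since nobody is at the console.
        ["terraform", "apply", "-auto-approve", "-input=false"],
    ]


def run_plan(plan_dir: Path) -> None:
    """Run each command inside plan_dir and stop at the first failure."""
    for command in terraform_commands():
        subprocess.run(command, cwd=plan_dir, check=True)
```

Calling `run_plan(Path("path/to/terraform/folder"))` from the folder that holds the .tf and .tfvars files would then mirror what the post-job script does.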

Adding Post Script to Cloud Connect Replication Job:

The final step is to configure the PowerShell script to execute once the Cloud Connect Replication job has been run. This is done via the post-script setting that can be found in Job Settings -> Advanced -> Scripts. Use the drop-down to select ps1 files and choose the location of the script.

That’s all that is required to have the PowerShell script executed once the replication job completes.

End Result:

Once the replication component of the job is complete, the post job script will be executed by the job.

This triggers the PowerShell, which runs the Terraform plan. It will check the existing state of the NSX Edge configuration and work out what configuration needs to be added. From the vCD Tenant UI, you should see the recent tasks list modifications to the NSX Edge Gateway by the user configured to access the vCD APIs via the Provider.
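To illustrate the idea of comparing existing state with desired state, here is a minimal sketch in Python. It is not how Terraform works internally; the `NatRule` type and the addresses are hypothetical and only show why a second run against an already configured edge has nothing left to do.

```python
from typing import NamedTuple


class NatRule(NamedTuple):
    action: str      # "dnat" or "snat"
    original: str    # address (and port) before translation
    translated: str  # address (and port) after translation


def rules_to_add(desired: set[NatRule], current: set[NatRule]) -> set[NatRule]:
    """Rules declared in the plan that the edge does not have yet."""
    return desired - current


def rules_to_remove(desired: set[NatRule], current: set[NatRule]) -> set[NatRule]:
    """Rules on the edge that are no longer declared in the plan."""
    return current - desired


# Hypothetical desired state: publish HTTP and HTTPS for one replica VM.
desired = {
    NatRule("dnat", "203.0.113.10:80", "192.168.1.10:80"),
    NatRule("dnat", "203.0.113.10:443", "192.168.1.10:443"),
}
# Hypothetical current state: only the HTTP rule exists on the edge.
current = {NatRule("dnat", "203.0.113.10:80", "192.168.1.10:80")}

pending = rules_to_add(desired, current)
```

Here `pending` holds only the HTTPS rule. Once it has been applied, the same comparison returns an empty set, which is why running the plan after every replication job is safe.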

Taking a look at the NSX Edge Firewall and NAT configuration you should see that it has been configured as specified in the Terraform plan.

This will match the current state of the Terraform plan.

Conclusion:

At the end of the day, what we have achieved is the orchestration of Veeam Cloud Connect Replication together with vCloud Director and NSX… facilitated by Terraform. This is something that Service Providers offering Cloud Connect Replication can provide to their clients as a way for them to define, control and manage the configuration of the NSX Edge networking for their replicated infrastructure so that there is access to key services during a DR event.

While there might seem to be a lot happening, this is a great example of leveraging Infrastructure as Code to automate an otherwise manual task. Once the Terraform is understood and the variables applied, the configuration of the NSX Edge will be consistent and in a desired state, with the config checked and applied on every run of the replication job. The configuration will not fall out of line with what is required during a full failover and will ensure that services are available if a disaster occurs.

References:

https://github.com/anthonyspiteri/automation/tree/master/vccr_vcd_configure_nsx_edge

Infrastructure as Code vs RESTful APIs … A Working Example with Terraform and vCloud Director

Last week I wrote an opinion piece on Infrastructure as Code vs RESTful APIs. In a nutshell, I talked about how leveraging IaC instead of trying to code against APIs directly can be more palatable for IT professionals as it acts as a middle man interpreter between yourself and the infrastructure endpoints. IaC can be considered a black box that does the complicated lifting for you without having to deal with APIs directly.

As a follow up to that post I wanted to show an example of the differences between using APIs directly versus using an IaC tool like Terraform. Not surprisingly the example below features vCloud Director… but I think it speaks volumes to the message I was trying to get across in the introduction post.

The vCloud Director Terraform Provider was recently upgraded with the release of vCloud Director 9.7 and now sits at version 2.1 itself.

The Terraform Provider has been developed using Python and Go. It uses a client-server model under the hood, where the client has been written in Go and the server has been written in Python. The core reason to use two different languages is to make a bridge between Terraform and the Pyvcloud API. Pyvcloud is the SDK developed by VMware and provides a medium to talk to vCloud Director. Terraform uses Go to communicate, whereas Pyvcloud has been written in Python 3.

The above explanation as to how this provider is doing its thing highlights my previous points around any IaC tools. The abstraction of the infrastructure endpoint is easy to see… and in the below examples you will see its benefit for those who have not got the inclination to hit the APIs directly.

The assumption for both examples is that we are starting without any configured Firewall or NAT rules for the NSX Edge Services Gateway. Both methods connect as tenants of the vCD infrastructure and authenticate with Organisation level access.

The end result will be:

  • Allow HTTP, HTTPS and ICMP access to a VM living in a vDC
    • External IP is 82.221.98.109
    • Internal IP of VM is 172.17.0.240
    • VM Subnet is 172.17.0.0/24
  • Configure DNAT rules to allow HTTP and HTTPS
  • Configure SNAT rule to allow outbound from the VM subnet

Configuring Firewall and NAT Rules with RESTful API:

Firstly, to understand what vCD API operations need to be hit, we need to be familiar with the API Documentation. This will cover initial authentication as either a SYSTEM or Organizational admin and then what calls need to be made to get information relating to the current configuration and schema. Further to this, we need to also be familiar with the NSX API for vCD Documentation which covers how to interact with the network specific API operations possible from the vCD API endpoint.

We are going to be using Postman to execute against the vCD API. Postman is great because you can save your call history and reuse the calls at a later date. You can also save variables into Workspaces and insert specific code to assist with things like authentication.

The first step is to authenticate against the API and get a session authorization key that can be fed back into subsequent requests. This authorization key only lasts a finite amount of time and will need to be regenerated.
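As a minimal sketch of what that login request carries, the Python below builds the headers for a POST to the `/api/sessions` endpoint, which answers with the key in the `x-vcloud-authorization` response header. The function name is hypothetical and the API version string is an assumption that should be checked against your vCD release.

```python
import base64


def build_login_headers(user: str, org: str, password: str,
                        api_version: str = "31.0") -> dict[str, str]:
    """Build the headers for a vCD session login request.

    vCD expects HTTP Basic credentials in the form user@org:password.
    The default api_version here is an assumption, not a recommendation.
    """
    credentials = f"{user}@{org}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {
        "Authorization": f"Basic {token}",
        "Accept": f"application/*+xml;version={api_version}",
    }


# Hypothetical tenant credentials, for illustration only.
headers = build_login_headers("admin", "TenantOrg", "secret")
```

Any HTTP client, Postman included, can send these headers. The key that comes back is then passed in the same header name on every subsequent request until it expires.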

Because we are using a decent RESTful API Client like Postman, there is a better way to programmatically authenticate using a bearer access token as described in Tom Fojta’s post here when talking to the vCD API.

Once that is done we are authenticated as a vCD Organizational Admin and we can now query the NSX Edge Services Gateway (ESG) Settings for Firewall and NAT rules. I’ll walk through configuring a NAT rule for the purpose of the example, but the same method will be used to configure the Firewall as well.

A summary of the NAT requests can be seen below and found here.

Below we are querying the existing NAT rules using a GET request against the NSX ESG. What is returned is an empty config in XML.

What needs to be done is to turn that request into a POST and craft an XML payload into the Body of the request so that we can configure the NAT rules as desired.
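To give a feel for the payload, here is a sketch in Python that builds NAT rules for the addresses listed earlier. The element names follow my recollection of the NSX Edge NAT schema and are assumptions; the root element may also differ depending on whether you append rules or replace the whole config, so check both against the NSX API for vCD documentation.

```python
import xml.etree.ElementTree as ET


def nat_rule(action: str, original_address: str, translated_address: str,
             protocol: str = "any", port: str = "any") -> ET.Element:
    """Build one natRule element (element names are assumed, see lead-in)."""
    rule = ET.Element("natRule")
    ET.SubElement(rule, "action").text = action
    ET.SubElement(rule, "originalAddress").text = original_address
    ET.SubElement(rule, "translatedAddress").text = translated_address
    ET.SubElement(rule, "enabled").text = "true"
    ET.SubElement(rule, "protocol").text = protocol
    ET.SubElement(rule, "originalPort").text = port
    ET.SubElement(rule, "translatedPort").text = port
    return rule


def build_nat_payload() -> str:
    """Return the XML body for two DNAT rules and one SNAT rule."""
    nat = ET.Element("nat")
    rules = ET.SubElement(nat, "natRules")
    # DNAT: external IP on ports 80 and 443 to the internal VM.
    for port in ("80", "443"):
        rules.append(nat_rule("dnat", "82.221.98.109", "172.17.0.240", "tcp", port))
    # SNAT: outbound traffic from the VM subnet leaves via the external IP.
    rules.append(nat_rule("snat", "172.17.0.0/24", "82.221.98.109"))
    return ET.tostring(nat, encoding="unicode")


payload = build_nat_payload()
```

The resulting string is what would be pasted into the Body of the request, with the content type set to XML.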

Redoing the GET request will now show that the NAT rules have been created.

And will be shown in the vCD Tenant UI

From here we can update, append, reset or delete the NAT rules as per the API documentation. Each one of those actions will require a new call to the API and the same process followed as above.

Configuring Firewall and NAT Rules with Terraform:

For a primer on the vCloud Director Terraform Provider, read this post and also head over to Luca’s post on Terraform with vCD. As with the RESTful API example above, I will use Terraform IaC to configure the same Tenant NSX Edge Gateway’s Firewall and NAT rules. What will become clear using Terraform for this is that it is a lot more efficient and elegant than going at it directly against the APIs.

Initially we need to set up the required configuration items in order for the Terraform Provider to talk to the vCD API endpoint. To do this we need to set up a number of Terraform files that declare the variables required to connect to the vCD Organization and then configure the terraform.tfvars file that contains the specific values.

We also create a provider .tf file to specifically call out the required Terraform provider and set the main variables.

We contain all this in a single folder (seen in the left pane above) for organization and portability…These folders can be called as Terraform Modules if desired in more complex, reusable plans.

We then create two more .tf files for the Firewall and NAT rules. The format is dictated by the Provider pages, which give examples. We can make things more portable by incorporating some of the variables we declared elsewhere in the code, as shown below for the Edge Gateway name and Destination IP address.

Once the initial configuration work is done, all that’s required in order to apply the configuration is to initialize the Terraform Provider, make sure that the Terraform Plan is as expected… and then apply the plan against the Tenant’s Organization.

As the video shows… in less than a minute we have the NSX Firewall and NAT rules configured. More importantly, we now have a desired state which can be modified at any time by simple additions or subtractions to the Terraform code.

Wrapping it up:

From looking at both examples, it’s clear that both methods of configuration do the trick, and it really depends on what sort of IT Professional you are as to which method is more suited to your day to day. For those that are working as automation engineers, working with APIs directly and/or integrating them into provisioning engines or applications is going to be your preferred method. For those that want to be able to deploy, configure and manage their own infrastructure in a more consumable way, using a Terraform provider is probably a better way.

The great thing about Terraform in my eyes is the fact that you have declared the state that you want configured and once that has been actioned, you can easily check that state and modify it by changing the configuration items in the .tf files and reapplying the plan. For me it’s a much more efficient way to programmatically configure vCD than doing the same configuration directly against the API.

Ohhh… and don’t forget… you are still allowed to use the UI as well… there is no shame in that!

Infrastructure as Code vs RESTful APIs …Terraform and Everything in Between!

While I was a little late to the game in understanding the power of Infrastructure as Code, I’ve spent a lot of the last twelve months working with Terraform specifically to help deploy and manage various types of my lab and cloud based infrastructure. Appreciating how IaC can fundamentally change the way in which you deploy and configure infrastructure, workloads and applications is not an easy thing to grasp…there can be a steep learning curve and lots of tools to choose from.

In terms of a definition as to what is IaC:

Infrastructure as code (IaC) is the process of managing and provisioning computer data centers through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools. The IT infrastructure managed by this comprises both physical equipment such as bare-metal servers as well as virtual machines and associated configuration resources. The definitions may be in a version control system. It can use either scripts or declarative definitions, rather than manual processes, but the term is more often used to promote declarative approaches.

As represented above, there are many tools that are in the IaC space and everyone will gravitate towards their own favourite. The post where I borrowed that graphic from actually does a great job of talking about the differences and also why Terraform has become my standout for IT admins and why HashiCorp is on the up. I love how the article talks about the main differences between each one and specifically the part around the Procedural vs Declarative comparison where it states that the declarative approach is where “you write code that specifies your desired end state, and the IaC tool itself is responsible for figuring out how to achieve that state.”

You Don’t Need to Know APIs to Survive!:

The statement above is fairly controversial… especially for those that have been preaching about IT professionals having to code in order to remain viable. A lot of that mindshare is centred around the API and the DevOps world…but not everyone needs to be a DevOp! IT is all about trying to solve problems and achieve outcomes… it doesn’t matter how you solve it… as long as the problem/outcome is solved/attained. Being as efficient as possible is also important when achieving that outcome.

My background prior to working with IaC tools like Terraform was working with and actioning outcomes directly against RESTful APIs. I spent a lot of time specifically with vCloud Director and NSX APIs in order to help productise services in my last two roles, so I feel like I know my way around a cURL command or Postman window. Let me point out that there is nothing wrong with having knowledge of APIs and that it is important for IT Professionals to understand the fundamentals of APIs and how they are accessed and used for programmatic management of infrastructure and for creating applications.

I’m also not understating the skill that is involved in being able to understand and manipulate APIs directly and also being able to take those resources and create automated provisioning or actual applications that interact directly with APIs and create an outcome of their own. Remember, though, that everyone’s skill set and level is different, and no one should feel any less an IT practitioner if they can’t code at a perceived higher level.

How IaC Tools Bridge the Gap:

In my VMUG UserCon session last month in Melbourne and Sydney I went through the Veeam SDDC Deployment Toolkit that was built with various IaC tooling (Terraform and Chef) as well as PowerShell, PowerCLI and some Bash Scripting. Ultimately putting all that together got us to a point where we could declaratively deploy a fully configured Veeam Backup & Replication server and fully configure it ready for action on any vSphere platform.

That aside, the other main point of the session was taking the audience through a very quick Terraform 101 introduction and demo. In both cities, I asked the crowd how much time they spent working with APIs to do “stuff” on their infrastructure… in both cities there was almost no one that raised their hands. After I went through the basic Terraform demo where I provisioned and then modified a VM from scratch I asked the audience if something like this would help them in their day to day roles… in both cities almost everyone put their hands up.

Therein lies the power of IaC tools like Terraform. I described it to the audience as a way to code without having to know the APIs directly. Terraform Providers act as the middle man or interpreter between yourself and the infrastructure endpoints. Consider it a black box that does the complicated lifting for you… this is the essence of Infrastructure as Code!

There are some that may disagree with me (and that’s fine) but I believe that for the majority of IT professionals that haven’t yet gotten around to transitioning away from “traditional” infrastructure management, configuration and deployment, looking at IaC tools like Terraform can help you not only survive… but also thrive!

References:

https://blog.gruntwork.io/why-we-use-terraform-and-not-chef-puppet-ansible-saltstack-or-cloudformation-7989dad2865c

https://en.wikipedia.org/wiki/Infrastructure_as_code

Quick Fix: Terraform Plan Fails on Guest Customizations and VMware Tools

Last week I was looking to add the deployment of a local CentOS virtual machine to the Deploy Veeam SDDC Toolkit project so that it included the option to deploy and configure a local Linux Repository. This could then be added to the Backup & Replication server. As part of the deployment I call the Terraform vSphere Provider to clone and configure the virtual machine from a pre-loaded CentOS template.

As shown below, I am using the Terraform customization commands to configure VM name, domain details as well as network configuration.

In configuring the CentOS template I did my usual install of Open VM Tools. When the Terraform plan was applied, the VM was cloned without issue, but it failed at the Guest Customizations part.

The error is pretty clear. To test the error and the fix, I tried applying the plan without any VMware Tools installed. In fact, without VMware Tools the VM will not finish the initial deployment after the clone and will be deleted by Terraform. I next installed open-vm-tools but ended up with the same scenario of the plan failing and the VM not being deployed. For some reason it does not like this version of the package.

The next test was to deploy the open-vm-tools-deploypkg package as described in this VMware KB. Now the Terraform plan executed to the point of cloning the VM and setting up the desired VM hardware and virtual network port group settings, but still failed on the custom IP and hostname components of the customisation, this time with a slightly different error.

The final requirement is to pre-install the perl package onto the template. This allows for the in guest customizations to take place together with VMware Tools. Once I added that to the template the Terraform Plan succeeded without issue.
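A quick way to sanity check a template before cloning from it is to compare its installed packages against the three that mattered here. The Python below is only a sketch: the helper names are hypothetical, and the package list reflects what worked for this CentOS template rather than an official requirement list.

```python
import subprocess

# Packages that had to be present on this template for customization to succeed.
REQUIRED_PACKAGES = ("open-vm-tools", "open-vm-tools-deploypkg", "perl")


def installed_rpm_names() -> set[str]:
    """List installed package names on an RPM-based guest such as CentOS."""
    result = subprocess.run(
        ["rpm", "-qa", "--qf", "%{NAME}\n"],
        capture_output=True, text=True, check=True,
    )
    return set(result.stdout.split())


def missing_packages(installed: set[str]) -> list[str]:
    """Return the required packages that are not installed, in a stable order."""
    return [name for name in REQUIRED_PACKAGES if name not in installed]


# Example with a hypothetical template that only has open-vm-tools installed.
still_needed = missing_packages({"bash", "open-vm-tools"})
```

On the template itself, `missing_packages(installed_rpm_names())` returning an empty list would suggest it is ready for guest customization.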

References:

https://kb.vmware.com/s/article/2075048