Monthly Archives: September 2019

Assigning vSphere Tags with Terraform for Policy Based Backups

vSphere Tags are used to add attributes to VMs, helping to categorise them for further filtering or discovery. vSphere Tags have a number of use cases, and Melissa has a great blog post here on the power of vSphere Tags, their configuration and their application. Veeam fully supports the use of vSphere Tags when configuring Backup or Replication Jobs. The use of tags essentially transforms static jobs into dynamic, policy-based management for backup and replication.

Once a job is set to build its VM inventory from Tags, there is almost no need to go back and amend the job settings to cater for VMs that are added or removed from vCenter. Shown above, I have a Tag Category configured with two tags that are used to set a VM to be included in or excluded from the backup job. Every time the job runs it will source the VM list based on these policy elements, resulting in less management overhead and capturing changes to the VM inventory as they happen.

vSphere Tags with Terraform:

I've been deploying a lot of lab VMs using Terraform of late. The nature of these deployments means that VMs are being created and destroyed often. I was finding that VMs that should be backed up were not being backed up, while VMs that shouldn't be backed up were being backed up. This also led to issues with the backup job itself… an example was this week, when I was working on my Kubernetes vSphere Terraform project.

The VMs were present at the start of the backup, but during the window the VMs had been destroyed, leaving the job in an error state. These VMs, being transient in nature, should never have been part of the job. With the help of the tags I created above, I was able to use Terraform to assign those tags to VMs created as part of the plan.

With Terraform you can create Tag Categories and Tags as part of the code. You can also leverage existing Tag Categories and Tags and feed them into the declarations as variables. For backup purposes, every VM that I create now has one of the two tags assigned to it. Outside of Terraform, I would apply this from the Web Client or via PowerShell, but the idea is to ensure a repeatable, declarative VM state where any VM created with Terraform has a tag applied.

Terraform vSphere Tag Configuration Example:

The first step is to declare two data sources somewhere in the TF code. I typically place these into a main.tf file.
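As a rough sketch, the two data sources look something like this (the category name is illustrative; the tag is the exclusion tag referenced later in this post):

```hcl
data "vsphere_tag_category" "backup_category" {
  name = "TPM03-Backup"   # example category name only
}

data "vsphere_tag" "backup_tag" {
  name        = "TPM03-NO-BACKUP"
  category_id = data.vsphere_tag_category.backup_category.id
}
```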

We have the option to hard code the names of the Tag and Tag Category in the data source, as above, but a better way is to use variables for maximum portability.
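A variable-driven version might look like this (the variable names are examples rather than the exact ones from my code):

```hcl
# main.tf
data "vsphere_tag_category" "backup_category" {
  name = var.vm_tag_category
}

data "vsphere_tag" "backup_tag" {
  name        = var.vm_tag
  category_id = data.vsphere_tag_category.backup_category.id
}
```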

The terraform.tfvars file is where we set the variable values.
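Something along these lines (names and values illustrative):

```hcl
# terraform.tfvars
vm_tag_category = "TPM03-Backup"
vm_tag          = "TPM03-NO-BACKUP"
```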

We also need to create corresponding entries in variables.tf.
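A minimal sketch of those declarations:

```hcl
# variables.tf
variable "vm_tag_category" {
  description = "Name of the existing vSphere Tag Category"
}

variable "vm_tag" {
  description = "Name of the existing vSphere Tag to assign to the VM"
}
```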

Finally, we can set the tag information in the VM .tf file. This references the data sources, which in turn reference the variables that have been configured.
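In the VM resource it comes down to a single tags argument, roughly like this (the rest of the resource is trimmed for brevity):

```hcl
resource "vsphere_virtual_machine" "vm" {
  # ... existing clone, CPU, memory, network and disk configuration ...

  # Assign the tag via the data source, which is driven by the variables above
  tags = [data.vsphere_tag.backup_tag.id]
}
```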

The Result:

Once the Terraform plan has been applied and the VMs created, the Terraform state file will contain references to the tags, and the output from the plan run will show them being assigned to the VMs.

The Tag will be assigned to the VM and visible as an attribute in vCenter.

Any Veeam Backup Job that is configured to use Tags will now dynamically include or exclude VMs created by the Terraform plan. In the case above, the VM has the TPM03-NO-BACKUP tag assigned, which means it will be part of the exclusion list for the backup job.

Conclusion:

vSphere Tags are an excellent way to configure policy-based backup and replication jobs through Veeam. Terraform is great for deploying infrastructure in a repeatable, declarative way. Having Terraform assign Tags to VMs as they are deployed allows us to control whether a VM is included in or excluded from a backup policy. If you are deploying VMs with Terraform, take advantage of vSphere Tags and make them part of your deployments.

References:

https://www.terraform.io/docs/providers/vsphere/r/tag.html

Deploying a Kubernetes Sandbox on VMware with Terraform

Terraform from HashiCorp has been a revelation for me since I started using it in anger last year to deploy VeeamPN into AWS. From there it has allowed me to automate lab Veeam deployments, configure VMware Cloud on AWS SDDC networking and configure NSX vCloud Director Edges. The time saved by utilising the power of Terraform for repeatable deployment of infrastructure is huge.

When it came time for me to play around with Kubernetes to get myself up to speed with what was happening under the covers, I found a lot of online resources on how to install and configure a Kubernetes cluster on vSphere with a Master/Node deployment. I found that while I was tinkering, I would break deployments, which meant I had to start from scratch and reinstall. This is where Terraform came into play. I set about creating a repeatable Terraform plan to deploy the required infrastructure onto vSphere and then have Terraform remotely execute the installation of Kubernetes once the VMs had been deployed.

I'm not the first to do a Kubernetes deployment on vSphere with Terraform, but I wanted to have something that was simple and repeatable to allow quick initial deployment. The above example uses Kubespray along with Ansible and other dependencies. What I have ended up with is a self-contained Terraform plan that can deploy a Kubernetes sandbox with a Master plus a dynamic number of Nodes onto vSphere, using CentOS as the base OS.

What I haven't automated is the final step of joining the Nodes to the cluster. That step takes a couple of seconds once everything else is deployed. I also haven't integrated this with VMware Cloud Volumes or prepped for persistent volumes. Again, the idea here is to have a sandbox deployed within minutes to start tinkering with. For those who are new to Kubernetes, it will help you get to the meat and gravy a lot quicker.

The Plan:

The GitHub Project is located here. Feel free to clone/fork it.

In a nutshell, I am utilising the Terraform vSphere Provider to deploy a VM from a preconfigured CentOS template, which will end up being the Kubernetes Master. All the variables are defined in the terraform.tfvars file and no other configuration needs to happen outside of this file. Key variables are fed into the other .tf declarations to deploy the Master and the Nodes, and to configure the Kubernetes cluster IP networking.
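To give a feel for the shape of it, here is a cut-down sketch of the Master declaration. The variable names are illustrative rather than the exact ones in the repo, and the usual datacenter, datastore, resource pool and network data sources are assumed to be declared in main.tf:

```hcl
data "vsphere_virtual_machine" "template" {
  name          = var.vm_template                 # preconfigured CentOS template
  datacenter_id = data.vsphere_datacenter.dc.id
}

resource "vsphere_virtual_machine" "k8s_master" {
  name             = var.master_name
  resource_pool_id = data.vsphere_resource_pool.pool.id
  datastore_id     = data.vsphere_datastore.datastore.id

  num_cpus = 2
  memory   = 4096
  guest_id = data.vsphere_virtual_machine.template.guest_id

  network_interface {
    network_id = data.vsphere_network.network.id
  }

  disk {
    label = "disk0"
    size  = data.vsphere_virtual_machine.template.disks.0.size
  }

  clone {
    template_uuid = data.vsphere_virtual_machine.template.id

    customize {
      linux_options {
        host_name = var.master_name
        domain    = var.domain
      }

      network_interface {
        ipv4_address = var.master_ip
        ipv4_netmask = 24
      }

      ipv4_gateway = var.gateway
    }
  }
}
```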

[Update] – It seems as though Kubernetes 1.16.0 was released over the past couple of days. This resulted in the scripts not installing the Master correctly due to an API issue when configuring the POD networking. Because of that I've updated the code to use a variable that specifies the Kubernetes version being installed. This can be found on Line 30 of the terraform.tfvars. The default is 1.15.3.

The main items to consider when entering your own variables for the vSphere environment are Line 18 and Lines 28-31. Line 18 defines the Kubernetes POD network, which is used during the configuration. Lines 28-31 set the number of Nodes and the starting name for the VMs, and then use two separate variables to build out the IP addresses of the Nodes. Pay attention to the format of the network on Line 30, and then choose the starting IP for the Nodes on Line 31. This is used as the starting IP for the Node IPs and is enumerated in the code using the Terraform count construct.
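The variable names below are illustrative (not necessarily the exact names in the repo), but they map onto the lines described above:

```hcl
# terraform.tfvars (illustrative)
k8s_pod_network    = "10.244.0.0/16"   # Line 18 – POD network passed to kubeadm
worker_count       = 3                 # Line 28 – number of Nodes to deploy
worker_name_prefix = "k8s-node"        # Line 29 – starting name for the Node VMs
worker_ip_network  = "192.168.1."      # Line 30 – network portion, note the trailing dot
worker_ip_start    = 60                # Line 31 – first host octet for the Nodes

# The Node declaration then builds each address with count, roughly:
#   ipv4_address = "${var.worker_ip_network}${var.worker_ip_start + count.index}"
```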

Using Terraform's remote-exec provisioner, I am then using a combination of uploaded scripts and direct command line executions to configure and prep the Guest OS for the installation of Docker and Kubernetes.
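Inside the Master VM resource from the sketch above, that looks roughly like this (script and connection variable names are illustrative):

```hcl
  connection {
    type     = "ssh"
    host     = var.master_ip
    user     = var.ssh_user
    password = var.ssh_password
  }

  # Upload a prep script and then run it on the Guest OS
  provisioner "file" {
    source      = "scripts/install_k8s_dependencies.sh"
    destination = "/tmp/install_k8s_dependencies.sh"
  }

  provisioner "remote-exec" {
    inline = [
      "chmod +x /tmp/install_k8s_dependencies.sh",
      "sudo /tmp/install_k8s_dependencies.sh",
    ]
  }
```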

Towards the end you can see that I have split up the command line scripts to preserve the dynamic nature of the deployment. The remote-exec on Line 82 pulls in the POD network variable and executes it inline. The same is done for Lines 116-121, which configure the Guest OS hosts file to ensure name resolution. These are used together with two other scripts that are uploaded and executed.
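As a hedged example of those inline sections (variable names again illustrative, and the real plan adds a hosts entry per Node as well):

```hcl
  # Initialise the Master using the POD network defined in terraform.tfvars
  provisioner "remote-exec" {
    inline = [
      "sudo kubeadm init --pod-network-cidr=${var.k8s_pod_network}",
    ]
  }

  # Append hosts file entries built from the same IP variables used to deploy the VMs
  provisioner "remote-exec" {
    inline = [
      "echo '${var.master_ip}  ${var.master_name}' | sudo tee -a /etc/hosts",
    ]
  }
```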

The scripts have been built up from a number of online sources that go through how to install and configure Kubernetes manually. For the networking, I went with Weave Net after having a few issues with Flannel. There are lots of other networking options for Kubernetes… this is worth a read.

For reliable name resolution on the Guest OS VMs, the hosts file entries are constructed from the IP address settings in the terraform.tfvars file.

Plan Execution:

The Nodes can be deployed dynamically using a Terraform -var option when applying the plan. This allows for anywhere from zero to as many Nodes as you want for the sandbox… though three seems to be a nice round number.

The number of Nodes can also be set in the terraform.tfvars file on Line 28. The variable set during the apply will take precedence over the one declared in the tfvars file. One of the great things about Terraform is that we can alter the variable either way and Nodes will be added or removed automatically.
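A sketch of how that hangs together, assuming a variable called worker_count (the real variable name may differ):

```hcl
# variables.tf
variable "worker_count" {
  description = "Number of Kubernetes worker Nodes to deploy"
  default     = 3
}

# The Node resource scales off this value:
#   count = var.worker_count
#
# Setting it at apply time takes precedence over terraform.tfvars:
#   terraform apply -var="worker_count=5"
```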

Once applied, the plan will work through the declaration files and the output will be similar to what is shown below. You can see that in just over five minutes we have deployed one Master and three Nodes, ready for further config.

The next step is to use the kubeadm join command on the Nodes. For those paying attention, the complete join command was output by the Terraform apply. Once applied on all Nodes you should have a ready-to-go Kubernetes cluster running on CentOS on top of vSphere.

Conclusion:

While I do believe that the future of Kubernetes is such that a lot of the initial installation and configuration will be taken out of our hands and delivered to us via services based in public clouds, or through platforms such as VMware's Project Pacific, having a way to deploy a Kubernetes cluster locally on vSphere is a great way to get to know what goes into making a containerisation platform tick.

Build it, break it, destroy it and then repeat… that is the beauty of Terraform!

https://github.com/anthonyspiteri/terraform

References:

https://github.com/anthonyspiteri/terraform/tree/master/deploy_kubernetes_CentOS


Quick Fix – ESXi loses all Network Configuration… but still runs?

I had a really strange situation pop up in one of my lab environments over the weekend. vSAN Health was reporting that one of the hosts had lost networking connectivity to the rest of the cluster. This is something I've seen intermittently at times, so I waited for the condition to clear up. When it didn't, I went to put the host into maintenance mode, but found that I wasn't getting the expected vSAN options.

I have seen situations recently where the enable vSAN option on the VMkernel interface had been cleared and vCenter thinks there are networking issues. I thought maybe it was this again. Not that that situation in itself was normal, but what I found when I went to view the state of the VMkernel adapters from the vSphere Web Client was even stranger.

No adapters listed!

The host wasn’t reported as being disconnected and there was still connectivity to it via the Web UI and SSH. To make sure this wasn’t a visual error from the Web Client I SSH’ed into the host and ran esxcli to get a list of the VMkernel interfaces.

Unable to find vmknic for dvsID: xxxxx

So from the CLI, I couldn't get a list of interfaces either. I tried restarting the core services without luck, and still had a host that was up with VMs running on it without issue, yet it was reporting networking issues and had no network interfaces configured in its running state.

Going to the console… the situation was not much better.

Nothing… no network or host information at all 🙂

Not being able to reset the management network, my only option from here was to reboot the server. Upon reboot the host did come back online; however, the networking was reporting as 0.0.0.0/0 from the console and the host was now completely offline.

I decided to reboot using the last known good configuration as shown below:

Upon reboot using the last known good configuration, all previous network settings were restored and I had a list of VMkernel interfaces again, both from the Web Client and from the CLI.

Because of the “dirty” vSAN reboot, as is usual with anything that disrupts vSAN, the cluster needed some time to get itself back into working order. While some VMs were in an orphaned or unavailable state after the reboot, once the vSAN re-sync had completed all VMs were back up and operational.

Cause and Official Resolution:

The workaround to bring back the host networking seemed to do the trick; however, I don't know what the root cause was for the host losing all of its network config. I have an active case going with VMware Support at the moment with the logs being analysed. I'll update this post with the results when they come through.

ESXi Version: 6.7.0.13006603
vSphere Version: 6.7.0.30000
NSX-v: 6.4.4.11197766

The Separation of Dev and Ops is Upon Us!

Apart from the K word, there was one other enduring message that I think a lot of people took home from VMworld 2019. That is, that Dev and Ops should be considered as separate entities again. For the best part of the last five or so years, the concept of DevOps, SecOps and other X-Ops has been perpetuated mainly due to the rise of consumable platforms outside the traditional control of IT operations people.

The pressure to DevOp has become very real in the IT communities that I am involved with. These circles are mainly made up of traditional infrastructure guys. I've written a few pieces around how the industry trend to try to turn everyone into developers isn't one that needs to be followed. Automation doesn't equal development, and there are a number of Infrastructure as Code tools that look to bridge the gap between the developer and the infrastructure guy.

That isn't to say that traditional IT guys shouldn't be looking to push themselves to learn new things and improve and evolve. In fact, IT Ops needs to be able to code in slightly abstracted ways to work with APIs or leverage IaC tooling. However, my view is that IT Ops' number one role is to understand fundamentally what is happening within a platform, and to be able to support infrastructure that developers can consume.

I had a bit of an aha moment this week while working on some Kubernetes (that word again!) automation with Terraform, which I'll release later this week. The moment came when I was trying to get the Sock Shop demo working on my fresh Kubernetes cluster. I finally understood why Kubernetes had been created. Everything about the application was defined in the JSON files and deployed holistically through one command. It's actually rather elegant compared to how I worked with developers back in the early days of web hosting on Windows and Linux web servers with their database backends and whatnot.

Regardless of the ease of deployment, I still had to understand the underlying networking and get the application to listen on external IPs and different ports. At this point I was doing dev and IT Ops in one. However, this is all contained within my lab environment, which has no bearing on the availability of the application, its security or otherwise. This is where separation is required.

Developers want to consume services and take advantage of the constructs of a containerised platform like Docker, paired with the orchestration and management of those resources that Kubernetes provides. They don't care what's under the hood and shouldn't be concerned with what their application runs on.

IT Operations want to be able to manage the supporting platforms as they did previously. The compute, the networking, the storage… this is all still relevant in a hybrid world. They should absolutely still care about what's under the hood and the impact applications can have on infrastructure.

VMware has introduced that (re)split of Dev and Ops with the introduction of Project Pacific, and I applaud them for going against the grain and endorsing the separation of roles and responsibilities. Kubernetes and ESXi in one vSphere platform is where that vision lies. Outside of vSphere, it is still very true that devs can consume public clouds without a care about underlying infrastructure… but for me… it all comes back down to this…

Let devs be devs… and let IT Ops be IT Ops! They need to work together in this hybrid, multi-cloud world!

VMworld 2019 Veeam Wrap Up – Supportability Announcements and Session Recaps

VMworld 2019 is almost a distant memory, and with the focus now shifting to VMworld Europe later in the year, I wanted to round out the US event with a wrap up of Veeam happenings. It was a busy week for me, which is representative of how much Veeam invests in the event to retain mindshare and also to support the community. I was able to attend a number of community events in between daily recap videos, partner meetings and official Veeam gatherings. The week, as usual, was extremely rewarding.

My earlier post on Project Pacific has been well read this week, showing me that VMware's move to integrate Kubernetes into vSphere has resonated with IT Pros. From a Veeam product point of view, we were able to publicly demo our long-awaited CDP feature for the first time, along with a couple of new features coming in v10, in the session I gave with Danny Allan, while Michael Cade and David Hill took people through Veeam's portable data format, which gives us simplicity, reliability and flexibility.

Announcements:

Veeam announced some important supportability milestones around the event:

  • vCloud Director 9.7 support and validation – With Veeam Backup & Replication 9.5 Update 4b we retain existing support for vCloud Director 9.7. Visit the Veeam KB article to learn more about this and other topics.
  • vSAN 6.7 Update 2 certification – Veeam has successfully passed the vSAN 6.7 Update 2 certification. See the VMware Compatibility Guide for details.
  • vSphere 6.5 Update 3 now supported – Veeam officially supports this release. All documentation and release notes have been updated to reflect this.
  • Veeam continues to support VMware Cloud on AWS – SDDC 1.8 is supported and Veeam is officially certified. See the Veeam KB for more info.
  • NSX‑T Support – Customers can now receive a patch from support to make Veeam Backup & Replication v9.5 Update 4b compatible. This will be integrated into the upcoming v10. See the Veeam KB for more info.

Breakout Sessions and TechTalks:

Beyond the supportability news, there were a number of Veeam-related sessions at the event, including two breakouts and a number of vBrownBag TechTalks. The breakouts are gated this year, but all you need to do to view the sessions online is register for a VMworld account.

Backups are just the start! Enhanced Data Mobility with Veeam (HBI3535BUS)

Enhancing Data Protection for vSphere with What’s Coming from Veeam (HBI3532BUS)

We also had a number of Veeam-flavoured vBrownBag TechTalks… they have been embedded below.

VMworld 2019 Review – Project Pacific is a Stroke of Kubernetes Genius… but not without a catch!

Kubernetes, Kubernetes, Kubernetes… say Kubernetes one more time… I dare you!

If it wasn't clear what the key takeaway from VMworld 2019 in San Francisco was, then I'll repeat it one more time… Kubernetes! It was something I predicted prior to the event in my session breakdown. And all jokes aside, with the number of times we heard Kubernetes mentioned last week, we know that VMware signalled their intent to jump on the Kubernetes freight train and ride it all the way.

When you think about it, the announcement of Project Pacific isn't a surprise. Apart from it being an obvious path to take to ensure VMware remains viable with IT Operations (IT Ops) and Developers (Devs) holistically, the more I learned about what it actually does under the hood, the more I came to believe that it is a stroke of genius. If it delivers technically on its promise of full ESXi and Kubernetes integration into the one vSphere platform, then it will be a huge success.

The whole premise of Project Pacific is to use Kubernetes to manage workloads via declarative specifications. This essentially allows IT Ops and Devs to tell vSphere what they want and have it deploy and manage the infrastructure that ultimately serves as a platform for an application. This is all about the application: abstracting all infrastructure and most of the platform to make the application work. We are now looking at a “platform platform” that controls all aspects of that lifecycle end to end.

By redesigning vSphere and embedding Kubernetes into its core, VMware is able to take advantage of the things that make Kubernetes popular in today's cloud-native world. A Kubernetes Namespace is effectively a tenancy in Kubernetes that manages applications holistically, and it's at the Namespace level that policies are applied. QoS, security, availability, storage, networking and access controls can all be applied top-down from the Namespace. This gives IT Ops control, while still allowing Devs to be agile.

I see this construct as similar to what vCloud Director offers by way of a Virtual Datacenter, with vApps used as the container for VM workloads… in truth, the way in which vCD abstracted vSphere resources into tenancies and had policies applied was maybe ahead of its time?

DevOps Separation:

DevOps has been a push in our industry for the last few years, and the pressure to be a DevOp is huge. The reality is that both sets of disciplines have fundamentally different approaches to each other's lines of work. This is why it was great to see VMware going out of their way to make the distinction between IT Ops and Devs.

Dev and IT Ops collaboration is paramount in today's IT world, and with Project Pacific, when a Dev looks at the vSphere platform they see Kubernetes, while when an IT Ops guy looks at vSphere he still sees vSphere and ESXi. This allows for integrated self-service, and allows more speed with control to deploy and manage the infrastructure and platforms that run applications.

Consuming Virtual Machines as Containers and Extensibility:

Kubernetes was described as a Platform Platform… meaning that you can run almost anything in Kubernetes as long as it's declared. The above image shows a holistic application running in Project Pacific. The application is a mix of Kubernetes containers, VMs and other declared pieces… all of which can be controlled through vSphere and live under that single Namespace.

When you log into the vSphere Console you can see a Kubernetes cluster in vSphere, see the PODs and action them as first-class citizens. vSphere Native PODs are an optimized runtime… apparently more optimized than bare metal… 8% faster than bare metal, as we saw in the keynote on Monday. This is achievable because CPU virtualization has almost zero cost today. VMware has taken advantage of the advanced ESXi scheduler, which has advanced operations across NUMA nodes, along with the ability to strip out what is not needed when running containers on VMs, so that there is an optimal runtime for workloads.

vSphere will have two APIs with Project Pacific. The traditional vSphere API that has been refined over the years will remain, and then there will be the Kubernetes API. There is also the ability to create infrastructure with kubectl. Each ESXi cluster becomes a Kubernetes cluster. The work done with vSphere Integrated Containers has not gone to waste and has been used in this new integrated platform.

PODs and VMs live side by side and are declared through Kubernetes. VMs can be stored in the container registry, and capabilities that exist in the container ecosystem, such as vulnerability scans, encryption and signing, can be leveraged at a container level and applied to VMs.

There is obviously a lot more to Project Pacific, and there is a great presentation up on YouTube from Tech Field Day Extra at VMworld 2019, which I have embedded below. In my opinion, it is a must-watch for anyone working in and around the VMware ecosystem.

The Catch!

So what is the catch? With 70 million workloads across 500,000+ customers, VMware is thinking that with this functionality in place, the current movement of refactoring workloads to take advantage of cloud-native constructs like containers, serverless or Kubernetes doesn't need to happen… existing workloads instantly become first-class citizens on Kubernetes. Interesting theory.

Having been digging into the complex and very broad container world for a while now, and only just realising how far it has come in terms of being high on most IT agendas, my current belief is that the world of Kubernetes and containers is better placed to be consumed on public clouds. The scale and immediacy of Kubernetes platforms on Google, Azure or AWS, without the need to procure hardware and install software, means that that model of consumption will still have an advantage over something like Project Pacific.

The one stroke of genius, as mentioned, is that by combining “traditional” workloads with Kubernetes as the control plane within vSphere, the single, declarative, self-service experience it potentially offers might stop IT Operations from moving to public clouds… but is that enough to stop the developers forcing their hands?

It is going to be very interesting to see this in action and how well it is ultimately received!

More on Project Pacific

The videos below give a good level of technical background on Project Pacific. Frank also has a good introductory post here, and Kit Colbert's VMworld session is linked in the references.

References:

https://videos.vmworld.com/global/2019/videoplayer/28407