
Deploying a Kubernetes Sandbox on VMware with Terraform

Terraform from HashiCorp has been a revelation for me since I started using it in anger last year to deploy VeeamPN into AWS. From there it has allowed me to automate lab Veeam deployments, configure VMware Cloud on AWS SDDC networking and set up NSX vCloud Director Edges. The time saved by utilising the power of Terraform for repeatable deployment of infrastructure is huge.

When it came time for me to play around with Kubernetes to get myself up to speed with what was happening under the covers, I found a lot of online resources on how to install and configure a Kubernetes cluster on vSphere with a Master/Node deployment. While I was tinkering, however, I kept breaking deployments, which meant starting from scratch and reinstalling. This is where Terraform came into play. I set about creating a repeatable Terraform plan to deploy the required infrastructure onto vSphere and then have Terraform remotely execute the installation of Kubernetes once the VMs had been deployed.

I’m not the first to do a Kubernetes deployment on vSphere with Terraform, but I wanted something simple and repeatable that allows for quick initial deployment. The example above uses Kubespray along with Ansible and other dependencies. What I have ended up with is a self-contained Terraform plan that can deploy a Kubernetes sandbox with a Master plus a dynamic number of Nodes onto vSphere using CentOS as the base OS.

What I haven’t automated is the final step of joining the Nodes to the cluster; that step takes a couple of seconds once everything else is deployed. I also haven’t integrated this with VMware Cloud Volumes or prepped for persistent volumes. Again, the idea here is to have a sandbox deployed within minutes to start tinkering with. For those who are new to Kubernetes it will help you get to the meat and gravy a lot quicker.

The Plan:

The GitHub project is located at https://github.com/anthonyspiteri/terraform. Feel free to clone or fork it.

In a nutshell, I am utilising the Terraform vSphere Provider to deploy a VM from a preconfigured CentOS template, which will end up being the Kubernetes Master. All the variables are defined in the terraform.tfvars file and no other configuration needs to happen outside of this file. Key variables are fed into the other tf declarations to deploy the Master and the Nodes as well as to configure the Kubernetes cluster IP networking.

The main items to consider when entering your own variables for the vSphere environment are Line 18 and then Lines 28-31. Line 18 defines the Kubernetes POD network, which is used during the configuration, while Lines 28-31 set the number of Nodes, the starting name for the VMs and the two separate variables used to build out the IP addresses of the Nodes. Pay attention to the format of the network on Line 30 and choose the starting IP for the Nodes on Line 31; that starting IP is then enumerated in the code using the Terraform count construct.
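
As a rough illustration only, the key entries take a shape something like the snippet below. The variable names and values here are placeholders, not the exact ones used in the repo, so check terraform.tfvars itself for the real names on the lines mentioned above.

# Illustrative sketch only: names and values are placeholders, not the repo's actual variables
cat > terraform.tfvars <<'EOF'
pod_network_cidr = "10.244.0.0/16"   # Kubernetes POD network used during the cluster configuration
node_count       = 3                 # number of worker Nodes to deploy
node_name_prefix = "k8s-node"        # starting name for the Node VMs
node_network     = "192.168.1"       # network portion used to build the Node IP addresses
node_ip_start    = 21                # starting host IP, enumerated via the Terraform count construct
EOF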

Using Terraform's remote-exec provisioner, I then use a combination of uploaded scripts and direct command line executions to configure and prep the Guest OS for the installation of Docker and Kubernetes.

You can see towards the end that I have split up the command line scripts to preserve the dynamic nature of the deployment. The remote-exec on Line 82 pulls in the POD network variable and executes it inline, and the same is done on Lines 116-121, which configure the Guest OS hosts file to ensure name resolution. These are used together with two other scripts that are uploaded and executed.
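
As a simplified sketch (not the exact contents of the plan), the inline portions boil down to commands of this sort, with the CIDR and host entries injected from the Terraform variables rather than hard-coded as they are here:

# Simplified sketch of the inline remote-exec commands; in the real plan the
# values below come from Terraform variables, not hard-coded strings.
sudo kubeadm init --pod-network-cidr=10.244.0.0/16            # initialise the Master with the POD network
echo "192.168.1.20  k8s-master" | sudo tee -a /etc/hosts      # hosts entries for name resolution
echo "192.168.1.21  k8s-node1"  | sudo tee -a /etc/hosts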

The scripts have been built up from a number of online sources that go through how to install and configure Kubernetes manually. For the networking, I went with Weave Net after having a few issues with Flannel. There are lots of other networking options for Kubernetes… this is worth a read.
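
For reference, the Weave Net manifest of the time was applied on the Master with something along these lines (the URL format is from the Weave Net documentation of the time, so verify it is still current before using it):

# Apply the Weave Net CNI manifest on the Master
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"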

For reliable name resolution between the Guest OS VMs, the hosts file entries are constructed from the IP address settings in the terraform.tfvars file.

Plan Execution:

The Nodes can be deployed dynamically using Terraform's -var option when applying the plan. This allows for anywhere from zero to as many Nodes as you want for the sandbox… though three seems to be a nice round number.

The number of Nodes can also be set in the terraform.tfvars file on Line 28, though the variable set during the apply will take precedence over the one declared in the tfvars file, as shown in the example below. One of the great things about Terraform is that we can alter the variable either way and Nodes will be added or removed automatically.
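
For example, assuming the node count variable is called node_count (a placeholder name, so match it to the variable actually declared in the repo), the override looks like this:

# Override the node count at apply time; the command-line value wins over terraform.tfvars
terraform apply -var 'node_count=3'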

Once applied, the plan will work through the declaration files and the output will be similar to what is shown below. You can see that in just over five minutes we have deployed one Master and three Nodes ready for further config.

The next step is to use the kubeadm join command on the Nodes. For those paying attention, the complete join command was output via the Terraform apply. Once applied on all Nodes you should have a ready-to-go Kubernetes cluster running on CentOS on top of vSphere.
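
The join command printed in the Terraform output takes the general form below, with the Master address, token and CA cert hash specific to your deployment:

# Run on each Node using the values output by the Terraform apply (placeholders shown here)
sudo kubeadm join 192.168.1.20:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
# Then confirm from the Master that each Node registers and goes Ready
kubectl get nodes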

Conclusion:

While I do believe that the future of Kubernetes is one where much of the initial installation and configuration will be taken out of our hands and delivered via public cloud services or platforms such as VMware's Project Pacific, having a way to deploy a Kubernetes cluster locally on vSphere is a great way to get to know what goes into making a containerisation platform tick.

Build it, break it, destroy it and then repeat… that is the beauty of Terraform!


References:

https://github.com/anthonyspiteri/terraform/tree/master/deploy_kubernetes_CentOS


NSX Bytes: Controller Deployment Gone Bad?

With NSX becoming more and more widely available, more NSX home labs are being stood up, and with that the chances of NSX Controllers failing due to nested "home lab" issues become more prevalent. The NSX Controllers are Ubuntu Linux VMs and, like any Linux VM, are fairly sensitive to storage latency and the other issues that appear in #NestedESXi or lab environments.

In one of my labs I came across an issue where I needed to redeploy all the NSX Controllers because the VMs had effectively broken after the storage was ripped out from under them. However, when I went to redeploy, the latency of the underlying nested storage was still not great and the deployment got stuck in a loop as shown below.

No matter what I tried (vCenter restart, NSX Manager reboot or host reboot), the end result was the status remaining in the spinning Deploying state. If I tried to deploy another controller I would get the following error.

Controller IP address allocation failed for reason : cluster already contains controller of IP x.x.x.x

In my case the VM existed with the IP address configured against it; however, I could not access the CLI to check the NSX cluster status because the VM was in a pretty bad way.

Taking a look at the IP Pool allocations, even though the error said that the IP was in use it wasn't listed as such, meaning the deployment was trying to use the first IP in the pool regardless.

Before going into the fix, it should be noted that if this scenario were to happen and you were down to your last controller in production, you would be best served by calling VMware Support and working through the restore options, as without any controllers the VTEP tables for your VXLAN unicast traffic won't be updated and things will eventually grind to a halt. It's also worth reading the VMware Docs on what to do if even one Controller is lost in a cluster. If this is a lab scenario… we can be a little harsher!

While the Controller status is spinning in a Deploying state you can't interact with it via the Web Client, so you need to turn to the API to delete the NSX Controller and start again or deploy a new cluster. First you will need the CONTROLLER-ID, which can easily be seen via the Web Client. To remove the controller you need to call the API below using the DELETE method, and if the stuck controller is the last one in the cluster you need to add the ?forceRemoval=True option at the end of the call.
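
A hedged example of the call using curl is below; the path is the NSX-v 6.2 controller API as I recall it, so confirm it against the NSX API guide for your version, and substitute your own NSX Manager address and controller ID:

# Delete the stuck controller via the NSX Manager API (placeholders for the Manager address and CONTROLLER-ID)
curl -k -u 'admin' -X DELETE "https://<nsx-manager>/api/2.0/vdn/controller/<CONTROLLER-ID>"
# If it is the last controller in the cluster, append the forceRemoval option
curl -k -u 'admin' -X DELETE "https://<nsx-manager>/api/2.0/vdn/controller/<CONTROLLER-ID>?forceRemoval=true"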

Once complete you should get a 200 status and a job data ID. If you check back in the Web Client you should see the Controller VM being deleted and removed from the list under Controller Nodes. We are now free of the deploying loop and can rebuild or extend the NSX Controller cluster as appropriate.

References:

https://pubs.vmware.com/NSX-62/index.jsp?topic=%2Fcom.vmware.nsx.admin.doc%2FGUID-3A84E9D1-CAC0-41B1-B45C-E032B230DB49.html