Monthly Archives: September 2014

NSX Bytes: 6.1 Upgrade Part 3|4 – Clusters, Logical Switches, Edges and Guest Introspection

In Part 1 and Part 2 we went through the relatively straightforward upgrade of the NSX Manager and the more involved upgrade of the NSX Controllers. To complete things…in this post we will upgrade the rest!

NSX components must be upgraded in the following order:

  1. NSX Manager
  2. NSX controller
  3. Clusters and Logical Switches
  4. NSX Edge and Guest Introspection

The upgrade process is managed by the NSX Manager. If the upgrade of a component fails or is interrupted and you need to repeat or restart the upgrade, the process begins from the point at which it stopped; it does not start over from the beginning.
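To make the ordering and the resume behaviour concrete, here is a minimal Python sketch. It is only a model of the idea at the component level and not how the NSX Manager implements it…the `run_upgrade` function and the `upgrade_step` callback are hypothetical names introduced for illustration.

```python
# Component order as listed above.
UPGRADE_ORDER = [
    "NSX Manager",
    "NSX Controller",
    "Clusters and Logical Switches",
    "NSX Edge and Guest Introspection",
]


def run_upgrade(upgrade_step, completed=None):
    """Run the steps in order and stop at the first failure.

    upgrade_step is a hypothetical callback that takes a component name
    and returns True on success. completed holds the components finished
    on an earlier attempt, so a rerun resumes instead of starting over.
    """
    done = list(completed or [])
    for step in UPGRADE_ORDER:
        if step in done:
            continue  # already upgraded on a previous attempt
        if not upgrade_step(step):
            break  # stop here; a later run picks up from this step
        done.append(step)
    return done
```

Passing the list returned by a failed run back in as `completed` means the next run starts at the component that failed rather than at the NSX Manager.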

3. Clusters and Logical Switches:

Once you have updated the NSX Manager and Controllers you need to upgrade the Clusters and their ESXi Hosts. During the upgrade each host will need to be put into Maintenance Mode and the NSX Manager will work through vCenter to perform a rolling upgrade of all hosts in the Cluster.

Go to the Installation Tab of the Networking & Security Menu in the Web Client and click on the Host Preparation Tab. You should see an Upgrade option in the Installation Status Column.

If you do run into an issue with any of the host updates an alert will show up which you can click into to get details of the error…this typically would be due to a time-out while trying to put the host into Maintenance Mode…if that happens click Resolve to rerun and complete the upgrade.

What we have effectively upgraded during this process are the NSX Kernel Modules for Port Security, VXLAN, Distributed Firewall, Switching and Routing up to the latest 6.1 install bundles.

4. NSX Edge and Guest Introspection:

The final step is to upgrade any NSX Edges and what was previously vShield Endpoints (which is now rather interestingly called Guest Introspection). In the Networking & Security Menu go to NSX Edges. Right Click on each Edge that needs upgrading and select Upgrade Version.

During the upgrade a new OVF Appliance will be deployed and the config from the existing Edge will be imported into the new updated instance…there may be 1-5 seconds of downtime during this process. Once done the version number will show 6.1.

Moving onto the final part of the upgrade…if you are running Services that require vShield Endpoints pre NSX you might be a little confused with the whole separate Agent VM thing let alone the renaming of vShield Endpoint to Guest Introspection in 6.1…from what I can read there isn’t any functional change from 6.0.x but the renaming completes the evolution from vCNS to NSX.

To upgrade, go to the Installation tab and click on Service Deployments…any previous Endpoints you had should have an Upgrade Available Message and Icon in the Installation Status column.

Select the service and click on the Upgrade Icon and you will be presented with a Confirm Upgrade Window which gives you options to modify Storage, Portgroup and IP Pool details if you wish. You can also schedule the upgrade if desired.

Working through the upgrade it was interesting to see what was happening in vCenter to the existing Endpoint VMs. The process is similar to the Edge Upgrades, where a new OVF is deployed and the config copied over, finished off with a VM name change.

Once deployed the Guest Introspection version should sit at 6.1.

At this point NSX has been fully upgraded to 6.1 and the new features can be accessed and exploited.




NSX Bytes: 6.1 Upgrade Part 2 – NSX Controller

In Part 1 we went through the relatively straightforward upgrade of the NSX Manager. In this post we will upgrade the NSX Controllers from 6.0.x to 6.1.

NSX components must be upgraded in the following order:

  1. NSX Manager
  2. NSX controller
  3. Clusters and Logical Switches
  4. NSX Edge and Guest Introspection

The upgrade process is managed by the NSX Manager. If the upgrade of a component fails or is interrupted and you need to repeat or restart the upgrade, the process begins from the point at which it stopped; it does not start over from the beginning.

2. NSX Controller:

In the vSphere Web Client head to Networking & Security -> Installation and the Management Tab. You should see that your NSX Manager is at Version 6.1.0.x and that the Controller Cluster Status has Upgrade Available.

The notes suggest that you should upgrade the controllers during a Maintenance window…more on the potential reasons for that later on. Depending on the size of your Controller Cluster (3, 5, 7 etc) you need to ensure that they are all connected and that the cluster has quorum.

One way to verify the Controller Cluster has quorum is to SSH to one of the Controllers and run the show control-cluster status command…you should see Majority Status as Connected to Cluster Majority.
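As a quick illustration of what quorum means for the odd cluster sizes mentioned above, here is a small Python sketch. The majority rule (more than half of the nodes) is standard for clustered systems; the text check is an assumption on my part that the captured command output contains the "Connected to Cluster Majority" wording, and the exact output layout may differ between builds.

```python
def majority_needed(cluster_size: int) -> int:
    """Smallest number of nodes that is more than half of the cluster."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def has_quorum(cluster_size: int, connected_nodes: int) -> bool:
    """True when enough controllers are connected to form a majority."""
    return connected_nodes >= majority_needed(cluster_size)


def status_reports_majority(status_text: str) -> bool:
    """Check captured 'show control-cluster status' text for the majority line.

    Assumption: the output includes the phrase 'Connected to cluster majority'
    (matched case-insensitively) when the node is part of the majority.
    """
    return "connected to cluster majority" in status_text.lower()
```

So a 3 node cluster can lose one controller and keep quorum, a 5 node cluster can lose two, and a 7 node cluster can lose three.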

There was a little confusion as to whether the upgrading of the Controllers would cause VXLAN backed VMs to lose connectivity. During the upgrade there is a period of non-majority which could result in new VMs dropping traffic. So during this maintenance window don’t allow new VMs to be created or existing VMs to be moved (DRS).

Before the actual upgrade it’s suggested in the release notes to back up the Controller Data by downloading a Snapshot of the config. To do that select the controller and click on the download snapshot icon. You will be prompted to download the small file to your PC.

Next step is to start the Upgrade. Click on Upgrade Available to start the upgrade process. The status will change to Downloading upgrade file.

Next you will see the Upgrade Status of the Controllers go through a couple of different statuses. A normal upgrade will take about 5 minutes per controller…but if the upgrade fails for any reason on any controller there is a 30 minute timeout in play. If an upgrade fails (generally due to network connectivity errors) you can click on Upgrade Available again to restart the process.

Once all the Controllers have been upgraded you should have all their Statuses on Normal and the Software Version up to 6.1.x.

With the NSX Controllers done we move onto the NSX Enabled vSphere Clusters, Hosts and Logical Switch Upgrade…stay tuned for Part 3.


NSX Bytes: 6.1 Upgrade Part 1 – NSX Manager

On the back of VMworld US and a bunch of other VMware Product Updates…NSX 6.1 went GA late last week. For a quick overview of the new feature enhancements from 6.0.x have a look at this post from @pandom_

Word is on the street that this upgrade is not without its complications. There may be an outage along the way and you will lose access to the VXLAN Virtual Wires at stages during the upgrade (tbc). With that in mind you want to be looking at an upgrade during a maintenance window…If you are like me and still have NSX in the lab then there is less to worry about, but for those who have it in production it’s worth noting.

NSX components must be upgraded in the following order:

  1. NSX Manager
  2. NSX controller
  3. Clusters and Logical Switches
  4. NSX Edge and Guest Introspection

The upgrade process is managed by the NSX Manager. If the upgrade of a component fails or is interrupted and you need to repeat or restart the upgrade, the process begins from the point at which it stopped; it does not start over from the beginning.

1. Upgrade NSX Manager:

Take a snapshot of your NSX Manager VM just in case…Ensure that the Update Bundle is a .tar.gz, as some browsers will remove the .gz from the extension, resulting in the upgrade failing.
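If you want to sanity check the downloaded bundle before uploading it, something along these lines would do. This is a minimal Python sketch and not part of any VMware tooling…the function names are hypothetical, and the second check simply relies on the fact that gzip files start with the bytes 0x1f 0x8b.

```python
from pathlib import Path

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def has_tar_gz_extension(path) -> bool:
    """True when the file name still ends in .tar.gz (case-insensitive)."""
    return Path(path).name.lower().endswith(".tar.gz")


def starts_with_gzip_magic(path) -> bool:
    """True when the first two bytes of the file are the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def bundle_looks_intact(path) -> bool:
    """Both the extension and the leading bytes look like a .tar.gz bundle."""
    return has_tar_gz_extension(path) and starts_with_gzip_magic(path)
```

A file that the browser saved as .tar fails the first check, and one that was silently decompressed on download fails the second.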

The upgrade process will begin and complete in relatively quick time. Once complete you will get a confirmation message.

And while it doesn’t tell you it’s happening, the NSX Manager Appliance is being rebooted in the background. Click on Close and refresh the webpage, where you will be prompted to log in again. All things being equal you should be at version 6.1.

Ok, that’s the easy non-disruptive part of the upgrade…the next Part involves upgrading the NSX Controllers, which I’ll go through in Part 2.


Bug Fix: vCloud 5.x IP Sub Allocation Pool Error (5.5.2 Update)

A few months ago I wrote a quick post on a bug that existed in vCloud Director 5.1 in regards to IP Sub Allocation Pools and IPs being marked as in use when they should be available to allocate. What this leads to is a bunch of unusable IPs…meaning that they go to waste and pools can exhaust quicker…

  • Unused external IP addresses from sub-allocated IP pools of the gateway failed after upgrading from vCloud Director 1.5.1 to vCloud Director 5.1.2
    After upgrading vCloud Director from version 1.5.1 to version 5.1.2, attempting to remove unused external IP addresses from sub-allocated IP pools of a gateway failed saying that IPs are in use. This issue is resolved in vCloud Director 5.1.3.

This condition also presents itself in vCloud 5.5 environments that have 1.5 lineage. Greenfields deployments don’t seem affected…vCD 5.1.3 was supposed to contain the fix but the release notes were released in error…we were then told that the fix would come in vCD 5.5…but when we upgraded our zones we still had the issue.

We engaged VMware Support recently and they finally had a fix for the bug which has now been officially released in vCD 5.5.2 (BUILD 2000523). It’s great to see a nice long list of bug fixes in addition to the one specified in this post.

Looking forward to the next major release of vCloud Director which will be the first of the Service Provider Releases…more on that when it comes closer to GA.


VMware Fling: ESXi MAC Learning dvFilter for Nested ESXi

Late last year I was load testing against a new storage platform using both physical and nested ESXi hosts…at the time I noticed decreased network throughput while using Load Test VMs hosted on the nested hosts. I wrote this post and reached out to William Lam who responded with an explanation as to what was happening and why promiscuous mode was required for nested ESXi installs.

Fast forward to VMworld 2014 and in a discussion I had with William at The W Bar (where lots of great discussions are had) after the Official Party he mentioned that a new Fling was about to be released that addresses the issues with nested ESXi hosts and promiscuous mode enabled on the Virtual Switches. As William explains in his new blog post he took the problem to VMware Engineering who were having similar issues in their R&D Labs and have come up with a workaround…this workaround is now an official Fling! Apart from feeling a little bit chuffed that I sparked interest in this problem which has resulted in a fix, I decided to put it to the test in my lab.

I ran the same tests that I ran last year. Running one load test on a 5.5 ESXi host nested on a physical 5.5 Host I saw equal network utilization across all 6 nested hosts.

The Load VM was only able to push 15-17MBps on a random read test. As William saw in his post, ESXTOP shows you more about what’s happening.

About even network throughput across all NICs on all Hosts that are set for Promiscuous Mode…Overall throughput is reduced

After installing the VIB on the Physical host, you have to add the Advanced Virtual Machine settings to each Nested Host to enable the MAC Learning. Unless you do this via an API call you will need to shut down the VM to edit the VMX/Config. I worked through a set of PowerCLI commands to bulk add the Advanced Setting to running Nested Hosts…they work for any VM matching ESX in a resource pool that has two NICs.
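The PowerCLI itself isn’t reproduced here, so as a rough stand-in this Python sketch only builds the per-NIC key/value pairs that would need to be applied and picks out the VMs to apply them to. It does not talk to vCenter. The setting names (`ethernetN.filter4.name` set to `dvfilter-maclearn` and `ethernetN.filter4.onFailure` set to `failOpen`) are from my recollection of the Fling instructions and should be treated as an assumption to verify against the Fling documentation; the function names and VM names are hypothetical.

```python
def maclearn_settings(nic_count: int = 2) -> dict:
    """Build the advanced settings for each virtual NIC of a nested host.

    Assumption: the Fling is enabled per NIC through the filter4 keys below.
    """
    settings = {}
    for nic in range(nic_count):
        settings[f"ethernet{nic}.filter4.name"] = "dvfilter-maclearn"
        settings[f"ethernet{nic}.filter4.onFailure"] = "failOpen"
    return settings


def select_nested_hosts(vm_names, pattern: str = "esx") -> list:
    """Keep only the VM names containing the pattern (case-insensitive)."""
    return [name for name in vm_names if pattern.lower() in name.lower()]


if __name__ == "__main__":
    # Hypothetical VM names from a lab resource pool.
    for vm in select_nested_hosts(["NESTED-ESX-01", "LoadTest-VM", "nested-esx-02"]):
        for key, value in maclearn_settings(2).items():
            print(f"{vm}: {key} = {value}")
```

With two NICs per nested host that comes out as four settings per VM, which is why bulk adding them beats editing each VMX by hand.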

Checking back in on ESXTOP it looks to have an instant effect and only the Nested Host generating the traffic shows significant network throughput…the other hosts are doing nothing and I am now seeing about 85-90MBps against the load test.

Taking a look at the Network Throughput graphs you can see an example of two Nested Hosts in the group with the same throughput until the dvFilter was installed, at which point traffic dropped on the host not running the load test. Throughput increased roughly five fold on the host running the test.

The effect on Nested Host CPU utilization is also dramatic. Only the host generating the load has significant CPU usage while the other hosts return to normal operations…meaning overall the physical host CPUs are not working as hard.

As William mentions in his post this is a no brainer install for anyone using nested ESXi hosts for lab work…thinking about further implications of this fix, I can see the possibility of being able to support full nested environments within Virtual Data Centers without the fear of increased host CPU and decreased network throughput…for this to happen though VMware would need to change their stance on supportability of Nested ESXi environments…but this Fling, together with the VMTools Fling, certainly makes nested hosts all the more viable.

EVO:RAIL – Who are VMware Really Targeting?

Probably the biggest announcement from last week’s VMworld was the unveiling of Project Marvin/Mystic as EVO:RAIL. Most of the focus on VMware releasing their own OEM distributed hyperconverged solution was on competing head to head with established hyperconverged players like Nutanix…however after reading through Duncan Epping’s blog post and seeing the UI Demo doing the rounds on YouTube it occurred to me that possibly there is more at play here than VMware competing for the SMB/E hyperconverged market.

From a services point of view the battle for the Private Cloud between VMware and Microsoft is well advanced…Hyper-V is certainly an alternative option for companies looking to (wrongly or rightly) “save” money or to try out something different after having VMware as an incumbent technology. The Windows Azure Pack adds Cloud like functionality to a Hyper-V Platform allowing people the ability to use Azure’s pretty interfaces internally. It also offers PaaS functionality like DBaaS and Web Sites Creation…to be honest something that’s been available for years…The WAP is also a direct pathway for consumers to push VMs and Platform servers to Azure which is Microsoft’s ultimate play…everything in Azure.

Enter EVO:RAIL…a scale out hyperconverged platform that offers a pretty new interface on top of vSphere with VSAN thrown in to the mix. For me it’s got the WAP in its sights and offers a readymade private cloud alternative built on superior vSphere technology. Without doubt one of the complaints I hear often in the services space (outside of Cloud and IaaS) is that vSphere and vCloud don’t have an intuitive enough interface…especially compared with the Azure Pack (lipstick on a pig) but the EVO Interface changes all that.

So VMware have positioned themselves well in the market with EVO:RAIL…if you look beyond the comparisons to Nutanix and other current market players in the space I think there is a play here to keep the on-prem market firmly in the grasp of VMware and offer a superior alternative to WAP.

VCAP-DCA – Experience

A couple of months ago I wouldn’t have seen myself writing up one of these blog posts which seems to be customary for any blogger who has taken a VCAP. Having only secured my VCP last October I wasn’t thinking about VCAPs until the lure of the 50% discount and the realisation that I needed to push myself further. Four weeks before VMworld I decided to accept the challenge and booked the exam for the Sunday at 2pm. Another big driver for me to take the exam at VMworld was that I thought it was a means to avoid the dreaded latency that seems to plague takers in Australia.

510 or 550?:

This was an interesting choice for me…it seemed that even with the 550 available most people were choosing the 510. To make up my mind I read through both blueprints and saw that certain sections were missing from the 510 (vMA, Autodeploy) while newer features like vSphere Replication and vFlash Read Cache were added along with vCO. That said when I booked my exam I decided to do the 510, however there were no slots at VMworld so I was forced to book the 550. End of the day I think that was the right DCA and from what I understand the exam format has been better optimized for takers. I can see how the additional items on the blueprint could put people off, but in reality it shouldn’t be daunting for seasoned vSphere Admins. So end of the day I gave myself just over 3 weeks to prep.

Materials and Study:

Having the Blueprint by your side throughout the prep is critical…know it back to front and use resources out there like Chris Wahl’s Study Guide to work through the objectives and know what you know…and what you need to work on…you really can’t afford to skip any section.

Having had a spare Amazon Voucher I had back ordered the VCAP-DCA Official Study Guide in February without any real intent of taking the DCA any time soon, however having this book allowed me to structure study based on its excellent chapter content which follows the 5.1 and 5.5 Blueprint objectives.

Pluralsight’s pay-per-month access to all training content is worth its weight in gold and Jason Nash’s Optimize and Scale Course is what I considered to be my most valuable study asset. The offline mode can be consumed anywhere and I spent most train rides and gym sessions working through chapters. I went through the content about 4-5 times in total stepping up the play speed each time…by the end of it I had Jason coming at me in 2x…

I also took the official VMware Optimize and Scale Course 5.1 in October of 2013. While I don’t think it ultimately helped me pass or fail the exam, it’s still worth a shot if the training expense can be justified.

The Lab:

For me, without a decent lab there is no chance of passing this exam…I am/was lucky in that I have a very decent lab at my disposal through ZettaGrid, but I still loaded up a mini lab on my MacBook Pro to help me study and revise while on the plane ride over to VMworld. You need to go over and execute CLI commands because speed is the key in this exam…I would also learn up on 5.5 Web Client menu context and where to configure the new features listed in the Blueprint. A Lab with access to iSCSI/NFS shares is recommended…work through the relevant blueprint items again and again…see how I am saying that repetition is key here? ☺

The Exam:

The format has changed fairly significantly from the 500 and 510 exams from my research and in asking others of their experience…You now get 23 questions over 180 minutes and the Exam Lab has five ESXi Hosts, 2 vCenters and a bunch of datastores…you also get a vSphere Replication Appliance and vCO Server instance.

Throughout the exam you are repeatedly warned to not change anything but what actions are stated in the question…Modding the Management Networking in error could end the exam. There seems to be a little more leeway on that than the warnings suggest, and I found out the hard way when I completely misread one question and almost bricked a host…luckily it was a 5.5 host or else my dVS might not have come back.

As expected I didn’t have any issues with Latency and performance of the lab…this was a big thing and meant I could attack questions without the worry of screen refresh issues due to latency or sticky keys for CLI commands.

On that note, time management is absolutely key…I was working on 8 questions an hour, but quickly found the time running down…some questions take longer than others, but I felt each question was roughly equal in terms of what’s expected. Two questions stumped me initially and I left them for the end if I had time. I got to the end of question 23 with about 14 minutes left and attempted to go back and answer the 2 that caused me trouble…end of the day I had to leave those unanswered.

When time expired I got the message saying that results would be sent out in 15-20 days. Overall I left the exam fairly positive I had done more than enough to pass.
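To put some numbers on the time pressure, here is a quick back-of-the-envelope Python sketch using only the figures above (23 questions, 180 minutes, a pace of 8 questions an hour). The function names are just for illustration.

```python
TOTAL_MINUTES = 180
QUESTIONS = 23


def minutes_per_question(total_minutes: float, questions: int) -> float:
    """Average time available per question if the time is split evenly."""
    return total_minutes / questions


def minutes_spare_at_pace(questions_per_hour: float) -> float:
    """Minutes left over after all questions at a steady hourly pace."""
    minutes_each = 60 / questions_per_hour
    return TOTAL_MINUTES - QUESTIONS * minutes_each


if __name__ == "__main__":
    print(round(minutes_per_question(TOTAL_MINUTES, QUESTIONS), 2))
    print(minutes_spare_at_pace(8))
```

Splitting the time evenly gives a little under 8 minutes a question, and holding 8 questions an hour across all 23 leaves only about 7.5 minutes of slack…so there really is no room to get stuck.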

The Result and Final Thoughts:

I got a Tweet from SOSTech_WP later in the evening while enjoying the Redwood Room at The Clift saying to check my email…when I got back to the hotel I was extremely surprised to see the results email waiting for me in my inbox. And while I didn’t smash it like I thought I was going to upon coming out…a pass is a pass is a pass! I was relieved and fairly happy that the plan had worked and all the hard work of the previous three weeks had paid off.

In reality it wasn’t just about the last three weeks…this VCAP-DCA exam is a true representation of an acquired skill level in administering a complex vSphere platform and in that it was a validation of the work I do in and around vSphere. People commented to me prior to taking the exam that it was a fun exam to take…and I certainly understand that point of view…I had a great time taking it…but had a better time passing it!

Next for me is to decide on a more challenging path and journey…certainly one that may have me taking at least one more VCAP Administrator Exam and a Design Exam.

Good luck to all those taking the VCAP-DCA in the future.