Tag Archives: VXLAN

vCloud Director 9.0: Manual Quick fix for VXLAN Network Pool Error

vCloud Director 9.0, released last week has a bunch of new enhancements and a lot of those are focused around it’s integration with NSX. Tom Fojta has a what’s new page on the go with a lot of the new features being explained. One of his first posts just after the GA was around the new feature of being able to manually create VXLAN backed Network Pools.

VXLAN Network Pool is recommended to be used as it scales the best. Until version 9, vCloud Director would create new VXLAN Network Pool automatically for each Provider VDC backed by NSX Transport Zone (again created automatically) scoped to cluster that belong to the particular Provider VDC. This would create multiple VXLAN network pools and potentially confusion which to use for a particular Org VDC.

In vCloud Director 9.0 we now have the option of creating a VXLAN backed network pool manually instead of one being created at the time of a setting up a Provider vDC. In many of my environments for one reason or another the automatic creation of VXLAN network pool together with NSX would fail. In fact my current NextedESXi SliemaLabs vCD instance shows the following error:

There is a similar but less serious error that can be fixed by changing the replication mode from within the NSX Web Client as detailed here by Luca, however like my lab I’ve know a few people to run into the more serious error as shown above. You can’t delete the pool and a repair operation will continue to error out. Now in vCD 9.0 we can create a new VXLAN Network Pool form the Transport Zones created in NSX.

Once that’s been done you will have the newly created VXLAN Network Pool that’s truly more global and tied to best practice for NSX Transport Zones and one that can be used with the desired replication mode. The old one will remain, but you can now configure Org vDCs to consume the VXLAN backed network pool over the traditional VLAN backed pool.


vCloud Director 9: What’s New

vCloud Director 9: Create VXLAN Network Pool

Heads Up: Heavy VXLAN Traffic Causing Broadcom 10GB NICS to Drop

For the last couple of weeks we have had some intermittent issues where by ESXi network adapters have gone into a disconnected state requiring a host reboot to bring the link back online. Generally it was only one NIC at a time, but in some circumstances both NICs went offline resulting in host failure and VM HA events being triggered. From the console ESXi appears to be up, but each NIC was listed as disconnected and when we checked the switch ports there was no indication of a loss of link.

In the vmkernal logs the following entries are observed:

After some time working with VMware Support our Ops Engineer @santinidaniel came aross this VMwareKB which described the situation we where seeing. Interestingly enough we only saw this happening after recent host updates to ESXi 5.5 Update 3 builds but as the issue is listed as being present in ESXi 5, 5.5 and 6.0 that might just be a side note.

The cause as listed in the KB is:

This issue occurs when the guest virtual machine sends invalid metadata for TSO packets. The packet length is less than Maximum Segment Size (MSS), but the TSO bit is set. This causes the adapter and driver to go into a non-operational state.

Note: This issue occurs only with VXLAN configured and when there is heavy VXLAN traffic.

It just so happened that we did indeed have a large customer with high use Citrix Terminal Servers using our NSX Advanced Networking…and they where sitting on a VXLAN Virtualwire. The symptoms got worse today that coincided with the first official day of work for the new year.

There is a simple workaround:

That command has been described in blog posts relating to the Broadcom (which now present as QLogic drivers) drivers and where previously there was no resolution, there is now a fix in place by upgrading to the latest drivers here. Without upgrading to the latest certified drivers the quickest way to avoid the issue is to apply the workaround and reboot the host.

There has been recent outcry bemoaning the lack of QA with some of VMware’s latest releases but the reality is the more bits you add the more likelihood there is for issues to pop up…This is becoming more the case with ESXi as the base virtualization platform continues to add to it’s feature set which now includes VSAN baked in. Host extensions further add to the chance of things going wrong due to situations that are hard to test in as part of the QA process.

Deal, fix…and move on!





NSX Bytes: Testing VXLAN Tunnel End Points between ESXi Hosts

When working with NSX you have the option to create logical switches that are bound by Transport Zones…within those Transport Zones you could have a single vSphere Cluster or a number of Clusters that may or may not span different VLANs or Subnets. For those that have created NSX Logical Switches that span multiple clusters within a Transport Zone at some stage you may need to test connectivity between the VTEPs to ensure VXLAN is working as expected so that the NSX Controllers can do their thing.

There are a couple of different ways to achieve this:

ping with ++netstack=vxlan

This command can be run from the ESXi Shell and the parameters that are required are the correct VMKernel NIC that’s being used for VXLAN traffic encapsulation and the VTEP IP of the Host you want to test in the other Cluster.

If successful you should see outputs from both ends similar to what’s shown above. If you don’t have successful communication between the hosts you will need to start troubleshooting the underlying network as the MTU of the physical transport networks might not be set to 1600 minimum end to end. You can also test data packet sizes to see where the MTU might be set and if Jumbo frames is enabled.

Note: Joseph Griffiths has a good post expanding on the test and a little more detail around VXLAN MTU sizes.

Web Client Logical Switch Monitor Ping Test:

Once logged into the Web Client, click through to Networking & Security -> Logical Switches and then double click on the Logical Switch you want to test. On the Monitor Tab you will see a summary of the Logicial Switch objects in the left window pane while in the main window you have the option to select the Test Parameters and the Start Test button. The Size of the test packet option allows you to perform a standard ping test or one for VXLAN.

Select the Source and Destination Host that span the different Clusters within the Transport Zone you want to test against:

Click on Start Test and if everything is configured as expected you will get a couple of Green Ticks and a confirmation that all packets transmitted where received.

Again, if you get a failed test you will need to investigate where in the underlying network the issue could be…however now you have a couple of options to test the connectivity and more importantly one that allows you to test without having to enable SSH on the ESXi Hosts and test straight from the Web Client.



NSX vCloud Retrofit: Controller Deployment, Host Preperation and VXLAN Config

This blog series extends my NSX Bytes Blog Posts to include a more detailed look at how to deploy NSX 6.1.x into an existing vCloud Director Environment. Initially we will be working with vCD 5.5.x which is the non SP Fork of vCD, but as soon as an upgrade path for 5.5.2 -> 5.6.x is released I’ll be including the NSX related improvements in that release.

Part 3: Controller Deployment, Host Preperation and VXLAN Config

With the NSX Manager deployed and configured and after verifying that vShield Edges are still being deployed by vCloud Director we can move onto the nuts and bolts of the NSX Configuration. There are a number of posts out there on this process so I’ll keep it relatively high level…but do check out the bottom of this post for more detailed install links.

Controller Deployment:

Just in case this step hasn’t been done in previous product installs…a best practice for most new generation VMware Applications is to have the Managed IP Address set under vCenter Server Settings -> Runtime Settings -> Managed IP Address Set the IP that of your vCenter.

Next login to the Web Client -> Networking and Security -> NSX Managers -> IP Address of NSX Manager -> Manage -> Grouping Objects -> IP Pools and Click Add.

Here we are going to preconfigure the IP Pools that are going to be used by the NSX Controllers. At this point we can also add the IP Pool that will be used by the VXLAN VMKernel Interfaces which become our VTEP’s. If we are routing our VXLAN Transport VLAN then add as many IP Pools as you need to satisfy the routed subnets in the Transport Zones.

For the NSX Controllers its recommended that 3 (5 Possible…7 max) be deployed for increased NSX Resiliency. The idea is to split them across the Management Network and on ESXi Hosts as diverse as possible. They can be split across different vCenter Clusters in the a vCenter Datacenter…Ideally there should be configured with DRS Anti Affinity Rules to ensure a single host failure doesn’t result in a loss of Cluster Quorum.

Go to the Web Client -> Networking and Security -> Installation In the NSX Controller Nodes Pane click on add

  • Select the NSX Manager, Datacenter, Cluster/Resource Pool, Datastore
  • Leave the Host blank (allow for auto placement)
  • On the Connected To, click Select and go to the Distributed PortGroup List and Select the Zone Management PortGroup
  • On the IP Pool, click Select and Choose the NSX Controller IP Pool created above

The Deployment of the NSX Controllers can be monitored via vCenter and will take about 5-10 minutes. The deployment will fail if Compute resources are not available or if the Controllers can’t talk to vCenter on the configured PortGroup.

LINK: NSX Controller Deployment Gotchyas

Host Preparation:

In the Networking & Security Menu go to Installation and Host Preparation. Here you will see the Clusters in the vCenter and their Installation Status. Once you click Install all hosts in the Cluster are Prepared at once…Preparing the hosts involves the installing of the following components:

  • UWA – User World Agent
  • DFW – Distributed Firewall Module
  • DLR – Distributed Router Module
  • VXLAN Module

The installation of these VIBs is done without interruptions to host operations and doesn’t result in Maintenance Mode being triggered during the install a reboot is not required.

VXLAN Configuration:

Once the Hosts have been prepared the Configure option becomes available in the VXLAN column of the Host Preperation tab. This process will create the initial VXLAN PortGroup under the selected Distributed Switch and add new VMKernel Interfaces to each Host in the prepared Cluster…the IP of which will act as the VTEP (VXLAN Tunnel End Point) and is from which all VM traffic passes if VXLAN enabled. There are a number of different Teaming Policies available and each choice depends on the design of your switching network…I chose Failover due to the fact LACP was not available and each ESXi host has 2x10Gig pNICs and I am comfortable with a failover scenario.

  • Select the Distributed Switch relative to each zone
  • Set the VLAN (Configured to be carried throughout the underlying Physical Network)
  • Set the MTU to 1600 (at least to allow overhead)
  • Use the VXLAN Up Pool create in previous steps
  • Set the Teaming Policy to Fail Over
  • Leave the VTEP to 1 (Can only be one if Failover is selected)

As mentioned above a new Distributed Port Group will be created and VMK NICs added to each host in the Cluster


Once a Cluster is Prepared any host that gets added to the cluster will have the Agents and Networking automatically configured…the removal of the NSX Components does require a host to be restarted. In my experience of upgrading from NSX 6.x to 6.1 host reports are also required.

Further Reading: