NSX Bytes: Friends Don’t Let Friends Delete The VTEP PortGroup

Last week I posted a tweet saying “Friends don’t let friends delete the NSX-v VTEP PortGroup”. As most of us in this industry do, I learn by doing, and I found out the hard way that you shouldn’t mess with the PortGroup created during the Host Preparation stage of the NSX setup and configuration. This PortGroup is used by the hosts in an NSX-enabled cluster for the VMkernel interfaces that act as the VTEPs, or VXLAN Tunnel End Points.

In a production environment this is close to impossible to do, because you can’t delete a PortGroup while it’s in use. I got into this situation while trying to clone a lab environment and restore components of the existing lab into a new lab with new hosts. With that in mind, the following could be handy in lab environments.

Once the new hosts had been prepared, I went to configure VXLAN against the cluster, which creates a new VMkernel interface on each host and assigns it a VTEP address from DHCP or from a pre-configured IP Pool, but I got an error. When I looked at the event logs in vCenter I saw the following:

DVPortGroup dvportgroup-148806 couldnot be found
 The object or item referred to could not be found

Instantly I remembered that I had “cleaned up” the cloned vCenter configuration and removed any surplus PortGroups, and in doing so I had deleted the PortGroup NSX was referencing. I tried to recreate the PortGroup with the same name, but it was clear that the configuration was referencing the MOID (Managed Object ID) of the PortGroup rather than its name, and asking vCenter to use that to complete the job. Even an export/import of the Distributed Switch configuration from the original vCenter didn’t do the trick, because the import creates the PortGroup with a new, incremented MOID instead of preserving the original one.
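To make the name versus MOID point concrete, here is a minimal Python sketch that pulls the MOID out of the event message above and checks it against an inventory. The inventory dictionary, the PortGroup name and the MOID dvportgroup-150001 are made up for illustration and are not values from my lab; in practice you would fill that mapping from whatever your vCenter inventory reports.

```python
import re

# Matches distributed PortGroup managed object IDs such as "dvportgroup-148806".
MOID_PATTERN = re.compile(r"\bdvportgroup-\d+\b")


def extract_portgroup_moids(event_text):
    """Return the distinct dvportgroup MOIDs mentioned in an event message, in order."""
    seen = []
    for moid in MOID_PATTERN.findall(event_text):
        if moid not in seen:
            seen.append(moid)
    return seen


def find_stale_moids(event_text, inventory):
    """Return MOIDs named in the event text that the inventory no longer contains.

    `inventory` maps PortGroup name -> MOID. It is a hypothetical stand-in
    for the current vCenter inventory, not an NSX or vCenter API object.
    """
    current = set(inventory.values())
    return [m for m in extract_portgroup_moids(event_text) if m not in current]


event = "DVPortGroup dvportgroup-148806 couldnot be found"

# Hypothetical inventory after recreating the PortGroup under its old name:
# the name is the same as before, but vCenter has handed out a new MOID.
inventory = {"vtep-portgroup": "dvportgroup-150001"}

print(find_stale_moids(event, inventory))
```

The stale MOID is still reported even though a PortGroup with the expected name exists, which is exactly why recreating it by name did not help.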

GSS Support Fix:

Thinking back to previous NSX-related cases I’ve raised with VMware support, I knew that the NSX Manager database keeps a very simple structure of vCenter objects, and I guessed that some backend SQL search and replace could do the trick. After raising a case, I had the guys in GSS log in to the NSX Manager backend, which can only be accessed with a secret VMware password, and search for the table that referenced the MOID of the PortGroup. The fix is simple if you know the MOIDs of the old and the new PortGroup: the stored reference to the old MOID is replaced with the new one.

Note: Only VMware Support can action this fix.

With that modification committed, I was able to configure the VTEPs for the new hosts and continue rebuilding the cloned instance. So if you ever find yourself in a situation where you have managed to do as I have done, there is a fix that avoids a complete start-from-scratch scenario.