If you are familiar with vCloud Director Edge Gateway Services, you might have come across situations where Edges become unmanageable and the Edge Gateway management options appear greyed out in the vCD UI. All Edge services remain functional, but no changes can be made.
In environments where NSX has been retrofitted over vCNS with an in-place upgrade, you may hit an NSX Manager bug in how it interprets vShield Edge statuses. vCloud Director queries the NSX Manager for the status of deployed Edge Gateways, and in this situation the NSX Manager can send back an incorrect status, leaving the Edge unmanageable from vCloud Director as described above.
A Re-Deploy will reset the status and make the Edge manageable again, but it results in downtime for the services sitting behind the affected Edge. During the course of an SR I raised with the NSX and vCloud VMware Engineering teams, a fix was created that uses the NSX Manager APIs to POST a system event reporting the Edge as recovered, which makes vCloud Director pick up the Edge as manageable without a Re-Deploy.
The first step is to find out which Edges might be affected by this condition. Rather than going through each Edge in the vCD UI, I suggest looking at this post from @fojta, in which he creates a PowerCLI script to grab the current status of all Edges. In addition to the Edge Name, you will also need the EDGE-ID.
Get the EDGE-ID by going to the vSphere Web Client's Networking and Security tab -> NSX Edges and searching for the Edge Name that matches the one shown in the vCD UI.
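If you would rather script the lookup than search in the Web Client, the sketch below filters the XML returned by a GET to https://NSX-MANAGER-IP/api/4.0/edges for Edges whose name contains the vCD Edge Name. It is a minimal sketch that assumes the response nests one edgeSummary element per Edge with objectId and name children and uses no XML namespace; check this against the response from your own NSX Manager. The function name find_edges and the sample data are hypothetical, and a substring match is used because the name on the NSX side may not be identical to the one in vCD.

```python
import xml.etree.ElementTree as ET


def find_edges(edge_list_xml: str, vcd_name: str) -> list[tuple[str, str]]:
    """Return (EDGE-ID, NSX name) pairs for Edges whose name contains vcd_name.

    Assumption: the GET /api/4.0/edges response contains one <edgeSummary>
    per Edge, each with <objectId> and <name> children, and no XML namespace.
    """
    root = ET.fromstring(edge_list_xml)
    matches = []
    for summary in root.iter("edgeSummary"):
        nsx_name = summary.findtext("name", default="")
        if vcd_name in nsx_name:  # substring match, names may carry a prefix/suffix
            matches.append((summary.findtext("objectId"), nsx_name))
    return matches


# Hypothetical sample response, trimmed to the fields the function reads.
sample = """<pagedEdgeList><edgePage>
  <edgeSummary><objectId>edge-1</objectId><name>vse-Tenant1-GW (abc)</name></edgeSummary>
  <edgeSummary><objectId>edge-2</objectId><name>vse-Tenant2-GW (def)</name></edgeSummary>
</edgePage></pagedEdgeList>"""

print(find_edges(sample, "Tenant1-GW"))  # [('edge-1', 'vse-Tenant1-GW (abc)')]
```

If more than one Edge matches, confirm the right one in the Web Client before going any further.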
Using your favourite REST client, replace the identifier in the following GET call with the EDGE-ID to retrieve more details about the Edge:
https://NSX-MANAGER-IP/api/4.0/edges/edge-xxx
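Scripted, the same call is an authenticated GET. The sketch below uses only the Python standard library and assumes HTTP Basic authentication against the NSX Manager, and that the edge details response is an edge element with id, name and datacenterMoid children (the last being the DATACENTERID the payload needs). Verify those element names against the response you actually get; both function names are hypothetical.

```python
import base64
import urllib.request
import xml.etree.ElementTree as ET


def edge_details_request(manager: str, edge_id: str,
                         user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) GET https://<manager>/api/4.0/edges/<edge_id>."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"https://{manager}/api/4.0/edges/{edge_id}",
        headers={"Authorization": f"Basic {token}"},  # assumed: Basic auth
        method="GET",
    )


def parse_edge_details(edge_xml: str) -> dict:
    """Pull the three values the payload needs out of the <edge> response.

    Assumption: the root element is <edge> with <id>, <name> and
    <datacenterMoid> children and no XML namespace.
    """
    root = ET.fromstring(edge_xml)
    return {
        "edge_id": root.findtext("id"),
        "name": root.findtext("name"),
        "datacenter_id": root.findtext("datacenterMoid"),
    }
```

A typical use is to open the request with urllib.request.urlopen(req), read and decode the body, and pass it to parse_edge_details. Keeping request construction separate from sending makes it easy to inspect the URL before anything touches the NSX Manager.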
Next, take the EDGE-ID and NAME from the response above (checking the DATACENTERID as well) and use them to fill in the payload below, incrementing the eventId for each Edge as you go along.
```xml
<systemEvent>
  <eventId>90</eventId> <!-- You can change this ID sequentially -->
  <timestamp>1431679596</timestamp>
  <severity>Informational</severity>
  <eventSource>edge_id</eventSource> <!-- Fill in the appropriate data -->
  <eventCode>30041</eventCode>
  <message>vShield Edge VM has recovered and now responding to health check</message>
  <module>vShield Edge Gateway</module>
  <objectId>datacenter_id</objectId> <!-- Fill in the ID of the datacenter where this Edge is hosted -->
  <reporterName>vShield Manager</reporterName>
  <reporterType>4</reporterType>
  <sourceType>4</sourceType>
  <eventMetadata>
    <data>
      <key>edgeId</key>
      <value>edge_id</value> <!-- Fill in the appropriate data -->
    </data>
    <data>
      <key>name</key>
      <value>Edge11</value> <!-- Fill in the appropriate data -->
    </data>
  </eventMetadata>
</systemEvent>
```
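To avoid hand-editing the XML for every Edge, the payload can be generated. The sketch below rebuilds the systemEvent document above with Python's ElementTree and, following the advice to increment the ID as you go, gives each Edge its own eventId. The helper names, the (edge_id, name, datacenter_id) tuple layout and the use of the current Unix time as the default timestamp are assumptions; the fixed field values are copied from the payload above.

```python
import time
import xml.etree.ElementTree as ET
from typing import Iterable, List, Optional, Tuple

MESSAGE = "vShield Edge VM has recovered and now responding to health check"


def build_recovery_event(event_id: int, edge_id: str, edge_name: str,
                         datacenter_id: str,
                         timestamp: Optional[int] = None) -> str:
    """Return the systemEvent XML for one Edge as a string."""
    if timestamp is None:
        timestamp = int(time.time())  # assumption: current Unix time is acceptable
    event = ET.Element("systemEvent")
    for tag, text in [
        ("eventId", str(event_id)),
        ("timestamp", str(timestamp)),
        ("severity", "Informational"),
        ("eventSource", edge_id),
        ("eventCode", "30041"),
        ("message", MESSAGE),
        ("module", "vShield Edge Gateway"),
        ("objectId", datacenter_id),
        ("reporterName", "vShield Manager"),
        ("reporterType", "4"),
        ("sourceType", "4"),
    ]:
        ET.SubElement(event, tag).text = text
    metadata = ET.SubElement(event, "eventMetadata")
    for key, value in (("edgeId", edge_id), ("name", edge_name)):
        data = ET.SubElement(metadata, "data")
        ET.SubElement(data, "key").text = key
        ET.SubElement(data, "value").text = value
    return ET.tostring(event, encoding="unicode")


def build_recovery_events(edges: Iterable[Tuple[str, str, str]],
                          first_event_id: int,
                          timestamp: Optional[int] = None) -> List[str]:
    """One payload per (edge_id, name, datacenter_id), eventIds counting up."""
    return [
        build_recovery_event(first_event_id + offset, edge_id, name, dc, timestamp)
        for offset, (edge_id, name, dc) in enumerate(edges)
    ]
```

Generating the document with ElementTree rather than string formatting means Edge names containing characters such as & or < are escaped correctly.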
Then POST the payload to the following API endpoint:
https://NSX-MANAGER-IP/api/2.0/systemevent
You should see a 201 Created status returned after the POST. Refresh the list in vCloud Director and the Edge should be manageable again. Repeat the process for any other affected Edges.
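The POST and the 201 check can be scripted as well. This sketch builds the request against the /api/2.0/systemevent endpoint with the payload as the body and treats anything other than 201 Created as a failure. HTTP Basic authentication, the application/xml Content-Type header and the helper names are assumptions rather than part of the fix described above.

```python
import base64
import ssl
import urllib.request
from typing import Optional


def system_event_request(manager: str, payload_xml: str,
                         user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the POST to the NSX Manager system event API."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"https://{manager}/api/2.0/systemevent",
        data=payload_xml.encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",  # assumed: Basic auth
            "Content-Type": "application/xml",  # assumed content type
        },
        method="POST",
    )


def post_system_event(request: urllib.request.Request,
                      context: Optional[ssl.SSLContext] = None) -> bool:
    """Send the request; return True only when NSX answers 201 Created.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so this
    returns False only for other non-error status codes.
    """
    with urllib.request.urlopen(request, context=context) as response:
        return response.status == 201
```

Call post_system_event once per payload, one Edge at a time, and refresh vCloud Director after each success. If the NSX Manager presents a self-signed certificate, one option is to pass ssl.create_default_context(cafile=...) pointing at a file containing that certificate, rather than disabling verification.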
Thanks to the NSX and vCloud Engineering teams for working through an elegant solution that means zero impact on client services. As I am discovering, there are lots of cool things that can be done with APIs!
Further Reading:
This blog series extends my NSX Bytes blog posts to include a more detailed look at how to deploy NSX 6.1.x into an existing vCloud Director environment. Initially we will be working with vCD 5.5.x, which is the non-SP fork of vCD, but as soon as an upgrade path for 5.5.2 -> 5.6.x is released I'll include the NSX-related improvements in that release.