I had a really strange situation pop up in one of my lab environments over the weekend. vSAN Health was reporting that one of the hosts had lost network connectivity to the rest of the cluster. This is something I’ve seen intermittently, so I waited for the condition to clear. When it didn’t, I went to put the host into maintenance mode, but found that I wasn’t getting the expected vSAN options.
I have seen situations recently where the enable vSAN option on the VMkernel interface had been cleared and vCenter thinks there is a networking issue, so I thought maybe it was this again. Not that that situation in itself is normal, but what I found when I went to view the state of the VMkernel adapters from the vSphere Web Client was even stranger.
No adapters listed!
The host wasn’t reported as being disconnected and there was still connectivity to it via the Web UI and SSH. To make sure this wasn’t a visual error from the Web Client I SSH’ed into the host and ran esxcli to get a list of the VMkernel interfaces.
Unable to find vmknic for dvsID: xxxxx
So from the CLI, I couldn’t get a list of interfaces either. I tried restarting the core services without luck, and was left with a host that was up with VMs running on it without issue, yet reporting networking issues and having no network interfaces configured per the running state.
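If you want to script this check across several hosts, here is a minimal sketch in Python. It assumes the listing command is `esxcli network ip interface list` (the exact invocation isn't recorded above) and simply looks for the error text quoted earlier. The function names and the three status strings are my own, made up for illustration.

```python
import subprocess

# Error text copied from the output quoted in this post; the dvsID is elided there.
VMKNIC_ERROR = "Unable to find vmknic for dvsID"


def classify_vmk_listing(output):
    """Classify text captured from a VMkernel interface listing.

    The return values are hypothetical labels chosen for this sketch.
    """
    if VMKNIC_ERROR in output:
        return "vmknic-lookup-failed"
    if not output.strip():
        return "no-interfaces"
    return "ok"


def list_vmk_interfaces():
    """Run the listing command on the host and return stdout and stderr combined.

    Assumption: 'esxcli network ip interface list' is the command being checked.
    """
    result = subprocess.run(
        ["esxcli", "network", "ip", "interface", "list"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Only meaningful when run in a shell where esxcli is available.
    print(classify_vmk_listing(list_vmk_interfaces()))
```

The classification is kept separate from the command execution so the same function can be pointed at output captured over SSH or pulled from a log bundle.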
Going to the console… the situation was not much better.
Nothing… no network or host information at all 🙂
Not being able to reset the management network, my only option from here was to reboot the server. Upon reboot the host did come back online; however, the networking was reported as 0.0.0.0/0 from the console and the host was now completely offline.
I decided to reboot using the last known good configuration.
Upon reboot using the last known good configuration, all previous network settings were restored and I had a list of VMkernel interfaces present again from both the Web Client and the CLI.
Because of the “dirty” vSAN reboot, as is usual with anything that disrupts vSAN, the cluster needed some time to get itself back into working order. While some VMs were in an orphaned or unavailable state after the reboot, once the vSAN re-sync had completed all VMs were back up and operational.
Cause and Official Resolution:
The workaround to bring back the host networking seemed to do the trick; however, I don’t know what the root cause was for the host to lose all of its network config. I have an active case going with VMware Support at the moment with the logs being analysed. I’ll update this post with the results when they come through.
ESXi Version: 126.96.36.19906603
vSphere Version: 188.8.131.52000