vCOPS 5.8: Critical Data Collection Bug

UPDATE: VMware Global Support supplied me with vCOPs 5.8.0 Hot Fix 01 Build 1537842 which is available via a support request. This is a complete .pak update so you will need to go through the upgrade process as per usual.

The issue of the missing data has been resolved, however I did need to go through and remove a heap of duplicate entity types in the Custom Dashboard. I have a fully functional vCOPs platform now. Hopefully no more bugs in this build!

Like most…I jumped to upgrade vCOPs from 5.7.x to 5.8 when it went GA mid December. Initially the upgrade completed without issue and the four vCenter’s registered looked to continue collecting data as expected.

# To cut to the guts of the error scroll to the bottom on the post.

Shortly after the upgrade I went through and performed an upgrade of vCenter from 5.0 to 5.1 in one of our sites. Upon completing the upgrade and having a look at the Custom Dashboards we have setup, I noticed that 90% of the hosts in the recently upgraded vCenter where showing as white boxes (no data)

Looking through the Analytic VM collector logs I found these entries that seemed to point to a connectivity/communication issue between vCOPs and the Hosts.

After going through a painful couple of support calls with VMware support where they where insistent the issue was with the vCenter (have you tried turning it off and on) and/or a disk space issue on the Analytic VM. They suggested it was a well known upgrade bug that can be resolved as per below.



Off the bat I knew this wasn’t my issues because initially on upgrade I didn’t have the issue. While I waited for support to try and diagnose the issue via vCOPs support log bundles, I assumed that the issue was in the data…possibly a bad row in the database relative to a host/vCenter.

I removed the affected vCenter from vCOPs Admin Registration Tab and ran the following command on the UI VM that I picked up from @h0bbel‘s Post here:

 

The command ran for about 20-30 minutes and returned as being successful. When I went back into vCOPs the affected vCenter’s hosts had returned and was actively collecting and reporting data…however I now saw a number of other hosts across multiple vCenters showing showing the same problem!

I contacted VMware support again and had the case escalated which resulted in the admission that there was a newly discovered (probably through my persistence in regards to the case) bug in 5.8 and that I was experiencing all the symptoms.

Issue in 5.8 where we have the below symptoms:

  • One or more ESXi/ESX hosts are no longer present in the vCenter Operations Manager inventory.
  • ESXi/ESX hosts are missing from the vCenter Operations Manager inventory.
  • Child objects of missing ESXi/ESX hosts such as virtual machines and datastores are present in the vCenter Operations Manager inventory.
  • This issue occurs when you place the ESXi/ESX hosts into Maintenance Mode in vCenter Server and then take the hosts out of Maintenance Mode.

Review the Below KB:
http://kb.vmware.com/kb/2068303/

So, at the moment there is no resolution or fix and the workarounds are pretty nasty! …basically you will be modding database entries and taking snapshots of the Analytic VM.

I just got off the phone with VMware support in Palo Alto and got told that a hotfix was being worked on and should be available soon. Once released to me, i’ll update this post with the final resolution.

2 comments

  • Did you get the patch for this ?

    • Hey there…patch has been released…check the KB above:

      This issue is resolved in vCenter Operations Manager 5.8.1 available at VMware Downloads. For more information, see the vCenter Operations Manager 5.8.1. Release Notes.