
Quick Fix: VCSA Web Client 6.0 Throws Monitoring Errors

This quick fix post is for those who are still using vCenter Operations Manager 5.8.x and are deploying, or thinking about deploying or upgrading to, vCenter 6.0. I came across this annoying situation all of a sudden while working on a new vCenter instance, when the Web Client started to report the error shown below.

This can be ignored by clicking No, and you will still be able to operate most areas of the Web Client, but you will find that the Monitoring and Health pages fail to load and give you a generic Error #2036 as shown below.

It took me a while to realize that the error was related specifically to the monitoring modules, and it finally clicked that the error started happening when I registered the vCenter against my lab vCOPs instance. I was still running vCOPs (not vROps) and the instance hadn’t been upgraded to the latest build. Having a look through the VMware KBs, I came across KB 2111224, which explained the cause.

This issue occurs because vRealize Operations Manager versions prior to 5.8.5 are not supported in the vSphere 6.0 environment.

Upgrading the vCOPs Appliances to build 5.8.5-2532416 sorted the issue and I was able to browse through the Web Client without the error and have the integrated Health Monitoring work without issue.
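As a quick sanity check before registering a vCenter 6.0 instance, the rule from the KB (anything older than 5.8.5 is unsupported) can be expressed as a simple version comparison. This is only a minimal sketch: the function names are my own, and it assumes you already know the vCOPs version string; it does not query the appliance.

```python
def parse_version(text):
    """Turn a dotted version such as '5.8.5' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Minimum vCOPs version the KB lists as supported with vSphere 6.0.
MIN_SUPPORTED = parse_version("5.8.5")


def is_supported_with_vsphere6(vcops_version):
    """Return True when the given vCOPs version is 5.8.5 or later."""
    return parse_version(vcops_version) >= MIN_SUPPORTED


# Example: check a few version strings (sample values only).
for version in ("5.7.2", "5.8.4", "5.8.5"):
    print(version, is_supported_with_vsphere6(version))
```

Comparing tuples of integers rather than raw strings avoids the usual trap where "5.10" sorts before "5.8" alphabetically.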

References:

http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=2111224&sliceId=1&docTypeID=DT_KB_1_1&dialogID=850159080&stateId=0%200%20850161293

PowerCLI IOPS Metrics: vCloud Org and VPS Reporting

We have recently been working through a product where knowing and reporting on VM max read/write IOPS was critical. We needed a way to provide reporting on our clients’ VPSs and vCloud Organisation VMs.

vCOPs is a seriously great monitoring and analytics tool, but its reporting has a flaw: you can’t search, export or manipulate metrics relating to VM IOPS in a useful way. VeeamOne gives you a Top 10 list of IOPS, and CloudPhysics has a great card showing DataStore/VM performance…but again, neither is exportable or granular enough for what we needed.

If you search on Google for IOPS reporting you will find a number of people who have created excellent PowerCLI scripts. The problem I found was that most worked in some cases, but not for what we required. One particular post I came across on the VMware Community Forums gave a quick and dirty script to gather IOPS stats for all VMs. This led me to the Alpacapowered Blog. So initial credit for the following goes to MKguy…I merely hacked around it to provide us with additional functionality.

Before You Start:

Depending on your logging level in vCenter (I have run this against vCenter 5.1 with PowerCLI 5.5) you may not be collecting the stats required to get read/write IOPS. To check this, run the following in PowerCLI while connected to your vCenter:

If you don’t get the output, it means your logging level is set lower than is required. Read through this Post to have vCenter log the required metrics at a granular level. Once that’s been done, give vCenter about 30 minutes to collect its 5 minute samples. If you ever want to check how many samples you have for a particular VM, you can run the following command. It will also show you the Min/Max count plus the average.

The Script:

I’ve created two versions of the script (one for single VMs and one for vCloud Org VMs) and, as you can see below, I added in a couple of niceties to make this more user friendly and easy to trigger for our internal support staff. The idea is that anyone with the right access to vCenter can double-click on the .ps1 script and, with the right details, produce a report for either a single VM or a vCloud Organisation.

Script Notes:

Line 1: Adds the PowerCLI Snap-In so the PowerCLI cmdlets can be called from PowerShell when the .ps1 is launched.

Line 3: Without notes from MKguy, I’m assuming this is telling us to use the last 30 days of stats if they exist.

Line 7: I discovered the -menu flag for Connect-VIServer, which lists your 10 most recently connected vCenter or ESXi servers…from there you enter a number to connect (ease of use for helpdesk).

Line 16: Uses the Get-Folder command to get all the VMs in a vCloud Org…you can obviously enter your own preferred search flags here.

Lines 17-22 are the ones I picked up from the Community post. They basically take the command we used above to check for sample metrics and feed it into read/write variables, which are then displayed in a series of columns as shown below.
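To make the math in those last lines a little more concrete, here is a rough sketch of the aggregation step on its own, written in Python rather than PowerCLI so it can be run without a vCenter. It is not the original script: the function names, column names and sample numbers are all hypothetical, and it assumes you already have a list of read and write IOPS samples per VM.

```python
from statistics import mean


def summarize_iops(samples):
    """Return the sample count, max and average for a list of IOPS samples."""
    if not samples:
        # No stats collected (for example, the logging level is too low).
        return {"count": 0, "max": None, "avg": None}
    return {
        "count": len(samples),
        "max": max(samples),
        "avg": round(mean(samples), 2),
    }


def vm_report_row(vm_name, read_samples, write_samples):
    """Build one report row (hypothetical column names) for a single VM."""
    read = summarize_iops(read_samples)
    write = summarize_iops(write_samples)
    return {
        "VM": vm_name,
        "ReadMax": read["max"],
        "ReadAvg": read["avg"],
        "WriteMax": write["max"],
        "WriteAvg": write["avg"],
    }


# Example with made-up samples for a made-up VM.
print(vm_report_row("vm01", [10, 20, 30], [5, 15]))
```

For a vCloud Org you would build one such row per VM in the folder and print them as columns.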

Script Output:

Executing the .ps1 will open a PowerShell window, ask you to enter the vCenter/Host and finally the VM name or vCloud Org description. If you have a folder with a number of VMs, the script can take a little time to go through the math and spit out the values.

From there you can do a select and copy to export the values out for manipulation…I haven’t done a csv export option due to time constraints, however if anyone wants to add that to the end of the script, please do and let me know 🙂
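For anyone who wants to take the copied values further, the CSV step could look something like the sketch below. It is in Python and works on hypothetical rows (the column names and numbers are made up for illustration); inside the PowerShell script itself, piping the result objects to the standard Export-Csv cmdlet would probably be the more natural route.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Render a list of dict rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical report rows, one per VM.
sample_rows = [
    {"VM": "vm01", "ReadMax": 30, "WriteMax": 15},
    {"VM": "vm02", "ReadMax": 120, "WriteMax": 80},
]

csv_text = rows_to_csv(sample_rows, ["VM", "ReadMax", "WriteMax"])
print(csv_text)
```

Writing `csv_text` to a file then gives you something that opens directly in a spreadsheet.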

Hope this script is useful for some!

vCOPs 5.8: Critical Data Collection Bug

UPDATE: VMware Global Support supplied me with vCOPs 5.8.0 Hot Fix 01 Build 1537842 which is available via a support request. This is a complete .pak update so you will need to go through the upgrade process as per usual.

The issue of the missing data has been resolved, however I did need to go through and remove a heap of duplicate entity types in the Custom Dashboard. I have a fully functional vCOPs platform now. Hopefully no more bugs in this build!

Like most, I jumped to upgrade vCOPs from 5.7.x to 5.8 when it went GA in mid-December. Initially the upgrade completed without issue and the four registered vCenters looked to continue collecting data as expected.

# To cut to the guts of the error, scroll to the bottom of the post.

Shortly after the upgrade I went through and performed an upgrade of vCenter from 5.0 to 5.1 in one of our sites. Upon completing the upgrade and having a look at the Custom Dashboards we have set up, I noticed that 90% of the hosts in the recently upgraded vCenter were showing as white boxes (no data).

Looking through the Analytics VM collector logs, I found these entries that seemed to point to a connectivity/communication issue between vCOPs and the hosts.

I went through a painful couple of support calls with VMware support, where they were insistent the issue was with the vCenter (have you tried turning it off and on) and/or a disk space issue on the Analytics VM. They suggested it was a well-known upgrade bug that can be resolved as per below.



Off the bat I knew this wasn’t my issue, because initially on upgrade I didn’t have the problem. While I waited for support to try and diagnose the issue via vCOPs support log bundles, I assumed that the issue was in the data…possibly a bad row in the database relative to a host/vCenter.

I removed the affected vCenter from vCOPs Admin Registration Tab and ran the following command on the UI VM that I picked up from @h0bbel‘s Post here:

 

The command ran for about 20-30 minutes and returned as being successful. When I went back into vCOPs the affected vCenter’s hosts had returned and were actively collecting and reporting data…however I now saw a number of other hosts across multiple vCenters showing the same problem!

I contacted VMware support again and had the case escalated, which resulted in the admission that there was a newly discovered bug in 5.8 (probably found through my persistence with the case) and that I was experiencing all the symptoms.

Issue in 5.8 where we have the below symptoms:

  • One or more ESXi/ESX hosts are no longer present in the vCenter Operations Manager inventory.
  • ESXi/ESX hosts are missing from the vCenter Operations Manager inventory.
  • Child objects of missing ESXi/ESX hosts such as virtual machines and datastores are present in the vCenter Operations Manager inventory.
  • This issue occurs when you place the ESXi/ESX hosts into Maintenance Mode in vCenter Server and then take the hosts out of Maintenance Mode.

Review the Below KB:
http://kb.vmware.com/kb/2068303/

So, at the moment there is no resolution or fix and the workarounds are pretty nasty! …basically you will be modding database entries and taking snapshots of the Analytics VM.

I just got off the phone with VMware support in Palo Alto and got told that a hotfix was being worked on and should be available soon. Once released to me, I’ll update this post with the final resolution.

First Look: CloudPhysics – Datastore Contention Card

I first came across CloudPhysics just before VMWorld 2012. For a general overview, go here. I am a massive fan of analytics and trend metrics, and I use a number of systems to gain a wide overview of the performance and monitoring of our Hosting and Cloud Platform…as well as extending out to client systems.

I love the deep/complex analytics of VMware Operations Manager, but sometimes I feel overwhelmed by the sheer amount of data presented by the default views of vCOPs, and working with the Custom Dashboards can be a frustrating exercise if you don’t have a heap of time and patience.

This is where I have found CloudPhysics comes into its own…via its brilliant presentation of things that matter. I’m not going to go through the setup and config, but in a nutshell…from the site, register, log in, download and deploy the VMware Probe Appliance, give it an IP and enter your email address as it relates to your CloudPhysics login. It’s one probe per vCenter, but you can deploy multiple probes to multiple vCenters and link them back under the same username and CloudPhysics App.

When you log in, you are presented with the home screen below:

From the relatively humble and basic default cards released around the VMWorld launch, the team has been adding more complex and useful cards. HA Cluster Health and SnapShots Gone Wild are my personal favourites and offer a view into key areas of vSphere management. What’s also great about these cards is that they offer external jump links to VMware KBs and basic information about the subject matter. The organisation and presentation of the data pulled by the probe is simple yet effective in allowing you to get an understanding of how your environments are performing and which areas are under stress.

Released today was the DataStore Contention Card which looks at the performance of VMFS Datastores in your environment. The Default view selects the DataStore that needs the most attention. In my case I was surprised to see the Datastore below exhibit combined read/write latency that was off the chart!

The interface allows you to select a block of time at any level and see which VM may be contributing the most to the performance metric selected. Those metrics are shown below and include Latency, Outstanding I/Os, IOPS and Bandwidth. You also have the ability to filter the view by vCenter, Datastore Cluster and Datastore.

The screen grabs don’t do the CloudPhysics Web Application interface justice, so head over to the site and download the probe to get started. It must be said that the product is only in BETA so use at your own risk, but I’ve had no issues with the Probe VM, whose specs are 2vCPU, 4GB of RAM and 16GB of storage.

vCenter Operations Manager 5 – UI Time-out Settings

A little after vCenter Operations Manager 5.0 went GA I posted this forum thread in regards to the default time-out value of the Web UI…Thanks to ILIO and VEIgel for follow-up posts with the initial solution, which at the time wasn’t documented. This is now covered in the release notes for 5.0.1 under known issues. But for those who haven’t read through the issues, see the below to adjust the UI time-out on vCOPs 5.0 (and now 5.0.1). The default time-out value is 30 minutes.

To change the default timeout value in the vSphere UI in the vApp:

  1. On the UI VM, edit /usr/lib/vmware-vcops/tomcat/webapps/vcops-vsphere/WEB-INF/web.xml and adjust the value for session-timeout parameter. Minus one (-1) results in an infinite timeout; the session will not expire at all.
  2. Restart the Apache Tomcat web server by running the command: service vcopsweb restart
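If you end up repeating this after every update, step 1 can be scripted. The sketch below is only an illustration in Python: it rewrites the value of the session-timeout element in a web.xml held as text, and the sample XML is a made-up fragment, not the actual vCOPs file. The function name is my own. In a standard web.xml the session-timeout value is in minutes.

```python
import re

# Matches <session-timeout>30</session-timeout>, allowing whitespace and -1.
SESSION_TIMEOUT = re.compile(r"(<session-timeout>)\s*-?\d+\s*(</session-timeout>)")


def set_session_timeout(web_xml_text, minutes):
    """Return web.xml text with the session-timeout value replaced."""
    new_text, count = SESSION_TIMEOUT.subn(
        lambda match: f"{match.group(1)}{minutes}{match.group(2)}",
        web_xml_text,
    )
    if count == 0:
        raise ValueError("no <session-timeout> element found")
    return new_text


# Hypothetical fragment standing in for the real file contents.
sample = """<session-config>
    <session-timeout>30</session-timeout>
</session-config>"""

print(set_session_timeout(sample, -1))
```

To use it for real you would read the file from the path in step 1, keep a backup copy, write the result back, and then restart the web server as per step 2.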

With the release of 5.0.1 it looks like any customizations you may have done get overwritten, so this process will have to be repeated upon every update. I’ve read posts about having to save any custom dashboards you may have created as well.

As a side note, I’ve noticed that by default you are able to use the login credentials relative to the vCenter instance(s) you have registered in the admin console…however I’ve found that when I added vCenters that use different LDAP sources, the UI won’t let you log in with any previously working LDAP account and you will need to log in with the default admin account…not sure if that’s a bug, or by design…I was surprised that it picked up user auth without any additional config from the first vCenter registered. In my example I had three vCenters registered: the first two had common LDAP profiles, while the third was standalone…
