
Quick Fix: VCSA 503 Service Unavailable Error

I’ve just had to fix one of my VCSAs again after it hit the infamous 503 Service Unavailable error, which still seems fairly common with the VCSA even though it was claimed to be fixed in vCenter version 6.5d. I’ve had this error pop up fairly regularly since deploying my homelab’s vCenter Server Appliance as a version 6.5 GA instance. For the most part I’ve refrained from rebooting the VCSA in case the error pops up on reboot, and I’ve even kept a snapshot against the VM in case I needed to revert to it, given the high chance that it would error out.

503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http20NamedPipeServiceSpecE:0x0000559b1531ef80] _serverNamespace = / action = Allow _pipeName =/var/run/vmware/vpxd-webserver-pipe)

After doing a Google search for any permanent solutions to the issue, I came across a couple of posts referencing USB passthrough devices that could trigger the error, which was plausible given I was using an external USB hard drive. IP changes also seem to be a trigger for the error, though in my case that wasn’t the cause. There is a good Reddit thread here that talks about duplicate keys…again related to USB passthrough. It also links externally to some other solutions that were not relevant to my VCSA.

Solution:

As referenced in this VMware communities forum post, to fix the issue I first had to find out whether I had a duplicate key error in the VCSA logs. To do that I dropped into the VCSA shell, went into /var/log and searched for any file containing device_key + already exists. As shown in the image above, this returned a number of entries confirming that I had duplicate keys and that they were causing the issue.

The VMware vCenter Server Appliance vpxd 6.5 logs are located in the /var/log/vmware/vmware-vpx folder
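The log search described above can be sketched roughly as follows. This is a minimal sketch, not the exact command used in the post: the function name is hypothetical, and it assumes that device_key and already exists appear on the same log line.

```shell
# Hypothetical helper: list log files that mention a duplicate device_key.
# The default directory is the vpxd log folder named in the post.
find_duplicate_key_logs() {
    log_dir="${1:-/var/log/vmware/vmware-vpx}"
    # -r: recurse into subfolders, -l: print matching file names only,
    # -E: extended regex. Assumes both phrases sit on one line.
    grep -rlE 'device_key.*already exists' "$log_dir"
}
```

Running find_duplicate_key_logs with no argument searches the default folder. Dropping the -l flag from the grep call prints the matching lines themselves, which is where the id and device_key values show up.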

What was required next was to delete the duplicate table entries in the embedded Postgres database. To connect to the embedded Postgres database you need to run the following command from the VCSA shell:
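A minimal sketch of that connection, assuming the usual VCSA 6.5 layout: psql under /opt/vmware/vpostgres/current/bin, a database named VCDB and the postgres user. None of these names appear in the text above, so treat them as assumptions and verify them on your own appliance.

```shell
# Hypothetical wrapper around the embedded psql client.
# Path, database name (VCDB) and user (postgres) are assumptions, not taken
# from the post; override the binary with PSQL_BIN if yours lives elsewhere.
connect_vcdb() {
    psql_bin="${PSQL_BIN:-/opt/vmware/vpostgres/current/bin/psql}"
    # Extra arguments (for example -c "SELECT 1;") are passed straight through.
    "$psql_bin" -d VCDB -U postgres "$@"
}
```

Called with no arguments, connect_vcdb should drop you at an interactive psql prompt if the assumptions hold.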

To remove the duplicate key I ran the following command and rebooted the appliance, noting that the id and device_key will vary.
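As a hedged sketch of what such a statement could look like, the helper below only prints the SQL rather than running it. The table name vc.vpx_vm_virtual_device is an assumption based on how this error is commonly reported, not something stated above; confirm it against the constraint named in your own log before deleting anything. Take the id and device_key values from your own log entries.

```shell
# Hypothetical helper: print (not execute) a DELETE for one duplicate row.
# Usage: build_delete_sql <id> <device_key> [table]
# The default table name is an assumption; check it against your log output.
build_delete_sql() {
    id="$1"
    key="$2"
    table="${3:-vc.vpx_vm_virtual_device}"
    # Accept only non-empty, purely numeric values to avoid malformed SQL.
    for v in "$id" "$key"; do
        case "$v" in
            ''|*[!0-9]*)
                echo "id and device_key must be numeric" >&2
                return 1
                ;;
        esac
    done
    printf 'DELETE FROM %s WHERE id = %s AND device_key = %s;\n' "$table" "$id" "$key"
}
```

Printing the statement first makes it easy to review before pasting it into the psql prompt, and taking a snapshot of the VCSA beforehand is a sensible precaution.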

Once everything rebooted, all the services started up and I had a functional vCenter again, which was a relief given I was about five minutes away from a restore or a complete rebuild…and ain’t nobody got time for that!


Reference:

https://communities.vmware.com/thread/556490

 

HomeLab – SuperMicro 5028D-TN4T Storage Driver Performance Issues and Fix

Ok, I’ll admit it…I’ve had serious lab withdrawals since having to give up the awesome Zettagrid Labs. Having a lab to tinker with goes hand in hand with being able to generate tech-related content…case in point, my new homelab got delivered on Monday and I have been working to get things set up so that I can deploy my new NestedESXi lab environment.

By way of a quick intro (longer first impression post to follow), I purchased a SuperMicro SYS-5028D-TN4T that I based off this TinkerTry Bundle, which has become a very popular system for vExpert homelabbers. It’s got an Intel Xeon D-1541 CPU and I loaded it up with 128GB of RAM. The system comes with an embedded Lynx Point AHCI Controller that allows up to six SATA devices and is listed on the VMware Compatibility Guide for ESXi 6.5.

The issue that I came across was to do with storage performance and the native driver that comes bundled with ESXi 6.5. With the release of vSphere 6.5 yesterday, the timing was perfect to install ESXi 6.5 and start to build my management VMs. I first noticed some issues when uploading the Windows 2016 ISO to the datastore, with the ISO taking about 30 minutes to upload. From there I created a new VM and installed Windows…this took about two hours to complete, which I knew was far slower than expected…especially with the datastore being a decent class SSD.

I created a new VM and kicked off a new install, but this time I opened ESXTOP to see what was going on, and as you can see from the screen shots below, the Kernel and disk write latencies were off the charts, topping 2000ms and 700-1000ms respectively…In throughput terms I was getting about 10-20MB/s when I should have been getting 400-500MB/s.

ESXTOP was showing the VM with even worse write latency.

I wondered whether I had bought a lemon of a storage controller and checked the Queue Depth of the card. It’s listed with a QD of 31, which isn’t horrible for a homelab, so my attention turned to the driver. Again referencing the VMware Compatibility Guide, the device driver listed for the controller is ahci version 3.0.22vmw.

I searched for the installed device driver modules and found that the one listed above was present; however, there was also a native VMware device driver as well.
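One way to run that kind of search from the ESXi shell is sketched below. It is a minimal sketch that assumes the drivers show up as installed VIBs with ahci in their name; the function name is hypothetical and the exact command used in the post is not shown here.

```shell
# Hypothetical helper: list installed VIBs whose entry mentions "ahci".
# Must be run on the ESXi host, where esxcli is available.
list_ahci_vibs() {
    # -i makes the match case-insensitive.
    esxcli software vib list | grep -i ahci
}
```

If both a legacy and a native AHCI driver are installed, each should appear as its own line in the output.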

I confirmed that the storage controller was using the native VMware driver and went about disabling it as per this VMware KB (thanks to @fbuechsel who pointed me in the right direction in the vExpert Slack Homelab Channel) as shown below.
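A rough sketch of those two steps from the ESXi shell follows. The function names are hypothetical, and vmw_ahci is assumed to be the name of the native AHCI module; it is not stated in the text above, so check it against the driver column on your own host first.

```shell
# Hypothetical helper: print "HBA name, driver" pairs for each storage adapter.
# The first two lines of the esxcli output are a header and a separator.
show_adapter_drivers() {
    esxcli storage core adapter list | awk 'NR > 2 { print $1, $2 }'
}

# Hypothetical helper: disable a kernel module so that another driver can
# claim the controller after the next reboot.
# vmw_ahci is an assumed module name, not taken from the post.
disable_native_driver() {
    module="${1:-vmw_ahci}"
    esxcli system module set --enabled=false --module="$module"
}
```

The change only takes effect after the host is rebooted, and running show_adapter_drivers again afterwards shows which driver has claimed each HBA.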

After the host rebooted I checked to see if the storage controller was using the device driver listed in the compatibility guide. As you can see below, not only was it using that driver, but it was now showing the six HBA ports as opposed to just the one seen in the first snippet above.

I once again created a new VM and installed Windows, and this time the install completed in a little under five minutes! Quite a difference! Upon running CrystalDiskMark I was now getting the expected speeds from the SSDs and things are moving along quite nicely.

Hopefully this post saves some time for anyone else who might buy this or other SuperMicro SuperServers, so they don’t get caught out by poor storage performance caused by the native VMware driver packaged with ESXi 6.5.


References:

http://www.supermicro.com/products/system/midtower/5028/SYS-5028D-TN4T.cfm

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2044993