2020 Edit: It seems this error has been updated in vSphere 7.x to be a little friendlier and less terminal. The old error I referenced below was specific to an unclean shutdown/startup of the appliance; with the appliance being a lot more resilient these days, the 503 error only appears briefly during the initial startup of key services.
We now get… "no healthy upstream".
I’ve just had to fix one of my VCSAs again after hitting the infamous 503 Service Unavailable error that seems to be fairly common with the VCSA, even though it was claimed to be fixed in vCenter 6.5d. I’ve had this error pop up fairly regularly since deploying my homelab’s vCenter Server Appliance as a 6.5 GA instance. For the most part I’ve refrained from rebooting the VCSA in case the error appeared on reboot, and I’ve even kept a snapshot against the VM in case I needed to revert to it on the high chance that it would error out.
503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http20NamedPipeServiceSpecE:0x0000559b1531ef80] _serverNamespace = / action = Allow _pipeName =/var/run/vmware/vpxd-webserver-pipe)
After a Google search for a permanent solution, I came across a couple of posts referencing USB passthrough devices as a trigger for the error, which was plausible given I was using an external USB hard drive. IP changes also seem to be a trigger, though in my case that wasn’t the cause. There is a good Reddit thread here that talks about duplicate keys, again related to USB passthrough. It also links externally to some other solutions that were not relevant to my VCSA.
Solution:
As referenced in this VMware communities forum post, to fix the issue I first had to find out whether I actually had a duplicate key error in the VCSA logs. To do that I dropped into the VCSA shell, went into /var/log, and searched for any file containing device_key + already exists. In my case this returned a number of entries, confirming that duplicate keys were the cause.
The VMware vCenter Server Appliance vpxd 6.5 logs are located in the /var/log/vmware/vmware-vpx folder
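If you want a quick way to run that search, something along these lines from the VCSA shell should do it (the grep invocation is my own, and the exact wording of the log message may differ slightly between builds):

grep -ri "device_key" /var/log/vmware/vmware-vpx/ | grep -i "already exists"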
The next step was to delete the duplicate entries from the embedded Postgres database table. To connect to the embedded Postgres database, run the following command from the VCSA shell:
/opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres
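Before running the DELETE in the next step, I’d suggest checking what is actually there. A query like this at the psql prompt shows the row in question (substitute the id and device_key values reported in your own logs; 27 and 4000 were mine):

SELECT id, device_key FROM vc.vpx_vm_virtual_device WHERE id='27' AND device_key='4000';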
To remove the duplicate key I ran the following command and rebooted the appliance, noting that the id and device_key will vary.
DELETE FROM vc.vpx_vm_virtual_device where id='27' and device_key='4000';
Once everything rebooted, all the services started up and I had a functional vCenter again, which was a relief given I was about five minutes away from a restore or a complete rebuild… and ain’t nobody got time for that!
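If you want to double-check that everything actually came back up after the reboot, the appliance’s service-control utility can list service state from the shell (this check is my own addition, not part of the original fix):

service-control --status --all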
vCenter (VCSA) 6.5 broken after restart, by u/FatalIll in r/vmware
Reference:
https://communities.vmware.com/thread/556490