Tag Archives: VCSA

Quick Fix – Issues Upgrading VCSA due to Password Expiration

It seems like an interesting “condition” has worked itself into recent VCSA builds where, upon completing upgrades, the process seems to reset the root account expiration flag. This blocked me from proceeding with an upgrade, and the upgrade only went ahead once I followed the steps listed below.

The error I got is shown below:

“Appliance (OS) root password is expired or is going to expire soon. Please change the root password before installing an update.”

When this happened on the first vCenter I went to upgrade, I thought there was a chance I had forgotten to set the root password to never expire…but I usually check that setting during initial configuration and set it to never expire. Not the greatest security practice, but for my environments it’s something I do almost automatically. After reaching out on Twitter, I got immediate feedback suggesting I reset the root password from single user mode…which did work.

When this happened a second time on a second VCSA, on which I had without question set the never expires flag to true, I took a slightly different approach to the problem and decided to try resetting the password from the VCSA console, however that process failed as well.

After going back through the Tweet responses, I did come across this VMware KB which lays out the issue and offers the reason behind the errors.

This issue occurs when VAMI is not able to change an expired root password.

Fair enough…but that doesn’t explain why the password never expires option wasn’t honoured. Some feedback and conversations suggest that maybe this is a bug that’s worked its way into recent builds during upgrade procedures. In any case the way to fix it is simple and doesn’t need console access to get to the command line…you just need to SSH into the VCSA and reset the root password as shown below.
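As a rough sketch, the commands look something like this (assuming the appliance shell is the default on login; if you land straight in bash you can skip the first two lines):

  shell.set --enabled true      # enable the bash shell from the appliance shell
  shell                         # drop into bash
  passwd root                   # set a new root password
  chage -M -1 -E -1 root        # no maximum password age, account never expires
  chage -l root                 # confirm that "Password expires" now reports never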

Once done, the VCSA upgrade proceeds as expected. As you can see, we have also confirmed that Password Expires is set to never. If anyone can confirm the behaviour regarding that flag being reset, feel free to comment below.

Apart from that, there is the quick fix!

References:

https://kb.vmware.com/s/article/67414

Workaround – VCSA 6.7 Upgrade Fails with CURL Error: Couldn’t resolve host name

It’s never an issue with DNS! Even when DNS looks right…it’s still DNS! I came across an issue today trying to upgrade a 6.5 VCSA to 6.7. The new VCSA appliance deployment was failing with an OVFTool error suggesting that DNS was incorrectly configured.

Initially I used the FQDNs for the source and target vCenters and let the installer choose the underlying host to deploy the new VCSA appliance to. Even though everything checked out fine in terms of DNS resolution across all systems, I kept on getting the failure. I triple checked name resolution on the machine running the upgrade, both vCenters and the target hosts. I even tried using IP addresses for the source and target vCenter, but the error remained as the installer still tried to connect to the vCenter controlled host via its FQDN, resulting in the error.
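For reference, the checks I mean are just simple forward and reverse lookups from the machine running the installer…the names and IP below are hypothetical placeholders for your own environment:

  nslookup source-vcsa.lab.local       # source vCenter resolves?
  nslookup target-vcenter.lab.local    # target vCenter resolves?
  nslookup target-esxi01.lab.local     # the host the new appliance will be deployed to
  nslookup 192.168.1.50                # reverse lookup of that host's IP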

After doing a quick Google search and finding nothing, I changed the target to be an ESXi host directly and used its IP address instead of its FQDN. This time OVFTool was able to do its thing and deploy the new VCSA appliance.

The one caveat when deploying directly to a host rather than a vCenter is that you need to have the target PortGroup configured as ephemeral…but that’s a general rule of bootstrapping a VCSA in any case, and it’s the only type that will show up in the drop down list.

While very strange given all DNS checked out as per my testing, the workaround did its thing and allowed me to continue with the upgrade. This didn’t find the root cause…however when you need to motor on with an upgrade, a workaround is just as good!

The One Problem with the VCSA

Over the past couple of months I noticed a trend in my top blog daily reporting…the Quick Fix post on fixing a 503 Service Unavailable error was constantly in the top 5 and getting significant views. The 503 error in various forms has been around since the early days of the VCSA and usually manifests itself with the following.

503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http20NamedPipeServiceSpecE:0x0000559b1531ef80] _serverNamespace = / action = Allow _pipeName =/var/run/vmware/vpxd-webserver-pipe)

Looking at the traffic stats for that post it’s clear to see an upward trend in the page views since about the end of June.

This to me is both a good and bad thing. It tells me that more people are deploying or migrating to the VCSA which is what VMware want…but it also tells me that more people are running into this 503 error and looking for ways to fix it online.

The Very Good:

The vCenter Server Appliance is a brilliant initiative from VMware and there has been a huge effort in developing the platform over the past three to four years to get it to a point where it not only became equal to vCenters deployed on Windows (and relying on MSSQL) but surpassed them in a lot of features, especially in the vSphere 6.5 release. Most VMware shops are planning to or have migrated from Windows to the VCSA, and for VMware labs it’s a no brainer for both corporate and homelab instances.

Personally I’ve been running VCSAs in my various labs since the 5.5 release, have deployed key management clusters with the VCSA and more recently have proven that even the most mature Windows vCenter can be upgraded with the excellent migration tool. Being free of Windows and, more importantly, MSSQL is a huge factor in why the VCSA is an important consideration, and the fact you get extra goodies like HA and API UIs adds to its value.

The One Bad:

Everyone who has dealt with storage issues knows that they can lead to Guest OS file system errors. I’ve been involved with shared hosting storage platforms all my career, so I know how fickle filesystems can be when faced with storage latency or loss of connectivity. Reading through the many forums and blog posts around the 503 error there seems to be a common denominator of something going wrong with the underlying storage before a reboot triggers the 503 error. A Google search for VCSA + 503 will bring up the various posts mentioned above.

As you may or may not know, the 6.5 VCSA has twelve VMDKs, up from 2 in the initial release and 11 in the 6.0 release. There are a couple of great posts from William Lam and Mohammed Raffic that go through what each disk partition does. The big advantage in having these separate partitions is that you can manage storage space a lot more granularly.
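If you want to see that layout on a running appliance, a quick look from the bash shell shows how each VMDK maps to its own mount point (mount names below reflect a typical 6.5 deployment and can vary by build):

  df -h                # one /storage/* mount point per VMDK (core, log, db, seat and so on)
  du -sh /storage/*    # quick view of what each of those partitions is holding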

The problem, as mentioned, is that the underlying Linux file system is susceptible to storage issues. No matter what storage platform you are running, you are guaranteed to have issues at one point or another. In my experience Linux filesystems don’t deal well with those issues. Windows file systems seem to tolerate storage issues much better than their Linux counterparts, and without starting a religious war, I do know about the various tweaks that can be done to help make Linux filesystems more resilient to underlying storage issues.

With that in mind, the VCSA is very much susceptible to those same storage issues and I believe a lot of people are running into problems mainly triggered by storage related events. Most of the symptoms of the 503 relate back to key vCenter services being unable to start after a reboot. This usually requires some intervention to fix or a recovery of the VCSA from backup, but hopefully all that’s needed is to run an e2fsck against the filesystem(s) impacted.
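As a rough sketch of that intervention, assuming you can still get to the bash shell (the mount point and device name below are placeholders…use whatever your own logs and df output point at, and for the root filesystem you would need to do this from rescue mode instead):

  service-control --status --all      # which vCenter services failed to come up?
  df -h /storage/log                  # find the device backing the affected mount point
  umount /storage/log                 # take the filesystem offline before checking it
  e2fsck -y /dev/mapper/log_vg-log    # placeholder device name - use the one df reports
  mount /storage/log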

The Solution:

VMware are putting a lot of faith into the VCSA and have done a tremendous job developing it up to this point. It is the only option moving forward for VMware based platforms, however there needs to be a little more work done on the resiliency of the services to protect against external issues that can impact the guest OS. PhotonOS is now the OS of choice from 6.5 onwards, but that will not stop the legacy of susceptibility that comes with Linux based filesystems leading to issues such as the 503 error. If VMware can protect key services in the event of storage issues, that will go a long way to improving that resiliency.

I believe it will get better, and just this week VMware announced a monthly security patch program for the VCSA which shows that they are serious (not to say they were not before) about ensuring the appliance is protected, but I’m sure many would agree that it needs to offer reliability as well…this is the one area where the Windows based vCenter still has an advantage.

With all that said, make sure you are doing everything possible to have the VCSA housed on the most reliable storage possible, and make sure that you are not only backing up the VCSA and its external dependencies correctly but also understand how to restore the appliance, including the inbuilt mechanisms for backing up the config and the Postgres database.

I love and would certainly recommend the VCSA…I just want to love it a little more without having to deal with the possibility of the 503 error lurking around every storage event.

References:

http://www.vmwarearena.com/understanding-vcsa-6-5-vmdk-partitions-mount-points/

http://www.virtuallyghetto.com/2016/11/updates-to-vmdk-partitions-disk-resizing-in-vcsa-6-5.html

https://www.veeam.com/wp-vmware-vcenter-server-appliance-backup-restore.html

https://kb.vmware.com/kb/2091961

https://kb.vmware.com/kb/2147154

migrate2vcsa – Migrating vCenter 6.0 to 6.5 VCSA

Over the past few years I’ve written a couple of articles on upgrading vCenter from 5.5 to 6.0: firstly an in place upgrade of the 5.5 VCSA to 6.0, and then more recently an in place upgrade of a Windows 5.5 vCenter to 6.0. This week I upgraded and migrated my NestedESXi SliemaLab vCenter using the migrate2vcsa tool that’s now bundled into the vCenter 6.5 ISO. The process worked first time, and even though I held some doubts about the migration working without issue, my Windows vCenter is now in retirement.

The migration tool that’s part of vSphere 6.5 was actually first released as a VMware fling after it was put forward as an idea in 2013. It then officially went GA with the release of vSphere 6.0 Update 2m…where the m stood for migration. Over its development it has been championed by William Lam, who has written a number of articles on his blog, and more recently Emad Younis has been the technical marketing lead on the product as it was enhanced for vSphere 6.5.

Upgrade Options:

You basically have two options to upgrade a Windows based 6.0 vCenter: do an in place upgrade to a Windows based 6.5 vCenter, or migrate to the 6.5 VCSA using the migration tool.

My approach for this particular environment was to ensure a smooth upgrade to vSphere 6.0 Update 2 first and then look to upgrade again to 6.5 once it matures in the market. The cautious approach will still be undertaken by many, and a stepped upgrade to 6.5 with a migration to the VCSA will still be commonplace. For those that wish to move away from their Windows vCenter, there is now a very reliable #migrate2vcsa path…as a side note, it is possible to migrate directly from 5.5 to 6.5.

Existing Component Versions:

  • vCenter 6.0 (4541947)
    • NSX Registered
    • vCloud Director Registered
    • vCO Registered
  • ESXi 6.0 (3620759)
  • Windows 2008 (RTM)
  • SQL Server 2008 R2 (10.50.6000.34)

All vCenter components were installed on the Windows vCenter instance, including Update Manager. There were also a number of external services registered against the vCenter, of which the NSX Manager needed to be re-registered with SSO to allow/trust the new SSL certificate thumbprint. This is common, and one to look out for after migration.

Migration Process:

I’m not going to go through the whole process as it’s been blogged about a number of times, but in a nutshell you need to:

  • Take a backup of your existing Windows vCenter
  • I took a snapshot as well before I began the process
  • Download the vCenter Server Appliance 6.5 ISO and mount the ISO
  • Copy the migration-assistant folder to the Windows vCenter
  • Start the migration-assistant tool and work through the pre-checks

If all checks complete successfully the migration assistant will finish at waiting for migration to start. From here you start the VCSA 6.5 installer and click on the Migrate menu option.

Work through the wizard, which asks you for details on the source and target servers and lets you select the compute, storage and appliance size as well as the networking settings. Once everything is entered we are ready to start Stage 1 of the process.

When Stage 1 finishes you are taken to Stage 2, where it asks you to select the migration data as shown below. This will give you some idea as to how much storage you will need and what the initial footprint will be over and above the actual VCSA VM storage.

There are a couple more steps the migration assistant goes through to complete the process…which for me took about 45 minutes, but this will vary depending on the amount of data you want to transfer across.

If there are any issues, or if the migration fails at any of the steps, you do have the option to power down/remove the new VCSA and power the old Windows vCenter back on as is. The old Windows vCenter is shut down by the migration process just as the copying of the key data finishes, and the VCSA is rebooted with the network settings and machine name copied across. There is a proper rollback series of steps listed in this VMware KB.

The only external service that I needed to re-register against vCenter was NSX. vCloud Director carried on without issue, but it’s worth checking out all registered services just in case.

Conclusion and Thoughts:

As mentioned at the start, I was a bit skeptical that this process would work as flawlessly as it did…and on its first time! It’s almost a little disappointing to have this as automated and hands off as it is, but it’s a testament to the engineering effort the team at VMware has put into this tool to make it a very viable and reliable way to remove dependencies on Windows and MSSQL. It also allows those with older versions of Windows that are well past their use-by date the ability to migrate to the VCSA with absolute confidence.

References:

http://www.virtuallyghetto.com/page/2?s=migrate2vcsa

https://github.com/younise/migrate2vcsa-resources

Quick Fix: VCSA 503 Service Unavailable Error

I’ve just had to fix one of my VCSAs again from the infamous 503 Service Unavailable error that seems to be fairly common with the VCSA, even though it was claimed to be fixed in vCenter 6.5d. I’ve had this error pop up fairly regularly since deploying my homelab’s vCenter Server Appliance as a 6.5 GA instance. For the most part I’ve refrained from rebooting the VCSA just in case the error pops up on reboot, and I’ve even kept a snapshot against the VM just in case I needed to revert to it on the high chance that it would error out.

503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http20NamedPipeServiceSpecE:0x0000559b1531ef80] _serverNamespace = / action = Allow _pipeName =/var/run/vmware/vpxd-webserver-pipe)

After doing a Google search for any permanent solutions to the issue, I came across a couple of posts referencing USB passthrough devices that could trigger the error, which was plausible given I was using an external USB hard drive. IP changes also seem to be a trigger for the error, though in my case that wasn’t the cause. There is a good Reddit thread here that talks about duplicate keys…again related to USB passthrough. It also links externally to some other solutions that were not relevant to my VCSA.

Solution:

As referenced in this VMware communities forum post, to fix the issue I had to first find out if I did have a duplicate key error in the VCSA logs. To do that I dropped into the VCSA shell, went into /var/log and did a search for any file containing device_key + already exists. This returned a number of entries confirming that I had duplicate keys and that they were causing the issue.

The VMware vCenter Server Appliance vpxd 6.5 logs are located in the /var/log/vmware/vmware-vpx folder
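The search itself is a simple grep from the bash shell, something along these lines:

  cd /var/log/vmware/vmware-vpx
  grep "already exists" vpxd*.log | grep device_key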

What was required next was to delete the duplicate entries from the embedded Postgres database. To do that you first need to connect to the embedded Postgres database from the VCSA shell.
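Assuming the default install paths for the embedded vPostgres instance on a 6.5 appliance, the connection command looks like this:

  /opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres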

To remove the duplicate key I ran the following command and rebooted the appliance, noting that the id and device_key will vary.
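Treat the statement below purely as an illustrative placeholder…the exact table name comes from the constraint referenced in your own vpxd log entries, and the id and device_key values will differ. It can be typed at the psql prompt opened above, or run as a one-liner like this:

  # placeholder table and values - substitute the ones from your own log entries
  /opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres \
    -c "DELETE FROM vpx_vm_virtual_device WHERE id = 1234 AND device_key = 4000;"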

Once everything rebooted all the services started up and I had a functional vCenter again which was a relief given I was about five minutes away from a restore or a complete rebuild…and ain’t nobody got time for that!


Reference:

https://communities.vmware.com/thread/556490

 

Quick Fix: VCSA MAC Address Conflict Migration Fail + Fix

The old changing of the MAC address causing NIC/IP issues has reared its ugly head again…this time during a planned migration of one of our VCSAs from one vCenter to another. I ran into an issue where the source VM, running on an ESXi 5.5 Update 3a host, had a MAC address that wasn’t compatible with the destination host running ESXi 6.0.

Somewhere along the line during the creation of this VM (and others in this particular cluster) the assigned MAC address conflicted with the reserved MAC ranges from VMware. There is a workaround to this as mentioned in this post, but it was too late for the VCSA, and upon reboot I could see that it couldn’t find eth0 and had created a new eth1 interface that didn’t have any IP config. The result was that all vCenter services struggled to start and eventually timed out, rendering the appliance pretty much dead in the water.

To fix the issue, you firstly need to note down the current MAC address assigned to the VM.

There is an additional eth interface picked up by the appliance that needs to be removed, and an adjustment made to the initial eth0 config…After boot, wait for the services to time out (this can take 20-30 minutes) and then ALT-F1 into the console. Login using the root account and enable the shell.

  • cd /etc/udev/rules.d/
  • Modify 70-persistent-net.rules and change the MAC address to the value recorded for eth0.
  • Comment out or remove the line corresponding to the eth1 interface.
Save and close the file and reboot the appliance (see the sketch after this list).
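As a rough sketch of that edit from the shell (the MAC in the comment is a placeholder for the value you recorded earlier):

  cd /etc/udev/rules.d/
  cp 70-persistent-net.rules 70-persistent-net.rules.bak   # keep a copy before editing
  vi 70-persistent-net.rules
  #  - on the line ending in NAME="eth0", set ATTR{address}=="00:50:56:aa:bb:cc" (your recorded MAC)
  #  - comment out or remove the generated line ending in NAME="eth1"
  reboot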

All things being equal, you should have a working VCSA again.

References:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2012451

Quick Post: Web Client vs VI Client Permissions with VCSA

I’ve been using the VCSA for a couple of years now, since the release of vSphere 5.5, and have been happily using the upgraded 6.0 version for a couple of my environments. As with most people, I found the adjustment going from the VI Client to the new Web Client to be a little rough, and I do still find myself going between the two while performing different tasks and configuration actions.

I caught this tweet from Luis Ayuso overnight, in which he was asking if I had found the answer to a tweet I had put out almost a year ago, meaning my old tweet had surfaced as the top Google hit for the issue.

After Luis’s issues I decided to put together a very quick post outlining, in a basic way, what needs to be configured for like-for-like access in both the Web Client and the VI Client. In this scenario I have a single VM deployment of the 6.0 VCSA with a simple install of the Platform Services Controller, an SSO domain configured, and the VCSA connected and configured against a local Active Directory.

Let’s start by logging in with a user that’s got no permissions set but is a member of the AD domain. As you can see the Web Client will allow the user to log in but show an empty inventory…the VI Client gives you a “You Shall Not Pass!” response.

I then added the user to the AD Group that had been granted Administrator permissions in the VI Client at the top level.

These match what you see from the Web Client

Logging back into the VI Client the user now has full admin rights

However if you log into the Web Client you still get the Empty Inventory message. To get the user the same access in the Web Client as in the VI Client, you need to log into the Web Client using the SSO Admin account, head to Administration -> Users and Groups -> Groups and select the Administrators group in the main window. Under Group Members, search the AD domain for the user account or group and add it to the membership.

Now when you log into the Web Client with the user account you should see the full inventory and have admin access to perform tasks on vCenter Objects.

This may not be a 100% best practice way to achieve the goal, but it works, and you should consider permission structures for vCenter relative to your requirements.

vSphere 6.0 vCenter Server Appliance: Upgrading from 5.x

Like most VMware junkies over the past 24 hours, I’ve downloaded the vSphere 6.0 bits and had them ready and primed to deploy and discover all the new features. Given the nature of a .0 release (notwithstanding change control), upgrading production systems won’t be happening any time soon…however I had an opportunity right away to upgrade the Zettagrid Lab VCSA 5.5 appliance to the 6.0 appliance.

No word on whether this upgrade is possible via the old school method shown above, however the basic run through of the upgrade process is that the new VCSA is deployed and then imports data and settings from the existing VCSA while living on temporary network settings. It then shuts down the old VCSA and assumes its IP address.

Working through the Upgrade Process, you need to download VMware-VCSA-all-6.0.0-2562643.iso and mount the VCSA ISO (Thanks @grantorchard) and from there double click on vcsa-setup.htm.

If you haven’t installed the 6.0 Client Integration Plugin located in the \vcsa folder of the ISO, close down the browser and run that first.

Rerun the vcsa-setup.htm and you should get the Install/Upgrade page…for me in Chrome I have to allow an External Protocol Request.

From there you have the option to Install or Upgrade to VCSA 6.0. In this case Upgrade was chosen, at which point the installer kicks off and gives you some details on which versions of the VCSA you can upgrade from.

Clicking on OK we are now presented with the VCSA Deployment Menu. The first step is to enter a target ESXi Host…not a vCenter like I first did, which gets you the error below!

Moving through the steps, we enter a Virtual Machine Name and then specify the Source Appliance details, which again are all host based.

Set the Appliance Size

Select the Datastore and then move on to the Temporary Network Settings…the explanation of what’s happening can be seen in the description below.

Once all the settings are reviewed, click on Finish and the installer does all the work. For me this took about 20-30 minutes…bearing in mind that this vCenter was not large, so import times will vary. You do have the option to bring across all existing Statistics…checking that option will vary the install time depending on how much data you have.

NOTE: My install progress stalled on Start the vCenter Web Client, but that was most likely due to me not following instructions. When you select your target ESXi Host it states that DRS should be switched to Manual during the install process…my assumption is that the VCSA was brought up on a different host than the one specified and the installer lost track of the VM and wasn’t able to confirm the completion.

Regardless of that, I was able to log into the Web Client and ensure all services were intact and that the upgrade was successful…however I did start to see some strange behaviours and login issues over the weekend. I rolled back and went through the upgrade again, this time ensuring DRS was set to Manual…and had a 100% successful outcome.

I now can enjoy the benefits of the Web Client and start to get a feel for vSphere 6.0.

References:

http://www.vmware.com/files/pdf/vsphere/VMware-vSphere-Platform-Whats-New.pdf

https://www.vmware.com/support/pubs/vsphere-esxi-vcenter-server-6-pubs.html