Tag Archives: Quick Fix

Quick Fix – OS Not Found Deploying Windows Template with Terraform

During the first plan execution of a new VM based on a Windows Server Core VM Template, my Terraform plan timed out on Guest Customizations. The same plan had worked without issue previously with an existing Windows Template, so I was a little confused as to what had gone wrong. When I checked the console of the cloned VM in vSphere, I found that it was stuck at the boot screen, unable to find the Operating System.

Operating System not found – Obviously having issues booting into the templated disk.

After a little digging around, I came across this post, which describes the error as being related to the VM Template being configured with EFI firmware (now the default for vSphere 6.7 VMs). Upon cloning, Terraform deploys the new VM with BIOS firmware, leaving a disk that can’t be booted.

Checking the VM Template, it did in fact have EFI set.

One option was to reconfigure the Template to use BIOS; however, the Terraform vSphere Provider was updated last year to include an option to set the firmware on deployment.

In the instance declaration file we can set the firmware as shown below:

If we have it read that value from a variable, we only have to configure the efi or bios setting once, in the terraform.tfvars file.

In the variables.tf file the variable is declared with a default value of bios.
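The snippets themselves aren’t reproduced on this page, so here is a hedged sketch of the same two pieces. Terraform also accepts configuration written in JSON (.tf.json files), and the Python below emits that form: a variable defaulting to bios, and an instance fragment whose firmware argument reads from it using 0.11-style interpolation. The names vm_firmware and win_vm are hypothetical, and the rest of the vsphere_virtual_machine resource (clone, disks, customization) is deliberately left out.

```python
import json

# Hypothetical variable name; the provider's firmware argument takes "bios" or "efi".
FIRMWARE_VAR = "vm_firmware"
VALID_FIRMWARE = {"bios", "efi"}


def variables_tf_json(default="bios"):
    """Return the variables.tf.json equivalent: the firmware variable with a default."""
    if default not in VALID_FIRMWARE:
        raise ValueError(f"firmware must be one of {sorted(VALID_FIRMWARE)}, got {default!r}")
    return {"variable": {FIRMWARE_VAR: {"default": default}}}


def instance_fragment(name="win_vm"):
    """Return a partial vsphere_virtual_machine resource reading firmware from the variable.

    Only the firmware line is shown; the clone, disk, network and customization
    settings from the real instance file would sit alongside it.
    """
    return {
        "resource": {
            "vsphere_virtual_machine": {
                # Terraform 0.11-style interpolation string
                name: {"firmware": "${var.%s}" % FIRMWARE_VAR}
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(variables_tf_json(), indent=2))
    print(json.dumps(instance_fragment(), indent=2))
```

With that in place, the only per-template change is one line in terraform.tfvars, for example vm_firmware = "efi" for an EFI template like the one above.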

Once this was configured, the plan was able to deploy a VM from the new Windows Template without issue, and Guest Customizations were able to continue.

Terraform Version: 0.11.7

Resources:

https://github.com/terraform-providers/terraform-provider-vsphere/issues/441

https://github.com/terraform-providers/terraform-provider-vsphere/pull/485

https://www.terraform.io/docs/providers/vsphere/r/virtual_machine.html#firmware

Quick Fix – ESXi loses all Network Configuration… but still runs?

I had a really strange situation pop up in one of my lab environments over the weekend. vSAN Health was reporting that one of the hosts had lost network connectivity to the rest of the cluster. This is something I’ve seen intermittently, so I waited for the condition to clear. When it didn’t, I went to put the host into maintenance mode but found that I wasn’t getting the expected vSAN options.

I have recently seen situations where the vSAN option on a VMkernel interface had been cleared, leaving vCenter to think there were networking issues, so I thought maybe it was this again. Not that that situation in itself was normal, but what I found when I went to view the state of the VMkernel adapters from the vSphere Web Client was even stranger.

No adapters listed!

The host wasn’t reported as being disconnected, and there was still connectivity to it via the Web UI and SSH. To make sure this wasn’t just a display error in the Web Client, I SSH’ed into the host and ran esxcli to get a list of the VMkernel interfaces.

Unable to find vmknic for dvsID: xxxxx

So from the CLI I couldn’t get a list of interfaces either. I tried restarting the core services without luck, and I still had a host that was up and running VMs without issue, yet reporting networking issues and showing no network interfaces in its running state.
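If you want to script this check rather than eyeball it, here is a minimal Python sketch that parses the text output of esxcli network ip interface list and flags a host that reports no VMkernel interfaces. It assumes the usual layout, where each interface name (vmk0, vmk1, ...) sits on its own unindented line followed by indented Key: Value fields; an error message such as the one above simply yields no interfaces. The function names are illustrative only.

```python
import re

# Assumed layout: an unindented "vmkN" line, then indented "Key: Value" lines.
HEADER = re.compile(r"^(vmk\d+)\s*$")
FIELD = re.compile(r"^\s+([^:]+):\s*(.*)$")


def parse_vmk_list(output):
    """Parse `esxcli network ip interface list` text into {name: {field: value}}."""
    interfaces = {}
    current = None
    for line in output.splitlines():
        header = HEADER.match(line)
        if header:
            current = interfaces.setdefault(header.group(1), {})
            continue
        field = FIELD.match(line)
        if field and current is not None:
            current[field.group(1).strip()] = field.group(2).strip()
    return interfaces


def check_host(output):
    """Return the interfaces found, or a warning when none could be parsed."""
    interfaces = parse_vmk_list(output)
    if not interfaces:
        return "WARNING: no VMkernel interfaces found"
    return "OK: " + ", ".join(sorted(interfaces))
```

Feed it the saved output of the command (or whatever error text it prints instead); a healthy host returns its vmk list, while the state described here returns the warning.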

Going to the console… the situation was not much better.

Nothing… no network or host information at all 🙂

Not being able to reset the management network, my only option from here was to reboot the server. The host did come back online after the reboot; however, the console now reported its networking as 0.0.0.0/0, and the host was completely offline.

I decided to reboot using the last known good configuration, as shown below:

Upon rebooting with the last known good configuration, all previous network settings were restored, and the VMkernel interfaces were listed again in both the Web Client and the CLI.

Because of the “dirty” vSAN reboot, and as is usual with anything that disrupts vSAN, the cluster needed some time to get itself back into working order. While some VMs were in an orphaned or unavailable state after the reboot, once the vSAN resync had completed all VMs were back up and operational.

Cause and Official Resolution:

The workaround to bring back the host networking seemed to do the trick; however, I don’t know the root cause of the host losing all of its network config. I have an active case going with VMware Support at the moment, with the logs being analysed. I’ll update this post with the results when they come through.

ESXi Version: 6.7.0.13006603
vSphere Version: 6.7.0.30000
NSX-v: 6.4.4.11197766

Quick Fix – Incompatible Veeam Backup for Office 365 Server Version

This week Veeam dropped version 3.0 of Backup for Microsoft Office 365, another significant update to the SaaS backup platform that builds on the previous 2.0 and 1.5 releases. For a quick look at some of the highlights, head to fellow Technologist Niels Engelen’s blog post. Like many out there, I’ve been waiting patiently to install the GA, and I got things updated without any issues. However, when looking to browse existing backup points for my Office 365 mailboxes, I came across this error.

Incompatible Veeam Backup for Office 365 Server Version (received: 9.6.5.422, expected: 9.6.4.1078).

The error appears after Veeam Explorer for Microsoft Exchange has loaded and tries to connect to the VBO server. It is a little misleading, in that it’s actually complaining about the version of the Explorer rather than the VBO server itself.
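To make the direction of the mismatch explicit, here is a small Python sketch that compares the two build numbers from the message. It assumes, as the fix described here suggests, that "received" is the VBO server’s build and "expected" is the build the installed Explorer was made for. Note the numeric comparison: comparing dotted versions as plain strings breaks as soon as one component gains a digit (the string "9.6.10" sorts before "9.6.9").

```python
def parse_version(text):
    """Turn a dotted build string such as "9.6.5.422" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def explain_mismatch(received, expected):
    """Say which side is behind, assuming received = server build, expected = Explorer's."""
    r, e = parse_version(received), parse_version(expected)
    if r == e:
        return "versions match"
    if r > e:
        return "server is newer: update the Explorer"
    return "Explorer is newer: update the server"


print(explain_mismatch("9.6.5.422", "9.6.4.1078"))
```

For the error above this reports that the server is newer, which matches the fix of installing the new Explorers.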

If you look inside the downloaded VBO v3 zip file, you will see three installers.

The simple fix is to install the new versions of the Explorers. The dead giveaway is the new splash screen, as seen below.

Once that’s done, relaunching the Explorer session will succeed, and you will be able to see the backed-up mailboxes listed.

So there you go… a really simple fix to an error that might stump a few people at first!

Quick Fix – Backing up vCenter Content Library Content with Veeam

A question came up in the Veeam Forums this week about how you would back up the contents of a Content Library. As a refresher, content libraries are container objects for VM templates, vApp templates, and other types of files. Administrators can use the templates in the library to deploy virtual machines and vApps via vCenter. Using content libraries brings consistency, compliance, efficiency, and automation to deploying workloads at scale.

Content Libraries are created and managed from a single vCenter, but can be shared to other vCenter Server instances. VM templates and vApp templates are stored in the content library in OVF format. You can also upload other file types, such as ISO images, text files, and so on, to a content library. It’s also possible to create content libraries hosted by third parties, such as William Lam’s example here of creating and managing an AWS S3 based content library.

For libraries stored locally on an ESXi datastore, there is a way to back up the contents of the content library with a Veeam Backup & Replication File Copy job. It’s a basic answer to the question posed in the Veeam Forums, but it does work. With File Copy, you can choose any file or folder contained in any infrastructure connected to Backup & Replication. For a Content Library stored on an ESXi datastore, you just need to browse to its location as shown below.

The one caveat is that the destination can’t be a Veeam Repository. There is also no versioning or incremental copy, so every time the job runs a full copy of the files is performed.

One way to work around this is to set the destination to a location that is itself being backed up by a Veeam Job or an Agent Job. However, if the intention is just to protect the current contents of the library, then having a full one-off backup shouldn’t be an issue.
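To make the no-versioning behaviour concrete, here is a minimal Python sketch of the same idea done by hand: every run copies the whole library folder again, into a new timestamped folder, so keeping several points in time costs a full copy each. This is not how Veeam implements File Copy; the folder naming scheme is hypothetical, and the UID-named subfolders vCenter creates are simply copied as-is.

```python
import shutil
from datetime import datetime
from pathlib import Path


def full_copy(library_dir, dest_root, now=None):
    """Copy the entire library folder into a new timestamped subfolder of dest_root.

    Every call copies every file again: there is no incremental or deduplicated
    transfer, mirroring the File Copy behaviour described above.
    """
    now = now or datetime.now()
    target = Path(dest_root) / now.strftime("library-%Y%m%d-%H%M%S")
    # copytree creates target (and missing parents) and fails if it already exists
    shutil.copytree(library_dir, target)
    return target
```

Pointing dest_root at a location that another backup job protects gives you the versioning that the copy itself lacks.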

You can also create/add to a File Copy job from the Files view as shown above.

In terms of recovery, the File Copy job is doing a basic file copy and doesn’t know that the files are part of a Content Library, and as you can see, the folder structure that vCenter creates uses UIDs for identification. Because of this, if a whole Content Library was lost, it would have to be recreated in vCenter and the items then imported back in directly from the File Copy job’s destination folder.

Again, this is a quick and nasty solution, and it would be a nice feature addition to have content libraries backed up natively with their naming and structure in place. For the moment, this is a great way of using a cool feature of Veeam Backup & Replication to achieve the goal.

Quick Tip: Let’s Encrypt ACME Powershell Ownership Challenge Can’t see Challenge Data

I’m currently going through the process of acquiring a new Let’s Encrypt free SSL certificate for a new domain I registered. For a great overview of what Let’s Encrypt is and what it can do for you, head over to Luca Dell’Oca’s blog here. I was following Luca’s instructions for getting the new domain authorised for use with the Let’s Encrypt service via a DNS challenge when I ran into the following.

After running the PowerShell command to generate the challenge, it was not returning the Handler Message in the direct output as expected… or at least not obviously.

After scratching my head for a bit, I checked whether the data was contained within the object returned by the PowerShell command.

From here I was able to create the DNS TXT entry and complete the challenge.
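For reference, the values in that challenge data follow the ACME dns-01 rules (standardised in RFC 8555): the TXT record lives at _acme-challenge.<your domain>, and its value is the unpadded base64url encoding of the SHA-256 hash of the key authorization string. The PowerShell module computes all of this for you; the Python sketch below only shows what the two values are, and the key authorization in the example is a placeholder rather than a real token.

```python
import base64
import hashlib


def b64url(data):
    """Base64url-encode bytes without '=' padding, as ACME requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_record(domain, key_authorization):
    """Return the (name, value) pair for the dns-01 TXT record."""
    name = "_acme-challenge." + domain.rstrip(".")
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return name, b64url(digest)


# Placeholder key authorization ("<token>.<account key thumbprint>" in real use)
print(dns01_record("example.com", "token123.thumbprint456"))
```

A SHA-256 digest always encodes to 43 characters here, which is a quick sanity check on the value you paste into DNS.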

In case it isn’t obvious to you either, this very quick post should save you a bit of time.

Quick Fix: vSAN Health Reports iSCSI Target Service Stopped

A few weeks ago I wrote about using iSCSI as a backup repository target. While still running this POC in my environment I came across an error in the vSAN Health Checker stating the vSAN iSCSI target service was in a Failed state. Drilling down into the vSAN Health check tree I could see a Service Runtime status of stopped as shown below against the host.

This host had recently been marked as unreachable in vCenter and required a Management Agent reset to bring it back online. There is a chance that process stopped the iSCSI Target service but did not restart it. In any case, there is an easy way to see the status of the services and then get them back online.
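The original commands aren’t reproduced here, so treat the following as a hedged sketch of the check-then-start pattern using the init scripts under /etc/init.d on the host. Two assumptions to verify against the KB referenced below: that the vSAN iSCSI target daemon is named vitd, and that its status call exits with 0 only while the service is running. It also assumes a Python interpreter is available in the ESXi shell; the command runner is injectable so the logic can be exercised elsewhere.

```python
import subprocess

# Assumption: the vSAN iSCSI target daemon's init script is /etc/init.d/vitd.
# Confirm the service name against the VMware KB before relying on it.
SERVICE = "vitd"


def init_script(service, action):
    """Build the argv for an ESXi init script call, e.g. ['/etc/init.d/vitd', 'status']."""
    return ["/etc/init.d/" + service, action]


def ensure_running(service=SERVICE, run=subprocess.run):
    """Query the service and start it if the status call says it is not running.

    Assumes the usual init-script convention that `status` exits with 0 only
    while the service is running; any other exit code is treated as stopped.
    """
    status = run(init_script(service, "status"),
                 stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if status.returncode == 0:
        return "already running"
    # check=True raises CalledProcessError if the start call itself fails
    run(init_script(service, "start"), check=True)
    return "started"
```

After it reports "started", re-running the vSAN Health check is still the real confirmation.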

Once that’s been done, a re-run of the vSAN Health checker will show that the issue has been resolved and the iSCSI Target Service on the host is now running.

References:

https://kb.vmware.com/s/article/2147603

 

vCloud Director 9.0: Manual Quick fix for VXLAN Network Pool Error

vCloud Director 9.0, released last week, has a bunch of new enhancements, and a lot of those are focused on its integration with NSX. Tom Fojta has a what’s new page on the go explaining many of the new features. One of his first posts just after GA covered the new ability to manually create VXLAN backed Network Pools.

VXLAN Network Pool is recommended to be used as it scales the best. Until version 9, vCloud Director would create a new VXLAN Network Pool automatically for each Provider VDC backed by an NSX Transport Zone (again created automatically) scoped to the clusters that belong to that Provider VDC. This would create multiple VXLAN network pools and potentially confusion over which to use for a particular Org VDC.

In vCloud Director 9.0 we now have the option of creating a VXLAN backed network pool manually instead of one being created when setting up a Provider vDC. In many of my environments, for one reason or another, the automatic creation of the VXLAN network pool together with NSX would fail. In fact, my current NestedESXi SliemaLabs vCD instance shows the following error:

There is a similar but less serious error that can be fixed by changing the replication mode from within the NSX Web Client, as detailed here by Luca; however, as in my lab, I’ve known a few people to run into the more serious error shown above. You can’t delete the pool, and a repair operation will continue to error out. Now in vCD 9.0 we can create a new VXLAN Network Pool from the Transport Zones created in NSX.

Once that’s been done you will have the newly created VXLAN Network Pool that’s truly more global and tied to best practice for NSX Transport Zones and one that can be used with the desired replication mode. The old one will remain, but you can now configure Org vDCs to consume the VXLAN backed network pool over the traditional VLAN backed pool.

References:

vCloud Director 9: What’s New

vCloud Director 9: Create VXLAN Network Pool

ESXi 6.5 Storage Performance Issues Resolved in Update 1

I originally came across the issue of slow storage performance with the native vmw_ahci driver that comes bundled with ESXi 6.5 just as I was first playing with my SuperMicro SYS-5028D-TN4T in my homelab. After I published a couple of posts about the workaround, the issue became quite prevalent in the community, and the post continues to get decent traffic, meaning that the issue impacted quite a few people out there.

The good news is that with the release of vSphere 6.5 Update 1 there is a fix for the problem in the form of updated drivers for the AHCI module. William Lam has been quick to blog about the fix and if you had previously disabled the driver you will need to re-enable it.
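Re-enabling the native driver comes down to a single esxcli call, esxcli system module set --enabled=true --module=vmw_ahci, followed by a host reboot for the change to take effect (the same command with --enabled=false is the workaround being reversed). The Python below is just a thin, hedged wrapper around that call for anyone scripting it; the function names are illustrative.

```python
import subprocess


def ahci_module_cmd(enabled):
    """Build the esxcli call that enables or disables the native vmw_ahci driver."""
    return [
        "esxcli", "system", "module", "set",
        "--enabled=" + ("true" if enabled else "false"),
        "--module=vmw_ahci",
    ]


def reenable_vmw_ahci(run=subprocess.run):
    """Re-enable vmw_ahci after patching to 6.5 U1; the change only applies after a reboot."""
    run(ahci_module_cmd(True), check=True)
    return "vmw_ahci enabled; reboot the host to load it"
```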

This VMware KB covers the specific patch listed in the release notes:

There’s no confirmation as yet that it actually does the trick, but the release notes look promising, and the expectation is that it will resolve the issue so that homelabbers and people using the driver in production systems can rest easy.

References:

https://docs.vmware.com/en/VMware-vSphere/6.5/rn/vsphere-esxi-651-release-notes.html

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2149910

http://www.virtuallyghetto.com/2017/07/ahci-vmw_ahci-performance-issue-resolved-in-esxi-6-5-update-1.html

Quick Fix – Unable to Upgrade Distributed Switch After vCenter Upgrade

This week I upgraded (and migrated) my SliemaLabs NestedESXi vCenter from a Windows 6.0 server to a 6.5 VCSA. Everything went well, but I ran into an issue when I went to upgrade my distributed switch to 6.5.0. Even though everything appeared to be working with regards to the host and VM networking associated with the switch, when I went to upgrade it I got the following error:

A quick Google for “Unable to retrieve data about the distributed switch” came up with nothing, and clicking Next didn’t do anything. A restart of the Web Client and a reboot of the VCSA didn’t resolve the issue either. The distributed switch in question was still on version 5.5, as I forgot to upgrade it to 6.0 during the upgrade to vCenter 6.0. Whether that condition somehow caused the error I’m not sure… regardless, the quick fix, or better said workaround, is pretty simple: use PowerCLI.

Interestingly, the Vendor is different… though I’m not sure this caused the issue. In any case, the workaround is to upgrade the distributed switch using the Set-VDSwitch cmdlet.
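If you have several switches to bring up to date, a small helper can build the PowerCLI pipeline for each one that is still below the target version. This Python sketch assumes you paste or run the generated lines in a PowerCLI session already connected with Connect-VIServer; the switch names in the example are hypothetical. It compares versions numerically and quotes names for PowerShell by doubling any embedded single quotes.

```python
def version_tuple(text):
    """Turn "5.5.0" into (5, 5, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def ps_quote(value):
    """Wrap a string in PowerShell single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def upgrade_commands(switches, target="6.5.0"):
    """Return a Get-VDSwitch | Set-VDSwitch line for each switch below the target version.

    `switches` is a list of (name, current_version) pairs.
    """
    return [
        "Get-VDSwitch -Name {} | Set-VDSwitch -Version {}".format(ps_quote(name), ps_quote(target))
        for name, current in switches
        if version_tuple(current) < version_tuple(target)
    ]


if __name__ == "__main__":
    # Hypothetical switches: only the 5.5.0 one needs upgrading.
    for line in upgrade_commands([("Lab-DVS", "5.5.0"), ("Edge-DVS", "6.5.0")]):
        print(line)
```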

And success!

I’m not sure what caused the error to appear in the Web Client, but the workaround made it a moot point. Suffice to say, if you come across this error in your Web Client when trying to upgrade a distributed switch… head over to PowerCLI.