Last week VMware released a new patch (ESXi 6.0 Build 5572656) that addresses a number of serious bugs with snapshot operations. Usually I wouldn’t blog about a patch release, but when I looked through the rest of the fixes in the VMware KB it was apparent that this is more than your average VMware patch. It addresses a number of storage issues and, above all, a lot of problems with snapshot operations, which are critical to most VM backup operations.

Here are some of the key resolutions that I’ve picked out from the patch release:

  • When you take a snapshot of a virtual machine, the virtual machine might become unresponsive
  • After you create a virtual machine snapshot in SEsparse format, you might hit a rare race condition if there are significant but varying write IOPS to the snapshot. This race condition might make the ESXi host stop responding
  • Because of a memory leak, the hostd process might crash with the following error: Memory exceeds hard limit. Panic. The hostd logs report numerous errors such as Unable to build Durable Name. This kind of memory leak causes the host to get disconnected from vCenter Server
  • Using SEsparse for both creating snapshots and cloning virtual machines might cause a corrupted guest OS file system
  • During snapshot consolidation a precise calculation might be performed to determine the storage space required to perform the consolidation. This precise calculation can cause the virtual machine to stop responding, because it takes a long time to complete
  • Virtual machines with SEsparse-based snapshots might stop responding during I/O operations with a specific type of I/O workload in multiple threads
  • When you reboot the ESXi host under the following conditions, the host might fail with a purple diagnostic screen and a PCPU xxx: no heartbeat error:
    • You use the vSphere Network Appliance (DVFilter) in an NSX environment
    • You migrate a virtual machine with vMotion under DVFilter control
  • A Windows 2012 domain controller supports SMBv2, whereas the Likewise stack on ESXi supported only SMBv1. With this release, the Likewise stack on ESXi is enabled to support SMBv2
  • When unmap commands fail, the ESXi host might stop responding due to a memory leak in the failure path. You might see the following error message in the vmkernel.log file: FSDisk: 300: Issue of delete blocks failed [sync:0], after which the host becomes unresponsive
  • If you use SEsparse and enable unmap operations when creating snapshots and clones of virtual machines, the file system of the guest OS might be corrupted after the wipe operation (the storage unmapping) completes. A full clone of the virtual machine works correctly
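
Several of the fixes above quote specific log messages, so one quick way to gauge whether a host has been hitting these bugs is to count those messages in its logs. The following is only a minimal sketch in Python: the `SIGNATURES` table and the `count_signatures` helper are names I made up for illustration, the matching is a plain substring search, and it assumes you have already read the lines from files such as vmkernel.log or the hostd log.

```python
# Message fragments quoted in the fix list above (hypothetical labels on the left).
SIGNATURES = {
    "hostd hard limit": "Memory exceeds hard limit",
    "hostd durable name": "Unable to build Durable Name",
    "unmap failure": "Issue of delete blocks failed",
    "pcpu heartbeat": "no heartbeat",
}


def count_signatures(lines):
    """Count how many lines contain each known message fragment."""
    counts = {label: 0 for label in SIGNATURES}
    for line in lines:
        for label, fragment in SIGNATURES.items():
            if fragment in line:
                counts[label] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines; in practice, pass the lines of a real log file.
    sample = [
        "FSDisk: 300: Issue of delete blocks failed [sync:0]",
        "hostd: Unable to build Durable Name",
        "some unrelated message",
    ]
    for label, hits in count_signatures(sample).items():
        print(f"{label}: {hits}")
```

A non-zero count is only a hint that a host may be affected, not proof, since the same text could show up for other reasons.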

There are also a number of vSAN-related fixes in the patch, so overall it’s worth applying this patch as soon as possible.
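
If you want to check which hosts still need the patch, comparing build numbers is a reasonable first pass. Here is a rough sketch in Python, assuming you have collected a version string per host in the form printed by `vmware -v` (for example `VMware ESXi 6.0.0 build-5572656`). The helper names `parse_build` and `is_patched` are mine, and the comparison is only meaningful for hosts on the ESXi 6.0 line.

```python
import re

# Build number of the patch discussed in this post.
PATCHED_BUILD = 5572656


def parse_build(version_string):
    """Extract the build number from text like 'VMware ESXi 6.0.0 build-5572656'."""
    match = re.search(r"build-(\d+)", version_string)
    if match is None:
        raise ValueError(f"no build number found in {version_string!r}")
    return int(match.group(1))


def is_patched(version_string, minimum=PATCHED_BUILD):
    """Return True if the host's build is at or above the patched build.

    Assumption: the host runs ESXi 6.0; build numbers from other
    release lines are not comparable this way.
    """
    return parse_build(version_string) >= minimum


if __name__ == "__main__":
    print(is_patched("VMware ESXi 6.0.0 build-5572656"))  # True
    print(is_patched("VMware ESXi 6.0.0 build-3620759"))  # False
```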

References:

https://kb.vmware.com/kb/2149955