headdesk
[UPDATE] – VMware have released an official KB for the CBT issue.

Sadly, if you recognize the title of this post, it’s because this isn’t the first time I’ve felt compelled to write about the continued industry frustration with repeat ESXi bugs. In February I wrote about the recent history of bugs slipping through VMware QA. Four months later, another CBT bug has slipped through the net…which reaffirms the core message of that post, where I said:

There are a number of competing vendors (and industry watchers) waiting to capitalize on any weakness shown in the VMware stack, and with the number of QA issues leading to significant bugs not abating, I wonder how much longer VMware can afford to continue to slip up before it genuinely hurts its standing.

The one area of absolute concern is the number of Changed Block Tracking (CBT) bugs that seem to slip into new builds of ESXi. This time it’s Express Patch 6 for ESXi 6.0 (Build 3825889) that contains an apparently new symptom of our old friend, the CBT bug. The patch itself is a fairly critical one for those running VSAN and VMXNET3 NICs, as it addresses some core issues around them, but if you use quiesced snapshots during VM backups you may run into issues with CBT. The vmware.log of a VM being backed up will contain:

vcpu-0| xxxx: SNAPSHOT:SnapshotBranchDisk: Failed to acquire current epoch for disk /vmfs/volumes/
vmdk : Change tracking is not active for this disk xxx.

For a detailed explanation of the issue go to: http://www.running-system.com/take-care-express-patch-6-esxi-6-can-break-backup-cbt-bug/ 
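If you want to check whether your own VMs are hitting this, one option is to search copies of their vmware.log files (each VM's log lives in its folder on the datastore) for the two messages above. The following is a minimal sketch, assuming the message text matches the excerpt quoted in this post; the function names and the script name are hypothetical, and the exact wording may differ between builds.

```python
import sys
from pathlib import Path
from typing import Iterable, List, Tuple

# Fragments taken from the vmware.log excerpt above (assumed stable across builds).
CBT_SIGNATURES = (
    "SnapshotBranchDisk: Failed to acquire current epoch",
    "Change tracking is not active for this disk",
)


def find_cbt_errors(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs containing a known CBT error fragment."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(sig in line for sig in CBT_SIGNATURES):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log_file(path: Path) -> List[Tuple[int, str]]:
    # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan.
    with path.open(encoding="utf-8", errors="replace") as handle:
        return find_cbt_errors(handle)


if __name__ == "__main__":
    # Usage (hypothetical script name): python find_cbt_errors.py vm1/vmware.log vm2/vmware.log
    for arg in sys.argv[1:]:
        for number, line in scan_log_file(Path(arg)):
            print(f"{arg}:{number}: {line}")
```

Plain substring matching is deliberate here: the log prefix (vCPU, timestamp, disk path) varies per VM, while the message text is the part worth keying on.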

[UPDATE] The official VMware KB (see References) currently states:

VMware Support is aware of this issue and is currently working on it.
This KB article will be updated once the fix for this issue is released.

To work around this issue, apply one of the options listed in the KB.

Again, as a Service Provider I find the CBT bugs the most worrying, because they fundamentally threaten the integrity of backup data. That is not something IT Operations staff, or the end users whose data is put at risk, should have to worry about, and most backup vendors use CBT to make backups more efficient. In this case, specifically if you use Veeam, the lack of CBT will extend backup windows (without it, the backup has to read the entire disk rather than just the changed blocks) and increase the chances of VMs not being backed up as expected.
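To show why losing CBT hurts backup windows, here is a toy model, not VMware's implementation: a "disk" is a list of blocks, and when tracking is on, every write records the block index. In vSphere, backup tools instead ask the host which areas changed since a previous change ID (the QueryChangedDiskAreas call in the vSphere API); the class, function and block size below are hypothetical and exist only for illustration.

```python
from typing import List, Optional, Set, Tuple

BLOCK_SIZE = 4  # hypothetical, tiny block size for the toy model


class ToyDisk:
    """A toy virtual disk: fixed-size blocks plus an optional change tracker."""

    def __init__(self, num_blocks: int, cbt_enabled: bool = True) -> None:
        self.blocks: List[bytes] = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        self.cbt_enabled = cbt_enabled
        self.changed: Set[int] = set()

    def write(self, index: int, data: bytes) -> None:
        self.blocks[index] = data
        if self.cbt_enabled:
            self.changed.add(index)  # remember the block changed since the last backup


def backup(disk: ToyDisk, previous: Optional[List[bytes]] = None) -> Tuple[List[bytes], int]:
    """Return (restore_point, blocks_read) for a full or incremental backup."""
    if previous is None or not disk.cbt_enabled:
        # First backup, or no change tracking: every block must be read.
        to_read = list(range(len(disk.blocks)))
    else:
        # With CBT, only blocks written since the last backup are read.
        to_read = sorted(disk.changed)
    restore_point = list(previous) if previous is not None else [b""] * len(disk.blocks)
    for i in to_read:
        restore_point[i] = disk.blocks[i]
    disk.changed.clear()
    return restore_point, len(to_read)
```

For example, after a first full backup of a 1000-block disk, writing two blocks means the next backup reads 2 blocks with tracking enabled but all 1000 without it. That gap is what turns into longer backup windows on real multi-terabyte datastores.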

VMware need to continue to nail ESXi (and vCenter) while keeping focus on the new products. VSAN, NSX and everything else VMware offers runs on or depends on ESXi, and though hypervisors are not as front of mind anymore, everything VMware does relies on ESXi. VMware partners who create products to work with ESXi need it to be stable…especially around backups. Everyone needs to back up with absolute confidence…and the more these CBT bugs appear, the less confident pundits become. I already hear of people not wanting to go to ESXi 6.0 because of issues such as this latest one.

That’s not a good place for VMware to be.

Note: I had sat on this post since Friday, but this morning I read Anton’s Veeam Community Forums Digest, in which he lamented the lack of QC and the repeat issues. He suggests that this is the new normal…and that maybe the thing to do is wait and hope for vSphere 6.5…not a good situation. However, like me, he also believes that this can be fixed…but it needs to happen before the next release.

References:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2144685