We are currently in the process of upgrading all of our vCenter Clusters from ESXi 5.1 to 5.5 Update 2 and have come across a bug whereby the vMotion of VMs from the 5.1 hosts to the 5.5 hosts fails at 14% with the following error:
[UPDATE 6/10] –
Observed Conditions:
- vCenter 5.5 U1/2 (U2 resulted in fewer 14% stalls, but they still occur)
- Mixed Cluster of ESXi 5.1 and 5.5 Hosts (U1 or U2)
- Has also been observed in a fully upgraded 5.5 U2 Cluster
- VMs have various vCPU and vRAM configurations
- VMs have vRAM Reservations and Unlimited vRAM/vCPU
- VMs are vCD Managed
Observed Workarounds:
- Restart Management Agents on vMotion Destination Host (hit and miss)
- vMotion VM to 5.1 Host if available
- Remove vRAM Reservation and Change to Unlimited vCPU/vRAM
- Stop and start VM on different host (not ideal)
We are running vCenter 5.5 Update 1 with a number of Clusters that were on ESXi 5.1, some of which act as Provider vDCs for vCloud Director. Upgrading the Clusters that are not vCloud Providers (meaning their VMs aren’t vCD managed and don’t have vCD reservations applied) didn’t trigger the issue, and we were able to upgrade all of those hosts to ESXi 5.5 Update 2 without problems.
There seemed to be no specific setting or configuration common to the VMs that ultimately got stuck during a vMotion from a 5.1 to a 5.5 host; however, they all have memory reservations of various sizes based on our vCloud Allocation Pool settings.
Looking through the hostd logs on the 5.5 Host acting as the destination for the vMotion, we see the following entry:
2014-10-01T07:09:12.783Z [7CF81B70 info 'Solo.Vmomi' opID=EE34B3DD-0001CB0E-11-96-32 user=vpxuser] Result:
--> (vim.fault.Timedout) {
--> dynamicType = <unset>,
--> faultCause = (vmodl.MethodFault) null,
--> faultMessage = (vmodl.LocalizableMessage) [
--> (vmodl.LocalizableMessage) {
--> dynamicType = <unset>,
--> key = "vob.sched.group.mem.admitfailed",
--> arg = (vmodl.KeyAnyValue) [
--> (vmodl.KeyAnyValue) {
--> dynamicType = <unset>,
--> key = "1",
--> value = "vm.334204",
--> },
... ...
--> ],
--> message = "Group vm.334204: Cannot admit VM: Memory admission check failed. Requested reservation: 312526 pages
--> ",
--> },
... ...
--> ],
--> message = "Group vm.334204: Invalid memory allocation parameters for virtual machine vmm0:TWC_Web_2_(a7109c43-8041-4b74-91f8-af94a809203d). (min: 298752, max: 4294967295, minLimit: 4294967295, shares: 4294967293, units: pages)
--> ",
--> },
--> (vmodl.LocalizableMessage) {
--> dynamicType = <unset>,
--> key = "msg.vmmonVMK.admitFailed",
--> arg = (vmodl.KeyAnyValue) [
--> (vmodl.KeyAnyValue) {
--> dynamicType = <unset>,
--> key = "1",
--> value = "msg.vmk.status.VMK_MEM_ADMIT_FAILED",
--> },
--> (vmodl.KeyAnyValue) {
--> dynamicType = <unset>,
--> key = "2",
--> value = "VMware ESX",
--> }
--> ],
--> message = "Could not power on VM : Admission check failed for memory resource
--> See the VMware ESX Resource Management Guide for information on resource management settings.
--> ",
--> },
--> (vmodl.LocalizableMessage) {
--> dynamicType = <unset>,
--> key = "msg.monitorLoop.createVMFailed.vmk",
--> message = "Failed to power on VM.",
--> },
--> (vmodl.LocalizableMessage) {
--> dynamicType = <unset>,
--> key = "msg.migrate.resume.fail",
--> message = "The VM failed to resume on the destination during early power on.
--> ",
--> }
--> ],
--> msg = ""
--> }
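The log reports memory in pages, which makes the numbers hard to relate to the reservations we set in vCloud Director. Here is a rough Python sketch that converts them to MiB. It assumes 4 KiB pages (the log itself does not state the page size), and the helper names `pages_to_mib` and `parse_alloc_params` are made up for this illustration. Reading 4294967295 (2^32 - 1) as a "no limit" marker is also our interpretation, not something the log says.

```python
import re

PAGE_SIZE_BYTES = 4096          # assumption: 4 KiB pages
UNLIMITED_SENTINEL = 2**32 - 1  # 4294967295, the max/minLimit value in the log


def pages_to_mib(pages: int) -> float:
    """Convert a page count to MiB under the 4 KiB page assumption."""
    return pages * PAGE_SIZE_BYTES / (1024 * 1024)


def parse_alloc_params(params: str) -> dict:
    """Extract the numeric 'name: value' pairs from the parameter list."""
    return {name: int(value) for name, value in re.findall(r"(\w+): (\d+)", params)}


# Parameter list copied from the hostd entry above.
params = parse_alloc_params(
    "(min: 298752, max: 4294967295, minLimit: 4294967295, "
    "shares: 4294967293, units: pages)"
)

requested_mib = pages_to_mib(312526)     # "Requested reservation: 312526 pages"
min_mib = pages_to_mib(params["min"])    # the "min" value from the same entry
has_no_limit = params["max"] == UNLIMITED_SENTINEL

print(f"requested reservation: {requested_mib:.1f} MiB")  # 1220.8 MiB
print(f"min:                   {min_mib:.1f} MiB")        # 1167.0 MiB
print(f"max looks unlimited:   {has_no_limit}")           # True
```

Under that assumption the failed admission check was for roughly 1.2 GiB, so this was not an unusually large reservation.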
Some of the key entries were:
message = "Could not power on VM : Admission check failed for memory resource
--> See the VMware ESX Resource Management Guide for information on resource management settings.
The VM failed to resume on the destination during early power on
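If you need to check several hosts for the same signature, picking these entries out by eye gets tedious. The following is a minimal Python sketch that pulls the fault keys and the requested reservation out of a chunk of hostd log text. The function name `summarize_fault` and the shortened sample string are ours, and the patterns are based only on the single entry shown above, so other builds may format things differently.

```python
import re

# Matches localized-message keys such as key = "msg.migrate.resume.fail".
# Numeric argument keys like key = "1" are skipped by requiring a vob./msg. prefix.
KEY_RE = re.compile(r'key = "((?:vob|msg)\.[^"]+)"')
RESERVATION_RE = re.compile(r"Requested reservation: (\d+) pages")


def summarize_fault(log_text: str) -> dict:
    """Collect fault keys and the requested reservation (in pages), if present."""
    match = RESERVATION_RE.search(log_text)
    return {
        "fault_keys": KEY_RE.findall(log_text),
        "requested_pages": int(match.group(1)) if match else None,
    }


# Shortened excerpt of the entry above, used here as sample input.
sample = '''
--> key = "vob.sched.group.mem.admitfailed",
--> key = "1",
--> value = "vm.334204",
--> message = "Group vm.334204: Cannot admit VM: Memory admission check failed. Requested reservation: 312526 pages
--> key = "msg.vmmonVMK.admitFailed",
--> value = "msg.vmk.status.VMK_MEM_ADMIT_FAILED",
--> key = "msg.migrate.resume.fail",
'''

summary = summarize_fault(sample)
print(summary["fault_keys"])
print(summary["requested_pages"])
```

On a real host you would feed it the contents of the hostd log rather than a pasted excerpt.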