VMware on AWS: vMotion doesn’t work
Short story: the VMs running on-prem use a richer CPU feature set, and if they were migrated live to VMC, whose hosts expose a smaller CPU feature set, the VMs could crash. That is why vMotion fails at validation in this scenario.
First vMotion, first error! 🙁
OK!! What happened?!
First of all, for my first test migrations I used a VM that already existed on the on-prem VMware cluster. That VM was created years ago with an old vSphere version. Cold migration and bulk migration worked well, so I got this error only when doing a vMotion. At first glance the log is not very clear:
A general system error occurred: vMotion failed: unknow error
and if we read the whole log message we can find some interesting phrases like these:
waiting for data Connection closed by remote host remote host closed the connection unexpectedly and migration was stopped The closed connection probably results from a migration failure detected on the remote host
So, I tried some new tests to understand this issue.
I prepared one large VM (~50 GB in size) and cloned it twice. Now I have 3 VMs to test the 3 types of migration. I made sure they had the latest HW version, then I created 3 migration tasks:
- Live migration for VM01
- Cold migration for VM02
- Bulk migration for VM03
I ran all these migration tasks at the same time and waited for them to complete.
Whoa! I got a new error! This is the entire message, and from it we can understand that the problem is with the EVC configuration:
Migration Failed Exception while processing destination side start Relocate result
The target host does not support the virtual machine's current hardware requirements.
com.vmware.vim.vmfeature.cpuid.avx512bw:Advanced Vector Extensions 512 Byte and Word Instructions (AVX512BW) are unsupported.
com.vmware.vim.vmfeature.cpuid.avx512cd:Advanced Vector Extensions 512 Confict Detection (AVX512CD) are unsupported.
com.vmware.vim.vmfeature.cpuid.avx512dq:Advanced Vector Extensions 512 Doubleword and Quadword (AVX512DQ) are unsupported.
com.vmware.vim.vmfeature.cpuid.avx512f:Advanced Vector Extensions 512 Foundation (AVX512F) are unsupported.
com.vmware.vim.vmfeature.cpuid.avx512vl:AVX-512 Vector Length Extensions (AVX512VL) are unsupported.
com.vmware.vim.vmfeature.cpuid.clflushopt:Optimized version of clflush (CLFLUSHOPT) is unsupported.
com.vmware.vim.vmfeature.cpuid.clwb:Cache line write back (CLWB) is unsupported.
com.vmware.vim.vmfeature.cpuid.pku:Protection Keys For User-mode Pages (PKU) is not supported.
com.vmware.vim.vmfeature.cpuid.xcr0_master_bndcsr:XSAVE of BNDCFGU and BNDSTATUS registers (BNDCSR) is unsupported.
com.vmware.vim.vmfeature.cpuid.xcr0_master_bndregs:XSAVE of BND0-BND3 bounds registers (BNDREGS) is unsupported.
com.vmware.vim.vmfeature.cpuid.xcr0_master_hi16_zmm:XSAVE of ZMM registers ZMM16-ZMM31 is unsupported (ZMM_Hi16).
com.vmware.vim.vmfeature.cpuid.xcr0_master_opmask:XSAVE of opmask registers k0-k7 is unsupported.
com.vmware.vim.vmfeature.cpuid.xcr0_master_pkru:XSAVE of Protection Key Register User State (PKRU) is unsupported.
com.vmware.vim.vmfeature.cpuid.xcr0_master_zmm_h:XSAVE of high 256 bits of ZMM registers ZMM0-ZMM15 is unsupported (ZMM_Hi256).
com.vmware.vim.vmfeature.cpuid.xsavec:XSAVEC (save extended states in compact format) is unsupported.
com.vmware.vim.vmfeature.cpuid.xsaves:XSAVES (save supervisor states) is unsupported.
com.vmware.vim.vmfeature.cpuid.mpx:Memory Protection Extensions (MPX) are unsupported.
com.vmware.vim.vpxd.vmcheck.featureRequirementsNotMet.useClusterOrPerVmEvc:Use a cluster with Enhanced vMotion Compatibility (EVC) enabled to create a uniform set of CPU features across the cluster, or use per-VM EVC for a consistent set of CPU features for a virtual machine and allow the virtual machine to be moved to a host capable of supporting that set of CPU features. See KB article 1003212 for cluster EVC information.
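That message is hard to read in one piece. As a small helper (my own sketch, not a VMware tool), the Python snippet below pulls the short feature names out of such a message, assuming every entry keeps the `com.vmware.vim.vmfeature.cpuid.<name>:` prefix seen above. The function name `missing_cpu_features` and the shortened `sample` text are made up for illustration.

```python
import re

# Matches entries such as "com.vmware.vim.vmfeature.cpuid.avx512bw:" and
# captures the short feature name ("avx512bw").
FEATURE_KEY = re.compile(r"com\.vmware\.vim\.vmfeature\.cpuid\.([A-Za-z0-9_]+):")


def missing_cpu_features(message):
    """Return the CPUID feature names found in the message, in order, without duplicates."""
    names = []
    for name in FEATURE_KEY.findall(message):
        if name not in names:
            names.append(name)
    return names


# Shortened excerpt of the error above, used only as sample input.
sample = (
    "The target host does not support the virtual machine's current hardware requirements. "
    "com.vmware.vim.vmfeature.cpuid.avx512bw:Advanced Vector Extensions 512 Byte and Word "
    "Instructions (AVX512BW) are unsupported. "
    "com.vmware.vim.vmfeature.cpuid.clwb:Cache line write back (CLWB) is unsupported. "
    "com.vmware.vim.vmfeature.cpuid.xcr0_master_pkru:XSAVE of Protection Key Register "
    "User State (PKRU) is unsupported."
)

print(missing_cpu_features(sample))
```

Running it on the full message gives a compact list of the features that the destination host lacks, which is easier to compare against a processor generation.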
As VMware states, the i3.metal hosts have Intel Xeon E5-2686 v4 (Broadwell) processors.
We can see that cold migration and bulk migration succeeded, whereas vMotion failed due to hardware incompatibility.
Reading through: CPU Compatibility Scenarios
When I attempt to migrate a virtual machine with vMotion, the following scenario applies here:
– The virtual machine’s CPU feature set contains features not supported by the destination host. CPU compatibility requirements are not met, and migration with vMotion cannot proceed.
EVC overcomes such incompatibility by providing a “baseline” feature set for all virtual machines running in a cluster. This baseline feature set hides the differences among the clustered hosts’ CPUs from the virtual machines.
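To make the idea concrete, here is a toy model in Python, not how vCenter actually implements the check. It treats CPU features as sets: the vMotion validation becomes "does the VM use anything the destination lacks?", and an EVC baseline becomes a mask applied to what the VM sees. The feature sets are heavily abbreviated and the function names are my own assumptions.

```python
# Hypothetical, heavily abbreviated feature sets; real CPUs expose far more flags.
BROADWELL_HOST = frozenset({"sse4.2", "avx", "avx2"})
NEWER_HOST = BROADWELL_HOST | {"avx512f", "avx512bw", "clwb"}


def vm_features(host_features, evc_baseline=None):
    """Features a VM sees at power-on: the host's, masked by the EVC baseline if one is set."""
    if evc_baseline is None:
        return frozenset(host_features)
    return frozenset(host_features) & frozenset(evc_baseline)


def vmotion_blockers(vm_feature_set, destination_host):
    """Features the running VM relies on that the destination lacks (empty means compatible)."""
    return sorted(set(vm_feature_set) - set(destination_host))


without_evc = vm_features(NEWER_HOST)
with_evc = vm_features(NEWER_HOST, evc_baseline=BROADWELL_HOST)

print(vmotion_blockers(without_evc, BROADWELL_HOST))  # ['avx512bw', 'avx512f', 'clwb']
print(vmotion_blockers(with_evc, BROADWELL_HOST))     # []
```

Without a baseline the VM inherits features the Broadwell host cannot provide, so the check reports blockers; with a Broadwell baseline the VM never sees those features and the list is empty.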
To work around this, one alternative is to enable EVC with Broadwell as the processor family on the on-prem side (yes, EVC was disabled); this is cluster-based EVC.
The VM running on-prem uses a richer CPU feature set, and if it were migrated live to VMC, whose hosts expose a smaller CPU feature set, the VM could crash. That is why vMotion fails at validation in this scenario.
In the end, if your VMware cluster has a processor family newer than the Broadwell generation, EVC could be enabled on that cluster so that you can use vMotion to migrate VMs to VMC on AWS.
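Choosing the baseline comes down to taking the older of the two processor generations. The Python sketch below shows that rule with a list of Intel EVC mode keys ordered from oldest to newest. The keys follow the vSphere naming as far as I know, but the list is an assumption and not exhaustive, so check the modes offered by your own vCenter; `common_evc_mode` is a hypothetical helper.

```python
# Intel EVC mode keys ordered oldest to newest (assumed, not exhaustive).
INTEL_EVC_MODES = [
    "intel-merom",
    "intel-penryn",
    "intel-nehalem",
    "intel-westmere",
    "intel-sandybridge",
    "intel-ivybridge",
    "intel-haswell",
    "intel-broadwell",
    "intel-skylake",
    "intel-cascadelake",
]


def common_evc_mode(source_mode, destination_mode):
    """Return the older of the two modes, i.e. a baseline both sides can run."""
    rank = {mode: index for index, mode in enumerate(INTEL_EVC_MODES)}
    return min(source_mode, destination_mode, key=rank.__getitem__)


# On-prem Skylake cluster, VMC hosts on Broadwell.
print(common_evc_mode("intel-skylake", "intel-broadwell"))  # intel-broadwell
```

Keep in mind that a VM only picks up a new EVC baseline after a full power off and power on, so already running VMs still need a power cycle before the live migration can pass validation.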
Links:
Enhanced vMotion Compatibility (EVC) processor support