So I recently found a bug in the CPU Over-Commitment control in DRS.
Typically, if you try to power on a virtual machine and it's going to exceed the limit you have specified on the cluster, vCenter throws an error like this:
“vCenter Server was unable to find a suitable host to power on the following virtual machines for the reasons listed below.”
“The total number of virtual CPUs present or requested in virtual machines’ configuration has exceeded the limit on the host: XX”
The bug I discovered appears to bypass the workflow that checks whether this limit will be exceeded for the cluster, and powers on the virtual machine anyway.
Recently I had been running load and performance tests on several large vSAN clusters with this setting enabled. As expected, I reached the limit and couldn't power on any more VMs to generate CPU and memory workload, so I switched my focus to vSAN's storage to try and break vSAN.
I needed to clone many existing Windows VMs to consume space on the vSAN datastores. To defeat vSAN's deduplication and compression, I needed to power on these VMs and enable BitLocker. To my surprise, when I cloned a VM with the option “Power on virtual machine after creation” selected, the VM actually powered on once the clone finished, even though the cluster was already at its limit.
I tested this on several clusters, including my personal lab, and was able to reproduce it every time. Tested on vCenter 6.5.0 Build 5178943 and 6.5.0 Build 5705665.
The implication of this bug is that a provisioning process using the clone workflow could keep provisioning and powering on VMs past the configured limit, unless it has its own ratio controls or checks in place.
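As a rough idea of what such a check could look like, here's a minimal Python sketch of a pre-power-on guard a provisioning script might run itself instead of relying on DRS. It assumes you've already collected each VM's vCPU count and power state plus the physical core count some other way. The `VM` class, `can_power_on` function and `max_ratio` parameter are all hypothetical names for illustration, and this is not how DRS itself calculates the limit.

```python
from dataclasses import dataclass


@dataclass
class VM:
    # Hypothetical inventory record; gather these values however your tooling allows.
    name: str
    vcpus: int
    powered_on: bool


def can_power_on(vms, requested_vcpus, physical_cores, max_ratio):
    """Return True if powering on a VM with requested_vcpus keeps the
    vCPU:pCPU ratio at or below max_ratio (an assumed, user-chosen limit)."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    # Only powered-on VMs count towards the active vCPU total.
    active_vcpus = sum(vm.vcpus for vm in vms if vm.powered_on)
    # Compare totals rather than dividing, so the boundary case is exact.
    return active_vcpus + requested_vcpus <= max_ratio * physical_cores


if __name__ == "__main__":
    cluster = [
        VM("app01", 4, True),
        VM("app02", 8, True),
        VM("clone01", 4, False),  # cloned but not yet powered on
    ]
    # 12 active + 4 requested = 16, limit is 2.0 * 8 = 16, so this is allowed.
    print(can_power_on(cluster, 4, physical_cores=8, max_ratio=2.0))
    # 12 active + 8 requested = 20 > 16, so this is refused.
    print(can_power_on(cluster, 8, physical_cores=8, max_ratio=2.0))
```

In a clone workflow, the idea would be to clone with power-on disabled, run a check like this, and only then issue the power-on, so the script doesn't depend on the clone-and-power-on path honouring the cluster limit.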