Tag Archives: performance

ESXi 6.5 Storage Performance Issues Resolved in Update 1

I originally came across the issue of slow storage performance with the native vmw_ahci driver that comes bundled with ESXi 6.5 just as I was first playing with my SuperMicro SYS-5028D-TN4T in my homelab. After I published a couple of posts about the workaround, the issue became quite prevalent in the community, and those posts continue to get decent traffic, which suggests the problem impacted quite a few people out there.

The good news is that with the release of vSphere 6.5 Update 1 there is a fix for the problem in the form of updated drivers for the AHCI module. William Lam was quick to blog about the fix, and if you had previously disabled the driver you will need to re-enable it.
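If you used the esxcli module workaround from my earlier posts, re-enabling the driver is a one-liner from the ESXi shell. A minimal sketch (same module toggle as the original workaround; the change only takes effect after a reboot):

    # Re-enable the native AHCI driver once on 6.5 Update 1
    esxcli system module set --enabled=true --module=vmw_ahci
    # Reboot the host for the module to load again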

This VMware KB covers the specific patch as listed in the release notes:

There's no confirmation yet that it actually does the trick, but the release notes look promising, and the assumption is that it will resolve the issue so that homelabbers and people using the driver in production systems can rest easy.

References:

https://docs.vmware.com/en/VMware-vSphere/6.5/rn/vsphere-esxi-651-release-notes.html

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2149910

http://www.virtuallyghetto.com/2017/07/ahci-vmw_ahci-performance-issue-resolved-in-esxi-6-5-update-1.html

NestedESXi – Network Performance Improvements with Learnswitch

I’ve been running my NestedESXi homelab for about eight months now, but in all that time I had not installed or enabled the ESXi MAC Learning dvFilter. As a quick refresher, the VMware Fling addresses the impact that promiscuous mode has on nested ESXi hosts when it's enabled on virtual switches. In a nutshell, network traffic hits every network interface attached to the portgroup, which reduces network throughput, increases latency and drives up CPU usage.

The ESXi MAC Learn dvFilter Fling was released about two years ago and it's a must-have for anyone running nested ESXi in a home or work lab. However, earlier this year a new Fling was released that improves on the dvFilter and addresses some of its limitations. The new native MAC Learning VMkernel module is called Learnswitch.

ESXi Learnswitch is a complete implementation of MAC Learning and Filtering and is designed as a wrapper around the host virtual switch. It supports learning multiple source MAC addresses on virtual network interface cards (vNIC) and filters packets from egressing the wrong port based on destination MAC lookup. This substantially improves overall network throughput and system performance for nested ESX and container use cases.

For a more in-depth look at its functionality, head over to William Lam's blog post here.

dvFilter vs Learnswitch:

I was interested to see if the new Learnswitch offered any significant performance improvement over the dvFilter in addition to its main benefits. I installed and enabled the dvFilter in my lab and ran some basic performance tests using CrystalDiskMark. Before that, I ran the same test without either module installed to get a baseline.

First, to see what the network traffic hitting the nested hosts looks like: the ESXTOP output below shows each host dealing with about the same number of received packets. Overall throughput is reduced when this happens.
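If you want to reproduce that view in your own lab, the counters come straight from ESXTOP's network panel:

    # From the ESXi shell on the parent host
    esxtop
    # Press 'n' for the network view. The PKTRX/s column shows received
    # packets per second per vNIC port -- with promiscuous mode and no MAC
    # learning, every nested host's vNIC sees roughly the same packet rate
    # because all traffic is delivered to all ports on the portgroup.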

In terms of performance, the CrystalDiskMark test run on a nested VM (right) showed reduced performance across all tests when compared to one run directly on the parent host (left).

There was also elevated datastore latency and significant CPU usage due to the overhead of the increased traffic hitting all interfaces.

The CPU usage alone shows the value in having the dvFilter or Learnswitch installed when running nested ESXi hosts.

With the baseline testing done, I installed and enabled the dvFilter and ran the same tests. For a detailed look at how to install the dvFilter (just in case you don't meet the requirements for the Learnswitch module) check out my initial post on the dvFilter here; a condensed version is below. Having gone through that, I uninstalled the dvFilter and then installed and configured the Learnswitch.
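For quick reference, the condensed dvFilter setup looks like this. The VIB filename is the one from the Fling download at the time, so check your copy, and per the Fling instructions the advanced setting is applied per vNIC on each nested ESXi VM:

    # Install the MAC Learning dvFilter VIB on the physical host
    esxcli software vib install -v /vmware-esx-dvfilter-maclearn-0.1-ESX-5.0.vib -f
    # Confirm the filter is registered on the host
    summarize-dvfilter | grep -i maclearn
    # Then add an advanced setting to each nested ESXi VM, one per vNIC
    # sitting on the promiscuous portgroup, e.g.:
    #   ethernet0.filter4.name = dvfilter-maclearn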

Like the dvFilter, you need to download and install an ESXi software bundle, but unlike the dvFilter, you need to reboot the host to enable the Learnswitch module.
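The bundle install itself is the familiar esxcli dance; the path below is just a placeholder for wherever you saved the Fling download:

    # Install the Learnswitch offline bundle (path is hypothetical)
    esxcli software vib install -d /tmp/Learnswitch-Offline-Bundle.zip
    # Unlike the dvFilter, the module only loads after a reboot
    reboot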

As per the instructions on William Lam's post or the Fling page, you then need to configure and run a Python script to enable the Learnswitch against the NestedESXi portgroups that have promiscuous mode enabled.
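I won't reproduce the script here since it ships with the Fling, but the invocation looks something like the below. The script name, flags and portgroup name are all placeholders; check the Fling's README for the real ones:

    # Hypothetical invocation -- the actual script and arguments come with the Fling
    python learnswitch_config.py --vc vcenter.lab.local \
        --user administrator@vsphere.local \
        --portgroup NestedESXi-Trunk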

From there the impact of the module is immediate: you can see a normalization of network traffic hitting the interfaces of each NestedESXi host. When running the performance test, the ESXTOP output is significantly different to what you see without the module loaded, as shown below.

You also get access to a new command that lists out stats for the Learnswitch, showing packet and port statistics as well as the current MAC address table.

In terms of what it looks like from a performance point of view, below are the results of all the CrystalDiskMark tests. The bottom two represent the dvFilter (left) and the Learnswitch (right).

And finally, to look at the improvement in CPU performance with the modules installed, below is a timeline showing the performance tests run at different times over the last 24 hours…again a significant improvement: the graphs on the left-hand side are from the testing without any module, moving across to the dvFilter test, with the Learnswitch test on the right-hand side. The Learnswitch does seem a little lighter on CPU, but I can't be 100% sure given my limited testing.

Conclusion:

As expected there isn't a huge difference in performance between the two modules, but the features of the Learnswitch certainly make it the new preferred choice of the two if your environment meets its requirements. Again, the main advantages of the Learnswitch over the dvFilter make it a must-have addition to any NestedESXi environment. If you haven't installed either yet…get onto it!

CPU Overallocation and Poor Network Performance in vCD – Beware of Resource Pools

For the longest time all VMware administrators have been told that resource pools are not folders and that they should only be used when the impact of applying resource settings is fully understood. From my point of view, I've been able to use resource pools for VM management without too much hassle since I first started working on VMware managed service platforms, and from a managed services point of view they are a lot easier to use as organizational “folders” than vSphere folders themselves. For me, as long as the CPU and Memory Resources Unlimited checkbox was ticked, nothing bad happened.

Working with vCloud Director, however, resource pools are heavily utilized as the control mechanism for resource allocation, sharing and management. It's still a topic that can cause confusion when trying to wrap one's head around the different allocation models vCD offers. I still reference blog posts from Duncan Epping and Frank Denneman written nearly seven years ago to refresh my memory every now and then.

Before moving on to an example of how overallocation, or client undersizing, in vCloud Director can cause serious performance issues, it's worth having a read of this post by Frank, which goes through in typical Frank detail what resource management looks like in vCloud Director.

Proper resource management is very complicated in a Virtual Infrastructure or vCloud environment. Each allocation model uses a different combination of resource allocation settings on both the Resource Pool and Virtual Machine level.

Undersized vDCs Causing Network Throughput Issues:

The Allocation Pool model was the one I worked with the most, and it used to throw up a few client-related issues when I worked at Zettagrid. With the Allocation Pool method, which is the default model, you specify the amount of resources for your Org vDC and also specify how much of those resources are guaranteed. The guarantee means that a reservation is set, and the amount of guaranteed resources is taken from the Provider vDC. The total amount of resources specified is the upper boundary, which also becomes the resource pool limit. To make that concrete with hypothetical numbers: an Org vDC allocated 10GHz of CPU with a 20% guarantee gets a resource pool with a 2GHz reservation carved out of the Provider vDC and a 10GHz limit, with the remaining 8GHz only available if the Provider vDC has headroom.

Because tenants were able to purchase Virtual Datacenters of any size, there were a number of occasions where tenants undersized their resources. Specifically, one tenant came to us complaining about poor network performance during a copy operation between VMs in their vDC. At first the operations team thought that it was the network causing the issue…we were also running NSX, and these VMs were on a VXLAN segment, so fingers were being pointed there as well.

Eventually, after a bit of troubleshooting, we were able to replicate the problem…it was related to the resources that the tenant had purchased, or lack thereof. In a nutshell, because the Allocation Pool model allows the overprovisioning of resources, not enough vCPU had been purchased. The vDC resource pool had 1000MHz of vCPU with a 0% reservation, but the tenant had created four dual-vCPU VMs…eight vCPUs contending for a 1000MHz cap. When the network copy job started it consumed CPU, which in turn exhausted the vCD CPU allocation.

What happened next can be seen in the video below…

With the resource pool constrained, ready time is introduced to throttle the CPU, which in turn impacts the network throughput. As shown in the video, when the resource pool has the Unlimited checkbox ticked, the ready time goes away and the network throughput returns to normal.
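If you want to confirm this behaviour in your own environment, the ready time is easiest to spot in ESXTOP. A minimal sketch, with the usual caveat that the 10% figure is just a common rule of thumb:

    # From the ESXi shell on the host running the constrained VMs
    esxtop
    # Press 'c' for the CPU view and watch %RDY on the tenant VMs.
    # Sustained %RDY (roughly 10%+ per vCPU) while the resource pool limit
    # is in place, dropping to near zero once the limit is removed, is the
    # throttling described above.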

Conclusion:

Again, it's worth checking out the impact on network throughput in the video, as it clearly shows what happens when tenants underprovision or overallocate their Virtual Datacenters in vCloud Director. Outside of vCloud Director, it's also handy to understand the impact of applying reservations and limits on resource pools in terms of VM compute and networking performance.

It’s not always the network!

References:

http://www.vmware.com/resources/techresources/10325

http://frankdenneman.nl/2010/09/24/provider-vdc-cluster-or-resource-pool/

http://www.yellow-bricks.com/2012/02/28/resource-pool-shares-dont-make-sense-with-vcloud-director/

https://kb.vmware.com/kb/2006684

Allocation Pool Organization vDC Changes in vCloud Director 5.1

ESXi 6.5 Storage Performance Issues and Fix

[NOTE]: I decided to republish this post with a new heading, skipping right to the meat of the issue, as I've had a lot of people reach out saying that the post helped them with their performance issues on ESXi 6.5. Hopefully this makes the content easier to find so people can have a fix in place sooner.

The issue that I came across was to do with storage performance and the native driver that comes bundled with ESXi 6.5. With the release of vSphere 6.5 yesterday, the timing was perfect to install ESXi 6.5 and start to build my management VMs. I first noticed some issues when uploading the Windows 2016 ISO to the datastore, with the upload taking about 30 minutes. From there I created a new VM and installed Windows…this took about two hours to complete, which I knew was not right…especially with the datastore being a decent-class SSD.

I created a new VM and kicked off a new install, but this time I opened ESXTOP to see what was going on. As you can see from the screenshots below, the kernel and disk write latencies were off the charts, topping 2000ms and 700-1000ms respectively…In throughput terms I was getting about 10-20MB/s when I should have been getting 400-500MB/s.
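For anyone wanting to check the same counters, the latencies in those screenshots come from ESXTOP's disk views:

    # From the ESXi shell
    esxtop
    # Press 'd' for the disk adapter view (or 'v' for per-VM disk stats).
    # KAVG/cmd is time spent in the VMkernel and DAVG/cmd is time spent at
    # the device -- a healthy SSD-backed adapter should sit in the low
    # single-digit milliseconds for both.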

ESXTOP was showing the VM with even worse write latency.

I wondered if I had bought a lemon of a storage controller, so I checked the Queue Depth of the card. It's listed with a QD of 31, which isn't horrible for a homelab, so my attention turned to the driver. Referencing the VMware Compatibility Guide again, the device driver for the controller is listed as ahci version 3.0.22vmw.

I searched the installed device driver modules and found that the one listed above was present, however there was also a native VMware device driver as well.
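For anyone wanting to check the same thing, a couple of commands show which AHCI modules are present and which driver the controller is actually bound to:

    # List both AHCI modules -- the legacy 'ahci' and the native 'vmw_ahci'
    esxcli system module list | grep -i ahci
    # Show which driver each storage adapter is currently using
    esxcfg-scsidevs -a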

I confirmed that the storage controller was using the native VMware driver and went about disabling it as per this VMware KB (thanks to @fbuechsel, who pointed me in the right direction in the vExpert Slack homelab channel) as shown below.
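The fix from the KB boils down to disabling the native module so the host falls back to the legacy ahci driver on the next boot:

    # Disable the native AHCI driver (per the VMware KB linked above)
    esxcli system module set --enabled=false --module=vmw_ahci
    # A reboot is required for the fallback driver to take over
    reboot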

After the host rebooted I checked to see if the storage controller was using the device driver listed in the Compatibility Guide. As you can see below, not only was it using that driver, it was now showing all six HBA ports as opposed to just the one seen in the first snippet above.

I once again created a new VM and installed Windows, and this time the install completed in a little under five minutes! Quite a difference! Upon running CrystalDiskMark I was now getting the expected speeds from the SSDs and things are moving along quite nicely.

Hopefully this post saves anyone else who might buy this or other SuperMicro SuperServers some time, and stops them getting caught out by poor storage performance caused by the native VMware driver packaged with ESXi 6.5.


References:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2044993

HomeLab – SuperMicro 5028D-TNT4 Storage Driver Performance Issues and Fix

OK, I'll admit it…I've had serious lab withdrawals since having to give up the awesome Zettagrid Labs. Having a lab to tinker with goes hand in hand with being able to generate tech-related content…case in point, my new homelab got delivered on Monday and I have been working to get things set up so that I can deploy my new NestedESXi lab environment.
By way of a quick intro (a longer first-impressions post will follow), I purchased a SuperMicro SYS-5028D-TN4T that I based off this TinkerTry bundle, which has become a very popular system for vExpert homelabbers. It's got an Intel Xeon D-1541 CPU and I loaded it up with 128GB of RAM. The system comes with an embedded Lynx Point AHCI controller that allows up to six SATA devices and is listed on the VMware Compatibility Guide for ESXi 6.5.

The issue that I came across was to do with storage performance and the native driver that comes bundled with ESXi 6.5. With the release of vSphere 6.5 yesterday, the timing was perfect to install ESXi 6.5 and start to build my management VMs. I first noticed some issues when uploading the Windows 2016 ISO to the datastore, with the upload taking about 30 minutes. From there I created a new VM and installed Windows…this took about two hours to complete, which I knew was not right…especially with the datastore being a decent-class SSD.

I created a new VM and kicked off a new install, but this time I opened ESXTOP to see what was going on. As you can see from the screenshots below, the kernel and disk write latencies were off the charts, topping 2000ms and 700-1000ms respectively…In throughput terms I was getting about 10-20MB/s when I should have been getting 400-500MB/s.

ESXTOP was showing the VM with even worse write latency.

I wondered if I had bought a lemon of a storage controller, so I checked the Queue Depth of the card. It's listed with a QD of 31, which isn't horrible for a homelab, so my attention turned to the driver. Referencing the VMware Compatibility Guide again, the device driver for the controller is listed as ahci version 3.0.22vmw.

I searched the installed device driver modules and found that the one listed above was present, however there was also a native VMware device driver as well.

I confirmed that the storage controller was using the native VMware driver and went about disabling it as per this VMware KB (thanks to @fbuechsel, who pointed me in the right direction in the vExpert Slack homelab channel) as shown below.

After the host rebooted I checked to see if the storage controller was using the device driver listed in the Compatibility Guide. As you can see below, not only was it using that driver, it was now showing all six HBA ports as opposed to just the one seen in the first snippet above.

I once again created a new VM and installed Windows, and this time the install completed in a little under five minutes! Quite a difference! Upon running CrystalDiskMark I was now getting the expected speeds from the SSDs and things are moving along quite nicely.

Hopefully this post saves anyone else who might buy this or other SuperMicro SuperServers some time, and stops them getting caught out by poor storage performance caused by the native VMware driver packaged with ESXi 6.5.


References:

http://www.supermicro.com/products/system/midtower/5028/SYS-5028D-TN4T.cfm

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2044993

Nested ESXi – Reduced Network Throughput with Promiscuous Mode PortGroups

We have been conducting performance and stress testing of a new NFS-connected storage platform over the past month or so, and through that testing we have seen some interesting behaviours in terms of what affects overall performance relative to total network throughput vs latency vs IOPS.

Using a couple of stress-test scripts, we have been able to bottleneck performance at the network layer…that is to say, all conditions being equal, we can max out a 10Gig uplink across a couple of hosts, resulting in throughput of 2.2GB/s or ~20,000Mbit/s on the switch. Effectively the backend storage (at this stage) is only limited by the network.
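For context, the scripts themselves aren't anything exotic. This isn't our actual script, and the datastore path is a placeholder, but something as simple as the below run from a few ESXi shells at once will generate comparable sequential write traffic against an NFS datastore (watch the resulting throughput in ESXTOP or on the switch):

    # Generate 10GB of sequential write load against an NFS datastore
    # (hypothetical path -- adjust to your environment)
    dd if=/dev/zero of=/vmfs/volumes/nfs-datastore-01/loadtest.bin bs=1048576 count=10240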

To perform some “real world” testing I deployed a nested ESXi lab comprising our vCloud platform into the environment. For the nested ESXi hosts' (ESXi 5.0 Update 2) networking, I backed two vNICs with a Distributed Switch portgroup configured as a trunk accepting all VLANs and accepting promiscuous mode traffic.
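For reference, the security policy side of that portgroup config looks like the below. This is the standard vSwitch equivalent (esxcli can't drive a Distributed Switch portgroup, which is what we actually used — that has to be done in the vSphere client or via the API — and this esxcli namespace only appeared in later ESXi releases); the portgroup name is a placeholder:

    # Standard vSwitch equivalent of the promiscuous trunk portgroup policy
    esxcli network vswitch standard portgroup policy security set \
        --portgroup-name=NestedESXi-Trunk \
        --allow-promiscuous=true --allow-forged-transmits=true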

While continuing with the load testing in the parent environment (not within the nested ESXi) we started to see throughput issues while running the test scripts. Where we had been getting 1-1.2GB/s of throughput per host, all of a sudden that had dropped to 200-300MB/s. Nothing else had changed on the switching, so we were left scratching our heads.

From the above performance graph you can see that performance was 1/5 of what we had seen previously, and it looked like there was some invisible rate limiting going on. After accusing the network guy (@sjdix0n) I started to switch off VMs to see if there was any change to the throughput. As soon as all VMs were switched off, throughput returned to the expected uplink saturation levels.

As soon as I powered on one of the nested ESXi hosts, performance dropped from 1.2GB/s to about 450MB/s. As soon as I turned on the other nested host it dropped to about 250MB/s. In reverse, you can see that performance stepped back up to 1.2GB/s with both nested hosts offline.

So…what's going on here? The assumption we made was that promiscuous mode was causing saturation within the Distributed vSwitch, causing the performance test to record much lower values and preventing traffic from being sent out at the expected speeds.

Looking at one of the nested hosts' network performance graphs, you can see that it was pushing traffic on both vNICs roughly equal to the total throughput that the test was generating…the periods of inactivity above were when we were starting and stopping the load tests.

In a very broad sense we understand how and why this is happening, but it would be handy to get an explanation as to the specifics of why a setup like this can have such a huge effect on overall performance.

UPDATE WITH EXPLANATION: After posting this initially, I reached out to William Lam (@lamw) via Twitter and he was kind enough to post an article explaining why Promiscuous Mode and Forged Transmits are required in nested ESXi environments.

http://www.virtuallyghetto.com/2013/11/why-is-promiscuous-mode-forged.html

A word to the wise about having nested hosts…they can impact your environment in ways you may not expect! Is it any wonder nested hosts are not officially supported?? 🙂