NestedESXi – Network Performance Improvements with Learnswitch
I’ve been running my NestedESXi homelab for about eight months now but in all that time I had not installed or enabled the ESXi MAC Learning dvFilter. As a quick refresher the VMware Fling addresses the issues with nested ESXi hosts and the impact that promiscuous mode has when enabled on virtual switches. In a nutshell, network traffic will hit all the network interfaces attached to the portgroup which reduces network throughput and also increases latency and impacts CPU.
The ESXi MAC Learn dvFilter Fling was released about two years ago and its a must have for those running homelabs or work labs running nested ESXi. However earlier this year a new fling was released that improves on the dvFilter and addresses some of it’s limitations. The new native MAC Learning VMkernel module is called Learnswitch.
ESXi Learnswitch is a complete implementation of MAC Learning and Filtering and is designed as a wrapper around the host virtual switch. It supports learning multiple source MAC addresses on virtual network interface cards (vNIC) and filters packets from egressing the wrong port based on destination MAC lookup. This substantially improves overall network throughput and system performance for nested ESX and container use cases.
For a more in depth look at it’s functionality head over to William Lams blog post here.
dvFilter vs Learnswitch:
I was interested to see if the new Learnswitch offered any significant performance improvements over the dvFilter in addition to its main benefits. I went about installing and enabling the dvFilter in my lab and ran some basic performance tests using Crystal Disk Mark. Before that, I ran the performance test without either installed as a base.
Firstly to see what the network traffic looks like hitting the nested hosts you can see from the ESXTOP output below that each host is dealing with about the same amount of received packets. Overall throughput is reduced when this happens.
There was also elevated datastore latency and significant CPU usage due to the overheads with the increased traffic hitting all interfaces.
With the baseline testing done I installed and enabled the dvFilter and then ran the same tests. For a detailed look at how to install the dvFilter (just in case you don’t fit the requirements for using the Learnswitch module) check out my initial post on the dvFilter here. Having gone through that I went about uninstalling the dvFilter and installing and configuring the Learnswitch.
Like the dvFilter you need to download and install am ESXi software bundle but unlike the dvFilter, you need to reboot the host to enable the Learnswitch module.William Lam’s post or the Fling page you then need to configure and run a Python script to enable the Learnswitch against the NestedESXi portgroups that have promiscuous mode enabled.
In terms of what it looks like from a performance point of view, below are the results of all Crystal Disk Mark tests. The bottom two represent the dvFilter (left) and the Learnswitch (right).
And finally to have a look at the improvement in CPU performance with the modules installed you can see below a timeline showing the performance tests run at different times across the last 24 hours…again a significant improvement looking at the graphs on the left hand side which was during the testing without any module and then moving across to the dvFilter test with the Learnswitch test on the right hand side. It does seem like the Learnswitch is a little better on CPU, but can’t be 100% with my limited testing.
As expected there isn’t a huge different in performance between both modules but certainly the features of the Learnswitch make it the new preferred choice out of the two if the requirements are met. Again, the main advantages of the Learnswitch over the dvFilter make it a must have addition to any NestedESXi environment. If you haven’t installed either yet…get onto it!