Category Archives: Storage

vSphere 6.5 Update 1 – What’s in it for Service Providers

Late last week VMware released vSphere 6.5 Update 1, which includes updated builds of both vCenter and ESXi. As usual, I’ll go through some of the key features and fixes included in the latest versions of vCenter and ESXi. When looking through the release notes I generally keep an eye out for improvements that relate back to Service Providers who use vSphere as the foundation of their Managed or Infrastructure as a Service offerings. This update also brings vSAN up to 6.6.1, so I’ll spend some time looking at what’s been added there.

New Features and Enhancements:

Without question this is a significant patch release for vCenter and ESXi, and the length of the release notes is testament to that. In terms of new features there isn’t anything groundbreaking, but there are a few nice additions: the VCSA GUI and CLI installers can now be run on Windows 2012 and 2012 R2 as well as 2016 and macOS Sierra, and Ubuntu 17.04 is supported for Guest OS Customization. vCenter now supports Microsoft SQL Server 2014 SP2, 2016 and 2016 SP1, and there are increased configuration maximums: Linked Mode now supports 15 vCenter instances, 5,000 ESXi hosts and 50,000 powered-on virtual machines.

Ability to Upgrade or Migrate from vCenter 6.0 Update 3:

This release addresses the previous limitation in the upgrade and migration path for those running vSphere 6.0 Update 3 and wanting to go to vSphere 6.5. I know this will make a lot of providers happy, as I know many that had to go to 6.0 Update 3 to address existing bugs in the platform but were not yet ready or able to move to 6.5 at the time.

HTML5 Client Update:

The HTML5 Web Client has gotten its own update that brings it up to speed with the 3.15 Fling version; however it’s still only partially functional, which remains somewhat frustrating…The online documentation for supported functionality has been updated for vSphere 6.5 U1 and is available here.

Below is a list of the main updates in this release.

  • DRS/HA VM overrides
  • SDRS rules
  • Content Library – further actions
  • Roles and Global Permissions
  • Download multiple files as zip
  • Distributed Switch – further actions
  • Fault Tolerance
  • SPBM
  • VM Hardware – further items
  • Apply Guest OS Customization during Clone
  • VM Migration – further actions (compute+storage, Cross VC, batch)

vSAN Features:

For service providers, vSAN 6.6 was another major release that shored up vSAN’s status as a serious storage platform for service provider offerings.

vSAN 6.6.1 introduces three key new features:

  • VMware vSphere Update Manager (VUM) integration
  • Performance Diagnostics in vSAN Cloud Analytics
  • Storage Device Serviceability enhancement

The ability to upgrade with VUM is a nice touch and continues to improve on the usability and manageability of vSAN. For a full look at what’s new in this release for vSAN 6.6.1 head to this blog post.

Resolved Issues:

There are a bunch of resolved issues in this release and I’ve gone through the rather extensive list to pull out the biggest fixes that relate to my experience in service provider operations, extending this to include fixes that relate to backup operations. The majority of what I’ve picked out relates to storage, networking, hosts and VM operations…the core of any platform, but even more important in the service provider world. The ones in red are specific fixes for issues that I’ve come across myself…good to see them addressed!

vCenter:
  • First-boot failure occurs when upgrading from vSphere 5.5 or 6.0 to vSphere 6.5 on Windows. If older versions of the OpenSSL DLLs are installed, the vSphere 6.5 upgrade fails to run because the older DLL versions are loaded.
  • Affinity rules configured on vCenter Server 5.5 can cause crashes after upgrading to vCenter Server 6.5. Migrating a VM that had affinity rules configured on vCenter Server 5.5 to a cluster that has affinity rules configured on vCenter Server 6.0 or 6.5 can cause vCenter Server to crash.
  • VM Snapshot Size (GB) alarm is not triggered after the VM is powered on. The VM Snapshot Size (GB) alarm is reset if the virtual machine is shut down, and then fails to trigger after the VM is powered on. This occurs in alarms based on VM Snapshot Size (GB) and VM Total Size on Disk because their status is altered when the power state of the VM changes, even though the disk usage of a VM is the same regardless of its power state.
  • When you add ports to a vSphere Distributed Switch you get an error. Because of a race condition, when you add ports to a vSphere Distributed Switch you get the error message: Cannot create a new port because number of ports exceeds 2147483647, maximum number of ports allowed on vDS.
  • A runtime exception “Unable to retrieve data about the distributed switch” might occur while upgrading a vSphere Distributed Switch (vDS) from version 5.0 to 6.5. When you try to upgrade an existing distributed switch after the vCenter upgrade is completed, the runtime exception Unable to retrieve data about the distributed switch might occur in the wizard and the distributed switch cannot be upgraded. The exception is a result of an unexpected NULL value for a LACP property of the distributed switch, instead of TRUE or FALSE, as LACP is not supported for the current version of the vSphere Distributed Switch.
  • Host configuration might not be available after vCenter Server restarts. After a vCenter Server restart, the host configuration might not be available if vCenter Server cannot communicate with the host. After connectivity is restored, the configuration becomes available.
  • OVF Tool fails to upload OVF or OVA files larger than 10 GB. If you use OVF Tool to upload OVF or OVA files larger than 10 GB, the upload might fail.

ESXi:

  • Virtual machine crashes on ESXi 6.5 when multiple users log on to a Windows Terminal Server VM. A Windows 2012 terminal server running VMware Tools 10.1.0 on ESXi 6.5 stops responding when many users are logged in. vmware.log will show messages similar to:
    2017-03-02T02:03:24.921Z| vmx| I125: GuestRpc: Too many RPCI vsocket channels opened.
    2017-03-02T02:03:24.921Z| vmx| E105: PANIC: ASSERT bora/lib/asyncsocket/asyncsocket.c:5217
    2017-03-02T02:03:28.920Z| vmx| W115: A core file is available in "/vmfs/volumes/515c94fa-d9ff4c34-ecd3-001b210c52a3/h8-ubuntu12.04x64/vmx-debug-zdump.001"
    2017-03-02T02:03:28.921Z| mks| W115: Panic in progress... ungrabbing
  • An ESXi host might fail with purple diagnostic screen when collecting performance snapshots
    An ESXi host might fail with a purple diagnostic screen when collecting performance snapshots with vm-support, due to calls for memory access after the data structure has already been freed.
  • Full duplex configured on a physical switch may cause a duplex mismatch issue with the igb native Linux driver, which supports only auto-negotiate mode for NIC speed/duplex settings
    If you are using the igb native driver on an ESXi host, it always works in auto-negotiate speed and duplex mode. No matter what configuration you set up on this end of the connection, it is not applied on the ESXi side. The auto-negotiate support causes a duplex mismatch issue if a physical switch is set manually to full-duplex mode.
  • An ESXi host might fail with a purple screen and a Spin count exceeded (refCount) - possible deadlock with PCPU error. This can occur when you reboot the ESXi host under the following conditions:
    • You use the vSphere Network Appliance (DVFilter) in an NSX environment
    • You migrate a virtual machine with vMotion under DVFilter control
  • A Virtual Machine (VM) with an e1000/e1000e vNIC might have network connectivity issues. For a VM with an e1000/e1000e vNIC, when the e1000/e1000e driver tells the e1000/e1000e vmkernel emulation to skip a descriptor (the transmit descriptor address and length are 0), a loss of network connectivity might occur.
  • An ESXi host might stop responding when you migrate a virtual machine with Storage vMotion between ESXi 6.0 and ESXi 6.5 hosts. The vmxnet3 device tries to access the memory of the guest OS while the guest memory preallocation is in progress during the Storage vMotion migration. This results in an invalid memory access and causes the ESXi 6.5 host to fail.
  • Modification of the IOPS limit of virtual disks with Changed Block Tracking (CBT) enabled fails with errors in the log files. To define the storage I/O scheduling policy for a virtual machine, you can configure the I/O throughput for each virtual machine disk by modifying the IOPS limit. When you edit the IOPS limit while CBT is enabled for the virtual machine, the operation fails with the error The scheduling parameter change failed. Due to this problem, the scheduling policies of the virtual machine cannot be altered. The error message appears in the vSphere Recent Tasks pane. You can see the following errors in the /var/log/vmkernel.log file:
    2016-11-30T21:01:56.788Z cpu0:136101)VSCSI: 273: handle 8194(vscsi0:0):Input values: res=0 limit=-2 bw=-1 Shares=1000
    2016-11-30T21:01:56.788Z cpu0:136101)ScsiSched: 2760: Invalid Bandwidth Cap Configuration
    2016-11-30T21:01:56.788Z cpu0:136101)WARNING: VSCSI: 337: handle 8194(vscsi0:0):Failed to invert policy
  • When you hot-add an existing or new virtual disk to a CBT (Changed Block Tracking) enabled virtual machine (VM) residing on a VVOL datastore, the guest operating system might stop responding. The guest operating system might stop responding until the hot-add process completes. The VM’s unresponsiveness depends on the size of the virtual disk being added, and the VM automatically recovers once the hot-add completes.
  • When you use vSphere Storage vMotion, the UUID of a virtual disk might change. When you use vSphere Storage vMotion on vSphere Virtual Volumes storage, the UUID of a virtual disk might change. The UUID identifies the virtual disk, and a changed UUID makes the virtual disk appear as a new and different disk. The UUID is also visible to the guest OS and might cause drives to be misidentified.
  • An ESXi host might become unresponsive if the VMFS-6 volume has no space for the journal. When a VMFS-6 volume is opened, it allocates a journal block. Upon successful allocation, a background thread is started. If there is no space on the volume for the journal, the volume is opened in read-only mode and no background thread is initiated. Any attempt to close the volume then results in attempts to wake up a nonexistent thread, and the ESXi host fails.
  • SSD congestion might cause multiple virtual machines to become unresponsive. Depending on the workload and the number of virtual machines, diskgroups on the host might go into permanent device loss (PDL) state. This causes the diskgroups to not admit further I/Os, rendering them unusable until manual intervention is performed.
  • Unable to collect a vm-support bundle from an ESXi 6.5 host. When generating logs in ESXi 6.5 by using the vSphere Web Client, the select specific logs to export text box is blank, and the options (network, storage, fault tolerance, hardware, etc.) are blank as well. This occurs because the rhttpproxy port for /cgi-bin has a value different from 8303.
  • vSphere Storage vMotion might fail with an error message if it takes more than 5 minutes. The destination virtual machine of the vSphere Storage vMotion is incorrectly stopped by a periodic configuration validation for the virtual machine, so a vSphere Storage vMotion that takes more than 5 minutes fails with the message The source detected that the destination failed to resume.
    The VMkernel log from the ESXi host contains the message D: Migration cleanup initiated, the VMX has exited unexpectedly. Check the VMX log for more details.

vSAN:

  • Hosts in a vSAN cluster have high congestion which leads to host disconnects. When vSAN components with invalid metadata are encountered while an ESXi host is booting, a leak of reference counts to SSD blocks can occur. If these components are removed by policy change, disk decommission, or another method, the leaked reference counts cause the next I/O to the SSD block to get stuck. The log files build up, which causes high congestion and host disconnects.
  • vSAN cluster becomes partitioned after the member hosts and vCenter Server reboot. If the hosts in a unicast vSAN cluster and the vCenter Server are rebooted at the same time, the cluster might become partitioned. The vCenter Server does not properly handle unstable vpxd property updates during a simultaneous reboot of hosts and vCenter Server.
  • Large File System overhead reported by the vSAN capacity monitor. When deduplication and compression are enabled on a vSAN cluster, the Used Capacity Breakdown (Monitor > vSAN > Capacity) incorrectly displays the percentage of storage capacity used for file system overhead. This number does not reflect the actual capacity being used for file system activities. The fix ensures the display correctly reflects the File System overhead for a vSAN cluster with deduplication and compression enabled.

It’s also worth reading through the Known Issues section, as there is a fair bit to be aware of in Update 1 as well as a number of issues that remain from the GA release.

Happy upgrading!

References:

https://docs.vmware.com/en/VMware-vSphere/6.5/rn/vsphere-esxi-651-release-notes.html

https://docs.vmware.com/en/VMware-vSphere/6.5/rn/vsphere-vcenter-server-651-release-notes.html

Second vSphere Client (HTML5) update in vSphere 6.5U1

Introducing vSAN 6.6.1 and New Operational Savings

ESXi 6.5 Storage Performance Issues Resolved in Update 1

I originally came across the issue of slow storage performance with the native vmw_ahci driver that comes bundled with ESXi 6.5 just as I was first playing with my SuperMicro SYS-5028D-TN4T in my homelab. After I published a couple of posts about the workaround, the issue became quite prevalent in the community and the post continues to get decent traffic, meaning the issue impacted quite a few people out there.

The good news is that with the release of vSphere 6.5 Update 1 there is a fix for the problem in the form of updated drivers for the AHCI module. William Lam has been quick to blog about the fix and if you had previously disabled the driver you will need to re-enable it.
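A minimal sketch of re-enabling the native driver from the ESXi shell is below, assuming the vmw_ahci module name from the original workaround (a host reboot is required for the change to take effect):

    # Check the current state of the native AHCI driver module
    esxcli system module list | grep vmw_ahci

    # Re-enable the native driver so the patched 6.5 U1 version is used
    esxcli system module set --enabled=true --module=vmw_ahci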

This VMware KB covers the specific patch as listed in the release notes.

There’s no confirmation yet that it actually does the trick, but the release notes look promising, and the assumption is that it will resolve the issues so that homelabbers and people using the driver in production systems can rest easy.

References:

https://docs.vmware.com/en/VMware-vSphere/6.5/rn/vsphere-esxi-651-release-notes.html

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2149910

http://www.virtuallyghetto.com/2017/07/ahci-vmw_ahci-performance-issue-resolved-in-esxi-6-5-update-1.html

NestedESXi – Network Performance Improvements with Learnswitch

I’ve been running my NestedESXi homelab for about eight months now, but in all that time I had not installed or enabled the ESXi MAC Learning dvFilter. As a quick refresher, the VMware Fling addresses the issue with nested ESXi hosts and the impact that promiscuous mode has when enabled on virtual switches. In a nutshell, network traffic will hit all the network interfaces attached to the portgroup, which reduces network throughput, increases latency and impacts CPU.
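For context, promiscuous mode is in play because nested ESXi vNICs need to see traffic destined for the MACs of their own nested VMs. A hedged example of the relevant security policy on a standard vSwitch is below; the vSwitch name is illustrative, and most nested labs (mine included) set this on a dvSwitch portgroup via the UI instead:

    # Allow promiscuous mode and forged transmits on a standard vSwitch --
    # the nested ESXi networking prerequisite that causes the extra traffic
    esxcli network vswitch standard policy security set --vswitch-name=vSwitch0 --allow-promiscuous=true --allow-forged-transmits=true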

The ESXi MAC Learn dvFilter Fling was released about two years ago and it’s a must-have for those running nested ESXi in home or work labs. However, earlier this year a new Fling was released that improves on the dvFilter and addresses some of its limitations. The new native MAC Learning VMkernel module is called Learnswitch.

ESXi Learnswitch is a complete implementation of MAC Learning and Filtering and is designed as a wrapper around the host virtual switch. It supports learning multiple source MAC addresses on virtual network interface cards (vNIC) and filters packets from egressing the wrong port based on destination MAC lookup. This substantially improves overall network throughput and system performance for nested ESX and container use cases.

For a more in-depth look at its functionality, head over to William Lam’s blog post here.

dvFilter vs Learnswitch:

I was interested to see if the new Learnswitch offered any significant performance improvements over the dvFilter in addition to its main benefits. I went about installing and enabling the dvFilter in my lab and ran some basic performance tests using Crystal Disk Mark. Before that, I ran the performance test without either module installed as a baseline.

First, to see what the network traffic looks like hitting the nested hosts, you can see from the ESXTOP output below that each host is dealing with about the same amount of received packets. Overall throughput is reduced when this happens.

In terms of performance, the Crystal Disk Mark test run on a nested VM (right) showed reduced performance across all tests when compared to one run directly on the parent host (left).

There was also elevated datastore latency and significant CPU usage due to the overhead of the increased traffic hitting all interfaces.

The CPU usage alone shows the value in having the dvFilter or Learnswitch installed when running nested ESXi hosts.

With the baseline testing done, I installed and enabled the dvFilter and then ran the same tests. For a detailed look at how to install the dvFilter (just in case you don’t meet the requirements for using the Learnswitch module), check out my initial post on the dvFilter here. Having gone through that, I went about uninstalling the dvFilter and installing and configuring the Learnswitch.

Like the dvFilter, you need to download and install an ESXi software bundle, but unlike the dvFilter, you need to reboot the host to enable the Learnswitch module.
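The install itself is standard offline-bundle handling. A quick sketch, with the bundle path and filename illustrative rather than the exact Fling download name:

    # Install the Learnswitch offline bundle (path/filename illustrative --
    # use the bundle downloaded from the Fling page)
    esxcli software vib install -d /vmfs/volumes/datastore1/learnswitch-bundle.zip

    # Unlike the dvFilter, a host reboot is required before the module loads
    reboot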

As per the instructions on William Lam’s post or the Fling page, you then need to configure and run a Python script to enable the Learnswitch against the NestedESXi portgroups that have promiscuous mode enabled.

From there the impact of the module is immediate and you can see a normalization of network traffic hitting the interfaces of each NestedESXi host. When running the performance test, the ESXTOP output is significantly different to what you see when the module is not loaded, as shown below.

You also have access to a new command that lists out stats of the Learnswitch, showing packet and port statistics as well as the current MAC address table.

In terms of what it looks like from a performance point of view, below are the results of all Crystal Disk Mark tests. The bottom two represent the dvFilter (left) and the Learnswitch (right).

And finally, to look at the improvement in CPU performance with the modules installed, below is a timeline showing the performance tests run at different times across the last 24 hours…again a significant improvement. The graphs on the left-hand side show the testing without any module, moving across to the dvFilter test, with the Learnswitch test on the right-hand side. It does seem like the Learnswitch is a little lighter on CPU, but I can’t be 100% sure given my limited testing.

Conclusion:

As expected there isn’t a huge difference in performance between the two modules, but the features of the Learnswitch certainly make it the preferred choice of the two if the requirements are met. Again, the main advantages of the Learnswitch over the dvFilter make it a must-have addition to any NestedESXi environment. If you haven’t installed either yet…get onto it!

Important ESXi 6.0 Patch – Upgrade Now!

Last week VMware released a new patch (ESXi 6.0 Build 5572656) that addresses a number of serious bugs with snapshot operations. Usually I wouldn’t blog about a patch release, but when I looked through the rest of the fixes in the VMware KB it was apparent that this was more than your average VMware patch: it addresses a number of issues around storage and, again, a lot around snapshot operations, which are critical to most VM backup operations.

Here are some of the key resolutions that I’ve picked out from the patch release:

  • When you take a snapshot of a virtual machine, the virtual machine might become unresponsive
  • After you create a virtual machine snapshot of a SEsparse format, you might hit a rare race condition if there are significant but varying write IOPS to the snapshot. This race condition might make the ESXi host stop responding
  • Because of a memory leak, the hostd process might crash with the following error: Memory exceeds hard limit. Panic. The hostd logs report numerous errors such as Unable to build Durable Name. This kind of memory leak causes the host to get disconnected from vCenter Server
  • Using SESparse for both creating snapshots and cloning of virtual machines, might cause a corrupted Guest OS file system
  • During snapshot consolidation a precise calculation might be performed to determine the storage space required to perform the consolidation. This precise calculation can cause the virtual machine to stop responding, because it takes a long time to complete
  • Virtual Machines with SEsparse based snapshots might stop responding, during I/O operations with a specific type of I/O workload in multiple threads
  • When you reboot the ESXi host under the following conditions, the host might fail with a purple diagnostic screen and a PCPU xxx: no heartbeat error.
    • You use the vSphere Network Appliance (DVFilter) in an NSX environment
    • You migrate a virtual machine with vMotion under DVFilter control
  • Windows 2012 domain controllers support SMBv2, whereas the Likewise stack on ESXi supports only SMBv1. With this release, the Likewise stack on ESXi is enabled to support SMBv2
  • When unmap commands fail, the ESXi host might stop responding due to a memory leak in the failure path. You might receive the following error message in the vmkernel.log file: FSDisk: 300: Issue of delete blocks failed [sync:0], and the host becomes unresponsive
  • If you use SEsparse with the unmapping operation enabled to create snapshots and clones of virtual machines, the file system of the guest OS might be corrupt after the wipe operation (the storage unmap) completes. A full clone of the virtual machine, however, performs well

There are also a number of vSAN related fixes in the patch, so overall it’s worth applying this patch as soon as possible.
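If you patch hosts from the command line rather than via VUM, it’s the usual offline-bundle update. A hedged sketch, with the bundle filename being illustrative only (match it to the actual patch download for build 5572656):

    # Put the host into maintenance mode before patching
    esxcli system maintenanceMode set --enable true

    # Apply the offline patch bundle, then reboot
    esxcli software vib update -d /vmfs/volumes/datastore1/ESXi600-201706001.zip
    reboot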

References:

https://kb.vmware.com/kb/2149955

vSAN 6.6 – What’s In It For Service Providers

Last February when VMware released vSAN 6.2, I stated that “Things had gotten Interesting” as the 6.2 release finally marked vSAN’s arrival as a serious player in the Hyper-Converged Infrastructure (HCI) market, ready to be taken very seriously by VMware’s competitors. Fast forward fourteen months and, apart from the fact we have confirmed the v in vSAN is lowercase with the product name officially changing from Virtual SAN to vSAN, version 6.6 was announced last week, is set to GA today, and comes with the biggest list of new features and enhancements in vSAN’s history.

VMware has decided to break with the normal vSphere release cycle for vSAN and move to patch releases for vSphere that are actually major updates of vSAN. This is why this release is labeled vSAN 6.6 and will be included in the vSphere 6.5 EP2 build. The move allows the vSAN team to continue to enhance the platform outside of the core vSphere platform, and I believe it will deliver at least two update releases per year.

Looking at the new features and enhancements of the vSAN 6.6 release, it’s clear the platform has matured, and given the 7,000+ strong customer base it’s also clear that it’s being accepted more and more for critical workloads. From a service provider point of view, I know of a lot more vCloud Air Network partners that have implemented vSAN as not only their management HCI platform, but also now their customer HCI compute and storage platform.

A lot for Service Providers to like:

As shown in the feature timeline above there are 20+ new features and enhancements, but for me the following are the most relevant to vCAN Service Providers who are using, or looking to use, vSAN in their offerings. I will expand on the ones in red as I see them as the most significant of the new features and enhancements for service providers.

  • Native encryption for data-at-rest
  • Compliance certifications
  • vSAN Proactive Drive HA for failing drives
  • Resilient management independent of vCenter
  • Rapid recovery with smart, efficient rebuilds
  • Certified file service & data protection solutions
  • Enhanced vSAN SDK and PowerCLI
  • Simple networking with Unicast
  • vSAN Cloud Analytics for performance
  • vSAN Cloud Analytics with real-time support notification and recommendations*
  • vSAN Config Assist with 1-click hardware lifecycle management
  • Extended Health Services
  • Up to 50% greater IOPS for all-flash with optimized checksum and dedupe
  • Optimized for latest flash technologies
  • Expanded caching tier choice
  • New Docker Volume Driver

Simple networking with Unicast:

As John Nicholson wrote on the Virtual Blocks blog…it’s time to say goodbye to the multicast requirements around vSAN networking traffic. For a history as to why multicast was used, click here. It’s also worth reading John’s post where he goes through the upgrade process: if you are upgrading from a previous version, multicast will still be used unless you make the change, as specified here.

I can attest first-hand to the added complexity when it comes to setting up vSAN with multicast, having gone through a couple of painful deployments where the multicast configuration was an issue during initial setup and also caused issues with switching infrastructure that needed to be upgraded before vSAN could work reliably. In my mind unicast offers a simpler solution with minimal overhead and makes vSAN more transportable across networks.
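Once a cluster is running in unicast mode you can verify it from any host via the vSAN esxcli namespace; an empty list would suggest the cluster is still communicating over multicast. A quick sketch:

    # List the unicast agents this host communicates with --
    # populated entries indicate the cluster has switched to unicast
    esxcli vsan cluster unicastagent list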

Performance Improvements:

Service Providers are always trying to squeeze the most out of their hardware purchases, and with VMware claiming 50% greater IOPS for all-flash through optimized data services (in theory enabling 150K IOPS per host), along with optimized checksum and dedupe and support for the latest flash technologies, it appears they will be served well. The increased performance helps accelerate tenant workloads and provides higher consolidation ratios for those workloads.

Service providers can also adopt new hardware with support for the latest flash technologies, including the new breed of NVMe SSDs, which can deliver up to 250% greater performance for write-intensive applications. vSAN 6.6 now offers larger caching drive options, including 1.6TB flash drives, so that service providers can take advantage of larger capacity flash drives.

Disk Performance Enhancements:

For those that have gone through a vSAN rebuild operation, you would know that it can be a long exercise depending on the amount of data and the configuration of the vSAN datastore. vSAN 6.6 introduces a new smart rebuild and rebalancing feature along with partial repairs of degraded or absent components. There is also resync throttling and improved visibility into rebuild status through the Health Status. Cormac Hogan goes through the improvements in detail here.

From a Service Provider point of view, having these enhanced features around rebuilds is critical to continued quality of service for IaaS customers who live on shared vSAN storage. Shorter and more efficient rebuild times mean less impact on customers.

Health Checks and Monitoring Improvements:

vSAN Encryption:

VMware has introduced native encryption for data-at-rest at the vSAN datastore level. This can be enabled per vSAN cluster and works with deduplication and compression across hybrid and all-flash cluster configurations. vSAN 6.6 data encryption is hardware agnostic: there is no requirement to use specialized and more expensive Self-Encrypting Drives (SEDs), which is a bonus. Jase McCarty has another Virtual Blocks article here that goes through this feature in great detail.

From a Service Provider point of view, you can now potentially offer two classes of vSAN-backed storage for IaaS customers: one that lives on an encryption-enabled cluster and is charged at a premium over non-encrypted clusters. In talking with service providers across the globe, data-at-rest encryption has become something that potential customers are asking for, and most leading storage companies have an encryption story…now so does vSAN, and it appears to be market leading.

vSAN 6.6 Licensing:

In terms of the licensing matrix, nothing too drastic has changed except for the addition of Data-at-Rest Encryption in the Enterprise bundle. However, in a significant move for vCAN Service Providers, QoS IOPS Limiting has been extended across all license types and can now be taken advantage of across the board. This is good for Service Providers who look to offer different tiers of storage performance based on IOPS limits…previously this was only available under Enterprise licensing.

Bootstrapping UI:

As a bonus feature that I think will assist vCAN Service Providers, there is a new native bootstrap installer in vSAN 6.6. William Lam has written about the feature here, but for those looking to install their first vSAN node without vSphere available, the ability to bootstrap is invaluable. The old manual process is still worth looking at, as it’s always beneficial to know what’s going on in the background, but it’s now all GUI based via the VCSA installer.

Conclusion:

vSAN 6.6 appears to be a great step forward for VMware and Service Providers will no doubt be keen to upgrade as soon as possible to take advantage of the features and enhancements that have been delivered in this 6.6 release.

References:

http://cormachogan.com/2017/04/11/whats-new-vsan-6-6/ 

https://storagehub.vmware.com/#!/vmware-vsan/vmware-vsan-6-5-technical-overview

http://vsphere-land.com/news/an-overview-of-whats-new-in-vmware-vsan-6-6.html

https://storagehub.vmware.com/#!/vmware-vsan/vsan-multicast-removal/multicast-removal-steps-and-requirements/1

vSAN 6.6 Encryption Configuration

vSAN 6.6 – Native Data-at-Rest Encryption

Goodbye Multicast

Native VCSA bootstrap installer in vSAN 6.6

Top Posts 2016

2016 is pretty much done and dusted and it’s been a good year for Virtualization is Life! There was a more modest 70% increase in site visits this year compared to 2015, and a 2600% increase in visits since I began blogging in 2012. In 2016 I managed to produce 124 posts (including this one), which was slightly up on the 110 I produced in 2015, and in doing so I passed 300 total posts since I started here. I was fairly consistent in getting out at least eight posts per month, with June being my most prolific month with sixteen posts published.

Looking back through the statistics generated via JetPack, I’ve listed the Top 10 blog posts from the last 12 months. This year the opinion pieces seemed to be of interest to my readers, and there is still vCloud Director and NSX representation in the top ten, with my Veeam articles doing well. Again it was interesting to see that two of the most generic (and older) basic posts took out two of the top three spots. It shows that bloggers should not be afraid of blogging around simple topics, as there is an audience that will appreciate the content and get value out of the post.

  1. NSX Edge vs vShield Edge: Part 1 – Feature and Performance Matrix
  2. Quick Post: E1000 vs VMXNET3
  3. vSphere 6.0 vCenter Server Appliance: Upgrading from 5.x
  4. ESXi Bugs – VMware Can’t Keep Letting This Happen!
  5. Nutanix Buying PernixData: My Critical Analysis
  6. New NSX License Tier Thoughts and Transformers
  7. CBT Bugs – VMware Can’t Keep Letting This Happen!
  8. Veeam 9 Released: Top New Features
  9. Veeam’s Next Big Thing – Veeam has Arrived!
  10. vCloud Director 8: New Features And A New UI Addition…

I was honoured to have this blog voted #44 in the TopvBlog2016, and even with all the controversy around the voting I still hold that as a significant outcome of which I am very proud. I’d like to thank the readers and supporters of this blog for voting for me, and thanks must also go to my site sponsors, who are all listed on the right-hand side of this page.

With me moving across to vendor land it’s going to be interesting to see if I can keep up the variety of posts as I “narrow” down my core focus…however, I fully intend to keep pushing this blog while staying true to its roots of vCloud Director and core VMware technologies like NSX and vSAN. I have the homelab and the drive to continue to produce content around the things I am passionate about…and that now includes all things hosting and cloud, with a touch of availability 🙂

Stay tuned for an even bigger 2017!

#LongLivevCD

Quick Look – vSphere 6.5 Storage Space Reclamation

One of the cool newly enabled features of vSphere 6.5 is the comeback of VMFS storage space reclamation. On VMFS5 datastores this feature worked in a manual way, able to be triggered when you freed storage space inside a datastore by deleting or migrating a VM…or consolidating a snapshot. At a Guest OS level, storage space is freed when you delete files on a thinly provisioned VMDK and then exists as dead or stranded space. ESXi 6.5 supports automatic space reclamation (SCSI unmap) that originates from a VMFS datastore or a Guest OS…the mechanism reclaims unused space from VM disks that are thin provisioned.

When storage space is deleted without this automated feature the delete operation leaves blocks of unused space on the datastore. VMFS uses the SCSI unmap command to indicate to the array that the storage blocks contain deleted data, so that the array can unallocate these blocks.

On VMFS6 datastores, ESXi supports automatic asynchronous reclamation of free space. VMFS6 generally supports automatic space reclamation requests generated by guest operating systems and passes these requests to the array. Many guest operating systems can send the unmap command and do not require any additional configuration; those that do not support automatic unmaps might require user intervention.
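To see the guest-originated path in action, a Linux guest on a thin-provisioned VMDK can be trimmed manually. A quick sketch (whether the unmap actually passes through depends on the virtual hardware version and guest support, so treat this as illustrative):

    # Check whether the virtual disk advertises discard support to the guest
    lsblk --discard

    # Manually trim free blocks on the root filesystem so unmap requests
    # are passed down to the VMFS6 datastore
    sudo fstrim -v /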

I was interested in seeing if this worked as advertised, so I went about formatting a new VMFS6 datastore with the default options via the Web Client as shown below:

Heading over to the host’s command line, I checked the reclamation config using the new esxcli namespace:

Through the Web Client you can only set the Reclamation Priority to None or Low; however, through esxcli you can also set that value to medium or high, though as I’ve literally just found out, these esxcli-only settings don’t actually do anything in this release.
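For completeness, this is the namespace I was working with; the datastore label below is illustrative:

    # Check the current space reclamation settings for a VMFS6 datastore
    esxcli storage vmfs reclaim config get --volume-label=Datastore01

    # Set the priority -- the Web Client only exposes none/low, and as noted
    # above the medium/high values don't actually do anything in this release
    esxcli storage vmfs reclaim config set --volume-label=Datastore01 --reclaim-priority=high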

For the low setting, in terms of reclaim priority and how long before the process kicks off on the datastore, the expectation is that any blocks that are no longer used will be reclaimed within 12 hours. I kept track of a couple of VMs and the datastore sizes in general, and saw that after a day or so there was a difference in the available storage.

You can see that I clawed back about 22GB and 14GB on the two datastores in the first 24 hours. So my initial testing shows that this new feature is a valued and welcome addition to the vSphere 6.5 release. I know that Service Providers who thin provision but charge based on allocated storage will benefit greatly from this feature, as it automates a mechanism that was complex at best in previous releases.

There is also a great section on UNMAP in the vSphere 6.5 Core Storage White Paper, which has literally just been released as well and can be found here:

References:

http://pubs.vmware.com/vsphere-65/topic/com.vmware.ICbase/PDF/vsphere-esxi-vcenter-server-65-storage-guide.pdf

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2057513

vSphere 6.5 Core Storage White Paper Now Available

HomeLab – SuperMicro 5028D-TNT4 Storage Driver Performance Issues and Fix

OK, I’ll admit it…I’ve had serious lab withdrawals since having to give up the awesome Zettagrid Labs. Having a lab to tinker with goes hand in hand with being able to generate tech related content…case in point, my new homelab got delivered on Monday and I have been working to get things set up so that I can deploy my new NestedESXi lab environment.
The issue that I came across was to do with storage performance and the native driver that comes bundled with ESXi 6.5. With the release of vSphere 6.5 yesterday, the timing was perfect to install ESXi 6.5 and start to build my management VMs. I first noticed some issues when uploading the Windows 2016 ISO to the datastore, with the ISO taking about 30 minutes to upload. From there I created a new VM and installed Windows…this took about two hours to complete, which I knew was not as expected…especially with the datastore being a decent class SSD.
By way of a quick intro (a longer first impressions post is to follow), I purchased a SuperMicro SYS-5028D-TN4T that I based off this TinkerTry bundle, which has become a very popular system for vExpert homelabbers. It’s got an Intel Xeon D-1541 CPU and I loaded it up with 128GB of RAM. The system comes with an embedded Lynx Point AHCI controller that allows up to six SATA devices and is listed on the VMware Compatibility Guide for ESXi 6.5.

I created a new VM and kicked off a new install, but this time I opened ESXTOP to see what was going on. As you can see from the screenshots below, the kernel and disk write latencies were off the charts, topping 2000ms and 700-1000ms respectively…in throughput terms I was getting about 10-20MB/s when I should have been getting 400-500MB/s.

ESXTOP was showing the VM with even worse write latency.

I wondered if I had bought a lemon of a storage controller and checked the queue depth of the card. It’s listed with a QD of 31, which isn’t horrible for a homelab, so my attention turned to the driver. Again referencing the VMware Compatibility Guide, the device driver for the controller is listed as ahci version 3.0.22vmw.

I searched for the installed device driver modules and found that the one listed above was present, however there was also a native VMware device driver as well.

I confirmed that the storage controller was using the native VMware driver and went about disabling it as per this VMware KB (thanks to @fbuechsel who pointed me in the right direction in the vExpert Slack homelab channel) as shown below.
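For anyone hitting the same issue, the workaround boils down to two commands from the ESXi shell followed by a reboot. A minimal sketch based on the KB’s approach:

    # Confirm the native driver module is present and enabled
    esxcli system module list | grep vmw_ahci

    # Disable the native driver so the legacy ahci driver from the
    # compatibility guide is used instead (takes effect after a reboot)
    esxcli system module set --enabled=false --module=vmw_ahci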

After the host rebooted I checked to see if the storage controller was using the device driver listed in the Compatibility Guide. As you can see below, not only was it using that driver, but it was now showing the six HBA ports as opposed to just the one seen in the first snippet above.

I once again created a new VM and installed Windows, and this time the install completed in a little under five minutes! Quite a difference! Upon running a Crystal Disk Mark test I was now getting the expected speeds from the SSDs and things are moving along quite nicely.

Hopefully this post saves anyone else who might buy this, or other SuperMicro SuperServers, some time and helps them avoid getting caught out by poor storage performance caused by the native VMware driver packaged with ESXi 6.5.


References:

http://www.supermicro.com/products/system/midtower/5028/SYS-5028D-TN4T.cfm

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2044993

Free Guide: Building NetApp ONTAP 9 Lab

In computing, there is one thing you shouldn’t compromise on…and that thing is storage. This carries over to lab or NestedESXi environments, as poor lab performance can be just as frustrating as production performance issues. I’ve used a number of nested storage platforms for my lab environments and I’m always on the lookout for alternative solutions.

When Neil Anderson asked me to write a short introductory post on his new how-to guide Build Your Own NetApp ONTAP 9 Lab, I decided to flick through the guide to check it out and see if it could add any value to my future plans for a homelab. The e-book is professionally laid out and has excellent diagrams, notes and step-by-step instructions…it’s extremely comprehensive.

NetApp Simulator 9 Free eBook – Build Your Own NetApp ONTAP 9 Lab!

While I’ve never been a NetApp guy, there does seem to be a level of complexity in the NetApp VSA setup, but with the step-by-step instructions in the e-book any ambiguity is removed. If you are looking for lab storage, this is a great end-to-end example of how to install and configure one based on the ONTAP 9 NetApp Simulator.

Give it a look over here.

The Anatomy of a vBlog Part 1: Building a Blogging Platform

Earlier this week my good friend Matt Crape sent out a tweet lamenting the fact that he was having issues uploading media to WordPress…shortly after that tweet went out, Matt wasn’t short of Twitter and Slack vCommunity advice (follow the Twitter conversation below) and there were a number of options presented to Matt on how best to host his blogging site Matt That IT Guy.

Over the years I have seen that same question of “which platform is best” pop up a fair bit, and I thought it a perfect opportunity to dissect the anatomy of Virtualization is Life!. The question of which blogging platform is best doesn’t have a wrong or right answer; like most things in life, the platform that you use to host your blog depends on your own requirements and resources. For me, I’ve always believed in eating my own dog food and I’ve always liked total end-to-end control of the sites that I run. So while what I’m about to talk about worked for me…you might like to look at alternative options, but feel free to borrow from my example as I do feel it gives bloggers full flexibility and control.

Brief History:

Virtualization is Life! started out as Hosting is Life! back in April of 2012, and I chose WordPress at the time mainly due to its relatively simple installation and ease of use. The site was hosted on a Windows hosting platform that I had built at Anittel, utilizing WebsitePanel on IIS 7.5 and running FastCGI to serve the PHP content. The server backend was hosted on a VMware ESX cluster out of the Anittel Sydney zones. The cost of running this site was approximately $10 US per month.

Tip: At this stage the site was effectively on a shared hosting platform, which is a great way to start off as the costs should be low and maintenance and uptime should be covered by the hoster’s SLA.

Migration to Zettagrid:

When I started at Zettagrid, I had a whole new class of virtual infrastructure at my hands and decided to migrate the blog to one of Zettagrid’s Virtual Datacenter products, where I provisioned a vCloud Director vDC and created a vApp with a fresh Ubuntu VM inside. The migration from a Windows-based system to Linux went smoother than I thought, and I only had a few issues with some character maps after restoring the folder structure and database.

The VM itself is configured with the following hardware specs:

  • 2 vCPU (5GHz)
  • 4GB vRAM
  • 20GB Storage

As you can see above, the actual usage pulled from vCloud Director shows how little resource a VM with a single WordPress instance uses. That storage number actually represents the expanded size of a thin provisioned disk…actual usage on the file system is less than 3GB, and that is with four and a half years and about 290 posts worth of media and database content. I’ll go through site optimizations in Part 2, but in reality the amount of resources required to get you started is small…though you have to consider the occasional burst in traffic and work in a buffer, as I have done with my VM above.

The cost of running this Virtual Datacenter in Zettagrid is approx $120 US per month.

Tip: Even though I am using a vCloud Director vDC, given the small resource requirements initially needed, a VPS or instance-based service might be a better bet. Azure/AWS/Google all offer instance-based VMs, but a more boutique provider like DigitalOcean might be a better fit.

Networking and Security:

From a networking point of view I use the vShield/NSX Edge that comes with vCloud Director as my gateway device. This handles all my DHCP, NAT and firewall rules and is able to handle the site traffic with ease. If you want to look at what the vShield/NSX Edges are capable of, check out my NSX Edge vs vShield series. Both the basic vShield Edges and NSX Edges have decent load balancing features that can be used in high availability situations if required.

As shown below I configured the Gateway rules from the Zettagrid MyAccount Page but could have used the vCloud Director UI. For a WordPress site, the following services should be configured at a minimum.

  • Web (HTTP)
  • Secure Web (HTTPS)
  • FTP (Locked down to only accept connections from specific IPs)
  • SSH (Locked down to only accept connections from specific IPs)

OS and Web Platform Details:

As mentioned above, I chose Ubuntu as my OS of choice to run WordPress, though any Linux flavour would have done the trick. Choosing Linux over Windows obviously means you save on the Microsoft SPLA costs associated with hosting a Windows-based OS…a saving of around $20-$50 US a month right there. A Linux distro is a personal choice, so as long as you can install the following packages it doesn’t really matter which one you use.

  • SSH
  • PHP
  • MySQL
  • Apache
  • HTOP

The only thing I would suggest is that you use a long-term support distro, as you don’t want to be stuck on a build that can’t be upgraded or patched to protect against vulnerabilities and exploits. Essentially I am running a traditional LAMP stack (Linux, Apache, MySQL and PHP) built on a minimal install of Ubuntu with only SSH enabled. The upkeep and management of the OS and LAMP stack is not much, and I would estimate that I have spent about five to ten hours a year since deploying the original server dealing with updates and maintenance. Apache as a web server still performs well enough for a single blog site, though I know many that have made the switch to NGINX and use the LEMP stack.
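On Ubuntu, getting that stack in place is only a couple of commands. The package names below are current for the 16.04 LTS release and vary slightly between distros and releases, so treat this as a sketch:

    # Install Apache, MySQL and PHP, plus the PHP MySQL bindings WordPress needs
    sudo apt-get update
    sudo apt-get install apache2 mysql-server php libapache2-mod-php php-mysql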

The last package on this list is a personal favorite of mine…HTOP is an interactive process viewer for Unix systems that can be installed with a quick apt-get install htop command. As shown below it has a detailed interface and is much better than trying to work through standard top.

Tip: If you don’t want to deal with installing the OS or installing and configuring the LAMP packages, you can download a number of ready-made appliances that contain the LAMP stack. Turnkey Linux offers a number of appliances that can be deployed in OVA format, including a ready-made LAMP appliance as well as a ready-made WordPress appliance.

That covers off the hosting and platform components of this blog…in Part 2 I will go through my WordPress install in a little more detail, look at themes and plugins, and talk about how best to optimize a blogging site with the help of free caching and geo-distribution platforms.

References and Guides:

http://www.ubuntu.com/download/server

http://howtoubuntu.org/how-to-install-lamp-on-ubuntu

https://www.digitalocean.com/community/tutorials/how-to-install-linux-nginx-mysql-php-lemp-stack-in-ubuntu-16-04
