Tag Archives: Storage

Quick Look: Cloud Tier SOBR Offload Job

With the release of Update 4 for Veeam Backup & Replication 9.5 we introduced the Cloud Tier, an extension of the Scale-Out Backup Repository (SOBR). The Cloud Tier allows data to be stripped out of Veeam backup files and offloaded as blocks to Object Storage, leaving a dehydrated Veeam backup file on the local extents with just the metadata remaining in place. This is driven by a policy set against the SOBR that dictates the operational restore window for which local storage is used as the primary landing zone for backup data. The result is a space saving and a smaller footprint on the local storage.

Overview of Offload Job:

By default the offload job runs against the data located on the Performance Tier extents of the SOBR every 4 hours. This is a set value that cannot be changed. To offload the backup data to the Capacity Tier, the Offload Job does the following:

  • Verifies whether backup chains located on the Performance Tier extents satisfy validation criteria and can be offloaded to object storage.
  • Collects verified backup chains from each Performance Tier extent and sends them directly to object storage in the form of data blocks.
  • Saves each session's results to the configuration database so that you can review them upon request.

The job and job details can be viewed from the History Menu under System or the Home Menu under Last 24 Hours.

The details of the job will show how much data was offloaded to the Capacity Tier per VM residing on the SOBR. It will show statistics on how much data was processed, read and transferred. Once this job has completed, the local backup files only contain job metadata with the data residing on the Object Storage.

Forcing The Offload Job:

As mentioned, the Offload Job is set by default to run every 4 hours from the initial configuration of the Capacity Tier extent on the SOBR. The default value of 4 hours cannot be modified, however if you want to force the job to run you have two options.

The first option is through the UI: under the Backup Infrastructure menu, under Scale-Out Repositories, CONTROL+Click the SOBR and select the Run Tiering Job Now option. This option is hidden by default and will only be shown with the CONTROL+Click.

The second option is to run the following PowerShell command:
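As a rough sketch (the cmdlet and parameter names here are assumptions from memory and should be verified against the Veeam 9.5 Update 4 PowerShell reference), forcing the tiering job against a SOBR looks something like this:

  # Assumed cmdlet names - check the Veeam PowerShell reference before using
  # Grab the Scale-Out Backup Repository and trigger the offload/tiering sync against it
  Get-VBRBackupRepository -ScaleOut -Name "SOBR-Name" | Sync-VBRBackupRepository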

This triggers the Offload Job to run.

Note that once the Offload Job has been forced, the 4-hour counter is reset to the time the job was run…i.e. the next job will run 4 hours from the time the job was forced.

It’s important to understand that running the job on demand doesn’t necessarily mean that you will offload data to the Capacity Tier any quicker. The conditions around the operational restore window and sealed backup chains still need to be in place for the job to do its thing. Having the job run six times a day (every 4 hours) is generally going to be more than enough for most instances.

If no data has been offloaded, you will see the following in the job details:

Wrap Up and More Cloud Tier:

To learn more about the Cloud Tier head to my veeam.com post here, and also check out Rhys Hammond's post here. Also look out for a new Veeam White Paper being released in the next month or so which will deep dive into the Cloud Tier in more detail. I will publish a few more posts on the Cloud Tier over the next few weeks as well, looking at some more use cases and features.

References:

https://helpcenter.veeam.com/docs/backup/vsphere/capacity_tier.html?ver=95u4


How to Copy Amazon S3 Buckets with AWS CLI

I am doing some work on validated restore scenarios using the new Veeam Cloud Tier that is backed by an Object Storage Repository pointing at an Amazon S3 Bucket. So that I am not messing with the live data, I wanted a way to copy and access the objects from another bucket or folder. There is no option at the moment to achieve this via the AWS Console, however it can be done via the AWS CLI.

The first step was to ensure I had the AWS CLI installed on my MBP and that it was at the latest version:
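A quick sketch of that check (this assumes the CLI was installed via pip on macOS; a Homebrew or bundled installer would be upgraded differently):

  # confirm the AWS CLI is installed and report its version
  aws --version

  # upgrade to the latest release (pip-based install assumed)
  pip3 install --upgrade awscli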

For the first part of the copy process, I cheated and created a new Bucket from the AWS Console that was based on the one I wanted to copy.

The next step is to make sure that the AWS CLI is configured with the correct AWS Access and Secret keys. Once done, the command to copy/sync buckets is a simple one.
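A minimal sketch of those two steps, with placeholder bucket names:

  # set the AWS Access Key, Secret Key, default region and output format
  aws configure

  # copy/sync all objects from the source bucket to the destination bucket
  aws s3 sync s3://source-bucket-name s3://destination-bucket-name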

Obviously the time to complete the operation will depend on the number of Objects in the Bucket and whether it's cross-region or local. It took about 4 hours to copy across ~50GB of data from US-EAST-2 to US-WEST-2, going at about 4MB/s. By default the progress is shown on the screen.

Once the first pass was complete I ran the same command again, which this time looks for differences between the source and destination and only syncs the differences. You can run the command below to view the Total Objects and Total Size of both buckets for comparison.
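Something like the following (the bucket name is a placeholder) prints the object count and total size at the end of the listing:

  # recursively list the bucket and summarise Total Objects / Total Size
  aws s3 ls s3://source-bucket-name --recursive --human-readable --summarize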

That is it! Pretty simple process. I'll blog about the actual reason behind the Veeam Cloud Tier requirement and put this into action at a later date!

References:

https://docs.aws.amazon.com/cli/latest/userguide/install-macos.html

https://aws.amazon.com/premiumsupport/knowledge-center/move-objects-s3-bucket

Quick Fix: vSAN Health Reports iSCSI Target Service Stopped

A few weeks ago I wrote about using iSCSI as a backup repository target. While still running this POC in my environment I came across an error in the vSAN Health Checker stating the vSAN iSCSI target service was in a Failed state. Drilling down into the vSAN Health check tree I could see a Service Runtime status of stopped as shown below against the host.

This host had recently been marked as unreachable in vCenter and required a Management Agent reset to bring it back online. There is a chance that process stopped the iSCSI Target service but did not start it again. In any case, there is an easy way to see the status of the services and then get them back online, as shown below.
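As a rough sketch (the service name below is an assumption from memory, so confirm it against the KB referenced at the end of this post), checking and starting the vSAN iSCSI target daemon from the host's shell looks something like this:

  # check the runtime status of the vSAN iSCSI target daemon (service name assumed)
  /etc/init.d/vitd status

  # start the daemon if it is reported as stopped
  /etc/init.d/vitd start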

Once that’s been done, a re-run of the vSAN Health checker will show that the issue has been resolved and the iSCSI Target Service on the host is now running.

References:

https://kb.vmware.com/s/article/2147603


Released: vSAN 6.7 – HTML5 Goodness, Enhanced Health Checks and More!

VMware has announced the general availability of vSAN 6.7. As vSAN continues to grow, VMware is very buoyant about how it's performing in the market. With some 10,000 customers and a run rate of over $600 million, they claim to lead the HyperConverged market with a 32% market share. From my point of view it's great to see vSAN being deployed across 250 cloud providers and have it as the cornerstone storage of the VMware Cloud on AWS solution. vSAN 6.7 focuses on an intuitive operational experience, a consistent application experience and a holistic support experience.

New Features and Enhancements:

  • HTML5 User Interface
  • Embedded vROPs plugin for HTML5 User Interface
  • Support for Windows Failover Cluster using iSCSI
  • Adaptive Resync Performance Improvements
  • Destaging Performance Improvements
  • More Efficient data placement during Host Decommissioning
  • Improved Space Efficiency
  • Faster Failover with Redundant vSAN Networks
  • Optimized Witness Traffic Separation
  • Stretched Cluster Improvements
  • Host Affinity for Next-Gen Applications
  • Health Check Enhancements
  • Enhanced Diagnostics
  • vSAN Support Insight
  • 4Kn Device Support
  • Improved FIPS 140-2 Validation Security

There are a lot of enhancements in this release and while not as ground-breaking as the 6.6 release last year, there is still a lot to like about how VMware is improving the platform. From the list above, I've taken the key ones from my point of view and expanded on them a little.

HTML5 User Interface:

As has been the trend with all VMware products of late, vSAN is getting the Clarity Framework overhaul and is being included in the HTML5 vSphere Web Client with new vSAN tasks and workflows developed from the ground up to simplify the experience. There is also new vSAN functionality that can only be accessed via the HTML5 client.

The legacy Flex client will still be available for use, and it's also worth noting that this is not a direct port of the Flex interface but a rebuild from the ground up. This has resulted in a more efficient experience for the user with fewer clicks and less time to action items. Any new features or enhancements will only be seen in the new HTML5 UI.

Support for Windows Failover Cluster using iSCSI:

A few weeks back I posted about how you can use vSAN as a Veeam repository using the iSCSI feature. With vSAN 6.7 there is official support for Windows Failover Clustering using the vSAN iSCSI service. Lots of people still run MSCS and a lot still use traditional clustering. Support covers physical and virtual guest iSCSI initiators and includes transparent failover of clusters with vSAN iSCSI volumes.

I’m not sure if this now means that iSCSI volumes are supported as Veeam Cloud Repositories…but I will confirm either way.

Adaptive Resync Performance Improvements:

vSAN 6.7 introduces a new Adaptive Resync feature that will make sure resources are available for both VM IO and resync IO. This ensures that under IO stress certain traffic types are not starved of resources, and allows more bandwidth to be used when there are periods of less contention. Under contention, resync IO will be guaranteed at least 20% of the bandwidth, and if no resync traffic exists, VM IO may consume 100%. This effectively regulates reads and writes to ensure an optimal balance for VM and resync IO.

Destaging Performance Improvements:

vSAN 6.7 looks to be more consistent when it comes to data optimizations in the data path. With faster destaging, data drains more quickly from the write buffer to the capacity tier, which makes the buffer tier available for new IO sooner. This is done via improved in-memory handling of IO during destaging that delivers higher throughput and more consistency, which in turn improves the overall performance of VM and resync IO.

More Efficient data placement during Host Decommissioning:

When putting a host into maintenance mode or decommissioning a host you need to select the evacuation type for the objects on that host. This can take time depending on the amount of data. vSAN 6.7 builds on improvements introduced in 6.6 that consolidate replicas living across multiple hosts while maintaining FTT compliance. It looks for the smallest component to move, which results in less data being rebuilt and less temporary space usage. vSAN will provide more intelligence behind the data movement to reduce the time and effort it takes to put a host into maintenance mode.

Improved Space Efficiency:

In previous vSAN versions the VM swap object was always thick provisioned even if the VM itself was thin. In vSAN 6.7 it will now be thin by default and will also inherit the policy from the VM, so that the FTT of the swap object is consistent with the VM, resulting in more efficient storage. Previous to this, large environments would suffer with a large number of swap files taking up a disproportionate amount of space.


Conclusion:

vSAN continues to be improved by VMware and they have addressed some core usability and efficiency features in this 6.7 release. The move to the HTML5 web client was expected, but is still good to see, while the enhancements in resync and destaging all contribute to platform stability. The enhanced health checks add a new dimension to vSAN troubleshooting, and the support insight allows users to get a better view of what's happening on their instances.

References:

Pre release information and images sourced via VMware EABP

https://blogs.vmware.com/virtualblocks/2018/04/17/whats-new-vmware-vsan-6-7/


The One Problem with the VCSA

Over the past couple of months I noticed a trend in my top blog daily reporting…the Quick Fix post on fixing a 503 Service Unavailable error was constantly in the top 5 and getting significant views. The 503 error in various forms has been around since the early days of the VCSA and usually manifests itself with the following.

503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http20NamedPipeServiceSpecE:0x0000559b1531ef80] _serverNamespace = / action = Allow _pipeName =/var/run/vmware/vpxd-webserver-pipe)

Looking at the traffic stats for that post it’s clear to see an upward trend in the page views since about the end of June.

This to me is both a good and bad thing. It tells me that more people are deploying or migrating to the VCSA which is what VMware want…but it also tells me that more people are running into this 503 error and looking for ways to fix it online.

The Very Good:

The vCenter Server Appliance is a brilliant initiative from VMware and there has been a huge effort in developing the platform over the past three to four years to get it to a point where it not only became equal to vCenters deployed on Windows (and relying on MSSQL) but surpassed them in a lot of features, especially in the vSphere 6.5 release. Most VMware shops are planning to or have migrated from Windows to the VCSA, and for VMware labs it's a no-brainer for both corporate and homelab instances.

Personally I've been running VCSAs in my various labs since the 5.5 release, have deployed key management clusters with the VCSA, and more recently have proven that even the most mature Windows vCenter can be upgraded with the excellent migration tool. Being free of Windows and more importantly MSSQL is a huge factor in why the VCSA is an important consideration, and the fact you get extra goodies like HA and API UIs adds to its value.

The One Bad:

Everyone who has dealt with storage issues knows that they can lead to Guest OS file system errors. I've been involved with shared hosting storage platforms all my career, so I know how fickle filesystems can be when faced with storage latency or loss of connectivity. Reading through the many forums and blog posts around the 503 error, there seems to be a common denominator of something going wrong with the underlying storage before a reboot triggers the 503 error. Clicking here will show the Google results for VCSA + 503 where you can read the various posts mentioned above.

As you may or may not know, the 6.5 VCSA has twelve VMDKs, up from 2 in the initial release and 11 in the 6.0 release. There are a couple of great posts from William Lam and Mohammed Raffic that go through what each disk partition does. The big advantage of having these separate partitions is that you can manage storage space a lot more granularly.

The problem as mentioned is that the underlying Linux file system is susceptible to storage issues. No matter what storage platform you are running, you are guaranteed to have issues at one point or another, and in my experience Linux filesystems don't deal well with them. Windows file systems seem to tolerate storage issues much better than their Linux counterparts, and without starting a religious war I do know about the various tweaks that can be done to help make Linux filesystems more resilient to underlying storage issues.

With that in mind, the VCSA is very much susceptible to those same storage issues and I believe a lot of people are running into problems mainly triggered by storage related events. Most of the symptoms of the 503 relate back to key vCenter services being unable to start after a reboot. This usually requires some intervention to fix or a recovery of the VCSA from backup, but hopefully all that's needed is to run an e2fsck against the filesystem(s) impacted.
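As a rough sketch of that last resort (the device name is a placeholder and will differ between VCSA deployments, so identify the affected filesystem first), the approach from the appliance shell in maintenance or emergency mode is:

  # identify the appliance's filesystems and find the one reporting errors
  lsblk -f

  # check and repair the impacted (unmounted) filesystem - /dev/sda3 is a placeholder
  e2fsck -y /dev/sda3

  # reboot and confirm the vCenter services start cleanly
  reboot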

The Solution:

VMware is putting a lot of faith into the VCSA and has done a tremendous job of developing it up to this point. It is the only option moving forward for VMware based platforms, however there needs to be a little more work done on the resiliency of the services to protect against external issues that can impact the guest OS. PhotonOS is now the OS of choice from 6.5 onwards, but that will not stop the legacy of susceptibility that comes with Linux based filesystems leading to issues such as the 503 error. If VMware can protect key services in the event of storage issues, that will go a long way to improving that resiliency.

I believe it will get better, and just this week VMware announced a monthly security patch program for the VCSA which shows that they are serious (not to say they were not before) about ensuring the appliance is protected. But I'm sure many would agree that it needs to offer reliability as well…this is the one area where the Windows based vCenter still has an advantage.

With all that said, make sure you are doing everything possible to have the VCSA housed on storage that is as reliable as possible, and make sure that you are not only backing up the VCSA and its external dependencies correctly but also understand how to restore the appliance, including the inbuilt mechanisms for backing up the config and the PostgreSQL database.

I love and would certainly recommend the VCSA…I just want to love it a little more without having to deal with the possibility of the 503 error lurking around every storage event.

References:

http://www.vmwarearena.com/understanding-vcsa-6-5-vmdk-partitions-mount-points/

http://www.virtuallyghetto.com/2016/11/updates-to-vmdk-partitions-disk-resizing-in-vcsa-6-5.html

https://www.veeam.com/wp-vmware-vcenter-server-appliance-backup-restore.html

https://kb.vmware.com/kb/2091961

https://kb.vmware.com/kb/2147154

ESXI 6.5 Storage Performance Issues Resolved in Update 1

I originally came across the issue of slow storage performance with the native vmw_ahci driver that comes bundled with ESXi 6.5 just as I was first playing with my SuperMicro SYS-5028D-TN4T in my homelab. After publishing a couple of posts about the workaround shortly afterwards, the issue became quite prevalent in the community and the post continues to get decent traffic, meaning that it impacted quite a few people out there.

The good news is that with the release of vSphere 6.5 Update 1 there is a fix for the problem in the form of updated drivers for the AHCI module. William Lam has been quick to blog about the fix, and if you had previously disabled the driver you will need to re-enable it.
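If you applied the original workaround, a minimal sketch of re-enabling the native module from the ESXi shell (followed by a host reboot) looks like this:

  # re-enable the native AHCI driver that the workaround disabled
  esxcli system module set --enabled=true --module=vmw_ahci

  # reboot the host for the change to take effect
  reboot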

This VMwareKB covers the specific patch as listed in the release notes:

There is no confirmation as of yet that it actually does the trick, but the release notes look promising, and the assumption is that it will resolve the issues so that homelabbers and people using the driver in production systems can rest easy.

References:

https://docs.vmware.com/en/VMware-vSphere/6.5/rn/vsphere-esxi-651-release-notes.html

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2149910

http://www.virtuallyghetto.com/2017/07/ahci-vmw_ahci-performance-issue-resolved-in-esxi-6-5-update-1.html

vSAN 6.6 – What’s In It For Service Providers

Last February when VMware released VSAN 6.2 I stated that "Things had gotten Interesting" with regards to the 6.2 release of vSAN finally marking its arrival as a serious player in the Hyper-converged Infrastructure (HCI) market. vSAN was ready to be taken very seriously by VMware's competitors. Fast forward fourteen months and, apart from the fact we have confirmed the v in vSAN is lower case with the product name officially changing from Virtual SAN to vSAN…Version 6.6 was announced last week, is set to GA today, and with it comes the biggest list of new features and enhancements in vSAN's history.

VMware has decided to break with the normal vSphere release cycle for vSAN and move to patch releases for vSphere that are actually major updates of vSAN. This is why this release is labelled vSAN 6.6 and will be included in the vSphere 6.5 EP2 build. The move allows the vSAN team to continue to enhance the platform outside of the core vSphere platform, and I believe it will deliver at least two update releases per year.

Looking at the new features and enhancements of the vSAN 6.6 release it's clear to see that the platform has matured, and given the 7000+ strong customer base it's also clear to see that it's being accepted more and more for critical workloads. From a service provider point of view I know of a lot more vCloud Air Network partners that have implemented vSAN as not only their Management HCI platform, but also now their customer HCI compute and storage platforms.

A lot for Service Providers to like:

As shown in the feature timeline above there are 20+ new features and enhancements, but for me the following ones are most relevant to vCAN Service Providers who are using, or looking to use, vSAN in their offerings. I will expand on the ones in red as I see them as being the most significant of the new features and enhancements for service providers.

  • Native encryption for data-at-rest
  • Compliance certifications
  • vSAN Proactive Drive HA for failing drives
  • Resilient management independent of vCenter
  • Rapid recovery with smart, efficient rebuilds
  • Certified file service & data protection solutions
  • Enhanced vSAN SDK and PowerCLI
  • Simple networking with Unicast
  • vSAN Cloud Analytics for performance
  • vSAN Cloud Analytics with real-time support notification and recommendations*
  • vSAN Config Assist with 1-click hardware lifecycle management
  • Extended Health Services
  • Up to 50% greater IOPS for all-flash with optimized checksum and dedupe
  • Optimized for latest flash technologies
  • Expanded caching tier choice
  • New Docker Volume Driver

Simple networking with Unicast:

As John Nicholson wrote on the Virtual Blocks blog…it's time to say goodbye to the multicast requirements around vSAN networking traffic. For a history as to why multicast was used, click here. It's also worth reading John's post where he goes through the upgrade process; if you are upgrading from a previous version, multicast will still be used unless you make the change as specified here.

I can attest first hand to the added complexity when it comes to setting up vSAN with multicast, and have gone through a couple of painful deployments where the multicast configuration was an issue during initial setup and also caused issues with switching infrastructure that needed to be upgraded before vSAN could work reliably. In my mind unicast offers a simpler, less complex solution with minimal overheads and makes it more transportable across networks.
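As a quick sketch (assuming a host already running vSAN 6.6; the exact output will vary by build), you can confirm unicast membership from the ESXi shell:

  # list the unicast agents this host communicates with - entries indicate unicast mode
  esxcli vsan cluster unicastagent list

  # show general vSAN cluster membership information for the host
  esxcli vsan cluster get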

Performance Improvements:

Service Providers are always trying to squeeze the most out of their hardware purchases, and with VMware claiming 50% greater IOPS for all-flash through optimized data services that in theory can enable 150K IOPS per host, it appears they will be served well. This comes in addition to optimized checksum and dedupe along with support for the latest flash technologies. The increased performance helps accelerate tenant workloads and provides higher consolidation ratios for those workloads.

Service providers can accelerate new hardware technologies with the support of the latest flash technologies, including solutions like the new breed of NVMe SSDs. These solutions can deliver up to 250% greater performance for write-intensive applications. vSAN 6.6 now offers larger caching drive options that includes 1.6TB flash drives, so that service providers can take advantage of larger capacity flash drives.

Disk Performance Enhancements:

For those that have gone through a vSAN rebuild operation, you would know that it can be a long exercise depending on the amount of data and the configuration of the vSAN datastore. vSAN 6.6 introduces a new smart rebuild and rebalancing feature along with partial repairs of degraded or absent components. There is also resync throttling and improved visibility into the rebuild status through the Health Status. Cormac Hogan goes through the improvements in detail here.

From a Service Provider point of view, having these enhanced features around rebuilds is critical to continued quality of service for IaaS customers who live on shared vSAN storage. Shorter and more efficient rebuild times mean less impact to customers.

Health Checks and Monitoring Improvements:

vSAN Encryption:

VMware has introduced VM encryption native at the vSAN datastore level. This can be enabled per vSAN Cluster and works with deduplication and compression across hybrid and all-flash cluster configurations. vSAN 6.6 data encryption is hardware agnostic; there is no requirement to use specialized and more expensive Self-Encrypting Drives (SEDs), which is also a bonus. Jase McCarty has another Virtual Blocks article here that goes through this feature in great detail.

From a Service Provider point of view you can now potentially offer two classes of vSAN backed storage for IaaS customers: one that lives on an encryption-enabled cluster and is charged at a premium over non-encrypted clusters. In talking with service providers across the globe, data at rest encryption has become something that potential customers are asking for, and most leading storage companies have an encryption story…now so does vSAN, and it appears to be market leading.

vSAN 6.6 Licensing:

In terms of the licensing matrix, nothing too drastic has changed except for the addition of Data at Rest Encryption in the Enterprise bundle. However, in a significant move for vCAN Service Providers, QoS IOPS Limiting has been extended across all license types and can now be taken advantage of across the board. This is good for Service Providers who look to offer different tiers of storage performance based on IOPS limits…previously it was only available under Enterprise licensing.

Bootstrapping UI:

As a bonus feature that I think will assist vCAN Service Providers, there is a new native bootstrap installer in vSAN 6.6. William Lam has written about the feature here, but for those looking to install their first vSAN node without vSphere available, the ability to bootstrap is invaluable. The old manual process is still worth looking at as it's always beneficial to know what's going on in the background, but it's all GUI based now via the VCSA installer.

Conclusion:

vSAN 6.6 appears to be a great step forward for VMware and Service Providers will no doubt be keen to upgrade as soon as possible to take advantage of the features and enhancements that have been delivered in this 6.6 release.

References:

http://cormachogan.com/2017/04/11/whats-new-vsan-6-6/ 

https://storagehub.vmware.com/#!/vmware-vsan/vmware-vsan-6-5-technical-overview

http://vsphere-land.com/news/an-overview-of-whats-new-in-vmware-vsan-6-6.html

https://storagehub.vmware.com/#!/vmware-vsan/vsan-multicast-removal/multicast-removal-steps-and-requirements/1

vSAN 6.6 Encryption Configuration

https://blogs.vmware.com/virtualblocks/2017/04/11/vsan-6-6-native-data-at-rest-encryption/

https://blogs.vmware.com/virtualblocks/2017/04/11/goodbye-multicast/

Native VCSA bootstrap installer in vSAN 6.6

ESXi 6.5 Storage Performance Issues and Fix

[NOTE] : I decided to republish this post with a new heading and skip right to the meat of the issue as I’ve had a lot of people reach out saying that the post helped them with their performance issues on ESXi 6.5. Hopefully people can find the content easier and have a fix in place sooner.

The issue that I came across was to do with storage performance and the native driver that comes bundled with ESXi 6.5. With the release of vSphere 6.5 yesterday, the timing was perfect to install ESXi 6.5 and start to build my management VMs. I first noticed some issues when uploading the Windows 2016 ISO to the datastore, with the ISO taking about 30 minutes to upload. From there I created a new VM and installed Windows…this took about two hours to complete, which I knew was not what I expected…especially with the datastore being a decent class SSD.

I created a new VM and kicked off a new install, but this time I opened ESXTOP to see what was going on. As you can see from the screenshots below, the kernel and disk write latencies were off the charts, topping 2000ms and 700-1000ms respectively…In throughput terms I was getting about 10-20MB/s when I should have been getting 400-500MB/s.

ESXTOP was showing the VM with even worse write latency.

I wondered if I had bought a lemon of a storage controller and checked the Queue Depth of the card. It's listed with a QD of 31, which isn't horrible for a homelab, so my attention turned to the driver. Referencing the VMware Compatibility Guide again, the device driver listed for the controller is ahci version 3.0.22vmw.

I searched the installed device driver modules and found that the one listed above was present, however there was also a native VMware device driver as well.
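A minimal sketch of that check from the ESXi shell (the filter is just illustrative):

  # list the driver modules and filter for the two AHCI drivers
  esxcli system module list | grep -i ahci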

I confirmed that the storage controller was using the native VMware driver and went about disabling it as per this VMwareKB (thanks to @fbuechsel who pointed me in the right direction in the vExpert Slack Homelab Channel) as shown below.
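For reference, the approach boils down to disabling the native module and rebooting; a rough sketch:

  # disable the native AHCI module so ESXi falls back to the legacy ahci driver
  esxcli system module set --enabled=false --module=vmw_ahci

  # reboot the host for the change to take effect
  reboot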

After the host rebooted I checked to see if the storage controller was using the device driver listed in the Compatibility Guide. As you can see below, not only was it using that driver, but it was now showing the six HBA ports as opposed to just the one seen in the first snippet above.
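One way to verify which driver each adapter is bound to after the reboot (a sketch; output will differ by system):

  # list the storage adapters along with the driver each one is using
  esxcfg-scsidevs -a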

I once again created a new VM and installed Windows, and this time the install completed in a little under five minutes! Quite a difference! Upon running CrystalDiskMark I was now getting the expected speeds from the SSDs and things are moving along quite nicely.

Hopefully this post saves anyone else who might buy this, or other SuperMicro SuperServers, some time and not get caught out by poor storage performance caused by the native VMware driver packaged with ESXi 6.5.


References:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2044993

Quick Look – vSphere 6.5 Storage Space Reclamation

One of the cool newly enabled features of vSphere 6.5 is the comeback of VMFS storage space reclamation. On VMFS5 datastores this was a manual process, triggered after you freed storage space inside a datastore by deleting or migrating a VM…or consolidating a snapshot. At a Guest OS level, storage space is freed when you delete files on a thinly provisioned VMDK, after which it exists as dead or stranded space. ESXi 6.5 supports automatic space reclamation (SCSI unmap) that originates from a VMFS datastore or a Guest OS…the mechanism reclaims unused space from VM disks that are thin provisioned.

When storage space is deleted without this automated feature the delete operation leaves blocks of unused space on the datastore. VMFS uses the SCSI unmap command to indicate to the array that the storage blocks contain deleted data, so that the array can unallocate these blocks.

On VMFS6 datastores, ESXi supports automatic asynchronous reclamation of free space. VMFS6 generally supports automatic space reclamation requests that generate from the guest operating systems, and passes these requests to the array. Many guest operating systems can send the unmap command and do not require any additional configuration. The guest operating systems that do not support automatic unmaps might require user intervention.

I was interested in seeing if this worked as advertised, so I went about formatting a new VMFS6 datastore with the default options via the Web Client as shown below:

Heading over to the host's command line, I checked the reclamation config using the new esxcli namespace:
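A sketch of that check (the datastore label is a placeholder, and the option name assumes the same volume-label parameter used elsewhere in the esxcli storage vmfs namespace):

  # show the space reclamation (unmap) settings for a VMFS6 datastore
  esxcli storage vmfs reclaim config get --volume-label=VMFS6-DS-01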

Through the Web Client you can only set the Reclamation Priority to None or Low; through esxcli you can also set that value to medium or high (as well as low or none), but as I've literally just found out, these esxcli-only settings don't actually do anything in this release.
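For completeness, a sketch of setting the priority from the command line (again, the datastore label is a placeholder and the parameter names are assumptions worth verifying against the esxcli help output):

  # set the reclaim priority on the datastore - only none/low are honoured in this release
  esxcli storage vmfs reclaim config set --volume-label=VMFS6-DS-01 --reclaim-priority=low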

For the low setting, the expectation is that any blocks that are no longer used will be reclaimed within 12 hours of being freed. I was keeping track of a couple of VMs and the datastore sizes in general and saw that after a day or so there was a difference in the available storage.

You can see that I clawed back about 22GB and 14GB on the two datastores in the first 24 hours. So my initial testing with this new feature shows that it's a valued and welcome addition to the new vSphere 6.5 release. I know that Service Providers that thin provision but charge based on allocated storage will benefit greatly from this feature, as it automates a mechanism that was complex at best in previous releases.

There is also a great section around UNMAP in the vSphere 6.5 Core Storage White Paper that's literally just been released and can be found here:

References:

http://pubs.vmware.com/vsphere-65/topic/com.vmware.ICbase/PDF/vsphere-esxi-vcenter-server-65-storage-guide.pdf

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2057513

vSphere 6.5 Core Storage White Paper Now Available

HomeLab – SuperMicro 5028D-TNT4 Storage Driver Performance Issues and Fix

Ok, I'll admit it…I've had serious lab withdrawals since having to give up the awesome Zettagrid Labs. Having a lab to tinker with goes hand in hand with being able to generate tech related content…case in point, my new homelab got delivered on Monday and I have been working to get things set up so that I can deploy my new NestedESXi lab environment.
By way of a quick intro (longer first impressions post to follow), I purchased a SuperMicro SYS-5028D-TN4T that I based off this TinkerTry bundle, which has become a very popular system for vExpert homelabbers. It's got an Intel Xeon D-1541 CPU and I loaded it up with 128GB of RAM. The system comes with an embedded Lynx Point AHCI controller that allows up to six SATA devices and is listed on the VMware Compatibility Guide for ESXi 6.5.

The issue that I came across was to do with storage performance and the native driver that comes bundled with ESXi 6.5. With the release of vSphere 6.5 yesterday, the timing was perfect to install ESXi 6.5 and start to build my management VMs. I first noticed some issues when uploading the Windows 2016 ISO to the datastore, with the ISO taking about 30 minutes to upload. From there I created a new VM and installed Windows…this took about two hours to complete, which I knew was not what I expected…especially with the datastore being a decent class SSD.

I created a new VM and kicked off a new install, but this time I opened ESXTOP to see what was going on. As you can see from the screenshots below, the kernel and disk write latencies were off the charts, topping 2000ms and 700-1000ms respectively…In throughput terms I was getting about 10-20MB/s when I should have been getting 400-500MB/s.

ESXTOP was showing the VM with even worse write latency.

I wondered if I had bought a lemon of a storage controller and checked the Queue Depth of the card. It's listed with a QD of 31, which isn't horrible for a homelab, so my attention turned to the driver. Referencing the VMware Compatibility Guide again, the device driver listed for the controller is ahci version 3.0.22vmw.

I searched the installed device driver modules and found that the one listed above was present, however there was also a native VMware device driver as well.
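A minimal sketch of that check from the ESXi shell (the filter is just illustrative):

  # list the driver modules and filter for the two AHCI drivers
  esxcli system module list | grep -i ahci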

I confirmed that the storage controller was using the native VMware driver and went about disabling it as per this VMwareKB (thanks to @fbuechsel who pointed me in the right direction in the vExpert Slack Homelab Channel) as shown below.
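For reference, the approach boils down to disabling the native module and rebooting; a rough sketch:

  # disable the native AHCI module so ESXi falls back to the legacy ahci driver
  esxcli system module set --enabled=false --module=vmw_ahci

  # reboot the host for the change to take effect
  reboot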

After the host rebooted I checked to see if the storage controller was using the device driver listed in the Compatibility Guide. As you can see below, not only was it using that driver, but it was now showing the six HBA ports as opposed to just the one seen in the first snippet above.

I once again created a new VM and installed Windows, and this time the install completed in a little under five minutes! Quite a difference! Upon running CrystalDiskMark I was now getting the expected speeds from the SSDs and things are moving along quite nicely.

Hopefully this post saves anyone else who might buy this, or other SuperMicro SuperServers, some time and not get caught out by poor storage performance caused by the native VMware driver packaged with ESXi 6.5.


References:

http://www.supermicro.com/products/system/midtower/5028/SYS-5028D-TN4T.cfm

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2044993
