Tag Archives: VSAN

VSAN Upgrading From 6.1 To 6.2 Hybrid To All Flash – Part 2

When VSAN 6.2 was released earlier this year it came with new and enhanced features. With the price of SSDs continuing to fall and an expanding HCL, All Flash instances are becoming the norm, and for those who have already deployed VSAN in a Hybrid configuration the temptation to upgrade to All Flash is certainly there. Duncan Epping has previously blogged an overview of migrating from Hybrid to All Flash, so I wanted to expand on that post and go through the process in a little more detail. This is part two of what is now a three part blog series, with the process overview outlined below.


In part one I covered upgrading the existing hosts, expanding the existing VSAN cluster and upgrading the license and disk format. In this part I am going to go through the simple task of extending the cluster by adding a new All Flash Disk Group on the host I added in part one, and then go through the actual Hybrid to All Flash migration steps.

The configuration of the VSAN Cluster after the upgrade will be:

  • Four Host Cluster
  • vCenter 6.0.0 Update 2
  • ESXi 6.0.0 Update 2
  • One Disk Group Per Host
  • 1x 480GB SSD Cache and 2x 1000GB SSD Capacity
  • VSAN Erasure Coding RAID 5 FTT=1
  • Deduplication and Compression On

As mentioned in part one, I added a new host to the cluster to give me some breathing room while doing the Hybrid to All Flash upgrade, as we need to perform rolling maintenance on each host in the cluster in order to get to the All Flash configuration. Each host will be put into maintenance mode and all data evacuated. Before starting the process on the initial three hosts, let's go ahead and create a new All Flash Disk Group on the new host.

To create the new Disk Group head to Disk Management under the Virtual SAN section of the Manage Tab whilst the Cluster is selected and click on the Create New Disk Group button. As you can see below I have the option of selecting any of the flash devices claimed as being OK for VSAN.
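If you prefer to do this from the command line, the same thing can be sketched out with esxcli on the host; the device names below are placeholders, and in an All Flash setup the capacity devices need to be tagged as capacity flash before they can be claimed:

esxcli vsan storage tag add -d naa.CAPACITY_SSD_1 -t capacityFlash
esxcli vsan storage tag add -d naa.CAPACITY_SSD_2 -t capacityFlash
esxcli vsan storage add -s naa.CACHE_SSD -d naa.CAPACITY_SSD_1 -d naa.CAPACITY_SSD_2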

After the disk selection is made and the disk group created, you can see below that there is now a mixed mode scenario, with the All Flash host participating in the VSAN Cluster and contributing to its capacity.

Upgrade Disk Group from Hybrid to All Flash:

Ok, now that there is some extra headroom, the process to migrate the existing Hybrid hosts over to All Flash can begin. Essentially the process involves placing each host in maintenance mode with a full data migration, deleting any existing Hybrid disk groups, removing the spinning disks, replacing them with flash, and finally creating new All Flash disk groups.

If you are not already familiar with maintenance mode and VSAN, it's worth reading over this VMware Blog Post to understand why using the VI Client is a big no no. In this case I wanted a full data migration, which moves all VSAN components onto the remaining active hosts in the cluster.
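For reference, the same full evacuation can be kicked off from the ESXi shell rather than the Web Client; a rough sketch is below, and the other vsanmode options are ensureObjectAccessibility and noAction:

esxcli system maintenanceMode set --enable true --vsanmode evacuateAllData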

You can track this process by looking at the Resyncing Components section of the Virtual SAN Monitor Tab to see which objects are being copied to other hosts.
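If you want more detail than the Web Client gives you, the Ruby vSphere Console (RVC) on the vCenter appliance has a handy resync dashboard; the cluster path below is an example only:

vsan.resync_dashboard /localhost/Datacenter/computers/VSAN-Cluster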

As you can see the new host is actively participating in the Hybrid mixed mode cluster now and taking objects.

Once the copy evacuation has completed we can delete the existing disk groups on the host by highlighting the disk group and clicking on the Remove Disk Group button. A warning appears telling us that data will be deleted and also lets us know how much data is currently on the disks. The previous step has ensured that there should be no data on the disk group, so it should be safe to (still) select Full data migration and remove the disk group.

Do this for all existing Hybrid disk groups, and once all disk groups have been deleted from the host you are ready to remove the existing spinning disks and replace them with flash. The only thing to check before attempting to claim the new SSDs is that they don't have any previous partitions on them…if they do, you can use the ESXi Embedded Host Client to remove the existing partitions.
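If you would rather clear the old partitions from the ESXi shell, something along these lines also works; the device name is a placeholder, so triple check you are pointing at the right disk before wiping anything:

partedUtil getptbl /vmfs/devices/disks/naa.NEW_SSD_DEVICE
partedUtil mklabel /vmfs/devices/disks/naa.NEW_SSD_DEVICE gpt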

Warning: Again it’s worth mentioning that any full data migration is going to take a fair amount of time depending on the consumed storage of your disk groups and the types of disks being used.

Repeat this process on all remaining hosts in the cluster with Hybrid disk groups until you have a full All Flash cluster as shown above. From here we are able to take advantage of Erasure Coding, Deduplication and Compression…I will finish that off in part three of this series.

 

VSAN Upgrading from 6.1 to 6.2 Hybrid to All Flash – Part 1

When VSAN 6.2 was released earlier this year it came with new and enhanced features, and depending on what version you were running you might not have been able to take advantage of them all right away. Across all versions Software Checksum was added, with Advanced and Enterprise versions getting VSAN's implementation of Erasure Coding (RAID 5/6), Deduplication and Compression available in the All Flash versions, and QoS IOPS Limiting available in Enterprise only.

With the price of SSDs continuing to fall and an expanding HCL it seems like All Flash instances are becoming the norm, and for those who have already deployed VSAN in a Hybrid configuration the temptation to upgrade to All Flash is certainly there. Duncan Epping has previously blogged an overview of migrating from Hybrid to All Flash, so I wanted to expand on that post and go through the process in a little more detail. This is a two part blog post with a lot of screenshots to complement the process, which is outlined below.


Warning: Before I begin it’s worth mentioning that this is not a short process, so make sure you plan it out relative to the existing size of your VSAN cluster. In talking with other people who have gone through the disk format upgrade, the average rate seems to be about 10TB of consumed data per day depending on the type of disks being used. I’ll reference some posts at the end that relate to the disk upgrade process, as it has been troublesome for some, however it’s also worth pointing out that the upgrade process is non-disruptive for running workloads.

Existing Configuration:

  • Three Host Cluster
  • vCenter 6.0.0 Update 2
  • ESXi 6.0.0 Update 1
  • Two Disk Groups Per Host
  • 1x 200GB SSD and 2x 600GB HDD
  • VSAN Default Policy FTT=1

Upgrade Existing Hosts to 6.0 Update 2:

At the time of writing, ESXi 6.0.0 Update 2 is the latest release and the build that contains the VSAN 6.2 codebase. From the official VMware upgrade matrix it seems you can’t upgrade from VSAN versions older than 6.1, so if you are on a 5.x or 6.0 release you will need to take note of this VMware KB to get to ESXi 6.0.0 Update 2. For the latest builds as well as upgrade links, a great resource is here:

https://esxi-patches.v-front.de/ESXi-6.0.0.html

For a quick upgrade directly from the VMware online host update repository you can do the following on each host in the cluster after putting it into VSAN Maintenance Mode. Note that there are also some advanced settings that are recommended as part of the VSAN Health Checks in 6.2.
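A quick sketch of the esxcli approach is below; the image profile name is the 6.0 Update 2 standard profile at the time of writing, so verify it against the link above before running anything:

esxcli network firewall ruleset set -e true -r httpClient
esxcli software profile update -p ESXi-6.0.0-20160302001-standard -d https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml
esxcli network firewall ruleset set -e false -r httpClient
reboot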

After rolling through each host in the cluster, make sure that you have an updated copy of the VSAN HCL database and run a health check to see where you stand. You should see a warning about the disks needing an upgrade, and if any hosts didn’t have the above advanced settings applied you will see a warning about that as well.

Expanding VSAN Cluster:

As part of this upgrade I am also adding an additional host to the existing three to expand to a four host cluster. I am doing this for a couple of reasons: notwithstanding the accepted design position of four hosts being better than three from a data availability point of view, you also need a minimum of four hosts if you want to enable RAID 5 erasure coding (a minimum of six is required for RAID 6). The addition of the fourth host also allowed me to roll through the Hybrid to All Flash upgrade with a lot more headroom.

Before adding the new host to the existing cluster you need to ensure that its build is consistent with the existing hosts in terms of versioning and, more importantly, networking. Ensure that you have configured a VMkernel interface for VSAN traffic and marked it as such through the Web Client. If you don’t do this prior to putting the host into the existing cluster, I found that the management VMkernel interface gets enabled by default for VSAN.
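If you are building the host from the command line, tagging the VMkernel interface for VSAN traffic looks something like the following (vmk1 is just an example, use whichever interface carries your VSAN network):

esxcli vsan network ipv4 add -i vmk1
esxcli vsan network list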

If you notice below this cluster is also NSX enabled, hence the events relating to Virtual NICs being added. Most importantly the host can see other hosts in the cluster and is enabled for HA.

Once in the cluster the host can be used for VM placement with data served from the existing hosts with configured disk groups over the VSAN network.

Upgrade License:

At this point I upgraded the licenses to enable the new features in VSAN 6.2. As a refresher on VSAN licensing there are three editions with the biggest change from previous versions being that to get the Deduplication and Compression, Erasure Coding and QoS features you need to be running All Flash and have an Enterprise license key.

To upgrade the license you need to head to Licensing under the Configuration section of the Manage Tab whilst the Cluster is selected. Apply the new license and you should see the following.

Upgrade Disk Format:

If you have read up on upgrading VSAN you will know that there is a disk format upgrade required to get the benefits of the newer versions. Once you have upgraded both vCenter and hosts to 6.0.0 Update 2, if you check the VSAN Health under the Monitor Tab of the Cluster you should see a failure about v2 disks not working with v3 disks, as shown below.

You can click on the Upgrade On-Disk Format button here to kick off the process. This can also be triggered from the Disk Management section under the Virtual SAN menu in the Manage cluster section of the Web Client. Once triggered you will see some events fire and an update in progress message near the version number.

Borrowing from one of Cormac Hogan’s posts on VSAN 6.2, the following explains what is happening during the disk format upgrade. Also described in that post is a way of using the Ruby vSphere Console (RVC) to monitor the progress in more detail.

There are a few sub-steps involved in the on-disk format upgrade. First, there is the realignment of all objects to a 1MB address space. Next, all vsanSparse objects (typically used by snapshots) are aligned to a 4KB boundary. This brings all objects to version 2.5 (an interim version) and readies them for the on-disk format upgrade to V3. Finally, there is the evacuation of components from a disk group, then the deletion of said disk group and finally the recreation of the disk group as V3. This process is then repeated for each disk group in the cluster, until finally all disks are at V3.
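If you prefer the RVC route mentioned above, the upgrade can be kicked off and the per-disk format version checked with something like the following; the cluster path is an example only:

vsan.ondisk_upgrade /localhost/Datacenter/computers/VSAN-Cluster
vsan.disks_stats /localhost/Datacenter/computers/VSAN-Cluster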

As explained above, the upgrade can take a significant amount of time depending on the number of disk groups, the data consumed on your VSAN datastore and the type of disks being used (SAS based vs SATA/NL-SAS). Once complete you should have a green tick and the On-Disk format version reporting 3.0.

With that done we can move ahead to the Hybrid to All Flash conversion. For the details, look out for Part 2 of this series coming soon.

References:

Hybrid vs All-flash VSAN, are we really getting close?

VSAN 6.2 Part 2 – RAID-5 and RAID-6 configurations

VSAN 6.2 Part 12 – VSAN 6.1 to 6.2 Upgrade Steps

VSAN Permanent Disk Failure Detection – Health Status vs Hardware Status

A month or so ago one of our VSAN Management Clusters flagged in its Health Status that there was a failed disk in one of the hosts' disk groups. VSAN worked quickly and very efficiently to start an evacuation of the data on the impacted host by triggering a resync operation. Once the data was resynced I checked the status of the disk that had flagged the error…interestingly enough, the disk wasn’t being flagged under the ESXi hardware status or in the DELL iDRAC, as shown below:

Searching through the vmkernel.log for the affected disk I came across:

The status being returned from the device indicates a hardware failure, despite the iDRAC not reporting any issues. On top of that, the ESXi Host Client status of the disk looked normal.
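For anyone wanting to do the same search, something along these lines pulls the relevant entries out of the log; the NAA identifier is a placeholder for the affected device:

grep naa.DEVICE_ID /var/log/vmkernel.log
tail -f /var/log/vmkernel.log | grep naa.DEVICE_ID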

Initially the sense error was reported as being due to a parity error rather than a hardware error on the disk, meaning that the disk wasn’t going to be replaced. DELL support couldn’t see anything wrong with the disk from the point of view of the FX2s chassis, the storage controller or the ESXi hardware status. DELL got me to install and run the ESXi PERC command line utility on the hosts (which I found to be a handy utility in its own right), which also reported no physical issues.

PERCCLI: http://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=XY978
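Once the PERCCLI VIB is installed, a couple of commands give you the controller and physical drive view DELL support were after; the install path and controller index may differ on your build, so treat this as a sketch:

cd /opt/lsi/perccli
./perccli /c0 show
./perccli /c0/eall/sall show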

As a temporary measure I removed the disk from the disk group and brought the host back out of maintenance mode, which allowed me to retain cluster resiliency. Later on I updated the storage controller driver and firmware to the current supported versions, added the disk back into the VSAN Disk Group and then cloned a VM onto that host, ensuring that data was placed on the host's disk groups. About five minutes after the copy, the host flagged again and this time the disk was marked with a permanent disk failure.



This was different behavior to what I first experienced before the firmware and driver update, however the iDRAC and ESXi hardware status were still showing the disk as good. This time the evidence above was enough to get DELL to replace the disk, and once it was replaced I tested a clone operation onto that host again and there were no more issues.

So make sure that you are keeping tabs on the VSAN Health Status, as it seems to be a little more sensitive at flagging troublesome disks than ESXi or even the hardware controllers. Better to be overcautious when dealing with data!

References:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2109874

http://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=XY978

Quick Post – VSAN and Log Insight Custom Alerting Example

Log Insight is one of those great VMware products that needs to get more airplay, as it has quite a few applications beyond being a run of the mill log parser…in this post I’ll go through configuring a basic VSAN alert to detect disk failures. Once VSAN has been configured and deployed there is a new set of alerting parameters that VMware admins need to be aware of, ones that would usually be part of a traditional storage platform's feature set. Like all storage, we need to be made aware of any issues with the supporting hardware such as storage controllers and physical disks. VSAN 6.2 comes with an excellent Health Monitor that gives a quick overview of a VSAN instance's state and will alert through vCenter if any issues arise.

While vCenter triggered alerting is fine, we had a situation recently where a failed disk was missed for a couple of days because the default vCenter alarms were not configured correctly. The only way we found out about the failed disk was by visually seeing the alert against the vCenter and then taking a look at the VSAN Health Analyzer. While vCenter monitoring is OK, I don’t believe it should be your only or primary source of monitoring and alerting.

Having done a few alerts in Log Insight before, I looked at what Log Insight could provide by way of logging through the recently released VSAN Content Pack.

Using the Diskgroup Failures menu on the VSAN Content Pack Dashboard I searched through to try and locate the previous disk failure. As shown below a Disk Permanent Error had been registered.

Clicking through to the Interactive Analysis on that event you get a more detailed view of the error and the search parameters of the specific log entry.

To create a custom alert that emails when a Permanent Disk Failure occurs, I removed the search fields that related directly to the disk and host and clicked on the Create Alert icon (the red bell at the top left of the image).

As shown below configuring the alert is simple and there are a number of different hooks to use as methods of notification. One of the great things about using Log Insight to trigger Alert notification is the suppression mechanisms to stop alert floods.

Apart from creating custom alerts the VSAN Content pack comes with a number of pre-canned alerts that are disabled by default. To view and enable these click on the Manage Alerts button and filter for VSAN.

If you haven’t had a chance to look at Log Insight, take a look at the features page, and if you own a vCenter license you already own a 25 OSI pack of Log Insight.

References:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2144909

 

VSAN 6.2 ESXi Patch Updates + DELL PERC Firmware Updates

I wanted to cover off a couple of important updates in this post relating to the DELL PERC storage controller firmware and software drivers, as well as an important new release of ESXi 6.0 that addresses a couple of issues with VSAN and also fixes more VMXNET3 problems, which seem to keep popping up. Read further below for the ESXi fixes, but first: a couple of weeks ago I posted about the new certified driver updates for the DELL PERC based storage controllers that VMware released for VSAN 6.2. That driver was only half of the fix, as DELL also released new firmware for most of the PERC based controllers listed below.

It’s important to match the PERC firmware with the updated driver from VMware, as together they protect against the LSI issues mentioned here. The workaround applied after the driver install is just that, a workaround, and the firmware upgrade is required to be fully protected. As shown below you want to be on at least version 25.4.0.0015.

Side note: While you are at it looking at the DELL Drivers and Download site you should also consider upgrading to the latest iDRAC Firmware and any other component that contains fixes to issues that could impact you.

Just on that new VMware driver…even if you are running earlier versions of VSAN with the Health Checker, if you update the HCL database and run a health check you will see a warning against PERC controller driver versions prior to lsi_mr3 (6.903.85.00-1OEM.600.0.0.2768847), as shown below.
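A quick way to check which driver a host is actually running is to query the installed VIB and the loaded module; the VIB name can vary slightly depending on the OEM packaging:

esxcli software vib list | grep -i lsi
esxcli system module get -m lsi_mr3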

New ESXi 6.0 Update 2 Build VSAN Fixes:

Last week VMware released ESXi 6.0 Build 3825889, which addressed a couple of big issues relating to VSAN datastore updates and also a nasty VMXNET3 PSOD issue. Of most importance to me, looking to upgrade existing VSAN 6.1 clusters to VSAN 6.2, was an issue with CBT enabled VMs when upgrading the VSAN filesystem from 2.0 to 3.0.

Attempts to upgrade a Virtual SAN cluster On-Disk format version from version 2.0 to 3.0 fails when you Power On CBT-enabled VMs. Also, CBT-enabled VMs from a non-owning host might fail due to on-disk lock contention on the ctk files and you might experience the following issues:

  • Deployment of multiple VMs from same CBT enabled template fail.
  • VMs are powered off as snapshot consolidation fails.
  • VM does not Power On if the hardware version is upgraded (for example, from 8 or 9 to 10) before registering the VM on a different host

So that’s not too cool, especially if you are using Veeam or some other VDP based backup solution, but I'm glad there is a fix for it. Again, I don’t get why or how these things slip through…but it seems like things haven’t improved too much when it comes to the QA of ESXi releases. That said, the relative turnaround time to get these issues fixed seems to be somewhat acceptable.

As mentioned there are a few more significant fixes so when the time is right this update should be applied to existing ESXi 6.0 Update 2 installations.
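A quick way to confirm which build a host is on before and after patching is from the ESXi shell:

vmware -vl
esxcli system version get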

References:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2145070

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2144614

http://www.vmware.com/resources/compatibility/detail.php?deviceCategory=vsanio&productid=38055&deviceCategory=vsanio&details=1&vsan_type=vsanio&io_partner=23&io_releases=275&page=1&display_interval=10&sortColumn=Partner&sortOrder=Asc

VSAN 6.2 + DELL PERC: Important Certified Driver Updates

As many of us rejoiced at the release of VSAN 6.2 that came with vSphere 6.0 Update 2…those of us running DELL PERC based storage controllers were quickly warned of a potential issue and told not to upgrade. VMware KB 2144614 referenced the issues and stated that the PERC H730 and FD332 found in DELL server platforms were not certified for VSAN 6.2 pending ongoing investigations. The storage controllers that were impacted are listed below.

This impacted me, as we have the FD332 Dual ROC in our production FX2s with VSAN 6.1 and a test bed with VSAN 6.2. With the KB initially saying no ETA, I sat and waited like others impacted for the controllers to be certified. Late last week, however, DELL and VMware finally released an updated driver for the PERC family which certifies the H730s and FD332s with VSAN 6.2.

Before this update, if you looked at the VSAN Health Monitor you would have seen a warning next to the VMware Certified check for official driver support.

As well as upgrading the controller drivers, it’s also suggested that you make the following changes on each host in the cluster, which add two new VSAN IO timeout settings. No reboot is required after applying the advanced config and the settings are persistent.

esxcfg-advcfg -s 100000 /LSOM/diskIoTimeout
esxcfg-advcfg -s 4 /LSOM/diskIoRetryFactor
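To confirm the values have taken effect you can read them back on each host:

esxcfg-advcfg -g /LSOM/diskIoTimeout
esxcfg-advcfg -g /LSOM/diskIoRetryFactor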

Once the driver has been upgraded you should see all green in the VSAN Health Checks as shown below with the up to date driver info.

This is all part of the fun and games of using your own components for VSAN, but I still believe it’s a huge positive to be able to tailor a design for specific use cases with specific hardware. In talking with various people within VMware and DELL (as it relates to this and previous PERC driver issues), it’s apparent that both parties need to communicate better and go through much better QA before releasing driver and firmware updates. However, this is not something that affects only VMware and DELL, and not only storage drivers…it’s a common issue throughout the industry, it doesn't just impact VMware VSAN, and every vendor has issues at some point.

Better safe than sorry here, and well done to VMware and DELL on getting the PERC certified without too much delay.

References:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2144614

http://www.vmware.com/resources/compatibility/detail.php?deviceCategory=vsanio&productid=38055&deviceCategory=vsanio&details=1&vsan_type=vsanio&io_partner=23&io_releases=275&page=1&display_interval=10&sortColumn=Partner&sortOrder=Asc

VSAN 6.2 – Price Changes for Service Providers!

Who said big corporations don’t listen to their clients? VMware have come to the party in a huge way with the release of VSAN 6.2…and not only from a technical point of view. Ever since the release of VSAN, the pricing structure for vCloud Air Network Service Provider partners has been off the mark in terms of the commercial viability of deploying VSAN at scale. The existing model was hurting any potential uptake of the HCI platform beyond deployments for Management Clusters and the like.

I have been on VMware’s back since March 2014, when VSPP pricing was first revealed, and I wrote a detailed blog post back in October where I compared the different vCAN bundle options and showed some examples of how it did not scale.

For me, VMware needed to tweak the vCAN cost model for VSAN to either allow some form of tiering (i.e. 0-500GB .08, 500-1000GB .05, 1TB-5TB .02 and so on) and/or change the metering from allocated GB to consumed GB, which would allow Service Providers to take advantage of over provisioning and only pay for what's actually being consumed in the VSAN Cluster.

Since that post (obviously not only off the back of the noise I was making) the VSAN Product and Marketing teams have gone out to vCAN Partners and spent time going over possible tweaks to the billing structure for VSAN by surveying partners and trying to achieve the best balance going forward to help increase VSAN uptake.

With the release of VSAN 6.2 in ESXi 6.0 Update 2 this week, VMware have announced new pricing for vCAN Partners…the changes are significant and will represent a complete rethink of VSAN at scale for IaaS Providers. Furthermore the changes are also strategically important for VMware in an attempt to secure the storage market for existing vCAN partners.

The changes are indeed significant: not only is the billing metric now based on used or consumed storage per GB, but in somewhat of a surprise to me the VSPP points per month component has been slashed. Further to that, Enterprise Plus was rumored to be listed at .18 VSPP points per allocated GB, which was going to price out All Flash even more…now, with All Flash Enterprise costing as much in VSPP points per used GB as Standard used to cost, that whole conversation has changed.

Below is an example software-only cost for 10 hosts (64GB RAM each) with 100TB of storage (60% used capacity on average), an expected utilization of 80%, and 2 hosts assumed to be reserved for HA. The old numbers are in the brackets to the right and are based on VSAN Standard. It must be noted that these are rough numbers based on the new pricing; for the specifics of the new costings you will need to engage with your local vCAN Partner Account Manager.

  • VSAN 80TB Allocated (48TB Used): $1,966 ($6,400)
  • vRAM 410GB (205GB Reserved): $1,433
  • Total Per Month: $3,399 ($7,833)

If we scale that to 20 hosts with 128GB RAM each and 200TB of storage (60% used capacity on average), with an expected utilization of 80% and 4 hosts assumed to be reserved for HA:

  • VSAN 160TB Allocated (96TB Used): $3,932 ($12,800)
  • vRAM 1.6TB (820GB Reserved): $5,734
  • Total Per Month: $9,666 ($18,534)

In a real world example based on figures I’ve seen…taking into account just VSAN…if you have 500TB worth of storage provisioned, of which 200TB is consumed, with Advanced plus the Enterprise add-on the approximate cost of running VSAN comes down from ~$30K to ~$6K per month.

The fact that Service Providers can now take advantage of thin provisioning, plus the change in metric to used or consumed storage, makes VSAN a lot more attractive at scale…and while there are still no break points in terms of total storage blocks, the conversation around VSAN being too expensive has now, for the most part, disappeared.

Well done to the VSAN and vCAN product and marketing teams!

Disclaimer:

These figures are based on my own calculations and on a VSPP point value of $1 US. This value will be different for vCAN partners depending on the bundle and points level they are on through the program. I have tried to be accurate with my figures but errors and omissions may exist.

VSAN 6.2 – Things Just Got Interesting!

There is a saying in our industry that Microsoft always get their products right on the third attempt…and while this has been less and less the case of late (Hyper-V 2012 didn’t exactly deliver), it is a more or less accurate statement. Having been part of the beta and the early access blogger sessions for VSAN 6.2, I can say with confidence that VMware have hit the nail on the head with this release.

The hyper-converged storage platform, built into the world's leading hypervisor (VMware ESXi), has reached a level of maturity and a feature set that should, and will, make the more established HCI vendors take note, and it certainly lowers the competitive attack surface that existed with previous releases of VSAN.

The table below shows you the new features of 6.2 together with the existing features of 6.1. As you can see by the number of green dots there are not a lot of new features…but they certainly pack a punch and fill in the gaps that had stopped VSAN being adopted for higher end workloads in comparison with existing market leaders.

Across all versions Software Checksum has been added, with Advanced and Enterprise versions getting VSAN's implementation of Erasure Coding (RAID 5/6), Deduplication and Compression available in the All Flash version, and QoS IOPS Limiting available in Enterprise only.

With the initial 5.x releases of VSAN, VMware were very reluctant to state that it was suitable for “enterprise” workloads and only mentioned VDI, test and development workloads…the language changed to extend to more enterprise workloads in VSAN 6.x, but as you can see below the 6.2 release now targets all workloads…and, more importantly, VMware are openly confident in backing that claim.

VMware have achieved this mostly through the efficiencies that come with the Deduplication and Compression feature, along with erasure coding, which in effect adds RAID 5/6 support with an FTT level of 1 or 2, in addition to the RAID 1 implementation of previous versions. Software Checksum has been used as a huge point of difference when comparing other HCI platforms to previous VSAN releases, so it’s great to see this box ticked to further ensure data consistency across VSAN disk group and datastore objects.

The QoS feature, which applies IOPS limiting on a per VM basis, is also significant for extending VSAN's workload reach. It allows the segmentation of noisy neighbours and lets operators apply limits, something that has had a flaky history on vSphere platforms up to this point. This is probably my favourite new feature.

As with previous 6.x releases of VSAN there is an AFA option available in the Enterprise and Enterprise Plus editions, though you will pay a premium compared to the Hybrid version, and while I’m still not convinced VMware have the pricing right, I do know that there is ongoing work to make it more attractive for enterprises and service providers alike.

One of the great things about VSAN is the ability to build your own platform from whatever combination of HCL approved hardware you want. This flexibility is only really comparable to EMC's ScaleIO, but it also means that some extra thought needs to go into a VSAN build if you don’t want to go down the Ready Node path. In my testing, if sized correctly, the only limitation in terms of performance is the speed of your network cards, and I’ve been able to push VSAN (Hybrid) to impressive throughput numbers with, importantly, low latency.

Finally, the 6.2 release of VSAN expands on the health and monitoring components that existed in previous versions. VMware have baked new performance and capacity monitoring into the vCenter Web Client that gives insight into VM storage consumption and how that capacity is taken up by the various VSAN components.

There is also a new Cluster Performance menu that gives greater detail on VSAN cluster throughput, IOPS and latency, so there should be no need to drop into the Ruby vSphere Console, which is a blessing. The UI is limited by the Web Client and not as sexy and modern as others out there, but it’s come a long way and means you no longer need to hook in external systems to get VSAN related metrics.

As suggested by the post's title, I believe that this VSAN release represents VMware’s official coming of age in the HCI market and will make the other players take note, which will no doubt spark the odd Twitter fuelled banter and Slack channel discussion about what’s missing or what’s been copied…but at the end of the day competition in tech is great, and better products are born out of competition.

Things just got Interesting!

For a more detailed look at the new features check out Duncan Epping‘s post here:

Preserving VSAN + DELL PERC Critical Drivers after ESXi 6.0 CBT Update

Last week VMware released a patch to fix another issue with Changed Block Tracking (CBT), which takes the ESXi 6.0 Update 1 build to 3247720. The update bundle contains a number of updates to the esx-base VIB, including the resolution of the CBT issue.

This patch updates the esx-base VIB to resolve an issue that occurs when you run virtual machine backups which utilize Changed Block Tracking (CBT) in ESXi 6.0: the CBT API call QueryDiskChangedAreas() might return incorrect changed sectors, resulting in inconsistent incremental virtual machine backups. The issue occurs because CBT fails to track changed blocks on VMs with I/O during snapshot consolidation.

Having just deployed and configured a new Management Cluster consisting of four ESXi 6.0 Update 1 hosts running VSAN, I was keen to get the patch installed so that VDP based backups would work without issue. However, once I had deployed the update (via esxcli) to the first three hosts, I saw that the VSAN Health Checker was raising a warning against the cluster. Digging into the VSAN Health Check Web Client Monitor view, I saw the following under HCL Health -> Controller Driver Test.

As I posted in early November, there was an important driver and firmware update released by VMware and DELL that resolved a number of critical issues with VSAN when put under load. The driver package is shown above against node-104 as 6.606.12.00-1OEM.600.0.0.2159203, and that node shows a Passed driver health state. The others are all in the Warning state with version 6.605.08.00-7vmw.600.1.17.3029758.

What’s happened here is that the ESXi patch has “updated” the controller driver to the latest VMware inbox driver, overwriting the driver released on the 19th of May and listed on the VMware VSAN HCL page. The simple fix is to reinstall the OEM driver so that you are left with the VSAN Health Status shown below.
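The reinstall itself is just a VIB operation against the OEM offline bundle downloaded from the VSAN HCL page, followed by a reboot; the bundle path below is a placeholder:

esxcli software vib install -d /vmfs/volumes/datastore1/OEM-perc-driver-offline-bundle.zip
reboot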

Interestingly, the device now shows up as an Avago (LSI) MegaRAID SAS Invader controller instead of an FD332-PERC (Dual ROC)…I questioned that with a member of the VSAN team and it looks as though that is indeed the OEM name for the FD332 PERCs.

So be aware when updating ESXi builds: make sure the update hasn’t removed or replaced drivers with anything that’s going to potentially give you a really bad time with VSAN…or any other component for that matter.

References:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2137546

Dell PowerEdge FX2: VSAN Disk Configuration Steps

When you get your new DELL FX2s out of the box and powered on for the first time you will notice that the disk configuration has not been set up with VSAN in mind…if you were to log into ESXi on the blades in SLOT1a and SLOT1c you would see that each host has each SAS disk configured as a datastore. There is a little pre-configuration you need to do in order to get the drives presented correctly to the blade servers, as well as to remove and reconfigure the datastores and disks from within ESXi.

With my build I had four FC430 blades and two FD332 storage sleds, each sled containing 4x 200GB SSDs and 8x 600GB SAS drives. By default the storage mode is configured in Split Single Host mode, which results in all the disks being assigned to the hosts in SLOT1a and SLOT1c, with both controllers also assigned to the single host.

You can configure individual storage sleds containing two RAID controllers to operate in the following modes:

  • Split-single – Two RAID controllers are mapped to a single compute sled. Both the controllers are enabled and each controller is connected to eight disk drives
  • Split-dual – Both RAID controllers in a storage sled are connected to two compute sleds.
  • Joined – The RAID controllers are mapped to a single compute sled. However, only one controller is enabled and all the disk drives are connected to it.

To take advantage of the FD332-PERC (Dual ROC) controller you need to configure Split-Dual mode. All hosts need to be powered off to change the default configuration over to Split Dual Host for the VSAN configuration.

Head to Server Overview -> Power and from here Gracefully Shutdown all four servers

Once the servers have been powered down, click on the Storage Sleds in SLOT-03 and SLOT-04 and go to the Setup Tab. Change the Storage Mode to Split Dual Host and Click Apply.

To check the distribution of the disks you can launch the iDRAC for each blade, go to Storage -> Enclosures and check that each blade now has 2x SSDs and 4x HDDs assigned. With the FD332 there are 16 slots in total, with 0-7 belonging to the first blade and 8-15 belonging to the second blade. As shown below we are looking at the config of SLOT1a.

The next step is to reconfigure the disks within ESXi to make sure VSAN can claim them when configuring the Disk Groups. Part of the process below is to delete any datastores that exist and clear the partition table…by far the easiest way to achieve this is via the new Embedded Host Client.

Install the Embedded Host Client on each Host

Log into the hosts via the Embedded Host Client at https://HOST_IP/ui, go to the Storage menu and delete any datastores that were preconfigured by DELL.

Click on the Devices tab in the Storage menu and clear the partition table so that VSAN can claim the disks behind the datastores that were just deleted.

From here all disks should be available to be claimed by VSAN to create your disk groups.
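One quick sanity check from the ESXi shell is vdq, which reports whether each device is eligible for use by VSAN and, if not, why not:

vdq -q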

As a side note it’s important to update to the latest driver for the PERC.

References:

http://www.dell.com/support/manuals/au/en/aubsd1/dell-cmc-v1.20-fx2/CMCFX2FX2s12UG-v1/Notes-cautions-and-warnings?guid=GUID-5B8DE7B7-879F-45A4-88E0-732155904029&lang=en-us
