Tag Archives: VSAN

vExpert Pivot: NSX and VSAN Program Announcements

This week the VMware vExpert team officially lifted the lid on two new subprograms that focus on NSX and VSAN. The announcements signal a positive move for the vExpert program, which had come under some criticism over the past two or so years for having lost some of its initial value. As I’ve mentioned previously, the program is unmistakably an advocacy program first and foremost, and those who are part of the vExpert group should be active contributors in championing VMware technologies as well as being active in their spheres of influence.

Corey and the rest of the team have responded to the calls for change by introducing vExpert specialties, more in line with what Microsoft does with its MVP Program. The first specializations focus on VMware’s core products, NSX and VSAN…these programs are built on the base vExpert program, and each group is chosen from existing vExperts who have demonstrated contributions to that technology. The VSAN announcement blog articulates the criteria perfectly.

This group of individuals have passion and enthusiasm for technology, but more importantly, have demonstrated significant activity and evangelism around VSAN.

With that, I am extremely proud to be part of both the inaugural NSX and VSAN vExpert programs. It’s some reward and acknowledgment for the content I have created and contributed for both technologies since their release. Substance is important when it comes to awarding community contribution, and as I look through the lists I see nothing but substance and quality in both groups.

Again, this is a great move by the vExpert team and I’m looking forward to it reinvigorating the program. I’ve posted links below to my core NSX and VSAN content…I’m especially proud of the NSX Bytes series, which continues to do well in terms of people still seeking out the content. More recently I have done a bit of work around VSAN, and the Upgrading VSAN from Hybrid to All Flash series was well received. Feel free to browse the content below, and I look forward to catching up with everyone at VMworld US.

References:

vExpert NSX 2016 Award Announcement

Announcing the 2016 VSAN vExperts

VMworld 2016: Top Session Picks

VMworld 2016 is just around the corner (10 days and counting) and the theme this year is be_Tomorrow…which looks to build on the Ready for Any and Brave IT messages from the last couple of VMworld events. It’s a continuation of VMware’s call to arms to get themselves, their partners and their customers prepared for the shift in the IT of tomorrow. This will be my fourth VMworld and I am looking forward to spending time networking with industry peers, walking around the Solutions Exchange on the lookout for the next Rubrik or Platform9, and attending technical sessions.

http://www.vmworld.com/uscatalog.jspa

The Content Catalog went live a few weeks ago and the Session Builder is also live, allowing attendees to lock in sessions. There are a total of 817 sessions this year, up from 752 last year. I’ve listed the main tracks below; the numbers are fairly similar to last year’s.

Cloud Native Applications (17)
End-User Computing (97)
Hybrid Cloud (63)
Partner Exchange @ VMworld (74)
Software-Defined Data Center (504)
Technology Deep Dives & Futures (22)

VMware’s core technology focus around VSAN and NSX again has the lion’s share of sessions this year, with EUC still a very popular subject. It’s pleasing to see a lot of vCloud Air Network related sessions in the list (for a detailed look at the vCAN sessions read my previous post) and there is a solid amount of Cloud Native Application content. Below are my top picks for this year:

  • Virtual SAN – Day 2 Operations [STO7534]
  • Advanced Network Services with NSX [NET7907]
  • A Day in the Life of a VSAN I/O [STO7875]
  • vSphere 6.x Host Resource Deep Dive [INF8430]
  • The Architectural Future of Network Virtualization [NET8193R]
  • Conducting a Successful Virtual SAN 6.2 Proof of Concept [STO7535]
  • How to design and implement VMware’s vCloud in production [SDDC9612-SPO]
  • PowerNSX and PyNSXv: Using PowerShell and Python for Automation and Management of VMware NSX for vSphere [NET7514]
  • Evolving the vSphere API for the Modern Era [INF8255]
  • Multisite Networking and Security with Cross-vCenter NSX: Part 2 [NET7861R]

My focus seems to have shifted back towards vCloud Director and network/hybrid cloud automation of late, and it’s reflected in the choices above. Alongside that, I am also very interested to see how VMware positions vCloud Air after the shambles of the past 12 months, and I always look forward to hearing from respected industry technical leads Frank Denneman, Chris Wahl and Duncan Epping as they give their perspectives on storage, software-defined datacenters and automation. This year I’m also looking at what the SABU Tech Marketing team are up to around VSAN and VSAN futures.

As has become tradition, there are a bunch of bloggers who put out their top picks for VMworld…check out the links below for more insight into what’s going to be hot in Las Vegas this VMworld. I hope to catch up with as many community folk as possible while over there, so if you are interested in a chat, hit me up!

My top 15 VMworld sessions for 2016

Top 5 Log Insight VMworld Sessions

be_TOMORROW at VMworld 2016 – Key Storage and Availability Activities


My Top Session picks for VMworld 2016

http://www.mindthevirt.com/top-vmworld-sessions-category-1247

PowerCLI Script to Calculate VSAN vCAN Points Per Month

There is no doubt that the new pricing for vCAN Service Providers, announced just after VSAN 6.2 was released, meant that Service Providers who had previously written off VSAN for their IaaS or MSP offerings due to price could once again consider it a viable and price-competitive option. As of writing this blog post there is no way to meter the new reporting mechanism automatically through the existing vCloud Usage Meter, with the current 3.5 beta also lacking the ability to report billing info.

I had previously come across a post from @virten that contained a PowerCLI script to calculate VSPP points based on the original allocated GB model. With VSAN 6.2, pricing is now based on a consumed GB model, which was a significant win for those pushing for a more competitive pricing structure to position a now mature VSAN as a platform of choice.

Before I post the code it’s worth noting that I am still not 100% happy with the interpretation of the reporting:

The VsanSpaceUsage(vim.cluster.VsanSpaceUsage) data object has the following two properties which vCAN partners can use to pull Virtual SAN usage information: a) totalCapacityB (total Virtual SAN capacity in bytes) and b) freeCapacityB (free Virtual SAN capacity in bytes). Subtracting b) from a) should yield the desired “Used Capacity” information for monthly reporting.

I read that to say that you report on any fault tolerance or data resiliency overheads…that is to say, if you have a VM with a 100GB hard disk consuming 50GB on a VSAN datastore utilizing RAID1 and FTT=1, you pay for the 100GB that is actually consumed (the 50GB written, doubled by the two mirror copies).

With that in mind I had to add a multiplier to the original script I had hacked together, to cater for the fault tolerance and RAID level you may run. The rest is pretty self explanatory: I have built on @virten’s original script by asking which vCenter you want to log into, what VSAN licensing model you are using, and finally the RAID and FTT levels you are running. The result is the total amount of consumed storage of all VM disks residing on the VSAN datastore (the only hard-coded value) and then the amount of vCAN points you would be up for per month, with and without the overhead tax.

The code is below; please share and improve, and note that I provide it as is and it should be used as such. Please let me know if I’ve made any glaring mistakes…
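
As a rough guide to the logic, here is a minimal sketch; the prompts, the datastore name and the points-per-GB rate are illustrative rather than the exact script, so adjust them to your own environment and vCAN agreement.

    # Minimal sketch: consumed GB on the VSAN datastore -> vCAN points per month
    Connect-VIServer -Server (Read-Host "vCenter to connect to")

    $pointsPerGB = [double](Read-Host "vCAN points per GB for your VSAN licensing model")
    $ftt         = [int](Read-Host "FTT level (e.g. 1)")
    $raid        = Read-Host "RAID level (1, 5 or 6)"

    # Overhead multiplier for the resiliency scheme in use
    $multiplier = switch ($raid) {
        "1" { $ftt + 1 }   # RAID1 mirroring: FTT+1 full copies
        "5" { 1.33 }       # RAID5 erasure coding (3 data + 1 parity)
        "6" { 1.5 }        # RAID6 erasure coding (4 data + 2 parity)
    }

    # Total consumed storage of all VM disks on the VSAN datastore (hard-coded name)
    $usedGB = (Get-Datastore "vsanDatastore" | Get-VM |
        Measure-Object -Property UsedSpaceGB -Sum).Sum

    "Consumed storage: {0:N0} GB" -f $usedGB
    "vCAN points per month (no overhead): {0:N0}" -f ($usedGB * $pointsPerGB)
    "vCAN points per month (with overhead): {0:N0}" -f ($usedGB * $multiplier * $pointsPerGB)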

If someone can also let me know how to round numbers and capture an incorrect vCenter login gracefully and exit that would be excellent! – [EDIT] Thanks to Virten for jumping on that! Code updated!

References:

PowerCLI Script to Calculate VSAN VSPP Points

VSAN 6.2: Reminder About Important Fix

[UPDATE] This issue is resolved in VMware ESXi 6.0, Patch Release ESXi600-201608001. For more information, see VMware ESXi 6.0, Patch Release ESXi600-201608001 (2145663).

Last week VMware released an important KB based around an issue with VSAN 6.2 where some VMs residing on existing Hybrid VSAN datastores may exhibit reduced disk IO performance after an upgrade. In a nutshell, the issue is caused by a new operation linked to the new deduplication and compression features in VSAN 6.2. The issue affects only VSAN 6.2 Hybrid deployments and is obviously not applicable to All Flash VSAN clusters.

If impacted you may see:

  • A significantly lower than expected read cache hit ratio observed on the VSAN caching tier.
  • A higher percentage of IOPS observed on capacity tier disks in Hybrid disk groups when compared with previous 6.x systems.
  • Overall increased observed VM latency.

The issue is caused by VSAN 6.2 performing low level scanning for unique blocks, which is related to deduplication and still occurs on VSAN Hybrid disk groups. This causes performance deterioration on Hybrid disk groups, as it has a significant read caching performance impact on the SSD cache tier.

The Workaround:

To work around this issue, if you are using a Hybrid configuration you can turn off the dedup scanner option on each VSAN host in the Hybrid cluster. The way to turn it off is to modify the advanced setting lsomComponentDedupScanType, which has a default value of 2; for the workaround you set it to 0. The easiest way to achieve this is through PowerCLI, as shown below.
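
Something along these lines should do it; the cluster name is from my lab, and the advanced option name is my reading of the KB, so verify it against KB 2146267 for your build.

    # Sketch: set lsomComponentDedupScanType to 0 on every host in the VSAN cluster
    foreach ($esx in (Get-Cluster "VSAN-Cluster" | Get-VMHost)) {
        Get-AdvancedSetting -Entity $esx -Name "LSOM.lsomComponentDedupScanType" |
            Set-AdvancedSetting -Value 0 -Confirm:$false
    }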

Note that each host needs to be rebooted for the setting to take effect, so go through the normal process of ensuring hosts go into VSAN maintenance mode before reboot.

Also worth mentioning is a PowerCLI script that Jase McCarty has put up on GitHub that gets/sets the deduplication scanner setting, with some sanity checks, and accepts parameters.

https://github.com/jasemccarty/DedupeScan

References:

https://kb.vmware.com/kb/2146267

VSAN Upgrading from 6.1 to 6.2 Hybrid to All Flash – Part 3

When VSAN 6.2 was released earlier this year it came with new and enhanced features, and with the price of SSDs continuing to fall and an expanding HCL it seems like All Flash instances are becoming the norm. For those who have already deployed VSAN in a Hybrid configuration the temptation to upgrade to All Flash is certainly there. Duncan Epping has previously blogged an overview of migrating from Hybrid to All Flash, so I wanted to expand on that post and go through the process in a little more detail. This is the final part of a three part blog series, with the process overview outlined below.

Use the links below to page jump.

In part one I covered upgrading existing hosts, expanding an existing VSAN cluster and upgrading the license and disk format. In part two I covered the actual Hybrid to All Flash migration steps, and in this last part I will finish off by going through the process of creating a new VSAN policy, migrating existing VMs to the new policy and then enabling deduplication and compression.

Before continuing it’s worth pointing out that after the Hybrid to All Flash migration you are going to be left with an unbalanced VSAN cluster, as the full data evacuation off the last Hybrid host will leave that host without objects. Any new objects created will work to re-balance the cluster, however if you want to initiate a proactive re-balance you can hit the re-balance button from the Health status window. For more on this process check out this post from Cormac Hogan.

Create new Policy and Migrate VMs:

To take advantage of the new erasure coding in the VSAN 6.2 All Flash cluster we need to create a new storage policy and apply that policy to any existing VMs. In my case all VMs were on the Default VSAN Policy with FTT=1. The example below shows the creation of a new storage policy that uses RAID5 erasure coding with FTT=1. If you remember from previous posts, the reason for expanding the cluster to four hosts was to cater for this specific policy.

To create the new storage policy head to VM Storage Policies from the Home page of the Web Client and click on Create New VM Storage Policy. Give the policy a name, click Next and construct Rule-Set 1, which is based on VSAN. Select the Failure tolerance method and choose RAID-5/6 (Erasure Coding) – Capacity.

In this case, with FTT=1 chosen, RAID5 will be used. Clicking on Next should show that the existing VSAN datastore is compatible with the policy. With that done we can migrate existing VMs off the Default VSAN Policy onto the newly created one.

To get a list of the VMs that are going to be migrated, have a look at the PowerCLI commands below, which get the VMs on the VSAN datastore and then their storage policies. The last command gets a list of the existing policies.
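
Along the lines of the following (the datastore name is from my lab, so substitute your own):

    # VMs residing on the VSAN datastore
    Get-Datastore "vsanDatastore" | Get-VM | Select-Object Name

    # Current storage policy assignment of those VMs
    Get-Datastore "vsanDatastore" | Get-VM | Get-SpbmEntityConfiguration

    # All existing storage policies
    Get-SpbmStoragePolicy | Select-Object Name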

To apply the new erasure coding storage policy it’s handy to have the full name of the policy.

To migrate the VMs to the new policy you can either do it one by one via the Web Client or do it en masse via the following PowerCLI script.
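
A sketch of the bulk approach is below; the policy name is whatever you called the new policy, and as always test against a single VM before letting it loose on the lot.

    # Apply the new erasure coding policy to each VM and all of its hard disks
    $policy = Get-SpbmStoragePolicy -Name "VSAN-R5-FTT1"   # full name of the new policy
    foreach ($vm in (Get-Datastore "vsanDatastore" | Get-VM)) {
        $vm | Get-SpbmEntityConfiguration |
            Set-SpbmEntityConfiguration -StoragePolicy $policy
        Get-HardDisk -VM $vm | Get-SpbmEntityConfiguration |
            Set-SpbmEntityConfiguration -StoragePolicy $policy
    }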

Once run, the VMs will have the new policy applied and VSAN will work in the background to get those VM objects compliant. You can see the status of Virtual Disk Placement in the Virtual SAN section of the cluster’s Monitor tab.

Enable DeDupe and Compression:

Before I go into the details…for a brilliant overview and explanation of deduplication and compression with VSAN 6.2, head to this post from Cormac Hogan. To enable this feature we need to double check that the licensing is correct as detailed in the first post, and also ensure that all previous steps relating to the Hybrid to All Flash migration have taken place. To turn on the feature head to the General window under the Virtual SAN Settings menu on the cluster Manage tab and click on the Edit button next to Virtual SAN is Turned ON.

Choose Enabled in the drop down and take note of the Allow Reduced Redundancy checkbox, understanding what it means by reading the info box as shown above. Once you click OK the process to enable deduplication and compression will begin…it goes through and reconfigures all disk groups, similar to the Hybrid to All Flash upgrade. Again this will take some time depending on the number of hosts, the number of disk groups and the type of disks in the cluster.

Below I have shown the before and after of the Capacity window under the Virtual SAN tab in the Monitor section of the cluster view. You can see that before enabling, there is a message saying that Deduplication and Compression is disabled.

After enabling Deduplication and Compression you start to get statistics relating to savings and ratios in that window. Even in my small lab environment I started to see some benefits.

With that complete we have finished this series and have gone through all the steps in order to get to an All Flash VSAN Cluster with the newest features enabled.

References:

VSAN 6.2 Part 1 – Deduplication and Compression

VSAN 6.2 Part 2 – RAID-5 and RAID-6 configurations


VSAN Upgrading from 6.1 to 6.2 Hybrid to All Flash – Part 2

When VSAN 6.2 was released earlier this year it came with new and enhanced features, and with the price of SSDs continuing to fall and an expanding HCL it seems like All Flash instances are becoming the norm. For those who have already deployed VSAN in a Hybrid configuration the temptation to upgrade to All Flash is certainly there. Duncan Epping has previously blogged an overview of migrating from Hybrid to All Flash, so I wanted to expand on that post and go through the process in a little more detail. This is part two of what is now a three part blog series, with the process overview outlined below.

Use the links below to page jump.

In part one I covered upgrading existing hosts, expanding an existing VSAN cluster and upgrading the license and disk format. In this part I am going to go through the simple task of extending the cluster by adding new All Flash disk groups on the host I added in part one, and then go through the actual Hybrid to All Flash migration steps.

The configuration of the VSAN Cluster after the upgrade will be:

  • Four Host Cluster
  • vCenter 6.0.0 Update 2
  • ESXi 6.0.0 Update 2
  • One Disk Group Per Host
  • 1x 480GB SSD Cache and 2x 1000GB SSD Capacity
  • VSAN Erasure Coding RAID5 FTT=1
  • DeDuplication and Compression On

As mentioned in part one, I added a new host to the cluster in order to give me some breathing room while doing the Hybrid to All Flash upgrade, as we need to perform rolling maintenance on each host in the cluster in order to get to the All Flash configuration. Each host will be entered into maintenance mode and all data evacuated. Before the process is started on the initial three hosts, let’s go ahead and create a new All Flash disk group on the new host.

To create the new disk group head to Disk Management under the Virtual SAN section of the Manage tab whilst the cluster is selected, and click on the Create New Disk Group button. As you can see below I have the option of selecting any of the flash devices claimed as being OK for VSAN.

After the disk selection is made and the disk group created, you can see below that there is now a mixed mode scenario happening where the All Flash host is participating in the VSAN Cluster and contributing to the capacity.

Upgrade Disk Group from Hybrid to All Flash:

Ok, now that there is some extra headroom, the process to migrate the existing Hybrid hosts over to All Flash can begin. Essentially the process involves placing each host in maintenance mode with a full data migration, deleting any existing Hybrid disk groups, removing the spinning disks, replacing them with flash and then finally creating new All Flash disk groups.

If you are not already aware of how maintenance mode works with VSAN then it’s worth reading over this VMware blog post to ensure you understand that using the VI Client is a big no no. In this case I wanted to do a full data migration, which moves all VSAN components onto the remaining hosts active in the cluster.
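
For reference, PowerCLI lets you specify the data migration mode explicitly when entering maintenance mode; a one-liner sketch (host name illustrative):

    # Enter VSAN maintenance mode, evacuating all data from the host first
    Set-VMHost -VMHost "esxi01.lab.local" -State Maintenance -VsanDataMigrationMode Full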

You can track this process by looking at the Resyncing Components section of the Virtual SAN Monitor Tab to see which objects are being copied to other hosts.

As you can see the new host is actively participating in the Hybrid mixed mode cluster now and taking objects.

Once the copy evacuation has completed we can delete the existing disk groups on the host by highlighting the disk group and clicking on the Remove Disk Group button. A warning appears telling us that data will be deleted and also lets us know how much data is currently on the disks. The previous step has ensured that there should be no data on the disk group, so it should be safe to (still) select Full data migration and remove the disk group.

Do this for all existing Hybrid disk groups, and once all disk groups have been deleted from the host you are ready to remove the existing spinning disks and replace them with flash disks. The only thing to ensure before attempting to claim the new SSDs is that they don’t have any previous partitions on them…if they do, you can use the ESXi Embedded Host Client to remove any existing partitions.

Warning: Again it’s worth mentioning that any full data migration is going to take a fair amount of time, depending on the consumed storage of your disk groups and the types of disks being used.

Repeat this process on all remaining hosts in the cluster with Hybrid disk groups until you have a full All Flash cluster as shown above. From here we are able to take advantage of erasure coding, deduplication and compression…I will finish that off in part three of this series.


VSAN Upgrading from 6.1 to 6.2 Hybrid to All Flash – Part 1

When VSAN 6.2 was released earlier this year it came with new and enhanced features, and depending on what version you were running you might not have been able to take advantage of them all right away. Software Checksum was added across all versions, Advanced and Enterprise versions got VSAN’s implementation of Erasure Coding (RAID 5/6) along with Deduplication and Compression for All Flash configurations, and QoS IOPS Limiting is available in Enterprise only.

With the price of SSDs continuing to fall and an expanding HCL it seems like All Flash instances are becoming the norm, and for those who have already deployed VSAN in a Hybrid configuration the temptation to upgrade to All Flash is certainly there. Duncan Epping has previously blogged an overview of migrating from Hybrid to All Flash, so I wanted to expand on that post and go through the process in a little more detail. This is a two part blog post with a lot of screenshots to complement the process, which is outlined below.

Use the links below to page jump.

Warning: Before I begin it’s worth mentioning that this is not a short process, so make sure you plan it out relative to the existing size of your VSAN cluster. In talking with other people who have gone through the disk format upgrade, the average rate seems to be about 10TB of consumed data per day, depending on the type of disks being used. I’ll reference some posts at the end that relate to the disk upgrade process, as it has been troublesome for some; however it’s also worth pointing out that the upgrade process is non-disruptive for running workloads.

Existing Configuration:

  • Three Host Cluster
  • vCenter 6.0.0 Update 2
  • ESXi 6.0.0 Update 1
  • Two Disk Groups Per Host
  • 1x 200GB SSD and 2x 600GB HDD
  • VSAN Default Policy FTT=1

Upgrade Existing Hosts to 6.0 Update 2:

At the time of writing, ESXi 6.0.0 Update 2 is the latest release and the build that contains the VSAN 6.2 codebase. From the official VMware upgrade matrix it seems you can’t upgrade from VSAN versions older than 6.1, so if you are on 5.x or 6.0 releases you will need to take note of this VMware KB to get to ESXi 6.0.0 Update 2. For a great resource on the latest builds, as well as links to upgrade from, head here:

https://esxi-patches.v-front.de/ESXi-6.0.0.html

For a quick upgrade directly from the VMware online host update repository you can do the following on each host in the cluster after putting it into VSAN maintenance mode. Note that there are also some advanced settings that are recommended as part of the VSAN health checks in 6.2.
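
A sketch of doing this through PowerCLI’s Get-EsxCli follows; the image profile name is the 6.0 Update 2 profile as I understand it, so verify it against the link above before running.

    # Sketch: update a host (already in VSAN maintenance mode) from the online depot
    $esxcli = Get-EsxCli -VMHost "esxi01.lab.local" -V2
    $arguments = $esxcli.software.profile.update.CreateArgs()
    $arguments.profile = "ESXi-6.0.0-20160302001-standard"   # 6.0 U2 image profile (verify)
    $arguments.depot   = "https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml"
    $esxcli.software.profile.update.Invoke($arguments)
    Restart-VMHost -VMHost "esxi01.lab.local" -Confirm:$false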

After rolling through each host in the cluster, make sure that you have an updated copy of the VSAN HCL database and run a health check to see where you stand. You should see a warning about the disks needing an upgrade, and if any hosts didn’t have the above advanced settings applied you will see a warning about that as well.

Expanding VSAN Cluster:

As part of this upgrade I am also adding an additional host to the existing three to expand to a four host cluster. I am doing this for a couple of reasons: notwithstanding the accepted design position of four hosts being better than three from a data availability point of view, you also need a minimum of four hosts if you want to enable RAID5 erasure coding (six is the minimum for RAID6). The addition of the fourth host also allowed me to roll through the Hybrid to All Flash upgrade with a lot more headroom.

Before adding the new host to the existing cluster you need to ensure that its build is consistent with the existing hosts in terms of versioning and, more importantly, networking. Ensure that you have configured a VMkernel interface for VSAN traffic and marked it as such through the Web Client. If you don’t do this prior to putting the host into the existing cluster, I found that the management VMkernel interface was enabled by default for VSAN.
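
If you are scripting the host build, the VMkernel interface can be created and tagged for VSAN in one go; a sketch, with the vSwitch, port group and addressing from my lab:

    # Create a VMkernel adapter marked for Virtual SAN traffic
    New-VMHostNetworkAdapter -VMHost "esxi04.lab.local" -VirtualSwitch "vSwitch0" `
        -PortGroup "VSAN" -IP "192.168.10.14" -SubnetMask "255.255.255.0" `
        -VsanTrafficEnabled $true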

If you notice below this cluster is also NSX enabled, hence the events relating to Virtual NICs being added. Most importantly the host can see other hosts in the cluster and is enabled for HA.

Once in the cluster, the host can be used for VM placement, with data served over the VSAN network from the existing hosts that have configured disk groups.

Upgrade License:

At this point I upgraded the licenses to enable the new features in VSAN 6.2. As a refresher on VSAN licensing, there are three editions, with the biggest change from previous versions being that to get the Deduplication and Compression, Erasure Coding and QoS features you need to be running All Flash and have an Enterprise license key.

To upgrade the license you need to head to Licensing under the Configuration section of the Manage Tab whilst the Cluster is selected. Apply the new license and you should see the following.

Upgrade Disk Format:

If you have read up on upgrading VSAN you will know that a disk format upgrade is required to get the benefits of the newer versions. Once you have upgraded both vCenter and hosts to 6.0.0 Update 2, if you check the VSAN Health under the Monitor tab of the cluster you should see a failure talking about v2 disks not working with v3 disks, as shown below.

You can click on the Upgrade On-Disk Format button here to kick off the process. This can also be triggered from the Disk Management section under the Virtual SAN menu in the Manage cluster section of the Web Client. Once triggered you will see some events fire and an update-in-progress message near the version number.

Borrowing from one of Cormac Hogan’s posts on VSAN 6.2, the following explains what is happening during the disk format upgrade. Also described in the blog post is a way of using the Ruby vSphere Console (RVC) to monitor the progress in more detail.

There are a few sub-steps involved in the on-disk format upgrade. First, there is the realignment of all objects to a 1MB address space. Next, all vsanSparse objects (typically used by snapshots) are aligned to a 4KB boundary. This will bring all objects to version 2.5 (an interim version) and readies them for the on-disk format upgrade to V3. Finally, there is the evacuation of components from a disk group, then the deletion of said disk group and finally the recreation of the disk group as V3. This process is then repeated for each disk group in the cluster, until finally all disks are at V3.

As explained above, the upgrade can take a significant amount of time depending on the number of disk groups, the data consumed on your VSAN datastore and the type of disks being used (SAS based vs SATA/NL-SAS). Once complete you should have a green tick and the On-Disk Format version reporting 3.0.

With that done we can move ahead to the Hybrid to All Flash conversion. Look out for Part 2 of this series, coming soon.

References:

Hybrid vs All-flash VSAN, are we really getting close?

VSAN 6.2 Part 2 – RAID-5 and RAID-6 configurations

VSAN 6.2 Part 12 – VSAN 6.1 to 6.2 Upgrade Steps

VSAN Permanent Disk Failure Detection – Health Status vs Hardware Status

A month or so ago I had one of our VSAN management clusters’ Health Status flag a failed disk in one of the hosts’ disk groups. VSAN worked quickly and very efficiently to start an evacuation of the data on the impacted host by triggering a resync operation. Once the data was resynced I checked the status of the disk that had flagged the error…interestingly enough, the disk wasn’t being flagged under the ESXi hardware status or in the DELL iDRAC, as shown below:

Searching through the vmkernel.log for the affected disk I came across:

The status is being returned from the device and indicates a hardware failure despite the iDRAC not reporting any issues. On top of that the ESXi Host Client status of the disk looked normal.

Initially the sense error was reported as being due to a parity error rather than a hardware error on the disk, meaning that the disk wasn’t going to be replaced. DELL support couldn’t see anything wrong with the disk from the point of view of the FX2s chassis, the storage controller or the ESXi hardware status. DELL got me to install and run the PERC command line utility on the hosts (which I found to be a handy utility in its own right), which also reported no physical issues.

PERCCLI: http://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=XY978

As a temporary measure I removed the disk from the disk group and brought the host back out of maintenance mode to retain cluster resiliency. Later on I updated the storage controller driver and firmware to the current supported versions, added the disk back into the VSAN disk group and then cloned a VM onto that host, ensuring that data was placed on the host’s disk groups. About five minutes into the copy the host flagged again, and this time the disk was marked with a permanent disk failure.

This was different behavior to what I first experienced before the firmware and driver update, however the iDRAC and ESXi hardware status were still showing the disk as good. This time the evidence above was enough to get DELL to replace the disk, and once it was replaced I tested a clone operation onto that host again and there were no more issues.

So make sure that you are keeping tabs on the VSAN Health Status, as it seems to be a little more sensitive at flagging troublesome disks than ESXi and even the hardware controllers. Better to be overcautious when dealing with data!

References:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2109874

http://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=XY978

Quick Post – VSAN and Log Insight Custom Alerting Example

Log Insight is one of those great VMware products that needs to get more airplay, as it has quite a few applications beyond being a run of the mill log parser…in this post I’ll go through configuring a basic VSAN alert to detect disk failures. Once VSAN has been configured and deployed there is a new set of alerting parameters that VMware admins need to be aware of, ones that would usually be part of a traditional storage platform’s feature set. Like all storage, we need to be made aware of any issues with the supporting hardware such as storage controllers and physical disks. VSAN 6.2 comes with an excellent Health Monitor that gives a quick overview of a VSAN instance’s state and will alert through vCenter if any issues arise.

While vCenter-triggered alerting is fine, we had a situation recently where a failed disk was missed for a couple of days due to the default vCenter alarming not being configured correctly. The only way we found out about the failed disk was by visually seeing the alert against the vCenter and then taking a look at the VSAN Health Analyzer. While vCenter monitoring is OK, I don’t believe it should be your only or primary source of monitoring and alerting.

Having done a few alerts in Log Insight before, I looked at what Log Insight could provide by way of logging through the recently released VSAN Content Pack.

Using the Diskgroup Failures menu on the VSAN Content Pack Dashboard I searched through to try and locate the previous disk failure. As shown below a Disk Permanent Error had been registered.

Clicking through to the Interactive Analysis on that event you get a more detailed view of the error and the search parameters of the specific log entry.

To create a custom alert that emails when a permanent disk failure occurs, I removed the search fields that related directly to the disk and host, and clicked on the Create Alert icon (the red bell at the top left of the image).

As shown below configuring the alert is simple and there are a number of different hooks to use as methods of notification. One of the great things about using Log Insight to trigger Alert notification is the suppression mechanisms to stop alert floods.

Apart from creating custom alerts the VSAN Content pack comes with a number of pre-canned alerts that are disabled by default. To view and enable these click on the Manage Alerts button and filter for VSAN.

If you haven’t had a chance to look at Log Insight, take a look at the features page; if you own a vCenter license you already own a 25-OSI pack of Log Insight.

References:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2144909


VSAN 6.2 ESXi Patch Updates + DELL PERC Firmware Updates

I wanted to cover off a couple of important updates in this post relating to the DELL PERC storage controller firmware and software drivers, as well as an important new release of ESXi 6.0 that addresses a couple of issues with VSAN and also fixes more VMXNET3 problems, which seem to keep popping up. Read further below for the ESXi fixes, but firstly: a couple of weeks ago I posted about the new certified driver updates for the DELL PERC based storage controllers that VMware released for VSAN 6.2. This driver was only half of the fix, as DELL has also released new firmware for most of the PERC based controllers listed below.

It’s important to match the PERC firmware with the updated driver from VMware, as together they protect against the LSI issues mentioned here. The driver-only workaround is just that, a workaround: it requires the firmware upgrade to be fully protected. As shown below, you want to be on at least version 25.4.0.0015.

Side note: While you are on the DELL Drivers and Downloads site you should also consider upgrading to the latest iDRAC firmware and any other component that contains fixes to issues that could impact you.

Just on that new VMware driver…even if you are running earlier versions of VSAN with the Health Checker, if you update the HCL database and run a health check you will see a warning against PERC controller driver versions prior to lsi_mr3 (6.903.85.00-1OEM.600.0.0.2768847), as shown below.
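
To check which lsi_mr3 driver a host is actually running, a quick sketch via Get-EsxCli (host name illustrative):

    # List the installed lsi_mr3 VIB and its version on a host
    $esxcli = Get-EsxCli -VMHost "esxi01.lab.local" -V2
    $esxcli.software.vib.list.Invoke() |
        Where-Object { $_.Name -eq "lsi_mr3" } | Select-Object Name, Version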

New ESXi 6.0 Update 2 Build VSAN Fixes:

Last week VMware released ESXi 6.0 Build 3825889, which addresses a couple of big issues relating to VSAN datastore upgrades and also a bad VMXNET3 PSOD issue. Of most importance to me, looking to upgrade existing VSAN 6.1 clusters to VSAN 6.2, was an issue with CBT-enabled VMs when upgrading the VSAN filesystem from 2.0 to 3.0.

Attempts to upgrade a Virtual SAN cluster On-Disk format version from version 2.0 to 3.0 fails when you Power On CBT-enabled VMs. Also, CBT-enabled VMs from a non-owning host might fail due to on-disk lock contention on the ctk files and you might experience the following issues:

  • Deployment of multiple VMs from same CBT enabled template fail.
  • VMs are powered off as snapshot consolidation fails.
  • VM does not Power On if the hardware version is upgraded (for example, from 8 or 9 to 10) before registering the VM on a different host

So that’s not too cool, especially if you are using Veeam or some other VADP-based backup solution, but I’m glad there is a fix for it. Again, I don’t get why or how these things slip through…it seems like things haven’t improved too much when it comes to the QA of ESXi releases. That said, the relative turnaround time to have these issues fixed seems to be somewhat acceptable.

As mentioned there are a few more significant fixes so when the time is right this update should be applied to existing ESXi 6.0 Update 2 installations.

References:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2145070

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2144614

http://www.vmware.com/resources/compatibility/detail.php?deviceCategory=vsanio&productid=38055&deviceCategory=vsanio&details=1&vsan_type=vsanio&io_partner=23&io_releases=275&page=1&display_interval=10&sortColumn=Partner&sortOrder=Asc
