Tag Archives: FX2

VSAN 6.2 ESXi Patch Updates + DELL PERC Firmware Updates

I wanted to cover off a couple of important updates in this post: new DELL PERC storage controller firmware and software drivers, plus an important new release of ESXi 6.0 that addresses a couple of VSAN issues and fixes yet more VMXNET3 problems, which seem to keep popping up. Read further below for the ESXi fixes, but first: a couple of weeks ago I posted about the new certified driver updates for the DELL PERC based storage controllers that VMware released for VSAN 6.2. That driver was only half of the fix, as DELL has also released new firmware for most of the PERC based controllers listed below.

It’s important to match the PERC firmware with the updated driver from VMware, as together they protect against the LSI issues mentioned here. The workaround that applies once the driver is installed is just that, a workaround, and the firmware upgrade is required to be fully protected. As shown below, you want to be on at least version 25.4.0.0015.

Side note: While you are on the DELL Drivers and Downloads site you should also consider upgrading to the latest iDRAC firmware and any other component that contains fixes for issues that could impact you.

Just on that new VMware driver: even if you are running earlier versions of VSAN with the Health Check, once you update the HCL database and run a health check you will see a warning against PERC controller driver versions prior to lsi_mr3 6.903.85.00-1OEM.600.0.0.2768847, as shown below.
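If you want to confirm what the hosts are actually running, the installed driver VIB and loaded module can be checked from the ESXi Shell. This is just a quick sketch assuming SSH/ESXi Shell access is enabled; lsi_mr3 is the module name as reported on my hosts, so adjust if yours differs.

    # List the installed lsi-mr3 driver VIB and its version
    esxcli software vib list | grep -i lsi

    # Show details of the loaded module, including the driver version
    esxcli system module get -m lsi_mr3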

New ESXi 6.0 Update 2 Build VSAN Fixes:

Last week VMware released ESXi 6.0 Build 3825889, which addresses a couple of big issues relating to VSAN datastore upgrades as well as a bad VMXNET3 PSOD issue. Of most importance to me, looking to upgrade existing VSAN 6.1 clusters to VSAN 6.2, was an issue with CBT-enabled VMs when upgrading the VSAN on-disk format from 2.0 to 3.0.
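Before and after patching it is worth confirming exactly which build each host is on; either of the commands below will show the build number from the ESXi Shell.

    # Confirm the running ESXi version and build number
    vmware -vl
    esxcli system version get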

Attempts to upgrade a Virtual SAN cluster on-disk format from version 2.0 to 3.0 fail when you power on CBT-enabled VMs. Also, operations on CBT-enabled VMs from a non-owning host might fail due to on-disk lock contention on the ctk files, and you might experience the following issues:

  • Deployment of multiple VMs from the same CBT-enabled template fails.
  • VMs are powered off as snapshot consolidation fails.
  • A VM does not power on if its hardware version is upgraded (for example, from 8 or 9 to 10) before registering the VM on a different host.
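If you want to check whether a particular VM actually has CBT enabled before kicking off the on-disk format upgrade, the flags live in the VM's .vmx file. The datastore and VM names below are placeholders for illustration only.

    # Look for the CBT flags in the VM's configuration file (paths are placeholders)
    grep -i ctkEnabled "/vmfs/volumes/vsanDatastore/MyVM/MyVM.vmx"
    # ctkEnabled = "TRUE" plus per-disk scsiX:Y.ctkEnabled entries mean CBT is on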

So that’s not too cool, especially if you are using Veeam or some other VDP based backup solution, but I'm glad there is a fix for it. Again, I don’t get why or how these things slip through…it seems like things haven’t improved too much when it comes to the QA of ESXi releases. But again, the relative turnaround time to have these issues fixed seems to be somewhat acceptable.

As mentioned there are a few more significant fixes in the build, so when the time is right this update should be applied to existing ESXi 6.0 Update 2 installations.
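For a one-off host outside of Update Manager, the patch can be applied from the ESXi Shell along these lines. The bundle filename below is a placeholder for whatever depot zip you download from MyVMware, so adjust the path to suit.

    # Put the host into maintenance mode first
    esxcli system maintenanceMode set --enable true

    # Apply the downloaded patch bundle (path/filename is a placeholder)
    esxcli software vib update -d /vmfs/volumes/datastore1/ESXi600-patch-bundle.zip

    # Reboot to complete the update, then confirm the new build number
    reboot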

References:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2145070

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2144614

http://www.vmware.com/resources/compatibility/detail.php?deviceCategory=vsanio&productid=38055&deviceCategory=vsanio&details=1&vsan_type=vsanio&io_partner=23&io_releases=275&page=1&display_interval=10&sortColumn=Partner&sortOrder=Asc

Dell PowerEdge FX2: VSAN Disk Configuration Steps

When you get your new DELL FX2s out of the box and powered on for the first time, you will notice that the disk configuration has not been set up with VSAN in mind. If you were to log into ESXi on the blades in SLOT1a and SLOT1c, you would see that each host has each SAS disk configured as a datastore. There is a little pre-configuration you need to do in order to get the drives presented correctly to the blade servers, as well as remove and reconfigure the datastores and disks from within ESXi.

With my build I had four FC430 blades and two FD332 storage sleds, each sled containing 4x200GB SSDs and 8x600GB SAS drives. By default the storage mode is configured in Split Single Host mode, which results in all the disks being assigned to the hosts in SLOT1a and SLOT1c, with both controllers in each sled also assigned to that single host.

You can configure individual storage sleds containing two RAID controllers to operate in the following modes:

  • Split-single – Two RAID controllers are mapped to a single compute sled. Both controllers are enabled and each controller is connected to eight disk drives.
  • Split-dual – Both RAID controllers in a storage sled are connected to two compute sleds.
  • Joined – The RAID controllers are mapped to a single compute sled. However, only one controller is enabled and all the disk drives are connected to it.

To take advantage of both controllers in the FD332-PERC (Dual ROC) sled you need to configure Split-Dual mode. All hosts need to be powered off before the default configuration can be changed to Split Dual Host for the VSAN configuration.

Head to Server Overview -> Power in the CMC and from there gracefully shut down all four servers.
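If you prefer the CMC command line over the GUI, the same graceful shutdown can be scripted with RACADM. I used the GUI for this step, so treat the commands below as a sketch and verify the serveraction syntax against the RACADM reference for your CMC firmware.

    # Gracefully shut down each compute sled from the CMC CLI (syntax assumed from the CMC RACADM reference)
    racadm serveraction -m server-1 graceshutdown
    racadm serveraction -m server-2 graceshutdown
    racadm serveraction -m server-3 graceshutdown
    racadm serveraction -m server-4 graceshutdown

    # Confirm the power state of the sleds before changing the storage mode
    racadm getmodinfo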

Once the servers have been powered down, click on the Storage Sleds in SLOT-03 and SLOT-04 and go to the Setup Tab. Change the Storage Mode to Split Dual Host and Click Apply.

To check the distribution of the disks you can launch the iDRAC for each blade and go to Storage -> Enclosures to confirm that each blade now has 2xSSD and 4xHDD drives assigned. With the FD332 there are 16 slots in total, with slots 0-7 belonging to the first blade and slots 8-15 belonging to the second blade. As shown below we are looking at the config of SLOT1a.
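You can also confirm the new layout from the ESXi side once the blades are powered back on; each host should now see two SSDs and four spinning disks. A quick check from the ESXi Shell:

    # List the local devices and check the SSD flag on each
    esxcli storage core device list | grep -E "Display Name|Is SSD"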

The next step is to reconfigure the disks within ESXi to make sure VSAN can claim them when configuring the Disk Groups. Part of the process below is to delete any datastores that exist and clear the partition table…by far the easiest way to achieve this is via the new Embedded Host Client.

Install the Embedded Host Client on each Host

Log into the hosts via the Embedded Host Client at https://HOST_IP/ui, go to the Storage menu and delete any datastores that were preconfigured by DELL.

Click on the Devices tab in the Storage menu and clear the partition table on each disk so that VSAN can claim the disks whose datastores have just been deleted.
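If you prefer the command line to the Host Client, the same cleanup can be done with esxcli and partedUtil. The device name below is a placeholder; double check you are targeting the right disk before wiping anything, as this is destructive.

    # List any remaining VMFS datastores and their backing devices
    esxcli storage filesystem list

    # Inspect the current partition table on a disk (device name is a placeholder)
    partedUtil getptbl /vmfs/devices/disks/naa.xxxxxxxxxxxxxxxx

    # Wipe the partition table by writing a fresh msdos label (destructive)
    partedUtil mklabel /vmfs/devices/disks/naa.xxxxxxxxxxxxxxxx msdos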

From here all disks should be available to be claimed by VSAN to create your disk groups.
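A quick way to sanity check that from the ESXi Shell is vdq, which reports each device's VSAN eligibility.

    # Each local disk should now report as eligible for use by VSAN
    vdq -q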

As a side note it’s important to update to the latest driver for the PERC.
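A one-liner to confirm the driver version on each blade matches what the HCL lists:

    # Confirm the installed lsi-mr3 driver VIB version
    esxcli software vib list | grep -i lsi-mr3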

References:

http://www.dell.com/support/manuals/au/en/aubsd1/dell-cmc-v1.20-fx2/CMCFX2FX2s12UG-v1/Notes-cautions-and-warnings?guid=GUID-5B8DE7B7-879F-45A4-88E0-732155904029&lang=en-us

VSAN + DELL PERC: Important Driver and Firmware Updates

I’m currently going through and documenting the build process for our VSAN Management Clusters, and one of the first steps I noted down was to double check that the I/O controllers were compatible as per the VSAN HCL. As I am using the DELL FX2s I checked to ensure that there were no issues with the FD332-PERC (Dual ROC) controller. As shown below there are no issues with it being on the list (confirmed before the actual hardware purchase), however there was a footnote listed next to the Release Info.

That link takes you to the MyVMware download for the SAS Driver for the DELL PERC9 Based SAS Adapters, of which there are a number of models listed below.

  • Version: 6.606.12.00-1OEM
  • Description: The ESXi 6.0 driver package includes lsi-mr3 driver version 6.606.12.00-1OEM, which enables support for the PERC 9 based 12Gbps family of SAS controllers, including models H730P, H730, H830, H330, FD33xS and FD33xD
  • Release Date: 2015-05-19

I dug a little more into this release and managed to link it back to VMware KB 2109665, which describes adverse symptoms when using PERC9 based controllers with VSAN 5.x or 6.x:

  • In the VMware vCenter Server event log you see the error: IO was aborted by VMFS via a virt-reset on the device
  • When VSAN is under load, you see a VSAN status display similar to the one below showing VSAN disk(s) as unhealthy
  • High IO latency alarms
  • Failed IO and controller reset messages in the ESXi logs similar to these:
    WARNING: lsi_mr3: fusionReset:2565: megaraid_sas: Hardware critical error, returning FAILED.
    WARNING: ScsiPath: 7133: Set retry timeout for failed TaskMgmt abort for CmdSN 0x0, status Failure, path vmhba0:C0:T0:L0
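If you suspect you are already hitting this, the vmkernel log on each host can be checked for those reset and abort signatures. A rough sketch:

    # Search the vmkernel log for the controller reset/abort messages from the KB
    grep -i lsi_mr3 /var/log/vmkernel.log | grep -iE "reset|abort|failed"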

Without copying and pasting the whole KB, you want to ensure that you download and install the VIB update listed above and that the driver and controller firmware versions are up to date as referenced in the KB (see the quick install sketch after the list below). You also need to ensure that the DELL backplane firmware is at least the versions shown below.

  • Expander storage backplane (BP13G+EX): firmware version 3.03
  • Non-expander storage backplane (BP13G+): firmware version 2.23
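Installing the driver itself is a standard VIB install from the ESXi Shell. The bundle filename below is a placeholder for the offline bundle downloaded from MyVMware, and a reboot is required for the new driver to load.

    # Install the lsi-mr3 driver offline bundle (path/filename is a placeholder)
    esxcli software vib install -d /vmfs/volumes/datastore1/lsi-mr3-offline_bundle.zip

    # After the reboot, confirm the driver version the host is running
    esxcli software vib list | grep -i lsi-mr3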

From the FX2 CMC you can check the versions of that hardware by going to the Update tab under Chassis Overview, clicking on one of the servers under Update Targets in the bottom pane and looking for the components highlighted below.

References:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2109665

https://my.vmware.com/web/vmware/details?downloadGroup=DT-ESXI60-LSI-LSI-MR3-66061200-1OEM&productId=491

Dell PowerEdge FX2: CMC Configuration Gotchya

Back in September I wrote an introductory post (if you haven’t read that post click here) on the DELL PowerEdge FX2 HCI hardware and why we had selected it for our VSAN Management platform. After a busy two months consisting of VMworld, vForumAU and VeeamON, it’s finally time to start working towards putting these babies into production.

I’m hoping to do a series of posts around the FX2s and VSAN, and thought I would kick things off with a short but very important public service announcement around the default configuration of the Chassis Management Controller network port settings, and how, if you don’t RTFM, you could be left with an angry network guy beating down your door!

CAUTION: Connecting the STK/Gb2 port to the management network will have unpredictable results if the CMC setting is not changed from default Stacking to Redundant, to implement NIC failover. In the default Stacking mode, cabling the Gb1 and STK/Gb2 ports to the same network (broadcast domain) can cause a broadcast storm. A broadcast storm can also occur if the CMC setting is changed to Redundant mode, but the cabling is daisy chained between chassis in the Stacking mode. Ensure that the cabling model matches the CMC setting for the intended usage.

That warning should be one of the first things you read as you go through the CMC for PowerEdge FX2 User Guide, but just in case you don’t, and you are looking to take advantage of the redundant NIC feature the CMC offers (similar to that found in the DELL M1000e chassis), you need to go to Network -> General Settings and change the default radio option shown below from Stacking to Redundant.

If this isn’t done and you attempt to set up redundant management ports while still in the Stacking option, you will more than likely, as the caution suggests, impact your network as the switches grind to a halt under the stress of the broadcast storm…and in turn have some not too happy networking admins coming after you once they work out what’s going on.

The diagram above, pulled from the online documentation, shows you what not to do if Management Port 2 is configured in Stacking mode. Stacking mode is used to daisy chain a number of FX2 chassis for single-point management if required. I would have thought that having the least dangerous option set as the default was the way to go, but it is certainly a case of being aware that assumptions can lead to major headaches…so a final reminder to RTFM, just in case, and be aware of this default behavior in the FX2 CMCs.

http://www.dell.com/support/manuals/au/en/aubsd1/dell-cmc-v1.20-fx2/CMCFX2FX2s12UG-v1/Checklist-to-set-up-chassis?guid=GUID-767EC114-FE22-477E-AD20-E3356DD53395&lang=en-us

 

First Look: Dell PowerEdge FX2 Converged Platform

For the last six months or so I’ve been on the lookout for server and storage hardware to satisfy the requirement for new Management Clusters across our Zettagrid vCloud Zones. After a fairly exhaustive discovery and research stage, the Dell PowerEdge FX2 dropped at the right time to make the newly updated converged architecture hardware platform a standout choice for an HCI based solution.

I plan on doing a couple of posts on the specifics of the hardware chosen as part of the build, which will end up as a VMware VSAN configuration, but for the moment there is a little more info on the PowerEdge FX2 below, as well as a Virtual Unboxing video that goes through the initial familiarization with the CMC, walks through the FC430 system and storage configuration, and shows what the new BIOS menu looks like:

 

Below are some specs from the Dell site covering the compute and storage hardware…as you saw in the video above, we went for the quarter-width FC430 blades with two FD332 storage sleds.

Server blocks at the heart of the FX converged architecture are powered by the latest Intel® Xeon® processors. They include:

  • FC430: 2-socket, quarter-width 1U high-density server block with optional InfiniBand configuration
  • FC630: 2-socket, half-width 1U workhorse server block ideal for a wide variety of business applications
  • FC830: Powerful 4-socket, full-width 1U server block for mid-size and enterprise data centers
  • FM120x4: Half-width 1U sled housing up to four separate Intel® Atom® powered single-socket microservers offers up to 16 microservers per 2U.

The FC430 features:

  • Two multi-core Intel® Xeon® E5-2600 v3 processors or one multi-core Intel® Xeon® E5-1600 v3 processor (up to 224 cores per FX2)
  • Up to 8 memory DIMMs (up to 64 DIMMs per FX2)
  • Two 1.8″ SATA SSDs or one 1.8″ SATA SSD (w/front IB Mezzanine port)
  • Dual-port 10Gb LOM
  • Access to one PCIe expansion slot in the FX2 chassis

The FD332 provides massive direct attached storage (DAS) capacity in easily scalable, modular half-width, 1U blocks. Each block can house up to 16 direct-attached small form factor (SFF) storage devices. Combined with FX servers, the FD332 drives highly flexible, scale out computing solutions and is an excellent option for dense VSAN environments using optimized ratios of HDD/SSD storage (including all flash).

  • Up to 16 SFF 2.5″ SSDs/HDDs, both SATA and SAS
  • Up to three FD332 blocks per chassis (with one FC630 for processing). Other storage options include one or two blocks with different combinations of server blocks
  • 12Gbps SAS 3.0 and 6Gbps SATA 3.0
  • PowerEdge RAID Controller (PERC9), single or dual controllers, RAID or HBA modes, or mix and match modes with dual controllers

My first impressions are that this is a very, very sexy bit of kit! I am looking forward to getting it fired up and putting it to use as the basis for a solid Management Cluster platform.

http://www.dell.com/us/business/p/poweredge-fx/pd?oc=pe_fc430_1085&model_id=poweredge-fx&l=en&s=bsd