Category Archives: DELL

VSAN 6.2 + DELL PERC: Important Certified Driver Updates

As many of us rejoiced at the release of VSAN 6.2 that came with vSphere 6 Update 2…those of us running DELL PERC based storage controllers were quickly warned of a potential issue and told not to upgrade. VMware KB 2144614 referenced these issues and stated that the PERC H730 and FD332 controllers found in DELL server platforms were not certified for VSAN 6.2 pending ongoing investigations. The storage controllers that were impacted are listed below.

This impacted me as we have the FD332 Dual ROC in our production FX2s with VSAN 6.1 and a test bed with VSAN 6.2. With the KB initially listing no ETA I sat and waited, like others impacted, for the controllers to be certified. Late last week, however, DELL and VMware finally released an updated firmware and driver package for the PERC which certifies the H730s and FD332s with VSAN 6.2.

Before this update, if you looked at the VSAN Health Monitor you would have seen a Warning against the VMware Certified check and official driver support.

As well as upgrading the controller drivers it’s also suggested that you make the following changes on each host in the cluster, which add two new VSAN IO timeout settings. No reboot is required after applying the advanced config and the settings are persistent.

esxcfg-advcfg -s 100000 /LSOM/diskIoTimeout
esxcfg-advcfg -s 4 /LSOM/diskIoRetryFactor
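
If you want to confirm the values have taken effect, esxcfg-advcfg can also read a setting back with the -g flag. A quick check on each host looks like this:

esxcfg-advcfg -g /LSOM/diskIoTimeout
esxcfg-advcfg -g /LSOM/diskIoRetryFactor

Each command should echo back the value set above (100000 and 4 respectively).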

Once the driver has been upgraded you should see all green in the VSAN Health Checks as shown below with the up to date driver info.
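
If you want to confirm the running driver version from the command line rather than the Health UI, listing the installed VIBs on each host will show it. The grep pattern below assumes the PERC 9 family uses the lsi_mr3 driver, which is my assumption based on the controller naming, so adjust it to suit your controller:

esxcli software vib list | grep -i lsi

A version string containing 1OEM indicates the DELL/Avago async driver rather than the VMware inbox driver.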

This is all part of the fun and games of using your own components for VSAN, but I still believe it’s a huge positive to be able to tailor a design for specific use cases with specific hardware. In talking with various people within VMware and DELL (as it related to this and previous PERC driver issues) it’s apparent that both parties need to communicate better and go through much more thorough QA before releasing driver and firmware updates. However, this is not something that affects only VMware and DELL, and not only for storage drivers…it’s a common issue throughout the industry, it doesn’t only impact VMware VSAN, and every vendor has issues at some point.

Better safe than sorry here, and well done to VMware and DELL for getting the PERC certified without too much delay.

References:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2144614

http://www.vmware.com/resources/compatibility/detail.php?deviceCategory=vsanio&productid=38055&deviceCategory=vsanio&details=1&vsan_type=vsanio&io_partner=23&io_releases=275&page=1&display_interval=10&sortColumn=Partner&sortOrder=Asc

Preserving VSAN + DELL PERC Critical Drivers after ESXi 6.0 CBT Update

Last week VMware released a patch to fix another issue with Changed Block Tracking (CBT), which took the ESXi 6.0 Update 1 build to 3247720. The update bundle contains a number of updates to esx-base, including the resolution of the CBT issue.

This patch updates the esx-base VIB to resolve an issue that occurs when you run virtual machine backups which utilize Changed Block Tracking (CBT) in ESXi 6.0: the CBT API call QueryDiskChangedAreas() might return incorrect changed sectors, which results in inconsistent incremental virtual machine backups. The issue occurs because CBT fails to track changed blocks on VMs that have I/O during snapshot consolidation.
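
For reference, applying an offline patch bundle like this via esxcli looks something like the following; the datastore path and bundle file name are placeholders for wherever you stage the download, the host should be in maintenance mode, and a reboot is required afterwards because esx-base is updated:

esxcli software vib update -d /vmfs/volumes/datastore1/ESXi600-patch-offline-bundle.zip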

Having just deployed and configured a new Management Cluster consisting of four ESXi 6.0 Update 1 hosts running VSAN, I was keen to get the patch installed so that VDP based backups would work without issue. However, once I had deployed the update (via esxcli) to the first three hosts I saw that the VSAN Health Checker was raising a warning against the cluster. Digging into the VSAN Health Check Web Client Monitor view I saw the following under HCL Health -> Controller Driver Test.

As I posted in early November, there was an important driver and firmware update released by VMware and DELL that resolved a number of critical issues with VSAN when put under load. The driver package shown above against node-104 is 6.606.12.00-1OEM.600.0.0.2159203, which reports a Passed Driver Health state. The others are all in the Warning state with version 6.605.08.00-7vmw.600.1.17.3029758.

What’s happened here is that the ESXi patch has “updated” the controller driver to the latest VMware inbox driver version, overwriting the OEM driver released on the 19th of May and listed on the VMware VSAN HCL page. The simple fix is to reinstall the OEM driver so that you are left with the VSAN Health Status as shown below.
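
Reinstalling the OEM driver is the same esxcli workflow used to install it in the first place. The bundle path and file name below are placeholders for whichever DELL/Avago offline bundle is listed against the FD332-PERC on the VSAN HCL:

esxcli software vib install -d /vmfs/volumes/datastore1/lsi-mr3-oem-offline-bundle.zip

After a reboot the controller driver reverts to the -1OEM version and the HCL Health warning clears.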

Interestingly the device now shows up as an Avago (LSI) MegaRAID SAS Invader Controller instead of an FD332-PERC (Dual ROC)…I questioned that with a member of the VSAN team and it looks as though that is indeed the OEM name for the FD332 PERCs.

So be aware when updating ESXi builds: make sure the update hasn’t removed or replaced a certified driver with anything that’s going to potentially give you a really bad time with VSAN…or any other component for that matter.

References:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2137546

vCloud Air and Virtustream – Just kill vCloud Air Already?!?

I’ve been wanting to write some commentary around the vCloud Air and Virtustream merger since rumours of it surfaced just before VMworld in August, and I’ve certainly been more interested in the whole state of play since news of the EMC/VMware Cloud Services spin-off was announced in late October…the basis of this new entity is to try and get a stranglehold on the Hybrid Cloud market, which is widely expected to make up the biggest chunk of the cloud market for the foreseeable future, topping $90 billion by 2020.

Below are some of the key points lifted from the Press Release:

  • EMC and VMware  plan to form new cloud services business creating the industry’s most comprehensive hybrid cloud portfolio
  • Will incorporate and align cloud capabilities of EMC Information Infrastructure, Virtustream and VMware to provide the complete spectrum of on- and off-premises cloud offerings
  • The new cloud services business will be jointly owned 50:50 by VMware and EMC and will operate under the Virtustream brand led by CEO Rodney Rogers
  • Virtustream’s financial results to be consolidated into VMware financial statements beginning in Q1 2016
  • Virtustream is expected to generate multiple hundreds of millions of dollars in recurring revenue in 2016, focused on enterprise-centric cloud services, with an outlook to grow to a multi-billion business over the next several years
  • VMware will establish a Cloud Provider Software business unit incorporating existing VMware cloud management offerings and Virtustream’s software assets — including the xStream cloud management platform and others.

I’ve got a vested interest in the success or otherwise of vCloud Air as it directly impacts Zettagrid and the rest of the vCloud Air Network, as well as my current professional area of focus. However, I feel I am still able to provide level-headed feedback when it comes to vCloud Air, and the time was finally right to comment after coming across the following LinkedIn post from Nitin Bahadur yesterday evening.

It grabbed my attention not only because of my participation in the vCloud Air Network but also because the knives have been out for vCloud Air almost since before the service was launched as vCloud Hybrid Service. The post itself from Nitin, though brief, suggested that VMware should further embrace its partnership with Google Cloud and just look to direct VMware cloud customers onto the Google Cloud. The suggestion was based on letting VMware orchestrate workloads on Google while letting Google do what it’s best at…which was, surprisingly, infrastructure.

With that in mind I want to point out that vCloud Air is nowhere near the equal of AWS, Azure or Google in terms of total service offerings, but in my opinion it’s never been about trying to match those public cloud players’ platform services end to end. Where VMware (and by extension its Service Provider Partners) does have an advantage is in the fact that, in reality, VMware does do infrastructure brilliantly and has the undisputed market share among hypervisor platforms, giving it a clear advantage when talking about the total addressable market for Hybrid Cloud services.

As businesses look to go through their natural hardware refresh cycles the current options are:

  • Acquire new compute and storage hardware for existing workloads (Private – CapEx)
  • Migrate VM workloads to a cloud based service (IaaS – OpEx)
  • Move some application workloads into modern Cloud Services (SaaS)
  • Move all workloads to cloud and have third parties provide all core business services (SaaS, PaaS)

Without going into too much detail around each option…at a high level, vCloud Air and the vCloud Air Network have the advantage in that most businesses I come across are not ready to move into the cloud holistically. For the next three to five years existing VM workloads will need a home while businesses come to terms with an eventual move towards the next phase of cloud adoption, which is all about platform and software delivered in a cloud native way.

Another reason why vCloud Air and the Air Network are attractive is that migration and conversion of VMs is still problematic and a massive pain (in the you know what) for most businesses to contemplate undertaking…let alone spend additional capital on. A platform that offers the same underlying infrastructure businesses already run, which is what vCloud Air, the vCloud Air Network partners and Virtustream offer, should continue to do well, and there are enough ESXi based VMs out there to keep VMware based cloud providers busy for a while yet.

vCloud Air isn’t even close to being perfect and has a long way to go to even begin to catch up with the bigger players, and VMware/EMC/DELL might well choose to wrap it up, but my feeling is that that would be a mistake…certainly it needs to evolve, but the platform has a great advantage and it, along with the vCloud Air Network, should be able to cash in.

In the next part I will look at what Virtustream brings to the table and how VMware can combine the best of both entities into a service that can and should do well over the next 3-5 years as the Cloud Market starts to mature and move into different territory leading into the next shift in cloud delivery.

References:

https://www.linkedin.com/pulse/should-vmware-kill-vcloud-air-instead-use-google-cloud-nitin-bahadur

http://www.crn.com/news/cloud/300077924/sources-vmware-cutting-back-on-vcloud-air-development-may-stop-work-on-new-features.htm

http://www.vmware.com/company/news/releases/vmw-newsfeed/EMC-and-VMware-Reveal-New-Cloud-Services-Business/2975020-manual

http://www.marketsandmarkets.com/PressReleases/hybrid-cloud.asp

Dell PowerEdge FX2: VSAN Disk Configuration Steps

When you get your new DELL FX2s out of the box and powered on for the first time you will notice that the disk configuration has not been set up with VSAN in mind…If you were to log into ESXi on the blades in SLOT1a and 1c you would see that each host has each SAS disk configured as a datastore. There is a little pre-configuration you need to do in order to get the drives presented correctly to the blade servers, as well as remove and reconfigure the datastores and disks from within ESXi.

With my build I had four FC430 blades with two FD332 storage sleds, each containing 4x200GB SSDs and 8x600GB SAS drives. By default the storage mode is configured in Split Single Host mode, which results in all the disks being assigned to the hosts in SLOT1a and SLOT1c, with both controllers also assigned to the single host.

You can configure individual storage sleds containing two RAID controllers to operate in the following modes:

  • Split-single – Two RAID controllers are mapped to a single compute sled. Both the controllers are enabled and each controller is connected to eight disk drives
  • Split-dual – Both RAID controllers in a storage sled are connected to two compute sleds.
  • Joined – The RAID controllers are mapped to a single compute sled. However, only one controller is enabled and all the disk drives are connected to it.

To take advantage of the FD332-PERC (Dual ROC) controller you need to configure Split-Dual mode. All hosts need to be powered off before you can change the default configuration to Split Dual Host for the VSAN configuration.

Head to Server Overview -> Power and from here gracefully shut down all four servers.

Once the servers have been powered down, click on the Storage Sleds in SLOT-03 and SLOT-04 and go to the Setup Tab. Change the Storage Mode to Split Dual Host and Click Apply.

To check the distribution of the disks you can launch the iDRAC on each blade, go to Storage -> Enclosures and check that each blade now has 2xSSD and 4xHDD drives assigned. With the FD332 there are 16 total slots, with 0-7 belonging to the first blade and 8-15 belonging to the second blade. As shown below we are looking at the config of SLOT1a.

The next step is to reconfigure the disks within ESXi to make sure VSAN can claim them when configuring the Disk Groups. Part of the process below is to delete any datastores that exist and clear the partition table…by far the easiest way to achieve this is via the new Embedded Host Client.

Install the Embedded Host Client on each Host

Log into the hosts via the Embedded Client from https://HOST_IP/ui, go to the Storage menu and delete any datastores that were preconfigured by DELL.

Click on the Devices tab in the Storage menu and clear the partition table so that VSAN can claim the disks whose datastores have just been deleted.
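
If you would rather do the partition clean-up from the command line instead of the Embedded Host Client, partedUtil on each host can show and remove the existing partitions. The naa identifier below is a placeholder; list the devices first and substitute your own:

ls /vmfs/devices/disks/
partedUtil getptbl /vmfs/devices/disks/naa.XXXXXXXXXXXXXXXXXXXX
partedUtil delete /vmfs/devices/disks/naa.XXXXXXXXXXXXXXXXXXXX 1

Repeat the delete for each partition number that getptbl reports on each disk.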

From here all disks should be available to be claimed by VSAN to create your disk groups.
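
A quick way to confirm this from the host shell, rather than waiting on the Web Client, is the vdq utility, which reports each disk’s VSAN eligibility:

vdq -q

Any device still showing as ineligible because of existing partitions just needs the partition clean-up above repeated against it.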

As a side note it’s important to update to the latest driver for the PERC.

References:

http://www.dell.com/support/manuals/au/en/aubsd1/dell-cmc-v1.20-fx2/CMCFX2FX2s12UG-v1/Notes-cautions-and-warnings?guid=GUID-5B8DE7B7-879F-45A4-88E0-732155904029&lang=en-us

Dell PowerEdge FX2: CMC Configuration Gotchya

Back in September I wrote an introductory post (if you haven’t read that post click here) on the DELL PowerEdge FX2 HCI hardware and why we had selected it for our VSAN Management platform. After a busy two months consisting of VMworld, vForumAU and VeeamON it’s finally time to start working towards putting these babies into production.

I’m hoping to do a series of posts around the FX2s and VSAN, and thought I would kick things off with this short but very important public service announcement around the default configuration behavior of the Chassis Management Controller network port settings, and how, if you don’t RTFM, you could be left with an angry network guy beating down your door!

CAUTION: Connecting the STK/Gb2 port to the management network will have unpredictable results if the CMC setting is not changed from default Stacking to Redundant, to implement NIC failover. In the default Stacking mode, cabling the Gb1 and STK/Gb2 ports to the same network (broadcast domain) can cause a broadcast storm. A broadcast storm can also occur if the CMC setting is changed to Redundant mode, but the cabling is daisy chained between chassis in the Stacking mode. Ensure that the cabling model matches the CMC setting for the intended usage.

That warning should be one of the first things you read as you go through the CMC for PowerEdge FX2 User Guide, but just in case you don’t read it and are looking to take advantage of the redundant NIC feature the CMC offers (similar to that found in the DELL M1000e chassis), you need to go to Network -> General Settings and change the default radio option shown below from Stacking to Redundant.

If this isn’t done and you attempt to set up redundant management ports while still in the Stacking option, you will more than likely, as the caution suggests, impact your network due to the switches grinding to a halt under the stress of the broadcast storm…and in turn have some not too happy networking admins coming after you once they work out what’s going on.

The diagram above, pulled from the online documentation, shows you what not to do if Management Port 2 is configured in Stacking mode. Stacking mode is used to daisy chain a number of FX2 chassis for single-point management if required. I would have thought that having the least dangerous option set as the default was the way to go, but it is certainly a case of being aware that assumptions can lead to major headaches…so a final reminder to RTFM just in case, and be aware of this default behavior in the FX2 CMCs.

http://www.dell.com/support/manuals/au/en/aubsd1/dell-cmc-v1.20-fx2/CMCFX2FX2s12UG-v1/Checklist-to-set-up-chassis?guid=GUID-767EC114-FE22-477E-AD20-E3356DD53395&lang=en-us

 

First Look: Dell PowerEdge FX2 Converged Platform

For the last six months or so I’ve been on the lookout for server and storage hardware to satisfy the requirement for new Management Clusters across our Zettagrid vCloud Zones… After a fairly exhaustive discovery and research stage the Dell PowerEdge FX2 dropped at the right time to make the newly updated converged architecture hardware platform a standout choice for an HCI based solution.

I plan on doing a couple of posts on the specifics of the hardware chosen as part of the build, which will end up as a VMware VSAN configuration, but for the moment there is a little more info on the PowerEdge FX2 (below) as well as a virtual unboxing video that goes through the initial familiarization with the CMC and then walks through the FC430 system and storage configuration as well as what the new BIOS menu looks like:

 

Below are some specs from the Dell site covering the compute and storage hardware…as you saw in the video above we went for the quarter-width FC430 blades with two FD332 storage sleds.

Server blocks at the heart of the FX converged architecture are powered by the latest Intel® Xeon® processors. They include:

  • FC430: 2-socket, quarter-width 1U high-density server block with optional InfiniBand configuration
  • FC630: 2-socket, half-width 1U workhorse server block ideal for a wide variety of business applications
  • FC830: Powerful 4-socket, full-width 1U server block for mid-size and enterprise data centers
  • FM120x4: Half-width 1U sled housing up to four separate Intel® Atom® powered single-socket microservers offers up to 16 microservers per 2U.

The FC430 features:

  • Two multi-core Intel® Xeon® E5-2600 v3 processors or one multi-core Intel® Xeon® E5-1600 v3 processor (up to 224 cores per FX2)
  • Up to 8 memory DIMMs (up to 64 DIMMs per FX2)
  • Two 1.8″ SATA SSDs or one 1.8″ SATA SSD (w/front IB Mezzanine port)
  • Dual-port 10Gb LOM
  • Access to one PCIe expansion slot in the FX2 chassis

The FD332 provides massive direct attached storage (DAS) capacity in easily scalable, modular half-width, 1U blocks. Each block can house up to 16 direct-attached small form factor (SFF) storage devices. Combined with FX servers, the FD332 drives highly flexible, scale out computing solutions and is an excellent option for dense VSAN environments using optimized ratios of HDD/SSD storage (including all flash) .

  • Up to 16 SFF 2.5″ SSDs/HDDs, both SATA and SAS
  • Up to three FD332 blocks per chassis (with one FC630 for processing). Other storage options include one or two blocks with different combinations of server blocks
  • 12Gbps SAS 3.0 and 6Gbps SATA 3.0
  • PowerEdge RAID Controller (PERC9), single or dual controllers, RAID or HBA modes, or mix and match modes with dual controllers

My first impressions are that this is a very very sexy bit of kit! I am looking forward to getting it up and firing and putting it to use as the basis for a solid Management Cluster platform.

http://www.dell.com/us/business/p/poweredge-fx/pd?oc=pe_fc430_1085&model_id=poweredge-fx&l=en&s=bsd 

How To: DELL DSET Report Tool Live CD and Linux VLAN Config

Here is a quick post on generating support logs for DELL cases if you are running VMware ESX(i) on any DELL server hardware. I had a CPU alert appear in my vSphere Hardware Status and raised a support ticket with DELL. Previously I’ve had to wrestle with the config/setup of the DSET tool on ESX(i) and even had it cause boot failures due to a compatibility bug.

The Dell tech sent me the link below, which is a CentOS LiveCD that can be downloaded and booted on the server in question.

http://linux.dell.com/files/openmanage-contributions/omsa-70-live/

Once downloaded and attached via the iDRAC Virtual Media Manager you will boot straight through to the desktop, where you can double click on the DSET tool icon. Let it do its thing and gather all the relevant info, which is then packaged into a zip file under /tmp/data/.

Ok, so now that you have the file…how do you get it off the LiveCD instance? The answer would be simple if you had interfaces configured with DHCP, but the majority of these servers are configured with NICs on VLAN enabled ports which are not easily switched over or able to be reconfigured without going through change management etc etc.

The Network Configuration GUI in CentOS doesn’t have the ability to configure VLAN tagging on the interfaces, so you need to jump into the shell and manually configure the network settings as shown below.

Create a new config file for eth0 and configure it as shown below…the key here is to take note of the MAC address, not include any IP or subnet details, and (in my case) disable IPv6.

Once saved, copy that file to ifcfg-eth0.x, where x is the VLAN you want the interface to communicate on. This time you are adding the relevant IP info along with specifying the device name as eth0.x and VLAN=yes, which enables the VLAN tag config.
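
For reference, a minimal pair of config files would look something like the following; the MAC address, VLAN ID of 100 and IP addressing are examples only, so substitute your own values:

# /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=AA:BB:CC:DD:EE:FF
ONBOOT=yes
BOOTPROTO=none
IPV6INIT=no

# /etc/sysconfig/network-scripts/ifcfg-eth0.100
DEVICE=eth0.100
ONBOOT=yes
BOOTPROTO=none
VLAN=yes
IPADDR=192.168.100.50
NETMASK=255.255.255.0
GATEWAY=192.168.100.1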

Fire up the new interface, restart networking, and you have a VLAN enabled connection you can use to grab the DSET zip file off the LiveCD and send to DELL for analysis.
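
On the LiveCD that last step is just a service restart followed by a copy to any reachable box; the destination host and path here are examples only:

service network restart
scp /tmp/data/*.zip admin@192.168.100.10:/tmp/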

As a side note, being the good VMware fanboy that I am, I used my Octopus Beta service to upload the file and make it available via the Octopus URL for sharing…because getting access to the Horizon Suite BETA is currently near on impossible 🙂