Tag Archives: ESX

Quick Fix: ESX 4.1 Host Stops Responding When iSCSI LUN is “pulled”

REMOVING DEAD PATHS IN ESX4.1 (version 5 guidance here)

Very quick post in relation to a slightly sticky situation I found myself in this afternoon. I was decommissioning a service which was linked to a VM which had a number of VMDKs, one of which was located on a dedicated VMFS Datastore…the guest OS also had a directly connected iSCSI LUN.

I chose to delete the LUNs first and then move up the stack, removing the VMFS datastore and eventually the VM. To do this I simply went to the SAN and deleted the disk and disk group resource straight up (hence the “pulled” reference in the title). Little did I know that ESX would have a small fit when I attempted any sort of reconfiguration or management on the VM. The first sign of trouble was when I attempted to restart the VM and noticed that the task in vCenter wasn’t progressing. At that point my Nagios/OpsView service checks against the ESX host began to time out and I lost connectivity to the host in the vCenter console.

Restarting the ESX management agents wasn’t helping, and as this was very much a production host with production VMs on it, my first thought (an older way of thinking) of rebooting it wasn’t acceptable during core business/SLA hours. As knowledge and confidence build with experience in and around ESX, I’ve come to use ESX(i) shell access more and more…so I jumped into SSH and had a look at what the vmkernel logs were saying.

So from the logs it was obvious the system was having major issues (re)connecting to the device I had just pulled out from under it. On the other hosts in the cluster the datastore was greyed out and I was unable to delete it from the Storage Config. A rescan of the HBAs removed the dead datastore from the storage list, so if I had still had vCenter access to this host a simple rescan should have sorted things out. Moving to the command line of the host in question, I ran the esxcfg-rescan command:
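For reference, a rescan from the service console can be sketched roughly as below. The adapter name vmhba33 and the rescan_hba wrapper are placeholders of mine, not taken from the host in this post; check the real adapter name under Storage Adapters before running anything like it.

```shell
# Rescan a single storage adapter from the ESX service console.
# Assumption: vmhba33 is only a placeholder default for the iSCSI adapter;
# pass the real adapter name as the first argument.
rescan_hba() {
    esxcfg-rescan "${1:-vmhba33}"
}
```

Calling `rescan_hba vmhba33` in one SSH session while `tail -f /var/log/vmkernel` runs in a second one lets you watch the rescan messages as they arrive. That log path assumes classic ESX 4.1; on ESXi 4.1 the VMkernel messages go to /var/log/messages instead.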

And at the same time, while tailing the vmkernel logs, I saw the following entries:

From tailing through those logs, the rescan basically detected that the path in question was in use (bound to a datastore where a VMDK was attached to a VM), reporting the “Device is in use by Worlds” error. The errors also highlight dead paths caused by me removing the LUN while it was in use.

The point at which the host went into a spin (as seen in the “Could not select Path for device” entries in the vmkernel log) was when I attempted to switch on the VM and the host, still thinking it had access to the VMDK, tried to access all of its disks.

So lesson learnt. When decommissioning VMFS datastores, don’t pull the LUN from under ESX…remove it gracefully first from vSphere and then you are free to delete on the SAN.

 

How To: DELL DSET Report Tool Live CD and Linux VLAN Config

Here is a quick post on generating support logs for DELL cases if you are running VMware ESX(i) on any of the DELL server hardware. I had a CPU alert appear in my vSphere Hardware Status and raised a support ticket with DELL. Previously I’ve had to wrestle with the config/setup of the DSET tool on ESX(i) and even had it cause boot failures due to a compatibility bug.

The Dell tech sent me the link below to a CentOS LiveCD which can be downloaded and booted on the server in question.

http://linux.dell.com/files/openmanage-contributions/omsa-70-live/

Once it is downloaded and attached via the iDRAC Virtual Media Manager, boot the server from it and you will automatically go through to the desktop, where you can double-click the DSET Tool icon. Let it do its thing and gather all the relevant info, which is then packaged into a zip file under /tmp/data/

Ok, so now that you have the file…how do you get it off the LiveCD instance? The answer would be simple if you had interfaces configured with DHCP, but the majority of these servers are configured with NICs on VLAN enabled ports which are not easily switched over or able to be reconfigured without going through change management etc etc.

The Network Configuration GUI in CentOS doesn’t have the ability to configure VLAN tagging on the interfaces, so you need to jump into the shell and manually configure the network settings as shown below.

Create a new config file for eth0…the key here is to take note of the MAC address and not include any IP or subnet details. I also disabled IPv6.
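A minimal sketch of what that parent interface file could look like, assuming the usual CentOS location of /etc/sysconfig/network-scripts/ifcfg-eth0. The MAC address is a placeholder; use the one reported for the NIC on your server.

```shell
# ifcfg-eth0: parent interface, deliberately without IP or subnet details.
DEVICE=eth0
# Placeholder MAC address (assumption); replace with the real one for this NIC.
HWADDR=00:11:22:33:44:55
ONBOOT=yes
# No DHCP and no static address on the untagged parent interface.
BOOTPROTO=none
# IPv6 disabled, as described above.
IPV6INIT=no
```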

Once saved, copy that file and save it as ifcfg-eth0.x, where x is the VLAN ID you want the interface to communicate in. This time you are adding the relevant IP info along with specifying the device name as eth0.x and VLAN=yes, which enables the VLAN tag config.
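As a rough sketch, the VLAN file might look like the following. VLAN ID 100 and the 192.0.2.x addresses are placeholders (assumptions, not values from this environment), and leaving out the HWADDR line from the copied file is my assumption too, since the parent file already identifies the physical NIC.

```shell
# ifcfg-eth0.100: tagged sub-interface for VLAN 100 (placeholder VLAN ID).
DEVICE=eth0.100
# Turns on 802.1Q tagging for this interface.
VLAN=yes
ONBOOT=yes
BOOTPROTO=none
# Placeholder addressing (assumption); use values valid for your VLAN.
IPADDR=192.0.2.10
NETMASK=255.255.255.0
GATEWAY=192.0.2.1
IPV6INIT=no
```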

Fire up the new interfaces and restart the network service, and you have a VLAN-enabled connection over which you can grab the DSET zip file and send it to DELL for analysis.
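Those last steps could be wrapped up along these lines. The bring_up_vlan helper and the eth0.100 interface name are hypothetical, and the sketch assumes a root shell on the LiveCD with the standard ifup and service commands available.

```shell
# Bring up the parent NIC and its VLAN sub-interface, then restart networking.
# Assumption: the argument is the VLAN interface name, e.g. eth0.100.
bring_up_vlan() {
    vlan_if="$1"
    # Strip everything from the first dot to get the parent name (eth0.100 -> eth0).
    parent_if="${vlan_if%%.*}"
    ifup "$parent_if"
    ifup "$vlan_if"
    service network restart
}
```

Running `bring_up_vlan eth0.100` as root should leave the tagged interface up with the address from its ifcfg file, which you can confirm with `ip addr show eth0.100`.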

As a side note, being the good VMware fanboy that I am, I used my Octopus Beta service to upload the file and make it available via the Octopus URL for sharing…because getting access to the Horizon Suite BETA is currently near on impossible 🙂