
Quick Fix: vSAN Health Reports iSCSI Target Service Stopped

A few weeks ago I wrote about using iSCSI as a backup repository target. While still running this POC in my environment, I came across an error in the vSAN Health Checker stating that the vSAN iSCSI target service was in a Failed state. Drilling down into the vSAN Health check tree, I could see a Service Runtime status of stopped against the host, as shown below.

This host had recently been marked as unreachable in vCenter and required a Management Agent reset to bring it back online. There is a chance that that process stopped the iSCSI Target service but did not start it again. In any case, there is an easy way to check the status of the service and get it back online.

Once that’s been done, a re-run of the vSAN Health checker will show that the issue has been resolved and the iSCSI Target Service on the host is now running.
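
If you want to re-run the health check from PowerCLI rather than the Web Client, a minimal sketch looks like the one below. The vCenter and cluster names are placeholders, and Test-VsanClusterHealth is the cmdlet I'd reach for here; verify it's available in your PowerCLI version first.

    # Re-run the vSAN health check once the iSCSI Target service has been
    # restarted on the host. Server and cluster names are placeholders.
    Connect-VIServer -Server vcenter.lab.local

    $cluster = Get-Cluster -Name "vSAN-Cluster"

    # Runs the same checks the Web Client health UI does
    Test-VsanClusterHealth -Cluster $cluster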

References:

https://kb.vmware.com/s/article/2147603


Setting up vSAN iSCSI and using it as a Veeam Repository

Probably one of the least talked about features of vSAN is its ability to serve out iSCSI volumes. The feature was released with vSAN 6.5, is primarily aimed at physical workloads, and is easily configurable via the vSphere Web Client. iSCSI targets on vSAN are managed the same as any other vSAN objects, using Storage Policy Based Management (SPBM). Deduplication, compression, mirroring, and erasure coding can all be used with the iSCSI target service, as can CHAP and Mutual CHAP authentication.

Of late, I've been asked by service providers about using object storage platforms as Veeam Backup & Replication repositories. There are a lot of options out there, but someone asked specifically about using vSAN. In theory you could just use a VMDK on a vSAN datastore, but I thought it would be interesting to look at using iSCSI to mount a volume and use it as a repository.

Initial iSCSI Configuration for vSAN:

The first thing we need to do is enable the iSCSI Target service from the vSphere Web Client. Under the Cluster Configuration tab, in the iSCSI Target menu, enable the iSCSI service. Select the default iSCSI network kernel interface and then modify the iSCSI port and add security if desired. Take note of the info message about the Storage Policy used for the home object.

From there we set up a new iSCSI Target. Here the IQN is generated for us, and we give the target an alias. This window also lets us create the first LUN on the iSCSI Target; the LUN ID can be specified along with an alias and, finally, the size. Just like creating a new VMDK on a vSAN datastore, we are shown the storage consumption of the object depending on the Storage Policy chosen.
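
If you'd rather script this part, the PowerCLI storage module also ships vSAN iSCSI cmdlets. The sketch below is only a starting point: the cluster name, target alias and LUN size are placeholders, and the cmdlet and parameter names are assumptions on my part, so check Get-Command *VsanIscsi* and the help before running it.

    # Hedged PowerCLI sketch: create the iSCSI target and its first LUN.
    # Assumes the vSAN iSCSI Target service has already been enabled on the
    # cluster (as done in the Web Client above) and a connected PowerCLI session.
    $cluster = Get-Cluster -Name "vSAN-Cluster"      # placeholder cluster name

    # Create the target - the IQN is generated automatically, the alias is ours
    $target = New-VsanIscsiTarget -Cluster $cluster -Name "veeam-repo"

    # Add the first LUN to that target (size is a placeholder)
    New-VsanIscsiLun -Target $target -Name "veeam-repo-lun0" -CapacityGB 500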

Once completed, under the iSCSI Target pane we see the details of the Target and LUN just created. Take note of the I/O Owner Host, as that is the address we will be connecting to later from the Veeam repository server.

Configuring Host access and setting iSCSI Access Permissions:

On creation of a LUN there is a default policy that allows all initiator sources to connect to it. To create specific permissions for host access, and to create access groups, you first need to enable the iSCSI initiator at the hosts. For that, I've got a Windows VM (note that only physical servers are officially supported) with Veeam Backup & Replication installed on it. To connect to the iSCSI network, we have to add an additional vNIC hooked into a PortGroup configured with the vSAN iSCSI VLAN.

Below we can see the VMkernel configuration and IP address of the I/O Owner hosts.

I’ve created a new PortGroup for the new vNIC to be attached to and added it to the VM.
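
For completeness, the PortGroup and vNIC can also be added with PowerCLI. This is a rough sketch assuming a standard vSwitch; the host, switch, PortGroup, VLAN ID and VM names are all placeholders from my lab.

    # Create a PortGroup tagged with the vSAN iSCSI VLAN and attach a new
    # vNIC on the Veeam VM to it. All names and the VLAN ID are placeholders.
    $vmhost = Get-VMHost -Name "esxi01.lab.local"
    $vss    = Get-VirtualSwitch -VMHost $vmhost -Name "vSwitch0"

    $pg = New-VirtualPortGroup -VirtualSwitch $vss -Name "iSCSI-PG" -VLanId 20

    Get-VM -Name "veeam01" |
        New-NetworkAdapter -NetworkName $pg.Name -Type Vmxnet3 -StartConnected:$true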

From there we need to start the Microsoft iSCSI Initiator service, which gives us the initiator name we need in order to configure host access in the vSphere Web Client. Note that we should also install and enable MPIO for iSCSI if it's not already installed as a Windows feature.
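
The equivalent from an elevated PowerShell prompt on the Windows VM looks roughly like this. The MPIO feature name shown is the Windows Server one, and enabling the automatic claim may require a reboot.

    # Start the Microsoft iSCSI Initiator service and set it to start automatically
    Set-Service -Name MSiSCSI -StartupType Automatic
    Start-Service -Name MSiSCSI

    # Grab the initiator IQN - this is what gets added to the vSAN initiator group
    (Get-InitiatorPort | Where-Object ConnectionType -eq 'iSCSI').NodeAddress

    # Install MPIO and have it claim iSCSI devices (may require a reboot)
    Install-WindowsFeature -Name Multipath-IO
    Enable-MSDSMAutomaticClaim -BusType iSCSI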

Under the iSCSI Initiator Groups menu in the Cluster Configuration tab you can add the initiator to a new group. This can contain one or many hosts as you would expect in any iSCSI initiator group configuration.

Once that's been done, we have to allow that new group access to the target where the LUN is contained. Under the iSCSI Target menu, under Target Details in the lower pane, click on the + icon and add the group as an allowed initiator.
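
The same initiator group work can be scripted; again this is a hedged sketch where the cluster, group and target names plus the IQN are placeholders, and the cmdlet and parameter names are assumptions to verify against your PowerCLI version.

    # Hedged PowerCLI sketch: create an initiator group holding the Veeam
    # server's IQN and grant it access to the target created earlier.
    $cluster = Get-Cluster -Name "vSAN-Cluster"
    $iqn     = "iqn.1991-05.com.microsoft:veeam01.lab.local"   # placeholder IQN

    $group  = New-VsanIscsiInitiatorGroup -Cluster $cluster -Name "veeam-repo-group" `
              -InitiatorName $iqn
    $target = Get-VsanIscsiTarget -Cluster $cluster -Name "veeam-repo"

    # Allow the group access to the target (and therefore its LUNs)
    New-VsanIscsiInitiatorGroupTargetAssociation -InitiatorGroup $group -Target $target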

From here we can go back to the Windows VM and connect to the iSCSI Target, using the IP address of the host that was highlighted above in the initial configuration.
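
From PowerShell on the Windows VM, the connection looks something like the sketch below; the portal address is a placeholder for the I/O Owner host's iSCSI IP, and the IQN filter is just an example, so match on the IQN shown in the Web Client.

    # Register the I/O Owner host as a target portal (placeholder IP/port)
    New-IscsiTargetPortal -TargetPortalAddress 192.168.30.21 -TargetPortalPortNumber 3260

    # Discover the vSAN target and connect, persisting the session across reboots
    Get-IscsiTarget |
        Where-Object { $_.NodeAddress -like "*veeam-repo*" } |   # placeholder filter
        Connect-IscsiTarget -IsPersistent $true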

Once done, we should have a connected disk that's visible in the Devices configuration of the iSCSI Initiator.

Configuring new iSCSI Volume as Veeam Repository:

From here the process to set up a Veeam repository based on the vSAN iSCSI LUN is straightforward. First we need to bring the volume online and create a partition. As you can see below, the disk has a Bus Type of iSCSI and a Name of VMware Virtual SAN.

As for the partition configuration, I've set it up as shown below, with ReFS as the file system.
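
Scripted from the same PowerShell session, the disk preparation is just the standard storage cmdlets. The friendly-name filter, drive letter and volume label are assumptions for my lab; the 64 KB allocation unit size is the usual recommendation for ReFS backup repositories.

    # Find the vSAN-presented iSCSI disk, bring it online and make it writable
    $disk = Get-Disk | Where-Object { $_.BusType -eq 'iSCSI' -and $_.FriendlyName -like '*VMware*' }
    $disk | Set-Disk -IsOffline $false
    $disk | Set-Disk -IsReadOnly $false

    # Initialise, partition and format with ReFS for use as the Veeam repository
    $disk | Initialize-Disk -PartitionStyle GPT
    $part = $disk | New-Partition -UseMaximumSize -DriveLetter E
    $part | Format-Volume -FileSystem ReFS -NewFileSystemLabel "VeeamRepo" -AllocationUnitSize 65536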

From here we can head into the Backup & Replication console and create a new Repository with the new volume selected.
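
The repository can also be created with the Veeam PowerShell snap-in. Treat this as a hedged sketch: the repository name and path are placeholders, and the parameter set (particularly -Type WinLocal) may differ between Backup & Replication versions, so check Get-Help Add-VBRBackupRepository first.

    # Hedged sketch using the Veeam B&R PowerShell snap-in (v9.x era)
    Add-PSSnapin VeeamPSSnapin

    Add-VBRBackupRepository -Name "vSAN-iSCSI-Repo" `
        -Server (Get-VBRLocalhost) `
        -Folder "E:\Backups" `
        -Type WinLocal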

Performance and Limitations:

Once configured, I was interested in seeing how a vSAN iSCSI connected volume performed against a native vSAN disk. The results below show that there is a significant performance hit in going down the iSCSI path. This seems logical: in addition to the iSCSI overheads, a native VMDK on vSAN is hooked into the ESXi kernel directly and should achieve close to line-speed data transfer rates.

The configuration maximums for vSAN iSCSI are listed below:

  • Maximum 1024 LUNs per vSAN cluster
  • Maximum 128 targets per vSAN cluster
  • Maximum 256 LUNs per target
  • Maximum LUN size of 62TB
  • Maximum 128 iSCSI sessions per host
  • Maximum 4096 iSCSI IO queue depth per host
  • Maximum 128 outstanding writes per LUN
  • Maximum 256 outstanding IOs per LUN
  • Maximum 64 client initiators per LUN

So the maximum size of an iSCSI LUN matches the maximum size of a VMDK. Therefore, when considering iSCSI as a possible option for Veeam backups, Scale-Out Backup Repositories should be used to enable the adding of extents once that limit is reached.
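
If multiple 62TB LUNs do end up being presented, the individual repositories can be grouped into a Scale-Out Backup Repository. The sketch below is an assumption-laden example: the repository names and the DataLocality placement policy are placeholders, so verify the parameter set for your B&R version.

    # Hedged sketch: combine two vSAN iSCSI repositories into a SOBR so new
    # extents can be added as the 62TB per-LUN limit is reached.
    $extents = Get-VBRBackupRepository -Name "vSAN-iSCSI-Repo-1", "vSAN-iSCSI-Repo-2"

    Add-VBRScaleOutBackupRepository -Name "vSAN-iSCSI-SOBR" `
        -Extent $extents `
        -PolicyType DataLocality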

There are also limitations on official support for virtual machines and other platforms:

  • Currently not supported for implementation for Microsoft clusters
  • Currently not supported for use as a target for other vSphere hosts
  • Currently not supported for use with third party hypervisors
  • Currently not supported for use with virtual machines

So if this becomes a consideration, physical servers will need to be used in order to gain support.

Conclusion:

So after all is said and done, we have a Veeam repository that is now sitting on vSAN via iSCSI. The question remains whether this is a good application of vSAN, or whether it's worth looking at as an option; however, the option is now there. Again, you may be able to look at the native VMDK option, but I like the flexibility of iSCSI for physical repositories at the moment.

Probably the biggest consideration for using vSAN iSCSI as a Veeam repository is the design of the vSAN cluster. vSAN has not traditionally been considered for storage-only purposes; however, you could put together some low-compute nodes with large disk groups that would present decent storage for repository purposes.

In using vSAN you have the advantage of knowing your data is redundant across multiple nodes, as per the vSAN Storage Policies. That is the real benefit of using object-based storage like vSAN as a Veeam repository.

References:

https://docs.vmware.com/en/VMware-vSphere/6.5/com.vmware.vsphere.virtualsan.doc/GUID-13ADF2FC-9664-448B-A9F3-31059E8FC80E.html 

https://kb.vmware.com/kb/2148216


Quick Fix: ESX 4.1 Host Stops Responding When iSCSI LUN is “pulled”

REMOVING DEAD PATHS IN ESX4.1 (version 5 guidance here)

Very quick post in relation to a slightly sticky situation I found myself in this afternoon. I was decommissioning a service linked to a VM that had a number of VMDKs, one of which was located on a dedicated VMFS datastore…the guest OS also had a directly connected iSCSI LUN.

I chose to delete the LUNs first and then move up the stack, removing the VMFS and eventually the VM. In doing this I simply went to the SAN and deleted the disk and disk group resource straight up! (hence the "pulled" reference in the title) Little did I know that ESX would have a small fit when I attempted to do any sort of reconfiguration or management on the VM. The first sign of trouble was when I attempted to restart the VM and noticed that the task in vCenter wasn't progressing. At that point my Nagios/Opsview service checks against the ESX host began to time out and I lost connectivity to the host in the vCenter console.

Restarting the ESX management agents wasn't helping, and as this was very much a production host with production VMs on it, my first (and older way of thinking) thought of rebooting it wasn't acceptable during core business/SLA hours. As knowledge and confidence build with experience in and around ESX, I've come to use ESX(i) shell access more and more…so I jumped into SSH and had a look at what the vmkernel logs were saying.

So from the logs it was obvious the system was having major issues (re)connecting to the device I had just pulled out from under it. On the other hosts in the cluster the datastore was greyed out and I was unable to delete it from the storage configuration. A rescan of the HBAs removed the dead datastore from the storage list, so if I still had vCenter access to this host a simple rescan should have sorted things out. Moving to the command line of the host in question, I ran the esxcfg-rescan command:

And at the same time, while tailing the vmkernel logs, I saw the following entries:

Tailing through those logs, the rescan basically detected that the path in question was in use (bound to a datastore where a VMDK was attached to a VM), reporting the "Device is in use by Worlds" error. The errors also highlight dead paths, caused by me removing the LUN while it was in use.

The point at which the host went into a spin (as seen by the "Could not select Path for device" entries in the vmkernel log) was when I attempted to power on the VM and the host, still thinking it had access to the VMDK, tried to access all of its disks.

So, lesson learnt: when decommissioning VMFS datastores, don't pull the LUN out from under ESX…remove it gracefully from vSphere first, and then you are free to delete it on the SAN.
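
As a footnote, the rescan that cleared the dead datastore from the other hosts in the cluster can be kicked off from PowerCLI as well; a minimal sketch (cluster name is a placeholder) is below.

    # Rescan HBAs and VMFS on every host in the cluster to drop the dead
    # paths/datastore - the cluster name is a placeholder.
    Get-Cluster -Name "Production" | Get-VMHost |
        Get-VMHostStorage -RescanAllHba -RescanVmfs | Out-Null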