Archive for VMware

First Look: CloudPhysics Card Designer

The boys at CloudPhysics are working hard behind the scenes on adding new features to their current stable of Analytic Cards, which are based on data collected by their Probe VAs hooked into vCenter environments.

Check out this post on their DataStore Contention Card:

For a general overview, go here:

I am a massive fan of analytics and trend metrics, and I use a number of systems to get a broad view of the performance and monitoring of our Hosting and Cloud Platform.

A few weeks ago, the CloudPhysics team released a Custom Card Designer to a limited number of users. It lets you construct custom cards from a huge number of metrics presented via a builder wizard.

[Screenshot: Card Designer main page listing saved custom cards]

Cards you design and save are listed on the page above, where you can view your custom cards and edit them if they need tweaking. Clicking the “Create Card +” button presents a list of property data metrics from which to construct your card.

[Screenshot: Card Designer property categories and metrics]

Properties fall under four main categories, each with a large number of available metrics. The wizard lets you drag and drop items into the builder window, from where you can preview and then save your custom card for future use.

As an example, I needed a quick way to see which datastores were connected to which hosts in each cluster, so I could confirm that datastore availability was consistent. It was as simple as dragging across Host:Name and Host:Datastore and adding a filter to show only hosts of a certain name; after that it was ready to go.

[Screenshot: test card built from Host:Name and Host:Datastore with a host name filter]

You can preview and continue editing, or save to the Card Designer main page, from which you can execute the query. The results of my quick test card are shown below.

[Screenshot: results of the test card]
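The card does the comparison visually; the same consistency check boils down to a set difference per host. A minimal Python sketch, assuming the Host:Name / Host:Datastore pairs have been copied out of the card results by hand (the host and datastore names are hypothetical):

```python
from typing import Dict, Set


def missing_datastores(host_datastores: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each host, return the datastores other hosts in the cluster see but it does not."""
    # Every datastore mounted on at least one host in the cluster.
    all_datastores: Set[str] = set().union(*host_datastores.values())
    gaps = {host: all_datastores - mounted for host, mounted in host_datastores.items()}
    # Keep only the hosts that are actually missing something.
    return {host: missing for host, missing in gaps.items() if missing}


if __name__ == "__main__":
    # Hypothetical Host:Name -> Host:Datastore rows copied out of the card results.
    cluster = {
        "esx01": {"DS01", "DS02", "DS03"},
        "esx02": {"DS01", "DS02", "DS03"},
        "esx03": {"DS01", "DS03"},
    }
    for host, missing in sorted(missing_datastores(cluster).items()):
        print(f"{host} is missing: {', '.join(sorted(missing))}")
```

Comparing every host against the union of all datastores in the cluster means a datastore mounted on only one host shows up as missing on all the others, which is exactly the kind of inconsistency worth chasing.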

One thing I would like to see is an option to export the results to a CSV or Excel document… but other than that, it’s a great example of what CloudPhysics is all about: data, and how to get the most out of it as efficiently as possible.

How-To: VMware Horizon Workspace 1.0 vApp Install – Part 1

I’ve been waiting to deploy Project Octopus for the best part of 18 months… I’m still actively running the Octopus Beta for personal use and internal testing, and for the most part it has lived up to expectations. A number of bugs and general limitations have been identified in the Beta builds, but all in all it does the job. I was a little frustrated with the time to market for the initial GA of the product, and even more so when it was incorporated into the Horizon Suite. I feel VMware has missed a key part of the market, with Dropbox-like clones popping up everywhere of late.

Having just gone through my first deployment of the Horizon Workspace vApp (…and failed), and given that there isn’t much on the internet in terms of walkthroughs, I thought a blog post would be handy. This won’t be an HA, scaled-out deployment, as I only need to support 100-500 internal users for the moment, but the online docs do touch on Advanced Configuration tasks.

There is quite a bit to the deployment, so this post will only touch on the key points and any items the docs don’t cover clearly. While writing it, it became clear this would need to be a multi-parter… In this part I’ll go through the initial DNS configuration requirements, deploying the Horizon Workspace vApp, and the initial configuration wizard.

Initial Design Action Items:

Reading through the online docs, the key takeaway is that you need to get your DNS right: allocate the vApp VM IP addresses and ensure the reverse lookups match up. You also need to think about the FQDN for both internal and external access.
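Rather than waiting for the configurator’s DNS check to fail, the forward and reverse records can be checked up front. A minimal Python sketch, assuming you have a planned list of VA FQDN-to-IP allocations (the names and addresses are hypothetical); the lookup functions are injectable so the logic can be tested without a DNS server, and by default it uses the local resolver via the standard `socket` module:

```python
import socket
from typing import Callable, Dict, List


def _ptr_name(ip: str) -> str:
    # socket.gethostbyaddr returns (hostname, aliaslist, ipaddrlist).
    return socket.gethostbyaddr(ip)[0]


def dns_problems(
    planned: Dict[str, str],
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], str] = _ptr_name,
) -> List[str]:
    """Check that each planned FQDN -> IP pair resolves consistently forward and reverse."""
    problems = []
    for fqdn, ip in planned.items():
        # Forward (A record): the name should resolve to the IP allocated to the VA.
        try:
            answer = forward(fqdn)
            if answer != ip:
                problems.append(f"{fqdn}: A record is {answer}, expected {ip}")
        except OSError:  # socket.gaierror is a subclass of OSError
            problems.append(f"{fqdn}: no A record")
        # Reverse (PTR record): the IP should map back to the same name.
        try:
            ptr = reverse(ip).rstrip(".").lower()
            if ptr != fqdn.lower():
                problems.append(f"{ip}: PTR record is {ptr}, expected {fqdn}")
        except OSError:  # socket.herror is a subclass of OSError
            problems.append(f"{ip}: no PTR record")
    return problems


if __name__ == "__main__":
    # Hypothetical VA names and addresses; run from a machine that uses the same
    # DNS servers as the vCenter IP Pool.
    planned = {
        "configurator-va.horizon.example.com": "10.1.1.50",
        "gateway-va.horizon.example.com": "10.1.1.51",
    }
    for problem in dns_problems(planned):
        print(problem)
```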

FQDN: xx.horizon.domain.com (split DNS, relative to the vCenter/ESX environment, so that both internal and external access work without the VMs having to route publicly)

 Caution: After you deploy, you cannot change the Horizon Workspace FQDN.

This was the mistake I made, and it meant I had to redeploy the vApp to get the FQDN right. When it came time to publish the gateway-va externally, the external host name redirected to the FQDN specified during setup, which I had configured as an internal address.
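The mistake comes down to one rule: the FQDN entered during setup is the name external users end up redirected to, so it has to resolve on both sides of the split DNS. A small Python sketch of that rule, assuming you have already looked the name up against the internal and the public DNS servers (how you query each view is left out, and the names below are hypothetical):

```python
import ipaddress
from typing import List, Optional


def fqdn_problems(fqdn: str, internal: Optional[str], external: Optional[str]) -> List[str]:
    """Sanity-check a split-DNS Workspace FQDN.

    internal: address the name resolves to on internal DNS (None if unresolvable)
    external: address it resolves to on public DNS (None if unresolvable)
    """
    problems = []
    if internal is None:
        problems.append(f"{fqdn}: does not resolve internally")
    elif not ipaddress.ip_address(internal).is_private:
        problems.append(f"{fqdn}: internal view returns non-private {internal}; VMs would route publicly")
    if external is None:
        # External users are redirected to this name, so it must exist publicly too.
        problems.append(f"{fqdn}: does not resolve externally; external redirects will fail")
    elif ipaddress.ip_address(external).is_private:
        problems.append(f"{fqdn}: public view returns private {external}")
    return problems


if __name__ == "__main__":
    # Hypothetical: an internal-only name, as in the failed first deployment.
    for problem in fqdn_problems("workspace.corp.local", "10.1.1.60", None):
        print(problem)
```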

Deploy The vApp:

Once you download the OVF from the VMware Download page, deploying the vApp is straightforward. One thing to point out, however, is that you need a vCenter Datacenter IP Pool configured so that the vApp can correctly allocate IP/DNS settings to the VMs. The OVF deployment screen below warns you about that.

[Screenshot: OVF deployment warning about the vCenter IP Pool requirement]

I had an existing IP Pool set up for my vCOPs install, but that install didn’t require the DNS settings to be populated. Here they are critical, as the vApp uses them to configure DNS on the VMs… Without them, the initial configuration will fail with a DNS lookup error when the configurator VA performs its first lookup against the VA IPs. You will need to restart the VA if any errors are detected.

[Screenshot: IP Pool DNS settings]
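Because the vApp takes its network and DNS settings from the IP Pool, a quick pre-flight check of the pool can save a redeploy. The Python sketch below models only the fields discussed here; the `IpPool` class, its field names and the `pool_problems` helper are hypothetical and not a vSphere API, and the number of appliances is passed in rather than assumed:

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class IpPool:
    """Hypothetical, simplified model of a vCenter IP Pool's IPv4 settings."""
    subnet: str                       # e.g. "10.1.1.0/24"
    gateway: str
    ranges: List[Tuple[str, int]]     # (first address, number of addresses)
    dns_servers: List[str] = field(default_factory=list)
    dns_domain: str = ""


def pool_problems(pool: IpPool, appliances: int) -> List[str]:
    """List reasons the pool could not configure networking and DNS for `appliances` VAs."""
    net = ipaddress.ip_network(pool.subnet)
    problems = []
    if not pool.dns_servers:
        problems.append("no DNS servers: the VAs will come up unable to resolve each other")
    if not pool.dns_domain:
        problems.append("no DNS domain set")
    if ipaddress.ip_address(pool.gateway) not in net:
        problems.append(f"gateway {pool.gateway} is outside {net}")
    total = 0
    for first, count in pool.ranges:
        start = ipaddress.ip_address(first)
        end = start + (count - 1)  # last address in this range
        if start not in net or end not in net:
            problems.append(f"range {first} (+{count}) spills outside {net}")
        total += count
    if total < appliances:
        problems.append(f"only {total} addresses for {appliances} appliances")
    return problems
```

A pool like the one reused from an earlier install, with the DNS fields left blank, fails the first two checks, which mirrors the DNS lookup error the configurator reports.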

Initial Configuration:

Once the vApp has been deployed, only the configurator-va should be powered on (do not power on the other VAs). Open the configurator-va console in vCenter and go through the initial Configuration Wizard.

[Screenshot: configurator-va console, initial Configuration Wizard]

Once Enter is pressed, the wizard kicks off and the DNS checks mentioned above are executed. You are then prompted to enter the root password for all VAs in the vApp (this also becomes your default login password). From there you enter your SMTP relay, Workspace FQDN and vCenter credentials.

[Screenshot: Configuration Wizard prompts]

From this point the wizard configures the remaining VAs, sets the root password across the different systems and creates the SSL certificate services. This process can take 30-40 minutes depending on your underlying storage. Viewing the process through vCenter, you can see a summary of what’s taking place… Interestingly (similar to vCloud Director managed VMs), management of the VAs is taken over by the configurator-va, and all the wizard actions are carried out through it.

[Screenshot: vCenter task summary while the wizard runs]

Once complete, you are presented with the message below and are ready to continue configuring Horizon Workspace from the configurator-va web console.

[Screenshots: configuration complete message; configurator-va web console]

Part 2 will follow and run through setting up initial Horizon Workspace users, groups, services and policies.

VMware PEX ANZ 2013 Thoughts – Software Defined Storage

I was lucky to attend PEX at Australian Technology Park this week and thought I would share some of my takeaways. The venue was a little different to what you would expect from a tech event in Sydney… usually we are in and around Darling Harbour at the Convention Centre. And even if there were whispers of VMware being late to book a venue in the city, the old rail works in Redfern, refurbished and transformed into a spectacular centre for technology and innovation, made a fitting setting.

There is a fundamental shift happening in how we consume IT, and pretty much all leading technology vendors are in the process of embracing that change. After a few years of letting the dust settle, VMware have chosen three main pillars of focus:

Software Defined Datacenter
Hybrid Cloud
End User Computing

I’ve written about EUC and VMware’s Hybrid Cloud offerings in the past, so I’m not going to focus on those in this post… but the one thing I will say is that VMware still have a clear understanding of where their partners sit in the ecosystem and still see them as central to their offerings. As a Service Provider guy working for a vCloud Powered provider, I have some concern about the vHPC platform that will be deployed globally over the next few years… but we need to understand that there has to be something significant in the Public Cloud space to compete with AWS and Google… and maybe Microsoft’s Azure. AWS is a massive beast and will only be slowed by its own success: will it get too big and product-heavy, and therefore lose focus on the basics? There has been evidence in recent weeks of increasing instance performance issues due to capacity constraints.

With regard to the SDDC push… last year was the year of network virtualisation, but what excites me more at this point are the upcoming features around software-defined storage. There has been an explosion of software-based storage solutions coming onto the market over the past 18 months, and VMware see this as a key piece of the SDDC.

vVOLs and vSANs represent a massive shift in how vSphere/vCloud environments are architected and engineered. Storage is the biggest pain point for most providers, and traditional SANs may well have run their race. There is no doubt that storage arrays are still relevant, but with virtual SAN technology on the horizon, direct-attached storage (DAS) will start to feature… Where we previously had limitations around availability and redundancy, the introduction of technology that can take DAS and create a distributed virtual SAN across multiple hosts excites me.

Why tier and put performance on a device that’s removed from the compute resource? It’s logical to start bringing it back closer to the compute.

Not only do you solve the HA/DRS issue but, given the right choices in DAS/flash/embedded storage, there is potential to offer service levels based on low-latency/high-IOPS datastore design that removes the common issues with shared LUNs presented as VMFS or NFS datastores. Traditional SANs can certainly still exist in this setup, and in fact will remain critical as lower-tier, high-volume storage options.

For a technical overview of VMware Distributed Storage, check out Duncan Epping’s (@DuncanYB) post here:

There is also a slightly dated VMwareKB overview by Cormac Hogan (@VMwareStorage) that I have embedded below… note that it’s only the tech preview, but if it’s any indication of what’s coming later in the year… it can’t come soon enough.

Being able to control the minimum and maximum number of IOPS guaranteed to a VM/VMDK, similar to the way you can select IOPS performance on AWS instances, is worth the price of admission alone. It also solves a current limitation of vSphere, where you can only set maximum values to block out noisy neighbors.
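To make the min/max idea concrete: with a guaranteed minimum and a cap per VM, a datastore’s capacity can be shared out by giving every VM its minimum and then dividing what is left equally among the VMs still under their cap. This Python sketch is a simple water-filling illustration of reservations plus limits only; it is not how vSphere, vVOLs or vSAN actually schedule I/O, and the VM names and IOPS figures are made up:

```python
from typing import Dict, Tuple


def allocate_iops(capacity: float, limits: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Split a datastore's IOPS capacity across VMs, each with a (minimum, maximum).

    Every VM first gets its guaranteed minimum; leftover capacity is then shared
    equally among VMs still under their maximum (water-filling).
    """
    alloc = {vm: lo for vm, (lo, _hi) in limits.items()}
    remaining = capacity - sum(alloc.values())
    if remaining < 0:
        raise ValueError("minimums exceed capacity; guarantees cannot be honoured")
    active = {vm for vm, (lo, hi) in limits.items() if lo < hi}
    while remaining > 1e-9 and active:
        share = remaining / len(active)
        someone_capped = False
        for vm in sorted(active):
            headroom = limits[vm][1] - alloc[vm]
            grant = min(share, headroom)
            alloc[vm] += grant
            remaining -= grant
            if grant == headroom:  # this VM reached its maximum
                active.discard(vm)
                someone_capped = True
        if not someone_capped:
            break  # every active VM took a full share, so capacity is used up
    return alloc


if __name__ == "__main__":
    # Hypothetical datastore with 1000 IOPS and three VMs given as (min, max).
    print(allocate_iops(1000, {"web": (100, 200), "db": (100, 1000), "batch": (0, 1000)}))
```

The `ValueError` branch is the admission-control side of guarantees: if the minimums add up to more than the datastore can deliver, no scheduler can honour them all. Capacity left over once every VM hits its maximum simply stays unallocated.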

To the vendors already pushing out solutions around storage virtualization: continue the great work… anything that sits on top of this technology and complements, improves or enhances it can only be a good thing.

It’s the year of storage virtualization…

Additional Reading:

http://www.yellow-bricks.com/2013/03/06/why-the-world-needs-software-defined-storage/
http://www.yellow-bricks.com/2013/04/05/software-defined-storage-just-some-random-thought/
http://www.nexenta.com/corp/products/what-is-openstorage/what-is-software-defined-storage
http://cto.vmware.com/2013-predictions-the-year-of-software-defined-storage/
http://virsto.com/blog/the-missing-link-in-software-defined-storage
http://www.nutanix.com/evolution-of-the-data-center/

Quick Fix: ESX 4.1 Host Stops Responding When iSCSI LUN is “pulled”

REMOVING DEAD PATHS IN ESX4.1 (version 5 guidance here)

A very quick post about a slightly sticky situation I found myself in this afternoon. I was decommissioning a service linked to a VM with a number of VMDKs, one of which was located on a dedicated VMFS datastore… the guest OS also had a directly connected iSCSI LUN.

I chose to delete the LUNs first and then move up the stack, removing the VMFS datastore and eventually the VM. To do this, I simply went to the SAN and deleted the disk and disk group resources straight up (hence the “pulled” reference in the title)! Little did I know that ESX would have a small fit when I attempted any sort of reconfiguration or management on the VM. The first sign of trouble was when I attempted to restart the VM and noticed that the task in vCenter wasn’t progressing. At that point my Nagios/OpsView service checks against the ESX host began to time out, and I lost connectivity to the host in the vCenter console.

Restarting the ESX management agents didn’t help, and as this was very much a production host with production VMs on it, my first thought (and older way of thinking) of rebooting it wasn’t acceptable during core business/SLA hours. As knowledge and confidence build with experience in and around ESX, I’ve come to use ESX(i) shell access more and more… so I jumped into SSH and had a look at what the vmkernel logs were saying.

Mar 11 17:55:55 esx03 vmkernel: 393:13:48:38.873 cpu8:4222)NMP: nmp_DeviceUpdatePathStates: Activated path "NULL" for NMP device "naa.6782bcb00014ebe60000035e4de4314c".
Mar 11 17:55:55 esx03 vmkernel: 393:13:48:38.874 cpu12:4265)WARNING: vmw_psp_rr: psp_rrSelectPath: Could not select path for device "naa.6782bcb00014ebe60000035e4de4314c".
Mar 11 17:55:56 esx03 vmkernel: 393:13:48:39.873 cpu11:4223)WARNING: vmw_psp_rr: psp_rrSelectPathToActivate: Could not select path for device "naa.6782bcb00014ebe60000035e4de4314c".

From the logs it was obvious the system was having major issues (re)connecting to the device I had just pulled out from under it. On the other hosts in the cluster, the datastore was greyed out and I was unable to delete it from the storage configuration. A rescan of the HBAs removed the dead datastore from the storage list, so if I’d still had vCenter access to this host, a simple rescan should have sorted things out. Moving to the command line of the host in question, I ran the esxcfg-rescan command:

[root@esx03 log]# esxcfg-rescan vmhba39
Dead path vmhba39:C1:T0:L3 for device naa.6782bcb00014ebe60000035e4de4314c not removed.
Device is in use by worlds:
 World # of Handles Name

At the same time, while tailing the vmkernel log, I saw the following entries:

==> vmkernel <==
Mar 11 17:56:16 esx03 vmkernel: 393:13:48:59.768 cpu13:4118)Vol3: 644: Could not open device 'naa.6782bcb00014ebe60000035e4de4314c:1' for volume open: I/O error
Mar 11 17:56:16 esx03 vmkernel: 393:13:48:59.768 cpu13:4118)FSS: 735: Failed to get object f530 28 1 4de4a1f8 3002130c 21000ff6 5abda09b 0 0 0 0 0 0 0 :I/O error
Mar 11 17:56:16 esx03 vmkernel: 393:13:48:59.768 cpu13:4118)WARNING: Fil3: 1987: Failed to reserve volume f530 28 1 4de4a1f8 3002130c 21000ff6 5abda09b 0 0 0 0 0 0 0
Mar 11 17:56:16 esx03 vmkernel: 393:13:48:59.768 cpu13:4118)FSS: 735: Failed to get object f530 28 2 4de4a1f8 3002130c 21000ff6 5abda09b 4 1 0 0 0 0 0 :I/O error
Mar 11 17:56:16 esx03 vmkernel: 393:13:48:59.769 cpu0:4096)VMNIX: VMKFS: 2561: status = -5
Mar 11 17:56:16 esx03 vmkernel: 393:13:48:59.873 cpu9:45315)NMP: nmp_DeviceUpdatePathStates: Activated path "NULL" for NMP device "naa.6782bcb00014ebe60000035e4de4314c".
Mar 11 17:56:16 esx03 vmkernel: 393:13:48:59.874 cpu15:4265)WARNING: NMP: nmpDeviceAttemptFailover: Retry world restore device "naa.6782bcb00014ebe60000035e4de4314c" - no more commands to retry
Mar 11 17:56:16 esx03 vmkernel: 393:13:49:00.232 cpu15:4120)WARNING: vmw_psp_rr: psp_rrSelectPath: Could not select path for device "naa.6782bcb00014ebe60000035e4de4314c".
Mar 11 17:56:16 esx03 vmkernel: 393:13:49:00.232 cpu15:4120)WARNING: ScsiCore: 1399: Invalid sense buffer: error=0x0, valid=0x0, segment=0x0, key=0x2
Mar 11 17:56:16 esx03 vmkernel: 393:13:49:00.232 cpu15:4120)WARNING: vmw_psp_rr: psp_rrSelectPath: Could not select path for device "naa.6782bcb00014ebe60000035e4de4314c".
Mar 11 17:56:16 esx03 vmkernel: 393:13:49:00.232 cpu15:4120)WARNING: NMP: nmp_IssueCommandToDevice: I/O could not be issued to device "naa.6782bcb00014ebe60000035e4de4314c" due to Not found
Mar 11 17:56:16 esx03 vmkernel: 393:13:49:00.232 cpu15:4120)ScsiDeviceIO: 1672: Command 0x1a to device "naa.6782bcb00014ebe60000035e4de4314c" failed H:0x1 D:0x0 P:0x0 Possible sense data: 0x2 0x3a 0x0.
Mar 11 17:56:16 esx03 vmkernel: 393:13:49:00.232 cpu15:4120)WARNING: ScsiDeviceIO: 5172: READ CAPACITY on device "naa.6782bcb00014ebe60000035e4de4314c" from Plugin "NMP" failed. I/O error
Mar 11 17:56:16 esx03 vmkernel: 393:13:49:00.232 cpu15:4120)Vol3: 644: Could not open device 'naa.6782bcb00014ebe60000035e4de4314c:1' for volume open: I/O error
Mar 11 17:56:16 esx03 vmkernel: 393:13:49:00.232 cpu15:4120)FSS: 3924: No FS driver claimed device 'naa.6782bcb00014ebe60000035e4de4314c:1': Not supported
Mar 11 17:57:18 esx03 vmkernel: 393:13:50:02.431 cpu15:40621)WARNING: vmw_psp_rr: psp_rrSelectPathToActivate: Could not select path for device "naa.6782bcb00014ebe60000035e4de4314c".
Mar 11 17:57:18 esx03 vmkernel: 393:13:50:02.431 cpu15:40621)NMP: nmp_DeviceUpdatePathStates: Activated path "NULL" for NMP device "naa.6782bcb00014ebe60000035e4de4314c".

From those logs, the rescan detected that the path in question was still in use (bound to a datastore holding a VMDK attached to a VM) and reported the “Device is in use by worlds” error. The errors also highlight dead paths caused by my removing the LUN while it was in use.
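On a host with many LUNs, the rescan output is easier to act on if you pull out the paths and device IDs it refuses to drop. A small Python sketch that parses output in the shape shown above; the regular expression is written against that single sample, so treat it as an assumption that other ESX builds word the message the same way:

```python
import re
from typing import Dict, List

# Matches lines such as:
# Dead path vmhba39:C1:T0:L3 for device naa.6782... not removed.
DEAD_PATH = re.compile(
    r"Dead path (?P<adapter>vmhba\d+):C(?P<channel>\d+):T(?P<target>\d+):L(?P<lun>\d+)"
    r" for device (?P<device>\S+) not removed\."
)


def dead_paths(rescan_output: str) -> List[Dict[str, str]]:
    """Return adapter/channel/target/LUN/device for each dead path the rescan kept."""
    return [m.groupdict() for m in DEAD_PATH.finditer(rescan_output)]


SAMPLE = """Dead path vmhba39:C1:T0:L3 for device naa.6782bcb00014ebe60000035e4de4314c not removed.
Device is in use by worlds:
 World # of Handles Name
"""

if __name__ == "__main__":
    for p in dead_paths(SAMPLE):
        print(f"{p['device']} still held via {p['adapter']} LUN {p['lun']}")
```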

The host went into a spin (visible as the repeated “Could not select path for device” entries in the vmkernel log) when I attempted to power on the VM and the host, still thinking it had access to the VMDK, tried to access all of its disks.
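When the log is scrolling this fast, a per-device count makes it obvious which LUN the host is spinning on. A minimal Python sketch that counts the “Could not select path for device” warnings in a saved copy of the vmkernel log (the script and its command-line usage are illustrative, not an ESX tool):

```python
import re
import sys
from collections import Counter

# Matches the vmw_psp_rr warnings shown above, e.g.
# ... Could not select path for device "naa.6782...".
NO_PATH = re.compile(r'Could not select path for device "(naa\.[0-9a-f]+)"')


def path_failures(log_text: str) -> Counter:
    """Count 'Could not select path' warnings per NAA device ID."""
    return Counter(NO_PATH.findall(log_text))


if __name__ == "__main__":
    # Usage (hypothetical): python path_failures.py vmkernel-copy.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for device, hits in path_failures(fh.read()).most_common():
            print(f"{hits:6d}  {device}")
```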

So, lesson learnt: when decommissioning VMFS datastores, don’t pull the LUN out from under ESX… remove the datastore gracefully from vSphere first, and then you are free to delete it on the SAN.
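The lesson is really a dependency order. The Python sketch below encodes it as an ordered list and flags any plan that runs a step before one it depends on; the step names are hypothetical shorthand for the vSphere and SAN tasks described in this post, and the order follows the post (remove the VM and datastore in vSphere, delete the LUN on the SAN, then rescan so hosts drop the dead paths):

```python
from typing import List

# Hypothetical shorthand for the teardown steps, in safe order.
SAFE_ORDER = [
    "remove VM",
    "remove datastore",
    "delete LUN on SAN",
    "rescan HBAs",
]


def order_violations(plan: List[str]) -> List[str]:
    """Report every pair of steps the plan runs in the wrong order."""
    rank = {step: i for i, step in enumerate(SAFE_ORDER)}
    unknown = [step for step in plan if step not in rank]
    if unknown:
        raise ValueError(f"unknown steps: {unknown}")
    violations = []
    for i, step in enumerate(plan):
        for later in plan[i + 1:]:
            if rank[later] < rank[step]:
                violations.append(f"'{step}' runs before '{later}'")
    return violations


if __name__ == "__main__":
    # The order used in this post: delete the LUN first, then work up the stack.
    for v in order_violations(["delete LUN on SAN", "remove datastore", "remove VM"]):
        print(v)
```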

 

First Look: CloudPhysics – Datastore Contention Card

I first came across CloudPhysics just before VMworld 2012. For a general overview, go here:

I am a massive fan of analytics and trend metrics, and I use a number of systems to get a broad view of the performance and monitoring of our Hosting and Cloud Platform… as well as extending out to client systems.

I love the deep, complex analytics of VMware vCenter Operations Manager, but sometimes I feel overwhelmed by the sheer amount of data presented in the default views of vCOPs, and working with the custom dashboards can be a frustrating exercise if you don’t have a heap of time and patience.

This is where I have found CloudPhysics comes into its own… via its brilliant presentation of the things that matter. I’m not going to go through the setup and config, but in a nutshell: register on the site, log in, download and deploy the VMware Probe Appliance, give it an IP and enter the email address associated with your CloudPhysics login. It’s one probe per vCenter, but you can deploy multiple probes to multiple vCenters and link them back under the same username and CloudPhysics app.

When you log in, you are presented with the home screen below:

[Screenshot: CloudPhysics home screen]

From the relatively humble, basic default cards released around the VMworld launch, the team has been adding more complex and useful cards. HA Cluster Health and SnapShots Gone Wild are my personal favourites and offer a view into key areas of vSphere management. What’s also great about these cards is that they offer external jump links to VMware KBs and basic information about the subject matter. The organisation and presentation of the data pulled by the probe is simple yet effective in giving you an understanding of how your environments are performing and which areas are under stress.

Released today, the DataStore Contention Card looks at the performance of the VMFS datastores in your environment. The default view selects the datastore that needs the most attention. In my case, I was surprised to see the datastore below exhibit combined read/write latency that was off the chart!

[Screenshot: DataStore Contention Card default view showing the datastore with the highest latency]

The interface lets you select a block of time at any level and see which VM may be contributing the most to the selected performance metric. The metrics, shown below, include Latency, Outstanding I/Os, IOPS and Bandwidth. You can also filter the view by vCenter, Datastore Cluster and Datastore.

[Screenshot: DataStore Contention Card metric selection and filters]
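Conceptually, selecting a block of time and ranking VMs is a filter-and-aggregate over per-VM samples. A minimal Python sketch, assuming a hypothetical sample record (timestamp, VM, datastore, metric, value) and ranking by the mean over the window; this is an illustration of the idea, not how the card is implemented:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Sample:
    """Hypothetical per-VM performance sample; not CloudPhysics' data model."""
    timestamp: int    # seconds since epoch
    vm: str
    datastore: str
    metric: str       # e.g. "latency_ms", "iops"
    value: float


def top_contributors(
    samples: Iterable[Sample],
    datastore: str,
    metric: str,
    start: int,
    end: int,
    n: int = 3,
) -> List[Tuple[str, float]]:
    """Average each VM's metric over [start, end) on one datastore and return the top n."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for s in samples:
        if s.datastore == datastore and s.metric == metric and start <= s.timestamp < end:
            totals[s.vm] += s.value
            counts[s.vm] += 1
    averages = {vm: totals[vm] / counts[vm] for vm in totals}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Averaging suits latency; for IOPS or bandwidth, summing over the window may answer the “who is generating the load” question better.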

The screen grabs don’t do the CloudPhysics web application interface justice, so head over to the site and download the probe to get started. It must be said that the product is only in beta, so use it at your own risk, but I’ve had no issues with the Probe VM, whose specs are 2 vCPUs, 4GB of RAM and 16GB of storage.