The boys at CloudPhysics are working hard behind the scenes adding new features to their current stable of Analytic Cards, based on data collected from their Probe VAs hooked into vCenter environments.
Check out this post on their DataStore Contention Card:
For a general overview, go here. I am a massive fan of analytics and trend metrics, and I use a number of systems to gain a wide overview of the performance and monitoring of our Hosting and Cloud Platform.
A few weeks ago, the CloudPhysics team released to a limited number of users a Custom Card Designer. This pretty much lets you construct custom cards based on a huge number of metrics presented via a builder wizard.
Cards you design and save are listed on the page above. From here you can view your custom cards and edit them if they require tweaking. Once you click the Create Card + button you are presented with a list of property data metrics from which to construct your card.
Properties fall under four main categories and there are a large number of available metrics under each category. The wizard lets you drag and drop items into the builder window. From there you can preview and then save your custom card for future use.
As a quick example, I needed a way to see which datastores were connected to their respective hosts in each cluster, so that consistency in datastore availability was maintained. It was as simple as dragging across Host:Name and Host:Datastore and putting in a filter to view only hosts of a certain name, and it was ready to go.
You have the option to preview and continue editing, or to save to the Card Designer main page. From that page you can execute the query. The results of my quick test card are shown below.
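Under the hood, a consistency check like this boils down to comparing each host’s datastore list against the union across the cluster. Here is a minimal Python sketch of that logic (purely illustrative; this is not the CloudPhysics API, and the host and datastore names are made up):

```python
# Illustrative only: flag datastores that are not visible to every host
# in a cluster, given a host -> datastores mapping. Names are invented.
from collections import defaultdict

def datastore_consistency(host_datastores):
    """host_datastores: dict of host name -> set of datastore names.
    Returns a dict of datastore -> list of hosts missing it."""
    all_datastores = set().union(*host_datastores.values())
    missing = defaultdict(list)
    for host, stores in host_datastores.items():
        for ds in all_datastores - stores:
            missing[ds].append(host)
    return dict(missing)

# Example: esx02 is missing SharedLUN02, so a vMotion to it could fail.
cluster = {
    "esx01": {"SharedLUN01", "SharedLUN02"},
    "esx02": {"SharedLUN01"},
    "esx03": {"SharedLUN01", "SharedLUN02"},
}
print(datastore_consistency(cluster))  # {'SharedLUN02': ['esx02']}
```

The custom card presents the same host/datastore pairing visually, which is exactly why the Host:Name plus Host:Datastore combination answers the question so quickly.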
One thing I would like to see is an option to export the results to a CSV or Excel document…but other than that it’s a great example of what CloudPhysics is all about: data, and how to get the most out of it as efficiently as possible.
I was lucky to attend PEX at Australia Technology Park this week and thought I would share some of my takeaways. The venue was a little different to what you would come to expect from a tech event in Sydney… Usually we are in and around Darling Harbour at the Convention Centre… And even if there were whispers of VMware being late to book the event in the city, the surroundings of the old rail works in Redfern, refurbished and transformed into a spectacular centre for technology and innovation, fit the occasion.
There is a fundamental shift happening in how we consume IT, and pretty much all leading technology vendors are in the process of embracing that change. After a few years of letting the dust settle, VMware have chosen to focus on three main pillars:
Software Defined Datacenter
End User Computing
Hybrid Cloud
I’ve written about EUC and their Hybrid Cloud offerings in the past so I’m not going to focus on that in this post…but the one thing I will say is that VMware still have a material understanding of where their partners sit in the ecosystem and still see them as central to their offerings… As a Service Provider guy working for a vCloud Powered provider there is some concern around the vHPC platform that will be deployed globally over the next few years… But we need to understand that there has to be something significant in the Public Cloud space in order to compete with AWS and Google…and maybe Microsoft’s Azure. AWS is a massive beast and will only be slowed by its own success…will it get too big and product-heavy, therefore losing focus on the basics? There has been evidence in recent weeks of increasing issues with instance performance due to capacity issues.
With regard to the SDDC push…last year was the year of network virtualisation, but what excites me more at this point are the upcoming features around software-defined storage. There has been an explosion of software-based storage solutions coming onto the market over the past 18 months, and VMware have seen this as a key piece of the SDDC.
vVols and vSAN represent a massive shift in how vSphere/vCloud environments are architected and engineered. Storage is the biggest pain point for most providers, and traditional SANs might well have run their race. There is no doubt that storage arrays are still relevant, but with the new technology behind virtual SANs on the horizon, direct-attached storage will start to feature… Where we previously had limitations around availability and redundancy, the introduction of technology that can take DAS and create a distributed virtual SAN across multiple hosts excites me.
Why tier and put performance on a device that’s removed from the compute resource? It’s logical to start bringing it back closer to the compute.
Not only do you solve the HA/DRS issue but, given the right choices in DAS/flash/embedded storage, there is potential to offer service levels based on low-latency/high-IOPS datastore design that takes away the common issues with shared LUNs presented as VMFS or NFS mounts for datastores. Traditional SANs can certainly still exist, and in fact will still be critical, acting as lower-tier, high-volume storage options.
For a technical overview of VMware Distributed Storage check out Duncan Epping’s (@DuncanYB) post here. There is also a slightly dated VMware KB overview by Cormac Hogan (@VMwareStorage) that I have embedded below…note that it’s only the tech preview, but if it’s any indication of what’s coming later in the year…it can’t come soon enough.
Being able to control the max/min number of IOPS guaranteed to a VM/VMDK, similar to the way in which you can select the IOPS performance on AWS instances, is worth the price of admission and solves the current limitation of vSphere whereby you can only set max values to block out noisy neighbours.
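To illustrate the max side of such a control, here is a rough token-bucket sketch in Python. This is not how vSphere or AWS actually implement IOPS limits; the class name and numbers are purely illustrative assumptions.

```python
# Illustrative token bucket: admit at most `max_iops` I/Os per second.
# NOT vSphere's or AWS's implementation; a conceptual sketch only.
class IopsLimiter:
    def __init__(self, max_iops):
        self.max_iops = max_iops
        self.tokens = 0.0   # start empty so there is no initial burst
        self.last = 0.0     # time of the previous admit() call, in seconds

    def admit(self, now):
        """Return True if an I/O may be issued at time `now`."""
        # Refill in proportion to elapsed time, capped at one second's worth.
        self.tokens = min(self.max_iops,
                          self.tokens + (now - self.last) * self.max_iops)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

limiter = IopsLimiter(max_iops=100)
# 150 I/O attempts spread evenly over one second: roughly 100 get through,
# and the rest would be queued or rejected (the noisy neighbour held back).
admitted = sum(limiter.admit(i / 150) for i in range(150))
```

A guaranteed minimum is the harder half of the problem, since it requires the scheduler to reserve capacity for a VM rather than just throttle it, which is exactly why per-VM min/max controls would be such a big step.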
To the vendors that are already pushing out solutions around storage virtualization: continue the great work…anything that sits on top of this technology and complements/improves/enhances it can only be a good thing.
REMOVING DEAD PATHS IN ESX4.1 (version 5 guidance here)
Very quick post in relation to a slightly sticky situation I found myself in this afternoon. I was decommissioning a service which was linked to a VM which had a number of VMDKs, one of which was located on a dedicated VMFS Datastore…the guest OS also had a directly connected iSCSI LUN.
I chose to delete the LUNs first and then move up the stack, removing the VMFS and eventually the VM. In doing this I simply went to the SAN and deleted the disk and disk group resource straight up! (hence the pulled reference in the title) Little did I know that ESX would have a small fit when I attempted to do any sort of reconfiguration or management on the VM. The first sign of trouble was when I attempted to restart the VM and noticed that the task in vCenter wasn’t progressing. At that point my Nagios/OpsView service checks against the ESX host began to time out and I lost connectivity to the host in the vCenter Console.
Restarting the ESX management agents wasn’t helping, and as this was very much a production host with production VMs on it, my first (and older way of thinking) thought of rebooting it wasn’t acceptable during core business/SLA hours. As knowledge and confidence build with experience in and around ESX, I’ve come to use the ESX(i) shell access more and more…so I jumped into SSH and had a look at what the vmkernel logs were saying.
Mar 11 17:55:55 esx03 vmkernel: 393:13:48:38.873 cpu8:4222)NMP: nmp_DeviceUpdatePathStates: Activated path "NULL" for NMP device "naa.6782bcb00014ebe60000035e4de4314c".
Mar 11 17:55:55 esx03 vmkernel: 393:13:48:38.874 cpu12:4265)WARNING: vmw_psp_rr: psp_rrSelectPath: Could not select path for device "naa.6782bcb00014ebe60000035e4de4314c".
Mar 11 17:55:56 esx03 vmkernel: 393:13:48:39.873 cpu11:4223)WARNING: vmw_psp_rr: psp_rrSelectPathToActivate: Could not select path for device "naa.6782bcb00014ebe60000035e4de4314c".
So from the logs it was obvious the system was having major issues (re)connecting to the device I had just pulled out from under it. On the other hosts in the cluster the datastore was greyed out and I was unable to delete it from the Storage Config. A re-scan of the HBAs removed the dead datastore from the storage list, so if I had still had vCenter access to this host a simple re-scan should have sorted things out. Moving to the command line of the host in question I ran the esxcfg-rescan command:
[root@esx03 log]# esxcfg-rescan vmhba39
Dead path vmhba39:C1:T0:L3 for device naa.6782bcb00014ebe60000035e4de4314c not removed.
Device is in use by worlds:
World # of Handles Name
And at the same time, while tailing the vmkernel logs, I saw the following entries:
==> vmkernel <==
Mar 11 17:56:16 esx03 vmkernel: 393:13:48:59.768 cpu13:4118)Vol3: 644: Could not open device 'naa.6782bcb00014ebe60000035e4de4314c:1' for volume open: I/O error
Mar 11 17:56:16 esx03 vmkernel: 393:13:48:59.768 cpu13:4118)FSS: 735: Failed to get object f530 28 1 4de4a1f8 3002130c 21000ff6 5abda09b 0 0 0 0 0 0 0 :I/O error
Mar 11 17:56:16 esx03 vmkernel: 393:13:48:59.768 cpu13:4118)WARNING: Fil3: 1987: Failed to reserve volume f530 28 1 4de4a1f8 3002130c 21000ff6 5abda09b 0 0 0 0 0 0 0
Mar 11 17:56:16 esx03 vmkernel: 393:13:48:59.768 cpu13:4118)FSS: 735: Failed to get object f530 28 2 4de4a1f8 3002130c 21000ff6 5abda09b 4 1 0 0 0 0 0 :I/O error
Mar 11 17:56:16 esx03 vmkernel: 393:13:48:59.769 cpu0:4096)VMNIX: VMKFS: 2561: status = -5
Mar 11 17:56:16 esx03 vmkernel: 393:13:48:59.873 cpu9:45315)NMP: nmp_DeviceUpdatePathStates: Activated path "NULL" for NMP device "naa.6782bcb00014ebe60000035e4de4314c".
Mar 11 17:56:16 esx03 vmkernel: 393:13:48:59.874 cpu15:4265)WARNING: NMP: nmpDeviceAttemptFailover: Retry world restore device "naa.6782bcb00014ebe60000035e4de4314c" - no more commands to retry
Mar 11 17:56:16 esx03 vmkernel: 393:13:49:00.232 cpu15:4120)WARNING: vmw_psp_rr: psp_rrSelectPath: Could not select path for device "naa.6782bcb00014ebe60000035e4de4314c".
Mar 11 17:56:16 esx03 vmkernel: 393:13:49:00.232 cpu15:4120)WARNING: ScsiCore: 1399: Invalid sense buffer: error=0x0, valid=0x0, segment=0x0, key=0x2
Mar 11 17:56:16 esx03 vmkernel: 393:13:49:00.232 cpu15:4120)WARNING: vmw_psp_rr: psp_rrSelectPath: Could not select path for device "naa.6782bcb00014ebe60000035e4de4314c".
Mar 11 17:56:16 esx03 vmkernel: 393:13:49:00.232 cpu15:4120)WARNING: NMP: nmp_IssueCommandToDevice: I/O could not be issued to device "naa.6782bcb00014ebe60000035e4de4314c" due to Not found
Mar 11 17:56:16 esx03 vmkernel: 393:13:49:00.232 cpu15:4120)ScsiDeviceIO: 1672: Command 0x1a to device "naa.6782bcb00014ebe60000035e4de4314c" failed H:0x1 D:0x0 P:0x0 Possible sense data: 0x2 0x3a 0x0.
Mar 11 17:56:16 esx03 vmkernel: 393:13:49:00.232 cpu15:4120)WARNING: ScsiDeviceIO: 5172: READ CAPACITY on device "naa.6782bcb00014ebe60000035e4de4314c" from Plugin "NMP" failed. I/O error
Mar 11 17:56:16 esx03 vmkernel: 393:13:49:00.232 cpu15:4120)Vol3: 644: Could not open device 'naa.6782bcb00014ebe60000035e4de4314c:1' for volume open: I/O error
Mar 11 17:56:16 esx03 vmkernel: 393:13:49:00.232 cpu15:4120)FSS: 3924: No FS driver claimed device 'naa.6782bcb00014ebe60000035e4de4314c:1': Not supported
Mar 11 17:57:18 esx03 vmkernel: 393:13:50:02.431 cpu15:40621)WARNING: vmw_psp_rr: psp_rrSelectPathToActivate: Could not select path for device "naa.6782bcb00014ebe60000035e4de4314c".
Mar 11 17:57:18 esx03 vmkernel: 393:13:50:02.431 cpu15:40621)NMP: nmp_DeviceUpdatePathStates: Activated path "NULL" for NMP device "naa.6782bcb00014ebe60000035e4de4314c".
From tailing through those logs, the rescan basically detected that the path in question was in use (bound to a datastore where a VMDK was attached to a VM), reporting the “Device is in use by worlds” error. The errors also highlight dead paths caused by my removing the LUN while it was in use.
The point at which the host went into a spin (as seen by the “Could not select path for device” entries in the vmkernel log) was when I attempted to switch on the VM and the host, still thinking it had access to the VMDK, tried to access all its disks.
So lesson learnt. When decommissioning VMFS datastores, don’t pull the LUN from under ESX…remove it gracefully first from vSphere, and then you are free to delete it on the SAN.
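As a quick sanity check before touching the SAN, something like the following snippet can pull out the NAA IDs that are repeatedly failing path selection in the vmkernel log. This is an illustrative helper of my own, not a VMware tool, and it assumes the log format captured above:

```python
# Illustrative helper (not a VMware utility): find NAA device IDs with
# repeated "Could not select path" failures in vmkernel log lines.
import re
from collections import Counter

PATH_ERR = re.compile(r'Could not select path for device "(naa\.[0-9a-f]+)')

def stuck_devices(log_lines, threshold=2):
    """Return NAA IDs with at least `threshold` path-selection failures."""
    counts = Counter(m.group(1) for line in log_lines
                     for m in PATH_ERR.finditer(line))
    return [dev for dev, n in counts.items() if n >= threshold]

sample = [
    'vmkernel: WARNING: vmw_psp_rr: psp_rrSelectPath: Could not select path for device "naa.6782bcb00014ebe60000035e4de4314c".',
    'vmkernel: NMP: nmp_DeviceUpdatePathStates: Activated path "NULL" for NMP device "naa.6782bcb00014ebe60000035e4de4314c".',
    'vmkernel: WARNING: vmw_psp_rr: psp_rrSelectPathToActivate: Could not select path for device "naa.6782bcb00014ebe60000035e4de4314c".',
]
print(stuck_devices(sample))  # ['naa.6782bcb00014ebe60000035e4de4314c']
```

If a device shows up here before you have removed its datastore and VMDKs from vSphere, you are about to repeat my mistake.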
I first came across CloudPhysics just before VMworld 2012. For a general overview, go here. I am a massive fan of analytics and trend metrics, and I use a number of systems to gain a wide overview of the performance and monitoring of our Hosting and Cloud Platform…as well as extending out to client systems.
I love the deep/complex analytics of VMware Operations Manager, but sometimes I feel overwhelmed by the sheer amount of data presented in the default views of vCOPs, and working with the Custom Dashboards can be a frustrating exercise if you don’t have a heap of time and patience.
This is where I have found CloudPhysics comes into its own…via its brilliant presentation of things that matter. I’m not going to go through the setup and config, but in a nutshell: from the site, register, log in, download and deploy the VMware Probe Appliance, give it an IP and enter the email address that relates to your CloudPhysics login. It’s one probe per vCenter, but you can deploy multiple probes to multiple vCenters and link them back under the same username and CloudPhysics app.
When you log in, you are presented with the home screen below:
From relatively humble and basic default cards released around the VMworld launch, the team has been adding more complex and useful cards. HA Cluster Health and Snapshots Gone Wild are my personal favourites and offer a view into key areas of vSphere management. What’s also great about these cards is that they offer external jump links to VMware KBs and basic information about the subject matter. The organisation and presentation of the data pulled by the probe is simple yet effective in allowing you to understand how your environments are performing and which areas are under stress.
Released today was the DataStore Contention Card which looks at the performance of VMFS Datastores in your environment. The Default view selects the DataStore that needs the most attention. In my case I was surprised to see the Datastore below exhibit combined read/write latency that was off the chart!
The interface allows you to select a block of time at any level and see which VMs may be contributing the most to the performance metric selected. Those metrics are shown below and include Latency, Outstanding I/Os, IOPS and Bandwidth. You also have the ability to filter the view by vCenter, Datastore Cluster and Datastore.
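Conceptually, the “top contributor” view boils down to summing a chosen metric per VM over the selected time block and ranking the results, worst first. A rough Python sketch of that idea (the sample data and field names are invented, not the CloudPhysics schema):

```python
# Illustrative only: rank VMs by their share of a datastore metric over a
# time block. Field names and sample data are made up for this sketch.
def top_contributors(samples, metric="latency_ms"):
    """samples: list of dicts like {"vm": ..., "latency_ms": ...}.
    Returns (vm, total) pairs sorted worst-first."""
    totals = {}
    for s in samples:
        totals[s["vm"]] = totals.get(s["vm"], 0) + s[metric]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

samples = [
    {"vm": "sql01", "latency_ms": 45},
    {"vm": "web01", "latency_ms": 5},
    {"vm": "sql01", "latency_ms": 60},
]
print(top_contributors(samples))  # [('sql01', 105), ('web01', 5)]
```

The card does the heavy lifting of collecting those samples via the probe and lets you slice by time, vCenter and datastore, which is what makes finding the noisy VM so quick.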
The screen grabs don’t do the CloudPhysics web application interface justice, so head over to the site and download the probe to get started. It must be said that the product is only in beta, so use it at your own risk, but I’ve had no issues with the Probe VM, whose specs are 2 vCPU, 4GB of RAM and 16GB of storage.
A few years ago a theory was put forward by a certain Apple CEO that we were entering the Post PC Era…while I have never subscribed to that theory (a position affirmed by VMware CTO Steve Herrod at VMworld 2012), it’s obvious that the revolution is based more around the ways in which workers access their desktops, data and LOB applications. I think the fact that we have been inundated with iPhones, iPads, Galaxy Tablets and the like has had something to do with the misunderstanding of the Post PC Era.
The fact is that the PC will never disappear (for the foreseeable future anyway), and when I say PC I don’t just mean Windows; I also mean Apple and Linux desktops…as much as the fanboys would tell you otherwise, these are PCs. So let’s try to think about the Post PC Era as the End User Computing Revolution. This much better reflects what I believe is happening at the moment.
At VMworld we saw demos of Horizon Application Manager with AppBlast, Data (formerly Octopus) and the SSO experience for bringing external applications (be they SaaS or hosted) all accessible and available via the one browser window. What this represents is the power of the browser and what can be achieved by getting the correct framework in place to deliver everything that was previously done on the desktop, or externally via a provider, through a private or hosted instance of Horizon.
We are about to enter a world where SaaS is only part of the equation. Five or so years ago, many people saw SaaS as the ultimate solution for most SMEs/SMBs, whereby every key service and application is delivered by external providers. The power of virtualization has rebalanced the scales by allowing companies to look at deploying extremely scalable and cost-effective private cloud solutions. The vCloud stack is as feature-rich as it is malleable. There is no reason why all future installs of ESX and vCenter shouldn’t include the vCloud management and automation layer…and when you add the additional layer of DynamicOps, you start to have the building blocks for a client infrastructure that can seamlessly move workloads between private and partner-hosted environments (and public if they so wish).
So what will impact the uptake of this shift? It’s really quite easy to work out…end user acceptance…Will a key decision maker at a company looking at their options fully comprehend what this shift entails? Will they understand the fundamental shift that translates an employee’s workspace from a decentralised mess of files, applications and external services into a logically presented single sign-on experience? Will they understand the concept of the Self Service Experience when it comes to new or additional applications?
Really, what it comes down to in order to ease decision makers and end users into this new EUC world is ensuring that integrators and service providers fully understand the technology themselves…that is, there needs to be a process whereby this technology and the concepts are properly delivered via a productization process.
Learn -> Productize -> Promote -> Sell -> Deliver
Internally, to deliver the EUC experience we are just undertaking the Learning stage, but it’s also my job as a technology evangelist to Promote and Sell the concepts. While I hate the term, there will come a time when we need to “dogfood” the technology. Getting salespeople, tech teams and select management onto an internal beta/UAT of a platform like Horizon is key to ensuring that the Promote, Sell and Deliver parts of the equation go smoothly.
When I close my eyes and think about how our SMB clients should be working in 12-18 months’ time, I can picture a single user experience with the browser being central to delivering files, apps and the desktop. For me there is no better platform than Horizon, and VMware will work hard to ensure partners/service providers are positioned to deliver on the promise of the real revolution!