Tag Archives: Metrics

Enabling, Configuring and Viewing Metrics in vCloud Director 9.0

Last week I released a post on configuring Cassandra for vCloud Director 9.0 metrics. As a refresher, one of the cool features released in vCloud Director SP 5.6.x was the ability to collect VM metrics that service providers could expose to their clients via a set of API calls. With the release of vCloud Director 9.0, the metrics can now be viewed from the new HTML5 tenant UI, meaning that all service providers should be able to offer this to their customers.

With the Cassandra configuration out of the way, the next step is to use the Cell Management Tool to tell the vCD cells to push the VM metric data. Before this, if you log into the HTML5 UI you will notice no menu for Monitoring…this only gets enabled once the metrics have been enabled by the tool.

The command has changed from previous versions in line with removing the dependency on KairosDB, and we are now calling a cassandra argument that has the following options:

Those familiar with the previous command to configure the metrics will see a lot more options. They specify the Cassandra nodes, the creation of the schema, the username and password used to connect to the Cassandra database, and the TTL for the data, meaning that if you wanted to you could keep more than two weeks of data.
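As a rough illustration of how those options fit together, here is a small Python sketch that assembles the command line. The flag names (`--configure`, `--create-schema`, `--cluster-nodes`, `--username`, `--password`, `--port`, `--ttl`), the TTL being expressed in days, the port and the install path are all assumptions based on the options described above, so check them against the tool's built-in help on your own cell before running anything. The node address and credentials are placeholders.

```python
import shlex

# Assumption: default location of the tool on a vCD cell.
TOOL = "/opt/vmware/vcloud-director/bin/cell-management-tool"


def build_cassandra_command(nodes, username, password, ttl_days, port=9042):
    """Return the argv list that points vCD at a Cassandra cluster.

    Flag names are assumptions; verify them on your own cell.
    """
    if not nodes:
        raise ValueError("at least one Cassandra node is required")
    return [
        TOOL, "cassandra",
        "--configure",
        "--create-schema",
        "--cluster-nodes", ",".join(nodes),
        "--username", username,
        "--password", password,
        "--port", str(port),
        "--ttl", str(ttl_days),
    ]


if __name__ == "__main__":
    # Placeholder node and credentials for a single node lab cluster.
    argv = build_cassandra_command(["192.168.0.10"], "cassandra", "cassandra", ttl_days=31)
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Building the command as a list rather than a single string keeps the password and node list from being mangled by shell quoting if you later hand it to `subprocess.run`.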

If you tail the Cassandra system.log while the process is happening you will see a bunch of tables being created and populated with the initial data.

With that done, if you go into the new HTML5 Tenant UI and go to the Virtual Machine view you should now see a Monitoring Chart drop down in the menu in the main window. From here you can choose any of the available metrics across a half hour, hour, day and week timescale.

API Calls to Retrieve Current and Historical Metrics:

If you still want to go old school, the following API calls are used to gather current and historical VM metrics for vCD VMs. The Machine ID required is the VM GUID as seen in vCenter. The ID can be sourced from the VM name. The vCD Machine ID shown below in the brackets is what you are after.
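For anyone scripting this, a minimal Python sketch of the two calls might look like the following. It assumes the VM appears in vCenter as "name (GUID)", that the metrics endpoints live at `/api/vApp/vm-<id>/metrics/current` and `/metrics/historic`, and that API version 29.0 matches vCD 9.0. The hostname, token and VM name are placeholders, and the request is only built here, not sent.

```python
import re
import urllib.request

# Assumption: a vCD-managed VM shows up in vCenter as "<name> (<guid>)".
VM_ID_PATTERN = re.compile(
    r"\(([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})\)\s*$"
)


def machine_id_from_vcenter_name(vcenter_name):
    """Pull the GUID out of the brackets at the end of the vCenter VM name."""
    match = VM_ID_PATTERN.search(vcenter_name)
    if match is None:
        raise ValueError(f"no machine ID found in {vcenter_name!r}")
    return match.group(1)


def metrics_request(base_url, machine_id, token, historic=False, api_version="29.0"):
    """Build (but do not send) a GET request for current or historic metrics."""
    kind = "historic" if historic else "current"
    url = f"{base_url.rstrip('/')}/api/vApp/vm-{machine_id}/metrics/{kind}"
    headers = {
        "Accept": f"application/*+xml;version={api_version}",
        "x-vcloud-authorization": token,  # session token from a prior login
    }
    return urllib.request.Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    vm_id = machine_id_from_vcenter_name("web01 (2ddf01d2-9dd0-4a4b-8b4a-0f1f2a3b4c5d)")
    request = metrics_request("https://vcd.example.com", vm_id, "placeholder-token")
    print(request.get_method(), request.full_url)
```

Passing `historic=True` switches the same helper to the historic endpoint, which is the one that depends on the Cassandra backend.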



Configuring Cassandra for vCloud Director 9.0 Metrics

One of the cool features released in vCloud Director SP 5.6.x was the ability to collect VM metrics that service providers could expose to their clients via a set of API calls. Some service providers took advantage of this and were able to offer basic VM metrics to their tenants through custom written portals. Zettagrid was one of those service providers and while I was at Zettagrid, I worked with the developers to get VM metrics out to our customers.

Part of the backend configuration to enable the vCloud Director cells to export the metric data was to stand up a Cassandra/KairosDB cluster. This wasn’t a straightforward exercise, but after a bit of tinkering due to a lack of documentation, most service providers were able to have the backend in place to support the metrics.

With the release of vCloud Director 9.0, the requirement to have KairosDB managed by Apache has been removed and metrics can now be accessed natively in Cassandra using the cell management tool. Even cooler is that the metrics can now be viewed from the new HTML5 tenant UI, meaning that all service providers should be able to offer this to their customers.

Cassandra is an open source database that you can use to provide the backing store for a scalable, high-performance solution for collecting time series data like virtual machine metrics. If you want vCloud Director to support retrieval of historic metrics from virtual machines, you must install and configure a Cassandra cluster and use the cell-management-tool to connect the cluster to vCloud Director. Retrieval of current metrics does not require optional database software.

The vCloud Director online docs have a small install guide but it’s not very detailed. It basically says to install and configure the Cassandra cluster with four nodes, two of which are seed nodes, enabling encryption and user authentication with Java Native Access installed. Not overly descriptive. I’ve created a script below that installs and configures a basic single node Cassandra cluster that will suffice for most labs/testing environments.

Setting up Cassandra on Ubuntu 16.04 LTS:

I’ve forked an existing bash script on GitHub and added modifications that go through the installation and configuration of Cassandra 2.2.6 (as per the vCD 9.0 release notes) on a single node, enabling authentication while disabling encryption in order to keep things simple.

This will obviously work on any distro that supports apt-get. Once configured you can view the Cassandra status by using the nodetool status command as shown below.

The manual steps for the Cassandra installation are below…note that they don’t include the configuration file changes required to enable authentication and set the seeds.

From here you are ready to configure vCD to push the metrics to the Cassandra database. I’ll cover that in a separate post.

References:

https://docs.vmware.com/en/vCloud-Director/9.0/com.vmware.vcloud.install.doc/GUID-E5B8EE30-5C99-4609-B92A-B7FAEC1035CE.html

https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/vcloud/vmware-vcloud-director-whats-new-9-0-white-paper.pdf

Runecast: Overview and Service Provider Use Case

A few months ago I was lucky enough to spend time with a couple of the founders of Runecast, Stanimir Markov and Ched Smokovic and got to know a little more about their real time analytics platform for VMware based infrastructure. Soon after that I downloaded and deployed it in my lab and have been running it for a few months. In that time I’ve come to understand and appreciate the value that it adds to the operations and management of any vSphere platform.

Having been part of, and led, teams that operated and managed large vSphere based cloud platforms, I know that one of the challenges of managing any platform of size is how to stay on top of issues operationally…not only when and as they happen, but also before they happen. Proactive monitoring and alerting that pinpoints issues before they happen is invaluable, and up to this point I haven’t found a product that focuses in as specifically as Runecast does to help solve that challenge.

In the past I have researched and used more than a few tools on the market and probably the closest comparison that I can make with Runecast is what CloudPhysics tried to do with their Knowledge Base Adviser feature. For those that have used CloudPhysics in the past Runecast will feel somewhat similar in theory, however Runecast have taken what CloudPhysics had done to the next level.

By using a number of resources within VMware’s knowledge base, Runecast has been able to deliver a platform that looks at best practices, log information and security hardening guides to monitor your vSphere infrastructure, and in turn brings any issues that may exist to your attention through a simple yet intuitive interface.

Runecast for Service Providers:

Proactive analysis is the name of the game and it’s one of the holy grails for any operations team. Prevention of an issue before it occurs is what Runecast sets out to achieve, and for service providers that are running critical line of business applications for their clients (which is all service providers) the ability to prevent service disruption is huge.

Apart from the obvious benefits around proactive analytics, one of the best features for service providers is the security hardening feature. Lots of service providers these days are governed by specific regulations, and compliance and security have become front and center for any platform owner. The security hardening feature points out specifically what passes and what fails as per the official VMware hardening guide.

I can also see how the specific inventory feature for vCenter objects can be developed in the future to allow service providers to expose certain information via the Runecast APIs to their tenants. I’d love to see some integration with vCloud Director, NSX and vSAN among other VMware platforms…there is serious potential here.

The API endpoints that are being exposed version to version mean that service providers can take the information presented and manipulate it to their heart’s content. It provides a powerful way for service providers to take full advantage of the data that’s being collected and analysed.

Final Thoughts:

This is, for the most part, a targeted analytics system that focuses on getting you the relevant information quickly and without fuss, and allows you to ascertain issues and work towards their resolution. I’m looking forward to seeing what the guys come up with over the next twelve to eighteen months as they further enhance the capabilities.

For your free 14 day Trial register here and if you are heading to VMworld this year make sure to visit them at Booth #832

Disclaimer: Runecast are sponsors of Virtualization is Life!

CloudPhysics: Rightsizing Intelligence and Cost Calculator for Private Cloud

CloudPhysics have been a little quiet over the past twelve or so months, with focus shifting from presenting data via Cards to Dashboards and on delivering onboarding solutions for managed service provider partners, which has resulted in their channel business growing successfully. Before VMworld they announced the release of their Cost Calculator for Private Clouds, in addition to releasing a couple more dashboards for their SaaS based landing page and adding a tagging feature for VMs and other objects.

CloudPhysics’ roots are all about data science and what can be achieved with literally billions of data points…so it’s no surprise that they are starting to put that front and center when it comes to their new feature capabilities. Rightsizing at the 99th or 95th percentile cuts off the top 1% or 5% of metric peaks, and then presents the data at the nearest metric rate. In this way infrequent peaks are ignored, and the data is better suited to making decisions against. Now CloudPhysics rightsizing can be applied with intelligence to virtual machines and compute/storage infrastructure to capture savings by reducing workloads to match actual demands and reduce over provisioning.
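To make the percentile idea concrete, here is a small Python sketch using the nearest-rank method described on the Wikipedia page listed under Resources. The CPU samples are made up for illustration, and this is not a claim about how CloudPhysics computes its figures internally.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample that has at least
    pct percent of all samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical CPU demand in MHz: 98 ordinary samples and two brief spikes.
cpu_mhz = [400] * 60 + [600] * 38 + [3000, 3200]

print(max(cpu_mhz))             # 3200
print(percentile(cpu_mhz, 99))  # 3000
print(percentile(cpu_mhz, 95))  # 600
```

Sizing this VM on its maximum means provisioning for 3200 MHz, while the 95th percentile suggests 600 MHz covers all but the rare spikes, which is where the potential saving comes from.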

The CloudPhysics Cost Calculator for Private Cloud lets you apply basic costing models to determine your actual costs per virtual machine (VM) in terms of power, compute resources, memory, storage, licensing, and more to generate a cost baseline.

As you can see below, the new Card gives you the option to enter cost points for most input items in a typical private cloud situation. They have included not only standard costs of server hardware, memory and storage, but also options to enter depreciation terms, hypervisor cost details, environment costs relating to power and cooling, and additional 3rd party license costs that could be used for backup or acceleration software.

Once entered in you can filter through your platform as seen by the CloudPhysics Observer and get an understanding of what each individual VM is costing you in relation to your inputs. You also get a Cost as Configured amount that can be adjusted for the 99th and 95th percentile as well.

This view really gives you an understanding of which VMs are costing you the most, and an idea of how to plan for any move to a public cloud where rightsizing based on more than just maximums is key. There is an option to click on the Compare Cloud Costs button, which takes you to a new sister Card that displays the side by side cost of hosting your private cloud on AWS or Azure and again lets you manipulate the data with rightsizing.

In talking with the CloudPhysics team I’m hopeful that they will add to this card to include vCloud Air Network service providers running vSphere based IaaS platforms. I’m sure the 4000 odd vCAN SPs would appreciate a direct comparison for potential new customers looking to make a choice between the hyperscalers and their own platforms.

New Dashboard Items and Tags:

As mentioned in the opening paragraph, CloudPhysics also added a couple of new dashboards that can be configured to look at a number of different VM and Host metrics and show a trend over the last one, seven or thirty days. These new dashboard items as shown below are extremely handy for being able to pick up problem objects in your infrastructure.

Also added is the basic ability to add Tags to VMs for easier searching from within the CloudPhysics interface. In future these will possibly be integrated with vSphere tags, which would be a welcome feature as more and more people are implementing tags for Storage Based Policy Management and Backup Management.

All in all another great set of enhancements to the CloudPhysics platform and I can tell you all that you need to keep an eye on what the team has in store for the next 6-12 months as I believe they are ready to take their offering to the next level and expand well and truly beyond anything they have done up to this point.

They have a free edition which you can try out here: CloudPhysics Free Edition

Additional Content:

Chris Schin from CloudPhysics goes through some of the new features during VMworld.

Resources:

https://en.wikipedia.org/wiki/Percentile

http://vmblog.com/archive/2016/08/25/cloudphysics-unveils-cost-calculator-for-private-cloud-with-public-cloud-comparison-tool.aspx#.V9au3Lh94-W

CloudPhysics Exploration Mode – New Host View

Late last year CloudPhysics released their VM Exploration Mode feature, which allowed for a detailed look into what was happening holistically to a VM with the ability to view key metrics and VM related events over an extended period of time. Last weekend CloudPhysics extended this to also include Hosts. Extending Exploration Mode to include Hosts further improves the proactive monitoring and analysis capabilities of the CloudPhysics platform as it looks to break away from its roots of Card Views.

With Exploration Mode now encompassing both VMs and hosts, administrators can focus in on a workload performance issue and “replay” the environment to correlate events, resource utilization patterns, and environment changes in the seconds, minutes or days leading up to a problem in application performance or availability.

To view a Host with Exploration Mode, you use the new Search Virtual Machines and Hosts bar at the top of the CloudPhysics Web Console.

Once the Host has been selected you get taken to a dashboard that gives you configuration details of the Host, any changes (Power Operations, Snapshot, vMotions) that have been done against that Host in the provided date range and a performance graph that covers CPU, Memory, Network and Storage. There is also an Issues section which alerts you to any possible configuration issues or mismatch.

There is also the introduction of a Tab View which allows you to have multiple Hosts and/or VMs open to compare against…what would be nice would be the ability to overlay both Hosts and VMs to try and pinpoint events or key metric points as they happened.

Below is a YouTube video from a recent webinar where the CloudPhysics VP Product Management Chris Schin walks through the way the platform uses Exploration Mode to identify root causes of VM Performance issues.

If you are interested in giving CloudPhysics a try, they have a free edition which you can register for and download here: CloudPhysics Free Edition

Firstlook: CloudPhysics Exploration Mode

During VMworld CloudPhysics released their new Dashboard Feature which saw a change of direction in the way CloudPhysics customers get presented with their data and was the first time Card Based analytics was not used to allow access to the wide array of metrics CloudPhysics stores in their data warehouses.

I’ve been working closely with the CloudPhysics team for a number of years now and they are great at listening to feedback around how to improve the platform. One of my biggest gripes (if you could call it that) over the years was that there was no way to view in detail (and historically) what was happening to a particular VM. One of the other issues was the time it took for data to show up in the CloudPhysics UI, which meant that you could only get access to data after about thirty minutes.

With the release of Exploration Mode there is more of a case for proactive monitoring and analysis of VMs and their issues. The data refresh rate has been brought down to about 15 minutes, which allows for more real time troubleshooting as well as allowing us to go back in time a number of days to correlate issues and look at patterns that might have occurred over the course of those days.

With Exploration Mode, administrators can go back in time, correlating events, issues, and changes that are associated with any selected time range in the vSphere environment, making it possible for users to see exactly what transpired in the seconds, minutes or days leading up to an application performance or availability problem.

To view a VM with Exploration Mode, you use the new Search VMs bar at the top of the CloudPhysics Web Console.

Once the VM has been selected you get taken to a dashboard that gives you configuration details of the VM, any changes (Power Operations, Snapshot, vMotions) that have been done against that VM in the provided date range and a performance graph that covers CPU, Memory, Network and Storage. There is also an Issues section which alerts you to any possible configuration issues or mismatch.

CloudPhysics have always been a personal favorite of mine and I’m legitimately excited with what the team has got in store to further develop the platform into an extremely powerful analytics tool for VMware based platforms.

They have a free edition which you can try out here: CloudPhysics Free Edition

Sources:

http://www.marketwired.com/press-release/cloudphysics-releases-exploration-mode-which-lets-vmware-users-identify-root-cause-2069851.htm

#VMworld: First Look – CloudPhysics New Release

Over the past year the guys at CloudPhysics have been relatively quiet compared to the preceding 3 years since they burst onto the scene at VMworld 2012. The reason for the relative radio silence is that they have been busily working away on a revamp of their SaaS based Analytics platform…and the results are impressive.

The new release has the following highlights.

  • Always-on diagnostics: Continuous diagnosis of infrastructure with changes continuously captured, recorded and reflected. Unique data derivations, correlations, mashups and filters reduce “noise” and identify true hazards.
  • Configurable dashboards: Rich contextual views expose hot spots and potential risks before problems form and impact operations. Trending analysis consolidates multiple objects and views, enabling multi-dimensional correlation.
  • Groundbreaking exploration capabilities: Interactive ability to analyze changes over time through easily manipulated exploration mode, using time slices with zoom in/out capabilities to evaluate correlations and causation. Users can “correlate in context” to troubleshoot application disruptions with data drawn from VM performance/resource consumption; change/event log; configuration history; and known issues associated with operational hazards and best practices.
  • 20+ new analytics for managing health and preempting hazards, available in our extensive library of “cards.”
  • Platform innovation: Time series data is uniquely handled by the CloudPhysics platform to enable a user to analyze multiple dimensions of the infrastructure around the same time axis.

With the new features, CloudPhysics delivers unique, meaningful insights, giving vSphere teams the confidence to act boldly to reduce risk and waste without compromising safety of the virtual infrastructure or the applications it supports. Building on its ease of use, intuitive user interface and deep visibility across multiple vCenters, CloudPhysics now:

  • Reduces disruption and incidents with always-on diagnostics that surface hot spots and emerging problems, enabling admins to get ahead of nascent performance problems
  • Improves mean-time-to-resolution (MTTR) with directed exploration, enabling admins to zero in on root cause and resolve application disruptions more quickly
  • Generates insights for realigning misconfigured infrastructure to prevent future performance and availability issues and improve efficiency

CloudPhysics have always been a personal favourite of mine since I first chatted to Irfan back on the Solutions Exchange floor in 2012 and I’m legitimately excited with what the team has got in store to further develop the platform into an extremely powerful analytics tool for VMware based platforms.

They have a free edition which you can try out here: CloudPhysics Free Edition

While at VMworld, head over to Booth #2346

vCloud Director SP: VM Metrics Database Configuration Part 2

vCloud Director SP 5.6.x has the ability to export VM statistics to an external database source which can then be queried via a set of new API calls. I’ve gone through a couple of different posts on how to configure the Cassandra/KairosDB data platform.

vCloud Director SP: VM Metrics Database Configuration Part 1
Installing and Configuring Cassandra and KairosDB on Ubuntu 14.04 LTS

This post finishes off the series and goes through the configuration of vCloud Director to start exporting VM metrics and then how to query the vCloud APIs to get those metrics. One thing to mention before continuing is that in the vCD SP documentation, KairosDB v0.9.1 is referenced as the version to install. Even though there are newer builds, 0.9.1 is the only one tested and verified…other versions cause bugs and I have seen some strange results.

Configuring vCD SP Metric Database Connection:

Data for historic metrics is stored in a KairosDB database backed by a Cassandra cluster. Once Cassandra and KairosDB are configured, you use the cell-management-tool utility to connect vCloud Director to KairosDB. To create the connection between vCloud Director and KairosDB, use a command line with the following form:

cell-management-tool configure-metrics options

Those familiar with the vCD cell.log entries will notice a couple of new startup entries…there is a new port (8999) that’s bound for KairosDB communications, and the tail end of the log above appears after the config, though I’m not sure what it actually does.

Once the cell services have been restarted the VM metrics should start to be collected by KairosDB. Wait about 5-10 minutes and then enter this URL into a web browser : http://IP-KairosDD:8080/api/v1/metricnames You should see the following displayed:

You can see that the results show metrics relating to VMs…this is a good thing! Using a REST client you will see a much prettier output.
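If you would rather script the check than use a browser, a minimal Python sketch could look like this. It assumes the KairosDB endpoint returns JSON with a top-level `results` list of metric names. The host is a placeholder and the sample response is illustrative only, since the names you get back depend on your environment.

```python
import json
import urllib.request


def metric_names_url(kairosdb_host, port=8080):
    """Build the metricnames URL for a KairosDB node."""
    return f"http://{kairosdb_host}:{port}/api/v1/metricnames"


def parse_metric_names(body):
    """Extract the list of metric names from a metricnames response body."""
    return json.loads(body).get("results", [])


def fetch_metric_names(kairosdb_host, port=8080, timeout=10):
    """Query a live KairosDB node (needs network access to the node)."""
    url = metric_names_url(kairosdb_host, port)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_metric_names(response.read().decode("utf-8"))


# Illustrative response only, not captured from a real system.
sample = '{"results": ["cpu.usage.average", "mem.usage.average", "kairosdb.jvm.free_memory"]}'

# Drop KairosDB's own internal metrics to leave the VM related ones.
vm_metrics = [name for name in parse_metric_names(sample) if not name.startswith("kairosdb.")]
print(vm_metrics)
```

An empty list after the 5-10 minute wait would suggest the cells are not yet pushing data, which is worth checking before moving on to the API calls.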

API Calls to Retrieve Current and Historical Metrics:

The following API calls are used to gather current and historical VM metrics for vCD VMs. The Machine ID required is the VM GUID as seen in vCenter. The ID can be sourced from the VM name. The vCD Machine ID shown below in the brackets is what you are after.



CloudPhysics: Enhanced Storage Analytics Cards [Part 1] – Datastore Contention

The guys at CloudPhysics have been busy behind the scenes of late working on improving an already great Analytic and Monitoring platform, and recently I was able to preview a new enhanced set of Storage Analytic Cards. These cards are currently in Preview, with an official write up here by @esxtopGuru which goes through the different cards on offer.

Coming from Service Provider land, I am always extremely interested in being able to find out how my datastores are performing and which VMs are causing or have caused trouble…I am also interested in snapshots and whether any have the potential to do harm on our platform. In this post I’ll be going through the Datastore Contention v2 Card…followed by Part 2, which will go through the Snapshots Gone Wild v2 Card.

Below is the new interface to the Datastore Contention v2 Card, and you can see off the bat that there is a lot more going on when compared to the v1 Cards.

The initial Card View will show you Datastores across your environment that need Attention and those that are of interest. This is based on an algorithm that CloudPhysics have created that dictates acceptable levels of contention on VMs on datastores. You will get an overview of Throughput, IOPS and Latency metrics as well as total VMs and how many are potentially affected by storage contention.

While the actual metrics haven’t changed here from the v1 Card, the way in which you can manipulate the data has been enhanced. For a period going back the last 24 hours (it would be nice to go back further…something I’ve mentioned as a feature request) you can dynamically change the graph to display Bandwidth, Latency, IOPS, Outstanding IOs and choose to display Average, Read and/or Write values.

As you click on the Active Red Zones in the Graph the list of Culprit VMs and Victim VMs changes to match the time period. [UPDATED] You now have a side by side view of Culprit VMs and Victim VMs giving quick and easy access to affected instances…By clicking on the blue View Details button you can further drill down into the list and view VM specific storage metrics for that period as shown below.

You also now have the ability to export each graph in a variety of usable formats, which is excellent for reporting purposes. In fact exporting has been enabled at all levels and you can export the entire Card View of data to CSV by clicking on the ALL Tab that lists all datastores in your environment. As a side note…the left hand search menu drop box sorting has been supplemented by a dynamic search bar which allows you to search for datastores…it even takes regular expressions for those that are that way inclined!
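For those not that way inclined, here is a rough idea of what regular expression filtering of datastore names looks like, sketched in Python. The datastore names are invented, and the search bar's exact regex dialect is an assumption, so patterns that work with Python's `re` module may need adjusting there.

```python
import re


def search_datastores(names, pattern):
    """Return the datastore names that match a case-insensitive regex."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [name for name in names if regex.search(name)]


# Hypothetical datastore names.
datastores = ["SSD-Gold-01", "SSD-Gold-02", "SATA-Bronze-01", "NFS-Backup-01"]

# Anchored pattern: SSD datastores ending in 01 or 02.
print(search_datastores(datastores, r"^ssd-.*-0[12]$"))

# Alternation: anything tagged bronze or backup.
print(search_datastores(datastores, r"bronze|backup"))
```

A consistent datastore naming convention is what makes this kind of search useful, since tier or purpose can then be picked out with a single pattern.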

Heading back to the top of the Card View you get an overall Aggregate picture of the currently selected datastores in the main presentation area…this dynamically adjusts based on what datastore(s) are in focus at the time.

Wrapping Part 1 up, the new Datastore Contention card is brilliant…not only does it give you better access to potential problem datastores and VMs, but it also lets you visualize and export data quickly and efficiently. The enhancements in the visual representation of data show that the guys at CloudPhysics are looking at a more dynamic style of data view, which makes the overall experience and usability that much better.

http://blog.cloudphysics.com/blog/whos-minding-your-storage-zoo

http://blog.cloudphysics.com/blog/2014/4/7/noisy-neighbor-where-art-thou-performance-culprit-and-victim-analysis-using-cloudphysics-storage-analytics

 

First Look: CloudPhysics Card Designer

The boys at CloudPhysics are working hard behind the scenes at adding new features to their current stable of Analytic Cards based on data collected from their Probe VAs hooked into vCenter environments.

Check out this post on their DataStore Contention Card:

For a general overview, go here:

I am a massive fan of analytics and trend metrics and I use a number of systems to gain a wide overview of the performance and monitoring of our Hosting and Cloud Platform.

A few weeks ago, the CloudPhysics team released to a limited number of users a Custom Card Designer. This pretty much lets you construct custom cards based on a huge number of metrics presented via a builder wizard.

Cards you design and save are listed on the page above. From here you can view your custom cards and edit them if they require tweaking. Once you click the Create Card + button you are presented with a list of property data metrics from which to construct your card.

Properties fall under four main categories and there are a large number of available metrics under each category. The wizard lets you drag and drop items into the builder window. From there you can preview and then save your custom card for future use.

As a quick example, I needed a way to see which datastores were connected to their respective hosts in each cluster so that consistency in datastore availability was maintained. It was as simple as dragging across Host:Name and Host:Datastore and putting in a filter to only view hosts of a certain name, and it was ready to go.
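The consistency check itself is easy to reason about once the data is out of the card. The Python sketch below takes rows of cluster, host and datastore and reports any host that is missing a datastore its cluster peers can see. The row layout, host names and the prefix filter are hypothetical stand-ins for the Host:Name and Host:Datastore properties and the name filter described above.

```python
from collections import defaultdict


def missing_datastores(rows, host_prefix=""):
    """rows: iterable of (cluster, host, datastore) tuples.

    Returns {host: [datastores seen by other hosts in the same cluster
    but not by this host]} for hosts whose name starts with host_prefix.
    """
    per_host = defaultdict(set)
    cluster_of = {}
    for cluster, host, datastore in rows:
        if host.startswith(host_prefix):
            per_host[host].add(datastore)
            cluster_of[host] = cluster

    # Union of every datastore seen by any filtered host in each cluster.
    per_cluster = defaultdict(set)
    for host, stores in per_host.items():
        per_cluster[cluster_of[host]] |= stores

    gaps = {}
    for host, stores in per_host.items():
        missing = per_cluster[cluster_of[host]] - stores
        if missing:
            gaps[host] = sorted(missing)
    return gaps


# Hypothetical inventory: esx-a-02 has lost sight of DS2.
rows = [
    ("ClusterA", "esx-a-01", "DS1"),
    ("ClusterA", "esx-a-01", "DS2"),
    ("ClusterA", "esx-a-02", "DS1"),
    ("ClusterB", "esx-b-01", "DS3"),
]
print(missing_datastores(rows, host_prefix="esx-"))
```

An empty result means every filtered host in a cluster sees the same set of datastores, which is the consistency being checked for.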

You have the option to preview and continue editing, or saving to the Card Designer main page. From that page you can execute the query. The results of my quick test card are shown below.

One thing I would like to see is an option to export the results to a CSV or Excel document…but other than that it’s a great example of what CloudPhysics is all about…data and how to get the most out of it as efficiently as possible.
