vCloud Director SP: VM Metrics Database Configuration Part 1

vCloud Director SP 5.6.3 was initially released in October 2014 and was the first of the SP Editions to be forked from the Enterprise 5.x builds that came before it. VMware delivered on their promise to release vCloud Air Features to Network Partners. One of those features was VM Metrics.

  • Virtual machine monitoring: Expose current and historical VM performance metrics to tenants through this tenant visible, multi-tenant safe API. Using the API, tenants can troubleshoot application performance problems, auto-scale their applications, and perform capacity planning.

To store those VM Metrics, a separate database needs to be deployed and vCloud Director then configured to use the new database instance.

No worries right?

The catch here is that VMware have decided to use a linearly scalable and fault tolerant database called KairosDB backed by a Cassandra cluster. To satisfy the requirements as set by the vCloud Director team you need to deploy a three node Cassandra Cluster.

While initially a little annoyed that we would have to deploy and manage another group of servers and services, I do understand the decision by the vCloud Dev Team to go with a highly scalable database platform…after all, development is done for the vCloud Air Service and then handed down to SPs afterwards. At scale it makes sense to use something like this, however it would have been nice to have the option to use a “light” database like MSSQL or similar…that would have made sense, but let’s move on!

There isn’t a lot of sizing guidance around the VM requirements for this Cassandra cluster, so I went with three VMs, each with 1 vCPU, 4GB of vRAM and 100GB of storage, to start with. There is no guidance on growth projections either, so at this point it’s a case of wait and see. It would be good to have someone from the team give estimates on the size of the cluster relative to the number of VMs in an environment.

In the lab I’m using Ubuntu 14.10 but this would apply to the current 14.04.1 LTS release as well. I’ve linked to the Debian install and config at the end of this post…Below are the quick and nasty steps to download the Cassandra tarball and the required packages to build and run a single server Cassandra instance. In production I would spend a bit more time tweaking the config however it gives you an instance to wrap your head around.
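As a rough sketch of those steps in shell form: this assumes the 2.1.2 source tarball that a reader quotes in the comments, the Apache archive as the download location (an assumption, since mirrors come and go), and OpenJDK 7 plus ant as the build prerequisites. The function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: the version, the archive URL and the package names are
# assumptions, not taken from the VMware documentation.

# Build the download URL for a given Cassandra source release.
cassandra_tarball_url() {
    local version="$1"
    echo "https://archive.apache.org/dist/cassandra/${version}/apache-cassandra-${version}-src.tar.gz"
}

# Download, extract and build a single-node instance under the current directory.
install_cassandra_from_source() {
    local version="${1:-2.1.2}"
    # A JDK and ant are needed to build from the source tarball.
    sudo apt-get update
    sudo apt-get install -y openjdk-7-jdk ant wget
    wget "$(cassandra_tarball_url "$version")"
    tar -xvf "apache-cassandra-${version}-src.tar.gz"
    cd "apache-cassandra-${version}-src" || return 1
    ant                 # compile Cassandra
    bin/cassandra -f    # start in the foreground; Ctrl+C to stop
}

# Nothing runs until the function is called explicitly, for example:
#   install_cassandra_from_source 2.1.2
```

Keeping the install inside a function means the file can be sourced and inspected before anything touches the system.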

The Video below shows the whole process end to end.

To satisfy the three node Cassandra Requirement we need to repeat the steps on the next two VMs and configure for HA.

Go to the extracted location, then to /conf/cassandra.yaml, and edit the config file entries listed below. For the seeds, there is no need to enter in the master IP. For an explanation of the config options, the config file is well commented.
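For anyone who would rather script the change than edit by hand, here is a minimal sketch. It assumes the entries that matter for a basic ring are cluster_name, seeds, listen_address and rpc_address, and that they appear uncommented in the stock file. The function name, cluster name and IP addresses are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the four keys below are assumed to be the entries that matter
# for a basic multi-node ring; check them against the comments in cassandra.yaml.

# Usage: configure_node <cassandra.yaml> <cluster name> <comma-separated seeds> <this node's IP>
configure_node() {
    local conf="$1" cluster="$2" seeds="$3" ip="$4"
    local tmp rc
    tmp="$(mktemp)" || return 1
    # Rewrite the four settings into a temp file, then copy the result back
    # over the original so its ownership and permissions are kept.
    sed \
        -e "s|^cluster_name:.*|cluster_name: '${cluster}'|" \
        -e "s|^\([[:space:]]*- seeds:\).*|\1 \"${seeds}\"|" \
        -e "s|^listen_address:.*|listen_address: ${ip}|" \
        -e "s|^rpc_address:.*|rpc_address: ${ip}|" \
        "$conf" > "$tmp" && cat "$tmp" > "$conf"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Example with made-up values, run once per node with that node's own IP:
#   configure_node conf/cassandra.yaml "vcd-metrics" "10.0.0.11,10.0.0.12" "10.0.0.13"
```

Take a copy of the original cassandra.yaml before running anything like this against it.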

Once that’s been configured on every node in the cluster, restart the Cassandra Services. To validate the cluster status use the nodetool command under the bin folder and you should see the following:
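If you want to script that check, the sketch below counts the nodes that nodetool status reports as UN (Up/Normal). The helper names are made up, and the default of three nodes simply matches the cluster size used in this post.

```shell
#!/usr/bin/env bash
# Sketch only: counts lines of `nodetool status` output whose first column
# is UN (Up/Normal). Helper names are hypothetical.

# Reads `nodetool status` output on stdin and prints the number of UN nodes.
count_up_nodes() {
    awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Succeeds only when at least <expected> nodes report UN (default 3).
# Usage: bin/nodetool status | ring_is_healthy 3
ring_is_healthy() {
    local expected="${1:-3}" up
    up="$(count_up_nodes)"
    echo "${up} of ${expected} nodes are Up/Normal"
    [ "$up" -ge "$expected" ]
}
```

The exit status makes it easy to drop into a monitoring check or a post-restart script.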

In the next post I’ll work through the KairosDB install and configuration as well as tying it all together with vCloud Director and I’ll even attempt to pull some VM stats via the API.

Cassandra Debian Install Guide: http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installDeb_t.html

VMware vCD SP 5.6.3 Document Link: http://pubs.vmware.com/vcd-56/index.jsp#com.vmware.vcloud.install.doc_56/GUID-E5B8EE30-5C99-4609-B92A-B7FAEC1035CE.html

7 comments

  • Hi Anthony

    I agree with your comments… another DB platform!! Never mind… Like you say, it seems to be the right DB for the job.

    Anyway, a question for you on your deployment. We tend to keep our public facing stuff (like the VCD cells) on one network with the SQL, AD, vCenters, etc., all on a separate network with firewalls in front and behind the VCD cells. I’d planned to put the Cassandra/KairosDB servers on the “internal” network with the SQL servers.

    However, the VMware docs say you should install Cassandra/KairosDB “on at least three machines that are connected to the same network that your vCloud Director cells use”.

    It looks as though it’s only a couple of ports it’ll be communicating across (port 4242 or 8080) so I guess it should be fine if I get those ports opened on the firewall. I just wondered how you had done this in your environment and if I’m setting myself up for a fall here!

    thanks!

    Dave

    • There really aren’t a lot of specifics in the doco, right 🙂 I can’t see any issues as long as the Cells can talk to the db servers on the right ports. Are you going to load balance them? I deployed a HAProxy Cluster on all the Cassandra/KairosDB nodes so it’s all self contained in its resiliency…in terms of sizing I still don’t have an idea. Did you manage to get your KairosDB pointing to disk over memory for storage?

  • Must admit, I’ve not actually set it up yet. I got as far as getting 3 CentOS VMs up and running then thought I’d better do some reading and see what it was all about 🙂 I’ve not looked at disk over memory for storage and had only briefly considered the load balancing aspect so far. Thought I would use the same load balancing that I’m using for vCD portal and console but will have to look into HAProxy.

  • Well I got a 3 node Cassandra cluster with KairosDB on all three – just HAProxy to put on top, which doesn’t look too bad. Being impatient, I thought I’d point vCD at one of the Cassandra nodes and see what I get. I left some VMs powered on in VCD and left it for a day. Next day, I had a go at querying the stats through the API and got no historical metrics for my vCD managed VMs at all. However, looking at the tags that KairosDB has put into the database, I am getting data logged for some VMs. Here’s the odd bit…. I’m getting no data for the VMs that are managed by vCloud Director but I am getting data for the test VMs that I left powered on in the root of the vCenter – ones that VCD knows nothing about. Weird….

    On the cell, I’m getting a load of the following in vmware-container-debug.log every 5 minutes (that must be how often it writes stats to KairosDB).

    2015-01-28 13:32:58,613 | ERROR | Thread-12520 | KairosdbStatsReceiver | Error processing entity stats |
    com.vmware.ee.statsfeeder.kairosdb.KairosdbException: Bad HTTP status 7
    at com.vmware.ee.statsfeeder.kairosdb.KairosdbClient.writeWithNesting(KairosdbClient.java:99)
    at com.vmware.ee.statsfeeder.kairosdb.KairosdbClient.write(KairosdbClient.java:48)
    at com.vmware.ee.statsfeeder.kairosdb.KairosdbStatsReceiver.receiveStats(KairosdbStatsReceiver.java:107)
    at com.vmware.ee.statsfeeder.StatsRetriever.onComplete(StatsRetriever.java:141)
    at com.vmware.ee.statsfeeder.StatsRetriever.access$300(StatsRetriever.java:24)
    at com.vmware.ee.statsfeeder.StatsRetriever$1.run(StatsRetriever.java:189)

    I ended up doing some packet capture on the cell and can see POSTs going to the KairosDB address and in quick succession. You can see some requests including datapoints for the VMs that I’m getting nothing in the DB for but the format looks the same as those I am getting datapoints for in the DB. So in all, a little bit strange. Need to go trawling for some logs on KairosDB and/or Cassandra I think to see what might be going on.

    Did you come up against anything like this or are you happily querying data out now? 🙂

    • hi @davelee212 … did you get any resolution for the “Error processing entity stats” issue? I am also encountering this issue in my environment. However, on KairosDB I don’t see any issue.

      Regards,
      Vish

      • Hi Vish

        In my case I had installed the very latest version of KairosDB. It no longer accepts certain characters, such as brackets, in the data that’s inserted into the DB. Because all VMs created by vCloud Director have the ID in brackets, it would fail to add in any data.

        The solution was to use the version of KairosDB that VMware list as being supported in the vCloud documentation. I learnt my lesson the hard way. Always use the version of software that the documentation tells you to use, not just the latest version available. It doesn’t always work 🙂

        Hope that helps

        Dave

  • Hi Anthony,

    Great article mate. Since your blog post there has been a change: I was unable to download from

    wget http://psg.mtu.edu/pub/apache/cassandra/2.1.2/apache-cassandra-2.1.2-src.tar.gz
    tar -xvf apache-cassandra-2.1.2-src.tar.gz

    this server was not reachable.

    instead I’ve used the following on an ubuntu 14.10 server:

    echo "deb http://debian.datastax.com/community stable main" | tee -a /etc/apt/sources.list.d/cassandra.sources.list

    curl -L http://debian.datastax.com/debian/repo_key | apt-key add -

    apt-get update

    apt-get install dsc21

    apt-get install cassandra-tools

    service cassandra stop

    rm -rf /var/lib/cassandra/data/system/*

    nano /etc/cassandra/cassandra.yaml

    the following rules I’ve set on the ubuntu firewall as I was getting an exception error on port 7199.

    sudo ufw allow 8888
    sudo ufw allow 7000
    sudo ufw allow 7001
    sudo ufw allow 7199
    sudo ufw allow 9042
    sudo ufw allow 9160
    sudo ufw allow 61620
    sudo ufw allow 61621

    another quick note: it is important to have the listen address pointing to each server’s own IP address. This might be obvious for some, however it is not clearly documented in the standard documentation.

    again many thanks for this great post, very useful!