Monthly Archives: June 2014

vCloud Director 5.5 Upgrade: Database Upgrade Issues

Generally speaking, upgrading the vCloud Director binaries is a pretty straightforward task. It’s a two-step process where the Cells are upgraded first, followed by some update scripts run against the vCloud database. Finally, database indexes and statistics are updated. In the last 6-8 months I’ve worked through 1.5 -> 5.1 and recently 5.1 -> 5.5 upgrades…as well as point releases and BETA builds in the lab…all of those upgrades have ultimately been successful, but as recently as last week I came across a couple of issues when running the database schema upgrade script. The second issue is the more significant of the two, as there is little reference to it on Google.

Issue 1:

This one is apparently fairly common and isn’t cause for much concern.

The fix is straightforward and is explained in this post; the workaround is simply to log into your SQL server and execute the stored procedure as instructed in the warning message from the upgrade script. Done, nice and easy!

Issue 2:

After the database upgrade tasks have completed, you are asked whether you want to rebuild the database indexes and warned that it may take several minutes. I’ve found in the lab and in production that this usually takes 1-2 minutes; however, during our latest upgrade we received the message below.

Even though this is listed as an extra step, we all know how critical indexes are to efficient database query execution, so I wanted to try and get the rebuild done. The vCD database in question isn’t overly large (about 3-4GB), and suspecting a transient SQL connectivity error I backed up the database again and ran the upgrade database script once more…this is safe to do, as it’s smart enough to work out that the schema update tasks have already been committed…however I received the same error. I didn’t want to blanket rebuild all indexes, and suspected the rebuild run by the script was a little more targeted.

Thinking back to my old days of troubleshooting dodgy programmer queries, I decided to fire up SQL Server Management Studio, opened the Activity Monitor and filtered on the vCD database. I ran the upgrade script again and was able to see the queries being run via the monitor tab. Right-clicking on the suspected query, I was able to view what was being run:

Against the vCD database I executed the specific query as shown below and it completed successfully in a tick under 3 minutes. There was a noticeable pause in the output messages about two-thirds of the way through, so my thinking is that the upgrade script has a timeout that was reached due to inactivity.

From here we were able to fire up the Cell and continue with the upgrade process. Again, this issue seems to be rare, but hopefully if anyone comes across it the fix is as straightforward as the update statistics fix.

How-To: vCloud Director and Log Insight

Recently v2.0 of VMware Log Insight became GA, and I’ve been playing around with it since its release. Having been on the BETA of version 1.5, I can say that version 2.0 is streets ahead in terms of usability and completeness. Living day in and day out within vCloud Director, I decided to look at hooking up the vCloud Cell logs to Log Insight and creating a basic dashboard to assist us in working through vCD logs.

First up you need to SSH into your vCloud Cell and head to:

From there you want to edit the log4j.properties file. Probably worth making a backup of the file before the edit. Go to the bottom of the file and append the following, making sure to substitute the xxx.xxx.xxx.xxx on the second line below with the IP or hostname of your Log Insight Server.

You can see that we can control the level of logging being sent through to Log Insight by editing that last line and raising the threshold to WARN or ERROR. Once that section has been added, head back to the top of the file and modify line 2 as shown below.

What we are doing there is adding the source we just configured in the first step. Save the file and restart the vmware-vcd service.
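The two edits described above (append an appender stanza, then reference it from line 2) can be sketched in Python. This is a minimal sketch, not the exact lines from the screenshots: it assumes the Cell uses log4j 1.x, that line 2 is the `log4j.rootLogger` line, and that the stock `org.apache.log4j.net.SyslogAppender` is used. The appender name `vcloud.system.syslog`, the `LOCAL1` facility and the conversion pattern are placeholders chosen for illustration.

```python
import shutil
from pathlib import Path

# Hypothetical appender name; any identifier not already used in the file works.
APPENDER = "vcloud.system.syslog"


def add_syslog_appender(text, host, threshold="INFO", appender=APPENDER):
    """Return log4j.properties text with a syslog appender wired into rootLogger."""
    lines = text.splitlines()
    out = []
    patched = False
    for line in lines:
        if not patched and line.strip().startswith("log4j.rootLogger="):
            names = [p.strip() for p in line.split("=", 1)[1].split(",")]
            if appender not in names:
                # Reference the new appender from the root logger line.
                line = line.rstrip() + ", " + appender
            patched = True
        out.append(line)
    if not patched:
        raise ValueError("no log4j.rootLogger line found")

    prefix = "log4j.appender." + appender
    if not any(l.startswith(prefix + "=") for l in lines):
        # Assumed stanza: host on the second line, threshold on the last line.
        out += [
            "",
            prefix + "=org.apache.log4j.net.SyslogAppender",
            prefix + ".syslogHost=" + host,
            prefix + ".facility=LOCAL1",
            prefix + ".layout=org.apache.log4j.PatternLayout",
            prefix + ".layout.ConversionPattern=%-5p | %t | %c{1} | %m%n",
            prefix + ".threshold=" + threshold,
        ]
    return "\n".join(out) + "\n"


def patch_file(path, host, threshold="INFO"):
    """Back up the properties file next to itself, then rewrite it in place."""
    path = Path(path)
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(add_syslog_appender(path.read_text(), host, threshold))
```

Calling `patch_file()` with the path to log4j.properties and the Log Insight address leaves a `.bak` copy beside the original. Running it a second time changes nothing, so it is safe to re-run after an upgrade replaces the file.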

Load up the Log Insight Web GUI and go to the Interactive Analytics Page. If you hit search a couple of times you should start seeing vCloud Director related entries appear in the Events pane.

Click on the Add Filter icon, choose Source -> Contains and enter the host name of the vCloud Cell. Depending on the number of hosts you are logging, the Cell may appear in the list as you click into the search box. Hit the Search button and you will see your filtered log entries in the Events pane. I choose the Field tab, which makes reading the entries a little easier.

Finally, we can create a basic custom dashboard to view cell log numbers over time. With the above filter in play, click on the Add to Dashboard icon, which is on the right-hand side of the Search button, and give it a name relating to the Cell. In the example below I already have a dashboard created so it appears in the drop-down list…otherwise you can create a new one from this window.

After clicking on Add you can go back to the Log Insight Dashboards to view your creation.

Again, it’s a very basic display, literally showing you the number of events in a period of time; however, the usefulness here is that if you have to search for an event you can drill down and perform an Interactive Analysis with a little more accuracy.

Watch out for more Content Packs to come out for Log Insight…the library will only grow and add more value to this tool.

Veeam Cloud Connect

I’ve written a couple of posts around the pain of backup products and I’ve talked about a world where we back up independent of the application. Storage platforms like AWS S3 offer a place where objects can be stored for safekeeping and accessed upon request. There is a catch to that kind of storage in that there are costs to pull back data to start recovery processes…further to that, the recovery process often involves a couple of steps to download and then restore the data.

The other issue is that backup products have a terrible history of not delivering on their promises; they tend to lack scalability and reliability, and have issues with basic recovery. With great confidence I can say that Veeam Backup and Replication is rare in this ecosystem in that it just works. From Veeam B&R v6.5 to v7 there has been relative stability and its place in the market is not disputed.

With Veeam v8 about to go into beta, one of the new features that will be available is Cloud Connect. Cloud Connect is a Service Provider enabled service that will allow current Veeam customers to back up to a remote repository directly from the Veeam B&R Console without the need for a VPN style connection.

As a 100% channel-focused company, Veeam will be the enabler of this connection: service providers will be able to license the technology for their side of the Veeam Cloud Connect connection through the Veeam Cloud Provider (VCP) program. With it, they will be able to build their own remote repositories with an architecture that was built from the ground up to be multi-tenant and scalable.

I am looking forward to delivering a ZettaGrid product based on Veeam v8 Cloud Connect and to us providing services that leverage this new Veeam technology…

Veeam are about to deliver once again!

For a further introduction to Veeam Cloud Connect go here:

NSX Bytes – Unicast Transport Zone Creation Error

One of VMware NSX’s best features is the ability to have the VXLAN Control Plane working in Unicast Mode…removing the requirement for Multicast traffic on the underlying Physical network. This is a significant step forward in that Multicast often created issues for traditional network teams and getting it retrofitted to an existing platform was often a challenge.

Note: As mentioned in my previous post…these NSX Bytes are not meant to contain detailed info regarding NSX components…for more info on the VXLAN enhancements in NSX, check out Anthony Burke’s NSX Compendium here, and read up on the different Transport Modes here:

Back to the error I came across whilst configuring the Transport Zones in my lab…The lab was a vCenter 5.1 deployment upgraded to 5.5, with a mixture of ESXi 5.1 and 5.5 Hosts and Clusters. After the Controllers were deployed and the Hosts and Clusters prepared, I went about the Logical Network Preparation and enabled the VXLAN Transport and Segment ID.

When attempting to configure the Transport Zones I received these errors when trying to create Unicast or Hybrid Zones.

I checked back through the NSX Documentation for the System Requirements and by the letter of the law, everything checked out.

I had my vCenter 5.5 and ESXi 5.5, and had prepped the VXLAN transport network as instructed. Checking through the platform, I found that my Distributed Switch was still at Version 5.1.0.

I upgraded it to Version 5.5.0 and attempted to create the Unicast Transport Zone again…this time with success. In discussions with @pandom_, we concluded that what the documentation doesn’t specify (that we could find) is that you absolutely require your Distributed Switch to be version 5.5. Some people may call that an obvious oversight, but when coming from an established and potentially mixed environment that’s been upgraded it’s something to be aware of. As long as all your member Hosts are 5.5 you can upgrade the dvS.
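The check that would have saved me some time can be sketched as a small pre-flight function in Python. This is a minimal sketch under stated assumptions: the function names and the way versions are passed in are hypothetical (nothing here talks to vCenter), and the 5.5.0 floor is taken from the experience described above rather than from the NSX documentation.

```python
def parse_version(text, width=3):
    """Turn a dotted version such as '5.1.0' into a tuple of ints, padded with zeros."""
    parts = [int(p) for p in text.strip().split(".")]
    parts += [0] * (width - len(parts))  # so '5.5' compares equal to '5.5.0'
    return tuple(parts)


def unicast_blockers(vcenter, hosts, dvs, minimum="5.5.0"):
    """List every component older than `minimum`; an empty list means none found."""
    floor = parse_version(minimum)
    blockers = []
    if parse_version(vcenter) < floor:
        blockers.append("vCenter " + vcenter)
    # hosts maps a host name to its ESXi version string.
    for name, version in sorted(hosts.items()):
        if parse_version(version) < floor:
            blockers.append("host " + name + " " + version)
    if parse_version(dvs) < floor:
        blockers.append("Distributed Switch " + dvs)
    return blockers


# The situation from my lab: everything at 5.5 except the dvS.
print(unicast_blockers("5.5.0", {"esx01": "5.5.0", "esx02": "5.5.0"}, "5.1.0"))
```

Listing hosts alongside the switch is deliberate, since the dvS can only be upgraded once every member Host is at 5.5.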

A Multicast Transport zone remains an option to help deal with legacy VXLAN deployments such as the ones configured with vCNS and vCloud 5.x. In my case the end result was two Transport Zones…one configured with Unicast, and one with Multicast.

Now the fun begins!

REF: http://pubs.vmware.com/NSX-6/index.jsp#com.vmware.nsx.install.doc/GUID-D8578F6E-A40C-493A-9B43-877C2B75ED52.html

http://networkinferno.net/nsx-compendium

NSX Bytes – Controller Deployment Gotchyas

There are a lot of great posts already out there in regards to install and configuration of NSX. Rather than reinvent the wheel I’ve decided to do a series of NSX Bytes relating to a couple of gotchyas I’ve come across during the config stage. This post will focus on the NSX Controller deployment which provides a control plane to distribute network information to hosts in your VXLAN Transport zone.

To get up to this part you would have had to deploy the NSX Management VM and prepare your management network which you specify in the IP Pool setup of the deployment. It’s suggested that you deploy three NSX Controllers for HA and resiliency. If successful you should see the Management Tab in the Networking and Security Section of the vCenter Web Client looking like this:

In my first attempt I managed to deploy all three controllers without issue; however, in my second lab I ran into a couple of issues that initially had me scratching my head. It must be noted that there isn’t much, if any, error feedback provided to you via the vCenter Client. To get more detail I enabled SSH on the NSX Manager GUI and tailed the manager log by running the command

The output is verbose but useful, and I’d encourage familiarity with it. There is also a Syslog setting that can send the logs out to an external monitoring system if you wish.

ISSUE 1: No Host is Compatible with the Virtual Machine

After setting off a new Controller Build you see vCenter Deploy the OVF Template, Reconfigure the VM and then Power off and Delete the VM.

This was due to the fact that the resource requirements for the Controller Template were not able to be met…specifically the vCPU count. The ESXi Hosts in my lab were only capable of running VMs with 2 vCPUs (1 Socket, 2 Cores) and because of that the deployment failed.

The key there is to ensure that the spec can be met as shown above. This issue should be restricted to lab Hosts, but nonetheless it’s one to look out for just in case.

ISSUE 2: Controller VM Appears to Deploy Successfully Then Gets Deleted

After setting off a new Controller Build you see the OVF Template deploy and start up. After about 10 minutes, if you launch the VM console you will see that the Controller is being configured with the right IP Pool settings and reboots into a ready state. Without warning the VM is then shut down and deleted.

Checking through the Manager Logs you see this entry relating to the destruction of the Controller VM

This one is actually pretty easy, and was a user error on my part. The logs clearly state that there was a timeout waiting for the controller to be ready. This was due to the wrong Connected To network being selected during the Add Controller phase. This network must be able to contact the NSX Manager and vSphere components…again an obvious error once you view the logs…but initially it just appeared as though vCenter, via the NSX service account, was deleting the VM for the hell of it.
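One way to rule this out before kicking off a ten minute deployment is to test reachability from a machine that already sits on the candidate network. The following is a minimal Python sketch, not anything NSX does itself: `can_reach` is a hypothetical helper, and port 443 is an assumption based on the Manager being accessed over HTTPS.

```python
import socket


def can_reach(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unroutable networks, DNS failures and timeouts.
        return False
```

Run it from a VM attached to the port group you intend to pick under Connected To, passing the address of your NSX Manager and then of vCenter. A `False` result for either suggests the controller would hit the same timeout.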

 

Note: For more detailed install guides check out Chris Wahl’s post here and Anthony Burke’s post here to get up to speed on initial NSX Management, Controller and Transport prep.

Follow Up: The NSX Roadblock

About 7 weeks ago I wrote a post detailing my frustration at the way in which VMware’s NSX was being trickled out into the marketplace and the hefty caveats that were involved in getting one’s hands on NSX…even for dev and test purposes. The basis of my argument was that while I understood the general strategy around the go-to-market, I felt that the NSBU was shooting themselves in the foot by not allowing select and trusted partners better access to the NSX platform.

That post ended up being well read and well supported, and there was a fair bit of noise made in social media circles with people sharing stories of similar frustration. I was encouraged to hear that, for the most part, even those within the VMware NSX team were not completely happy with the way in which its availability had been restricted. Making things even worse was the increased social media noise about NSX training sessions and increasing blog content…which only served to tease myself and others.

A couple of weeks after that post my company (ZettaGrid) started negotiations with the local NSBU team to help them understand our specific circumstances and our requirement to use the NSX platform to enhance our already solid VMware backed Service Provider offerings. As I commented previously…I felt that through a proven record of innovating and bringing VMware products (like vCloud Director) successfully to market, we could apply those same proven processes to NSX. That said, there was no need for a paid PoC; we felt we just needed the software so we could start to tear it apart in our labs.

So…after a few weeks of successful negotiations I have gone from a sense of frustration to an extreme sense of privilege. Being careful not to gloat in front of those who are still fighting to have NSX made more readily available, I understand that ZettaGrid and I (notwithstanding the fact that there is absolute merit based on the position ZettaGrid holds in the ANZ IaaS space) are in a very good position to have NSX deployed in our labs ready for testing and production.

Again, it’s been an interesting journey to get from disappointment and frustration to where I am now…I don’t for a second take for granted how lucky I am to work for a company that has placed me in a position to be one of the few (relatively speaking) at the moment to have NSX. I would encourage those that are still chasing down a deal with the NSBU to keep on prodding…but understand that there are good reasons for the selective release and you must have the backing and justification to warrant access to the software as well as a proven track record as a trusted VMware Partner with a history of successful delivery.

I must admit that I haven’t felt this excited and raw about a piece of software since I first looked at vCloud Director…and even that to be fair doesn’t compare to the excitement I have already generated from only just scratching the surface of NSX…the potential is unbelievable.

Many thanks to Anthony Burke and the Australian NSBU Team for their support…I will be starting to post a couple of NSX related articles over the next few weeks to complement the good work already being done out there in the community to educate.

NSX Related Content:

http://networkinferno.net/nsx-compendium

http://wahlnetwork.com/category/deep-dives/working-with-nsx/

http://blog.scottlowe.org/learning-nvp-nsx/