Monthly Archives: June 2015

Leap Second Bug: Worth a Double Check…

In 2008 I vividly remember the impact that leap years, days and seconds can have on systems that are not prepared to handle changes in time or date. It was the 29th of February and at the time I was working for a Service Provider offering Hosted Exchange services based on Exchange 2007. All of a sudden my provisioning scripts stopped working and we could not add, remove or modify Exchange Mailboxes.

After a day of frustration working with MS Support and dreading a full system rebuild, the problem seemed to disappear the following day…the 1st of March. After a couple of days of Microsoft scratching their heads, the Exchange Engineering team realised that they hadn’t allowed for the leap day somewhere deep in the bowels of their code, which resulted in all account modifications failing during the 24 hours of the leap day.

Fast forward seven years and the Earth’s rotation continues to slow, and system administrators and operations teams need to be aware of another out of the norm situation that could affect systems and platforms. This time it’s due to a leap second adjustment which is scheduled for the 30th of June 2015 at 23:59:60 UTC, and it may cause issues for devices and operating systems that are NTP synchronised. Older Linux kernels seem to be the most affected by the leap second, with most vendors releasing KB articles regarding the leap second impact and how to work around it.
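To see why 23:59:60 is awkward for software, here is a minimal Python sketch (my own illustration, not taken from any vendor KB). It shows that the standard library datetime type has no slot for a 60th second, and that plain epoch arithmetic, which knows nothing about leap seconds, maps 23:59:60 onto the first second of the following day.

```python
import calendar
from datetime import datetime


def leap_second_is_representable():
    """Return True if datetime accepts 2015-06-30 23:59:60, else False."""
    try:
        datetime(2015, 6, 30, 23, 59, 60)
        return True
    except ValueError:
        # datetime only allows seconds in the range 0..59
        return False


# Epoch seconds computed by plain calendar arithmetic (no leap second table).
leap_second_epoch = calendar.timegm((2015, 6, 30, 23, 59, 60, 0, 0, 0))
next_midnight_epoch = calendar.timegm((2015, 7, 1, 0, 0, 0, 0, 0, 0))

if __name__ == "__main__":
    print("datetime accepts 23:59:60:", leap_second_is_representable())
    print("23:59:60 as epoch seconds:", leap_second_epoch)
    print("00:00:00 next day as epoch seconds:", next_midnight_epoch)
```

Both epoch values come out the same, which is the ambiguity that operating systems and applications have to deal with when the extra second is inserted.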

While this is not something that will bring down the internet, it’s still something that all infrastructure IT professionals should be aware of, and it is worth double checking all systems to ensure there are no embarrassing time related incidents come the 30th of June.

ESXi and Other VMware Products:

As per this KB, ESXi is not impacted by the leap second bug…but other appliance based solutions (mostly SUSE based) look to require the enabling of Slew Mode for NTP.

ESX/ESXi utilizes the RFC-1589 clock model, appropriately handling leap seconds.

It is not necessary to enable Slew Mode for NTP in ESX/ESXi’s NTP client, or to otherwise work around leap seconds by disabling and re-enabling the NTP client before and after the leap second’s occurrence. For more information, see Enabling Slew Mode for NTP (2121016).

However, while ESX/ESXi server is not expected to experience negative impact from a leap second taking place, it remains possible for Guest Operating Systems and/or running applications to experience an impact, independent of ESX/ESXi, if it is not designed to handle one. VMware recommends customers to test their complete solutions.

This KB lists all the affected platforms and the suggested fixes for them. For vCloud SPs running vCloud Director… as most Cells run off Red Hat Enterprise there should be no impact, however it’s worth double checking as time skew is the number one enemy of vCloud Director IaaS platforms.
Service Providers:
While most Cloud providers don’t manage client Operating Systems directly it would be a good move to put out some form of advisory so that clients protect their VMs before the leap second hits…if not there could be a lot of angry service desk calls relating to increased and unexplained CPU usage, application slowdowns, application crashes, and failures on startup.

PernixData – Giving FVP Away! #VFD5

At Virtualization Field Day 2015, PernixData CTO Satyam Vaghani presented to the VFD5 delegates on some of the new features being released by PernixData. Personally speaking FVP is already a great product and at times I wonder what more can be done to make it better. However from what I have heard and now seen at VFD5…PernixData are not going to rest on the current success of FVP.

In what is becoming less and less of a surprise these days with disruptive tech startups, PernixData are releasing a free version of FVP called “FVP Freedom”. This will be a free, community based version of FVP.

The free edition will come with the following limitations:

  • One cluster only
  • DFTM Only (no SSD or PCIe acceleration)
  • Read Acceleration Only – 128GB Per Cluster write-through
  • Community support only

The fact you can only accelerate VM read workloads on RAM is a little bit restrictive (and resource expensive) but being able to use the DFTM-Z feature is seriously impressive and a very smart move by PernixData, who openly state that they want FVP in every ESXi host on the planet! Grand plans for sure, but releasing this free tier allows enthusiasts to consume the product at its most capable.

For a further read up on the rest of the PernixData announcements at #VFD5, head over to Duncan Epping’s blog post here or have a read of James Green’s post here.

For those that are interested, you can pre-register for FVP Freedom here:

https://get.pernixdata.com/FVPFreedom

Additional Links:

Fully functional 30 day trial of FVP 2.5 here:

http://pernixdata.com/free-downloads

At Zettagrid, we have already taken advantage of what PernixData has to offer and have integrated the FVP solution into parts of our IaaS platform. Zettagrid Case Study:

http://pernixdata.com/resource/pernixdata-fvp-software-keeps-zetta-grid

vCloud Director SP: VM Metrics Database Configuration Part 2

vCloud Director SP 5.6.x has the ability to export VM statistics to an external database source which can then be queried via a set of new API calls. I’ve gone through a couple of different posts on how to configure the Cassandra/KairosDB data platform.

vCloud Director SP: VM Metrics Database Configuration Part 1
Installing and Configuring Cassandra and KairosDB on Ubuntu 14.04 LTS

This post finishes off the series and goes through the configuration of vCloud Director to start exporting VM metrics and then how to query the vCloud APIs to get those metrics. One thing to mention before continuing is that in the vCD SP documentation KairosDB v0.9.1 is referenced as the version to install. Even though there are newer builds, 0.9.1 is the only one tested and verified…other versions cause bugs and I have seen some strange results.

Configuring vCD SP Metric Database Connection:

Data for historic metrics is stored in a KairosDB database backed by a Cassandra cluster. After Cassandra and KairosDB are configured, you use the cell-management-tool utility to connect vCloud Director to KairosDB. To create a connection from vCloud Director to KairosDB, use a command line of the following form:

cell-management-tool configure-metrics options

Those familiar with the vCD cell.log entries will notice a couple of new startup entries…there is a new port (8999) that’s bound for KairosDB communications, and the tail end of the log above appears after the config…though I’m not sure what it actually does.

Once the cell services have been restarted the VM metrics should start to be collected by KairosDB. Wait about 5-10 minutes and then enter this URL into a web browser : http://IP-KairosDD:8080/api/v1/metricnames You should see the following displayed:

You can see that the results show metrics relating to VMs…this is a good thing! Using a RestClient you will see a much prettier output.
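If you would rather script the check than use a browser, the following is a minimal Python sketch. It assumes the endpoint returns a JSON object with a "results" list of metric names (which is how I understand the KairosDB REST API to respond); the function names and the sample metric names are my own placeholders.

```python
import json
from urllib.request import urlopen


def metricnames_url(host, port=8080):
    """Build the metric names URL used in the post for a given KairosDB host."""
    return "http://{}:{}/api/v1/metricnames".format(host, port)


def parse_metric_names(body):
    """Extract a sorted list of names from a body shaped like {"results": [...]}."""
    return sorted(json.loads(body).get("results", []))


def fetch_metric_names(host, port=8080, timeout=10):
    """Fetch and parse the metric names from a live KairosDB instance."""
    with urlopen(metricnames_url(host, port), timeout=timeout) as response:
        return parse_metric_names(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical sample response; the metric names are placeholders only.
    sample = '{"results": ["mem.usage.average", "cpu.usage.average"]}'
    for name in parse_metric_names(sample):
        print(name)
```

Against a real install you would call fetch_metric_names with the IP of your KairosDB server. An empty list after 10 minutes or so suggests vCD is not yet writing metrics.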

API Calls to Retrieve Current and Historical Metrics:

The following API calls are used to gather current and historical VM metrics for vCD VMs. The Machine ID required is the VM GUID as seen in vCenter. The ID can be sourced from the VM name. The vCD Machine ID shown below in the brackets is what you are after.
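As a rough sketch of how those calls can be put together in Python: the /metrics/current and /metrics/historic paths, the Accept header format and the x-vcloud-authorization header are written from my recollection of the vCD SP 5.6 API, so treat them as assumptions and verify them against the API reference. The host name, token and VM ID are hypothetical placeholders.

```python
def metrics_urls(vcd_host, vm_id):
    """Return the (assumed) current and historic metrics URLs for one VM."""
    base = "https://{}/api/vApp/vm-{}/metrics".format(vcd_host, vm_id)
    return {"current": base + "/current", "historic": base + "/historic"}


def request_headers(auth_token, api_version="5.6"):
    """Return the (assumed) headers for an authenticated vCD API request."""
    return {
        "Accept": "application/*+xml;version={}".format(api_version),
        "x-vcloud-authorization": auth_token,
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    urls = metrics_urls("vcd.example.com", "00000000-0000-0000-0000-000000000000")
    print(urls["current"])
    print(urls["historic"])
    print(request_headers("token-from-login"))
```

A GET against either URL with those headers should return the metric samples for the VM, provided the metrics database connection has been configured as described earlier.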



Quick Fix: vCloud Air Gateway Unreachable

I’ve just come across a situation in my vCloud Air On Demand Service where the Edge Gateway was showing up as unreachable.

Given vCloud Director is backing the vCloud Air Platform I identified this as a rare, but familiar occurrence of the vShield Edge VM either not being deployed correctly, or somehow losing connectivity with its Manager. The good news is the fix is straightforward…

Click on Manage in vCloud Director in the top Right of the VCA Console.

This will launch the vCD UI…from there Click on Administration -> Your vDC -> Edge Gateways…You should see a System Alert next to the Edge…clicking on that alert will actually reload the page (my assumption is that this is either a vCD UI Bug, or System Alert Popups have been blocked by VMware) so what you need to do is Right-Click on the Edge and select Re-Deploy…you will also notice that all other options are grayed out, confirming that the Edge VM is unmanageable.

As shown above the Redeploying Edge Gateway Status will be displayed while the VM is being redeployed in the backend. This could take about 5 minutes…once done, heading back to the VCA Console and hitting the Refresh Button should result in the alert disappearing and the Edge Gateway being manageable again.

Announced: Veeam 9 Cloud Connect Replication For Service Providers

Last week Veeam announced that version 9 of Backup & Replication will feature a new addition to its Cloud Connect product…Replication for Service Providers. The version 8 functionality will be extended to include advanced image-based VM replication.

The extended functionality will give service providers the ability to provide clients with RaaS (recovery-as-a-service) in the form of Veeam Cloud Connect Replication for Service Providers. This builds on Veeam Cloud Connect which made it easy for existing and new Veeam customers to extend their backup infrastructure to cloud based repositories for offsite backups. I worked closely with Veeam last year in productizing Cloud Connect for Veeam and adding it to the Zettagrid Product Catalog and will be looking forward to seeing how the replication feature will add to the service offering.

Included with Veeam 9, clients will get a fully integrated, secure and efficient way to send VM replicas to Cloud based repositories which will enable the protection of applications and services with dramatically improved recovery time objectives.

Cutting through the marketing of the press release the key features are listed below:

  • Built-in multi-tenant support to securely share host or cluster CPU, RAM, storage and networking resource allocation between different tenants;
  • Full site failover to a service provider site with just a few clicks on a secure web portal, including failover orchestration with failover plans, and partial failover to instantly switch over selected VMs only;
  • Built-in network extension appliances to preserve communication with and between production VMs regardless of their location;
  • Failback to the existing or new infrastructure to restore normal business operations;
  • Failover testing for seamless failover simulation without disrupting production workloads;
  • Single port connectivity via a secure, reliable SSL connection to a service provider; and
  • Multiple traffic reduction technologies including built-in WAN acceleration, replica seeding, and replication from a backup.

“Veeam Cloud Connect not only enables our users to fulfill the offsite requirement without having to invest in offsite infrastructure or management, but also presents new opportunities for service providers to build recurring revenue from their existing customer base, expand their presence in the DRaaS market, and establish relationships with new customers” – Ratmir Timashev, CEO of Veeam

This is an interesting (but expected) move by Veeam who are competing with the likes of VMware vCenter Replication and DRaaS leaders Zerto…who have a strong offering based on vCloud Director that also can be used by Hyper-V houses to replicate (hypervisor agnostic) in almost real time from on-premises to a Service Provider Cloud. There has been a lot of debate on inline replication vs snapshot based replication solutions and this move is sure to fuel that debate even more.

Hopefully I can get my hands on the beta shortly and start to pull apart the replication mechanisms. Upon first glance there doesn’t seem to be any vCloud Director integration, which would be somewhat surprising given the Service Providers out there who are strong vCD + Veeam partners. It would be a shame not to carry forward Veeam’s history of vCloud Director integration.

Looking forward to seeing what Veeam have brought to the table with V9!

Further Reading:

http://www.veeam.com/blog/the-new-veeam-cloud-connect-now-with-replication-services.html

NSX Edge vs vShield Edge: Part 2 – High Availability

Overview:

High Availability in both VSE and NSX Edges ensures Edge Network Services are always available by deploying a pair of Edge Appliances that work together in an active/passive HA cluster Pair. The primary appliance is in the active state and the secondary appliance is in the standby state. The configuration of the primary appliance is replicated to the standby appliance.

All Edge services run on the active appliance. The primary appliance maintains a heartbeat with the standby appliance and sends service updates through an internal interface. Declared Dead Time, measured via heartbeating between both appliances, determines when a HA event should take place. If the primary is declared dead the standby appliance moves to the active state and takes over the interface configuration of the primary.
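The failover decision described above can be pictured with a toy Python model. This is only an illustration of the declare dead time idea, not how the Edge appliances actually implement it; the class and field names are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class HaPair:
    declare_dead_time: float  # seconds of silence before the primary is declared dead
    last_heartbeat: float     # time the standby last heard from the primary
    active: str = "primary"

    def heartbeat(self, now):
        """Record a heartbeat received from the primary appliance."""
        self.last_heartbeat = now

    def check(self, now):
        """Promote the standby if the primary has been silent for too long."""
        silent_for = now - self.last_heartbeat
        if self.active == "primary" and silent_for >= self.declare_dead_time:
            self.active = "standby"
        return self.active


if __name__ == "__main__":
    pair = HaPair(declare_dead_time=15, last_heartbeat=0.0)
    print(pair.check(now=10.0))  # primary still considered alive
    print(pair.check(now=16.0))  # no heartbeat for 16s, standby takes over
```

A shorter declare dead time means faster failover but a higher chance of a false positive when heartbeats are merely delayed.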

For both NSX and VSE managed via the NSX Manager, HA can be triggered by the vCenter Web Client or API. The VSE can also have HA triggered through the vCloud Director UI or API.

Configuring NSX/VSE HA From Web Client:

Double Click on the Edge under the NSX Edge Menu Option in Networking and Security. In the Settings Tab under Configuration, click on Change in the HA Configuration Box.

Click on Enable and leave the rest of the settings as default. You do have the option to select the vNIC if multiple Interfaces exist. Leaving it as default is a safe option. Almost all documentation I have read on the default Declare Dead Time states that it is 6 seconds, however in the Web Client it defaults to 15. You also have the ability to configure specific IPs to use as Management or Cluster IPs for each HA Pair.

At this point a second Edge Appliance will be deployed into the vCenter and you will see an Edge appliance with -1 appended to the name. As shown below the NSX Manager will initiate the creation of a DRS Anti Affinity Rule to keep the Edges separate.

Shown above is an example of both an NSX and vShield Edge and their anti affinity rule configured.

NOTE: For the HA settings to be applied to both Appliances at least one Interface (excluding Uplink) needs to be configured. If you don’t have an Interface configured the HighAvailability Service status on the Edge will be set to not running.

Configuring VSE HA From vCloud Director UI:

Depending on your Level of access to External Networks, right click on the Edge in the vCD UI and click on the Enable High Availability Check Box as shown below.

Enabling/Disabling/Viewing NSX/VSE HA With REST API

Below are the key API commands to configure and manage HA.
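As a rough sketch of what such a call can look like from Python: the /api/4.0/edges/{edgeId}/highavailability/config path and the highAvailability XML element names are written from my recollection of the NSX for vSphere API guide, so treat them as assumptions and check them against the guide for your version. The manager host name and Edge ID are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET


def ha_config_url(nsx_manager, edge_id):
    """Build the (assumed) HA configuration URL for an Edge."""
    return "https://{}/api/4.0/edges/{}/highavailability/config".format(
        nsx_manager, edge_id
    )


def ha_config_body(enabled=True, declare_dead_time=15):
    """Build an (assumed) XML body for enabling or disabling HA."""
    root = ET.Element("highAvailability")
    ET.SubElement(root, "enabled").text = "true" if enabled else "false"
    ET.SubElement(root, "declareDeadTime").text = str(declare_dead_time)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder manager and Edge ID for illustration only.
    print(ha_config_url("nsxmanager.example.com", "edge-1"))
    print(ha_config_body(enabled=True, declare_dead_time=15))
```

Under those assumptions, a GET on the URL views the current HA configuration and a PUT with the XML body changes it.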





There is nothing fundamentally enhanced in NSX HA vs VSE HA; it’s a simple, easy to enable feature that adds a level of availability to Edge Networking services.

Sources and More Reading:

http://blogs.vmware.com/vsphere/2013/03/vcloud-networking-and-security-5-1-edge-gateway-high-availability.html

https://pubs.vmware.com/NSX-6/index.jsp#com.vmware.nsx.admin.doc/GUID-6C4F0C33-C6DD-432B-AA91-10AD6B449125.html

http://nsxtech.net/2014/09/20/understanding-high-availability-on-the-nsx-edge-services-gateway/

http://lostdomain.org/2014/10/18/vmware-nsx-best-practices-from-vmworld/

vCloud Director SP: Upgrading and Applying New License Key

Thought I’d put up a very quick post on the process of upgrading from vCloud Director 5.5.x to vCloud Director SP 5.6.x. Any previous license key that you had associated with your vCD instance will not work once you upgrade to the SP Build.

After you upgrade using the vCD SP 5.6.x binaries…upon first login you will be prompted to enter in a new Serial Number.

If you haven’t done so already you will need to log into your MyVMware Account, update your existing key via the licensing portal and go through the license key upgrade process.

Once that’s done you can go back to the vCD UI and apply the new license code and you are on your way.

Reference:

http://kb.vmware.com/kb/2006974 

NSX Edge vs vShield Edge: Part 1 – Feature and Performance Matrix

I was having a discussion internally about why we were looking to productize the NSX Edges for our vCloud Director Virtual Datacenter offering over the existing vCNS vShield Edges. A quick search online didn’t come up with anything concrete so I’ve decided to list out the differences as concisely as possible.

This post will go through a basic side by side comparison of the features and performance numbers…I’ll then extend the series to go into specific differences between the key features. As a reminder vCloud Director is not NSX aware just yet, but through some retrofitting you can have NSX Edges providing network services for vCD Datacenters.

Firstly…what is an Edge device?

The Edge Gateway (NSX-v or vCNS) connects isolated, stub networks to shared (uplink) networks by providing common gateway services such as DHCP, VPN, NAT, dynamic routing (NSX only), and Load Balancing. Common deployments of Edges include the DMZ, VPN Extranets, and multi-tenant Cloud environments where the Edge creates virtual boundaries for each tenant.

Below is a list of services provided by each version. The + signifies an enhanced version of the service offered by the NSX Edge.

Service | Description | vShield Edge | NSX Edge
Firewall Supported rules include IP 5-tuple configuration with IP and port ranges for stateful inspection for all protocols
NAT Separate controls for Source and Destination IP addresses, as well as port translation
DHCP Configuration of IP pools, gateways, DNS servers, and search domains ✔+
Site to Site VPN Uses standardized IPsec protocol settings to interoperate with all major VPN vendors
SSL VPN SSL VPN-Plus enables remote users to connect securely to private networks behind a NSX Edge gateway ✔+
Load Balancing Simple and dynamically configurable virtual IP addresses and server groups ✔+
High Availability High availability ensures an active NSX Edge on the network in case the primary NSX Edge virtual machine is unavailable ✔+
Syslog Syslog export for all services to remote servers
L2 VPN Provides the ability to stretch your L2 network.
Dynamic Routing Provides the necessary forwarding information between layer 2 broadcast domains, thereby allowing you to decrease layer 2 broadcast domains and improve network efficiency and scale. Provides North-South connectivity, thereby enabling tenants to access public networks.

Below is a table that shows the different sizes of each edge appliance and what (if any) impact that has on the performance of each service. As a disclaimer, the numbers below have been cherry picked from different sources and are subject to change…I’ll keep them as up to date as possible.

Metric | vShield Edge (Compact) | vShield Edge (Large) | vShield Edge (X-Large) | NSX Edge (Compact) | NSX Edge (Large) | NSX Edge (Quad-Large) | NSX Edge (X-Large)
vCPU 1 2 2 1 2 4 6
Memory 256MB 1GB 8GB 512MB 1GB 1GB 8GB
Disk 320MB 320MB 4.4GB 512MB 512MB 512MB 4.5GB
Interfaces 10 10 10 10 10 10 10
Sub Interfaces (Trunk)  –  –  – 200 200 200 200
NAT Rules 2000 2000 2000 2000 2000 2000 2000
FW Rules 2000 2000 2000 2000 2000 2000 2000
DHCP Pools 10 10 10 20,000 20,000 20,000 20,000
Static Routes 100 100 100 2048 2048 2048 2048
LB Pools 64 64 64 64 64 64 64
LB Virtual Servers 64 64 64 64 64 64 64
LB Server / Pool 32 32 32 32 32 32 32
IPSec Tunnels 64 64 64 512 1600 4096 6000
SSLVPN Tunnels 25 100 50 100 100 1000
Concurrent Sessions 64,000 1,000,000  1,000,000 64,000 1,000,000 1,000,000 1,000,000
Sessions/Second 8,000 50,000
LB Connections/s (L7 Proxy) 46,000 50,000
LB Concurrent Connections (L7 Proxy) 8,000 60,000
LB Connections/s (L4 Mode) 50,000 50,000
LB Concurrent Connections (L4 Mode) 600,000 1,000,000
BGP Routes 20,000 50,000 250,000 250,000
BGP Neighbors 10 20 50 50
BGP Routes Redistributed No Limit No Limit No Limit No Limit
OSPF Routes 20,000 50,000 100,000 100,000
OSPF Adjacencies 10 20 40 40
OSPF Routes Redistributed 2000 5000 20,000 20,000
Total Routes 20,000 50,000 250,000 250,000

Note: I still have a few numbers to complete specifically around NSX Edge Load Balancing and I’m also trying to chase up throughput numbers for Firewall and LB.
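To show how the sizing numbers can be put to use, here is a small Python sketch that picks the smallest NSX Edge size for a required number of IPSec tunnels. The vCPU and IPSec tunnel values are copied from the NSX Edge columns of the table above; the function name and the idea of sizing on tunnels alone are my own simplification.

```python
# (size, vCPU, max IPSec tunnels) for the NSX Edge, smallest first,
# copied from the vCPU and IPSec Tunnels rows of the table above.
NSX_EDGE_SIZES = [
    ("Compact", 1, 512),
    ("Large", 2, 1600),
    ("Quad-Large", 4, 4096),
    ("X-Large", 6, 6000),
]


def smallest_size_for(ipsec_tunnels):
    """Return the smallest NSX Edge size supporting the tunnel count, or None."""
    for name, _vcpu, max_tunnels in NSX_EDGE_SIZES:
        if ipsec_tunnels <= max_tunnels:
            return name
    return None


if __name__ == "__main__":
    for required in (100, 2000, 7000):
        print(required, "tunnels ->", smallest_size_for(required))
```

In practice you would size on more than one row (sessions, routes, load balancer connections), but the same lookup approach applies.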

From the table above it’s clear to see that the NSX Edge provides advanced networking services and higher levels of performance. Dynamic Routing is a huge part of the reason why, and an NSX Edge fronting a vCloud vDC opens up so many possibilities for true Hybrid Cloud.

vCNS’s future is a little cloudy, with vCNS 5.1 going EOL last September and 5.5 only available through the vCloud Suite with support ending on 19/09/2016. When you deploy edges with vCloud Director (or in vCloud Air On Demand) you deploy the 5.5.x version so short term understanding the differences is still important…however the future lies with the NSX Edge so don’t expect the VSE numbers to change or features to be added.

References:

https://www.vmware.com/files/pdf/products/nsx/vmw-nsx-network-virtualization-design-guide.pdf

https://pubs.vmware.com/NSX-6/index.jsp#com.vmware.nsx.admin.doc/GUID-3F96DECE-33FB-43EE-88D7-124A730830A4.html

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2042799

vCloud Director 8.0 Beta Kick Off – Initial Thoughts and Reaction

As posted a couple of weeks ago the Beta Program for the new SP Release of vCloud Director was announced and the kickoff was held yesterday morning (1st of June) Pacific time. When I joined the call there were about 200 other callers signed in, which must have been a pleasing number for the vCD Team and shows that there is still significant interest in vCloud Director as a Cloud Management Platform.

The vCloud Product team went through a few basic concepts of the value of running vCD as a natural extension of vCenter while being able to leverage the economics of the Public Cloud through providing IaaS. Possibly a case of preaching to the converted, but there is always room for marketing slides and it gave a good overview of how VMware still see vCD as a going concern for SPs to offer Cloud Services.

Again, the power of the vCloud Air Ecosystem was talked about and vCloud Air Network Providers offer an existing install base that covers a good chunk of the Total Addressable Cloud Market in conjunction with VMware’s vCloud Air offering. As was the message last year…Hybrid Cloud is the key to short to medium term success for SPs before the likes of Containers and 3rd Platform Apps gain traction.

The above slide goes through vCD’s main benefits as seen by VMware, and in truth there is no better way to consume VMware Compute and Storage Resources while offering a different approach to IaaS compared to the likes of AWS, Azure and other Public Cloud providers that offer an instance based approach. There is much to be said for offering clients the flexibility of a Pool of Virtual DataCenter Resources from which to deploy Virtual Machines and Applications over the fixed VM instance types you find in the providers mentioned above.

What’s New:

As listed in my previous post, there is a fairly extensive list of new features with the highlight for me being the improvement to the vApp construct in making it more flexible for customers to consume while also making it easier for SPs to deploy and provision vApps and VMs from the one complete API tool set.

While nothing new has been exposed by way of features, the interoperability between vCD 8 and vSphere 6.0 and NSX 6.1.x lays important groundwork for what may come. There also seems to be a focus on resource control and improved Tenant Throttling. One interesting thing I noted was that there are already provisions to start to include vCloud Air services such as DRaaS in future releases, and I was heartened to hear that a full future product roadmap would be made available.

To get a full list and explanation of the new features, I’d encourage those interested to register at the Beta site and download the Webinar.

https://beta.vchs.vmware.com/callout/?callid=A14E81A118AC471C833BEF9FBEF34A87

Reactions:

The Q&A session at the end was dominated by questions around the fact that the GUI hasn’t been upgraded or improved since 5.5 and all new features are only accessible via API calls. In this lies the single biggest issue for current providers offering vCD…The GUI is outdated, tired and has always been somewhat unintuitive.

The solution to this from the vCD Product Team is for SPs to write their own UI (which Zettagrid does) or work with ISV Vendors who are developing separate frontends for vCD, but the reaction to this was along the lines of…”You want us to pay externally for a GUI?” I detected that the majority of SPs in the room are not happy with the fact that development has continued without UI improvements.

Even though I myself am learning more and more about consuming Cloud Platforms via APIs, I appreciate that the majority of the current vCD user base don’t have the luxury of a UI team and don’t want to be forking out more money to pay for a 3rd party UI to access the new features.

What’s Required:

While I am not privy to the reasons behind the vCD Team stopping the development of the UI…what I can suggest is that if something isn’t done to at least bring the UI up to date to include the newer features (let alone making it more current), more and more vCD customers will turn to alternatives like OpenNebula vOneCloud, Platform9 or any variant of OpenStack to run their Public or Private IaaS Platforms.

For those that are participants of the Beta, please use the Feature Request Submission page to voice your own opinion around the state of the UI…maybe there will be enough noise for someone up the chain to take note.

VMware have somewhat resurrected vCloud Director from the ashes over the past 12 months, but if they don’t give it the attention it deserves it will continue to be known as a decent functional Cloud Management Platform that could have (and should have) been great.