Monthly Archives: January 2016

New Book: Learning VMware NSX

Last year I was asked by @rjapproves if I would be interested in reviewing a book he was writing on VMware’s NSX-v platform. Ranjit wanted me as a technical reviewer based on the blog content I had written around NSX as part of the NSX Bytes series, as well as the NSX vCloud Director Retrofit series. Having never done a critical review of technical material before, I jumped at the opportunity…it also gave me a chance to validate the work I’ve done with NSX over the past 18 months and to have my contribution acknowledged along with co-reviewer @jfrappier.

The book acts as an introduction to the installation and configuration of NSX-v and works through the basics of getting NSX-v up and running on your vSphere platform. Ranjit goes through the concepts around all the core components that work together to make NSX-v tick.

The book is available on Amazon and is published through Packt Publishing. The Kindle version is available now, with the paperback shipping in early March.

Well done to Ranjit on pushing through and getting this book project done!

http://www.amazon.com/dp/1785886886/ref=cm_sw_r_tw_dp_lySQwb0753P2J 

vCloud Air Rumours – vCAN in Focus…Again | VMware Rethinks Strategy

Well…the news filtering out over the internets about the VMware job cuts and the apparent clipping of vCloud Air’s wings isn’t great. None of this is 100% confirmed yet, and there are no specifics about what it actually means for the vCloud Air Network. If what I am reading is true and no more CapEx will be spent on existing vCloud Air zones, then hopefully VMware has realised that the best way to fight the IaaS fight is to let its key partners deliver VMware-based IaaS using its core platform technologies such as vCenter, ESXi, NSX, vCloud Director and possibly VSAN.

Originally positioned as VMware’s public cloud service and a vehicle for customers to manage hybrid clouds, vCloud Air now offers specialty cloud services and software with characteristics unique to VMware, Gelsinger said. The vCloud Air service will still exist, but it sounds like the business’ main focus will be to provide technology for partner-run clouds.

UPDATE: Having just listened to the Earnings Call and read through the transcript, I’ve included a key Pat Gelsinger quote below:

I’d like to take a moment to clarify our strategy for vCloud Air; the service will have narrower focus providing specialized cloud software and services unique to VMware and distinct from other public cloud providers. We will aggressively provide these innovations to our vCloud Air Network partners helping them to accelerate their growth.

VMware is creating cloud software and cloud services for cloud providers. It’s important to note that given that narrower focus, we believe the capital expenses we’ve already invested in vCloud Air will be adequate for our needs and that we expect our vCloud Air service to be accretive by the end of 2017.

There is a massive opportunity here for the vCAN, and together with the news in December about the renewed vCloud Director push (which I assume is still happening), the time is now to work to fully exploit the power of the APIs offered and exposed as part of the VMware Cloud stack. vCAN Service Providers should be a little more relaxed this morning on the news of vCloud Air’s apparent scaling back, and the worry that was front and centre around VMware’s reluctance to drive business to partners, and VMware competing against vCAN partners in deals, should go away.

Again…time will tell!

#LongLivevCD

References:

https://www.sdxcentral.com/articles/news/vmware-cuts-800-rethinks-cloud-vcloud-air/2016/01/

http://seekingalpha.com/article/3836736-vmware-vmw-ceo-pat-gelsinger-q4-2015-results-earnings-call-transcript?page=2

http://www.crn.com/news/cloud/300079456/microsoft-partners-fed-up-vmware-customers-are-switching-to-azure-cloud.htm

https://rcpmag.com/articles/2016/01/26/vmware-layoffs-begin.aspx

Sidenote:

I have to eat a little humble pie and give credit to CRN journalist Kevin McLaughlin, who has been hot on vCloud Air for a while now and has put together a couple of articles on vCloud Air’s struggles…I still don’t agree with the claim that current VMware customers are flocking to AWS and Azure, but if today’s news is correct I certainly acknowledge the reporting 🙂

Veeam 9 Released – What’s in it for Service Providers…and their Customers

Last week Veeam released v9 of their Backup & Replication platform, and I went through and listed the top new general features of the v9 release. In that post I purposely left out the features that relate to Veeam Cloud Service Providers (VCSPs), as the improvements and enhancements around Cloud Connect, and the addition of Cloud Connect Replication, deserve a dedicated post.

At the moment there are upwards of 7,000 VCSPs around the world and, much like the VMware vCloud Air Network, these partners represent a ready-made network of like-for-like platform targets to which customers can extend their onsite Veeam solutions with a VCSP of their choice. (Zettagrid is one of those providers and is Cloud Provider of the Year for the ANZ region.) With Veeam 8, Cloud Connect was announced and released, and it has proven to be a popular service that started picking up significantly in the second half of 2015. Given the success of Cloud Connect, which provided a great offsite repository location for clients, it was no surprise that Veeam extended this functionality in v9 with Cloud Connect Replication.

Veeam Cloud Connect Replication:

The extended functionality gives service providers the ability to offer clients RaaS (recovery-as-a-service) in the form of Veeam Cloud Connect Replication for Service Providers. This builds on Cloud Connect, which made it easy for existing and new Veeam customers to extend their backup infrastructure to cloud-based repositories for offsite backups. Cloud Connect Replication features include:

  • A reserved set of compute and storage for DR with networking resource allocation from a service provider to dramatically simplify setting up replication jobs to the cloud
  • Full site failover to a remote DR site from anywhere with just a few clicks through the secure web portal, and partial site failover to instantly switch over to selected VM replicas only
  • Built-in network extension appliances to simplify networking complexity and preserve communication with, and between, running VMs regardless of physical location
  • Failback to an existing or new infrastructure to restore normal business operations
  • 1-click failover orchestration for quick failover execution, and site failover testing for failover simulation without disrupting production workloads
  • Support for file level recovery from cloud replicas in case there are issues with local backups
  • Multiple traffic reduction technologies, including built-in WAN acceleration and BitLooker
  • Single port connectivity via a secure SSL/TLS connection to a service provider with traffic encryption

Cloud Connect Improvements:

  • Built-in WAN acceleration, previously included only in the Enterprise Plus edition, is now also included in the Enterprise edition for backup copy and replication jobs to Veeam Cloud Connect service providers
  • Users can now limit the maximum bandwidth consumed by each tenant on the service provider site, helping protect all tenants using the same Cloud Gateway from a “noisy neighbor” problem.
  • Switching the logging level for cloud service no longer requires the service to be restarted.
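Veeam doesn’t describe how the per-tenant bandwidth cap is enforced on the Cloud Gateway. As a rough illustration of the noisy-neighbor idea only, here is a minimal per-tenant token bucket sketch; the `TokenBucket` and `TenantThrottle` names, the bytes-per-second limits and the explicit `now` timestamps are all hypothetical and not part of any Veeam API.

```python
class TokenBucket:
    """Caps a sender at `rate` bytes per second, allowing bursts of up to `burst` bytes."""

    def __init__(self, rate: float, burst: float, now: float) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start full so a tenant can send immediately
        self.last = now

    def try_consume(self, nbytes: float, now: float) -> bool:
        # Refill tokens for the time elapsed since the last call, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False  # over the limit: the caller should delay this chunk


class TenantThrottle:
    """One bucket per tenant, so a busy tenant cannot use up another tenant's allowance."""

    def __init__(self, limits: dict, now: float) -> None:
        # Hypothetical choice: burst equals one second's worth of traffic at the tenant's rate.
        self.buckets = {tenant: TokenBucket(rate, rate, now) for tenant, rate in limits.items()}

    def allow(self, tenant: str, nbytes: float, now: float) -> bool:
        return self.buckets[tenant].try_consume(nbytes, now)
```

A gateway would call `allow()` for each chunk and hold back chunks that return False; because each tenant has its own bucket, one tenant hitting its cap has no effect on anyone else’s throughput.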

In addition, Veeam have introduced full support for vCloud Director 8.0 and have committed to long-term support of vCloud Director in light of the recent VMware announcements. There is a new Per VM licensing model to support VCSP reporting structures and make reporting and billing of licensing easier, and there is now full support for the RESTful API for Service Providers in all product editions with that Per VM license.

While it’s easy to see how awesome Cloud Connect Replication will be for VCSPs to productize and offer true replication-based RaaS, there are some features available in the general v9 Backup & Replication engine that are not yet available for Cloud Connect:

  • Scale Out Repositories
  • Per VM Backup File Chain Feature
  • vCloud Director Support

Those additional features are on the horizon and, in my opinion, can’t come soon enough…they will elevate Cloud Connect Replication even further. Overall, though, this is another great update for VCSPs, and I look forward to developing an offering around Cloud Connect Replication as soon as possible to go along with our existing Veeam Cloud Connect offering.

References:

http://veeampdf.s3.amazonaws.com/new/veeam_backup_9_0_whats_new_en.pdf

https://www.veeam.com/blog/veeam-availability-suite-v9-is-here.html

Dealing with a Revoked vCenter SSL Certificate

Certificates and VMware don’t go together like a horse and carriage… While I’ve never really had a major issue with SSL certs in VMware, mainly because on a personal level I am OK with using self-signed or default certificates (cue security nuts), I was recently forced to change a publicly signed vCenter SSL certificate that also doubled as the Web Client SSL certificate. This was due to VeriSign revoking the certificate, which had been purchased on a per-year renewal plan…and the vCenter Client doesn’t like revoked certs.

With vSphere 5.5, my usual trick from earlier versions of simply replacing the rui.crt and rui.key files in the vCenter/Web Client SSL folder and restarting vCenter didn’t work…in fact the vCenter Service (5.5 Update 2) won’t start if it’s done that way anymore. This is mainly due to the reliance on the SSO and Inventory Services, which don’t like the SSL thumbprint being changed underneath them.
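To see why a straight file swap upsets those services, it helps to remember that a certificate thumbprint is just a hash of the DER-encoded certificate, so any new certificate, even one for the same hostname, produces a different value. The sketch below computes a colon-separated SHA-1 thumbprint using only the Python standard library; the function name is hypothetical, and it assumes the services compare SHA-1 thumbprints in this format.

```python
import hashlib
import ssl


def sha1_thumbprint(pem_cert: str) -> str:
    """Return the colon-separated, upper-case SHA-1 thumbprint of a PEM certificate.

    The hash covers the whole DER-encoded certificate, so replacing the
    certificate always changes the thumbprint.
    """
    der = ssl.PEM_cert_to_DER_cert(pem_cert)  # strips the PEM armour and base64-decodes
    digest = hashlib.sha1(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

Running `sha1_thumbprint(open("rui.crt").read())` against the old and new rui.crt files returns two different values, which is the kind of change the dependent services trip over when the files are swapped underneath them.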

To resolve this I had to read through and learn how to use the VMware SSL Certificate Automation Tool. Once mastered, it’s a great tool that lets you change or update all the relevant vSphere SSL certificates. Working through it from the command line is quick and easy…note that you need to build up the SSL certificate chain correctly and make one small modification to the ssl-environment.bat file:

set ssl_tool_no_cert_san_check=1

A couple of vCenter and Web Client service restarts later, the SSL certificate had been replaced. While the tool has a lot more options, I only needed two steps to replace the original publicly signed certificate, as all the other certificates were the internally generated certs…As a specific heads up from the KB, these were the issues I ran into:

  • SSL Certificate Update fails if the vCenter Single Sign-On password contains spaces or special characters such as &, ^, %, <. If the vCenter Single Sign-On password has a space or any special characters, such as &, ^, %, or <, the configuration of the Inventory service fails. To work around this issue, change the vCenter Single Sign-On password so it does not contain a space or any of the special characters &, ^, %, < in it.

  • If the certificate chain file for vCenter Single Sign-On is out of order, you see an error similar to: “Certificate chain is incomplete: the root authority certificate is not present and could not be detected automatically. The presence of the root certificate is required so the other service can establish trust to this service. Try adding the authority certificate manually.” To resolve this issue, ensure that the certificate chain file for vCenter Single Sign-On is created in the correct order. For more information, see Generating certificates for use with the VMware SSL Certificate Automation Tool (2044696).
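Both gotchas can be caught before running the tool. The following is a minimal pre-flight sketch, not part of the SSL Certificate Automation Tool: `sso_password_is_safe` checks for the characters the KB lists, and `order_chain` orders (subject, issuer) pairs from the server certificate up to a self-signed root, raising an error much like the one above when the root is missing. Representing certificates as plain name pairs is a simplification (a real check would parse the PEM files), and the leaf-to-root order is an assumption; confirm the exact order the tool expects in KB 2044696.

```python
# Characters the KB lists as breaking Inventory Service configuration.
UNSUPPORTED_SSO_CHARS = set(" &^%<")


def sso_password_is_safe(password: str) -> bool:
    """True if the password contains no space and none of &, ^, % or <."""
    return not (set(password) & UNSUPPORTED_SSO_CHARS)


def order_chain(certs: list) -> list:
    """Order (subject, issuer) pairs leaf first, ending at a self-signed root.

    Raises ValueError if the chain is incomplete, ambiguous or contains a loop.
    """
    by_subject = {subject: (subject, issuer) for subject, issuer in certs}
    # Any subject that issued another certificate cannot be the leaf.
    issuers = {issuer for subject, issuer in certs if issuer != subject}
    leaves = [cert for cert in certs if cert[0] not in issuers]
    if len(leaves) != 1:
        raise ValueError("expected exactly one leaf (server) certificate")
    ordered = [leaves[0]]
    while ordered[-1][0] != ordered[-1][1]:  # stop once we reach a self-signed root
        issuer = ordered[-1][1]
        if issuer not in by_subject:
            raise ValueError(f"chain is incomplete: no certificate for issuer {issuer!r}")
        ordered.append(by_subject[issuer])
        if len(ordered) > len(certs):
            raise ValueError("chain contains a loop")
    if len(ordered) != len(certs):
        raise ValueError("chain contains certificates that are not part of it")
    return ordered
```

Feeding `order_chain` a shuffled list returns the server certificate first and the root last; if it raises, fix the chain file before pointing the tool at it.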

References:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2057340

https://my.vmware.com/group/vmware/details?productId=351&downloadGroup=SSLTOOL550


Veeam 9 Released: Top New Features

This week Veeam released version 9 of their Backup & Replication product to GA. It’s a significant release for Veeam for a number of reasons, and having attended their VeeamOn event in Las Vegas late last year, I believe this v9 release is their best to date. It is representative of a company that has listened to the specific pain points of their customers, looked to address the challenges of modern virtual machine backups and looked to shore themselves up against existing and up-and-coming challengers in their market space.

During VeeamOn (and throughout 2015) Veeam announced the four or five key new killer features of v9. For a look back at what was talked about last year, have a look at my previous posts. Below I’ve listed my top new features across the whole product set that I think should make existing Veeam customers upgrade at their first opportunity.

Standalone console. The standalone console provides convenience, flexibility and ease of use by separating the Veeam B&R console from the backup server so it can be installed on laptops and desktops, eliminating RDP sessions to a backup server. You can run multiple consoles at once on the same system.

Veeam Cloud Connect Replication. Ensure availability of your mission-critical applications without the cost and complexity of building and maintaining a disaster recovery site. Cloud Connect Replication provides fully integrated, fast and secure cloud-based DR through a service provider.

Per VM backup file chains. The Per-VM backup file chains provide a new backup repository option that makes any backup job writing to that repository store each VM’s restore points in a dedicated backup file. This can deliver up to 10x faster backup performance by using multiple write streams and parallel VM processing with backup storage that has a limited ingest rate per stream, as is the case with most deduplicating storage appliances.
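The “up to 10x” figure follows from simple arithmetic: if an appliance caps each write stream but has headroom in aggregate, writing ten VMs’ files in parallel multiplies throughput until the appliance’s own ceiling is reached. A back-of-the-envelope sketch, using made-up numbers (100 MB/s per stream, 2,000 MB/s appliance ceiling) rather than measured figures:

```python
def effective_ingest_mb_s(streams: int, per_stream_mb_s: float, appliance_max_mb_s: float) -> float:
    """Aggregate write rate when each stream is capped but the appliance has spare capacity."""
    return min(streams * per_stream_mb_s, appliance_max_mb_s)


# Hypothetical appliance: 100 MB/s per stream, 2,000 MB/s total.
single_file = effective_ingest_mb_s(1, 100, 2000)    # one backup file for the whole job
per_vm_files = effective_ingest_mb_s(10, 100, 2000)  # ten VMs, ten files, ten streams
speedup = per_vm_files / single_file
```

In this example the speedup is 10x; beyond twenty streams the appliance ceiling becomes the bottleneck, which is why the gain is quoted as “up to”.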

Increased job concurrency. v9 provides improved backup server stability when running over 100 jobs concurrently. It’s important to keep per-job memory requirements in mind when opting to start this many jobs at once.

vPower cache. vPower will now cache recently accessed backup file blocks in RAM, which will help speed up all functionality that relies on Instant VM Recovery.
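Veeam doesn’t document the eviction policy it uses, but a “recently accessed blocks in RAM” cache is typically built as a least-recently-used (LRU) cache. A minimal sketch of that general technique, assuming a hypothetical `read_block(offset)` callable standing in for the slow read from the backup file:

```python
from collections import OrderedDict


class BlockCache:
    """Least-recently-used cache of backup file blocks held in RAM."""

    def __init__(self, capacity_blocks: int, read_block):
        self.capacity = capacity_blocks
        self.read_block = read_block  # hypothetical slow path: callable(offset) -> bytes
        self.blocks = OrderedDict()   # offset -> block data, oldest access first
        self.hits = 0
        self.misses = 0

    def get(self, offset: int) -> bytes:
        if offset in self.blocks:
            self.blocks.move_to_end(offset)  # mark as most recently used
            self.hits += 1
            return self.blocks[offset]
        self.misses += 1
        data = self.read_block(offset)
        self.blocks[offset] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data
```

Repeated reads of hot blocks (boot files, for example, during Instant VM Recovery) are served from RAM, while a one-off scan only evicts the oldest entries.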

Full backup file defragmentation and compaction. Decrease the size and fragmentation of full backup files produced by forever incremental primary backup jobs by periodically recreating the backup files based on the actual data, while moving obsolete data into dedicated files that can be manually deleted or archived as necessary. This functionality lets you remove the data associated with deleted VMs, virtual disks or applications from the full backup file without having to perform an active full backup.
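Conceptually, compaction rewrites a full backup from the data that is still live and sets the rest aside. Here is a simplified model of that idea; the `Block` type and treating a backup file as a list of owned blocks are hypothetical simplifications, not Veeam’s file format.

```python
from typing import NamedTuple


class Block(NamedTuple):
    block_id: int
    owner: str  # VM, virtual disk or application the block belongs to
    size: int   # bytes


def compact(blocks: list, live_owners: set) -> tuple:
    """Rewrite a full backup from its live data only.

    Returns (new_full, obsolete): new_full keeps blocks whose owner still exists,
    grouped by owner so they are written contiguously (defragmentation); obsolete
    holds data from deleted VMs or disks, destined for a separate file that can
    be deleted or archived.
    """
    live = [b for b in blocks if b.owner in live_owners]
    obsolete = [b for b in blocks if b.owner not in live_owners]
    live.sort(key=lambda b: (b.owner, b.block_id))
    return live, obsolete
```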

Backup copy parallel processing. Backup copy jobs will now process multiple VMs in parallel, just like primary backup jobs. This improves backup copy and retention processing performance by removing the “dead time” between each VM.

VM tags backup and restore. VM tags are now backed up along with all of the other VM properties, and the full VM restore wizard provides a new option to restore them.

Improved Direct SAN restore performance. The Direct SAN restore process will now create eager zeroed disks (as opposed to lazy zeroed); this was found to improve full VM restore performance in most cases.

Multi-user support. Backup administrators will now be warned of conflicting edits when attempting to save changes after editing the same job concurrently.

Missing backup files pruning. The backup properties dialog now shows missing backup files and, with only a few clicks, lets users remove these files as well as any backup files that are dependent on them.
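In a forward-incremental chain every increment depends on all the files before it, so one missing file takes every later restore point with it. A minimal sketch of that dependency rule, assuming a simple forward-incremental chain ordered oldest to newest (reverse-incremental chains depend in the opposite direction and are not modelled here); the file names are illustrative:

```python
def files_to_prune(chain: list, present: set) -> list:
    """Return the first missing file and every file that depends on it.

    `chain` is ordered oldest (the full backup) to newest; in a forward-incremental
    chain each increment depends on all earlier files.
    """
    for i, name in enumerate(chain):
        if name not in present:
            return chain[i:]
    return []


chain = ["full.vbk", "inc1.vib", "inc2.vib", "inc3.vib"]
```

With inc2.vib missing, both inc2.vib and inc3.vib are flagged, since inc3.vib can no longer be restored without it.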

Infrastructure cache. To remove the wait time for virtual infrastructure objects to be loaded, the user interface now uses an infrastructure cache in certain places, such as in the Backup Job wizards and in the Virtual Machines tab. The default cache expiration time of 15 minutes can be changed by creating the InfrastructureCacheExpirationSec (DWORD) registry value under the HKLM\SOFTWARE\Veeam\Veeam Backup and Replication key.
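The cache behaves like a time-to-live (TTL) cache: entries are served from memory until they reach the 15-minute expiry, then reloaded. The registry value name suggests the expiry is specified in seconds, so 900 would correspond to the default. A minimal sketch of the TTL idea, where `TtlCache`, `get_or_load` and the injectable clock are hypothetical:

```python
import time


class TtlCache:
    """Serves cached entries until they are ttl_seconds old, then reloads them."""

    def __init__(self, ttl_seconds: float = 15 * 60, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable so expiry can be tested without waiting
        self._entries = {}   # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the slow inventory query
        value = loader(key)
        self._entries[key] = (now + self.ttl, value)
        return value
```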

That’s a pretty significant list of enhancements as I see it, and it comes in addition to the Scale Out Backup Repository, BitLooker, EMC SAN support and other major improvements announced over the past few months. I haven’t gone into detail on the enhancements for Veeam Cloud Service Providers in this post, but I will do a separate post over the next few days going over the key enhancements for VCSPs.

If you have Veeam 8 running, do yourself a favor and go through the required change controls to upgrade to v9…your backups will thank you 🙂

https://www.veeam.com/data-center-availability-suite.html 

References:

http://veeampdf.s3.amazonaws.com/new/veeam_backup_9_0_whats_new_en.pdf

https://www.veeam.com/blog/veeam-availability-suite-v9-is-here.html

Heads Up: Heavy VXLAN Traffic Causing Broadcom 10Gb NICs to Drop

For the last couple of weeks we have had some intermittent issues whereby ESXi network adapters have gone into a disconnected state, requiring a host reboot to bring the link back online. Generally it was only one NIC at a time, but in some circumstances both NICs went offline, resulting in host failure and VM HA events being triggered. From the console ESXi appeared to be up, but each NIC was listed as disconnected, and when we checked the switch ports there was no indication of a loss of link.

In the vmkernel logs the following entries are observed:

After some time working with VMware Support, our Ops Engineer @santinidaniel came across this VMware KB, which described the situation we were seeing. Interestingly enough, we only saw this happening after recent host updates to ESXi 5.5 Update 3 builds, but as the issue is listed as being present in ESXi 5, 5.5 and 6.0 that might just be a side note.

The cause as listed in the KB is:

This issue occurs when the guest virtual machine sends invalid metadata for TSO packets. The packet length is less than Maximum Segment Size (MSS), but the TSO bit is set. This causes the adapter and driver to go into a non-operational state.

Note: This issue occurs only with VXLAN configured and when there is heavy VXLAN traffic.
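The invalid combination the KB describes is easy to express: the TSO bit asks the NIC to split an oversized payload into MSS-sized segments, so setting it on a packet that is already shorter than the MSS makes no sense. A hypothetical model of that check (the `TxPacket` fields are illustrative, not the bnx2x driver’s real descriptor layout):

```python
import math
from dataclasses import dataclass


@dataclass
class TxPacket:
    payload_len: int  # bytes of TCP payload handed to the NIC
    mss: int          # maximum segment size on the wire
    tso: bool         # TSO bit: ask the NIC to segment the payload itself


def has_invalid_tso_metadata(pkt: TxPacket) -> bool:
    """The condition from the KB: TSO requested for a packet shorter than the MSS."""
    return pkt.tso and pkt.payload_len < pkt.mss


def segments_needed(pkt: TxPacket) -> int:
    """How many wire segments a valid TSO packet would be split into."""
    return math.ceil(pkt.payload_len / pkt.mss)
```

A well-formed 64,000-byte TSO packet with a 1,460-byte MSS splits into 44 segments; a 1,000-byte packet with the TSO bit set is the malformed case that, per the KB, pushes the adapter into a non-operational state.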

It just so happened that we did indeed have a large customer with heavily used Citrix Terminal Servers on our NSX Advanced Networking…and they were sitting on a VXLAN Virtualwire. The symptoms got worse today, which coincided with the first official day of work for the new year.

There is a simple workaround:

That command has been described in blog posts relating to the Broadcom drivers (which now present as QLogic drivers), and where previously there was no resolution, there is now a fix available by upgrading to the latest drivers (linked in the references below). Without upgrading to the latest certified drivers, the quickest way to avoid the issue is to apply the workaround and reboot the host.

There has been a recent outcry bemoaning the lack of QA in some of VMware’s latest releases, but the reality is that the more bits you add, the more likely issues are to pop up…This is increasingly the case with ESXi as the base virtualization platform continues to add to its feature set, which now includes VSAN baked in. Host extensions further add to the chance of things going wrong due to situations that are hard to test for as part of the QA process.

Deal, fix…and move on!

References:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2114957

https://my.vmware.com/web/vmware/details?downloadGroup=DT-ESXI55-QLOGIC-BNX2X-271250V556&productId=353


Quick Fix: vCenter 5.5 Update 3x Phone Home Warning and VPXD Service not Starting

This week I’ve been upgrading vCenter in a couple of our labs and came across this issue during and after the upgrade of vCenter from 5.5 Update 2 to 5.5 Update 3a or 3b. During the upgrade, a Warning 32014 relating to the Phone Home Service is thrown.

It’s an easy one to ignore as it only relates to the Phone Home Service…which, to be honest, I didn’t think was important at the time. When you click OK the installer finishes successfully; however, the vCenter Service is not brought up automatically, and when you go to start the service the Services manager returns Error 1053.

I’m not sure why this particular error was so hard to search for, but if you search for Error 1053 or Error 1053 + VMware you get referred to some generic forum issues and this VMware KB, which is a red herring in relation to this error. With that, I went back and searched for the Phone Home Warning 32014 and got a hit on this VMware KB, which contains the exact error and the reference to deployPkg.dll that you will see in the Windows Application Event Logs when you try to start the vCenter Service.

The KB title is a little misleading in that it states:

Updating vCenter Server 5.5 to Update 3 fails with the error: Warning 32014

However, the fix is the right one, and after working through the workaround in the KB the upgrades went through without issue and vCenter was at 5.5 Update 3b.

References:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2134141

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2069296

Quick Fix: Python Error When Removing PernixData FVP Host Extensions

I came across an issue this morning trying to remove old 2.0 PernixData FVP Host Extensions from an ESXi 5.5 Update 3a host. When running the uninstall script I was getting a Python error.

There is an old, known issue between the version of Python that gets installed with the latest updates of ESXi 5.x and older versions of the FVP Host Extensions.

FVP compatibility with ESXi 5.5, 5.1, and 5.0, and Python 2.7
Date announced: March 31, 2015
Upgrade issues exist with various ESXi 5.5, 5.1, and 5.0 releases that upgrade to Python 2.7, which is not compatible with FVP 2.5.0.1 or lower. Upgrading to any ESXi 5.5, 5.1, or 5.0 patch that upgrades the Python version to 2.7 (or later) requires an FVP upgrade to 2.5.0.2 or later. For additional information, please reference KB 1230.
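The rule in that notice boils down to a version comparison. A minimal sketch, assuming versions are dotted numbers with an optional build suffix after a hyphen (as in 2.5.0.4-37360), which is ignored; the function names are hypothetical:

```python
def parse_version(v: str) -> tuple:
    """Turn '2.5.0.0-1' into (2, 5, 0, 0); the build suffix after '-' is ignored."""
    return tuple(int(part) for part in v.split("-")[0].split("."))


def needs_fvp_upgrade(fvp_version: str, esxi_python: str) -> bool:
    """Per the notice: FVP 2.5.0.1 or lower is not compatible with Python 2.7 or later."""
    return parse_version(esxi_python) >= (2, 7) and parse_version(fvp_version) <= (2, 5, 0, 1)
```

Python compares tuples element by element, so 1.5, 2.0.1.0-5 and 2.5.0.0-1 all fall at or below 2.5.0.1 and are flagged, while 2.5.0.4-37360 is not.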

That internal PernixData KB isn’t working at the moment, but I worked out a series of steps to get to a point where the FVP Extensions can be removed. If you have one of these FVP versions installed:

  • FVP 1.5
  • FVP 2.0.0.0-3
  • FVP 2.0.1.0-5
  • FVP 2.5.0.0-1

You can apply the same steps to resolve the issue.

  1. Check the Status of the FVP Service
  2. Install version 2.5.0.4-37360 or later over the top of the currently installed version
  3. Reboot the Host
  4. Run the Uninstall Script

The shell dump of the steps above is below.