Category Archives: Backup

AWS re:Invent – Expectations from a VM Hugger…

Today is the first official day of AWS re:Invent 2017 and things are kicking off with the global partner summit. Today is also my first day at AWS re:Invent and I am looking forward to experiencing a different type of big IT conference, with all my previous experiences being at VMworld or the old Microsoft Tech Eds. Just by looking at the agenda, schedule and content catalog I can already tell re:Invent is a very different type of IT conference.

As you may or may not know, I started this blog as Hosting is Life! and the first half of my career was spent around hosting applications and web services. In that role I gravitated towards AWS solutions to help complement the hosting platforms I looked after. I was actively using a few AWS services in 2011 and 2012 and attended a couple of AWS courses. After joining Zettagrid my use of AWS decreased and it wasn’t until Veeam announced supportability for AWS storage as part of our v10 announcements that I decided to get back into the swing of things.

Subsequently we announced Veeam Availability for AWS, which leverages EBS snapshots to perform agentless backups of AWS instances, and more recently we were announced as a launch partner for VMware Cloud on AWS data availability solutions. For me, the fact that VMware have jumped into bed with AWS has obviously raised AWS’s profile in the VMware community and it’s certainly being seen as the cool thing to know (or claim to know) within the ecosystem.

Veeam isn’t the only backup vendor looking to leverage what AWS has to offer by way of extending availability into the hyper-scale cloud and every leading vendor is rushing to claim features that offload backups to AWS cloud storage as well as offering services to protect native AWS workloads…as with IT Pros this is also the in thing!

Apart from backup and availability, my sessions are focused on storage, compute and scalability, as well as some sessions on home automation with Alexa and the like. This year’s re:Invent is 100% a learning experience and I am looking forward to attending a lot of sessions and taking a lot of notes. I might even come out taking the whole serverless thing a little more seriously!

Moving away from the tech, the AWS world is one that I am currently removed from. Unlike the VMware ecosystem and VMworld, I wouldn’t know 95% of the people delivering sessions and I certainly don’t know much about the AWS community. While I can’t fix that by just being here this week, I can certainly use this week as a launching pad to get myself more entrenched with the technology, the ecosystem and the community.

Looking forward to the week and please reach out if you are around.

VCSP Important Notice: 9.5 Update 3 RTM Is Out…With Insider Protection and more!

Earlier this week, Veeam made available to our VCSP partners the RTM of Update 3 for Backup & Replication 9.5 (Build 9.5.0.1335). Update 3 is what we term a breaking update, meaning that if a Cloud Connect tenant upgrades from any previous 9.5 version before their VCSP does, backup or replication functionality will break. With that in mind, the RTM has been made available to our VCSP partners so that it can be installed, tested and pushed out to production before the GA release. Veeam Backup & Replication releases from 8.0 (build 8.0.0.2084) can write backups to a cloud repository on 9.5 Update 3, and any release from 9.0 (build 9.0.0.902) can write replicas to a cloud host on 9.5 Update 3.
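To make the compatibility rules above concrete, here is a minimal Python sketch. It is not a Veeam tool or API; the function names are hypothetical, and the only facts taken from the post are the build numbers and the rule that a tenant on Update 3 breaks against a provider that has not yet upgraded. Reusing the same minimum builds when the provider is on an older release is my own assumption.

```python
from typing import Dict, Tuple

# Build numbers quoted in the post.
UPDATE3_BUILD = (9, 5, 0, 1335)          # 9.5 Update 3 RTM
MIN_BUILD_FOR_BACKUP = (8, 0, 0, 2084)   # oldest tenant build that can write backups
MIN_BUILD_FOR_REPLICA = (9, 0, 0, 902)   # oldest tenant build that can write replicas


def parse_build(build: str) -> Tuple[int, ...]:
    """Turn '9.5.0.1335' into (9, 5, 0, 1335) so builds compare numerically."""
    return tuple(int(part) for part in build.split("."))


def tenant_capabilities(tenant_build: str,
                        provider_build: str = "9.5.0.1335") -> Dict[str, bool]:
    """Hypothetical helper: which Cloud Connect job types should work."""
    tenant = parse_build(tenant_build)
    provider = parse_build(provider_build)

    # The "breaking update" case: tenant upgraded to Update 3 before the provider.
    if tenant >= UPDATE3_BUILD and provider < UPDATE3_BUILD:
        return {"backup": False, "replica": False}

    # Assumption: the quoted minimums are applied regardless of provider build.
    return {
        "backup": tenant >= MIN_BUILD_FOR_BACKUP,
        "replica": tenant >= MIN_BUILD_FOR_REPLICA,
    }
```

For example, `tenant_capabilities("8.0.0.2084")` reports that backups are possible but replicas are not, which matches the two minimum builds listed above.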

Update 3 is a very significant update and contains a number of enhancements and known issue fixes, with a lot of those enhancements aimed at improving the scalability of the Backup & Replication platform that VCSPs can take advantage of. One important note is around new licensing for Cloud Connect Backup that all VCSPs should be aware of. There is a detailed post in the VCSP Forums and there will be emails sent to explain the changes.

We have also pushed out a number of new features for our VCSPs, with two of them highlighted below. One is the new Insider Protection feature, or Recycle Bin, for Cloud Connect Backups and the other is a long-awaited ask from our providers: Maintenance Mode for Cloud Connect.

  • Insider protection: Option to hold backups deleted from a tenant’s cloud repository in a “recycle bin” folder for a designated period of time. For more information, see this post in the VCSP forum.

  • Maintenance Mode: Allows you to temporarily stop tenant backup and backup copy tasks from writing to cloud repositories. Already running tenant tasks are allowed to finish, but new tenant tasks fail with an error message indicating that the service provider infrastructure is undergoing maintenance. This is supported at the tenant end in 9.5 Update 3 GA, Agent for Windows 2.1 and Agent for Linux 2.0.
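The behaviour of the two features can be modelled in a few lines of Python. This is only a toy model to illustrate the semantics described in the bullets above, not how Cloud Connect is implemented; the class, method names and the seven-day retention default are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Set


class MaintenanceError(RuntimeError):
    """Raised when a new tenant task is refused during maintenance."""


@dataclass
class CloudRepository:
    retention: timedelta = timedelta(days=7)  # hypothetical recycle bin period
    maintenance: bool = False
    running: Set[str] = field(default_factory=set)
    backups: Set[str] = field(default_factory=set)
    recycle_bin: Dict[str, datetime] = field(default_factory=dict)  # name -> purge time

    def start_task(self, task_id: str) -> None:
        # New tasks fail while maintenance mode is on.
        if self.maintenance:
            raise MaintenanceError(
                "service provider infrastructure is undergoing maintenance")
        self.running.add(task_id)

    def finish_task(self, task_id: str) -> None:
        # Tasks that were already running are allowed to finish.
        self.running.discard(task_id)

    def delete_backup(self, name: str, now: datetime) -> None:
        # Insider protection: a deleted backup is held, not destroyed.
        self.backups.remove(name)
        self.recycle_bin[name] = now + self.retention

    def purge_expired(self, now: datetime) -> List[str]:
        # Only backups past the designated period are really removed.
        expired = [n for n, purge_at in self.recycle_bin.items() if purge_at <= now]
        for name in expired:
            del self.recycle_bin[name]
        return expired
```

A task started before `maintenance` is set stays in `running` until `finish_task` is called, while any later `start_task` call raises `MaintenanceError`.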

There has also been a lot of work to improve and enhance scalability in the Backup & Replication Cloud Connect functionality, both to accommodate the increasing usage of Veeam Agent for Windows, of which a new version (2.1) is coming in early December, and to prepare for the release of Veeam Agent for Linux (2.0), which will include support for sending backups to Cloud Connect repositories. For the recently released Veeam Availability Console, Update 3 is 100% compatible with the 2.0 GA (Build 2.0.1.1319) released last week, and VAC is good with Update 2 or later.

Conclusion:

Once again, Update 3 for Veeam Backup & Replication is an important update to apply for VCSPs running Cloud Connect services in preparation for the GA release, which will happen in about two weeks. Once released I’ll link to the VeeamKB for a detailed look at the fixes, but for the moment, if you have the ability to download the update, do so and have it applied to your instances. For more info on the RTM, head to the VCSP Forum post here.

Veeam Availability Console – What’s in it for Service Providers

Today, the Veeam Availability Console was made GA, meaning that after a long wait our new multi-tenant service provider management and reporting platform is available for download. VAC is a significant evolution of the Managed Backup Portal that was released in 2016 and acts as a central portal for Veeam Cloud and Service Providers to remotely manage and monitor customer instances of Backup & Replication, including the ability to monitor Cloud Connect Backup and Replication jobs and failover plans. It is also the central mechanism to deploy and manage (Windows) agents, which includes the ability to install agents onto on-premises machines and apply policies to those agents once deployed.

Veeam® Availability Console is a cloud-enabled platform built specifically for Veeam Cloud & Service Provider (VCSP) partners and resellers looking to launch a managed services business. Through its ability to remotely provision, manage and monitor virtual, physical and cloud-based Veeam environments without any special connectivity requirements, Veeam Availability Console enables you to increase revenue and add value to all your customers.

  • Simplified Setup – now allowing on-premises installs
  • Remote backup agent management and monitoring
  • Remote discovery and deployment with enhanced support for Veeam Cloud Connect
  • Web-based multi-tenant portal
  • Native billing and RESTful APIs

Cloud Connect Requirement:

The Cloud Connect Gateway is central to how the Veeam Availability Console operates and all management traffic is tunneled through the Cloud Connect Gateways. If you are a current VCSP offering Cloud Connect services then you already have the infrastructure in place to facilitate VAC, however if you are not a Cloud Connect partner you can apply for a special key that will enable you to deploy a Gateway without the need for specific Cloud Connect backup or Replication licenses.

For a deeper look at VAC architecture for Service Providers, head to Luca Dell’Oca’s VAC series here.

Designed for Service Providers First:

The Veeam Availability Console was designed from the ground up for Service Providers (there is an Enterprise version available) and contains a rich set of APIs that can be consumed for automation and provisioning purposes. There is also a three-tier multi-tenancy design that lets VCSPs create restricted accounts for their partners or resellers, from which, in turn, another level of accounts can be created for their customers or tenants.
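As a rough illustration of that three-tier design, the hierarchy can be pictured as a small tree in Python. This is a sketch of the concept only; the tier names, class and methods are hypothetical and do not come from the VAC API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical tier labels: provider -> reseller/partner -> customer/tenant.
TIERS = ("provider", "reseller", "customer")


@dataclass
class Account:
    name: str
    tier: str
    # repr/compare disabled to avoid recursing through the parent link.
    parent: Optional["Account"] = field(default=None, repr=False, compare=False)
    children: List["Account"] = field(default_factory=list)

    def create_child(self, name: str) -> "Account":
        """Create an account one tier below this one."""
        level = TIERS.index(self.tier)
        if level == len(TIERS) - 1:
            raise ValueError("customer accounts cannot create further accounts")
        child = Account(name, TIERS[level + 1], parent=self)
        self.children.append(child)
        return child

    def visible_accounts(self) -> List[str]:
        """Names of every account below this one (its restricted view)."""
        names: List[str] = []
        for child in self.children:
            names.append(child.name)
            names.extend(child.visible_accounts())
        return names
```

A provider account sees its resellers and their customers, a reseller sees only its own customers, and a customer cannot create anything beneath itself.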

The multi-tenancy aspect means that partners/resellers and customers can control their own backups centrally from the console. Reporting on backup jobs can be viewed and a mechanism to control those jobs is available allowing retry/stop/start tasks against those jobs. If that’s not enough control or more troubleshooting on failed jobs needs to be done the Remote Console feature introduced in Veeam Backup & Replication Update 2 has been integrated into the console.

VAC also includes built in reporting and billing functionality which enables VCSPs who don’t have the capability for automated reporting and billing to offer that to their customers. The reporting can be accessed via the API meaning that if an existing billing engine is being used there is the possibility to have that interface with VAC to pull out key data points.

The Service Provider Opportunity:

Over the past year I’ve talked a lot about the opportunity that exists for Veeam’s Cloud and Service Providers to use Veeam’s Agents to capture backups for workloads that were previously out of reach. VAC is central to this and opens up the ability to back up instances that live on-premises (physical or virtual) or in any public cloud, hyper-scaler or otherwise.

If you are a reseller looking to cash in on the growing data availability market then you should be looking at how VAC can help you get started by leveraging the features mentioned above. Secondly, if you are a reseller and not running Cloud Connect Backup or Replication then the time is right to start looking at getting Cloud Connect deployed and start generating revenue around backup and replication services.

For those existing VCSPs that are offering Cloud Connect services, adding VAC into the mix will allow you to take advantage of the agent opportunity that exists as shown above while also adding value to your existing Managed Backup and Cloud Connect services.

References and Product Guides:

https://www.veeam.com/vac_2_0_release_notes_rn.pdf

https://helpcenter.veeam.com/docs/vac/deployment/about.html?ver=20

https://www.veeam.com/availability-console-service-providers-faq.html

https://www.veeam.com/vac_2_0_whats_new_wn.pdf

Veeam Vault #9: Backup for Office 365 1.5 GA, Azure Stack and Vanguard Roundup

Welcome to another Veeam Vault! This is the ninth edition and, given the last edition was focused around VMware and VMworld, I thought that just for a change the focus for this edition would be Microsoft. The reason is that over the past couple of weeks we have had some significant announcements around Azure Stack and the GA release of Backup for Office 365 1.5. I’ll cover both of those announcements, share some Veeam employee automation work that shows off the power of our new APIs and see what the Veeam Vanguards have been blogging about in the last month or so.

Backup for Office 365 1.5 GA:

The early part of my career was dedicated to Exchange Server, however I drifted away from that as I made the switch to server virtualization and cloud computing. The old Exchange admin in me is still there, however, and it’s for that reason that I’m excited about the GA of our Backup for Office 365 product, which is now at version 1.5. This release caters specifically for service providers, adding scalability and automation enhancements as well as extended support for on-premises and hybrid Exchange setups.

New features and enhancements:

  • Distributed, scalable architecture: Enhanced scalability in distributed environments with several Remote Offices/Branch Offices and in service provider infrastructures.
  • Backup proxies: Take the workload off the management server, providing flexible throttling policy settings for performance optimization.
  • Support for multiple repositories: Streamlines data backup and restore processes.
  • Support for backup and restore of on-premises and hybrid Exchange organizations: Allows a variety of configurations and usage scenarios, so you can implement those that meet your particular needs.
  • Increased performance: Restore operations are up to 5 times faster than in v1.0.
  • Restore of multiple datastore mailboxes using Veeam Explorer for Microsoft Exchange: Simplifies workflow and minimizes workload for restore operators, as well as 1-Click restore of a mailbox to the original location.
  • RESTful API and PowerShell cmdlets: Helpful for automation of routine tasks and integration into existing or new portals.
  • UI Enhancements: Including main window, wizards, dialogs, and other elements, facilitating administration of the solution.

Examples of the Power of the Veeam APIs:

One of the features of Backup for Office 365 was the addition of a powerful set of RESTful APIs and PowerShell cmdlets aimed at service providers automating the setup and management of their offerings around the product. A couple of our employees have written example interfaces for the Backup for Office 365 product, which shows that any service provider with some in-house programming skills can build customer portals that enhance their offerings and increase efficiency through automation.

Special welcome to Niels who this week joined our team. Great to have you on board!

Microsoft Azure Stack Support:

Last week at Microsoft Ignite, we announced our supportability for Azure Stack. This is based around our Windows Agent, Cloud Connect and Availability Console products, which combine to offer an availability solution.

Key benefits of Veeam’s support for the Azure Stack include:

  • Multi-tenancy: Veeam Cloud Connect isolates backup copies for each tenant, ensuring security and compliance;
  • Multiple recovery options: Veeam Backup & Replication supports both granular item level recovery through Veeam Explorers for Microsoft Exchange, SQL Server, Microsoft SharePoint, Microsoft Active Directory and for Oracle, as well as full file level restores for tenant files that were deleted or corrupted;
  • Reporting & Billing: Veeam Availability Console supports real-time monitoring and chargeback on tenant usage, allowing either Hosting providers or Enterprise organizations to easily manage and bill their tenants for Availability usage.

Veeam Vanguard Blog Post Roundup:

References:

https://helpcenter.veeam.com/docs/vbo365/guide/vbo_what’s_new_in_v1_5.html?ver=15

The One Problem with the VCSA

Over the past couple of months I noticed a trend in my daily top posts report…the Quick Fix post on fixing a 503 Service Unavailable error was constantly in the top 5 and getting significant views. The 503 error in various forms has been around since the early days of the VCSA and usually manifests itself with the following.

503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http20NamedPipeServiceSpecE:0x0000559b1531ef80] _serverNamespace = / action = Allow _pipeName =/var/run/vmware/vpxd-webserver-pipe)

Looking at the traffic stats for that post it’s clear to see an upward trend in the page views since about the end of June.

This to me is both a good and bad thing. It tells me that more people are deploying or migrating to the VCSA which is what VMware want…but it also tells me that more people are running into this 503 error and looking for ways to fix it online.

The Very Good:

The vCenter Server Appliance is a brilliant initiative from VMware and there has been a huge effort in developing the platform over the past three to four years to get it to a point where it not only became equal to vCenters deployed on Windows (and relying on MSSQL) but surpassed them in a lot of features, especially in the vSphere 6.5 release. Most VMware shops are planning to migrate or have migrated from Windows to the VCSA, and for VMware labs it’s a no-brainer for both corporate and homelab instances.

Personally I’ve been running VCSAs in my various labs since the 5.5 release, have deployed key management clusters with the VCSA and more recently have proven that even the most mature Windows vCenter can be upgraded with the excellent migration tool. Being free of Windows and, more importantly, MSSQL is a huge factor in why the VCSA is an important consideration, and the fact you get extra goodies like HA and API UIs adds to its value.

The One Bad:

Everyone who has dealt with storage issues knows that they can lead to Guest OS file system errors. I’ve been involved with shared hosting storage platforms all my career so I know how sensitive filesystems can be to storage latency or loss of connectivity. Reading through the many forums and blog posts around the 503 error, there seems to be a common denominator of something going wrong with the underlying storage before a reboot triggers the 503 error. Clicking here will show the Google results for VCSA + 503 where you can read the various posts mentioned above.

As you may or may not know, the 6.5 VCSA has twelve VMDKs, up from two in the initial release and eleven in the 6.0 release. There are a couple of great posts from William Lam and Mohammed Raffic that go through what each disk partition does. The big advantage in having these separate partitions is that you can manage storage space a lot more granularly.

The problem, as mentioned, is that the underlying Linux file system is susceptible to storage issues. No matter what storage platform you are running, you are guaranteed to have issues at one point or another. In my experience Linux filesystems don’t deal well with those issues. Windows file systems seem to tolerate storage issues much better than their Linux counterparts and, without starting a religious war, I do know about the various tweaks that can be done to help make Linux filesystems more resilient to underlying storage issues.

With that in mind, the VCSA is very much susceptible to those same storage issues and I believe a lot of people are running into problems mainly triggered by storage related events. Most of the symptoms of the 503 relate back to key vCenter services unable to start after reboot. This usually requires some intervention to fix or a recovery of the VCSA from backup, but hopefully all that’s needed is to run an e2fsck against the filesystem(s) impacted.

The Solution:

VMware are putting a lot of faith into the VCSA and have done a tremendous job to develop it up to this point. It is the only option moving forward for VMware based platforms however there needs to be a little more work done into the resiliency of the services to protect against external issues that can impact the guest OS. PhotonOS is now the OS of choice from 6.5 onwards but that will not stop the legacy of susceptibility that comes with Linux based filesystems leading to issues such as the 503 error. If VMware can protect key services in the event of storage issues that will go a long way to improving that resiliency.

I believe it will get better and just this week VMware announced a monthly security patch program for the VCSA, which shows that they are serious (not to say they were not before) about ensuring the appliance is protected, but I’m sure many would agree that it needs to offer reliability as well…this is the one area where the Windows based vCenter still has an advantage.

With all that said, make sure you are doing everything possible to have the VCSA housed on storage that is as reliable as possible, and make sure that you are not only backing up the VCSA and external dependencies correctly but also understand how to restore the appliance, including the inbuilt mechanisms for backing up the config and the Postgres database.

I love and would certainly recommend the VCSA…I just want to love it a little more without having to deal with the possibility of the 503 server error lurking around every storage event.

References:

http://www.vmwarearena.com/understanding-vcsa-6-5-vmdk-partitions-mount-points/

http://www.virtuallyghetto.com/2016/11/updates-to-vmdk-partitions-disk-resizing-in-vcsa-6-5.html

https://www.veeam.com/wp-vmware-vcenter-server-appliance-backup-restore.html

https://kb.vmware.com/kb/2091961

https://kb.vmware.com/kb/2147154

vSphere 6.5 Update 1 – What’s in it for Service Providers

Late last week VMware released vSphere 6.5 Update 1 which included updated builds of both vCenter and ESXi and as per usual I will go through some of the key features and fixes that are included in the latest versions of vCenter and ESXi. When looking through the release notes I generally keep an eye out for improvements that relate back to Service Providers who use vSphere as the foundation of their Managed or Infrastructure as a Service offerings. This update also contains an update to vSAN which is now at 6.6.1 so I’ll spend some time looking at what’s been added there.

 

New Features and Enhancements:

Without question this is a significant patch release for vCenter and ESXi and the length of the release notes is testament to that point. In terms of new features there isn’t anything groundbreaking but there are a few nice additions, like being able to run the VCSA GUI and CLI installers on Windows 2012, 2012 R2 and 2016 as well as macOS Sierra, and Ubuntu 17.04 is now supported for Guest OS Customization. vCenter now supports Microsoft SQL Server 2014 SP2, 2016 and 2016 SP1, and there are some increased configuration maximums, with Linked Mode supporting 15 vCenter instances, 5000 ESXi hosts and 50,000 powered on virtual machines.
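For providers planning growth against those Linked Mode maximums, a quick sanity check is easy to script. The sketch below is plain Python with hypothetical names; only the three limits come from the paragraph above, and the planned figures in the usage note are made up for illustration.

```python
from typing import Dict, Tuple

# Linked Mode maximums quoted for vSphere 6.5 Update 1.
LINKED_MODE_MAXIMUMS: Dict[str, int] = {
    "vcenter_instances": 15,
    "esxi_hosts": 5000,
    "powered_on_vms": 50000,
}


def over_maximums(planned: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return {metric: (planned, limit)} for every metric above its limit."""
    return {
        metric: (planned.get(metric, 0), limit)
        for metric, limit in LINKED_MODE_MAXIMUMS.items()
        if planned.get(metric, 0) > limit
    }
```

Calling `over_maximums({"vcenter_instances": 12, "esxi_hosts": 5200, "powered_on_vms": 48000})` flags only the host count, since 5200 is above the 5000 host limit.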

Ability to Upgrade or Migrate from vCenter 6.0 Update 3:

This release addresses the previous limitation in the upgrade and migration path for those running vSphere 6.0 U3 in going to vSphere 6.5. I know this will make a lot of providers happy, as I know a lot that had to go to 6.0 Update 3 to address existing bugs in the platform but were not yet ready or able to go to 6.5 at the time.

HTML5 Client Update:

The HTML5 Web Client has gotten its own update that brings it up to speed with the 3.15 Fling version, however it’s still only partially functional, which remains somewhat frustrating…The online documentation for supported functionality has been updated to vSphere 6.5U1 and is available here.

The list below is of the main updates in this release.

  • DRS/HA VM overrides
  • SDRS rules
  • Content Library – further actions
  • Roles and Global Permissions
  • Download multiple files as zip
  • Distributed Switch – further actions
  • Fault Tolerance
  • SPBM
  • VM Hardware – further items
  • Apply Customize Guest OS during Clone
  • VM Migration – further actions (compute+storage, Cross VC, batch)

vSAN Features:

For service providers, vSAN 6.6 was another major release that shored up vSAN’s status as a serious storage platform for service provider platforms.

vSAN 6.6.1 introduces three key new features:

  • VMware vSphere Update Manager (VUM) integration
  • Performance Diagnostics in vSAN Cloud Analytics
  • Storage Device Serviceability enhancement

The ability to upgrade with VUM is a nice touch and continues to improve on the usability and manageability of vSAN. For a full look at what’s new in this release for vSAN 6.6.1 head to this blog post.

Resolved Issues:

There are a bunch of resolved issues in this release and I’ve gone through the rather extensive list to pull out the biggest fixes that relate to my experience in service provider operations, and have also extended this to include fixes that relate to backup operations. The majority of what I pick out relates to storage, networking, hosts and VM operations…the core of any platform, but even more important in the service provider world. The ones in red are specific fixes that relate to issues that I’ve come across…good to see them addressed!

vCenter:
  • First-boot failure occurs when upgrading from vSphere 5.5 or 6.0 to vSphere 6.5 on Windows
    If an older version of the OpenSSL DLLs is installed, upgrading to vSphere 6.5 fails to run because the older DLL versions are loaded.
  • Affinity rules configured on vCenter Server 5.5 can cause crashes after upgrading to vCenter Server 6.5
    Migrating a VM with affinity rules configured while on vCenter Server 5.5 to a cluster that has affinity rules configured on vCenter Server 6.0 or 6.5 can cause vCenter Server to crash.
  • VM Snapshot Size (GB) alarm is not triggered after the VM is powered on
    VM Snapshot Size (GB) alarm is reset if the virtual machine is shut down. Alarm fails to trigger after the VM is powered on. This issue occurs in alarms based on VM Snapshot (GB) and Vm Total Size on Disk because their status is altered when the power state of the VM is changed. This issue occurs because disk usage of a VM is the same regardless of the VM power state.
  • When you add ports to a vSphere Distributed Switch you get an error
    Because of a race condition, when you add ports to a vSphere Distributed Switch you get the error message: Cannot create a new port because number of ports exceeds 2147483647, maximum number of ports allowed on vDS.
  • A runtime exception “Unable to retrieve data about the distributed switch” might occur while upgrading vSphere Distributed Switch (vDS) from 5.0 to 6.5 version
    When you try to upgrade an existing distributed switch after the vCenter upgrade is completed, the runtime exception Unable to retrieve data about the distributed switch might occur in the wizard and the distributed switch cannot be upgraded. The exception is a result of unexpected value NULL for a LACP property of the distributed switch, instead of TRUE or FALSE, as LACP is not supported for the current version of vSphere Distributed Switch.
  • Host configuration might not be available after vCenter Server restarts
    After a vCenter Server restart, the host configuration might not be available if vCenter Server cannot communicate with the host. After connectivity is restored, the configuration becomes available.
  • OVF tool fails to upload OVF or OVA files larger than 10 GB
    If you use OVF Tool to upload OVF or OVA files larger than 10 GB, the upload might fail.

ESXi:

  • Virtual machine crashes on ESXi 6.5 when multiple users log on to Windows Terminal Server VM
    Windows 2012 terminal server running VMware tools 10.1.0 on ESXi 6.5 stops responding when many users are logged in. vmware.log will show messages similar to:
    2017-03-02T02:03:24.921Z| vmx| I125: GuestRpc: Too many RPCI vsocket channels opened.
    2017-03-02T02:03:24.921Z| vmx| E105: PANIC: ASSERT bora/lib/asyncsocket/asyncsocket.c:5217
    2017-03-02T02:03:28.920Z| vmx| W115: A core file is available in "/vmfs/volumes/515c94fa-d9ff4c34-ecd3-001b210c52a3/h8-ubuntu12.04x64/vmx-debug-zdump.001"
    2017-03-02T02:03:28.921Z| mks| W115: Panic in progress... ungrabbing
  • An ESXi host might fail with purple diagnostic screen when collecting performance snapshots
    An ESXi host might fail with purple diagnostic screen when collecting performance snapshots with vm-support due to calls for memory access after the data structure has already been freed. An error message similar to the following is displayed:
  • Full duplex configured on physical switch may cause duplex mismatch issue with igb native Linux driver supporting only auto-negotiate mode for nic speed/duplex setting
    If you are using the igb native driver on an ESXi host, it always works in auto-negotiate speed and duplex mode. No matter what configuration you set up on this end of the connection, it is not applied on the ESXi side. The auto-negotiate support causes a duplex mismatch issue if a physical switch is set manually to a full-duplex mode.
  • An ESXi host might fail with a purple screen and a Spin count exceeded (refCount) – possible deadlock with PCPU error
    An ESXi host might fail with a purple screen and a Spin count exceeded (refCount) - possible deadlock with PCPU error, when you reboot the ESXi host under the following conditions:
    • You use the vSphere Network Appliance (DVFilter) in an NSX environment
    • You migrate a virtual machine with vMotion under DVFilter control
  • A Virtual Machine (VM) with e1000/e1000e vNIC might have network connectivity issues
    For a VM with e1000/e1000e vNIC, when the e1000/e1000e driver tells the e1000/e1000e vmkernel emulation to skip a descriptor (the transmit descriptor address and length are 0), a loss of network connectivity might occur.
  • An ESXi host might stop responding when you migrate a virtual machine with Storage vMotion between ESXi 6.0 and ESXi 6.5 hosts
    The vmxnet3 device tries to access the memory of the guest OS while the guest memory preallocation is in progress during the migration of virtual machine with Storage vMotion. This results in an invalid memory access and the ESXi 6.5 host failure.
  • Modification of IOPS limit of virtual disks with enabled Changed Block Tracking (CBT) fails with errors in the log files
    To define the storage I/O scheduling policy for a virtual machine, you can configure the I/O throughput for each virtual machine disk by modifying the IOPS limit. When you edit the IOPS limit and CBT is enabled for the virtual machine, the operation fails with an error The scheduling parameter change failed. Due to this problem, the scheduling policies of the virtual machine cannot be altered. The error message appears in the vSphere Recent Tasks pane. You can see the following errors in the /var/log/vmkernel.log file:
    2016-11-30T21:01:56.788Z cpu0:136101)VSCSI: 273: handle 8194(vscsi0:0):Input values: res=0 limit=-2 bw=-1 Shares=1000
    2016-11-30T21:01:56.788Z cpu0:136101)ScsiSched: 2760: Invalid Bandwidth Cap Configuration
    2016-11-30T21:01:56.788Z cpu0:136101)WARNING: VSCSI: 337: handle 8194(vscsi0:0):Failed to invert policy
  • When you hot-add an existing or new virtual disk to a CBT (Changed Block Tracking) enabled virtual machine (VM) residing on VVOL datastore, the guest operating system might stop responding
    When you hot-add an existing or new virtual disk to a CBT enabled VM residing on VVOL datastore, the guest operating system might stop responding until the hot-add process completes. The VM unresponsiveness depends on the size of the virtual disk being added. The VM automatically recovers once hot-add completes.
  • When you use vSphere Storage vMotion, the UUID of a virtual disk might change
    When you use vSphere Storage vMotion on vSphere Virtual Volumes storage, the UUID of a virtual disk might change. The UUID identifies the virtual disk and a changed UUID makes the virtual disk appear as a new and different disk. The UUID is also visible to the guest OS and might cause drives to be misidentified.
  • An ESXi host might become unresponsive if the VMFS-6 volume has no space for the journal
    When opening a VMFS-6 volume, it allocates a journal block. Upon successful allocation, a background thread is started. If there is no space on the volume for the journal, it is opened in read-only mode and no background thread is initiated. Any attempt to close the volume results in attempts to wake up a nonexistent thread. This results in the ESXi host failure.
  • SSD congestion might cause multiple virtual machines to become unresponsive
    Depending on the workload and the number of virtual machines, diskgroups on the host might go into permanent device loss (PDL) state. This causes the diskgroups to not admit further IOs, rendering them unusable until manual intervention is performed.
  • Unable to collect vm-support bundle from an ESXi 6.5 host
    Unable to collect vm-support bundle from an ESXi 6.5 host because when generating logs in ESXi 6.5 by using the vSphere Web Client, the select specific logs to export text box is blank. The options: network, storage, fault tolerance, hardware etc. are blank as well. This issue occurs because the rhttpproxy port for /cgi-bin has a value different from 8303. This issue is resolved in this release.
  • vSphere Storage vMotion might fail with an error message if it takes more than 5 minutes
    The destination virtual machine of the vSphere Storage vMotion is incorrectly stopped by a periodic configuration validation for the virtual machine. vSphere Storage vMotion that takes more than 5 minutes fails with the The source detected that the destination failed to resume message.
    The VMkernel log from the ESXi host contains the message D: Migration cleanup initiated, the VMX has exited unexpectedly. Check the VMX log for more details.

vSAN:

  • Hosts in a vSAN cluster have high congestion which leads to host disconnects: When vSAN components with invalid metadata are encountered while an ESXi host is booting, a leak of reference counts to SSD blocks can occur. If these components are removed by policy change, disk decommission, or other method, the leaked reference counts cause the next I/O to the SSD block to get stuck. The log files can build up, which causes high congestion and host disconnects.
  • vSAN cluster becomes partitioned after the member hosts and vCenter Server reboot: If the hosts in a unicast vSAN cluster and the vCenter Server are rebooted at the same time, the cluster might become partitioned. The vCenter Server does not properly handle unstable vpxd property updates during a simultaneous reboot of hosts and vCenter Server.
  • Large File System overhead reported by the vSAN capacity monitor: When deduplication and compression are enabled on a vSAN cluster, the Used Capacity Breakdown (Monitor > vSAN > Capacity) incorrectly displays the percentage of storage capacity used for file system overhead. This number does not reflect the actual capacity being used for file system activities. The display needs to correctly reflect the File System overhead for a vSAN cluster with deduplication and compression enabled.

It’s also worth reading through the Known Issues section, as there is a fair bit to be aware of in Update 1, including issues that remain from the GA release.

Happy upgrading!

References:

https://docs.vmware.com/en/VMware-vSphere/6.5/rn/vsphere-esxi-651-release-notes.html

https://docs.vmware.com/en/VMware-vSphere/6.5/rn/vsphere-vcenter-server-651-release-notes.html

Second vSphere Client (HTML5) update in vSphere 6.5U1

Introducing vSAN 6.6.1 and New Operational Savings

Attack from the Inside – Protecting Against Rogue Admins

In July of 2011, Distribute.IT, a domain registration and web hosting services provider in Australia, was hit with a targeted, malicious attack that resulted in the company going under and its customers being left without their hosting or VPS data. The attack was calculated, targeted and vicious in its execution… I remember the incident well as I was working for Anittel at the time and we were offering similar services…everyone in the hosting organization was concerned when starting to think about the impact a similar attack would have within our systems.

“Hackers got into our network and were able to destroy a lot of data. It was all done in a logical order – knowing exactly where the critical stuff was and deleting that first,”

While it was reported at the time that a hacker got into the network, the way in which the attack was executed pointed to an inside job, and although it was never proved to be so, it is almost 100% certain that the attacker was a disgruntled ex-employee. The very real issue of an inside attack has popped up again…this time Verelox, a hosting company out of the Netherlands, has effectively been taken out of business by a confirmed attack from within by an ex-employee.

My heart sinks when I read of situations like this and for me, it was the only thing that truly kept me up at night as someone who was ultimately responsible for similar hosting platforms. I could deal with, and probably reconcile myself to, a situation where a piece of hardware failed causing data loss…but if an attacker had caused the data loss then all bets would have been off and I might have found myself scrambling to save face and, along with others in the organization, may well have been searching for a new company…or worse, a new career!

What Can Be Done at a Technical Level?

Knowing a lot about how hosting and cloud service providers operate, my feeling is that 90% of organizations out there are not prepared for such attacks and are at the mercy of an attack from the inside…either by a current or ex-employee. Taking that a step further, there are plenty that are at risk of an attack from the inside perpetrated by external malicious individuals. This is where the principle of least privilege needs to be taken to the nth degree. Clear separation of operational and physical layers needs to be considered as well, to ensure that if systems are attacked, not everything can be taken down at once.

Implementing some form of certification or compliance such as ISO 27001, SOC and IRAP will force companies to become more vigilant through the stringent processes and controls that are imposed on them in order to achieve and maintain compliance. This in turn naturally leads to better and more complete disaster recovery and business continuity scenarios that are written down and require testing and validation in order to pass certification.

From a backup point of view, with most systems being virtual these days, it’s important to consider a backup strategy that not only makes use of the 3-2-1 rule of backups, but also implements some form of air-gapped backups that are, in theory, completely separate and inaccessible from production networks, meaning that only a few very trusted employees have access to the backup and restore media. In practice, implementing a complete air-gapped solution is complex and potentially costly, and this is where service providers are gambling their futures on a scenario that has only a small chance of happening…however, the likelihood of that scenario playing out is greater than it’s ever been.
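To make the 3-2-1 idea a bit more concrete, here is a minimal Python sketch that checks a list of backup copies against the rule (at least three copies, on at least two media types, with at least one offsite) and also flags whether any copy is air-gapped. The `BackupCopy` class, the copy names and the media labels are all hypothetical and only there for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    name: str          # hypothetical label for the copy
    media: str         # e.g. "disk", "tape", "object-storage"
    offsite: bool      # stored away from the production site
    air_gapped: bool   # not reachable from the production network


def check_3_2_1(copies):
    """Return pass/fail flags for the 3-2-1 rule plus an air-gap check.

    The production data counts as one of the three copies, so it
    should be included in `copies`.
    """
    return {
        "three_copies": len(copies) >= 3,
        "two_media": len({c.media for c in copies}) >= 2,
        "one_offsite": any(c.offsite for c in copies),
        "air_gapped": any(c.air_gapped for c in copies),
    }


# Illustrative inventory: production, a local repository and a cloud copy.
copies = [
    BackupCopy("production", "disk", offsite=False, air_gapped=False),
    BackupCopy("local-repo", "disk", offsite=False, air_gapped=False),
    BackupCopy("cloud-repo", "object-storage", offsite=True, air_gapped=False),
]
result = check_3_2_1(copies)
print(result)
```

In this example the inventory satisfies 3-2-1 but has no air-gapped copy, which is exactly the gap a rogue admin with access to both production and the backup repositories could exploit.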

In a situation like Verelox, I wonder if, like most IaaS providers, they didn’t back up all client workloads by default, meaning that backup was an additional chargeable service that some customers didn’t know about…that said, if backup systems are wiped clean, is there any use in having those services anyway? That is to say…is there a backup of the backup? This being the case, I also believe that businesses need to start looking at cross-cloud backups and not rely solely on their provider’s backup systems. Something like the Veeam Agents or Cloud Connect can help here.

So What Can Be Done at an Employee Level?

The more I think about the possible answer to this question, the more I believe that service providers can’t fully protect themselves from such internal attacks. At some point trust supersedes all else and no amount of vetting or process can stop someone with the right sort of access doing damage. To that end, making sure that you are looking after your employees is probably the best defence against someone feeling aggrieved enough to carry out a malicious attack such as the one Verelox has just gone through. In addition to looking after employees’ well-being it’s also a good idea to…within reason…keep tabs on an employee’s state in life in general. Are they going through any personal issues that might make them unstable, or have they been wronged by someone else within the company? Generally social issues should be picked up during the hiring process, but complete vetting of employee stability is always going to be a lottery.

Conclusion

As mentioned above, this type of attack is a worst case scenario for every service provider that operates today…there are steps that can be taken to minimize the impact and protect against an employee getting to the point where they choose to do damage but my feeling is we haven’t seen the last of these attacks and unfortunately more will suffer…so where you can, try to implement policy and procedure to protect and then recover when or if they do happen.

Vote for your favorite blogs at vSphere-land!

Top vBlog Voting 2017

Resources:

https://www.crn.com.au/news/devastating-cyber-attack-turns-melbourne-victim-into-evangelist-397067/page1

https://www.itnews.com.au/news/distributeit-hit-by-malicious-attack-260306

https://news.ycombinator.com/item?id=14522181

Verelox (Netherlands hosting company) servers wiped by ex-admin from sysadmin

Homelab – Lab Access Made Easy with Free Veeam Powered Network

A couple of weeks ago at VeeamON we announced the RC of Veeam PN which is a lightweight SDN appliance that has been released for free. While the main messaging is focused around extending network availability for Microsoft Azure, Veeam PN can be deployed as a standalone solution via a downloadable OVA from the veeam.com site. While testing the product through its early dev cycles I immediately put into action a use case that allowed me to access my homelab and other home devices while I was on the road…all without having to set up and configure relatively complex VPN or remote access solutions.

There are a lot of existing solutions that do what Veeam PN does and a lot of them are decent at what they do, however the biggest difference for me when comparing it with, say, the VPN functionality of pfSense is that Veeam PN is purpose built and can be set up within a couple of clicks. The underlying technology is built upon OpenVPN so there is a level of familiarity and trust with what lies under the hood. The other great thing about leveraging OpenVPN is that any Windows, MacOS or Linux client will work with the configuration files generated for point-to-site connectivity.

Homelab Remote Connectivity Overview:

While on the road I wanted to access my homelab/office machines with minimal effort and without the reliance on published services externally via my entry level Belkin router. I also didn’t have a static IP which always proved problematic for remote services. At home I run a desktop that acts as my primary Windows workstation which also has VMware Workstation installed. I then have my SuperMicro 5028D-TNT4 server that has ESXi installed and runs my NestedESXi lab. I need access to at least RDP into that Windows workstation, but also get access to the management vCenter, SuperMicro IPMI and other systems that are running on the 192.168.1.0/24 subnet.

As seen above I also wanted to directly access workloads in the NestedESXi environment, specifically on the 172.17.0.0/24 and 172.17.1.0/24 networks. I’ll go into a little more detail on my use case in a follow up post, but as you can see from the diagram above, with the use of the Tunnelblick OpenVPN Client on my MBP I am able to create a point-to-site connection to the Veeam PN Hub, which is in turn connected via site-to-site to each of the subnets I want to connect into.

Deploying and Configuring Veeam Powered Network:

As mentioned above you will need to download the Veeam PN OVA from the veeam.com website. This VeeamKB describes where to get the OVA and how to deploy and configure the appliance for first use. If you don’t have a DHCP enabled subnet to deploy the appliance into, you can configure a static address by accessing the VM console, logging in with the default credentials and modifying the /etc/network/interfaces file as described here.
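For the static addressing case, the following Python sketch renders a Debian-style ifupdown stanza from an address in CIDR form and refuses a gateway that sits outside the subnet. It assumes the appliance uses the ifupdown format; the interface name `eth0` and all addresses are placeholders, so check what the appliance actually reports before editing anything.

```python
import ipaddress


def render_static_stanza(iface, cidr, gateway, dns_servers):
    """Build an ifupdown-style static stanza from an address in CIDR form."""
    interface = ipaddress.ip_interface(cidr)
    gw = ipaddress.ip_address(gateway)
    # A gateway outside the subnet is a common typo, so fail early.
    if gw not in interface.network:
        raise ValueError(f"gateway {gw} is not inside {interface.network}")
    lines = [
        f"auto {iface}",
        f"iface {iface} inet static",
        f"    address {interface.ip}",
        f"    netmask {interface.network.netmask}",
        f"    gateway {gw}",
        f"    dns-nameservers {' '.join(dns_servers)}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values: interface name and addresses are assumptions.
stanza = render_static_stanza("eth0", "192.168.1.50/24", "192.168.1.1", ["192.168.1.1"])
print(stanza)
```

The generated text is only a starting point to paste into the interfaces file and adjust by hand.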

Components

  • Veeam PN Hub Appliance x 1
  • Veeam PN Site Gateway x number of sites/subnets required
  • OpenVPN Client

The OVA is 1.5GB and when deployed the Virtual Machine has the base specifications of 1x vCPU, 1GB of vRAM and 16GB of storage, which if thin provisioned consumes a tick over 5GB initially.

Networking Requirements

  • Veeam PN Hub Appliance – Incoming Ports TCP/UDP 1194, 6179 and TCP 443
  • Veeam PN Site Gateway – Outgoing access to at least TCP/UDP 1194
  • OpenVPN Client – Outgoing access to at least TCP/UDP 6179

Note that as part of the initial configuration you can configure the site-to-site and point-to-site protocol and ports which is handy if you are deploying into a locked down environment and want to have Veeam PN listen on different port numbers.
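Before importing any configuration files it can save time to confirm that the Hub is actually reachable on the expected ports. The sketch below keeps the default ports listed above in a small table and probes only the TCP ones, since a plain socket connect cannot tell you whether a UDP port is open. The role names and the function names are my own, and the port numbers will differ if you changed them during initial configuration.

```python
import socket

# Default ports from the list above; adjust if they were changed at setup time.
REQUIRED_PORTS = {
    "hub_inbound": [("tcp", 1194), ("udp", 1194), ("tcp", 6179), ("udp", 6179), ("tcp", 443)],
    "site_gateway_outbound": [("tcp", 1194), ("udp", 1194)],
    "client_outbound": [("tcp", 6179), ("udp", 6179)],
}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_tcp(host, role):
    """Probe only the TCP entries for a role; UDP cannot be verified this way."""
    return {
        port: tcp_port_open(host, port)
        for proto, port in REQUIRED_PORTS[role]
        if proto == "tcp"
    }
```

Calling `check_tcp("hub.example.com", "hub_inbound")` (with your own Hub address in place of the placeholder hostname) returns a dictionary of port numbers mapped to True or False.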

In my setup the Veeam PN Hub Appliance has been deployed into Azure mainly because that’s where I was able to test out the product initially, but also because in theory it provides a centralised, highly available location for all the site-to-site connections to terminate into. This central Hub can be deployed anywhere and as long as it’s got HTTPS connectivity configured correctly you can access the web interface and start to configure your site and standalone clients.

Configuring Site Clients (site-to-site):

To complete the configuration of the Veeam PN Site Gateway you need to register the sites from the Veeam PN Hub Appliance. When you register a client, Veeam PN generates a configuration file that contains VPN connection settings for the client. You must use the configuration file (downloadable as an XML) to set up the Site Gateways. Referencing the diagram at the beginning of the post, I needed to register three separate client configurations as shown below.

Once this had been completed I deployed three Veeam PN Site Gateways on my Home Office infrastructure as shown in the diagram…one for each Site or subnet I wanted to have extended through the central Hub. I deployed one to my Windows VMware Workstation instance on the 192.168.1.0/24 subnet and as shown below I deployed two Site Gateways into my NestedESXi lab on the 172.17.0.0/24 and 172.17.1.0/24 subnets respectively.
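Since each Site Gateway stands in for one subnet, it is worth a quick sanity check that the subnets you are about to register are really distinct networks. A small sketch using Python’s standard `ipaddress` module follows; the subnet values are examples only, and the helper names are mine.

```python
import ipaddress
from itertools import combinations


def normalise(cidr):
    """Return the network a CIDR string belongs to, even if host bits are set."""
    return ipaddress.ip_network(cidr, strict=False)


def overlapping_pairs(cidrs):
    """Return pairs of inputs whose networks overlap."""
    nets = [(c, normalise(c)) for c in cidrs]
    return [(a, b) for (a, na), (b, nb) in combinations(nets, 2) if na.overlaps(nb)]


# A host-style address such as a gateway IP is folded into its network.
print(normalise("172.17.0.1/24"))

# Three distinct /24 networks: nothing overlaps.
print(overlapping_pairs(["192.168.1.0/24", "172.17.0.0/24", "172.17.1.0/24"]))

# Two notations that actually describe the same /24 are flagged.
print(overlapping_pairs(["172.17.0.0/24", "172.17.0.1/24"]))
```

The last call is a reminder that an address like 172.17.0.1/24 is a host inside 172.17.0.0/24, not a separate subnet.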

From there I imported the site configuration file that was generated from the central Hub Appliance into each corresponding Site Gateway and in as little as three clicks on each one, all three networks were joined using site-to-site connectivity to the central Hub.

Configuring Remote Clients (point-to-site):

To be able to connect into my home office and home lab while on the road, the final step is to register a standalone client from the central Hub Appliance. Again, because Veeam PN is leveraging OpenVPN what we are producing here is an OVPN configuration file that has all the details required to create the point-to-site connection…noting that there isn’t any requirement to enter in a username and password as Veeam PN is authenticating using SSL authentication.
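If you are curious about what is inside an OVPN file before handing it to a client, the sketch below pulls out the top-level directives (such as `remote` and `proto`, which are standard OpenVPN client options) and records which inline blocks are present without printing their contents. The sample text is a made-up placeholder and not what Veeam PN actually generates, so treat the hostname, port and block names as assumptions.

```python
def parse_ovpn(text):
    """Extract top-level directives from OpenVPN client config text.

    Inline blocks such as <ca>...</ca> are skipped; only their tag names
    are recorded so that key material is never echoed.
    """
    directives = {}
    inline_blocks = []
    current_block = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if current_block:
            if line == f"</{current_block}>":
                current_block = None
            continue  # still inside an inline block
        if line.startswith("<") and line.endswith(">"):
            current_block = line[1:-1]
            inline_blocks.append(current_block)
            continue
        key, *args = line.split()
        directives.setdefault(key, []).append(args)
    return directives, inline_blocks


# Hypothetical sample, for illustration only.
SAMPLE = """\
client
dev tun
proto udp
remote hub.example.com 6179
<ca>
(placeholder)
</ca>
<cert>
(placeholder)
</cert>
<key>
(placeholder)
</key>
"""
directives, blocks = parse_ovpn(SAMPLE)
print(directives["remote"], directives["proto"], blocks)
```

Seeing certificate and key blocks but no `auth-user-pass` directive would be consistent with the certificate-based authentication described above.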

For my MBP I’m using the Tunnelblick OpenVPN Client. I’ve found it to be an excellent client, but obviously being OpenVPN there are a bunch of other clients for pretty much any platform you might be running. Once I’ve imported the OVPN configuration file into the client I am able to authenticate against the Hub Appliance endpoint, and the site-to-site routing is injected into the network settings.

You can see above that the 192.168.1.0, 172.17.0.0 and 172.17.1.0 static routes have been added and set to use the tunnel interface’s default gateway, which is on the central Hub Appliance. This means that from my MBP I can now get to any device on any of those three subnets no matter where I am in the world…in this case I can RDP to my Windows workstation, connect to vCenter or ssh into my ESXi hosts.
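To reason about which traffic ends up in the tunnel once those routes are in place, here is a small sketch that picks the most specific matching route for a destination address, which is how a routing table decides. The route list is an assumed set of three /24 networks for this lab and the function name is mine; substitute whatever your client actually shows.

```python
import ipaddress

# Assumed routes pushed to the client for this lab; adjust to match your own.
TUNNEL_ROUTES = ["192.168.1.0/24", "172.17.0.0/24", "172.17.1.0/24"]


def via_tunnel(destination, routes=TUNNEL_ROUTES):
    """Return the most specific tunnel route covering destination, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [n for n in map(ipaddress.ip_network, routes) if addr in n]
    # Longest prefix wins, mirroring normal routing table behaviour.
    return max(matches, key=lambda n: n.prefixlen, default=None)


print(via_tunnel("192.168.1.20"))  # a device on the home office subnet
print(via_tunnel("8.8.8.8"))       # not covered, so it stays off the tunnel
```

Anything that returns None keeps using the local default gateway, which is why general internet traffic is unaffected while connected.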

Conclusion:

To summarize, these are the steps that were taken to set up and configure the extension of my home office network using Veeam PN’s site-to-site connectivity feature, allowing me to access systems and services via a point-to-site VPN:

  • Deploy and configure Veeam PN Hub Appliance
  • Register Sites
  • Register Endpoints
  • Deploy and configure Veeam PN Site Gateway
  • Setup Endpoint and connect to Hub Appliance

Those five steps took me less than 15 minutes, including the OVA deployments…that to me is an extremely streamlined, efficient process to achieve what, in the past, could have taken hours and certainly would have involved a more complex set of commands and configuration steps. The simplicity of the solution is what makes it very useful for home labbers wanting a quick and easy way to access their systems…it just works!

Again, Veeam PN is free and is deployable from the Azure Marketplace to help extend availability for Microsoft Azure…or downloadable in OVA format directly from the veeam.com site. The use case I’ve described, and have been using without issue for a number of months, adds to the flexibility of the Veeam Powered Network solution.

References:

https://helpcenter.veeam.com/docs/veeampn/userguide/overview.html?ver=10

https://www.veeam.com/kb2271

 

VeeamON 2017 Wrap

VeeamON 2017 has come and gone and even though I left New Orleans on Friday afternoon, I just arrived back home…54 hours of travel, transit and delays has meant that my VeeamOFF continued longer than most! What an amazing week it was though for Veeam, our partners and our customers…The announcements that we made over the course of the event have been extremely well received and it’s clear to me that the Availability Platform vision that we first talked about last year is in full execution mode.

The TPM team executed brilliantly and along with the core team and the other 300 Veeam employees that were in New Orleans it was great to see all the hard work pay off. The Technical Evangelists’ main stage live demos all went off without a hitch (apart from some dodgy HDMI) and we all felt privileged to be able to demo some of the key announcements. On a personal note, it was a career highlight to be able to present to approximately 2000 people and be part of a brand new product launch for Veeam with Veeam PN.

From a networking point of view it was great to meet so many new people and put faces to Twitter handles. It was also great to see the strong Veeam Vanguard representation at the event and even though I couldn’t party with the group like previous years, it looked like they got a lot out of the week, both from a Veeam technical point of view and without doubt on the social front…I was living vicariously through them as they were partying hard in New Orleans.

VeeamON Key Announcements:

Availability Suite 10

  • Built-in Management for Veeam Agent for Linux and Veeam Agent for Microsoft Windows
  • Scale-Out Backup Repository — Archive Tier
  • NAS Backup Support for SMB and NFS Shares
  • Veeam CDP (Continuous Data Protection)
  • Primary Storage Integrations — Universal Storage Integration API
  • DRaaS Enhancements (for service providers)
  • Additional enterprise scalability enhancements

For me, the above list shows our ongoing commitment to the Enterprise, but more importantly it shows that we are working on enhancing our platform so that our Veeam Cloud and Service Providers can continue to leverage our technology to create and offer cloud based Disaster Recovery and Backup services.

Product Announcements and Releases:

I have been lucky enough to work as the TPM lead on Veeam PN and I was extremely excited to be able to demo it for the first time to the world. I’ve written a blog post here that goes into some more detail around Veeam PN and if you want to view the main stage demo I’ve linked to the video in the last section…I start the demo at the 29th minute mark if you want to skip through.

vCloud Director Cloud Connect Enhancements:

As mentioned above we have enhanced core capabilities in v10 when it comes to Cloud Connect Replication and Cloud Connect Backup. Obviously, the announcement that we will be supporting vCloud Director is significant and one that a lot of our Cloud and Service Providers are extremely happy with. It just makes the DRaaS experience that much more complete, and when you add that to the CDP features in the core platform, which will allow for sub-minute RPOs for replicas, it firmly places Cloud Connect as the market leader in Replication as a Service technologies.

We also announced backup to tape features for Cloud Connect Backup which will allow Cloud and Service Providers to offload long term backup files to cheaper storage. Note that this isn’t limited to tape if used in conjunction with a Virtual Tape Library. Hopefully our VCSPs can create revenue generating service offerings around this feature as well.

VCSP Council Meeting:

On Thursday, our R&D leads met with a select group of our top Cloud and Service Provider partners over a three hour lunch meeting which could have gone all day if time permitted. It was great to be on the other side of the fence for the first time and hear all the great feedback, advice and suggestions from the group. It’s encouraging to hear about how Veeam Backup & Replication has become the central platform for IaaS, Cloud Replication and Backup offerings and with the v10 enhancements I expect that to be even more the case moving forward.

Main Stage Recordings:

Wednesday and Thursday morning both saw main stage general sessions where we announced our new products and features along with keynotes from Sanjay Poonen and Mark Russinovich as well as co-CEO Peter McKay and co-founder Ratmir Timashev. They are worth a look and I’ve posted links to the video recordings below. Note that they are unedited and contain all change overs and wait times.

https://www.veeam.com/veeamon/live

Press Releases:

Veeam is now in the Network Game! Introducing Veeam Powered Network.

Today at VeeamON 2017 we announced the Release Candidate of Veeam PN (Veeam Powered Network) which, together with our existing Direct Restore to Microsoft Azure feature, creates a new solution called Veeam Disaster Recovery for Microsoft Azure. At the heart of this new solution is Veeam PN, which extends an on-premises network to one that’s in Azure, enhancing our availability capabilities around disaster recovery.

Veeam PN allows administrators to create, configure and connect site-to-site or point-to-site VPN tunnels easily through an intuitive and simple UI all within a couple of clicks. There are two components to Veeam PN, that being a Hub Appliance that’s deployable from the Azure Marketplace and a Site Gateway that’s downloadable from the veeam.com website and deployable on-premises from an OVA meaning it can be installed onto

Veeam PN for Microsoft Azure (Veeam Powered Network) is a free solution designed to simplify and automate the setup of a disaster recovery (DR) site in Microsoft Azure using lightweight software-defined networking (SDN).

  • Provides seamless and secure networking between on-premises and Azure-based IT resources
  • Delivers easy-to-use and fully automated site-to-site network connectivity between any site

Veeam PN is designed for both SMB and Enterprise customers, as well as service providers.

From my point of view this is a great example of how Veeam is no longer a backup company but a company that’s focused on availability. Networking is still the most complex part of executing a successful disaster recovery plan. Veeam PN easily extends on-premises networks to DR networks, provides connectivity from remote sites back to DR networks via site-to-site connectivity, and gives remote endpoints the ability to connect into the Hub appliance and reach the configured networks via a point-to-site connection.

Look out for more information from me on Veeam PN as we get closer to GA.
