Recently VMware Education announced the availability of the “vSAN Specialist” exam which entitles those who pass it to receive the “vSAN Specialist 2017” badge. The badge holder is a “technical professional who understands the vSAN 6.6 architecture and its complete feature set, knows how to conduct a vSAN design and deployment exercise, can implement a live vSAN hyper-converged infrastructure environment based on certified hardware/software components and best practices, and can administer/operate a vSAN cluster properly”.
As I consider myself to be a vSAN specialist I thought this one should be rather easy to achieve, so after I read about it last week, I immediately scheduled my exam at Pearson VUE and took it today.
VMware Virtual Volumes (aka VVOL) was introduced in vSphere 6.0 to allow vSphere administrators to manage external storage resources (and especially the storage requirements of individual VMs) through a policy-based mechanism called Storage Policy Based Management (SPBM).
VVOL in itself is not a product, but rather a framework defined by VMware. Each storage vendor can use this framework to enable SPBM for vSphere administrators by implementing the underlying components (such as VASA providers, storage containers with their storage capabilities, and protocol endpoints) in their own way (a good background on VVOLs can be found in this KB article). This makes it easy for each storage vendor to get started with VVOL support, but it also means that comparing different vendors on this feature is not easy: “YES, we support VVOLs …” does not really say much about the way an individual vendor has implemented the feature in their storage array and how they compare to other vendors.
In this blog I want to show how Nimble Storage (now part of HPE) has implemented VVOL support. For now I will focus on the initial integration part. In a future blog I will show how this integration can be used to address the Nimble Storage capabilities of individual VMs through the use of storage policies.
Last month VMware released vSAN version 6.6 as a patch release of vSphere (6.5.0d). New features include Data-at-Rest encryption, enhanced stretched clusters with local protection, a change of vSAN communication from multicast to unicast, and many more.
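A quick way to see the new unicast mode for yourself is from the ESXi shell of a vSAN 6.6 host. A minimal sketch (the exact output columns may differ per build):

```shell
# List the unicast agents that replace the old multicast-based
# cluster membership mechanism (one entry per peer host)
esxcli vsan cluster unicastagent list

# Show the general cluster state (sub-cluster UUID, member UUIDs,
# master/backup roles) to confirm the host has joined the cluster
esxcli vsan cluster get
```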
Perhaps a little less impressive, but still very useful, is the (simple) way a new vSAN cluster is now configured. To illustrate this I have recorded a short demo of the configuration of a new vSAN 6.6 cluster.
I am a big vSAN fan and use it in my own Home Lab for most of my VMs (the main exception being the VMs used for backups … they are on my QNAP fileserver connected via iSCSI). My vSAN cluster configuration is quite static and the only thing that might change in the near future is increasing the capacity by adding an additional ESXi host to the cluster.
Currently I am running vSAN version 6.2, and since the environment is very stable and it is my “production” environment, I don’t plan to upgrade to the latest and greatest version yet. Still, I do want to work with the newer versions and functions (like the iSCSI target) to become familiar with them and stay up-to-date with my vSAN knowledge, so I have a test (virtual) vSphere 6.5 cluster with vSAN 6.5 installed, currently in a 2-node (ROBO) setup with an additional witness appliance.
With the release of vSAN 6.6 (check out the release notes here) I wanted to upgrade my vSAN 6.5 environment. Actually, I decided to create a new vSAN 6.6 cluster from scratch with my existing ESXi hosts, which meant I first had to delete my existing vSAN 6.5 datastore.
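For reference, tearing down the old vSAN datastore looks roughly like this from the ESXi shell. This is a sketch, assuming all VMs have already been migrated off or deleted; leaving the cluster destroys access to the vSAN data on that host:

```shell
# Check the current vSAN cluster membership before removal
esxcli vsan cluster get

# Remove this host from the vSAN cluster. Once every host has left,
# the vSAN datastore is gone. WARNING: destructive for vSAN data.
esxcli vsan cluster leave
```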
Over-commitment of resources is a well-known feature of vSphere that allows you to use the available physical resources as efficiently as possible, resulting in a potentially higher consolidation ratio (number of VMs per ESXi host). This feature is especially interesting with regard to CPU resources, as CPU is a type of resource that has a very low average utilization in many server environments. Over-commitment of CPU allows you, for example, to configure a number of VMs on a host with a total of, let’s say, 50 virtual CPUs (vCPUs) where the specific host only has 16 physical cores available. This example is based on a general best practice of allowing a 3-to-1 over-commitment ratio (3x as many vCPUs configured as available physical cores). Sometimes you might want to reduce this (if you have very CPU-intensive workloads running on your hosts) or you could even decide to allow a higher over-commitment ratio of 5-to-1 (for workloads that use relatively little CPU).
DRS (Distributed Resource Scheduler) is a feature of a vSphere cluster that makes sure that all workloads (VMs) running on the hosts in that cluster are provided with the resources they need. Balancing the load within the cluster is done by using vMotion to migrate VMs from hosts with relatively few free resources to hosts where resources are more plentiful.
Starting with vSphere 6.5 a new setting is available in DRS that allows you to configure the allowed CPU over-commitment ratio. If you enable this feature, you can configure a setting of up to 500% (a 5-to-1 over-commitment ratio).
Now … how does this work, and does it have any impact on availability, you may ask? To find out I created a little vSphere 6.5 cluster with two ESXi hosts of 2 cores each, so a total of 4 cores. I also configured HA without admission control enabled (so from an HA perspective it would allow me to start as many VMs as I liked) and then I configured DRS with this new feature enabled and the over-commitment ratio set to 50%.
This would allow me to use a maximum total of 4 x 50% = 2 virtual CPUs. So I started my first VM, which is configured with 2 vCPUs … no problem.
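The capacity check DRS performs here boils down to simple arithmetic. A minimal sketch of that calculation for the lab cluster above (this is my own re-creation of the math, not DRS code):

```shell
# Hypothetical re-creation of the DRS admission check:
# total physical cores in the cluster multiplied by the configured
# over-commitment percentage gives the vCPU budget.
cores=4        # 2 hosts x 2 cores each
ratio_pct=50   # DRS CPU over-commitment setting (50%)

max_vcpus=$(( cores * ratio_pct / 100 ))
echo "vCPU budget: $max_vcpus"   # prints "vCPU budget: 2"

# A 2-vCPU VM fits the budget exactly; powering on a third vCPU
# would push the total to 3 and be rejected by DRS.
```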
Then I started a second VM with only a single vCPU configured (which would bring the total of actively used vCPUs to 3). As we would expect, DRS prevents this and gives us an error message to reflect it:
So this seems like a great feature to prevent you from powering on too many VMs and to make sure that the running VMs get enough CPU resources to reach an acceptable performance level. But what happens when a host goes down? As this is a cluster-level setting, having only one host left in my lab cluster results in 2 physical cores being available, and with an over-commitment level of 50% this would mean I could only use 1 vCPU … Well, let’s find out. First I have my 2-vCPU VM running on host esxi65a.
Then I powered off this host. Since I have HA configured, I assumed it would take care of automatically restarting the VM on host B. But what about the amount of available CPU resources? I need 2 vCPUs for this VM and DRS would allow only 1 (only 2 physical cores left and over-commitment set to 50%). Well, it appears that this is not a problem, since HA does its job as we would expect:
So we don’t need to be afraid that using this DRS setting will affect our level of availability. We DO need to be careful, however: after a host failure (or during a maintenance window) the amount of available cluster resources is reduced, so starting additional workloads might result in unexpected failures (in which case you could temporarily disable this feature again or set it to a higher ratio).
During the VMware courses I teach I often get questions about the way to get certified and stay certified as a VCP. This blog post will try to explain your options.
First of all you need to be aware that several VCP certifications exist. The “classic” VCP (which focuses on vSphere) is called VCP-DCV nowadays (DCV being short for DataCenter Virtualization) and for those focusing on other VMware product lines additional certifications exist (specifically VCP-DTM for Desktop and Mobility, VCP-NV for Network Virtualization and VCP-CMA for Cloud Management and Automation).
Although these other certifications do not focus on vSphere, they still require candidates to have at least a solid base knowledge of vSphere. Therefore VMware has created a Foundation exam that every individual who wants to earn their first VCP certification (of any type) needs to pass in addition to the specific VCP exam.
Recently VMware introduced the VMware Certification Manager website. Linked to your MyLearn account, this portal gives you a very clear overview of your existing certifications (with expiration dates), the history of exams you have taken in the past, and a list of any expired certifications.
The portal also gives you access to logo material related to your certification status, allows you to create .pdf versions of your certifications, and lets you create transcripts which you can share through several social media channels (like sharing one on your website … check out my certification transcript for example).
Finally, you can check out any new VMware certifications you would like to pursue and the paths you can (or need to) take to achieve them.
This week Cisco introduced its hyperconverged infrastructure (HCI) solution called HyperFlex (aka the HX Data Platform). The solution is a combination of Cisco UCS hardware (both server and networking components), VMware vSphere as the hypervisor layer, and the Springpath Data Platform software as the (converged) storage layer.
The latter is a relatively new player in the HCI market that only recently came out of stealth (I wrote about it last year). Since the Springpath software currently only supports VMware, both HX models that were announced come with ESXi pre-installed. In the future Springpath is expected to support other hypervisors as well (Hyper-V and KVM have already been mentioned), so other HX models will probably become available too.
Although based on the existing UCS hardware, the Cisco HyperFlex solution exists only as a completely pre-configured system; it is not possible to “build your own” HyperFlex system. Of course, with a combination of Cisco UCS, VMware vSphere and Springpath you can create a system that is very similar to the pre-built configurations, but the advantage of the Cisco HyperFlex solution is that you only need to deal with a single support contact. Also, by supporting only the pre-built configuration, Cisco is better able to guarantee performance levels. This approach is similar to that of Nutanix, which basically sells a software product, but only packaged as a solution with server and storage components.
Cisco differentiates itself, however, by also including the networking stack in the solution. Again, this is mainly an advantage with regard to ease of support, as I suspect that in many environments where HCI is installed, the networking part is also handled by Cisco components.
Some time ago I came across a neat little “Fling” on the VMware Labs site (a collection of tools built by VMware engineers) called the ESXi Host Client. When installed on your stand-alone ESXi host, it allows you to manage the host through a web browser instead of having to use the traditional vSphere Client. Although the status of this piece of software is “Tech Preview” (so it is not officially supported by VMware for production environments), it is already very complete and easy to use. Recently the host client presented me with a popup suggesting I check the fling website for a newer version … and there was one. So I went ahead, downloaded the software and installed it. In this case I first removed the old version by logging in to the ESXi console and using the command:
esxcli software vib remove -n esx-ui
The procedure to install the host client is pretty straightforward and is described below.
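In short, the install comes down to copying the downloaded VIB to the host and installing it with esxcli. A sketch from the ESXi shell; the VIB path below is an example and the actual filename will differ per fling release:

```shell
# Install the host client VIB (absolute path required by esxcli).
# The path /tmp/esxui.vib is an example; use the file you downloaded.
esxcli software vib install -v /tmp/esxui.vib

# After installation the host client is reachable in a browser
# at https://<host-ip>/ui
```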
This week was an exciting week for VMware Virtual SAN enthusiasts (of which I am one). I’m looking forward to checking out the new features and functions as they become available with version 6.2 (VMware stated this would be by the end of the quarter). With these features Virtual SAN becomes quite a mature storage solution, comparable in feature set to many traditional (SAN/NAS) midrange storage systems. Among the features that will become available with the core product are:
Checksumming … making data integrity more robust
IOPS limits per object … improving storage-based QoS
Deploying thin swap objects … decreasing the overhead required for swapping (which you would want to prevent anyway)
Improved Virtual SAN management capabilities in the WebClient … removing the need for additional (RVC based) tools