So, I’ve been working with Rubrik for more than a year and so far it’s been quite a journey. All in all, everything has been very positive and it’s nice to work with a vendor who really listens. Lately though, we’ve been having a few issues regarding scalability and session concurrency that I wanted to share.

Sessions

So, first off: Rubrik has a limit of 10 active sessions per user account. This might not seem like a big issue, but since I do pretty much everything via the Rubrik API, it actually became an issue for me. Let me explain. I have a lot of vCenters (think 50+ and growing rapidly). Some of the backend jobs that I’ve written run on a per-vCenter basis. As an example, I have a cleanup script that is responsible for monitoring all jobs for a given vCenter. That job was configured to run against all vCenters within a 10-minute window. I had only created a single service account for all of my vCenters, so naturally, when the jobs ran, the first 10 would log in just fine and start processing. When the 11th began, it would make Rubrik kill session number 1, and that could very well cause the job that session 1 was handling to fail.

This limit isn’t documented anywhere, but if you write to support, they can increase it.
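
To make it concrete, here’s a rough sketch of how one of those per-vCenter jobs could at least be a good citizen about sessions: log in once, do its work, and explicitly kill the session afterwards so it doesn’t linger against the cap. The endpoint paths and response fields below are assumptions based on the v1 REST API, so verify them against your own cluster’s API documentation before copying anything.

    import requests

    RUBRIK = "https://rubrik.example.com"          # hypothetical cluster address
    USER, PASSWORD = "svc-vcenter-cleanup", "***"  # dedicated service account for this job family

    def run_cleanup(vcenter_name):
        # Log in once per job run (POST /api/v1/session is an assumption, check your API docs).
        resp = requests.post(f"{RUBRIK}/api/v1/session", auth=(USER, PASSWORD), verify=False)
        resp.raise_for_status()
        token = resp.json()["token"]
        headers = {"Authorization": f"Bearer {token}"}
        try:
            # ... the actual per-vCenter monitoring/cleanup work goes here ...
            pass
        finally:
            # Explicitly close the session so it doesn't keep counting against the
            # 10-active-sessions-per-account limit while the next jobs start up.
            requests.delete(f"{RUBRIK}/api/v1/session/me", headers=headers, verify=False)

That alone doesn’t help if more than 10 jobs genuinely overlap, of course; that’s where getting the limit raised (or splitting the jobs across more service accounts) comes in.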

Concurrent backups per ESXi host / concurrent tasks per Rubrik node

Our current installation handles somewhere around 2000 backups a day. Recently though, we started seeing a lot of missed backups every day. When researching this, it quickly became evident that the cluster had a limitation of 3 concurrent backup tasks per Rubrik node: we have 20 nodes, and the cluster was constantly peaking at 60 running tasks every night. That number stayed constant throughout the backup window.

After talking with their support, it was explained to me that they have the following default settings:

  • Concurrent backups per ESXi host: 3
  • Concurrent tasks per Rubrik node: 3

They can tweak both values if asked nicely. In my case, they ran some analysis on the cluster to see the activity and then changed the concurrent tasks per node to 4. This number will probably be tweaked a bit more for our installation, because it is quite worrisome that a cluster with 20 nodes, 1.2 TB of memory and 160 CPU cores is only able to handle 60 concurrent backups by default.

Wow, long title. But first, a little introduction.

So I’ve said this numerous times to many people at Veeam, but it hasn’t really changed much, and why should it? Veeam hasn’t really had any viable competition until recently. I mean, there are many vendors out there who do kind of the same things, but either they hadn’t solved the things that were wrong with Veeam, or they had some giant showstopper of sorts. A perfect example for me is TSM (now Spectrum Protect) for Virtual Environments. They had solved Veeam’s scaling issues, but with a giant caveat: the job had to be defined in a single 256-character string containing the retention, exclusions and inclusions of VMs, clusters and the like. Okay, I got a little sidetracked; back to the subject at hand.

So what is the issue with Veeam?

To explain this, let’s skip backwards two years or so. Back to the good old days before ReFS was there to save the day and make our backup merges fast. Back when there was no way to scale beyond a single repository other than making multiple jobs. Back when you couldn’t restore a VM from a job while a merge process was going on. Back when the only way to efficiently manage all this was by having a bunch of jobs with a bunch of repositories, and then having some scripts running to evenly distribute things. Combine that with the requirement to build a platform that scaled to ingesting 3-4000 VMs every night while also being able to boot roughly 20% of them the next morning if a storage array decided to have a bad day. You see where I’m going with this.

Back then, I remember starting to read about version 9.0, and I remember thinking that Scale-out Backup Repositories would solve my issues with scaling the storage backend and that the Per-VM Backup Chain would solve my issues with locking when doing merges. The only thing left was to find a way of scaling the underlying storage infrastructure to cope with the merge processes and the potential booting of 20% of the VMs. That was solved pretty easily: we bought ScaleIO. Moving on.

So then what happened? Veeam decided that those features were to be placed in the Enterprise Plus package. For those of you who don’t know the Veeam service provider program, that represents a roughly 3x increase in price, making the entire thing much too expensive. I only needed the Standard package, since my use case was simple: I just wanted an image-level backup of my VMs every night, nothing fancy.

So what is then the actual issue?

If you’re thinking scaling, you’re wrong. As of 9.0, Veeam fixed that issue, and as of Windows Server 2016 with ReFS (with the initial bugs fixed) it wasn’t really that difficult. The actual issue is Veeam’s inability to distinguish between features that provide value to the end customers and features that provide me, the user/administrator, with the ability to efficiently manage and scale the platform.

So let’s ask the questions.

  • Will my customer accept a 3x increase in cost per VM (before storage consumption) because I need to scale the platform? No.
  • Am I willing to take on the extra cost per VM in order to save some time on managing the infrastructure? No. Partly because I would lose a bunch of money per VM, but most importantly because the features are simply not worth it.

So what should Veeam have done?

Well, service providers are different from enterprises. VMware knows this, so they created separate packages with different feature sets for us. Veeam, on the other hand, hasn’t really caught on to this (at least not officially). What I said to my local rep was that some features need to be given away for free in order to facilitate growth and to keep the customers. SOBR and the Per-VM Backup Chain are perfect examples. These are things that provide no value to the end users whatsoever, so why make them an issue for the middleman (the service provider)?

It is in Veeam’s interest to give their service providers a platform that is easily scalable so that we can focus on the main goal: protecting our customers (preferably with Veeam products, from Veeam’s point of view). Veeam didn’t agree with this, so that forced us to look elsewhere.

Look elsewhere?

So as I said in the beginning, Veeam didn’t really have a lot of competition, at least not from products that were looked at as being smarter than Veeam. We looked at a few of them and still stuck with Veeam because it was easy and fit our use case well.

But then, around two years ago, Rubrik launched. Some months later, Cohesity came out. Both of them offered great solutions that fit our needs: simple to manage, simple to scale, built entirely around APIs from the ground up, and many other nice things. We ended up going with Rubrik; why is a subject for another blog post.

The thing that I wanted to point out here is that if Veeam had “just” fixed the issue, we probably wouldn’t have looked at the other options out there as seriously, which would have meant that our primary image-level backup platform would have been Veeam.

If you boil it down, the facts are that we’re currently backing up roughly 2000 VMs a day on the Rubrik platform, which means that Veeam is missing out on 2000x “the price per VM per month”, which adds up to a lot of money. We still haven’t moved a single managed Veeam customer to the platform (due to missing features from Rubrik which will arrive in Q4 2017), and when that happens, the number of VMs a month will go towards 4000 pretty quickly. Worse yet, Veeam allowed a competitor to get hold of a pretty valuable reference. They could probably do without the money, but reference customers tend to be the basis of getting new customers.

Anyway, just my two cents.

Let’s start out with a brief introduction for those who don’t know. The VMware vCloud Air Network program is the way that service providers can take whatever product VMware has to offer and sell it to their customers in whatever way they see fit. Everything is 100% usage based and there is no commitment up front. Most things are billed on a per-entity basis: some per GB of allocated memory, some per GB of allocated storage, others per VM. It’s pretty easy to figure out and get started with.

The way they calculate the cost of a product is by saying that this feature set is worth X points per entity (GB of memory / GB of storage / VM), and then it’s simple math from there on out. So let’s say that you use a product with a certain feature set that is worth 9 points per GB of storage and you consume 100GB on average over the course of a month; you then owe VMware 900 points. Now, depending on how good a customer you are (your size), you pay a certain amount of money per point. As a rule of thumb, 1 point = 1 USD. So again, pretty simple and so far so good.
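
If you want it in code form, the math really is about as simple as it sounds; the 1-point-to-1-USD rate below is just the rule of thumb from above, not what anyone actually pays:

    def monthly_points(points_per_gb, avg_gb_consumed):
        # Feature set worth X points per GB, averaged consumption over the month.
        return points_per_gb * avg_gb_consumed

    def monthly_cost_usd(points, usd_per_point=1.0):
        # Rule of thumb: 1 point ~ 1 USD; the real rate depends on your size/agreement.
        return points * usd_per_point

    points = monthly_points(points_per_gb=9, avg_gb_consumed=100)  # 900 points
    print(monthly_cost_usd(points))                                # ~900 USD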

All of this is measured by the worst tool ever invented: the VMware Usage Meter. I could rant about that product for days, but that is hardly constructive. Through this tool you then create a report every month which you send to your aggregator, who then sends you an invoice back.

Now, since this blog is about how to improve the program I won’t start sharing my thoughts on the price point of the individual products. That is for a later blog (if ever).

My issue with the program

So, this all sounds nice and flexible for everyone, and for most of the last few years, it has been. The issue is that per-VM features that bring value to customers (as opposed to service providers) are starting to get more and more focus. Also, customers are paying less and less while expecting more and more; the whole idea of paying for the option to use something is not going to fly any more. As for features that provide value to customers, an example could be VM-level encryption. That is an easy sell to any customer who has the requirement. The issue is that that feature is in the Enterprise Plus version of ESXi, and that version is usually only included in the 7-point/GB or higher packages. Any service provider that standardizes on anything lower than that has an issue, because the only way to provide the feature is to upgrade an entire cluster to a higher license (thus costing a lot more money) or choose not to provide the feature (thus costing VMware revenue). My point here is that the packages are focused on entities that are far too large, typically the cluster level. Compare that with VMware going to great lengths to be able to define everything per VM, per NIC or per hard disk, and you can quickly see that something is off.

So what can be done?

Again, this requires some work, but it should be very doable since they already implemented some of this in the Usage Meter for NSX. The extremely simple answer is that there should only be a single license and it should include everything. Also, the entry point should be lower.

Before I go into example mode, you should know that the lowest you can go within the VCAN program is a 5-point/GB package. So a VM with 6GB of memory would cost 30 points a month, or 15 if the memory is not reserved (you can divide the points by 2 in that case). That covers the basics (Storage vMotion, vMotion, HA, DRS and the Distributed Switch).

So, my proposal would be something like this:

Host/Cluster Basics

1 Point: the VM is on a non-clustered host and powered on.

2 Points: the VM is on a clustered host with DRS but without HA.

3 Points: the VM is on a clustered host with HA and DRS.

From here on out, there are feature usages that can trigger a higher cost for the month if used.

+2 Points: the VM is Storage vMotioned to another datastore, the VM is on a port group defined on a Distributed Switch, the VM is on a vVol, or the VM has vFlash Read Cache enabled.

+4 Points: the VM is on a datastore in a datastore cluster with Storage DRS enabled, or the VM has VM-level encryption enabled.

Features like tags, MPIO, VADP etc. are things that should just be included no matter what. It’s 2017; these things are a given by now and not a differentiator.
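
To make the proposal a bit more concrete, here is a rough sketch of how that per-VM metering could look. The point values are just my proposal from above, nothing official, and I’m reading my own list as “each feature actually used during the month adds its own surcharge”:

    # Base points for where the VM lives (proposed values, not VMware's).
    BASE_POINTS = {
        "standalone": 1,   # non-clustered host, powered on
        "drs_only":   2,   # clustered host with DRS but no HA
        "ha_and_drs": 3,   # clustered host with HA and DRS
    }

    # Surcharges for features actually used by the VM during the month.
    FEATURE_POINTS = {
        "storage_vmotion":    2,
        "distributed_switch": 2,
        "vvol":               2,
        "vflash_read_cache":  2,
        "storage_drs":        4,
        "vm_encryption":      4,
    }

    def vm_points_for_month(base, features_used):
        # Charge for the cluster basics plus each feature the VM actually used.
        return BASE_POINTS[base] + sum(FEATURE_POINTS[f] for f in features_used)

    # Example: an HA/DRS VM that was Storage vMotioned and sits on a vVol.
    print(vm_points_for_month("ha_and_drs", {"storage_vmotion", "vvol"}))  # 3 + 2 + 2 = 7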

This is just me trying to come up with something off the top of my head. But the point remains: having costs defined for all VMs based on the feature set defined by the cluster doesn’t make sense. We are all willing to pay for the features, but only if they are used, and that use should be measured per VM. The days of paying for something just so you have the option to use it are over. The customers aren’t willing to accept those terms, so why should the service provider be subject to them?

Why go lower than 5 points?

Well, two things. First off, 5 points is just too much compared to what you are getting; not all VMs require DRS or HA. Most service providers by now have a low-end offering which they use to get customers on board and then grow them from there. If you want to be able to lure in customers that would otherwise buy from the likes of DigitalOcean, the base license alone would make it nearly impossible. We have more than 1000 VMs on a KVM-based platform for that reason, and there is really no reason other than cost for that. If VMware offered a platform with the same (basic) feature set at a similar cost, moving those to ESXi wouldn’t be that far-fetched.

After all, hypervisors are getting more and more commoditized (maybe not as fast as many expected), and if your customers are going to replace your product with a cheaper one with fewer features, isn’t it better to provide your own product with a lesser feature set and keep the business?

I’ve sat through a few VMUG sessions, WebEx presentations and sales pitches from newer, “cooler” backup vendors (you know who you are) who have made the argument that proxies are the enemy: they are up to no good and provide nothing but complexity and annoyance for anyone who manages them. This point usually came up around the time they showed a PowerPoint slide comparing their solution to the other vendors out there, like IBM (with Spectrum Protect for Virtual Environments) or Veeam.

And yes, I get that for the average enterprise, the need to deploy a few proxies (or many) can be annoying, but my guess is that it’s mostly because of the cost and the fact that getting them deployed by someone can take anywhere from days to weeks (depending on how siloed the organization is). Most of that is not really the proxies’ fault, though.

Personally, it takes me around 2 hours to deploy a Veeam setup from start to finish. That includes creating networks, 3 VMs (console, repository, proxy if needed) and mapping some storage to the repository VM. So hey, not that big of a deal.

But why are they not the enemy? All I’ve said so far is that they are not that bad. Well, for me the proxies provide a lot of flexibility, purely networking-wise. They enable restores directly to the VMs from my repository. But why is that? To explain it, you have to understand how our products are built.

I won’t go into too much detail about it here, but the short and sweet version is that my customers’ VMs are not able to reach or talk to the infrastructure that they are running on top of. So in short, a VM is not able to reach the Veeam infrastructure directly, and the Veeam infrastructure is not able to reach the VMs. This is not due to firewall limitations; it’s due to them being on separately routed networks where we don’t decide what IP space they use, which means a lot of overlapping address space. All of this is by design. So as you can see, restores would be difficult if proxies weren’t around.

What the proxies do for me is provide a machine that I can dual-home (i.e. give 2 NICs). I put the gateway on the customer’s routed network and then add a few (3) static routes to the things that the proxy needs to reach on my network (the VMware infrastructure and the Veeam infrastructure), and then I’m good and the customer can do his restores as he pleases. This does require the customer to forfeit the use of 3 /24 subnets, but that has never been an issue.
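
To illustrate (all addresses below are completely made up), a dual-homed proxy ends up looking something like this:

    NIC 1 (customer-routed network):  10.200.0.10/24, default gateway 10.200.0.1
    NIC 2 (provider network):         172.16.50.10/24, no default gateway
    Static routes pointing out of NIC 2:
      172.16.10.0/24 via 172.16.50.1   -> VMware infrastructure
      172.16.20.0/24 via 172.16.50.1   -> Veeam repository
      172.16.30.0/24 via 172.16.50.1   -> Veeam console/management

The three /24s pointing back at my side are what the customer has to keep out of his own addressing plan.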

Rubrik has asked a few times if I had plans to begin using their agents to do in-guest backups. I’ve said no every time and given them a detailed explanation as to why. For the customers I have who use Veeam as a way of backing up application-level items (or just file-level), the proxies are a critical component, and as such, those customers cannot be moved to a competing product until that product has the same functionality.

So that’s why I think that proxies are great. They provide a lot of flexibility. A lot more than annoyance and complexity if you ask me 🙂

So there are a lot of different constructs within ScaleIO that might make you relate to the picture above. However, do not fear. Let’s go through the most critical ones, those that you simply must know in order to configure a system.

ScaleIO Data Server

I touched on this one in a previous post as well, but the SDS is the component into which you map block devices that can then be pooled and mapped back to the clients.

ScaleIO Data Client

As the name implies, this is the client to which you map volumes from within ScaleIO.

Metadata Manager

The most critical component. There can be either 1, 3 or 5 of these depending on the size of the cluster, but given that the minimum number of SDSs is 3, I see no reason to ever have fewer than 3. It’s very common to have the first 3 SDSs also be the MDMs. These are responsible for knowing where all your storage is at any given time. Lose more than half of these servers and you will be offline. Lose all of them permanently and you’d better pray that you didn’t just download the free and frictionless edition and put it into production, because you will need to contact support in order to get things back online again. The MDMs do take a backup automatically, so there should always be a way back from this.

Protection Domain

This is the logical grouping of servers that will protect each other’s data (provided they have disks in the same storage pool). You can add nodes to a protection domain and then create storage pools within it. After that you can add disks to the storage pool, or you can introduce an extra layer called a fault set in which you can add nodes and disks.

A single protection domain can have many storage pools, and a storage pool can hold disks from many nodes. The current maximum is 300 disks in a single storage pool.

Storage Pool

This would then be your group of similar disks. And when I say similar disks, I’m talking about speed. You don’t really have to care about the size of the disks; you have to care about the size of the node in comparison to all the other nodes with disks in that storage pool. It’s a perfectly valid use case to have different sizes of nodes; you just have to set the spare capacity so that it’s larger than the largest node. Am I saying that you shouldn’t try to have homogeneous nodes? Absolutely not (personal preference).

It’s also here that you configure the spare percentage, IO priority, RAM read cache, checksums and a bunch of other stuff.
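
As a quick example of the spare capacity rule for mixed node sizes (the numbers are made up purely for illustration):

    # Made-up numbers to illustrate the "spare must cover the largest node" rule.
    node_capacities_tb = [9.6, 9.6, 9.6, 9.6, 19.2]  # one node is twice the size of the rest
    total_tb   = sum(node_capacities_tb)             # 57.6 TB raw in the storage pool
    largest_tb = max(node_capacities_tb)             # 19.2 TB

    # The spare percentage has to be at least this big to survive losing the largest node.
    min_spare_pct = largest_tb / total_tb * 100
    print(round(min_spare_pct, 1))                   # 33.3 -> set the spare capacity above that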

Fault Set

Well, you’ve now deployed a system across two racks and you want to make sure that copy A is in rack A and copy B is in rack B. Too bad, you can’t, because this has to be set up before putting data on the system. But if you were planning to build a new system, this would be the concept you’d need to achieve that requirement. It can also be used for other things, like getting around the maximum size of a node (yep, per node, not per physical server).

Order of doing things?

So, let’s put things into order.

  1. Build the MDM Cluster
  2. Build the protection domains
  3. Build the storage pools
  4. Build the fault sets (optional)
  5. Add in the SDSs
  6. Add the drives

So, this may come as a shocker to you, but there are actually quite a few people around the world who don’t know what ScaleIO is, even though prominent people like Chad Sakac have mentioned it a myriad of times on their blogs. But just to make sure that everyone knows, here is another introduction.

ScaleIO is a Software Defined Storage product from EMC. Its primary focus is performance, and thus it’s actually quite light on features compared to many other products out there. So if you’re looking for something that does replication, look the other way. If you’re looking for something that does deduplication, look the other way. If you’re looking for something that does anything other than block, look the other way. If you’re looking for something that does compression… you get the point. If, on the other hand, you’re looking for something that can be deployed on almost anything, is incredibly performant and will tolerate a giant beating before letting you down, ScaleIO might be something for you.

Components

A ScaleIO-system consists of a few components:

  • SDC: ScaleIO Data Client
  • SDS: ScaleIO Data Server
  • LIA: Light Installation Agent
  • MDM: Metadata Manager
  • ScaleIO GUI
  • ScaleIO Gateway

There are a few more than these, but I’m skipping them since they are of no interest to me 🙂

SDC

So as you can probably guess, the SDC is a component that needs to be installed on whatever OS you want to map some storage to.

SDS

The SDS is installed on all the servers which make up the “storage array”, so basically all the servers that have spare capacity which you then want to present back to the SDCs. And yes, an SDS can be an SDC as well (think HCI, for instance).

LIA

This is the agent that is installed on all nodes (MDM, SDC and SDS) in the ScaleIO system. It is used when you want to do something on the endpoints, e.g. collect logs or upgrade the system to a newer release.

MDM

These babies hold the keys to the castle. They contain information on where all data is at any given point. You would typically install the MDMs in a 3-node active/passive/witness cluster or a 5-node active/passive/passive/witness/witness configuration. The MDMs can be standalone machines or can be installed on the SDSs. They can be deployed as masters or tie-breakers.

GUI

The management interface for ScaleIO. I won’t say too much about this; a picture does a better job.

[Image: the ScaleIO management GUI]

Gateway

An application that you would typically install on a server separate from the SDS/SDC/MDM servers, because this component is used for deploying, reconfiguring and updating the system.

[Image: the ScaleIO Gateway]

Scale

Probably much more than you will ever need, but here are a few pointers.

  • Min/Max ScaleIO Installation Size: 300GB to 16PB
  • Individual Device Size: 100GB to 8TB
  • Volume Size: 8GB to 1PB
  • Max SDS Size: 96TB
  • Max SDSs per System: 1024
  • Max SDSs per Protection Domain: 128
  • Max Disks per Storage Pool: 300

Deployment Options

ScaleIO can be deployed in 3 different ways from a purely architectural point of view.

  • New and fancy HCI-mode. SDS + SDC on the same node.
  • Two-Layer mode. If you want to build it like you would any traditional storage system (although it doesn’t share anything with a traditional system other than the deployment model).
  • Hybrid. Some nodes have both SDS + SDC, some only have SDS, others only SDC. Giant mashup.

Deployment Example

So now you’re thinking: “okay, fair enough.. but how are all these deployed into a storage system?”

To give you an example, let’s say that you have 7 servers on which you want to deploy ScaleIO. A deployment could look like this:

  • Server 1
    • MDM
    • SDS
    • LIA
  • Server 2
    • MDM
    • SDS
    • LIA
  • Server 3
    • MDM
    • SDS
    • LIA
  • Server 4
    • MDM (TB)
    • SDS
    • LIA
  • Server 5
    • MDM (TB)
    • SDS
    • LIA
  • Server 6
    • SDS
    • LIA
  • Server 7
    • SDS
    • LIA

What this gives you is a 5-node MDM cluster managing a 7-node SDS cluster, all with LIAs installed so that everything can be patched and managed from a single ScaleIO Gateway.

Wrap Up

So this was the extremely quick 40,000-foot view of ScaleIO. If asked about it, I would characterize it like this:

  • True software defined storage.
  • Very flexible.
  • High performance.
  • Very durable.

But I Want To Know More!

Might I then recommend reading the architecture guide (https://www.emc.com/collateral/white-papers/h14344-emc-scaleio-basic-architecture.pdf) or the user guide (https://community.emc.com/docs/DOC-45035). Both are very good and very detailed.