NimbleStorage Blog – Ajay Singh, Vice President, Product Management

Welcome to the (Multi-Core) CPU Era
Wed, 04 Sep 2013

Hello Multi-Core CPUs (Finally)

Battling the laws of physics and Moore’s “law”, chip manufacturers like Intel and AMD have over the past decade shifted their focus from speeding up CPU clocks to increasing CPU core density. This transition has been painful for storage industry incumbents, whose software stacks had been developed in the 1990s to be “single threaded”, and therefore have needed years of painstaking effort to adapt to multi-core CPUs. Architectures developed after about 2005 have typically been multithreaded from the beginning, to take advantage of growing core densities.

One of the more interesting aspects of recent vendor announcements is just how long it has taken storage industry behemoths to upgrade their products to accommodate multi-core CPUs – judging by all the hoopla, these are big engineering feats. Even a fully multi-threaded architecture like Nimble Storage’s CASL (Cache Accelerated Sequential Layout) needs software optimization whenever there is a big jump in CPU core density, in order to take full advantage of the added horsepower. The difference is that for Nimble these are maintenance releases (not announcement-worthy major or minor releases). We routinely deliver new hardware with the needed resiliency levels, and then add a series of software optimizations over subsequent maintenance releases to squeeze out significant performance gains.

Here’s an example of what’s been accomplished in the 1.4.x release train over the last several months. Nimble customers have particularly appreciated that these performance improvements were made available as non-disruptive firmware upgrades (no downtime), and did not require any additional disk shelves or SSDs (more on how we do this below).

Beyond Multi-Threading

Multi-core readiness is nice, but there is an even more fundamental distinction between architectures. Using CPU cores more efficiently in a “spindle-bound” architecture still leaves it a spindle-bound architecture. In other words, if your performance was directly proportional to the number of high-RPM disks (or expensive SSDs), improved CPU utilization and more CPU cores may raise the limits of the system, but will still leave you needing exactly the same number of high-RPM disks or SSDs to achieve the same performance. So meeting high random IO performance requirements still takes an ungodly number of racks full of disk shelves, even with a smattering of SSDs.
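To make the spindle-count math concrete, here is a rough back-of-the-envelope sketch. The per-device random-IOPS figures and the 50,000-IOPS target are generic assumptions for illustration, not measurements of any particular product.

```python
# Rough sizing sketch: how many devices a spindle-bound design needs to hit a
# random-IOPS target. Per-device figures are generic assumptions.
TARGET_RANDOM_IOPS = 50_000

ASSUMED_IOPS_PER_DEVICE = {
    "15K RPM HDD": 180,
    "7.2K RPM HDD": 75,
    "commodity SSD": 10_000,
}

for device, iops in ASSUMED_IOPS_PER_DEVICE.items():
    count = -(-TARGET_RANDOM_IOPS // iops)  # ceiling division
    print(f"{device}: ~{count} devices for {TARGET_RANDOM_IOPS:,} random IOPS")
```

In a spindle-bound design, adding CPU cores changes none of these counts; only adding devices does – unless the architecture moves the bottleneck to CPU and flash, as described next.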

The Nimble Storage architecture actually does something very different: it takes advantage of the plentiful CPU cores (and flash) to transform storage performance (IOPS) from a “spindle count” problem into a CPU core problem. That allows us to deliver extremely high performance levels with very few low-RPM disks and commodity flash, and to achieve big gains in price/performance and density to boot (in many cases needing one-tenth the hardware of competing architectures).

Similarly, the availability of more CPU cores does little to fix other fundamental limitations in some older architectures, such as compression algorithms that are constrained by disk IO performance, or heavy-duty snapshots that carry big performance (and in some cases capacity) penalties.

Thinking Outside the (Commodity) Box

It’s always good to raise the performance limits of the storage controllers, because this allows you to deliver more performance from a single storage node (improving manageability for large scale IT environments). Here again though, many (though not all) of the older architectures have a fundamental limitation – they can only use a limited number of CPU cores in a system if they want to leverage commodity hardware designs (e.g. standard Intel motherboards). A solution to this problem has been known for a long time – a horizontal clustering (scale-out) architecture. This approach allows a single workload to span the CPU resources of multiple commodity hardware enclosures, while maintaining the simplicity of managing a single storage entity or system. In the long run, this is the capability that allows an architecture to scale performance cost-efficiently without expensive, custom-built refrigerator-like hardware. This can also allow multiple generations of hardware to co-exist in one pool, with seamless load balancing and non-disruptive data migration, thus eliminating the horrendous “forklift” upgrades one of these recent vendor announcements promises to put their customers through.
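As a purely illustrative sketch of the scale-out idea – and emphatically not a description of any vendor's actual clustering code – hash-based placement is one simple way a single logical system can spread data slices across multiple commodity nodes:

```python
# Illustrative only: spread data slices across a pool of commodity nodes so
# one logical system spans many enclosures. Real scale-out systems typically
# use consistent hashing or a placement directory so that adding a node moves
# only a small fraction of the data.
import hashlib

def owning_node(slice_id: str, nodes: list) -> str:
    digest = int(hashlib.sha256(slice_id.encode()).hexdigest(), 16)
    return nodes[digest % len(nodes)]

nodes = ["node-a", "node-b", "node-c"]   # mixed hardware generations can coexist
for s in ("vol1/slice0", "vol1/slice1", "vol2/slice0", "vol2/slice1"):
    print(s, "->", owning_node(s, nodes))
```

The point is managerial as much as technical: the workload sees one pool, while capacity and CPU scale by adding enclosures rather than by forklift replacement.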

So, while congratulations are in order to the engineering teams at our industry peers who worked so hard on making aging architectures multi-core ready, this is just a small step forward while the industry has moved ahead by leaps and bounds.

Are All Hybrid Storage Arrays Created Equal?
Ajay Singh, Vice President, Product Management – Tue, 09 Oct 2012

Nimble Storage was founded in early 2008 on the premise that hybrid storage arrays would be the dominant networked storage architecture over the next decade – a premise that is now widely accepted. The interesting question today is, “Are all hybrid storage arrays created equal?” After all, SSDs and HDDs are commodities, so the only factor setting them apart is the effectiveness of the array software.

How does one compare hybrid storage arrays? Here are some key factors:

  1. How cost-effectively does the hybrid storage array use SSDs to minimize costs while maximizing performance?
  2. How cost-effectively does the hybrid storage array use HDDs to minimize costs while maximizing useable capacity?
  3. How responsive and flexible is the hybrid array at handling multiple workloads and workload changes?
  4. Aside from price/performance and price/capacity, how efficient is the array’s data management functionality (such as snapshots, clones, and replication)?

This blog will cover the first three. The fourth dimension of efficient data management is a very important factor in evaluating storage arrays, and a topic we’ll cover in detail in a future blog post.

How cost-effectively does the hybrid storage array use SSDs?

Most hybrid storage array architectures stage all writes to SSDs first in order to accelerate write performance, allowing data that is deemed less “hot” to be moved to HDDs at a later point. However, as explained below, this is an expensive approach. Nimble storage arrays employ a unique architecture in that only data deemed cache-worthy for subsequent read access is written to SSDs, while all data is written to low-cost HDDs. Nimble’s architecture achieves very high write performance despite writing all data to HDDs by converting the random write IOs issued by applications into sequential IOs on the fly, leveraging the fact that HDDs are very good at handling sequential IO.
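The following toy sketch shows the general write-coalescing idea described above – buffer incoming random writes, then lay them down as one large sequential stripe – using a hypothetical in-memory class; it is not CASL's actual implementation.

```python
# Toy sketch of random-to-sequential write coalescing. Incoming random writes
# are buffered (in a real array, in mirrored NVRAM), then flushed as one large
# sequential stripe that low-RPM HDDs handle well. Hypothetical, illustrative code.

STRIPE_BLOCKS = 8  # flush after this many buffered blocks (illustrative)

class CoalescingWriteLog:
    def __init__(self):
        self.buffer = []        # (logical_block, data) pairs awaiting flush
        self.index = {}         # logical block -> physical location in the log
        self.next_physical = 0  # sequential write head on disk

    def write(self, logical_block, data):
        self.buffer.append((logical_block, data))
        if len(self.buffer) >= STRIPE_BLOCKS:
            self.flush()

    def flush(self):
        # One big sequential write replaces many scattered random writes.
        for logical_block, _data in self.buffer:
            self.index[logical_block] = self.next_physical
            self.next_physical += 1
        print(f"flushed {len(self.buffer)} random writes as one sequential stripe")
        self.buffer.clear()

log = CoalescingWriteLog()
for lb in (901, 7, 4096, 12, 550, 88, 3, 70001):   # scattered logical addresses
    log.write(lb, b"payload")
```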

  1. Write endurance issues demand the use of expensive SSDs.  When SSDs receive random writes directly, the actual write activity within the physical SSD itself is higher than the number of logical writes issued to the SSD (a phenomenon called write amplification). This eats into the SSD lifespan, i.e. the number of write cycles that the SSD can endure. Consequently, many storage systems are forced to use higher endurance eMLC or SLC SSDs, which are far more expensive. In addition to the selective writing capability mentioned above, the Nimble architecture also optimizes the written data layout on SSDs so as to minimize write amplification. This allows the use of lower cost commodity MLC SSDs, while still delivering a 5 year lifespan.
  2. Overheads reduce useable capacity relative to raw capacity of SSDs. Hybrid arrays that can leverage data reduction techniques such as compression and de-duplication can significantly increase useable capacity. On the flip side, RAID parity overheads can significantly reduce useable capacity. Nimble’s architecture eliminates the need for RAID overheads on SSD entirely and further increases useable capacity by using inline compression.
  3. Infrequent decision-making about what data to place on SSDs, and moving data in large chunks, wastes SSD capacity. Most hybrid storage arrays determine what data gets placed on SSDs vs. HDDs by analyzing access patterns for (and eventually migrating) large “data chunks”, sometimes called pages or extents. This allows “hot” or more frequently requested data chunks to be promoted into SSDs, while keeping the “cold” or less frequently requested data on HDDs.
  • Infrequent decisions on data placement cause SSD over-provisioning. Many storage systems analyze what data is “hot” on an infrequent basis (every several hours) and move that data into SSDs, with no ability to react to workload changes between periods. Consequently, they have to over-provision SSD capacity to maintain performance between periods. Nimble’s architecture optimizes data placement in real time, with every IO operation.
  • Optimizing data placement in large data chunks (many MB or even GB) causes SSD over-provisioning. The amount of metadata needed to manage placement of data chunks gets larger as the data chunks get smaller. Most storage systems are not designed to manage a large amount of metadata, so they use large data chunks, which wastes SSD capacity. For example, if a storage array were to use data chunks that are 1GB in size, frequent access of a database record that is 8KB in size results in the entire 1GB chunk being treated as “hot” and moved into SSDs. Nimble’s architecture manages data placement in very small chunks (~4KB), thus avoiding SSD wastage (see the quick calculation after this list).
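A quick back-of-the-envelope calculation of the 1GB-chunk example above, using assumed figures, shows the two-sided trade-off: large chunks drag cold data into flash, while small chunks require far more placement metadata (which the architecture then has to be designed to handle).

```python
# Promotion waste vs. metadata cost at two placement granularities.
# All figures are illustrative assumptions.
KB, MB, GB, TB = 1024, 1024**2, 1024**3, 1024**4

hot_record = 8 * KB          # the frequently accessed database record
tracked_capacity = 1 * TB    # capacity whose placement must be tracked

for chunk, label in [(1 * GB, "1 GB chunks"), (4 * KB, "4 KB chunks")]:
    promoted = -(-hot_record // chunk) * chunk       # whole chunks move together
    cold_dragged_in = promoted - hot_record
    placement_entries = tracked_capacity // chunk
    print(f"{label}: {promoted / MB:10.3f} MB promoted "
          f"({cold_dragged_in / MB:.3f} MB of it cold), "
          f"{placement_entries:,} placement entries per TB")
```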

How cost-effectively does the hybrid storage array use HDDs?

This means assessing the ratio of usable to raw HDD capacity, as well as the cost per GB of capacity. Three main areas drive this:

  1. Type of HDDs. Many hybrid arrays are forced to use high-RPM (10K or 15K) HDDs to handle performance needs for data that is not on SSDs, because of their higher random IO performance. Unfortunately, high-RPM HDD capacity is about 5x costlier ($/GB) than low-RPM HDDs. As mentioned earlier, Nimble’s write-optimized architecture coalesces thousands of random writes into a small number of sequential writes. Since low-cost, high-density HDDs are good at handling sequential IO, this allows Nimble storage arrays to deliver very high random write performance with low-cost HDDs. In fact, a single shelf of low-RPM HDDs with the Nimble layout handily outperforms the random write performance of multiple shelves of high-RPM drives.
  2. Data Reduction. Most hybrid arrays are unable to compress or de-duplicate data that is resident on HDDs (some may be able to compress or de-duplicate data resident on SSDs). Even among those that do, many recommend that data reduction approaches not be deployed for transactional applications (e.g., databases, mail applications, etc.). The Nimble architecture is able to compress data inline, even for high-performance applications.
  3. RAID and Other System Overheads. Arrays can differ significantly in how much capacity is lost to RAID protection and other system overheads. For example, many architectures force the use of mirroring (RAID-10) for performance-intensive workloads. Nimble, on the other hand, uses a very fast version of dual-parity RAID that delivers resiliency in the event of a dual disk failure, allows high performance, and yet consumes little capacity overhead. This can be assessed by comparing usable capacity relative to raw capacity, while using the vendor’s RAID best practices for your application (see the rough worked example after this list).
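Here is a rough worked example of the usable-versus-raw comparison for a single shelf of drives. The drive count, overheads and compression ratio are assumptions for illustration, not any vendor's published specifications.

```python
# Rough usable-capacity comparison for one 12-drive shelf. Overheads and
# compression ratio are illustrative assumptions only.
drives, drive_tb = 12, 3.0
raw_tb = drives * drive_tb

schemes = {
    "RAID-10 (mirroring), no compression":        (0.50,        1.0),
    "Dual-parity RAID, no compression":           (2 / drives,  1.0),
    "Dual-parity RAID + 1.5x inline compression": (2 / drives,  1.5),
}

for name, (protection_overhead, compression) in schemes.items():
    effective_tb = raw_tb * (1 - protection_overhead) * compression
    print(f"{name}: {effective_tb:.1f} TB effective from {raw_tb:.0f} TB raw")
```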

How responsive and flexible is the hybrid array at handling multiple workloads?

One of the main purposes of a hybrid array is to deliver responsive, high performance at a lower cost than traditional arrays. There are a few keys to delivering on that performance promise:

  1. Responsiveness to workload changes based on timeliness and granularity of data placement. As discussed earlier, hybrid arrays deliver high performance by ensuring that “hot” randomly accessed data is served out of SSDs. However, many hybrid arrays manage this migration process only on a periodic basis (on the order of hours), which results in poor responsiveness if workloads change between intervals. And in most cases hybrid arrays can only manage very large data chunks for SSD migration, on the order of many MB or even GB. Unfortunately, when such large chunks are promoted into SSDs, a large fraction of each chunk can be “cold” data that is promoted only because of design limitations. And because some of the SSD capacity is used up by this cold data, not all of the “hot” data that would have been SSD-worthy makes it into SSDs. Nimble’s architecture optimizes data placement in real time, for every IO, which can be as small as 4 KB in size.
  2. The IO penalty of promoting “hot” data and demoting “cold” data. Hybrid arrays that rely on a migration process often find that the very process of migration can hurt performance when it is needed most! In a migration-based approach, promoting “hot” data into SSDs requires not just that the data be read from HDDs and written to SSDs, but also that, to make room for it, some colder data be read from SSDs and written back to HDDs – which we already know are slow at handling random writes. The Nimble architecture is much more efficient in that promoting hot data only requires that the data be read from HDDs and written into SSDs – the reverse process is not necessary since a copy of all data is already stored on HDDs (see the sketch after this list).
  3. Flexibly scaling the ratio of SSD to HDD on the fly. Hybrid arrays need to be flexible: as the attributes of SSDs and HDDs change over time (performance, $/GB, sequential bandwidth, etc.), or as the workloads consolidated on the array evolve, you should be able to vary the ratio of SSD to HDD capacity within the array. A measure of this is whether a hybrid array can change the SSD capacity on the fly without application disruption, so that you can adapt the flash/disk ratio if and when needed, in the most cost-effective manner.
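The sketch referenced in point 2 above counts the back-end IOs needed to bring hot chunks into flash under the two designs discussed. It is a simplified, hypothetical accounting; real systems differ in many details.

```python
# Back-end IOs to promote hot chunks into flash under two designs.
# Simplified, hypothetical accounting for illustration.

def migration_based_tiering(promotions):
    # Tiering: read hot chunk from HDD, write it to SSD, and to make room,
    # read a cold chunk from SSD and write it back to (random-write-slow) HDD.
    return {"hdd_reads": promotions, "ssd_writes": promotions,
            "ssd_reads": promotions, "hdd_writes": promotions}

def cache_based_promotion(promotions):
    # Read cache: HDD already holds every block, so eviction just drops the
    # SSD copy -- no read-back and no extra HDD writes.
    return {"hdd_reads": promotions, "ssd_writes": promotions,
            "ssd_reads": 0, "hdd_writes": 0}

for name, model in [("migration-based tiering", migration_based_tiering),
                    ("cache-based promotion", cache_based_promotion)]:
    print(f"{name}: {model(1000)}")
```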

We truly believe that storage infrastructure is going through its most significant transformation in over a decade, and that efficient hybrid storage arrays will displace modular storage over that time frame. Every storage vendor will deploy a combination of SSDs and HDDs within their arrays, and argue that they have already embraced hybrid storage architectures. The real winners of this transformation will be those who have truly engineered their product architectures to maximize the best that SSDs and HDDs together can bring to bear for enterprise applications.

 

Thin Provisioning Storage Performance
Ajay Singh, Vice President, Product Management – Tue, 12 Jun 2012

Anyone familiar with storage today understands the notion of thin provisioning storage capacity. The idea is that hosts and applications don’t immediately (if ever) use all the space allocated to them. So you can logically “over-allocate” a virtualized pool of storage capacity – increasing storage utilization and saving storage capacity.

If you take a moment to think about it, you realize that the principle of thin provisioning is quite ubiquitous. Bank ATMs (or branches for that matter) store enough cash to meet the typical daily withdrawal needs of a predicted fraction of their local client base, but obviously not enough for everyone to withdraw all their deposits. Doing so allows them to put the rest of the deposits to work in loans/investments, funding account services. Car sharing services like Zipcar are increasingly popular in big cities with limited parking, or college campuses where many students can’t afford cars. For occasional users they offer flexible pick up/usage options on demand (say for a couple of hours), at lower cost than car ownership or traditional rentals. Fractional aircraft ownership services like NetJets provide companies and executives the flexibility (and luxury) of private air travel at a fraction of the cost of owning private jets.


In each case, the asset operator leverages knowledge of usage patterns to allow efficient & fairly predictable sharing with far fewer assets than a “100% reserved” model. In other words the operator “thin provisions” an expensive asset to maximize its usage and therefore reduce its cost, making it affordable to a bigger user group.

Back to storage – it’s common knowledge today that flash offers big performance gains, but the problem is that it comes at a high cost per GB of capacity (about 20x-100x higher than commodity disk depending on the grade of flash). Capacity reduction techniques help reduce flash cost somewhat, but many of them are also available on disk-based systems, so a huge cost gap still remains. This is why flash deployment is typically limited to the narrow sliver of applications where one can justify the high cost per GB. This is unfortunate because a much broader pool of applications could benefit by leveraging flash intelligently. What’s worse (as most users intuitively know) is that this expensive investment is underutilized, because a big percentage of data blocks within the application pool aren’t being accessed at any given point in time (between 80% and 95% are inactive for most applications). Think about inactive tables in databases, old emails or inactive VMs. Even worse, think about capacity used by snapshots or replication copies: would you ever consider storing several days (let alone weeks) of snapshots in expensive flash storage?

But what if you could take advantage of this knowledge that only a fraction of the data blocks need high performance at a given point in time? Rather than “thick provisioning” performance like in all-flash systems, what if you had a way to share flash across a broader pool of applications in a way that was responsive enough to handle all their performance needs, but at a much lower cost per GB?

Well that’s what a hybrid flash/disk solution can offer, if it can deliver on a couple of counts:

  • The flash pool needs to be large enough to cover the performance needs (active data) of the relevant applications. The flash pool size could even be configurable depending on the performance needs, and the level of assurance required. Think of this as equivalent to buying more fractional shares in the jet fleet – all the way up to 100% in the extreme case.
  • Data placement in flash needs to be truly dynamic – capable of adapting to workload changes or hot spots within milliseconds to seconds, rather than hours or days.

A combination of these characteristics enables predictably high performance and low latency, at a far lower cost than the “all flash/thick provisioned” scenario. Essentially, this is what CASL does – thin provisioning flash (read) performance and reducing effective cost, so that a much broader universe of applications can benefit from it.
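As a simple illustration of “thin provisioning” performance, here is a sizing sketch based on the active-data figures quoted above; the dataset size, active fraction and headroom multiplier are assumptions.

```python
# Size the flash pool to the active working set rather than the full dataset.
# Dataset size, active fraction and headroom are illustrative assumptions.
dataset_tb = 50.0
active_fraction = 0.10      # assume ~10% of blocks are active at a time
burst_headroom = 1.3        # assumed cushion for workload drift and bursts

flash_tb = dataset_tb * active_fraction * burst_headroom
print(f"All-flash ('thick provisioned' performance): {dataset_tb:.0f} TB of flash")
print(f"Hybrid ('thin provisioned' performance):     {flash_tb:.1f} TB of flash")
```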

M.C. Escher and Storage: True Efficiency
Ajay Singh, Vice President, Product Management – Mon, 16 Jan 2012

The Escher Stairs of Efficiency Claims

An end user bombarded by the many efficiency claims made by storage vendors might be forgiven for being confused, skeptical, or both. How is it possible for so many vendors to claim they deliver storage with X% lower cost than other vendors? For all these claims to be true, the storage world would have to be the real world equivalent of M.C. Escher’s  mind-bending Penrose Stairs. What’s really going on here?

Comparing Storage Efficiency

Well, the problem is that most such claims are based on simplistic comparisons, such as only comparing capacity efficiency (usable capacity/raw capacity). Or just comparing raw performance.  And even these are often inflated with unrealistic assumptions.

While interesting, such one-dimensional comparisons are typically only useful for niche applications such as archiving or HPC.  For mainstream applications you typically care about multiple dimensions of a storage solution such as price/performance, data protection, availability and capacity efficiency. Knowing this, the question then is – how does one construct more meaningful comparisons?

A Better Comparison

Assuming many solutions meet your threshold of reliability and availability, here are some dimensions of storage efficiency you might consider in comparing them:

· Capacity AND Performance Efficiency

A basic definition of capacity efficiency (usable capacity/raw capacity) can be too simplistic for a couple of reasons. Often it ignores capacity savings techniques like inline compression and cloning. More importantly, it ignores the inherent performance differences between architectures. If you could get 50% compression without a performance impact, that’s certainly nice. But if you could get the performance of high-performance drives (15K RPM disks, or better, flash SSDs) and the capacity of high-density drives (7.2K RPM disks) in a single tier of storage – that’s HUGE! When you consider that 15K RPM drives cost roughly 5x more per GB than 7.2K RPM drives, the above example translates to roughly a 5x capacity-cost advantage from the get-go! To capture such differences, a meaningful comparison of efficiency ought to consider both $/GB AND $/IOPS.
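A simple way to keep both axes in view is to compute $/GB and $/IOPS side by side for each candidate configuration. The prices and performance numbers below are made-up placeholders.

```python
# Compare candidate configurations on BOTH $/GB and $/IOPS, not just one axis.
# Prices and performance figures are made-up placeholders.
configs = {
    "high-RPM disk array":        {"cost_usd": 60_000, "usable_gb": 10_000, "iops": 20_000},
    "hybrid flash/low-RPM array": {"cost_usd": 45_000, "usable_gb": 30_000, "iops": 30_000},
}

for name, c in configs.items():
    print(f"{name}: ${c['cost_usd'] / c['usable_gb']:.2f}/GB, "
          f"${c['cost_usd'] / c['iops']:.2f}/IOPS")
```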

· Data Protection Efficiency

The most visible elements of efficient data protection are the capacity efficiency of backup storage (e.g. dedupe ratios), and the bandwidth and capacity efficiency of DR storage. It’s less common to see quantitative comparisons of the level of data protection – namely the RPOs and RTOs enabled by the system – although these translate to very real and potentially big costs. And then there’s another part which is sometimes overlooked and typically harder to quantify: operational efficiency, in other words how easy it is to set up and manage backups and DR on a day-to-day basis. More on this topic next.

· Operational Efficiency (i.e. Simplicity)

This is the dimension that is hardest to measure, but no less important to consider. Operational Efficiency encompasses qualitative attributes like simplicity – can an admin just install and start using a storage technology without days of training, professional services and years of experience? Does the performance adapt quickly to changing workloads? Quantitative measures might be the time (or number of steps) required for common tasks.

There’s another reason to pay close attention to operational efficiency – it helps you distinguish truly efficiently designed storage solutions from less efficiently “bundled” ones. Here’s a hypothetical example to illustrate:

What if you had a shrink-wrapped solution that bundled a small amount of expensive but fast storage together with a lot of cheap but slow storage, threw in some software to slowly move data back and forth – to relocate the right data on the right tier – and added some more software to do the same for backup purposes? On paper such a solution can appear to have it all – good $/IOPS, good $/GB and automation to simplify management. So what could be missing? Potentially a LOT!

If the data transfer process is slow and heavy duty – it might take hours to complete and impact performance while it’s happening. And since application workloads change dynamically, you’d be constantly monitoring workloads and over-allocating performance tiers to ensure bursty applications don’t experience bad performance for extended periods. Despite this, it’s virtually certain that some applications would experience poor performance. As for backups/restores – you’d be constantly battling backup windows and dealing with poor recovery points and slow, painful restores. So in reality, such a package would deliver much less than the sum of its parts.

What This Means for You

Not every application needs a multi-dimensional, well-balanced storage solution. Perhaps for an archive tier $/GB is the one overriding concern. Or maybe for a critical application you’re willing to pay a lot for performance, even if it means compromising on capacity and efficient data protection. However, the vast majority of mainstream applications need more versatile storage solutions.

One approach to picking the right one is to assign explicit weights to your criteria: for example capacity efficiency, performance efficiency, data protection efficiency and operational efficiency might be all equally important in your environment and deserve equal weights. You can then compare storage solutions under each of these four criteria and rate each on a scale of 1-5. The overall weighted rating would give you a much better measure of storage efficiency for your applications than anything vendor marketing materials could. In upcoming blogs we will share real world data on how Nimble does on each of these criteria.
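A minimal sketch of that weighted-rating exercise follows; the weights and 1-5 scores are placeholders you would replace with your own criteria and evaluations.

```python
# Minimal sketch of the weighted-rating approach. Weights and 1-5 scores are
# placeholders -- substitute your own environment's criteria and evaluations.
weights = {"capacity": 0.25, "performance": 0.25,
           "data_protection": 0.25, "operational": 0.25}

ratings = {   # each criterion scored from 1 (poor) to 5 (excellent)
    "Solution A": {"capacity": 4, "performance": 3, "data_protection": 2, "operational": 3},
    "Solution B": {"capacity": 3, "performance": 4, "data_protection": 4, "operational": 5},
}

for solution, scores in ratings.items():
    overall = sum(weights[c] * scores[c] for c in weights)
    print(f"{solution}: weighted efficiency rating {overall:.2f} out of 5")
```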

Extended Snapshots and Replication As Backup
Ajay Singh, Vice President, Product Management – Wed, 23 Feb 2011

There’s a quiet shift underway in the IT landscape. No, not cloud computing – few would call that a quiet shift. It’s the trend away from traditional backup and DR to something faster, simpler and lower cost: Extended Snapshots and Replication (ESR). IT practitioners talk about it. Analysts see a trend, for example ESG found (table below) that small-mid environments already use this commonly for VMs. Industry experts take some flak for calling it as they see it. Even folks historically linked with traditional backup acknowledge the shift. Naturally, vendors not best served by this trend vehemently argue against it. When you hear someone argue – “we could have offered this for years, but it’s just not the right approach”, make sure the real reason isn’t an inherent weakness of their underlying technology.

So what’s the fuss about? Let’s review a typical form of traditional backup and DR seen in a mid-sized enterprise, and contrast it with the ESR approach. We’ll skip archiving requirements, which have different solutions, and acknowledge some organizations have more specialized needs.

Traditional Backup and DR – Repeated Copying of Redundant Data

Backup software scans servers nightly for new data, and bulk copies changed data to a dedicated backup device, today likely to be disk based (although tape still rules for archives). Scanning and copying are resource hogs, impacting servers, storage and networks, so they’re done during designated backup windows. Because of restore performance and reliability issues, incremental backups are supplemented with massive weekly full copies, which usually consume the weekends. Backup dedupe makes it more affordable to retain the 30-90 days of backups most organizations need. However, the bulky upfront copy means you can’t afford to back up too often, so Recovery Points are sparse – typical RPO is one day. And restores still take hours to reconstitute data from the full and incremental backups. Deduped disk backups do have the benefit of enabling WAN-efficient offsite replication. Once again though, Recovery Points are spread far apart, and restore times are long. Nor is there an option to run an application right off the DR copy – you need restores to primary storage.

Extended Snapshots and Replication Approach

The primary storage device captures (application-consistent) near-instant snapshots on a predefined schedule (every few minutes, or once an hour) without affecting application performance. Efficient snapshot implementations are “un-duped and compressed” and reside on low-cost disk, so you can afford the extended retention you need (say 30-90 days). A subset of these snapshots is replicated (say every hour) using very efficient replication to an offsite DR array, where they are retained for, say, 60 days. When needed, the entire application or a subset can be restored from snapshots within minutes. Applications can also run directly off the backup/DR copies without any format conversion. There are no backup windows to manage.

Comparing the Approaches

Here’s how each approach handles common failure scenarios:

Traditional backup has had the advantage of incumbency. IT shops are familiar with it. Backup software has supported this approach longer. However, IT shops hate traditional backup, and many are looking to change. And software vendors are catching up in terms of managing snapshots. Finally, newer approaches have so dramatically improved the cost and simplicity of ESR that the contrast is more striking than ever:

In one case you have multiple devices juggling data, three data copies, and a lot of daily heavy lifting to get a barely acceptable level of recovery SLAs. With the other approach, you have two devices, two data copies (unsurprisingly at a lower cost), no daily backup windows or pain, and much faster, better recovery options.
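The contrast can be summarized with a small, assumption-laden tally of recovery points and data copies; nightly backups and hourly snapshots are the illustrative schedules used above.

```python
# Tally of recovery points and data copies under the two approaches.
# Schedules are the illustrative ones discussed above.
approaches = {
    "Traditional backup + DR": {
        "recovery points per day": 24 // 24,   # one nightly backup
        "data copies": 3,                      # primary + backup + replicated backup
        "backup window required": True,
    },
    "Extended snapshots + replication": {
        "recovery points per day": 24 // 1,    # hourly snapshots
        "data copies": 2,                      # primary (with snapshots) + DR replica
        "backup window required": False,
    },
}

for name, details in approaches.items():
    print(name, details)
```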

Which would you choose?

What Defines Converged Storage and Backup?
Ajay Singh, Vice President, Product Management – Wed, 22 Sep 2010

In previous posts and survey summaries, we described the storage and backup challenges that we at Nimble set out to address. As the solution design took shape (which gradually acquired the name “converged storage and backup”), we took care to ensure it had five key characteristics:

  1. Naturally, it has to meet the high performance and availability requirements of enterprise applications and databases, such as very good IOPS, latency and uptime. However, IT organizations want the most cost-effective performance to meet their business goals. As an example, although we use flash to boost performance, we integrate it very intelligently, like a hybrid car, rather than for its raw power, like a roadster.
  2. To address chronic backup pain points, we built in instant backups/restores, sufficient retention to meet business needs, and simple application-integrated backup management (typical tasks like scheduling, retention policies, verification, restores, monitoring and alerts). Snapshots are natural building blocks for this, but we took care to overcome the cost and limitations of most traditional approaches such as performance impact, tendency to chew up lots of expensive primary capacity, big reserves, limited snapshot counts, poor manageability, etc.
  3. There are few acquisition decisions where cost is not a consideration (except maybe the ones found here). Continual capacity growth no doubt fuels special sensitivity to the cost of storage. Low storage capacity cost is particularly important for converged storage: it is hard to deliver instant backups/restores and sufficient retention to meet business needs unless the application data is on low-cost primary storage to begin with. Otherwise you would either need the typical, cumbersome data copies to another storage tier, or have to compromise on the weeks of retention that could be maintained cost-effectively.
  4. WAN efficient replication, with fast offsite recovery. In addition to high availability and fast restore capabilities, converged storage also has to protect against less likely but catastrophic scenarios such as power failure, site disaster or the equipment catching fire. This means frequent offsite replication, and the ability to rapidly failover to the offsite storage. This in turn means the replication has to be highly WAN efficient, so as not to overwhelm the modest WAN links of the typical organization.
  5. Dead simple management. Every task, from setup to provisioning, backup and replication management, and monitoring, has to be simple, involve the fewest steps possible, and be designed so it can be performed by an IT generalist without deep storage expertise. Capacity optimization techniques have to be fast and so simple as to be nearly invisible. And tasks that typically tend to be complex or repetitive have to be simplified by pre-built templates and policies.

Upcoming posts will expand on how we developed the underlying technology solutions to meet and balance these goals.

Why Converge Primary and Backup Storage?
Ajay Singh, Vice President, Product Management – Thu, 22 Jul 2010

As Varun mentioned in his introduction, since Nimble’s inception we have spent hundreds of hours meeting with IT organizations of all sizes. We engaged them in an open dialogue to understand their challenges, and were listening very closely to the issues they expressed related to storage and backup. These candid discussions helped us build a deeper understanding of their daily challenges, as well as some chronic pain points. Among the most pervasive were fast growth in storage requirements and expense, the associated growth in the cost and complexity of backup, and a less-than-ideal level of disaster recovery preparedness. To elaborate:

  • Even during economic slowdowns, most companies continue to experience fast growth in primary storage capacity requirements. This in turn drives the need for expensive high-performance primary storage, typically powered by high-RPM drives. Although there is awareness of flash storage, most consider this a high-end solution for only the most pressing performance challenges. A broad cross-section of customers is also frustrated with their vendors’ pricing models, which they view as excessive.
  • Despite spending a lot of money on backup–including the adoption of disk based backup technologies–most IT groups say the backup process remains painfully resource intensive. The daily process is based on identifying changed data on the application servers and periodically copying large quantities off to backup devices (and even larger quantities each weekend). All of this relies on many moving parts from disparate vendors, and is one of the less reliable processes IT teams manage. Consequently, it puts a severe load on servers, networks and administrators, leading to the designation of long backup windows. Some organizations do use technologies like snapshots for short term recovery, but most are very limited in how many they can retain due to the capacity consumption on expensive primary storage.
  • Many, if not most, organizations believe their disaster recovery plan is inadequate, but struggle to improve it. This is partly because they find it hard to justify investments towards something that management perceives as a low probability event. But certainly some of the gaps are due to the cost and complexity of common DR solutions, and practical constraints like the limited availability of WAN bandwidth. As a result, the DR scheme for many organizations is linked to the backup process (either because they ship tapes offsite, or they replicate their backups). However the resource intensiveness of the backup process means many organizations can only afford daily backups, limiting the recovery points that an application could be restored to. This also means that the DR copy is in a backup format, slowing down and complicating the actual recovery.

Although myriad technologies try to address one or the other aspect of this picture, few existing solutions attempt to fundamentally simplify it. However in our conversations we heard over and over again that most organizations were open to considering new approaches that cohesively addressed their chronic pain points, even if they deviated from conventional wisdom in some respects. In related blog posts such as this one we describe how we shaped our converged storage and backup approach so we could accomplish just that.
