As most of you know, storage snapshots are point-in-time copies of a volume. They are immensely useful for recovering from user mistakes, taking consistent backups without freezing application writes, driving replication, and more.

Most storage vendors provide snapshot implementations that share blocks between snapshots to some degree; no respectable implementation makes a complete physical copy of the volume. Nonetheless, there is wide variance among implementations. Howard Marks wrote a good article about this recently.

Let’s consider the time overhead of snapshots. The time overhead is largely determined by where blocks are written.

Copy-on-write (COW) implementations write each block at a fixed location determined by its logical address. In general, COW is optimized for sequential reads but not for writes. A sequential read of the current state performs well because it maps to a sequential read on disk. On the other hand, when a block is written for the first time since the last snapshot, its old version must first be copied off to another location, which requires an extra disk read and an extra disk write. The more frequent the snapshots, the more frequent these high-overhead writes.
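
As a rough illustration of that write path, here is a minimal copy-on-write sketch in Python. It is my own simplification for this post (the class, block granularity, and data structures are assumptions, not any vendor's actual design):

    # Minimal copy-on-write sketch (illustrative only, not a vendor implementation).
    # Current data always lives at a fixed location keyed by logical address; the
    # first overwrite of a block after a snapshot copies the old version aside.

    class CowVolume:
        def __init__(self):
            self.blocks = {}           # logical address -> current data (fixed home)
            self.snapshot_area = {}    # (snapshot id, address) -> preserved old data
            self.snap_id = 0
            self.dirty = set()         # addresses written since the last snapshot

        def snapshot(self):
            self.snap_id += 1
            self.dirty = set()         # next write to each block must preserve it again
            return self.snap_id

        def write(self, addr, data):
            if addr in self.blocks and addr not in self.dirty:
                # Extra read + extra write: copy the old version to the snapshot area.
                self.snapshot_area[(self.snap_id, addr)] = self.blocks[addr]
            self.dirty.add(addr)
            self.blocks[addr] = data   # new data lands at the block's fixed home

        def read(self, addr):
            # The current state stays in logical-address order, so a logically
            # sequential read maps to a sequential read on disk.
            return self.blocks.get(addr)

The point to notice is that write() pays the extra IO only on the first overwrite after snapshot(); the more often snapshots are taken, the more often that price is paid.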

Redirect-on-write (ROW) implementations write the new version of every block at a new location. They do not add any additional IO to the first writes after a snapshot. However, this creates another problem. As blocks are overwritten and old snapshots are deleted, the old versions of blocks turn into free space or “holes”. Over time, the free space degenerates to relatively small and randomly distributed holes, creating a “Swiss cheese” pattern. Because new writes must fill these holes, a logically sequential write turns into physically random writes. Furthermore, a subsequent read of that data also turns into physically random reads.
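
Here is a comparable redirect-on-write sketch, again a simplification of my own rather than any product's code, to show where the “Swiss cheese” comes from: new writes are cheap, but they land wherever a hole happens to be.

    # Minimal redirect-on-write sketch (illustrative only, not a vendor implementation).
    # Every write goes to the next available physical location; superseded block
    # versions that no snapshot needs anymore turn back into free "holes".

    class RowVolume:
        def __init__(self, size_blocks):
            self.disk = [None] * size_blocks      # physical block store
            self.free = list(range(size_blocks))  # free locations; over time, scattered holes
            self.map = {}                         # logical address -> physical location

        def write(self, addr, data):
            loc = self.free.pop(0)                # no extra IO, but the hole may be anywhere
            self.disk[loc] = data
            old_loc = self.map.get(addr)
            self.map[addr] = loc
            if old_loc is not None:
                # Simplification: assume no snapshot still references the old version,
                # so its location immediately becomes a hole.
                self.disk[old_loc] = None
                self.free.append(old_loc)

        def read(self, addr):
            # Logically sequential data can end up at scattered physical locations,
            # so both the writes that filled the holes and later reads go random.
            return self.disk[self.map[addr]]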

Nimble’s CASL architecture is based on ROW but extends it with “sweeping” (ROW+S). A built-in, highly efficient sweeping process continually consolidates many small holes into fewer large holes. Specifically, it creates full RAID stripes of free space that can be written sequentially. This helps maintain consistent performance for both writes and subsequent reads.
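
To make the idea concrete, here is a toy “sweep” over the RowVolume sketch above. It only illustrates the concept of turning mostly-empty stripes into fully free ones; the actual CASL sweeping process is far more sophisticated, and the stripe width and threshold below are arbitrary assumptions of mine.

    # Toy sweep for the RowVolume sketch above (conceptual only, not CASL's algorithm).
    # Live blocks are relocated out of mostly-empty stripes so that whole stripes
    # become free and can later be written sequentially as full RAID stripes.

    STRIPE_BLOCKS = 8   # assumed stripe width for this example

    def sweep(vol, stripe_blocks=STRIPE_BLOCKS):
        live_at = {loc: addr for addr, loc in vol.map.items()}   # physical -> logical
        swept = set()                                            # locations freed this pass
        for start in range(0, len(vol.disk), stripe_blocks):
            stripe = set(range(start, min(start + stripe_blocks, len(vol.disk))))
            live = [loc for loc in stripe if loc in live_at]
            if 0 < len(live) <= stripe_blocks // 2:              # mostly holes: worth sweeping
                dests = [f for f in vol.free if f not in stripe and f not in swept]
                if len(dests) < len(live):
                    continue                                     # nowhere to relocate right now
                for loc in live:
                    new_loc = dests.pop(0)
                    vol.free.remove(new_loc)
                    addr = live_at.pop(loc)
                    vol.disk[new_loc] = vol.disk[loc]
                    vol.disk[loc] = None
                    vol.map[addr] = new_loc
                    live_at[new_loc] = addr
                    vol.free.append(loc)
                swept |= stripe                                  # this whole stripe is now free

After a pass like this, the freed stripes can absorb new writes sequentially instead of scattering them across small holes.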

Let’s now consider the space overhead of snapshots. The space overhead is largely determined by the block size.

Smaller block sizes result in increased sharing of data between snapshots. With large blocks, a change to a small portion of a block would create a full new block with mostly duplicate data, causing the snapshot size to be much larger than the amount of data changed.

However, smaller blocks also mean more metadata, which itself consumes space. Ideally, the storage block size would match the application block size (the unit in which the application reads and writes). This ensures that sharing is optimal, with no wasteful duplication, while keeping metadata usage as low as possible.
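
A back-of-envelope way to see both sides of the tradeoff is sketched below; the 64 bytes of per-block metadata is a figure I picked purely for illustration, not a measured number.

    # Hypothetical tradeoff calculator: data retained per small change vs. metadata per GB.
    # The 64 bytes of metadata per block is an assumed figure for illustration only.

    def snapshot_overhead(app_write_kb, storage_block_kb, metadata_bytes_per_block=64):
        """Return (KB retained per changed application write, KB of metadata per GB mapped)."""
        # A partial overwrite still forces the whole storage block to be retained.
        blocks_touched = -(-app_write_kb // storage_block_kb)    # ceiling division
        retained_kb = blocks_touched * storage_block_kb
        blocks_per_gb = (1024 * 1024) // storage_block_kb
        metadata_kb_per_gb = blocks_per_gb * metadata_bytes_per_block / 1024
        return retained_kb, metadata_kb_per_gb

When the storage block size equals the application write size, the data retained per change is exactly what changed, without inflating the per-GB metadata any more than necessary.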

Different applications use different block sizes. For example, Exchange 2003 uses a 4KB block size, Exchange 2007 uses 8KB, and Exchange 2010 uses 32KB. Most storage systems support only a single block size, which, if too small, results in large amounts of metadata, and, if too large, results in a lot of wasteful duplication (sometimes making the snapshot 10 times larger than the change).
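
Plugging the Exchange 2007 write size into the hypothetical snapshot_overhead() helper above shows how a mismatched block size amplifies snapshot space (illustrative numbers only):

    # Using the illustrative snapshot_overhead() helper defined earlier.
    for storage_block_kb in (4, 8, 32, 128):
        retained, meta = snapshot_overhead(app_write_kb=8, storage_block_kb=storage_block_kb)
        print(f"{storage_block_kb:>3} KB blocks: {retained:>3} KB retained per 8 KB change, "
              f"~{meta:,.0f} KB of metadata per GB mapped")

With 128KB blocks, an 8KB change drags along 16 times as much data; with 8KB blocks it retains exactly what changed, at a modest metadata cost.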

CASL supports customizable block sizes: different volumes can be configured with different block sizes. Matching the block size to the application minimizes the space overhead. Moreover, all blocks are stored compressed, which reduces space usage even further, typically by a factor of 2! The result is snapshots that consume much less space than they would with other implementations.
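
Continuing the same illustrative arithmetic, block-size matching and compression compound. Assume 1GB of 8KB application changes between two snapshots, the roughly 2x compression mentioned above, and a mismatched fixed 128KB block size for comparison (all assumptions of mine):

    # Illustrative only: 1 GB of 8 KB application changes between two snapshots.
    changed_gb = 1.0
    amplification_matched = 8 / 8        # 8 KB storage blocks retain exactly what changed
    amplification_mismatched = 128 / 8   # assumed fixed 128 KB block size
    compression_ratio = 2.0              # typical factor cited above; workload dependent

    print(f"matched + compressed:     {changed_gb * amplification_matched / compression_ratio:.1f} GB")
    print(f"mismatched, uncompressed: {changed_gb * amplification_mismatched:.1f} GB")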

3 Responses to How Snappy and Skinny Are Your Snapshots?

  1. Chad says:

    Do you think that NetApp’s embrace of in-line compression on top of
    snaps and dedupe is going to rain on your parade?

    I suppose they’re still not “sweeping” as far as I know, so there is some potential for the “Swiss cheese effect” (to borrow your term) as aggregates get past 90% utilization…

  2. admin says:

    Leif, thanks for your question.

    The block size on a Nimble volume can be customized for efficient and fast IO. It should be set to match the IO request sizes or a common denominator thereof.

    On the other hand, the VMFS block size is used for allocating space, not for doing IO. It is generally between 1MB and 8MB. The IO request size is still determined by the guest application. Thus, if you run Exchange 2007 in a guest VM, you should set the Nimble volume block size to 8KB. If you run multiple applications on the same datastore, you should set the Nimble volume block size to a common denominator of the application block sizes.
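
    For example, a quick way to compute that common denominator (just arithmetic for illustration, not a Nimble feature):

        # Illustrative: pick a volume block size that evenly divides every application's IO size.
        from functools import reduce
        from math import gcd

        app_block_kb = [4, 8, 32]     # e.g. Exchange 2003, 2007, and 2010 sharing one datastore
        volume_block_kb = reduce(gcd, app_block_kb)
        print(f"Set the volume block size to {volume_block_kb} KB")   # -> 4 KB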
