LTFS consistency and Index snapshots

An LTFS Volume must be in a consistent state when the volume is exchanged with another LTFS system. The LTFS Format Specification defines consistent state as:

A volume is consistent when both partitions are complete and the last Index Construct in the Index Partition has a back pointer to the last Index Construct in the Data Partition.

where a complete partition is:

An LTFS partition that consists of an LTFS Label Construct and a Content Area, where the last construct in the Content Area is an Index Construct.

The LTFS Format Specification additionally includes the following conformance statement:

Recorded media claiming conformance to this format shall be in a consistent state when interchanged or stored.

In practice, this conformance statement means that an LTFS Volume must be consistent whenever it is ejected from an LTO drive. Software implementing support for the LTFS format can therefore expect a consistent LTFS cartridge when the cartridge is loaded, and may reject inconsistent cartridges.
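
The two definitions quoted above can be expressed as a small check. This is a minimal Python sketch over a simplified in-memory model. The `Construct` and `Partition` types, and the use of plain block numbers as back pointers, are hypothetical stand-ins and not the Index schema defined by the LTFS Format Specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Construct:
    kind: str                           # "label", "data", or "index" (simplified model)
    block: int                          # block number where the construct starts
    back_pointer: Optional[int] = None  # Index only: DP block it points back to


@dataclass
class Partition:
    has_label: bool                     # LTFS Label Construct present at the start
    content: List[Construct] = field(default_factory=list)  # Content Area, in order


def is_complete(p: Partition) -> bool:
    # Complete: a Label plus a Content Area whose last construct is an Index.
    return p.has_label and bool(p.content) and p.content[-1].kind == "index"


def is_consistent(ip: Partition, dp: Partition) -> bool:
    # Consistent: both partitions complete, and the last IP Index points
    # back to the last Index in the DP.
    if not (is_complete(ip) and is_complete(dp)):
        return False
    return ip.content[-1].back_pointer == dp.content[-1].block


# A DP ending in an Index at block 20, mirrored by an IP Index pointing at it.
dp = Partition(True, [Construct("index", 5),
                      Construct("data", 6),
                      Construct("index", 20, back_pointer=5)])
ip = Partition(True, [Construct("index", 0, back_pointer=20)])
print(is_consistent(ip, dp))  # True
```

If file content were appended to the DP after block 20 without a following Index, `is_complete(dp)` would fail, and an implementation loading the cartridge could reject it as inconsistent.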

Logical layout of a consistent LTFS volume showing historical Indexes with back-pointers and multiple files written and edited in-place.

The LTFS software writes an updated Index to the end of the Data Partition (DP). The Indexes stored in the DP are interleaved with the file content that is also stored in the DP. LTFS maintains each Index written to the DP for three core reasons:

  • data safety,
  • performance, and
  • snapshots/rollback.

Data safety
To protect the data stored on the media it is vital that LTFS commit an updated Index to the media soon after the file content is written. Without the updated Index, new file content cannot be accessed. Additionally, maintaining the current Index in the DP along with the same logical Index stored in the Index Partition (IP) provides a redundant copy of the Index on the media. These redundant copies mean that in the event of catastrophic failure there is a second chance to read a valid current Index.
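
The "second chance" described above amounts to a simple fallback when loading the Index. The sketch below is a hypothetical illustration, not the LTFS software's actual recovery logic: the two reader callables stand in for whatever code locates and reads the IP and DP copies of the current Index.

```python
from typing import Callable, Optional

# Hypothetical reader callables: each returns the Index bytes, returns None if
# no valid Index is found, or raises OSError on a media error.
IndexReader = Callable[[], Optional[bytes]]


def load_current_index(read_ip_index: IndexReader, read_dp_index: IndexReader) -> bytes:
    """Return the current Index, preferring the IP copy and falling back to the DP copy."""
    for read in (read_ip_index, read_dp_index):
        try:
            index = read()
        except OSError:
            index = None              # treat an unreadable copy like a missing one
        if index is not None:
            return index
    raise RuntimeError("no readable copy of the current Index on this volume")
```

For example, `load_current_index(lambda: None, lambda: b"<index/>")` returns the DP copy when the IP copy cannot be found.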

Performance
In normal LTFS operation the tape head is positioned in the DP to allow rapid access to the file content storage area when servicing read and write requests. When a new Index needs to be written to the media, the LTFS software writes the new Index to the DP and continues servicing any concurrent write operations. By writing the Index to the DP in-line with the file content, the LTFS software avoids the penalty of switching to the other partition and seeking back to the Beginning of Tape (BOT).

Snapshots and rollback
The Indexes written to the DP are likely to be followed by file content during write operations. Because LTO data tape is sequential media, it is impossible to delete these Indexes without destroying any file content written after them. Maintaining these Indexes provides the ability to inspect, and potentially revert, the LTFS Volume to any earlier point in time for which an Index exists on the media. For example, suppose you work with an LTFS Volume by adding, updating, and deleting files, and later decide that these changes are not wanted. You can then either mount the LTFS Volume read-only using an earlier Index to show and access an earlier version of the volume, or permanently revert the LTFS Volume to an earlier state using a rollback operation.
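
Because each Index in the DP carries a back pointer to its predecessor, the available snapshots can be listed by walking the chain from the newest Index. This is a minimal sketch that assumes back pointers are plain DP block numbers held in a dictionary; `index_history` is a hypothetical helper, not part of any LTFS tool.

```python
from typing import Dict, List, Optional


def index_history(back_pointers: Dict[int, Optional[int]], newest: int) -> List[int]:
    """List Index locations from newest to oldest by following back pointers.

    `back_pointers` maps the DP block of each Index to the DP block of the
    Index written before it (None for the first Index on the volume).
    """
    history: List[int] = []
    seen = set()
    current: Optional[int] = newest
    while current is not None:
        if current in seen:                       # guard against a corrupt chain
            raise ValueError(f"back-pointer loop at block {current}")
        seen.add(current)
        history.append(current)
        current = back_pointers[current]
    return history


chain = {5: None, 20: 5, 42: 20}
versions = index_history(chain, newest=42)
print(versions)      # [42, 20, 5]
print(versions[1])   # 20: the Index describing the volume's previous state
```

A read-only mount of an earlier version would build its file tree from one of these older Indexes; a rollback would make that older Index current again.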

The Index written to the IP is typically written during unmount processing of the LTFS file-system. During unmount the following operations are performed in order:

  1. all file content written to the volume but still in memory is flushed to the DP,
  2. an updated Index is written to the DP,
  3. LTO cartridge memory is written with new MAM parameters,
  4. the tape head is moved to the IP,
  5. any file content destined for the IP is written, overwriting the old Index,
  6. a replica of the current Index is written to the IP, overwriting the old Index,
  7. LTO cartridge memory is written with new MAM parameters, and
  8. media is rewound then optionally ejected.

Note that it is safe to overwrite the old Index on the IP because the LTFS software has already successfully written the current Index at the end of the DP in step 2. The details of how MAM parameters are used, and the conditions under which file content may be written to the IP, are described in a later blog post.
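
The ordering of the unmount steps can be sketched against a toy model of the two partitions. Everything here is a hypothetical simplification: `Tape` holds each partition as a list of `(kind, payload)` tuples, MAM updates are only counted, and rewind and eject are not modelled. The point is the safety condition noted above: the IP Index is only overwritten after the current Index is already at the end of the DP.

```python
from dataclasses import dataclass
from typing import List, Tuple

Construct = Tuple[str, str]  # (kind, payload), e.g. ("index", "Index2"); simplified


@dataclass
class Tape:
    ip: List[Construct]
    dp: List[Construct]
    mam_updates: int = 0


def unmount(tape: Tape, buffered: List[str], ip_files: List[str], index: str) -> None:
    tape.dp.extend(("data", b) for b in buffered)   # 1. flush buffered file content to the DP
    tape.dp.append(("index", index))                # 2. write the current Index to the DP
    tape.mam_updates += 1                           # 3. update cartridge memory (MAM)

    # 4. move to the IP. Overwriting the old IP Index is only safe because
    # the current Index already sits at the end of the DP (step 2).
    if tape.dp[-1] != ("index", index):
        raise RuntimeError("current Index not on the DP; refusing to overwrite the IP Index")
    old = max(i for i, (kind, _) in enumerate(tape.ip) if kind == "index")
    del tape.ip[old:]          # sequential media: writing here discards what follows
    tape.ip.extend(("data", f) for f in ip_files)   # 5. file content destined for the IP
    tape.ip.append(("index", index))                # 6. replica of the current Index
    tape.mam_updates += 1                           # 7. update cartridge memory again
    # 8. rewind and optional eject are not modelled here.


tape = Tape(ip=[("label", "L"), ("index", "Index1")],
            dp=[("label", "L"), ("index", "Index1")])
unmount(tape, ["fileA"], [], "Index2")
print(tape.ip)       # [('label', 'L'), ('index', 'Index2')]
print(tape.dp[-2:])  # [('data', 'fileA'), ('index', 'Index2')]
```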

Posted in LTFS

Can you use LTFS on LTO4 or earlier media?

I have noticed a pattern of Google searches related to the question “Can LTFS be used with LTO4 media?”. This question is implicitly answered in other posts, but the frequency of these searches motivates me to post the following explicit answer.

LTFS relies on the cartridge partitioning support that was introduced in the LTO5 specification by the LTO Consortium. Earlier generations of LTO media and drives do not have this partitioning capability and are therefore unable to support the LTFS on-tape format.

Support for multiple partitions at the media and hardware level is the key enabling functionality that allowed my team at IBM to invent and implement the LTFS technology. The details of how LTFS uses partitioning and the tape media are the subject of other posts on this blog, specifically Partitioning in LTO5 and LTFS, and How does LTFS work?.

Without partitioning support there are limited options for where the file-system metadata (or “Index”) may be stored on the media. With LTO generations prior to LTO5, these limited options are all that we have to work with. None of these options provides a satisfactory mix of performance, data safety, and business opportunity to make it practical to pursue LTFS on these older generations.

LTFS is available for LTO5 from all current drive vendors (IBM, HP, and Quantum). Additionally, LTFS is available for IBM’s enterprise tape product, the TS1140 (aka Jaguar), and has been ported by Oracle to their T10000C product line. At this point LTFS has been adopted by all actively developed data tape technologies.

sync() behavior in LTFS

Modern operating systems typically maintain an in-kernel cache of file-system metadata and small buffers of recent writes to open files. The file-system metadata typically includes data such as the filename, timestamps, and permissions for recently accessed files and directories. This metadata cache and the write buffers are periodically flushed to the storage media automatically by the system. This flush operation is commonly referred to as the sync() operation, named after the system call that exposes the functionality to user space in UN*X-based systems such as Linux and Mac OS X. In Microsoft Windows, comparable functionality is exposed by the FlushFileBuffers() call. sync() is so named because the system call “synchronizes” the file-system.
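
For readers unfamiliar with these calls, here is how an application can request the flushes described above from Python. It illustrates the generic operating-system calls only, not the LTFS sync() policy this post discusses. `os.fsync()` flushes a single file, which is closer in scope to FlushFileBuffers(); `os.sync()` is the system-wide sync() and is available on Unix only. The helper name `write_durably` is hypothetical.

```python
import os
import tempfile


def write_durably(path: str, data: bytes) -> None:
    """Write `data` to `path` and ask the OS to commit it to the storage media."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # move Python's user-space buffer into the kernel
        os.fsync(f.fileno())   # flush this one file's cached data and metadata


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "example.bin")
    write_durably(path, b"hello, tape")
    if hasattr(os, "sync"):    # os.sync() exists on Unix only (Python 3.3+)
        os.sync()              # system-wide flush, the sync() call described above
```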

This blog entry outlines the sync() behavior in LTFS and the rationale behind the associated design choices.

How does LTFS work?

The Linear Tape File System (LTFS) relies on the partitioning support that was introduced in LTO generation 5. Partitioning an LTO5 cartridge divides the media into two separate data storage areas known as “partitions”. Each partition can be written to without impacting data stored in the other partition on the media.

A data tape formatted for use with LTFS has two partitions, an Index Partition (IP) and a Data Partition (DP). The Index Partition has a relatively small capacity of 37.5GB. The Data Partition comprises the remaining 1.43TB of available capacity on the media.

LTFS writes an Index that holds all the file and folder meta-data for the LTFS Volume to the Index Partition. The LTFS Format Specification requires, among other properties, that a consistent LTFS Volume have the most-recent Index written to the end of both the IP and the DP. In the diagram below, “Index2” is the most-recent Index. The specification also requires that an LTFS Volume be consistent when it is exchanged with another system. In practice, this means that any compliant implementation of the specification that operates on an LTFS Volume must ensure that the media is consistent when it is ejected.

At mount-time the LTFS software reads the current Index from the IP and builds an in-memory structure representing all of the folders and files stored on the media. This structure contains meta-data such as file timestamps, file permissions, file name, file size, etc. The structure also contains the location on the DP for each data extent that holds part of the file content.

When a user or an application traverses a mounted Linear Tape File System, the LTFS software can return filenames, folders, timestamps, file sizes, and other meta-data from the in-memory index structure. When the user double-clicks on a file to open it, or an application reads from a file, the LTFS software causes the tape drive to seek to the start of the relevant data extents and read the data from the tape media.
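
The split between metadata served from memory and content read from tape can be sketched as follows. `MountedVolume`, `FileEntry`, and `Extent` are hypothetical names, and the extent fields are simplified; real LTFS extent records carry additional fields such as the partition. The sketch only counts how many extents the drive would need to locate, rather than driving a tape.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Extent:
    file_offset: int   # where this piece of content sits within the file
    start_block: int   # DP block where the piece begins (simplified addressing)
    byte_count: int


@dataclass
class FileEntry:
    size: int
    mtime: float
    extents: List[Extent] = field(default_factory=list)


class MountedVolume:
    """In-memory view built once from the IP Index at mount time (hypothetical)."""

    def __init__(self, entries: Dict[str, FileEntry]) -> None:
        self.entries = entries
        self.locates = 0   # extents the drive would have had to seek to so far

    def stat(self, path: str) -> Tuple[int, float]:
        entry = self.entries[path]
        return entry.size, entry.mtime          # answered from memory, no tape motion

    def read_plan(self, path: str) -> List[Tuple[int, int]]:
        """(start_block, byte_count) pairs to read, in file order."""
        extents = sorted(self.entries[path].extents, key=lambda e: e.file_offset)
        self.locates += len(extents)
        return [(e.start_block, e.byte_count) for e in extents]


vol = MountedVolume({"/clips/a.mov": FileEntry(10, 0.0, [Extent(6, 50, 4), Extent(0, 100, 6)])})
print(vol.stat("/clips/a.mov"), vol.locates)   # (10, 0.0) 0
print(vol.read_plan("/clips/a.mov"))           # [(100, 6), (50, 4)]
```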

LTO5 tape drives are able to stream data for reading and writing at 140MB/s. This speed is about 40% higher than the best hard drives, which top out at around 100MB/s. Of course, because LTO data tape is a sequential-access medium, seek times are high. The worst-case seek is moving from one end of the physical tape to the other, which takes roughly 90 seconds, so the average seek time is around 45 seconds.
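
The figures above can be put side by side with a little arithmetic. This sketch uses only numbers quoted in this post and treats MB and TB as decimal units (an assumption). The full-partition time assumes uninterrupted streaming with no seeks.

```python
# Figures quoted in the text; MB and TB are taken as decimal units (assumption).
TAPE_MB_S = 140.0           # LTO5 streaming rate
DISK_MB_S = 100.0           # best hard-drive rate
END_TO_END_SEEK_S = 90.0    # worst-case seek
DP_TB = 1.43                # LTFS Data Partition capacity on LTO5

advantage = TAPE_MB_S / DISK_MB_S - 1                  # about 0.4, i.e. 40% faster
full_read_h = DP_TB * 1e6 / TAPE_MB_S / 3600           # hours to stream a full DP
mb_per_worst_seek = TAPE_MB_S * END_TO_END_SEEK_S      # data streamable during one worst-case seek

print(f"{advantage:.0%} faster; {full_read_h:.1f} h to read the DP; "
      f"{mb_per_worst_seek / 1000:.1f} GB per worst-case seek")
# 40% faster; 2.8 h to read the DP; 12.6 GB per worst-case seek
```

In other words, a single end-to-end seek costs about as much time as streaming 12.6GB, which is why LTFS works hardest to avoid unnecessary seeks.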

Logical layout of an LTFS Volume on LTO5 media. Beginning of Tape (BOT) is at the left edge of the diagram and End of Tape (EOT) at the right edge. The diagram is not to scale: BOT to EOT is 846 meters (~2775 feet) and the tape width top-to-bottom is 1.27 centimeters (1/2 inch).

LTO5 data tape writes data in a sequence of wraps on the physical media. Each wrap consists of one pass of data written in a forward direction along the length of the tape plus another pass of data written in the reverse direction along the length of the tape. There are 40 such wraps on LTO5 media (1.5TB at roughly 37.5GB per wrap). Fully traversing all of the stored data on a full LTO5 cartridge requires traversing the length of the tape eighty times (forty times in the forward direction interleaved with forty times in the reverse direction).

As a result of the wrap-based data layout a seek to the beginning of a randomly selected file is often quite fast. Typically a seek to the beginning of a file will only require a short tape movement along the length of the tape along with a lateral head movement that is perpendicular to the axis of tape movement. These lateral head moves are performed in a few seconds.
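
A rough model shows why the serpentine layout helps. In this sketch a "pass" is one traversal in one direction (half of a wrap in the terminology above), and consecutive passes alternate direction, so data that is far apart logically can sit almost level on the physical tape. The pass length is in arbitrary units, the 3-second lateral move is a hypothetical stand-in for "a few seconds", and real drives can overlap lateral and longitudinal motion; this is an illustration, not a drive's seek algorithm.

```python
from typing import Tuple


def physical_position(logical: float, pass_length: float) -> Tuple[int, float]:
    """Map a logical offset to (pass number, position along the tape, 0.0=BOT .. 1.0=EOT)."""
    pass_no = int(logical // pass_length)
    along = (logical % pass_length) / pass_length
    if pass_no % 2 == 1:          # odd passes run in reverse, from EOT back toward BOT
        along = 1.0 - along
    return pass_no, along


def estimated_seek_s(a: float, b: float, pass_length: float,
                     end_to_end_s: float = 90.0, lateral_s: float = 3.0) -> float:
    """Rough seek time: longitudinal travel plus a lateral head move if the pass changes."""
    pass_a, x_a = physical_position(a, pass_length)
    pass_b, x_b = physical_position(b, pass_length)
    longitudinal = abs(x_a - x_b) * end_to_end_s
    lateral = lateral_s if pass_a != pass_b else 0.0
    return longitudinal + lateral


# Offsets 10 and 190 are far apart logically but almost level on the physical tape;
# offsets 10 and 90 are closer logically but need a long longitudinal move.
print(round(estimated_seek_s(10, 190, pass_length=100), 3))   # 3.0
print(round(estimated_seek_s(10, 90, pass_length=100), 3))    # 72.0
```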

Writes to a file stored in LTFS are performed by seeking to the end of the data and writing the file content to the media. If the file already exists in the LTFS volume then only the “over-written” areas of the file are written at the end of the DP. The extent list for such files that are modified “in-place” (on tape media) is updated to insert the new extents into the appropriate offsets in the extent list for the file.
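
The extent-list update for an in-place modification can be sketched as splitting any existing extent that overlaps the rewritten byte range and inserting a new extent that points at the end of the DP. This is a simplified, hypothetical model: each extent records a single linear `dp_offset`, whereas real LTFS extent records carry more fields, such as the partition and starting block.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Extent:
    file_offset: int   # position of this piece within the file
    length: int        # bytes
    dp_offset: int     # byte position on the DP (simplified, hypothetical addressing)


def overwrite(extents: List[Extent], offset: int, length: int, dp_offset: int) -> List[Extent]:
    """Extent list after file bytes [offset, offset+length) are rewritten at `dp_offset`,
    i.e. appended at the end of the DP rather than modified in place on tape."""
    end = offset + length
    result: List[Extent] = []
    for e in extents:
        e_end = e.file_offset + e.length
        if e_end <= offset or e.file_offset >= end:
            result.append(e)                            # not affected by the write
            continue
        if e.file_offset < offset:                      # keep the part before the write
            result.append(Extent(e.file_offset, offset - e.file_offset, e.dp_offset))
        if e_end > end:                                 # keep the part after the write
            skipped = end - e.file_offset
            result.append(Extent(end, e_end - end, e.dp_offset + skipped))
    result.append(Extent(offset, length, dp_offset))    # the new data at the end of the DP
    return sorted(result, key=lambda e: e.file_offset)


# A 100-byte file stored contiguously; bytes 40..49 are then rewritten.
updated = overwrite([Extent(0, 100, 0)], offset=40, length=10, dp_offset=5000)
for e in updated:
    print(e)
# Extent(file_offset=0, length=40, dp_offset=0)
# Extent(file_offset=40, length=10, dp_offset=5000)
# Extent(file_offset=50, length=50, dp_offset=50)
```

The original bytes 40..49 remain on tape, which is what allows an earlier Index to keep describing the earlier version of the file.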

Partitioning in LTO5 and LTFS

The LTO5 specification includes support for partitions on LTO5 data tapes. This partition support can be classified into two groups: Partition Aware and Partition Enabled.

Here is the relevant definition from the LTO 5 specification:

All Generation 5 LTO Ultrium tape drives will support at least one partition when writing to or reading from a Generation 5 cartridge. Drives that support more than one partition will implement all requirements for partitions as described in this document. Drives that support only one partition will reject a cartridge that has more than one partition.

Partition Aware drives can identify partitioned media but will refuse to operate with partitioned media. Partition Aware drives are permitted by the LTO5 specification as a way to give drive vendors time to explore and implement support for partitioning. The idea was that a vendor could undertake the minimal work of having their drive firmware identify partitioned media and refuse to operate on it, and still be in compliance with the specification.

Partition Enabled drives are supported in the LTO5 specification as drives that will identify, create, and work with partitioned LTO5 media. At the time of writing (June 2011) all LTO5 drive vendors offer Partition Enabled hardware.

Partitioning in LTO5 allows a maximum of 2 partitions on the tape media. Media utilizing one partition mirrors the format of previous LTO generations. That is, the whole empty tape is a single area of writable sequential media.

LTO5 media utilizing two partitions provides two separate tape areas. Each partition can be written to as sequential media without impacting the data stored on the other partition. Each partition can be of an arbitrary size, but the media is allocated in whole wraps, with a guard wrap between the partitions. Since partitions are allocated as complete wraps, the smallest practical partition size is one wrap, or approximately 37.5GB of uncompressed data storage. The guard wrap is necessary to provide sufficient physical separation on the media to ensure that writes to each partition will not interfere with data written to the other partition.

The LTFS software utilizes two partitions to construct the on-media format for LTFS. The “Index Partition” is created to use one whole wrap on the media, and the LTO5 partitioning scheme uses an additional wrap as the guard wrap between partition 0 and partition 1.

The LTFS software creates a second “Data Partition” utilizing the remaining wraps available on the media to consume the balance of the tape area. LTO5 media has a rated capacity of 1.5TB. Therefore, LTO5 media formatted for use by LTFS has a storage capacity of 1.43TB (1.5TB – (37.5GB + 37.5GB)).
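
The capacity arithmetic above, including the rounding to whole wraps, can be written out as a short sketch. `ltfs_layout` is a hypothetical helper using the approximate per-wrap figure quoted in this post. The LTFS software described here always uses one wrap for the IP; the requested-size parameter exists only to show that a larger Index Partition costs a whole extra wrap.

```python
import math
from typing import Tuple

WRAP_GB = 37.5       # approximate capacity of one wrap, as quoted above
MEDIA_GB = 1500.0    # LTO5 rated capacity (1.5TB)
GUARD_WRAPS = 1      # guard wrap between partition 0 and partition 1


def ltfs_layout(ip_gb_requested: float) -> Tuple[float, float]:
    """Round the IP up to whole wraps; the DP gets whatever is left after the guard wrap."""
    ip_wraps = max(1, math.ceil(ip_gb_requested / WRAP_GB))
    ip_gb = ip_wraps * WRAP_GB
    dp_gb = MEDIA_GB - ip_gb - GUARD_WRAPS * WRAP_GB
    return ip_gb, dp_gb


print(ltfs_layout(1))    # (37.5, 1425.0): the ~1.43TB Data Partition described above
print(ltfs_layout(50))   # (75.0, 1387.5): a larger IP costs a whole extra wrap
```

Note that 1.5TB – 75GB is exactly 1.425TB, which rounds to the 1.43TB figure.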

Early adopter reports on experience with LTFS

I have been working with Thought Equity Motion from very early in the LTFS productization effort. They amass very large data archives and their CTO, Mark Lemmons, was particularly excited by LTFS from a very early stage.

Last year at the IBC trade show in Amsterdam, Thought Equity announced that they are building a multi-petabyte video archive for EyeTV using LTFS for data storage. It is gratifying to hear of the success that Mark and his team have had working with LTFS.

“With LTFS from IBM I can literally integrate my global storage strategy.”

“[I]mmediate, tangible economic benefit from our LTFS deployments today.”

“Cost per GB reduced by 80%.”

“Without IBM’s LTFS technology, we’d be in a world of hurt.”

“High degree of flexibility. High degree of interoperability. [LTFS] It’s fantastic.”
