A filesystem for tape? Why?

Tape has been around forever[1]. The way people and applications work with tape has changed little since it was first introduced. In common use, tape is a block storage medium with records separated by file marks. Often a record consists of the contents of a complete file, stored without file meta-data such as filename, modification date, and size.

In the beginning, data tape was stored as a single reel that lived on a shelf in a custom-sized box or cylindrical can. A reel-to-reel tape was loaded into a tape drive by threading (manually or automatically) the start of the tape onto an empty reel that was part of the drive. Over time, various cartridge designs enclosed two reels in the cartridge to ease handling of the tape media and offer some level of protection. Modern LTO-5, IBM Jaguar, and Oracle StorageTek tape media all use a single-reel cartridge; the drive automatically threads the tape onto a second reel that is part of the compatible tape drive.

Throughout the history of tape there has been a need to store information describing what is recorded on the tape media. Early approaches used a paper label attached to the tape box or can. This label recorded a description of the content stored on the tape, typically as a list of “content, start-block, and length (or end-block)”.

During the early days of Unix, the tar format was developed as a way of packaging file contents along with the file meta-data. Many alternative file packaging approaches have since been developed for storing files on tape media. The tar format remains in common use as of 2010, although it is suited mainly to bulk save and restore of files.
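To make the packaging idea concrete, here is a minimal sketch using Python's standard tarfile module. An in-memory buffer stands in for the tape, and the file name, contents, and timestamp are made up for illustration.

```python
import io
import tarfile

# Build a tar archive in memory; the BytesIO buffer stands in for the tape.
payload = b"hello tape\n"
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as archive:
    info = tarfile.TarInfo(name="reports/q1.txt")  # hypothetical file name
    info.size = len(payload)
    info.mtime = 1262304000  # 2010-01-01 00:00:00 UTC, an arbitrary example
    archive.addfile(info, io.BytesIO(payload))

# Read the archive back: the meta-data travels in a header stored with the data.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as archive:
    member = archive.getmembers()[0]
    restored = archive.extractfile(member).read()

print(member.name, member.size, member.mtime, restored)
```

Note that the reader has to walk the archive from the start to find a member, which is part of why tar fits bulk save and restore better than access to individual files.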

Apart from packaging formats like tar, most data tape use is mediated by applications. These tape-aware applications effectively maintain an internal database that is the logical equivalent of the early paper labels. They often use a proprietary on-tape format for storing file content and rely on that database to keep track of the content on the managed tapes. All access to file content on the tape goes through the application, either via custom APIs or by requiring a full restore of the file to hard drive or flash media before the application or user can work with it. If the application or the database is unavailable or damaged, there is very little that can be done to retrieve the data from the media.
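As a rough illustration of such a catalog, the sketch below models the "content, start-block, length" entries of the old paper labels as an in-memory structure. The tape ID, entry names, and block numbers are invented, and real tape management products use their own proprietary schemas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    content: str      # description of what is stored
    start_block: int  # first block of the record on tape
    length: int       # number of blocks

    @property
    def end_block(self) -> int:
        return self.start_block + self.length - 1


# Hypothetical catalog: tape ID -> entries recorded on that tape.
catalog = {
    "TAPE001": [
        CatalogEntry("payroll-2009", 0, 120),
        CatalogEntry("mail-archive", 120, 480),
    ],
}


def locate(catalog, name):
    """Return (tape_id, entry) for the named content, or None if unknown."""
    for tape_id, entries in catalog.items():
        for entry in entries:
            if entry.content == name:
                return tape_id, entry
    return None


print(locate(catalog, "mail-archive"))
```

If this catalog is lost, nothing on the tape itself says where "mail-archive" begins, which is the fragility described above.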

Using a packaging format to store files on tape media means that a stored file must be fully restored, typically to hard drive or flash storage, before a user or an application can work with it. Applications that provide a proprietary API for accessing file content on tape typically impose the same requirement before an arbitrary application can work with the file. The proprietary APIs can be difficult to learn since each is unique to a particular tape management system. Development of the tape management system itself is expensive due, in part, to the need for specialized tape development skills.

If data tape media is exposed to the operating system as a standard POSIX-compliant filesystem, then the limitations of proprietary APIs and the need to restore files before accessing them are removed. The POSIX filesystem API is what programs already use for hard drive and flash media, and developers learn to program against it as part of normal practice. So if data tape can be exposed through a POSIX-compliant filesystem, it will appear to users and applications as just a large removable hard drive or flash storage device (ignoring the difference in the typical duration of storage operations). The end result is that data tape can be used anywhere that a hard drive or flash media is currently used, without requiring special development or user skills. Tape could be just another type of storage, with benefits and limitations that can be evaluated against the data storage requirements.
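The following sketch shows what this means in practice: ordinary POSIX-style calls (open, lseek, read, stat) that know nothing about tape. A temporary directory stands in for a mounted tape filesystem here, and the file name and the read_range helper are hypothetical.

```python
import os
import tempfile


def read_range(path: str, offset: int, length: int) -> bytes:
    """Read up to `length` bytes starting at `offset` using plain file calls."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        return os.read(fd, length)  # may return fewer bytes near end of file
    finally:
        os.close(fd)


# A temporary directory stands in for the mount point of a tape filesystem.
with tempfile.TemporaryDirectory() as mount_point:
    path = os.path.join(mount_point, "clip.dat")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"0123456789abcdef")

    chunk = read_range(path, 10, 4)  # part of the file, no full restore
    size = os.stat(path).st_size     # meta-data via the same API

print(chunk, size)
```

The same code would run unchanged against any mounted filesystem; on tape, only the time taken by the seek would differ.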

A few attempts have been made over the years to build a tape-based filesystem, including ZARAH (1996) and HPTFS (2006). None of these earlier efforts gained much market traction or widespread interest.

The Linear Tape File System (LTFS) has been designed with a different combination of implementation choices than those used in earlier tape filesystems. Based on the widespread and growing interest in LTFS we can assume that the new combination resonates with modern storage needs.

  1. At least since 1951, starting with the UNIVAC I. Certainly longer than silicon transistors and modern digital computers.