Michael Richmond

Problem with video-stream playback in OS X Lion (10.7)

Posted on August 2, 2011 by Michael Richmond

My home IT infrastructure includes an Apple Mac Mini computer that is used as our media server and primary playback machine. The Mac Mini is connected to a 1080p LCD television and a 1Gb LAN, and accesses a NAS for media storage. The main job of this machine is audio playback using iTunes, and video playback using Hulu Desktop[1. Development of Hulu Desktop has been halted. The Hulu Desktop download was removed from the hulu.com website on July 2, 2013.] with the occasional web-video streaming using Safari in fullscreen[2. We generally use our XBox360 for Netflix streaming since the interface is nicer. We also have a Sony BluRay player that is supposed to stream Amazon video on demand. It has been a three-month-and-counting battle with Sony and Amazon support to get incorrect server-side state flushed before this device will stream correctly.].

This Mac Mini is a late 2009 model with a 2.26GHz Core 2 Duo processor, 4GB of RAM, and an NVIDIA GeForce 9400 GPU. Since this machine is not mission-critical for our household I figured it would be a reasonable candidate for our first Lion upgrade.

After the upgrade, video streaming has become practically impossible. Both Hulu Desktop and web-based video streaming suffer from decompression artifacts that result in blocky video and video freezes. After a minute or so of playback the video stream freezes on the current image while audio playback continues. If I use VNC to access the machine when the video freezes, I can see that the mouse cursor is beach-balling.

I have found that video playback can be made reliable and returned to the same quality as provided by OS X Snow Leopard (10.6) by disabling the new Lion functionality that restores applications to their running state after a reboot. To disable this function, open the System Preferences->General panel and uncheck the “Restore windows when quitting and re-opening apps” option. The option is shown unchecked in the screenshot below.

The OS X Lion 'Restore windows when quitting and re-opening apps' option. Option shown unchecked.

My suspicion is that the background task that takes a snapshot of the memory and system state for each running application is falling behind in taking snapshots when large amounts of RAM are updated. In video streaming, by definition, the application memory is undergoing large, frequent updates. If my suspicion is correct, then instead of dropping work items the snapshot task is trying to process an ever-increasing list of work. The result appears to be CPU starvation of video playback. This starvation probably occurs in the QuickTime stack since the behavior shows up in different playback applications.

Posted in bug, os x, streaming, Uncategorized, video | 4 Comments

Filesystem recovery examples with `ltfsck`

Posted on July 28, 2011 by Michael Richmond

In addition to the filesystem implementation, the Linear Tape File System (LTFS) software ships with two core utilities, “mkltfs” and “ltfsck”. mkltfs (pronounced “make LTFS”) is used to format LTO cartridges with the LTFS Format. ltfsck is used to check and if necessary recover a partially corrupted LTFS Volume back to a consistent and usable state.

In this post I describe some inconsistent states a volume may wind up in, the scenarios that may lead to these states (often power-loss), and how ltfsck recovers to a consistent state.

Posted in data safety, data storage, LTFS | 2 Comments

LTFS consistency and Index snapshots

Posted on July 21, 2011 by Michael Richmond

An LTFS Volume must be in a consistent state when the volume is exchanged with another LTFS system. The LTFS Format Specification defines consistent state as:

“A volume is consistent when both partitions are complete and the last Index Construct in the Index Partition has a back pointer to the last Index Construct in the Data Partition.”

where a complete partition is:

“An LTFS partition that consists of an LTFS Label Construct and a Content Area, where the last construct in the Content Area is an Index Construct.”

The LTFS Format Specification additionally includes the following conformance statement:

“Recorded media claiming conformance to this format shall be in a consistent state when interchanged or stored.”

In practice this conformance statement means that when an LTFS Volume is ejected from an LTO drive the volume must be consistent. Software implementing support for the LTFS format can expect a consistent LTFS cartridge when the cartridge is loaded and reject inconsistent cartridges.
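The two quoted definitions can be read as a small predicate. The sketch below is only an illustration under simplifying assumptions: a partition is modelled as a Python list of constructs, a back pointer is modelled as a list position in the Data Partition, and the class and function names (`Label`, `FileData`, `Index`, `is_complete`, `is_consistent`) are hypothetical, not part of the LTFS software or the on-tape format.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Label:
    """Stand-in for an LTFS Label Construct."""
    volume_name: str


@dataclass
class FileData:
    """Stand-in for file content recorded in a Content Area."""
    description: str


@dataclass
class Index:
    """Stand-in for an Index Construct."""
    generation: int
    # Hypothetical: list position of the Data Partition Index this one points back to.
    back_pointer: Optional[int] = None


Construct = Union[Label, FileData, Index]


def is_complete(partition: List[Construct]) -> bool:
    """A Label Construct first, then a Content Area whose last construct is an Index."""
    return (
        len(partition) >= 2
        and isinstance(partition[0], Label)
        and isinstance(partition[-1], Index)
    )


def is_consistent(ip: List[Construct], dp: List[Construct]) -> bool:
    """Both partitions complete, and the last IP Index points at the last DP Index."""
    if not (is_complete(ip) and is_complete(dp)):
        return False
    last_dp_index_position = len(dp) - 1
    return ip[-1].back_pointer == last_dp_index_position


# A Data Partition with two Indexes interleaved with file content.
dp = [Label("VOL001"), FileData("a.mov"), Index(1), FileData("b.mov"), Index(2, back_pointer=2)]
current_ip = [Label("VOL001"), Index(2, back_pointer=4)]
stale_ip = [Label("VOL001"), Index(1, back_pointer=2)]
```

With these sample values, `is_consistent(current_ip, dp)` holds, while `is_consistent(stale_ip, dp)` does not because the Index Partition still points at an older Index in the Data Partition.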

Logical layout of a consistent LTFS volume showing historical Indexes with back-pointers and multiple files written and edited in-place.

The LTFS software writes an updated Index to the end of the Data Partition (DP). The Indexes stored in the DP are interleaved with the file content that is also stored in the DP. LTFS maintains each Index written to the DP for three core reasons:

data safety,

performance, and

snapshots/rollback.

Data safety
To protect the data stored on the media it is vital that LTFS commit an updated Index to the media soon after the file content is written. Without the updated Index, new file content cannot be accessed. Additionally, maintaining the current Index in the DP along with the same logical Index stored in the Index Partition (IP) provides a redundant copy of the Index on the media. These redundant copies mean that in the event of catastrophic failure there is a second chance to read a valid current Index.

Performance
In normal LTFS operation the tape head is positioned in the DP to allow rapid access to the file content storage area to service read and write requests. When a new Index needs to be written to the media the LTFS software will write the new Index to the DP and continue servicing any concurrent write operations. By writing the Index to the DP in-line with the file content the LTFS software does not incur a penalty for switching to the other partition and seeking back to the Beginning of Tape (BOT).

Snapshots and rollback
The Indexes written to the DP are likely to be followed by file content during write operations. LTO data tape is sequential media, which makes it impossible to delete these Indexes without destroying any file content that is written after the Index. Maintaining these Indexes provides the ability to inspect and potentially revert the LTFS Volume to any earlier time for which an Index exists on the media. For example, suppose you work with an LTFS Volume by adding, updating, and deleting files. If you later decide that these changes are not wanted, you can either mount the LTFS Volume read-only using an earlier Index to show and access an earlier version of the volume, or permanently revert the LTFS Volume to an earlier state using a rollback operation.
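Selecting an earlier Index for a read-only mount or a rollback amounts to walking the chain of back-pointers from the newest Index. The following is a minimal sketch under assumptions: the `IndexRecord` fields, the integer timestamps, and the dictionary keyed by tape position are all hypothetical stand-ins and do not reflect how the LTFS software stores or locates Indexes.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class IndexRecord:
    generation: int
    written_at: int               # hypothetical timestamp, e.g. seconds since the epoch
    back_pointer: Optional[int]   # position of the previous Index, None for the first one


def find_index_at(indexes: Dict[int, IndexRecord], newest_pos: int, when: int) -> Optional[int]:
    """Follow back-pointers from the newest Index to the latest one written at or before `when`."""
    pos: Optional[int] = newest_pos
    while pos is not None:
        record = indexes[pos]
        if record.written_at <= when:
            return pos
        pos = record.back_pointer
    return None  # no Index on the media is old enough


# Three Indexes in the Data Partition, keyed by their (hypothetical) positions.
history = {
    10: IndexRecord(generation=1, written_at=100, back_pointer=None),
    25: IndexRecord(generation=2, written_at=200, back_pointer=10),
    40: IndexRecord(generation=3, written_at=300, back_pointer=25),
}
```

Here `find_index_at(history, 40, 250)` selects the Index at position 25, the state of the volume as it was at time 250.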

The Index written to the IP is typically written during unmount processing of the LTFS file-system. During unmount the following operations are performed in order:

1. all file content written to the volume but still in memory is flushed to the DP,

2. an updated Index is written to the DP,

3. LTO cartridge memory is written with new MAM parameters,

4. the tape head is moved to the IP,

5. any file content destined for the IP is written overwriting the old Index,

6. a replica of the current Index is written to the IP overwriting the old Index,

7. LTO cartridge memory is written with new MAM parameters, and

8. media is rewound then optionally ejected.

Note that it is safe to overwrite the old Index on the IP because the LTFS software has successfully written the current Index at the end of the DP in operation #2. The details of how MAM parameters are used and the conditions in which file content may be written to the IP are described in a later blog post.
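The ordering of the unmount operations is what makes the overwrite safe, and it can be written down as a short sketch. This is an illustration only: `FakeVolume` is a hypothetical stand-in that merely records step names, and nothing here corresponds to real LTFS functions or tape commands.

```python
from typing import List


class FakeVolume:
    """Hypothetical stand-in for a mounted LTFS volume; it only records the steps taken."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def do(self, step: str) -> None:
        self.log.append(step)


def unmount(volume: FakeVolume, eject: bool = False) -> None:
    """Perform the unmount operations in the order listed above."""
    volume.do("flush buffered file content to DP")
    volume.do("write updated Index to DP")   # from here on the old IP Index is expendable
    volume.do("write MAM parameters after DP update")
    volume.do("move head to IP")
    volume.do("write IP-destined file content")
    volume.do("write replica of current Index to IP")
    volume.do("write MAM parameters after IP update")
    volume.do("rewind")
    if eject:
        volume.do("eject")
```

The property that matters is that the current Index reaches the DP before anything in the IP is overwritten, so a failure part-way through still leaves one valid current Index on the media.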

Posted in data safety, data storage, LTFS, resiliency | 3 Comments

Can you use LTFS on LTO4 or earlier media?

Posted on July 13, 2011 by Michael Richmond

I have noticed a pattern of Google searches related to the question “Can LTFS be used with LTO4 media?”. This question is implicitly answered in other posts but the frequency of the Google searches motivates me to post the following explicit answer.

LTFS relies on the cartridge partitioning support that was introduced in the LTO5 specification by the LTO Consortium. Earlier generations of LTO media and drives do not have this partitioning capability and are therefore unable to support the LTFS on-tape format.

Support for multiple partitions at the media and hardware level is the key enabling functionality that allowed my team at IBM to invent and implement the LTFS technology. The details of how LTFS uses partitioning and the tape media are the subject of other posts on this blog, specifically Partitioning in LTO5 and LTFS, and How does LTFS work?.

Without partitioning support there are limited options for where the file-system metadata (or “Index”) may be stored on the media. With LTO generations prior to LTO5 these limited options are all that we have to work with. None of these options provide a satisfactory mix of performance, data safety, and business opportunity to make it practical to pursue LTFS on these older generations.

LTFS is available for LTO5 from all current drive vendors (IBM, HP, and Quantum). Additionally, LTFS is available for IBM’s enterprise tape product the TS1140 (aka Jaguar) and has been ported by Oracle to their T10000C product line. At this point LTFS has been adopted by all actively developed data tape technologies.

Posted in data storage, LTFS | Leave a comment

`sync()` behavior in LTFS

Posted on July 6, 2011 by Michael Richmond

Modern operating systems typically maintain an in-kernel cache of file-system metadata and small buffers of recent writes to open files. The file-system metadata typically includes data such as the filename, timestamps, and permissions for recently accessed files and directories. This metadata cache and the write buffers are periodically flushed to the storage media automatically by the system. This flush operation is commonly referred to as the sync() operation named after the user-space system call that exposes the functionality in UN*X-based systems such as Linux and Mac OS X. In Microsoft Windows this functionality is exposed by the FlushFileBuffers() call. sync() is so named because the system call “synchronizes” the file-system.

This blog entry outlines the sync() behavior in LTFS and the rationale behind the associated design choices.
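For readers who have not used these calls, here is a minimal sketch of forcing buffered writes out to storage from an application. It uses Python's standard `os.fsync()`, which flushes a single open file; on Unix-like systems `os.sync()` additionally requests a flush of everything, much like the sync() call described above. The function name `write_durably` is an assumption made for this example, and the sketch says nothing about how LTFS itself responds to these calls.

```python
import os


def write_durably(path: str, payload: bytes) -> None:
    """Write `payload` to `path` and ask the OS to push it to the storage device."""
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()              # move data from Python's user-space buffer to the OS
        os.fsync(f.fileno())   # ask the OS to flush this file's buffers to storage


def sync_everything() -> None:
    """Request a system-wide flush where the platform offers one (Unix-like systems)."""
    if hasattr(os, "sync"):
        os.sync()
```

Without the explicit flush, the data would still be written eventually by the periodic automatic flush; the calls only control when.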

Posted in data storage, LTFS, sync, system call | Leave a comment

How does LTFS work?

Posted on June 24, 2011 by Michael Richmond

The Linear Tape File System (LTFS) relies on support for partitioning that was introduced in LTO generation 5. Partitioning an LTO5 cartridge divides the media into two separate data storage areas known as “partitions”. Each partition can be written to without impacting data stored in the other partition on the media.

A data tape formatted for use with LTFS has two partitions, an Index Partition (IP) and a Data Partition (DP). The Index Partition has a relatively small capacity of 37.5GB. The Data Partition comprises the remaining 1.43TB of available capacity on the media.

LTFS writes an Index that holds all the file and folder meta-data for the LTFS Volume to the Index Partition. The LTFS Format Specification defines that a consistent LTFS Volume must, along with other properties, have the most-recent Index written to the end of both the IP and the DP. In the diagram below, “Index²” is the most-recent Index. The specification defines that an LTFS Volume must be consistent when it is exchanged with another system. In practice, this means that any compliant implementation of the specification that operates on an LTFS Volume must ensure that the media is consistent when the media is ejected.

At mount-time the LTFS software reads the current Index from the IP and builds an in-memory structure representing all of the folders and files stored on the media. This structure contains meta-data such as file timestamps, file permissions, file name, file size, etc. The structure also contains the location on the DP for each data extent that holds part of the file content.

When a user or an application traverses a mounted Linear Tape File System the LTFS software can return filenames, folders, timestamps, file sizes, and other meta-data from the in-memory index structure. When the user double-clicks on a file to open it, or an application reads from a file the LTFS software causes the tape drive to seek to the start of the relevant data extents and reads the data from the tape media.

LTO5 tape drives are able to stream data for reading and writing at 140MB/s. This speed is about 40% higher than the best hard-drives, which top out at around 100MB/s. Of course, since LTO data tape is sequential access media, there is a high seek time for data tape. The worst-case seek moves from one end of the physical tape to the other and takes roughly 90 seconds, so the average seek time is around 45 seconds.
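These figures combine into a rough estimate of how long it takes to read a file from tape. The sketch below is back-of-the-envelope arithmetic only; it assumes the target position is uniformly distributed along the tape, that the drive streams at the full quoted rate, and it ignores the wrap layout. The function names are made up for this example.

```python
STREAM_RATE_MB_S = 140.0     # LTO5 streaming rate quoted above
WORST_CASE_SEEK_S = 90.0     # end-to-end seek time quoted above


def average_seek_seconds(worst_case: float = WORST_CASE_SEEK_S) -> float:
    """Average seek, assuming the target is uniformly distributed along the tape."""
    return worst_case / 2.0


def estimated_read_seconds(file_size_mb: float, seek_s: float) -> float:
    """Seek time plus the time to stream the file at the full drive rate."""
    return seek_s + file_size_mb / STREAM_RATE_MB_S


# Example: a 14,000MB file read after an average seek.
example_seconds = estimated_read_seconds(14_000, average_seek_seconds())
```

For the example file, 45 seconds of seeking is followed by 100 seconds of streaming, so the seek is a significant but not dominant share of the total.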

Logical layout of a LTFS Volume on LTO5 media. Beginning of Tape (BOT) at the left edge of diagram, End of Tape (EOT) at the right edge. Diagram is not to scale. BOT to EOT is 846 meters (~2775 feet) and tape width top-to-bottom is 1.27 centimeters (1/2 inch).

LTO5 data tape writes data in a sequence of wraps on the physical media. Each wrap consists of one track of data written in a forward direction along the length of the tape plus another track of data written in a reverse direction along the length of the tape. There are 80 such wraps on LTO5 media. To fully traverse all of the stored data on a full LTO 5 cartridge requires traversing the length of the tape one-hundred and sixty times. (Eighty times in the forward direction interleaved with eighty times in the reverse direction.)

As a result of the wrap-based data layout a seek to the beginning of a randomly selected file is often quite fast. Typically a seek to the beginning of a file will only require a short tape movement along the length of the tape along with a lateral head movement that is perpendicular to the axis of tape movement. These lateral head moves are performed in a few seconds.

Writes to a file stored in LTFS are performed by seeking to the end of the data and writing the file content to the media. If the file already exists in the LTFS volume then only the “over-written” areas of the file are written at the end of the DP. The extent list for such files that are modified “in-place” (on tape media) is updated to insert the new extents into the appropriate offsets in the extent list for the file.
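The extent-list update can be sketched as a splice: the new bytes are appended at the end of the DP, and the file's extent list is rewritten so the overwritten range points at them. This is an illustration under assumptions, not the LTFS implementation; the `Extent` fields and the use of byte positions for tape locations are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Extent:
    file_offset: int   # where this run of bytes sits within the file
    length: int        # number of bytes in the run
    tape_pos: int      # hypothetical byte position in the Data Partition


def overwrite(extents: List[Extent], offset: int, length: int, dp_end: int) -> List[Extent]:
    """Return the extent list after writing `length` bytes at file offset `offset`.

    The new bytes live at `dp_end`. The old bytes stay on tape but are no
    longer referenced for the overwritten range.
    """
    end = offset + length
    result: List[Extent] = []
    for e in extents:
        e_end = e.file_offset + e.length
        if e.file_offset < offset:
            # Keep the part of the old extent that lies before the overwritten range.
            keep = min(e_end, offset) - e.file_offset
            result.append(Extent(e.file_offset, keep, e.tape_pos))
        if e_end > end:
            # Keep the part of the old extent that lies after the overwritten range.
            start = max(e.file_offset, end)
            result.append(Extent(start, e_end - start, e.tape_pos + (start - e.file_offset)))
    result.append(Extent(offset, length, dp_end))
    return sorted(result, key=lambda e: e.file_offset)


# A 100-byte file stored as one extent, then 20 bytes overwritten at offset 40.
original = [Extent(file_offset=0, length=100, tape_pos=1000)]
updated = overwrite(original, offset=40, length=20, dp_end=5000)
```

After the overwrite the file is described by three extents: the first 40 bytes at their original location, the 20 new bytes at the end of the DP, and the final 40 bytes at their original location.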

Posted in data storage, LTFS | 32 Comments

Partitioning in LTO5 and LTFS

Posted on June 21, 2011 by Michael Richmond

The LTO5 specification includes support for partitions on LTO5 data tapes. This partition support can be classified into two groups: Partition Aware and Partition Enabled.

Here is the relevant definition from the LTO 5 specification:

“All Generation 5 LTO Ultrium tape drives will support at least one partition when writing to or reading from a Generation 5 cartridge. Drives that support more than one partition will implement all requirements for partitions as described in this document. Drives that support only one partition will reject a cartridge that has more than one partition.”

Partition Aware drives can identify partitioned media but will refuse to operate with partitioned media. Partition Aware drives are supported in the LTO5 specification as an approach to allow drive vendors time to explore and implement support for partitioning. The idea was that a vendor could choose to undertake minimal work to have their drive firmware identify partitioned media and refuse to operate on it, and still be in compliance with the specification.

Partition Enabled drives are supported in the LTO5 specification as drives that will identify, create, and work with partitioned LTO5 media. At the time of writing (June 2011) all LTO5 drive vendors offer Partition Enabled hardware.

Partitioning in LTO5 allows a maximum of 2 partitions on the tape media. Media utilizing one partition mirrors the format of previous LTO generations. That is, the whole empty tape is a single area of writable sequential media.

LTO5 media utilizing two partitions provides two tape areas of arbitrary size separated by a guard wrap. Each partition can be written to as sequential media without impacting the data stored on the other partition. Each partition can be of an arbitrary size but the media is allocated in terms of wraps with a guard wrap between the partitions. Since partitions are allocated as complete wraps the smallest practical partition size is one wrap, or approximately 37.5GB of uncompressed data storage. The guard wrap is necessary to provide sufficient physical separation on the media to ensure that writes to each partition will not interfere with data written to the other partition.

The LTFS software utilizes two partitions to construct the on-media format for LTFS. The “Index Partition” is created to use one whole wrap on the media. The LTO5 partitioning scheme uses an additional wrap as a guard wrap between partition 0 and partition 1. This guard wrap is a buffer to ensure that writes to each partition cannot interfere with data on the other partition.

The LTFS software creates a second “Data Partition” utilizing the remaining wraps available on the media to consume the balance of the tape area. LTO5 media has a rated capacity of 1.5TB. Therefore, LTO5 media formatted for use by LTFS has a storage capacity of 1.43TB (1.5TB – (37.5GB + 37.5GB)).
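The capacity arithmetic above can be checked in a few lines. The constants are the figures quoted in this post; the function name and its parameters are made up for the example.

```python
RATED_CAPACITY_GB = 1500.0   # LTO5 rated capacity (1.5TB) quoted above
WRAP_CAPACITY_GB = 37.5      # per-wrap capacity quoted above


def data_partition_capacity_gb(index_wraps: int = 1, guard_wraps: int = 1) -> float:
    """Capacity left for the Data Partition after the Index Partition and guard wrap."""
    return RATED_CAPACITY_GB - (index_wraps + guard_wraps) * WRAP_CAPACITY_GB


dp_capacity_gb = data_partition_capacity_gb()
```

The result is 1425GB, or 1.425TB, which rounds to the 1.43TB figure.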

Posted in data storage, LTFS | 3 Comments

Early adopter reports on experience with LTFS

Posted on June 16, 2011 by Michael Richmond

I have been working with Thought Equity Motion from very early in the LTFS productization effort. They amass very large data archives and their CTO, Mark Lemmons, was particularly excited by LTFS from a very early stage.

Last year at the IBC trade show, in Amsterdam, Thought Equity announced that they are building a multi-petabyte video archive for EyeTV using LTFS for data storage. It is gratifying to hear the success that Mark and his team have had working with LTFS.

“With LTFS from IBM I can literally integrate my global storage strategy.”

“[I]mmediate, tangible economic benefit from our LTFS deployments today.”

“Cost per GB reduced by 80%.”

“Without IBM’s LTFS technology, we’d be in a world of hurt.”

“High degree of flexibility. High degree of interoperability. [LTFS] It’s fantastic.”

http://www.youtube.com/watch?v=M7w0jrkQnj4

Posted in data storage, LTFS | Leave a comment

Tape Partitioning and LTFS

Posted on June 10, 2011 by Michael Richmond

IBM LTO5 data tape media.

Tape partitioning has been introduced over the years to a number of data tape products. Generally, this partitioning support has been under-utilized, or not-utilized by users and applications. For example, DDS tape has supported partitioning since DDS-2 released in 1993.

In April 2010 the Linear Tape Open (LTO) consortium released LTO5, the fifth generation of the LTO technology with hardware and media availability from consortium members. Amongst other improvements, LTO5 introduced support for tape partitioning to the LTO technology. LTO5 partitioning supports a maximum of 2 partitions per cartridge and requires LTO5 tape hardware and tape media.

Tape partitioning is not compatible with LTO4 media even though the LTO5 hardware can write to LTO4 media. Adding partition support to LTO4 would require changes to the LTO4 specification and existing hardware products. If the LTO4 media is partitioned then this media would not be compatible with the LTO4 hardware already in the field.

The Linear Tape File System (LTFS) uses the partitioning support in LTO5 to create two partitions on the tape media. One partition, known as the Index Partition (IP), is used as a central place to store an Index of the contents of the tape. This Index holds the filesystem meta-data for all files written to the LTFS Volume. In addition to the filename, date stamps, extended attributes, and folder names, the Index records the location of the file content on the tape media. The Index Partition occupies a relatively small amount of the total media capacity.

In general use, the file content is written to the larger of the two partitions on the tape media. This partition is known as the Data Partition (DP). In the current implementations of LTFS the Data Partition has been 93.7% of the raw data tape media capacity. This translates to roughly 1.43TB on LTO5 media.

The Linear Tape File System software interprets this partitioned media format to read the current Index from the media when the filesystem is mounted. The content of the Index is used by the software to build an in-memory structure that represents the folder and file structures on the LTFS Volume. This in-memory structure allows the filesystem to immediately respond to requests to list files and folders, access extended attributes, show timestamps, and read/write permissions. In this way the user can browse the filesystem using tools like ‘ls’, ‘find’, Windows Explorer, or OS X Finder without requiring the filesystem to touch the tape media.

When a user or an application performs an operation to access the content of a file stored in LTFS the filesystem moves the tape media to the location of the relevant data and performs a read. The LTFS filesystem does not need to unpack the file content from a packaging structure like ‘tar’ and the filesystem is not caching the file content on disk. Instead, the filesystem exposes stored files as byte-addressable data just like other POSIX filesystems.

Byte-addressable access to file data allows applications and users to extract segments out of the middle of a stored file. This kind of data extraction allows applications to retrieve a short clip of a few seconds of video directly from much larger video files. Most previous uses of data tape require the whole video file to be restored to a hard drive before a short clip can be extracted. With LTFS, the need for a full restore is eliminated.
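Because the filesystem is byte-addressable, extracting a segment needs nothing beyond the ordinary file API. A minimal sketch follows; the function name `read_clip` and the mount path in the comment are hypothetical, and the byte offsets for a real video clip would have to come from the application's knowledge of the file format.

```python
def read_clip(path: str, start: int, length: int) -> bytes:
    """Read `length` bytes starting at byte offset `start` of the file at `path`."""
    with open(path, "rb") as f:
        f.seek(start)           # position within the file; no full restore needed
        return f.read(length)


# Hypothetical usage against a mounted LTFS volume:
#   clip = read_clip("/mnt/ltfs/footage/master.mov", start=2_000_000_000, length=50_000_000)
```

The same code works unchanged against a hard drive or flash device, which is the point of exposing tape through a standard filesystem interface.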

Posted in data storage, LTFS | Leave a comment

A filesystem for tape? Why?

Posted on June 4, 2011 by Michael Richmond

Tape has been around forever[1. At least since 1951 starting with the UNIVAC I. Certainly longer than silicon transistors and modern digital computers.]. Generally, there has been no change in the way that people and applications work with tape since it was first introduced. In common use, tape is a block storage medium with records separated by file marks. Often the record will consist of the contents of a complete file stored without the file meta-data such as filename, date modified, size, etc.

In the beginning, data tape was stored as a single reel of tape that lived on a shelf in a custom-sized box or cylindrical can. This reel-to-reel tape loads into a tape drive by threading (manually or automatically) the start of the tape on an empty reel that is part of the tape drive. Over time, various cartridge designs used dual reels encased in the cartridge to ease handling of the tape media and offer some level of protection. Modern LTO5, IBM Jaguar, and Oracle StorageTek tape media all use a single reel cartridge that automatically threads the tape media to a second reel that is part of the matching compatible tape drive.

Throughout the history of tape there has been a need to store information describing what is recorded on the tape media. Early approaches used a paper label attached to the tape media box or can. This label recorded a description of the content stored on the tape. Typically this description is a list of “content, start-block, and length (or end-block)”.

During the early days of Unix the tar format was developed as a way of packaging the file contents along with the file meta-data for the file. Many alternative file packaging approaches have been developed to support storing files on tape media. The tar format continues to be commonly used as of 2010 although this format is largely suited for bulk save and restore of files from tape media.

Apart from packaging formats like tar, most data tape use is mediated by applications. These applications effectively maintain an internal database that contains the logical equivalent of the early paper labels. These tape-aware applications often use a proprietary on-tape format for storing the file content and rely on some kind of database to keep track of the content on the managed tapes. All access to the file content on the data tape is mediated through the application either by custom APIs or by requiring a full-restore of the file to hard-drive or flash media before the application or user can work with the file. If the application or the database are unavailable or damaged there is very little that can be done to retrieve the data from the media.

Using a packaging format to store files on tape media necessitates that stored files must be fully restored before the user or an application can work with the file. This restoration typically is performed to hard drive or flash storage. Applications that provide a proprietary API to access file content from tape typically also require a full restore of the file data to hard drive or flash before an arbitrary application can work with the file. The proprietary APIs can be difficult to learn since they are unique for each particular tape management system. Development of the tape management system is expensive due, in part, to the need for specialized tape development skills.

If data tape media is exposed to the operating system as a standard POSIX compliant filesystem then the limitations of proprietary APIs and the need to restore files before accessing them are removed. The POSIX filesystem API is what programs interface with for hard drive and flash media. All developers learn to program against the POSIX filesystem API as part of normal practice. So if data tape can be exposed through a POSIX compliant filesystem, then the data tape will appear to users and applications as just a large removable hard drive or flash storage device. (Ignoring the difference in the typical duration of storage operations.) The end result is that data tape can be used anywhere that a hard drive or flash media is currently used without requiring special development or user skills. Tape could be just another type of storage with benefits and limitations that can be evaluated based on the data storage requirements.

A few attempts have been made over the years to build a tape-based file-system, including ZARAH (1996) and HPTFS (2006). None of these earlier efforts gained much market traction or wide-spread interest.

The Linear Tape File System (LTFS) has been designed with a different combination of implementation choices than those used in earlier tape filesystems. Based on the widespread and growing interest in LTFS we can assume that the new combination resonates with modern storage needs.

Posted in computer history, data storage, LTFS | 1 Comment
