Size-changing algorithms (SCAs) are those that take as input a stream of data bits and produce output of a different number of bits. These SCAs share one quality in common: they are generally intended to work on whole streams of input data, from the beginning to the end of the stream. Some of the applications of such algorithms fall into several possible categories:
In this category we also consider transformations to support internationalization of text, as well as Unicoding.
Stronger encryption modes can use Cipher Block Chaining (CBC), a mode that typically increases the size of the output [SchneierSchneier1995]. This algorithm is considered stronger because it does not tell you what the original input data size was. Furthermore, by encoding the input into more output bits, the input data becomes more randomized, increasing the brute-force search space.
Transparent encryption could be particularly useful if combined with a network-based file system such as NFS. In that case, the data that is transmitted over insecure networks is encrypted, making it more difficult to attackers to decode sniffed packets.
There are many useful user-level tools that use SCAs, such as uuencode, compress, and pgp [ZimmermanZimmerman1995]. These tools work on whole files and are often used manually by users. This poses additional inconvenience to users. When you encrypt or decompress a data file, even if you wish to access just a small part of that file, you still have to encode or decode all of it until you reach the portion of interest--an action that consumes many resources. SCAs do not provide information that can help to decode or encode only the portion of interest. Furthermore, running user-level SCA tools repeatedly costs in additional overhead as data must be copied between the user process and the kernel several times. User-level SCA tools are therefore neither transparent to users nor do they perform well.
Instead, it would be useful for a file system to support SCAs. File systems are:
File systems have proven to be a useful abstraction for extending system functionality. Several SCAs (often compression) have been implemented as extensions to existing disk-based file systems [AyersAyers1997,Burrows, Jerian, Lampson, and MannBurrows et al.1992,NagarNagar1997]. Their disadvantages are that they only work with specific operating systems and file systems, and they only support those specific SCAs. Supporting general-purpose SCAs on a wide range of platforms was not possible.
Stackable file systems are an effective infrastructure for creating new file system functionality with minimal performance overhead and development cost [Guy, Heidemann, Mak, Page Jr., Popek, and RothmeierGuy et al.1990,Heidemann and PopekHeidemann and Popek1991,RosenthalRosenthal1990,Skinner and WongSkinner and Wong1993,Zadok, Badulescu, and ShenderZadok et al.1999,Zadok and NiehZadok and Nieh2000,ZadokZadok2001]. Stackable file systems can be developed independently and then stacked on top of each other to provide new functionality. Also, they are more portable and are easier to develop [Zadok and NiehZadok and Nieh2000]. For example, an encryption file system can be mounted on top of a native file system to provide secure and transparent data storage [Zadok, Badulescu, and ShenderZadok et al.1998]. Unfortunately, general-purpose SCAs have never been implemented in stackable file systems. The problem we set out to solve was how to support general-purpose SCAs in a way that is easy to use, performs well, and is available for many file systems.
We propose fast indexing as a solution for supporting SCAs in stackable file systems. Fast indexing provide a way to map file offsets between upper and lower layers in stackable file systems. Since the fast indexing is just a mapping, a lower-layer file system does not have to know anything about the details of the SCA used by an upper-level file system. We store this fast indexing information in index files. Each encoded file has a corresponding index file which is simply stored in a separate file in the lower-layer file system. The index file is much smaller than the original data file, resulting in negligible storage requirements. The index file is designed to be recoverable if it is somehow lost so that it does not compromise the reliability of the file system. Finally, fast indexing is designed to deliver good file system performance with low stacking overhead, especially for common file operations.
We have implemented fast indexing using stackable templates [Zadok, Badulescu, and ShenderZadok et al.1999,Zadok and NiehZadok and Nieh2000,ZadokZadok2001]. This allows us to provide transparent support for SCAs in a portable way. To demonstrate the effectiveness of our approach, we built and tested several size-changing file systems, including a compression file system. Our performance results show the following two points:
This paper describes fast index files and is organized as follows. Section 2 reviews the stacking file-system infrastructure used for this work and discusses related work in SCA support in file systems. Section 3 details the design of the index file. Section 4 describes the usage of the index file in relation to common file operations and discusses several optimizations. Section 5 details our design for a consistent and recoverable index file. Section 6 summarizes important implementation issues. Section 7 describes the file systems we built using this work and evaluates our system. Finally, we present conclusions and discuss directions for future work.