A Mirroring File System

It is often useful to keep an identical local copy of a remote file system. One common use of such mirrors is distributing new software over the Internet; e.g., Red Hat has many mirror sites for its RPMs and ISO images, which lessens the load on Red Hat's networks and FTP/HTTP servers. This mirroring is usually done by copying all of the data over the network and keeping it in sync with the remote copy, using FTP/HTTP copy tools such as "wget" or TCP/IP-based tools such as "rdist" or "rsync." The problem is that you must make a full copy of the remote file system before it can be used locally, which consumes a lot of network bandwidth and local disk space even though you cannot know whether everything you copy will actually be used.

Mirrorfs is a lazy-mirroring stackable file system. It creates a thin view of a remote file system by displaying only file and directory names, with their correct sizes. None of the actual file data is retrieved until someone (possibly a remote user) tries to read it, so only files that are actually used get mirrored from the origin site. When a remote user tries to read a file (say, via FTP or HTTP), Mirrorfs opens a T-style channel to both the user and the origin site: it retrieves the data from the origin site on the fly, streams it to the end user, and concurrently writes it to the local disk for persistent storage. (In a way, the local disk in Mirrorfs acts as a caching file system.) Since remote users are retrieving data over slow network channels anyway, the added overhead of fetching the data from the origin site is negligible. Mirrorfs can also lazy-copy the meta-data of the origin file system (file and directory names, sizes, etc.). Finally, we include a set of policies that clean up data files that either no longer exist on the remote site or have not been used locally for a period of time.
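The T-style read path and the cleanup policies described above can be illustrated with a small user-space sketch. This is not the Mirrorfs implementation, which is a stackable file system; it only mimics the idea with ordinary files. The names tee_read, clean_cache, and open_origin are hypothetical, as are the ".partial" suffix and the use of file access times to measure idleness (access times may not be updated on file systems mounted with noatime). POSIX-style relative paths are assumed.

```python
import os
import time
from typing import BinaryIO, Callable, Iterator, List, Optional, Set

CHUNK = 64 * 1024  # bytes copied per step (arbitrary choice)


def tee_read(name: str, cache_dir: str,
             open_origin: Callable[[str], BinaryIO]) -> Iterator[bytes]:
    """Yield the bytes of `name`, filling the local cache on first read."""
    cached = os.path.join(cache_dir, name)
    if os.path.exists(cached):
        # Already mirrored: serve from local disk, no origin traffic.
        with open(cached, "rb") as f:
            while chunk := f.read(CHUNK):
                yield chunk
        return
    os.makedirs(os.path.dirname(cached) or ".", exist_ok=True)
    partial = cached + ".partial"
    with open_origin(name) as src, open(partial, "wb") as dst:
        while chunk := src.read(CHUNK):
            dst.write(chunk)  # one branch of the T: persist locally
            yield chunk       # other branch: stream to the reader
    # Publish only complete copies; an aborted read leaves a .partial file.
    os.replace(partial, cached)


def clean_cache(cache_dir: str, origin_names: Set[str],
                max_idle_seconds: float,
                now: Optional[float] = None) -> List[str]:
    """Remove cached files gone from the origin or idle for too long."""
    now = time.time() if now is None else now
    removed = []
    for root, _dirs, files in os.walk(cache_dir):
        for filename in files:
            path = os.path.join(root, filename)
            rel = os.path.relpath(path, cache_dir)
            idle = now - os.stat(path).st_atime
            if rel not in origin_names or idle > max_idle_seconds:
                os.remove(path)
                removed.append(rel)
    return sorted(removed)
```

A caller would pass an open_origin function that returns a readable binary stream for the named file (for example, a wrapper around an HTTP or FTP fetch) and forward each yielded chunk to the end user. The first read of a file goes to the origin; later reads are served from the cache directory until clean_cache removes the file.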

Past Students:

# | Name | Program | Period | Current Location
1 | Nikolai Joukov | PhD | Jan 2004 - Dec 2006 | Research Staff Member, Storage and Data Services Research group, IBM T. J. Watson Research Center (Hawthorne, NY)

Sponsors:

# | Sponsor | Amount | Period | Type | Title
1 | NSF Trusted Computing (TC) | $400,000 | 2003-2006 | Sole PI | A Layered Approach to Securing Network File Systems