Reducing Storage Management Costs via Informed User-Based Policies
Erez Zadok, Jeffrey Osborn, Ariye Shater,
Charles Wright, and Kiran-Kumar Muniswamy-Reddy
Stony Brook University
,
Jason Nieh
Columbia University
1
Abstract
Storage consumption continues to grow rapidly, especially with the
popularity of multimedia files.
Storage hardware costs represent a small fraction of overall
management costs, which include frequent maintenance and backups.
Our key approach to reducing total storage management costs is to
reduce actual storage consumption.
We achieve this in two ways.
First, we classify files into categories of importance.
Based on these categories, files can be backed up with various
frequencies, or even not at all.
Second, the system may also reclaim space based on a file's importance
(e.g., transparently compress old files).
Our system provides a rich set of policies. We allow users to
tailor their disk usage policies, offloading some of the management
burdens from the system and its administrators.
We have implemented the system and evaluated it. Performance
overheads under normal use are negligible. We report space savings on
modern systems ranging from 25% to 76%, which result in extending
storage lifetimes by 72%.
1 Introduction
Despite seemingly endless increases in the amount of storage and
decreasing hardware costs, managing storage is still expensive.
Furthermore, backing up more data takes more time and uses more
storage bandwidth-thus adversely affecting performance.
Users continue to fill increasingly larger disks.
In 1991, Baker reported that the size of large files had increased by
ten times since the 1985 BSD study
[1,10]. In 2000, Roselli
reported that large files were getting ten times larger than Baker
reported [12].
Our recent studies show that just three years later, large files are
ten times larger than Roselli reported.
Storage management costs have remained a significant component of
total storage costs. In 1989, Gelb reported that in the '70s, storage
management costs at IBM were several times more than hardware costs,
and projected that they would reach ten times the cost of the hardware
[6]. Today, management costs are indeed five to ten times
the cost of underlying hardware and are actually increasing as a
proportion of cost because each administrator can only manage a
limited amount of storage [5,9].
We believe that reducing the rate of consumption of storage is the
best solution to this problem. Our studies and independent studies
[13] indicate that significant savings are possible.
To improve storage management via efficient use of storage, we
designed the Elastic Quota System (Equota). Equota allows
users maximal freedom, with minimal administrator intervention.
Elastic quotas enter users into a contract with the system: users can
exceed their quota while space is available, under the condition that
the system does not provide as rigid assurances about the file's
safety. Users or applications may designate some files as
elastic. Non-elastic (or persistent) files maintain existing
semantics. Elastic quotas creates a hierarchy of data's importance:
the most important data will be backed up frequently; some data may be
compressed and other data can be compressed in a lossy manner; and
some files may not be backed up at all. Finally, if the system is
running short on space, the elastic files may even be removed. Users and
system administrators can configure flexible policies to designate
which files belong to which part of the hierarchy. Elastic quotas
introduce little overhead for normal operation, and demonstrate that
through this new disk usage model, significant space savings are
possible.
2 Motivational Study
Storage needs are increasing-often as quickly as larger storage
technologies are produced. Moreover, each upgrade is costly and
carries with it high fixed costs [5]. We conducted
a study to quantify this growth, with an eye toward reducing the rate
of growth.
We identified four classes of files, three of which can reduce the
growth rate and also the amount of data to backed up.
First, there are files that cannot be considered for reducing growth. These
files are important to users and should be backed up frequently, say
daily.
Second, studies indicate that 82-85% of storage is consumed by files
that have not been accessed in more than a month [2].
Our studies confirm this trend: 89.1% of files or 90.4% of storage
has not been accessed in the past month. These files can be
compressed to recover space. They need not be backed up with
the same frequency as the first class of files, because files that
have not changed recently are less likely to change in the near future.
Third, multimedia files such as JPEG or MP3 can be re-encoded with
lower quality. This method carries some risk because not all of the
original data is preserved, but the data is still available and
useful. These files can be backed up less frequently than other
files.
Fourth, previous studies show that over 20% of all
files-representing over half of the storage-are regenerable
[13]. These files need not be backed up.
Moreover, if the site policy chooses, then these files can even be
removed when space runs short.
To determine what savings are possible given the current usage of disk
space, we conducted a study of four sites, for which we had complete
access. These sites include a total of 3898 users, over 9 million
files, and 735.8GB of data dating back 15 years:
(A) a small software development company with 100
programmers, management, sales, marketing, and administrative users
with data from 1992-2003;
(B) an academic department with 3581 users who are mostly students,
using data from shared file servers, collected over 15 years;
(C) a research group with 177 users and data from 2000-2003;
and (D) a group of 40 cooperative users with personal Web sites and
data from 2000-2003.
Each of these sites has experienced real costs associated with
storage: A underwent several major storage upgrades in that period; B
continuously upgrades several file servers every six months; the
statistics for C were obtained from a file server that was recently
upgraded; and D has recently installed quotas to rein in disk
usage.
Figure 1: Space consumed by different classes. Actual amounts
appear to the right of the bars, with the total size on top.
We considered a transparent compression policy on all
uncompressed files that have not been accessed in 90 days.
We do not include already compressed data (e.g., .gz), compressed
media (e.g., MP3 or JPEG), or files that are only one block.
In this situation, we save between 4.6% from group B to 51
% from group C.
We yield large savings on group C: it has many .c files that
compress to 27% of their original size.
Group B contains a large number of active users, so the percentage of
files that were used in the past 90 days is less than that in the
other sites.
The next hatched bar in Figure 1 is the
savings from lossy compression of still images, videos, and sound
files.
The results varied from a savings of 2.5% for group A to a
savings of 35% for group D.
Groups B and D are more liberal sites, and therefore contain a large
number of personal .mp3 and .avi files. As media
files grow in popularity and size, so will the savings from a lossy
compression policy.
The next bar down represents space consumed by regenerable files, such
as .o files (with corresponding .c's) and
~ files, respectively. This varied between 1.7% for
group B to 40.5% for group A. This represents the amount of data
that need not be backed up, or can be removed.
Group A had large temporary backup tar files that were no longer
needed (ironically, they were created just prior to a file server
migration).
The amount of storage that cannot be reduced through these policies is
the dark bar at the bottom.
Overall, using the space reclamation methods, we can save between 25%
to 76.5% of the total disk space.
To verify if applying the aforementioned three space reclamation
methods would reduce the rate of disk space consumption, we correlated
the average savings we obtained in the above environments with the
SEER [8] and Roselli [12] traces.
We require filename and path information, since our space reclamation
methods depend on file types, which are highly correlated with names
[3].
We evaluated several other traces, but only the combination of SEER
and Roselli's traces provides us with the information we required.
The SEER traces have pathname information but do not have file size
information. Roselli's traces do not contain any file name
information, but have the file size information. We used the size
information obtained by Roselli to extrapolate the SEER growth rates.
The Roselli traces were taken around the same time of the SEER traces,
and therefore give us a good estimate of the average file size on a
system at the time.
At the rate of growth exhibited in the traces, the hard drives in the
machines would need to be upgraded after 11.14 months.
Adding a compression policy extends the disks' lifetime to 18.5 months.
Adding a lossy compression policy extends the disks' lifetime to 18.7
months.
Finally, the savings from removing regenerable files extended the
disks' lifetime to 19.2 months.
Although the savings from lossy files are small here, we believe this
is a result of the data available in the traces. Although the SEER
traces did provide filenames, only certain filenames remained
unanonymized, leaving us to estimate growth based on averages we
computed across all 9 million files in the four group study.
Also, lossy-compression-based policies are centered around media
files, which have increased in popularity only in recent years.
Nevertheless, our conservative study shows that we can still reduce
growth rates by 52%. Furthermore, as the number and footprint of
large-sized media files increase and large files get even larger
[1,12], so will the savings from lossy
compression. Based on these results, we have concluded that our
policies are promising storage management cost-reduction techniques.
3 Design
Our two primary design goals were to allow for versatile and efficient
elastic quota policy management. To achieve versatility we designed a
flexible policy configuration language for use by administrators and
users.
To achieve efficiency we designed the system to run as a kernel file
system with a database, which associates
user IDs, file names, and inode numbers.
Our present implementation marks a file as elastic using a single
inode bit. A more complex hierarchy could be created using extended
attributes.
Architecture
Figure 2 shows the overall architecture of our system.
We describe each component in the figure and then the interactions
between each component.
There are four components in our system:
(1) EQFS is a stackable file system that is mounted on top of
another file system such as Ext3 [17]. EQFS
includes a component (Edquot) that indirectly manages the kernel's
native quota accounting. EQFS also sends messages to a user space
component, Rubberd.
(2) Berkeley DB (BDB) databases records information about
elastic files [14]. We have two types of
databases. First, for each user we maintain a database that maps
inode numbers of elastic files to their names. Having separate
databases for each user allows us to easily locate and enumerate each
user's elastic files. The second type of database records an
abuse factor for each user denoting how "good" or "bad" a
given user has been with respect to historical utilization of disk
space. We describe abuse factors in detail in Section
4.
(3) Rubberd is a user-level daemon that contains two threads.
The database management thread is responsible for updating the BDB
databases. The policy thread periodically executes cleaning policies.
(4) Elastic Quota Utilities are enhanced quota utilities that
maintain the BDB databases and control both persistent and elastic
quotas.
Figure 2: Elastic Quota Architecture
System Operation
EQFS intercepts file system operations, performs related elastic quota
operations, and then passes the operation to the lower file system
(e.g., Ext2). EQFS also intercepts the quota management system call
and inserts its own set of quota management operations, edquot.
Quota operations are intercepted in reverse (e.g., from Ext2 to the
VFS), because only the native disk-based file system knows when an
operation has resulted in a change in the consumption of inodes or
disk blocks.
Each user on our system has two UIDs: one that accounts for persistent
usage and another that accounts for elastic usage. The latter,
called the shadow UID, is simply the ones-complement of the
former. The shadow UID is only used for quota accounting and does not
modify permissions.
When an Edquot operation is called, Edquot determines if it was for an
elastic or a persistent file, and informs dquot to account for the
changed resource (inode or disk block) for either the UID or shadow
UID. This allows us to use the existing quota infrastructure and
utilities to account for elastic usage.
EQFS informs Rubberd of events that create or change the association
of an inode to file name or owner. EQFS informs Rubberd about
creation, deletion, renames, hard links, and ownership changes of
elastic files. EQFS communicates this information to Rubberd's
database management thread over a Linux kernel-to-user socket called
netlink.
Rubberd records this information in the BDB databases.
For example, when a file is made elastic, EQFS sends a "create
elastic file" message to Rubberd along with the UID, inode number,
and the name of the file. Rubberd then inserts a new entry in that
user's database, using the inode number as the key and the file name
as the value.
Rubberd periodically records historical abuse factors for
each user, denoting the user's elastic space utilization over a period
of time as described in Section 4.2.
Elasticity Modes
EQFS can determine file's elasticity in five ways.
(1) Users explicitly can toggle the file's elasticity. This allows
users to control elasticity on a per file basis.
(2) Users can toggle the elastic bit on a directory inode.
Newly created files or sub-directories inherit the elastic bit.
(3) Users can tell EQFS to create all new files elastically (or not).
(4) Users can tell EQFS which newly-created files should be elastic by
their extension. This mode is particularly useful because users often
think of the importance of files by their type (e.g.,
.c are more important than .o files).
(5) Finally, application developers may know best which files are
temporary and can be marked elastic. This can be facilitated through
two new flags we added to the open and creat system calls
that tell EQFS to create the new file as elastic or persistent.
4 Elastic Quota Policies
The core of the elastic quota system is its handling of space
reclamation policies.
File system management involves two parties: the running system and
the people (administrators and users).
To the system, file system reclamation must be efficient so as not to
disturb normal operations. For example, when Rubberd wakes up
periodically, it must be able to quickly determine if the file system
is over the high watermark. If so, Rubberd must be able to locate all
elastic files quickly because those files are candidates for
reclamation.
To the people involved, file system reclamation policies must consider
three factors: fairness, convenience, and gaming. These three factors
are important especially in light of efficiency, because some policies
can be executed more efficiently than others. We describe these three
factors next. However, our overall design goal in this work was to
provide flexibility to allow both administrators and users to use a
suitable set of policies.
Fairness
Fairness is hard to quantify precisely. It is often perceived by the
individual users as how they personally feel that the system and the
administrators treat them.
Nevertheless, it is important to provide a number of policies that
could be tailored to the site's own needs. For example, some users
might consider a largest-file-first compression/removal policy unfair
because recently-created files may not remain on the system long
enough to be used.
For these reasons, the policies that are more fair are based on
individual users' disk space usage: users that consume more disk space
over longer periods of time should be considered the
worst offenders. Overall, it is more fair if the amount of
disk space being cleaned is proportional to the level of offense of
each user who is using elastic space.
Once the worst offender is determined and the amount of disk space to
clean from that user is calculated, however, the system must define
which specific files should be reclaimed from that user. Basic
policies allow for time-based or size-based policies for each user.
For the utmost in flexibility, users are allowed to define their own
ordered list of files to be processed first. This not just allows
users to override system-wide policies, but also to define new
policies based on file names and other attributes (e.g., lossy
compress *.jpg files first).
Convenience
For a system to be successful, it should be easy to use and
simple to understand. Users should be able to find out how much disk
space they are consuming in persistent and elastic files and which of
their elastic files will be removed first. Administrators should be
able to configure new policies easily.
The algorithms used to define a worst offender should be simple and
easy to understand. For example considering the current total elastic
usage is simple and easy to understand. A more complex algorithm
could count the elastic space usage over time as a weighted average.
Although such algorithm is also more fair because it accounts for
historical usage, it might be more difficult for users to understand.
Gaming
Gaming is defined as the ability of individual users to circumvent the
system and prevent their files from being processed first. Good
policies should be resistant to gaming. For example, a global LRU
policy that compresses older files could be circumvented simply by
reading those files.
Policies that are difficult to game include a per-user worst-offender
policy. Regardless of the file's attributes, a user still owns the
same total amount of data. Such policies work well on systems where
it is expected that users will try to exploit the system.
4.1 Rubberd Configuration Files
When Rubberd has to reclaim space, it first determines how much space
it should reclaim-the goal.
The configuration file may define multiple policies, one per line.
Rubberd then applies each policy in order until the goal is reached or
no more policies can be applied.
Each policy in this file has four parameters.
(1) type defines what kind of policy to use
and can have one of three values: global for a global policy,
user for a per-user policy, and user_profile for a
per-user policy that first considers the user's own personal policy
file. This way administrators can permit users to define
personalized policies.
(2) method, defines how space should be
reclaimed. Our prototype currently defines two policies:
gzip compresses files and rm removes them.
This allows administrators to define a system policy that first
compresses files and then removes them if necessary.
A policy using mv and tar could be used together as
an HSM system, archiving and migrating files to slower media at
cleaning time.
(3) sort, defines the order of files being reclaimed. We
define several keys: size (in disk blocks) for sorting by
largest file first, mtime for sorting by oldest modification
time first, and similarly for ctime and
atime.
(4) filter is an optional list of file name filters to apply
the policy to. If not specified, the policy applies to all files.
If users can define their own policy files and Rubberd cannot reclaim
enough space, then Rubberd continues to reclaim space as defined in
the system-wide policy file.
4.2 Abuse Factors
When Rubberd reclaims disk space, it must provide a fair mechanism to
distribute the amount of reclaimed space among users. To decide how
much disk space to reclaim from each user, Rubberd computes an
abuse factor (AF) for all users. Rubberd then distributes the
amount of space to reclaim from each user proportionally to their AF.
Deciding how to compute AF, however, can vary depending on what is
perceived as fair by users and administrators for a given site.
We define two types of AF calculations: current usage and historical
usage.
Current usage can be calculated in three ways. First, Equota can
consider the total elastic usage (in disk blocks) the user consumes.
Second, it can consider the total elastic usage minus the user's
available persistent space. Third, Equota can consider the total
amount of space consumed by the user (elastic and persistent). These
three modes give a system administrator enough flexibility to
calculate the abuse fairly given any group of users (we also have modes
based on a percentage of quota).
Historical usage can be calculated either as a linear or an exponential
average of a user's disk consumption over a period of time (using the
same metrics as current usage). The linear method calculates a linear
average over time to represent a user's abuse factor, while the
exponential method calculates the user's abuse with an exponentially
decaying average. These two historical methods provide further
flexibility to the system administrator in the determination of abuse
factors.
4.3 Cleaning Operation
To reclaim elastic space, Rubberd periodically wakes up and performs a
statfs to determine if high watermark has been reached. If
so, Rubberd spawns a new thread to perform the reclamation. The
thread reads the global policy file and applies each policy
sequentially, until the low watermark is met or all policy entries are
applied.
The application of each policy proceeds in three phases: abuse
calculation, candidate selection, and application. For user policies,
Rubberd retrieves the abuse factor of each user and then determines
the number of blocks to clean from each user proportionally to the
abuse factor. For global policies this step is skipped since all
files are considered without regard to the owner's abuse factor.
Rubberd performs the candidate selection and application phases only
once for global policies. For user policies these two phases are
performed once for each user.
In the candidate selection phase all candidate inode numbers are first
retrieved from the BDB databases. Rubberd then gets the attributes
(size and times) for each file (EQFS allows Rubberd to get these
attributes more efficiently by inode number rather than pathname which
is normally required for stat). Rubberd then sorts the
candidates based on the policy (e.g., largest or oldest files first).
In the application phase, we start at the first element of the
candidate array and retrieve its name from the BDB database. We then
reclaim disk space (e.g., compress the file). As we perform
the application phase, we tally the number of blocks reclaimed based
on the previously-obtained stat information; this avoids
repeatedly calling statfs to check if the low watermark was
reached. Once enough space is reclaimed, cleaning terminates.
5 Related Work
Elastic quotas are complementary to HSM systems. HSM systems provide
disk backup as well as ways to reclaim disk space by moving
less-frequently accessed files to a slower disk or tape. These
systems then provide a way to access files stored on the slower media,
ranging from file search software to replacing the migrated
file with a link to its new location.
Several HSM systems are in use today including UniTree
[4], SGI DMF (Data Migration Facility),
the SmartStor Infinet system, IBM Storage Management
[7], Veritas NetBackup Storage Migrator
[15], and parts of IBM OS/400 [11].
Most HSM systems use a combination of file size and last access times to
determine the file's eligibility for migration.
HP AutoRaid migrates data blocks using policies based on access
frequency [16]. Wilkes et. al. implemented this
at the block level, and suggested that per-file policies in the file
system might allow for more powerful policies; however, they claim
that it is difficult to provide an HSM at the file system level
because there are too many different file system implementations
deployed. We believe that using stackable file systems can mitigate
this concern, as they are relatively portable
[17].
In addition, HSMs typically do not take disk space usage per user over
time into consideration, and users are not given enough flexibility in
choosing storage control policies. We believe that integrating user-
and application-specific knowledge into an HSM system would reduce
overall storage management costs significantly.
6 Conclusions
The main contribution of this paper is in the exploration and
evaluation of various elastic quota policies.
These policies allow administrators to reduce the overall amount of
storage consumed; and to control what files are backed up when,
thereby reducing overall backup and storage costs.
Our system includes many features that allow both site administrators
and users to tailor their elastic quota policies to their needs. For
example, we provide several different ways to decide when a file
becomes elastic: from the directory's mode, from the file's name, from
the user's login session, and even by the application itself. Through
the concept of an abuse factor we have introduced historical use into
quota systems. Finally, our work provides an extensible framework for
new or custom policies to be added.
Our Linux prototype has a total of 14315 lines of code composed of
8309 lines of kernel code and 6006 lines of user code.
We evaluated our system extensively. Performance overheads are small
and acceptable for day-to-day use. We observed an overhead of 1.5%
when compiling gcc. For a worst-case benchmark, creation and
deletion of empty files, our overhead is 5.3% without database
operations (a mode that is useful when recursive scans may already be
performed by backup software) and as much as 89.9% with optional
database operations.
A full version of this paper, including a more detailed design and
a performance evaluation, is available at
www.fsl.cs.sunysb.edu/docs/equota-policy/policy.pdf.
References
- [1]
-
M. G. Baker, J. H. Hartman, M. D. Kupfer, K. W. Shirriff, and J. K. Ousterhout.
Measurements of a Distributed File System.
In Proceedings of 13th ACM Symposium on Operating Systems
Principles, pages 198-212. Association for Computing Machinery SIGOPS,
1991.
- [2]
-
J. M. Bennett, M. A. Bauer, and D. Kinchlea.
Characteristics of files in NFS environments.
ACM SIGSMALL/PC Notes, 18(3-4):18-25, 1992.
- [3]
-
D. Ellard, J. Ledlie, and M. Seltzer.
The Utility of File Names.
Technical Report TR-05-03, Computer Science Group, Harvard
University, March 2003.
- [4]
-
F. Kim.
UniTree: A Closer Look At Solving The Data Storage Problem.
www.networkbuyersguide.com/search/319002.htm, 1998.
- [5]
-
Gartner, Inc.
Server Storage and RAID Worldwide.
Technical report, Gartner Group/Dataquest, 1999.
www.gartner.com.
- [6]
-
J. P. Gelb.
System managed storage.
IBM Systems Journal, 28(1):77-103, 1989.
- [7]
-
IBM Tivoli.
Achieving cost savings through a true storage management
architecture.
www.tivoli.com/products/documents/whitepapers/sto_man_whpt.pdf,
2002.
- [8]
-
G. H. Kuenning.
Seer: Predictive File Hoarding for Disconnected Mobile
Operation.
PhD thesis, University of California, Los Angeles, May 1997.
- [9]
-
J. Moad.
The Real Cost of Storage.
eWeek, October 2001.
www.eweek.com/article2/0,4149,1249622,00.asp.
- [10]
-
J. Ousterhout, H. Costa, D. Harrison, J. Kunze, M. Kupfer, and
J. Thompson.
A Trace-Driven Analysis of the UNIX 4.2 BSD File System.
In Proceedings of the 10th ACM Symposium on Operating System
Principles, pages 15-24, Orcas Island, WA, December 1985. ACM.
- [11]
-
L. Rink.
Hierarchical Storage Management for iSeries and AS/400.
www.ibm.com/servers/eserver/iseries/whpapr/hsm.html.
- [12]
-
D. Roselli, J. R. Lorch, and T. E. Anderson.
A Comparison of File System Workloads.
In Proceedings of the Annual USENIX Technical Conference, pages
41-54, June 2000.
- [13]
-
D. S. Santry, M. J. Feeley, N. C. Hutchinson, A. C. Veitch, R.W. Carton, and
J. Ofir.
Deciding When to Forget in the Elephant File System.
In Proceedings of the 17th ACM Symposium on Operating Systems
Principles, pages 110-123, December 1999.
- [14]
-
M. Seltzer and O. Yigit.
A new hashing package for UNIX.
In Proceedings of the Winter USENIX Technical Conference, pages
173-84, January 1991.
www.sleepycat.com.
- [15]
-
VERITAS.
VERITAS NetBackup Storage Migrator.
A White Paper, February 2002.
- [16]
-
J. Wilkes, R. Golding, C. Staelin, and T. Sullivan.
The HP AutoRAID Hierarchical Storage System.
In ACM Transactions on Computer Systems, volume 14, pages
108-136, February 1996.
- [17]
-
E. Zadok and J. Nieh.
FiST: A Language for Stackable File Systems.
In Proceedings of the Annual USENIX Technical Conference, pages
55-70, June 2000.
Footnotes:
1Appears in the 12th NASA Goddard, 21st IEEE Conference on
Mass Storage Systems and Technologies (MSST 2004)
File translated from
TEX
by
TTH,
version 3.40.
On 20 Nov 2003, 17:21.