We are investigating methods of improving the performance of writes in the middle of files by decoupling the order of the bytes in the encoded file from their order in the original file. By decoupling their order, we could move writes in the middle of files elsewhere--say the end of the file (similar to a journal) or an auxiliary file. Another alternative is to structure the file differently internally: instead of a sequential set of blocks, it could be organized as a B-tree or hash table where the complexity order of insertions in the middle is sub-linear [Bender, Demaine, and Farach-ColtonBender et al.2000]. These methods would allow us to avoid having to shift bytes outward to make space for larger encoded units, and we could support busy large databases more effectively. However, if we begin storing many encoded chunks out of order, large files could get fragmented. We would need a method for compaction or coalescing all these chunks into a single sequential order.
A related and important optimization we plan to implement is to avoid extra copying of data into temporary buffers. This is only needed when an encoded buffer is written in the middle of a file and its encoded length is greater than its decoded length; in that case we must shift outward some data in the encoded data file to make room for the new encoded data. We can optimize this code and avoid making the temporary copies when files are appended to or being newly created and written sequentially.
Finally, we plan on taking advantage of the type of SCA to further improve performance. Some SCAs, such as compression, include a dictionary at the head of each compressed data chunk. This dictionary is needed to decode the compressed data chunk--much like a key used to decrypt a piece of ciphertext. Furthermore, each dictionary is unique to its data chunk. For this reason, one cannot append two compressed data chunks together and treat them as one: decoding such a concatenated data chunk with the first chunk's dictionary will result in data corruption. In other words, compression algorithms are not concatenateable. However, some SCAs are concatenateable. For example, unicoding or even Uuencodefs use algorithms that convert a small number of bits to another small number of bits. Such algorithms do not use a key or dictionary, and are therefore concatenateable. The knowledge that an algorithm is concatenateable can help FiST to produce code that would avoid decoding two data chunks and then concatenating them (often useful when appending data to the end of a file). Instead, an SCA file system that uses a concatenateable algorithm could simply append the data bytes without decoding prior bytes.