there should be no compromise of robustness, consis- tency or security. There should also be no added com- plexity in sharing and collaboration. Finally, the design should be tolerant of human error: improper use of the portable storage device (such as using the wrong de- vice or forgetting to copy the latest version of a file to it) should not hurt correctness.
Lookaside caching is an extension of AFS2-style whole-file caching [7] that meets the above goals. It is based on the observation that virtually all distributed file system protocols provide separate remote proce- dure calls (RPCs) for access of meta-data and access of data content. Lookaside caching extends the definition of meta-data to include a cryptographic hash of data content. This extension only increases the size of meta- data by a modest amount: just 20 bytes if SHA-1 [11] is used as the hash. Since hash size does not depend on file length, it costs very little to obtain and cache hash information even for many large files. Using POSIX terminology, caching the results of “ls -lR” of a large tree is feasible on a small client, even if there is not enough cache space for the contents of all the files in the tree. This continues to be true even if one augments stat information for each file or directory in the tree with its SHA-1 hash.
Once a client possesses valid meta-data for an ob- ject, it can use the hash to redirect the fetch of data content. If a mounted portable storage device has a file with matching length and hash, the client can obtain the contents of the file from the device rather than from the file server. Whether it is beneficial to do this depends, of course, on factors such as file size, network band- width, and device transfer rate. The important point is that possession of the hash gives a degree of freedom that clients of a distributed file system do not possess today.
Since lookaside caching treats the hash as part of the meta-data, there is no compromise in consistency. The underlying cache coherence protocol of the dis- tributed file system determines how closely client state tracks server state. There is no degradation in the ac- curacy of this tracking if the hash is used to redirect access of data content. To ensure no compromise in se- curity, the file server should return a null hash for any object on which the client only has permission to read the meta-data.
Lookaside caching can be viewed as a degenerate case of the use of file recipes, as described by Tolia et
al. [22]. In that work, a recipe is an XML description of file content that enables block-level reassembly of the file from content-addressable storage. One can view the hash of a file as the smallest possible recipe for it. The implementation using recipes is considerably more complex than our support for lookaside caching. In re- turn for this complexity, synthesis from recipes may succeed in many situations where lookaside fails.
4Prototype Implementation
We have implemented lookaside caching in the Coda file system on Linux. The user-level implemen- tation of Coda client cache manager and server code greatly simplified our effort since no kernel changes were needed. The implementation consists of four parts: a small change to the client-server protocol; a quick index check (the “lookaside”) in the code path for handling a cache miss; a tool for generating looka- side indexes; and a set of user commands to include or exclude specific lookaside devices.
The protocol change replaces two RPCs,
ViceGetAttr() and ViceValidateAttrs() with the extended calls ViceGetAttrPlusSHA() and ViceValidateAttrsPlusSHA() that have an extra parameter for the SHA-1 hash of the file. ViceGetAttr() is used to obtain meta-data for a file or directory, while ViceValidateAttrs() is used to revalidate cached meta-data for a collection of files or directories when connectivity is restored to a server. Our implementation preserves compatibility with legacy servers. If a client connects to a server that has not been upgraded to support lookaside caching, it falls back to using the original RPCs mentioned above.
The lookaside occurs just before the execution of the ViceFetch() RPC to fetch file contents. Before network communication is attempted, the client con- sults one or more lookaside indexes to see if a local file with identical SHA-1 value exists. Trusting in the colli- sion resistance of SHA-1 [10], a copy operation on the local file can then be a substitute for the RPC. To de- tect version skew between the local file and its index, the SHA-1 hash of the local file is re-computed. In case of a mismatch, the local file substitution is suppressed and the cache miss is serviced by contacting the file server. Coda’s consistency model is not compromised, although some small amount amount of work is wasted on the lookaside path.
The index generation tool walks the file tree rooted