Chapter 2 HPSS Planning
HPSS Installation Guide September 2002 61
Release 4.5, Revision 2
that an implementation is thread-safe provided only one thread makes MPI calls. With HPSS MPI-IO,
multiple threads will make MPI calls. HPSS MPI-IO attempts to impose thread-safety on these hosts
by using a global lock that must be acquired before any MPI call is made. However, this approach
has known problems: until these hosts provide true thread-safety, an MPI application that uses
HPSS MPI-IO together with other MPI operations can deadlock. See the HPSS Programmer's Reference
Guide, Volume 1, Release 4.5, for more details.
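The guide does not show how the HPSS MPI-IO global lock is implemented. As a rough illustration of the general technique only, the C sketch below serializes calls into a library that is not thread-safe by requiring every thread to hold one process-wide POSIX mutex for the duration of each call. The names `unsafe_library_call`, `locked_call`, `worker`, and `run_workers` are hypothetical stand-ins, not HPSS or MPI functions.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a library call that is unsafe if two threads
 * enter it at once. It records how many threads are inside it. */
static int calls_in_progress = 0;
static int max_concurrent = 0;
static long total_calls = 0;

static void unsafe_library_call(void)
{
    calls_in_progress++;
    if (calls_in_progress > max_concurrent)
        max_concurrent = calls_in_progress;
    total_calls++;
    calls_in_progress--;
}

/* One process-wide lock; a thread must hold it for the whole call. */
static pthread_mutex_t global_call_lock = PTHREAD_MUTEX_INITIALIZER;

static void locked_call(void)
{
    int rc = pthread_mutex_lock(&global_call_lock);
    assert(rc == 0);
    unsafe_library_call();
    rc = pthread_mutex_unlock(&global_call_lock);
    assert(rc == 0);
    (void)rc;
}

/* Thread body: makes *arg serialized calls. */
static void *worker(void *arg)
{
    int n = *(const int *)arg;
    for (int i = 0; i < n; i++)
        locked_call();
    return NULL;
}

#define MAX_WORKERS 16

/* Starts nthreads workers, each making calls_per_thread calls, and waits
 * for all of them to finish. */
static void run_workers(int nthreads, int calls_per_thread)
{
    pthread_t tids[MAX_WORKERS];
    assert(nthreads > 0 && nthreads <= MAX_WORKERS);
    for (int t = 0; t < nthreads; t++) {
        int rc = pthread_create(&tids[t], NULL, worker, &calls_per_thread);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < nthreads; t++) {
        int rc = pthread_join(tids[t], NULL);
        assert(rc == 0);
        (void)rc;
    }
}
```

A lock like this only serializes entry into the library. If a thread blocks inside a call while holding the lock, for example waiting for a message that another thread in the same process must send, that other thread can never acquire the lock; this is one general way such schemes can deadlock.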
Files read and written through HPSS MPI-IO can also be accessed through the HPSS Client API,
FTP, Parallel FTP, or NFS interfaces. Thus, although the MPI-IO subsystem does not offer all the
migration, purging, and caching operations available in HPSS, parallel applications can still
perform these tasks through the HPSS Client API or other HPSS interfaces.
The details of the MPI-IO API are described in the HPSS Programmer's Reference Guide, Volume 1.
2.5.7 DFS
DFS is offered by the Open Software Foundation (now the Open Group) as part of DCE. DFS is a
distributed file system that allows users to access files using normal Unix utilities and system
calls, regardless of the file's location. This transparency is one of the major attractions of DFS.
Compared with NFS, DFS provides greater security and allows files to be shared globally among
many sites through a common name space.
HPSS provides two options for controlling how DFS files are managed: archived and mirrored. The
archived option gives users the impression of an infinitely large DFS file system that performs at
near-native DFS speeds. This option is well suited to sites with large numbers of small files.
However, with this option, files can be accessed only through DFS interfaces, not with HPSS
utilities such as parallel FTP, so data transfer performance is limited to DFS speeds.
The mirrored option gives users the impression of a single, common (mirrored) name space in which
objects have the same path names in DFS and HPSS. With this option, large files can be stored
quickly on HPSS and then analyzed at a more leisurely pace from DFS. On the other hand, some
operations, such as file creates, are slower with this option than with the archived option.
HPSS and DFS define disk partitions differently. In HPSS, the choice of whether files are mirrored
or archived is associated with a fileset. Recall that in DFS, multiple filesets may reside on a
single aggregate. However, the XDSM implementation provided in DFS generates events on a
per-aggregate basis. Therefore, in DFS this option applies to all filesets on a given aggregate.
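The per-aggregate rule can be pictured as a small data model. In the hypothetical C sketch below (the types `aggregate` and `fileset` and the functions `effective_option` and `count_conflicts` are illustrative names, not HPSS or DFS interfaces), the archived/mirrored option is stored once on the aggregate, so every fileset on that aggregate inherits it regardless of what was wanted for the fileset individually.

```c
#include <assert.h>
#include <stddef.h>

/* The two HPSS management options for DFS files. */
typedef enum { OPT_ARCHIVED, OPT_MIRRORED } mgmt_option;

/* Hypothetical model: because XDSM events are generated per aggregate,
 * the option is recorded on the aggregate, not on each fileset. */
typedef struct {
    const char *name;
    mgmt_option option;     /* applies to every fileset on this aggregate */
} aggregate;

typedef struct {
    const char *name;
    const aggregate *agg;   /* aggregate on which this fileset resides */
    mgmt_option wanted;     /* option a site would like for this fileset */
} fileset;

/* The option that actually applies: always the aggregate's. */
static mgmt_option effective_option(const fileset *fs)
{
    assert(fs != NULL && fs->agg != NULL);
    return fs->agg->option;
}

/* Planning check: counts filesets whose wanted option differs from the
 * option their aggregate provides. */
static size_t count_conflicts(const fileset *fsets, size_t n)
{
    size_t conflicts = 0;
    for (size_t i = 0; i < n; i++)
        if (fsets[i].wanted != effective_option(&fsets[i]))
            conflicts++;
    return conflicts;
}
```

Under this model, a site that wants some filesets archived and others mirrored would place them on separate aggregates; `count_conflicts` simply flags planning entries that violate that rule.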
To use the DFS/HPSS interface on an aggregate, the aggregate must be on a processor that has
Transarc's DFS SMT kernel extensions installed. These extensions are available for Sun Solaris and
IBM AIX platforms. Once an aggregate has been set up, end users can access filesets on the
aggregate from any machine that supports DFS client software, including PCs. The wait/retry logic
in DFS client software was modified to account for potential delays caused by staging data from
HPSS. Using a DFS client without this change may result in long delays for some IO requests.
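The guide does not describe how the modified DFS wait/retry logic works. The C sketch below shows one common pattern, retry with capped exponential backoff, purely as an illustration of how a client might tolerate staging delays. The status codes, the callback types, `retry_policy`, `request_with_retry`, and the simulated server are all hypothetical and are not part of DFS or HPSS.

```c
#include <assert.h>

/* Hypothetical status codes a file server might return. */
typedef enum { REQ_OK, REQ_STAGING, REQ_ERROR } req_status;

/* Hypothetical request callback; attempt is 0-based. */
typedef req_status (*request_fn)(void *ctx, int attempt);

/* Hypothetical sleep hook so the policy can be tested without real delays. */
typedef void (*sleep_fn)(void *ctx, unsigned seconds);

typedef struct {
    unsigned initial_wait;  /* seconds before the first retry */
    unsigned max_wait;      /* cap on any single wait */
    int max_attempts;       /* give up after this many attempts */
} retry_policy;

/* Retries while the server reports that data is still being staged,
 * doubling the wait each time up to max_wait. Returns the final status. */
static req_status request_with_retry(const retry_policy *p, request_fn req,
                                     sleep_fn sleep_cb, void *ctx)
{
    unsigned delay = p->initial_wait;
    req_status st = REQ_ERROR;
    assert(p->max_attempts > 0);
    for (int attempt = 0; attempt < p->max_attempts; attempt++) {
        st = req(ctx, attempt);
        if (st != REQ_STAGING)
            return st;              /* success or a hard error: stop */
        if (attempt + 1 < p->max_attempts) {
            sleep_cb(ctx, delay);
            delay = (delay * 2 > p->max_wait) ? p->max_wait : delay * 2;
        }
    }
    return st;                      /* still staging after max_attempts */
}

/* Simulated server for illustration: reports staging for the first
 * stage_attempts requests, then succeeds, and totals the time waited. */
typedef struct {
    int stage_attempts;
    unsigned total_wait;
} sim_server;

static req_status sim_request(void *ctx, int attempt)
{
    sim_server *s = ctx;
    return attempt < s->stage_attempts ? REQ_STAGING : REQ_OK;
}

static void sim_sleep(void *ctx, unsigned seconds)
{
    sim_server *s = ctx;
    s->total_wait += seconds;
}
```

With a policy of `{ 1, 8, 10 }` and a server that stages for three attempts, the client waits 1, 2, and 4 seconds and then succeeds on the fourth attempt. A client whose limits are too short for HPSS staging times would give up or stall early, which is the kind of delay the modified DFS client logic is meant to avoid.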
HPSS servers and DFS both use Encina as part of their infrastructure. Because DFS and HPSS may
adopt new Encina versions on significantly different release schedules, running the DFS server on
a different machine from the HPSS servers is recommended.