Copying files in parallel

# UseMCast

statement.

If multicast is enabled, the ParaStation daemons exchange status information using multicast messages. A Linux kernel supporting multicast is therefore required on all nodes of the cluster. This is usually not a problem, since the standard kernels of all common distributions are compiled with multicast support. If a customized kernel is used, multicast support must be enabled in the kernel configuration. To learn more about multicast, take a look at the Multicast over TCP/IP HOWTO.
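On a customized kernel, one way to check for multicast support is to look for the CONFIG_IP_MULTICAST option in the kernel configuration. The following is a minimal sketch, not part of ParaStation: the helper name has_ip_multicast is hypothetical, and the location /boot/config-<release> is an assumption that holds on many but not all distributions.

```shell
# Succeed if the given kernel config file enables IP multicast.
# Usage: has_ip_multicast /path/to/config
has_ip_multicast() {
    grep -q '^CONFIG_IP_MULTICAST=y' "$1"
}

# Assumed location of the running kernel's config file;
# some systems only provide /proc/config.gz instead.
config="/boot/config-$(uname -r)"

if [ -r "$config" ]; then
    if has_ip_multicast "$config"; then
        echo "multicast support enabled"
    else
        echo "multicast support NOT enabled"
    fi
else
    echo "kernel config not found at $config"
fi
```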

In addition, the hardware has to support multicast packets. Since all modern Ethernet switches support multicast and the nodes of a cluster typically live in a private subnet, this should not be a problem. If the cluster nodes are connected by a gateway, it has to be configured to allow multicast packets from every node to reach all other nodes of the cluster.

Using a gateway in order to link parts of a cluster is not a recommended configuration.

On nodes with more than one Ethernet interface, typically frontend or head nodes, or on systems where the default route does not point to the private cluster subnet, a proper route for the multicast traffic must be set up. This is done by the command

route add -net 224.0.0.0 netmask 240.0.0.0 dev ethX

where ethX should be replaced by the actual name of the interface connecting to all other nodes. In order to enable this route at system startup, a corresponding entry has to be added to /etc/route.conf or /etc/sysconfig/networks/routes, depending on the type of Linux distribution in use.
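As an illustration, the sketch below prints the route command for a given interface together with an equivalent iproute2 form (the netmask 240.0.0.0 corresponds to the prefix 224.0.0.0/4). The helper name mcast_route_cmds and the interface name eth1 are assumptions made for this example. The commands are only printed, not executed; running them requires root privileges.

```shell
# Print (do not run) the commands that add the multicast route
# for the interface given as first argument.
mcast_route_cmds() {
    iface="$1"
    # Classic net-tools form, as used in this guide.
    echo "route add -net 224.0.0.0 netmask 240.0.0.0 dev $iface"
    # Equivalent iproute2 form: netmask 240.0.0.0 is a /4 prefix.
    echo "ip route add 224.0.0.0/4 dev $iface"
}

# eth1 is a placeholder for the interface facing the cluster subnet.
mcast_route_cmds eth1
```

Printing the commands first makes it easy to review the interface name before applying either form by hand.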

5.17. Copying files in parallel

To copy large files to many or all nodes in a cluster at once, pscp is very handy. It overlaps storing data to disk with transferring data over the network and therefore scales very well with the number of nodes. Files of arbitrary size may be copied; even archives containing large lists of files may be created and unpacked on the fly.

pscp uses the ParaStation pscom library for data transfers, which automatically selects the most efficient communication channel available. If required, the communication layer may be controlled using environment variables; refer to ps_environment(7) for details. The client process on each node is spawned using the ParaStation process management.

As pscp uses administrative ParaStation tasks to spawn the client processes, the user must be a member of the adminuser list, or the user's group must be a member of the admingroup list. By default, only root is a member of the adminuser list and is therefore allowed to use pscp. Refer to ParaStation5 User's Guide and psiadmin(8) for details.

For more details refer to ParaStation5 User's Guide and pscp(8).

5.18. Using ParaStation accounting

ParaStation may write accounting information about each finished job run on the cluster to /var/account/yyyymmdd, where yyyymmdd denotes the current accounting file, named by year, month and day.
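For example, the name of the current day's accounting file can be derived with date(1). This is a minimal sketch; the helper name acct_file_for is hypothetical and not part of ParaStation.

```shell
# Build the accounting file path for a date given as yyyymmdd.
acct_file_for() {
    echo "/var/account/$1"
}

# Today's date in the yyyymmdd form used for the file name.
today=$(date +%Y%m%d)
acct_file=$(acct_file_for "$today")
echo "today's accounting file: $acct_file"

# List the file only if it exists and is readable.
if [ -r "$acct_file" ]; then
    ls -l "$acct_file"
fi
```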

To enable accounting, the special hardware accounter must be set within the ParaStation configuration file for at least one node. On each node configured this way, an accounting daemon collects the information for all jobs within the cluster and stores it in the accounting file.


ParaStation5 Administrator's Guide
