2.2 SMC network devices

library, which in turn may require a kernel driver (e.g. Myrinet or SCI) or may not (e.g. TCP/IP). These provider libraries present a network device to SMC.

2.2.1 Network devices

There are two basic types of network devices in SMC: native and DAT. The native devices are built in, and are neither replaceable nor upgradeable without replacing the Scali MPI Connect package. There are currently five built-in devices: SMP, TCP, IB, GM and SCI; the Release Notes included with the Scali MPI Connect package have further details.

To find out which network device is used between two processes, set the environment variable SCAMPI_NETWORKS_VERBOSE=2. With value 2, the MPI library prints a table during startup showing, for every process, which device it uses to communicate with every other process.
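For example, with a hypothetical program hello started on two placeholder nodes (the mpimon invocation below follows the "program -- nodelist" form described in the chapter on running Scali MPI Connect programs):

    # Value 2 makes the library print, at startup, a table of which
    # device every process uses to reach every other process.
    export SCAMPI_NETWORKS_VERBOSE=2

    # Hypothetical launch; node names are placeholders.
    mpimon ./hello -- node1 node2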

2.2.1.1 Direct Access Transport (DAT)

The other type of device uses the DAT uDAPL API in order to offer an open API to generic third-party vendors. uDAPL is an abbreviation for User Direct Access Programming Library. This is a shared library that SMC loads at runtime through the static DAT registry. These libraries are normally listed in /etc/dat.conf. For clusters using 'exotic' interconnects whose vendor provides a uDAPL shared object, the library can be added to this file (if this is not done automatically by the vendor). The device name is defined by the uDAPL, and the interconnect vendor must provide it.
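As an illustration, a static-registry entry for a third-party provider might look like the line below. The interface name, library path and parameters are hypothetical and must come from the interconnect vendor; the field layout follows the uDAPL v1.1 static registry format:

    # <ia_name> <api_version> <threadsafety> <default> <lib_path> <provider_version> <ia_params> <platform_params>
    myvendor0 u1.1 nonthreadsafe default /usr/lib/libdapl_myvendor.so mv.1.1 "myeth0 0" ""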

Please note that Scali has a certification program, and may not provide support for unknown third-party vendors.

The DAT header files and the registry library, conforming to the uDAPL v1.1 specification, are provided by the dat-registry package.

For more information on DAT, please refer to http://www.datcollaborative.org.

2.2.2 Shared Memory Device

The SMP device is a shared memory device that is used exclusively for intra-node communication, and it uses SYS V IPC shared memory. Multi-CPU nodes are frequent in clusters, and SMP provides optimal communication between the CPUs. In cases where only one processor per node is used, SMP is not used.
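Because SMP is built on SYS V IPC, the shared memory segments it allocates can be inspected with the standard ipcs utility while a job is running (a minimal sketch; output formats vary between systems):

    # List SYS V shared memory segments on a node; segments created by
    # the SMP device show up here during a running MPI job.
    ipcs -m

    # If a job terminates abnormally and leaves segments behind, they
    # can be removed by id:
    ipcrm -m <shmid>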

2.2.3 Ethernet Devices

An Ethernet network is a basic requirement for a cluster. For some uses it also has enough performance to carry application communication. To serve this, Scali MPI Connect has a TCP device. In addition, there are Direct Ethernet Transport (DET) devices, which implement a protocol devised by Scali for aggregating multiple TCP-type interconnects.

2.2.3.1 TCP

The TCP device is really a generic device that works over any TCP/IP network, even WANs. This network device requires only that the node names given to mpimon map correctly to the nodes' IP addresses. TCP/IP connectivity is required for SMC operation, and for this reason the TCP device is always operational.

Note: Users should always append the TCP device at the end of a device list, as the device of last resort. This way communication falls back to the management Ethernet, which has to be present for the cluster to work anyway.
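For example, assuming the device list is given through the SCAMPI_NETWORKS environment variable (the companion of SCAMPI_NETWORKS_VERBOSE above; see the section on network options), a list that prefers the fast interconnects and keeps tcp as the last resort could look like this:

    # Try shared memory first, then InfiniBand, and fall back to the
    # management Ethernet via the tcp device if neither applies.
    export SCAMPI_NETWORKS=smp,ib,tcp
    mpimon ./hello -- node1 node2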
