Q-Logic IB6054601-00 D manual Driver and Link Error Messages Reported by MPI Programs

Models: IB6054601-00 D



C – Troubleshooting InfiniPath MPI Troubleshooting

$ mpirun -np 2 -m ~/tmp/q -q 60 mpi_latency 1000000 1000000
MPIRUN: MPI progress Quiescence Detected after 9000 seconds.
MPIRUN: 2 out of 2 ranks showed no MPI send or receive progress.
MPIRUN: Per-rank details are the following:
MPIRUN: Rank 0 (<nodename>) caused MPI progress Quiescence.
MPIRUN: Rank 1 (<nodename>) caused MPI progress Quiescence.
MPIRUN: both MPI progress and Ping Quiescence Detected after 120 seconds.
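The check behind the output above can be pictured as a per-rank timer: a rank that has not reported send or receive progress for longer than the quiescence interval is flagged. The following Python sketch models only that idea. The QuiescenceMonitor class, its record_progress and quiescent_ranks methods, and the one-timestamp-per-rank model are hypothetical; this is not how mpirun implements detection, and the separate "Ping Quiescence" check is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class QuiescenceMonitor:
    """Hypothetical sketch: flag ranks with no send/receive progress for
    longer than `interval` seconds. Not the InfiniPath mpirun code."""

    interval: float
    # Last time (in seconds) each rank reported send or receive progress.
    last_progress: dict = field(default_factory=dict)

    def record_progress(self, rank: int, now: float) -> None:
        """Note that `rank` made MPI progress at time `now`."""
        self.last_progress[rank] = now

    def quiescent_ranks(self, now: float) -> list:
        """Return, sorted, the ranks whose last progress is older than the interval."""
        return sorted(
            rank
            for rank, seen in self.last_progress.items()
            if now - seen > self.interval
        )


if __name__ == "__main__":
    monitor = QuiescenceMonitor(interval=60.0)
    monitor.record_progress(0, now=0.0)
    monitor.record_progress(1, now=0.0)
    monitor.record_progress(1, now=50.0)  # rank 1 makes progress later
    print(monitor.quiescent_ranks(now=100.0))  # [0]
    print(monitor.quiescent_ranks(now=120.0))  # [0, 1]
```

Keeping only the most recent timestamp per rank is enough for this kind of check, since only the time elapsed since the last progress matters.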


Occasionally a stray process will continue to exist out of its context. mpirun checks for stray processes and kills them after detection. The following is an example of the type of message you will see in this case:

$ mpirun -np 2 -ppn 1 -m ~/tmp/mfast mpi_latency 500000 2000
iqa-38: Received 1 out-of-context eager message(s) from stray process PID=29745 running on host 192.168.9.218
iqa-35: PSM pid 10513 on host IP 192.168.9.221 has detected that I am a stray process, exiting.
2000 5.222116
iqa-38:1.ips_ptl_report_strays: Process PID=29745 on host IP=192.168.9.218 sent 1 stray message(s) and was told so 1 time(s) (first stray message at 0.7s (13%),last at 0.7s (13%) into application run)
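When a job produces several stray reports, it can help to total them per host and PID. The following Python sketch does that for lines shaped like the ips_ptl_report_strays message in the example. The stray_summary function is hypothetical, and the regular expression assumes exactly the wording shown in this example; other software releases may phrase the message differently.

```python
import re
from collections import Counter

# Modeled on the ips_ptl_report_strays line in the example; the exact
# wording is an assumption taken from that single example.
REPORT = re.compile(
    r"Process PID=(?P<pid>\d+) on host IP=(?P<host>\S+) "
    r"sent (?P<count>\d+) stray message\(s\)"
)


def stray_summary(lines):
    """Hypothetical helper: total stray messages per (host IP, PID)."""
    totals = Counter()
    for line in lines:
        match = REPORT.search(line)
        if match:
            key = (match["host"], int(match["pid"]))
            totals[key] += int(match["count"])
    return dict(totals)


if __name__ == "__main__":
    log = [
        "2000 5.222116",
        "iqa-38:1.ips_ptl_report_strays: Process PID=29745 on host "
        "IP=192.168.9.218 sent 1 stray message(s) and was told so 1 time(s)",
    ]
    print(stray_summary(log))  # {('192.168.9.218', 29745): 1}
```

Lines that do not match, such as the benchmark's own "2000 5.222116" output, are ignored, so the function can be given a job's complete captured output.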

The following should never occur. Please inform Support if it does:

Internal Error: NULL function/argument found:func_ptr(arg_ptr)

C.8.12.3 Driver and Link Error Messages Reported by MPI Programs

Two types of error messages are described below.

1. When the InfiniBand link fails during a job, a message will be reported once per occurrence, similar to this:

ipath_check_unit_status: IB Link is down

This can happen when a cable is disconnected, a switch is rebooted, or there are other problems with the link. The job will continue retrying until the quiescence interval expires. See the mpirun -q option for information on quiescence.

2. If a hardware problem occurs, an error similar to this will be reported:

infinipath: [error strings] Hardware error

This will cause the MPI program to terminate. The error string may provide additional information about the problem. To further determine the source of the problem, examine syslog on the node reporting the problem.
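To pick these two messages out of captured job output, a simple line classifier is enough. The Python sketch below is hypothetical: the classify and scan helpers and their category names are illustrative, and the patterns match only the fixed parts of the two messages shown above, because the "[error strings]" portion varies. Driver messages written to syslog may be worded differently, so this does not replace examining syslog on the affected node.

```python
import re

# Fixed parts of the two messages described above; the "[error strings]"
# portion of the hardware message varies, so it is matched with ".*".
LINK_DOWN = re.compile(r"ipath_check_unit_status: IB Link is down")
HW_ERROR = re.compile(r"infinipath: .*Hardware error")


def classify(line):
    """Return 'link_down', 'hardware', or None for one line of output."""
    if LINK_DOWN.search(line):
        return "link_down"
    if HW_ERROR.search(line):
        return "hardware"
    return None


def scan(lines):
    """Group matching lines by category, stripping trailing newlines."""
    found = {"link_down": [], "hardware": []}
    for line in lines:
        kind = classify(line)
        if kind is not None:
            found[kind].append(line.rstrip("\n"))
    return found


if __name__ == "__main__":
    sample = [
        "ipath_check_unit_status: IB Link is down",
        "some unrelated job output",
    ]
    for kind, hits in scan(sample).items():
        print(f"{kind}: {len(hits)} line(s)")
```

Because scan accepts any iterable of strings, an open file object holding a job's captured output can be passed to it directly.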

IB6054601-00 D


