C – Troubleshooting
InfiniPath MPI Troubleshooting


The following message indicates that a node program may not be processing incoming packets, perhaps due to a very high system load:

eager array full after overflow, flushing (head h, tail t)

The following indicates an invalid InfiniPath link protocol version:

InfiniPath version ERROR: Expected version v, found w (memkey h)

The following error messages should rarely occur and indicate internal software problems:

ExpSend opcode h tid=j, rhf_error k: str

Asked to set timeout w/delay l, gives time in past (t2 < t1)

Error in sending packet: str

Fatal error in sending packet, exiting: str

Fatal error in sending packet: str

Here the str can give additional clues to the reason for the failure.
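When scanning a large job log, it can help to sort these messages by the cause the manual attributes to each. The following Python sketch is one way to do that. The CAUSES table, the DETAIL pattern, and the triage() function are hypothetical helpers, not part of the InfiniPath software, and the message text is assumed to match the formats listed above exactly. For the messages that end in ": str", it also extracts that trailing detail.

```python
import re
from typing import Optional, Tuple

INTERNAL = "internal software problem"

# Hypothetical triage table: message prefixes from this section, each paired
# with the likely cause the manual gives for it.
CAUSES = [
    ("eager array full after overflow",
     "node program not processing incoming packets (possibly high system load)"),
    ("InfiniPath version ERROR", "invalid InfiniPath link protocol version"),
    ("ExpSend opcode", INTERNAL),
    ("Asked to set timeout", INTERNAL),
    ("Fatal error in sending packet", INTERNAL),
    ("Error in sending packet", INTERNAL),
]

# Captures the trailing ": str" detail on the messages that carry one.
DETAIL = re.compile(
    r"(?:Fatal error in sending packet(?:, exiting)?"
    r"|Error in sending packet"
    r"|rhf_error \S+): (?P<detail>.*)$"
)


def triage(line: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (likely cause, str detail or None), or None if unrecognized."""
    for prefix, cause in CAUSES:
        if prefix in line:
            m = DETAIL.search(line)
            return cause, (m.group("detail") if m else None)
    return None
```

An unrecognized line returns None, so the function can be applied to every line of a log and the None results discarded.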

The following probably indicates a node failure or malfunctioning link in the fabric:

Couldn’t connect to NODENAME, rank RANK#. Time elapsed HH:MM:SS. Still trying

NODENAME is the node (host) name, RANK# is the MPI rank, and HH:MM:SS are the hours, minutes, and seconds since we started trying to connect.
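Because the message embeds the node name, rank, and elapsed time in a fixed pattern, repeated retries can be grouped by node to see whether a single host or link is at fault. A minimal Python sketch, assuming the message text matches the format above (the parse_retry() helper and ConnectRetry type are hypothetical, and the hour field is not assumed to be zero-padded):

```python
import re
from typing import NamedTuple, Optional


class ConnectRetry(NamedTuple):
    node: str
    rank: int
    elapsed_s: int  # HH:MM:SS converted to seconds


# Accept a straight or curly apostrophe, since copied log text
# is sometimes typographically altered.
_RETRY = re.compile(
    r"Couldn['\u2019]t connect to (?P<node>[^,\s]+), rank (?P<rank>\d+)\. "
    r"Time elapsed (?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2})\. Still trying"
)


def parse_retry(line: str) -> Optional[ConnectRetry]:
    """Extract node, rank, and elapsed seconds, or return None if no match."""
    m = _RETRY.search(line)
    if m is None:
        return None
    elapsed = int(m["h"]) * 3600 + int(m["m"]) * 60 + int(m["s"])
    return ConnectRetry(m["node"], int(m["rank"]), elapsed)
```

Many retries that all name the same NODENAME point toward that node or its link, in line with the failure modes described above.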

If you get messages similar to the following, it may mean that you are trying to receive into an invalid (unallocated) memory address, perhaps due to a logic error in the program, usually one related to malloc/free:

ipath_update_tid_err: Failed TID update for rendevous, allocation problem

kernel: infinipath: get_user_pages (0x41 pages starting at 0x2aaaaeb50000

kernel: infinipath: Failed to lock addr 0002aaaaeb50000, 65 pages: errno 12

TID is short for Token ID, and is part of the InfiniPath hardware. This error indicates a failure of the program, not the hardware or driver.
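The two kernel lines describe the same region: 0x41 pages is 65 pages, and the address 0002aaaaeb50000 is 0x2aaaaeb50000 written with leading zeros. On Linux, errno 12 is ENOMEM. The Python sketch below decodes both lines so these values can be compared. The helper names are hypothetical, the regular expressions assume the exact wording shown above, and the errno name is looked up in the local system's errno table rather than reported by the driver.

```python
import errno
import re
from typing import Optional, Tuple

# Hypothetical patterns matching the two kernel lines shown above.
_GUP = re.compile(
    r"get_user_pages \((?P<npages>0x[0-9a-fA-F]+) pages starting at "
    r"(?P<addr>0x[0-9a-fA-F]+)"
)
_LOCK = re.compile(
    r"Failed to lock addr (?P<addr>[0-9a-fA-F]+), "
    r"(?P<npages>\d+) pages: errno (?P<err>\d+)"
)


def decode_gup(line: str) -> Optional[Tuple[int, int]]:
    """Return (page count, start address) from a get_user_pages line."""
    m = _GUP.search(line)
    if m is None:
        return None
    return int(m["npages"], 16), int(m["addr"], 16)


def decode_lock(line: str) -> Optional[Tuple[int, int, str]]:
    """Return (page count, address, errno name) from a lock-failure line."""
    m = _LOCK.search(line)
    if m is None:
        return None
    err = int(m["err"])
    # Fall back to the raw number if this system has no name for it.
    return int(m["npages"]), int(m["addr"], 16), errno.errorcode.get(err, str(err))
```

Matching page counts and addresses confirm that both lines refer to the same user buffer, which the manual attributes to a program error rather than to the hardware or driver.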

C.8.12.2 MPI Messages

Some MPI error messages are issued by parts of the code inherited from the MPICH implementation; see the MPICH documentation for descriptions of these. This section presents the error messages specific to the InfiniPath MPI implementation.


IB6054601-00 D
