node6 HCA-1 | 1 | 0 | 0 | 0 | 0 | 0 |
node8 HCA-1 | 1 | 0 | 0 | 0 | 0 | 0 |
node3 HCA-1 | 1 | 0 | 0 | 0 | 0 | 0 |
node4 HCA-1 | 1 | 0 | 0 | 0 | 0 | 0 |
node8 HCA-1 | 1 | 0 | 0 | 0 | 0 | 0 |
node7 HCA-1 | 1 | 0 | 0 | 0 | 0 | 0 |
inspect_ib_fabric.pl
Description – The inspect_ib_fabric.pl utility is an additional tool for checking the InfiniBand fabric for errors. It invokes ibnetdiscover and perfquery to detect the components in the fabric and read their port counters. The results are then displayed in one or more formats, including one that shows errors on a per-link basis, depending on which output format flags are specified.
Usage –
# inspect_ib_fabric.pl [-(details|summary|links|linkerrs|mapping|full)] [-scan=<file>] [-map=<file>] [-refresh] [-nocounters] [-swirate=<rate>] [-hcarate=<rate>] [-rate=<rate>]
Output Format Options:
•-details– Displays each InfiniBand switch and HCA, along with a list of active ports with their error counters. Includes GUID, lid, and total port count information.
•-summary– Displays a single-line entry for each InfiniBand component detected in the fabric. Includes GUID, name, active/available port count, and total error count.
•-links– Displays each physical link between the InfiniBand components in the fabric. Each link is depicted as one of ‘<====>’, ‘<**==>’, ‘<==**>’, or ‘<****>’; a ‘**’ in the depiction indicates an error on that side of the link. Links are displayed using the component names, and the detected link speed is also shown.
•-linkerrs– Displays only the links with errors and provides the detailed view of the link error.
•-mapping– Displays each InfiniBand component along with the name being used to identify that component.
•-full– Default. Displays all of the above formats.
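The link depictions used by the -links format can be read mechanically. The following is a minimal Python sketch, not part of the utility itself, that maps each of the four depictions to a pair of flags indicating which side of the link reported an error. The function name link_error_sides is hypothetical and chosen only for illustration.

```python
# Illustrative helper (hypothetical name); not part of inspect_ib_fabric.pl.
# Maps a link depiction from the -links output to (left_error, right_error).
LINK_MARKERS = {
    "<====>": (False, False),  # no errors on either side
    "<**==>": (True, False),   # error on the left-hand side
    "<==**>": (False, True),   # error on the right-hand side
    "<****>": (True, True),    # errors on both sides
}


def link_error_sides(marker):
    """Return (left_has_error, right_has_error) for a link depiction."""
    try:
        return LINK_MARKERS[marker]
    except KeyError:
        raise ValueError("unrecognized link depiction: %r" % (marker,))


if __name__ == "__main__":
    for marker in LINK_MARKERS:
        print(marker, link_error_sides(marker))
```

A lookup table is used rather than substring matching so that any unexpected depiction is rejected instead of being silently misread.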
Fabric Scan Options:
•-scan=<file>– Specifies the ibnetdiscover input/output file. By default the output file is /opt/clustertest/logs/ibnetdiscover.log.
•-map=<file>– Specifies a node-name map file to use with the ibnetdiscover ‘--node-name-map’ option. This file overrides the default description text that is tied to each GUID.
•-refresh– When an ibnetdiscover input file is specified (-scan), this option skips running ibnetdiscover to generate a new file; the InfiniBand fabric is not scanned.
•-nocounters– Do not collect port counter information.
Expected Link Rate Options:
•-swirate=<rate>– Sets the expected switch-to-switch link rate (for example, ‘4xDDR’).
•-hcarate=<rate>– Sets the expected switch-to-HCA link rate (for example, ‘4xDDR’).
•-rate=<rate>– Sets the expected switch-to-switch and switch-to-HCA link rate. The default expected link rate is ‘4xQDR’.
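As an illustration of how the options above combine, the following Python sketch assembles an argument list from the flags documented in the usage line. It is only a sketch under stated assumptions: the script is assumed to be reachable as inspect_ib_fabric.pl on the PATH, the helper name build_command is hypothetical, and for simplicity it accepts a single output format flag per invocation.

```python
import shlex

# Output format flags taken from the usage line above.
OUTPUT_FORMATS = ("details", "summary", "links", "linkerrs", "mapping", "full")


def build_command(output_format="full", scan=None, node_map=None,
                  nocounters=False, rate=None):
    """Build an argument list for inspect_ib_fabric.pl (hypothetical helper).

    Assumes the script is on the PATH; adjust the first element otherwise.
    """
    if output_format not in OUTPUT_FORMATS:
        raise ValueError("unknown output format: %r" % (output_format,))
    argv = ["inspect_ib_fabric.pl", "-" + output_format]
    if scan:
        argv.append("-scan=" + scan)        # ibnetdiscover input/output file
    if node_map:
        argv.append("-map=" + node_map)     # node-name map file
    if nocounters:
        argv.append("-nocounters")          # skip port counter collection
    if rate:
        argv.append("-rate=" + rate)        # expected rate for all links
    return argv


if __name__ == "__main__":
    # Show only links with errors, expecting 4xDDR on every link.
    print(shlex.join(build_command("linkerrs", rate="4xDDR")))
```

The resulting list can be passed to subprocess.run on a node where the utility is installed.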
Naming and mapping – The inspect_ib_fabric.pl utility identifies GUIDs in the InfiniBand fabric by the description text common to other InfiniBand utilities and by a generated name. The generated name is in the format SWxxxyy for switches or HCAxxxyy for HCAs.
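The generated-name format can be illustrated with a short Python sketch that splits a name into its parts. This is an assumption-laden example rather than the utility's own logic: it assumes that xxx is a three-digit decimal field and yy a two-digit decimal field, and both the function name parse_generated_name and the sample name SW00102 are hypothetical.

```python
import re

# Assumed layout: SW or HCA prefix, three-digit xxx field, two-digit yy field.
NAME_RE = re.compile(r"^(SW|HCA)(\d{3})(\d{2})$")


def parse_generated_name(name):
    """Split a generated name into component kind, xxx field, and yy field."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError("not a generated name: %r" % (name,))
    prefix, xxx, yy = match.groups()
    kind = "switch" if prefix == "SW" else "HCA"
    return {"kind": kind, "xxx": xxx, "yy": yy}


if __name__ == "__main__":
    # Hypothetical sample name, used only to exercise the parser.
    print(parse_generated_name("SW00102"))
```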
Whenever possible, inspect_ib_fabric.pl attempts to group InfiniBand components together using the system GUID. If multiple components are detected in the fabric with the same system GUID, then they will use the same xxx identifier. The yy identifier is used to uniquely identify each component with the same system GUID. For example, if a switch with a fabric board and two line boards were discovered in the fabric utilizing the same system GUID, they would be named