Diagnostic Tools
shaharf@voltaire.com, halr@voltaire.com

General:

Model of operation: All utilities use direct MAD access to perform their
operations. Operations that require only QP 0 mads may use direct routed
mads, and therefore may work even in unconfigured subnets. Almost all
utilities can operate without accessing the SM, unless GUID to lid translation
is required.

Dependencies: Most utilities depend on libibmad and libibumad.
        All utilities depend on the ib_umad kernel module.

Multiple port/Multiple CA support: when no IB device or port is specified
        (see the "local umad parameters" below), the libibumad library
        selects the port to use by the following criteria:
        1. the first port that is ACTIVE.
        2. if not found, the first port that is UP (physical link up).

        If a port and/or CA name is specified, the libibumad library
        attempts to fulfill the user request, and will fail if it is not
        possible.
        For example:
        ibaddr                  # use the 'best port'
        ibaddr -C mthca1        # pick the best port from mthca1 only.
        ibaddr -P 2             # use the second (active/up) port from the
                                  first available IB device.
        ibaddr -C mthca0 -P 2   # use the specified port only.

Common options & flags:
        Most diagnostics take the following flags. The exact list of supported
        flags per utility can be found in the usage message and can be shown
        using util_name -h syntax.

        # Debugging flags
        -d      raise the IB debugging level. May be used
                several times (-ddd or -d -d -d).
        -e      show umad send receive errors (timeouts and others)
        -h      show the usage message
        -v      increase the application verbosity level.
                May be used several times (-vv or -v -v -v)
        -V      show the internal version info.

        # Addressing flags
        -D      use directed path address arguments. The path
                is a comma separated list of out ports.
                Examples:
                "0"             # self port
                "0,1,2,1,4"     # out via port 1, then 2, ...
        -G      use GUID address arguments.
                In most cases, it is the Port GUID.
                Examples:
                "0x08f1040023"
        -s <smlid>      use 'smlid' as the target lid for SA queries.

        # Local umad parameters:
        -C <ca_name>    use the specified ca_name.
        -P <ca_port>    use the specified ca_port.
        -t <timeout_ms> override the default timeout for the solicited mads.

CLI notation: all utilities use the POSIX style notation,
        meaning that all options (flags) must precede all arguments
        (parameters).


Utilities descriptions:

1. ibstatus

Description:
ibstatus is a script which displays basic information obtained from the local
IB driver. Output includes LID, SMLID, port state, link width active, and port
physical state.

Syntax:
ibstatus [-h] [devname[:port]]...

Examples:
        ibstatus                        # display status of all IB ports
        ibstatus mthca1                 # status of mthca1 ports
        ibstatus mthca1:1 mthca0:2      # show status of specified ports

See also:
        ibstat

2. ibstat

Description:
Similar to the ibstatus utility but implemented as a binary and not a script.
It has options to list CAs and/or ports.

Syntax:
ibstat [-d(ebug) -l(ist_of_cas) -p(ort_list) -s(hort)] <ca_name> [portnum]

Examples:
        ibstat                  # display status of all IB ports
        ibstat mthca1           # status of mthca1 ports
        ibstat mthca1 2         # show status of specified ports
        ibstat -p mthca0        # list the port guids of mthca0
        ibstat -l               # list all CA names

See also:
        ibstatus

3. ibroute

Description:
ibroute uses SMPs to display the forwarding tables (unicast
(LinearForwardingTable or LFT) or multicast (MulticastForwardingTable or MFT))
for the specified switch LID and the optional lid (mlid) range.
The default range is all valid entries in the range 1...FDBTop.

Syntax:
ibroute [options] <switch_addr> [<startlid> [<endlid>]]

Non standard flags:
        -a      show all lids in range, even invalid entries.
        -n      do not try to resolve destinations.
        -M      show multicast forwarding tables. In this case the range
                parameters are specifying mlid range.

Examples:
        ibroute 2               # dump all valid entries of switch lid 2
        ibroute 2 15            # dump entries in the range 15...FDBTop.
        ibroute -a 2 10 20      # dump all entries in the range 10..20
        ibroute -n 2            # simple format
        ibroute -M 2            # show multicast tables

See also:
        ibtracert

4. ibtracert

Description:
ibtracert uses SMPs to trace the path from a source GID/LID to a
destination GID/LID. Each hop along the path is displayed until the destination
is reached or a hop does not respond. By using the -m option, multicast path
tracing can be performed between source and destination nodes.

Syntax:
ibtracert [options] <src-addr> <dest-addr>

Non standard flags:
        -n              simple format; don't show additional information.
        -m <mlid>       show the multicast trace of the specified mlid.

Examples:
        ibtracert 2 23                  # show trace between lid 2 and 23
        ibtracert -m 0xc000 3 5         # show multicast trace between lid 3 and 5
                                          for mcast lid 0xc000.

5. smpquery

Description:
smpquery allows a basic subset of standard SMP queries including the following:
node info, node description, switch info, port info. Fields are displayed in
human readable format.

Syntax:
smpquery [options] <op> <dest_addr> [op_params]

Current supported operations and their parameters:
        nodeinfo <addr>
        nodedesc <addr>
        portinfo <addr> [<portnum>]     # default port is zero
        switchinfo <addr>
        pkeys <addr> [<portnum>]
        sl2vl <addr> [<portnum>]
        vlarb <addr> [<portnum>]

Examples:
        smpquery nodeinfo 2             # show nodeinfo for lid 2
        smpquery portinfo 2 5           # show portinfo for lid 2 port 5

6. smpdump

Description:
smpdump is a general purpose SMP utility which gets SM attributes from a
specified SMA. The result is dumped in hex by default.

Syntax:
smpdump [options] <dest_addr> <attr> [mod]

Non standard flags:
        -s      show output as string

Examples:
        smpdump -D 0,1,2 0x15 2         # port info, port 2
        smpdump 3 0x15 2                # port info, lid 3 port 2

7. ibaddr

Description:
ibaddr can be used to show the lid and GID addresses of the specified port,
or the local port by default.
Note: this utility can be used as a simple address resolver.

Syntax:
ibaddr [options] [<dest_addr>]

Examples:
        ibaddr                  # show local address
        ibaddr 2                # show address of the specified port lid
        ibaddr -G 0x8f1040023   # show address of the specified port guid

8. sminfo

Description:
sminfo issues a sminfo query and dumps the output in human readable format.
The target SM is the one listed in the local port info, or the SM specified
by the optional SM lid or by the SM direct routed path.
Note: using sminfo for any purpose other than a simple query may be very
dangerous, and may result in a malfunction of the target SM.

Syntax:
sminfo [options] <sm_lid|sm_dr_path> [sminfo_modifier]

Non standard flags:
        -s <state>      # use the specified state in sminfo mad
        -p <priority>   # use the specified priority in sminfo mad
        -a <activity>   # use the specified activity in sminfo mad

Examples:
        sminfo                  # show sminfo of SM listed in local portinfo
        sminfo 2                # query SM on port lid 2

9. perfquery

Description:
perfquery uses PerfMgt GMPs to obtain the PortCounters (basic performance
and error counters) from the PMA at the node specified. Optionally reset all
or

Syntax:
perfquery [options] [<lid|guid> [[port] [reset_mask]]]

Non standard flags:
        -a      show aggregated counters for all ports of the destination lid.
        -r      reset counters after read.
        -R      only reset counters.
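
The optional reset_mask argument in the perfquery syntax is a bit mask that
selects which counters a reset applies to. The Python snippet below is a rough
sketch of how such a mask could be composed. It assumes that the mask follows
the CounterSelect bit layout of the PortCounters attribute (bits 0-11 for the
error counters, bits 12-15 for the data and packet counters); this layout is
an assumption to verify against the InfiniBand specification. The names
COUNTER_BITS and mask_for are hypothetical and are not part of perfquery.

```python
# Assumed CounterSelect bit positions for the PortCounters attribute.
# This table is an illustration only; verify it against the InfiniBand
# specification before relying on it.
COUNTER_BITS = {
    "SymbolErrorCounter": 0,
    "LinkErrorRecoveryCounter": 1,
    "LinkDownedCounter": 2,
    "PortRcvErrors": 3,
    "PortRcvRemotePhysicalErrors": 4,
    "PortRcvSwitchRelayErrors": 5,
    "PortXmitDiscards": 6,
    "PortXmitConstraintErrors": 7,
    "PortRcvConstraintErrors": 8,
    "LocalLinkIntegrityErrors": 9,
    "ExcessiveBufferOverrunErrors": 10,
    "VL15Dropped": 11,
    "PortXmitData": 12,
    "PortRcvData": 13,
    "PortXmitPkts": 14,
    "PortRcvPkts": 15,
}


def mask_for(*names):
    """Return the bit mask that selects the named counters."""
    mask = 0
    for name in names:
        mask |= 1 << COUNTER_BITS[name]
    return mask


# Under the assumed layout, error counters occupy bits 0-11 and the
# data/packet counters occupy bits 12-15.
ERROR_MASK = mask_for(*[n for n, bit in COUNTER_BITS.items() if bit < 12])
DATA_MASK = mask_for(*[n for n, bit in COUNTER_BITS.items() if bit >= 12])

if __name__ == "__main__":
    print(f"error counters: {ERROR_MASK:#06x}")
    print(f"data counters:  {DATA_MASK:#06x}")
```

A mask built this way would be passed as the last argument, for example
perfquery -R <lid> <port> <mask>.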

Examples:
        perfquery               # read local port's performance counters
        perfquery 32 1          # read performance counters from lid 32, port 1
        perfquery -a 32         # read node aggregated performance counters
        perfquery -r 32 1       # read performance counters and reset
        perfquery -R 32 1       # reset performance counters of port 1 only
        perfquery -R -a 32      # reset performance counters of all ports
        perfquery -R 32 2 0xf000        # reset only non-error counters of port 2

10. ibping

Description:
ibping uses vendor mads to validate connectivity between IB nodes.
On exit, (IP) ping like output is shown. ibping is run as client/server.
Default is to run as client. Note also that a default ping server is
implemented within the kernel.

Syntax:
ibping [options] <dest lid|guid>

Non standard flags:
        -c <count>      stop after count packets
        -f              flood destination: send packets back to back w/o delay
        -o <oui>        use specified OUI number to multiplex vendor mads
        -S              start in server mode (do not return)

11. ibnetdiscover

Description:
ibnetdiscover performs IB subnet discovery and outputs a human readable
topology file. GUIDs, node types, and port numbers are displayed
as well as port LIDs and NodeDescriptions. All nodes (and links) are displayed
(full topology). Optionally this utility can be used to list the current
connected nodes. The output is printed to the standard output unless a
topology file is specified.

Syntax:
ibnetdiscover [options] [<topology-filename>]

Non standard flags:
        -l      List of connected nodes
        -H      List of connected HCAs
        -S      List of connected switches
        -g      Grouping

12. ibhosts

Description:
ibhosts either walks the IB subnet topology or uses an already saved topology
file and extracts the CA nodes.

Syntax:
ibhosts [-h] [<topology-file>]

Dependencies:
ibnetdiscover, ibnetdiscover format

13. ibswitches

Description:
ibswitches either walks the IB subnet topology or uses an already saved
topology file and extracts the IB switches.

Syntax:
ibswitches [-h] [<topology-file>]

Dependencies:
ibnetdiscover, ibnetdiscover format

14. ibchecknet

Description:
ibchecknet uses a full topology file that was created by ibnetdiscover,
scans the network to validate the connectivity and reports errors
(from port counters).

Syntax:
ibchecknet [-h] [<topology-file>]

Dependencies:
ibnetdiscover, ibnetdiscover format, ibchecknode, ibcheckport, ibcheckerrs

15. ibcheckport

Description:
Check connectivity and do some simple sanity checks for the specified port.
Port address is lid unless -G option is used to specify a GUID address.

Syntax:
ibcheckport [-h] [-G] <lid|guid> <port_number>

Example:
        ibcheckport 2 3         # check lid 2 port 3

Dependencies:
smpquery, smpquery output format, ibaddr

16. ibchecknode

Description:
Check connectivity and do some simple sanity checks for the specified node.
Port address is lid unless -G option is used to specify a GUID address.

Syntax:
ibchecknode [-h] [-G] <lid|guid>

Example:
        ibchecknode 2           # check node via lid 2

Dependencies:
smpquery, smpquery output format, ibaddr

17. ibcheckerrs

Description:
Check specified port (or node) and report errors that surpassed their predefined
threshold. Port address is lid unless -G option is used to specify a GUID
address. The predefined thresholds can be dumped using the -s option, and a
user defined threshold_file (using the same format as the dump) can be
specified using the -t <file> option.

Syntax:
ibcheckerrs [-h] [-G] [-t <threshold_file>] [-s(how_thresholds)] <lid|guid> [<port>]

Examples:
        ibcheckerrs 2           # check aggregated node counter for lid 2
        ibcheckerrs 2 4         # check port counters for lid 2 port 4
        ibcheckerrs -t xxx 2    # check node using xxx threshold file

Dependencies:
perfquery, perfquery output format, ibaddr

18. ibportstate

Description:
ibportstate allows the port state and port physical state of an IB port
to be queried or a switch port to be disabled or enabled.

Syntax:
ibportstate [-d(ebug) -e(rr_show) -v(erbose) -D(irect) -G(uid) -s smlid
-V(ersion) -C ca_name -P ca_port -t timeout_ms] <dest dr_path|lid|guid>
<portnum> [<op>]
        supported ops: enable, disable, query

Examples:
        ibportstate 3 1 disable                 # by lid
        ibportstate -G 0x2C9000100D051 1 enable # by guid
        ibportstate -D 0 1                      # by direct route

19. ibcheckwidth

Description:
ibcheckwidth uses a full topology file that was created by ibnetdiscover,
scans the network to validate the active link widths and reports any 1x
links.

Syntax:
ibcheckwidth [-h] [<topology-file>]

Dependencies:
ibnetdiscover, ibnetdiscover format, ibchecknode, ibcheckportwidth

20. ibcheckportwidth

Description:
Check connectivity and check the specified port for 1x link width.
Port address is lid unless -G option is used to specify a GUID address.

Syntax:
ibcheckportwidth [-h] [-G] <lid|guid> <port>

Example:
        ibcheckportwidth 2 3    # check lid 2 port 3

Dependencies:
smpquery, smpquery output format, ibaddr

21. ibcheckstate

Description:
ibcheckstate uses a full topology file that was created by ibnetdiscover,
scans the network to validate the port state and port physical state,
and reports any ports which have a port state other than Active or
a port physical state other than LinkUp.

Syntax:
ibcheckstate [-h] [<topology-file>]

Dependencies:
ibnetdiscover, ibnetdiscover format, ibchecknode, ibcheckportstate

22. ibcheckportstate

Description:
Check connectivity and check the specified port for proper port state
(Active) and port physical state (LinkUp).
Port address is lid unless -G option is used to specify a GUID address.

Syntax:
ibcheckportstate [-h] [-G] <lid|guid> <port_number>

Example:
        ibcheckportstate 2 3    # check lid 2 port 3

Dependencies:
smpquery, smpquery output format, ibaddr

23. ibcheckerrors

Description:
ibcheckerrors uses a full topology file that was created by ibnetdiscover,
scans the network to validate the connectivity and reports errors
(from port counters).

Syntax:
ibcheckerrors [-h] [<topology-file>]

Dependencies:
ibnetdiscover, ibnetdiscover format, ibchecknode, ibcheckport, ibcheckerrs

24. ibdiscover.pl

Description:
ibdiscover.pl uses a topology file created by ibnetdiscover, an ibdiscover.map
file which the network administrator creates and which lists the nodes
to be expected, and an ibdiscover.topo file which holds the expected
connectivity. It produces a new connectivity file (ibdiscover.topo.new) and
outputs the changes to stdout. The network administrator can choose to replace
the "old" topo file with the new one or to merge in only certain changes.

The syntax of the ibdiscover.map file is:
<nodeGUID>|port|"Text for node"|<NodeDescription from ibnetdiscover format>
e.g.
8f10400410015|8|"ISR 6000"|# SW-6IB4 Voltaire port 0 lid 5
8f10403960558|2|"HCA 1"|# MT23108 InfiniHost Mellanox Technologies

The syntax of the old and new topo files (ibdiscover.topo and
ibdiscover.topo.new) is:
<LocalPort>|<LocalNodeGUID>|<RemotePort>|<RemoteNodeGUID>
e.g.
10|5442ba00003080|1|8f10400410015

These topo files are produced by the ibdiscover.pl tool.
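
The two pipe-delimited record formats above are simple enough to read with a
few lines of code. The following Python snippet is a minimal sketch and is not
part of ibdiscover.pl. The names MapEntry, TopoLink, parse_map_line and
parse_topo_line are hypothetical, chosen only for illustration. It assumes, as
the examples suggest, that GUIDs are hexadecimal without a 0x prefix and that
port numbers are decimal.

```python
from typing import NamedTuple


class MapEntry(NamedTuple):
    """One record of the map file (field names are illustrative)."""
    node_guid: int
    port: int
    label: str
    node_desc: str


class TopoLink(NamedTuple):
    """One record of a topo file (field names are illustrative)."""
    local_port: int
    local_guid: int
    remote_port: int
    remote_guid: int


def parse_map_line(line: str) -> MapEntry:
    # Split on the first three '|' only, so that a '|' inside the trailing
    # NodeDescription text would be kept as part of the description.
    guid, port, label, desc = line.rstrip("\n").split("|", 3)
    # Assumption: GUID is hex without "0x", port is decimal.
    return MapEntry(int(guid, 16), int(port), label.strip('"'), desc)


def parse_topo_line(line: str) -> TopoLink:
    lport, lguid, rport, rguid = line.strip().split("|")
    return TopoLink(int(lport), int(lguid, 16), int(rport), int(rguid, 16))


if __name__ == "__main__":
    entry = parse_map_line(
        '8f10400410015|8|"ISR 6000"|# SW-6IB4 Voltaire port 0 lid 5'
    )
    link = parse_topo_line("10|5442ba00003080|1|8f10400410015")
    print(entry)
    print(link)
```

Keeping the GUIDs as integers makes it easy to match a link's remote GUID
against the node GUIDs listed in the map file.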

Syntax:
ibnetdiscover | ibdiscover.pl

Dependencies:
ibnetdiscover, ibnetdiscover format

25. ibnodes

Description:
ibnodes either walks the IB subnet topology or uses an already saved topology
file and extracts the IB nodes (CAs and switches).

Syntax:
ibnodes [<topology-file>]

Dependencies:
ibnetdiscover, ibnetdiscover format

26. ibclearerrors

Description:
ibclearerrors clears the PMA error counters in PortCounters by either walking
the IB subnet topology or using an already saved topology file.

Syntax:
ibclearerrors [-h] [<topology-file>]

Dependencies:
ibnetdiscover, ibnetdiscover format, perfquery

27. ibclearcounters

Description:
ibclearcounters clears the PMA port counters by either walking
the IB subnet topology or using an already saved topology file.

Syntax:
ibclearcounters [-h] [<topology-file>]

Dependencies:
ibnetdiscover, ibnetdiscover format, perfquery

28. saquery

Description:
Issue some SA queries.

Syntax:
saquery [-h -d -P -N -L -G -s -g] [<name>]
        Queries node records by default
        -d      enable debugging
        -P      get PathRecord info
        -N      get NodeRecord info
        -L      return just the Lid of the name specified
        -G      return just the Guid of the name specified
        -s      return the PortInfoRecords with isSM capability mask bit on
        -g      get multicast group info

Dependencies:
OpenSM libvendor, OpenSM libopensm, libibumad

29. ibsysstat

Description:
ibsysstat uses vendor mads to validate connectivity between IB nodes
and obtain other information about the IB node. ibsysstat is run as
client/server. Default is to run as client.

Syntax:
ibsysstat [options] <dest lid|guid> [<op>]

Current supported operations:
        ping - verify connectivity to server (default)
        host - obtain host information from server
        cpu  - obtain cpu information from server

Non standard flags:
        -o <oui>        use specified OUI number to multiplex vendor mads
        -S              start in server mode (do not return)