This README is for OpenSM and the InfiniBand diagnostic utilities
in this directory (management).

The master source repository is
git://git.openfabrics.org/~sashak/management.git and can be cloned by:

        git clone git://git.openfabrics.org/~sashak/management.git


Packages
--------
libibcommon - common utility library
libibumad - library interface to the ib_umad (user_mad) kernel module
libibmad - generic MAD handling library
opensm - OpenSM
infiniband-diags - various diagnostic tools


Building
--------
To build, unpack the tarballs and, in the directories libibcommon,
libibumad, libibmad, opensm and infiniband-diags (in that order), run:

        ./configure && make && make install

(If you are building from the cloned repository, also run ./autogen.sh
first.)

Typically the autogen and configure steps only need to be done the first
time, unless configure.in or Makefile.am changes in the directories.

Libraries are installed by default in /usr/local/lib and binaries in
/usr/local/sbin.


Running
-------
After compiling and installing, you can run opensm by invoking

        /usr/local/sbin/opensm

opensm must be run as root. Run 'opensm --help' to see the options.

Note also that you must have udev mount /dev/infiniband or do it manually.
See .../src/linux-kernel/docs/user_mad.txt. Also, the ib_umad module must
be loaded.

opensm will run on the first existing port on the first IB device (HCA).
You can override that by using "-g <portguid_in_hex>".
Verify that the first port is active. This assumes the port is plugged
into another IB device.

In case of problems, run opensm with -V and send the log file
(/var/log/opensm.log).

IMPORTANT:
Don't forget to modprobe ib_umad and make sure udev is configured before
using any of the userspace programs.


OpenSM Limitations:
1. The retry mechanism in the SM is primitive and needs enhancing to deal
with ports which are active but don't respond to SM MADs.
2. Async events are not yet supported by OpenSM. The only event handled
is local LID change (and that is handled in the mthca driver). Future
versions of OpenSM may need to act on more local events.


Tuning OpenSM for Large Clusters
--------------------------------
Currently OpenSM is compiled with debug and no optimization. This
should be changed to at least -O2 (and perhaps -O4), with -O2 as the
suggested starting point. This results in a 2x speedup for some code
paths.

OpenSM supports a pipelining mode for SMPs. The default is 4
outstanding SMPs. -maxsmps <#> sets the number of outstanding SMPs
allowed and should speed up the initialization. Useful values are
16 and 32.

Beyond this, a problem with a link may be causing timeouts and retries
to kick in. The OpenSM log should contain messages indicating this.


Other utilities (infiniband diagnostics)
----------------------------------------
ibstat - show host adapter status
ibstatus - similar to ibstat but implemented as a script
ibnetdiscover - scan topology
ibaddr - show the lid range and default GID of the target (default is
        the local port)
ibroute - display unicast and multicast forwarding tables of switches
ibtracert - display unicast or multicast route from source to destination
ibping - ping/pong between IB nodes (currently using vendor MADs)
ibsysstat - obtain basic information (hostname, cpus, memory,
        utilization) for a node, which may be remote
sminfo - query the SMInfo attribute on a node
smpdump - simple solicited SMP query tool. Output is hex dump
        (unless requested otherwise, e.g. using -s)
smpquery - formatted SMP query tool
perfquery - dump (and optionally clear) the performance (including error)
        counters of the destination port
ibcheckport - perform some basic tests on the specified port
ibchecknode - perform some basic tests on the specified node
ibcheckerrs - check if the error counters of the port/node have passed
        some predefined thresholds
ibchecknet - perform port/node/errors check on the subnet. ibnetdiscover
        output can be used as an input topology
ibswitches - scan the net or use existing net topology file and list all
        switches
ibhosts - scan the net or use existing net topology file and list all hosts
ibnodes - scan the net or use existing net topology file and list all nodes
ibportstate - get the logical and physical port state of an IB port or
        disable or enable the port (only on a switch)
ibcheckwidth - perform port width check on the subnet. Used to find ports
        with 1x link width.
ibcheckportwidth - perform 1x port width check on specified port
ibcheckstate - perform port state (and physical port state) check on
        the subnet. Used to find ports not in LinkUp physical port state
        and not Active port state
ibcheckportstate - perform port state (and physical port state) check on
        specified port
ibcheckerrors - perform error check on subnet. Used to find ports with
        error counters (PMA PortCounters) beyond the indicated thresholds
ibclearerrors - clear all error counters on subnet
ibclearcounters - clear all port counters on subnet
ibdiscover.pl - takes output of ibnetdiscover and a map file and produces
        a topology file (local node GUID and port connected to remote
        node GUID and port)
saquery - issue some SA queries

Note that the above list is not up to date and the infiniband-diags
subdirectory should be checked for the latest tools.
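
Before running opensm or any of the tools listed above, the two
prerequisites named in the Running section (ib_umad loaded,
/dev/infiniband present) can be checked with a small shell function.
What follows is only a minimal sketch and is not part of these packages:
the name check_ib_prereqs and its two optional root-path arguments are
hypothetical, and it assumes that a loaded ib_umad module shows up as
<sysfs>/module/ib_umad, which may not hold if the driver is built into
the kernel.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with OpenSM or infiniband-diags.
# Usage: check_ib_prereqs [sysfs_root] [dev_root]
# The roots default to /sys and /dev; they are parameters only so the
# function can be exercised against a scratch directory.
check_ib_prereqs() {
    ib_sysfs_root="${1:-/sys}"
    ib_dev_root="${2:-/dev}"
    ib_rc=0

    # Assumption: a loaded module appears as <sysfs>/module/<name>.
    if [ ! -d "$ib_sysfs_root/module/ib_umad" ]; then
        echo "ib_umad does not appear to be loaded (try: modprobe ib_umad)" >&2
        ib_rc=1
    fi

    # The umad device nodes live under /dev/infiniband, created by udev
    # or by hand.
    if [ ! -d "$ib_dev_root/infiniband" ]; then
        echo "$ib_dev_root/infiniband is missing (check the udev configuration)" >&2
        ib_rc=1
    fi

    return "$ib_rc"
}
```

Called with no arguments it inspects the real /sys and /dev, so one
possible use is:

        check_ib_prereqs && /usr/local/sbin/opensm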