\documentstyle[12pt,twoside]{article}
\def\TITLE{IP Command Reference}
\input preamble
\begin{center}
\Large\bf IP Command Reference.
\end{center}

\begin{center}
{ \large Alexey~N.~Kuznetsov } \\
\em Institute for Nuclear Research, Moscow \\
\verb|kuznet@ms2.inr.ac.ru| \\
\rm April 14, 1999
\end{center}

\vspace{5mm}

\tableofcontents

\newpage

\section{About this document}

This document presents a comprehensive description of the \verb|ip| utility
from the \verb|iproute2| package. It is not a tutorial or user's guide.
It is a {\em dictionary\/}, not explaining terms,
but translating them into other terms, which may also be unknown to the reader.
However, the document is self-contained and the reader, provided they have a
basic networking background, will find enough information
and examples to understand and configure Linux-2.2 IP and IPv6
networking.

This document is split into sections explaining \verb|ip| commands
and options, decoding \verb|ip| output and containing a few examples.
Longer examples and topics that require more elaborate
discussion are in the appendix.

The paragraphs beginning with NB contain side notes, warnings about
bugs and design drawbacks. They may be skipped at the first reading.

\section{{\tt ip} --- command syntax}

The generic form of an \verb|ip| command is:
\begin{verbatim}
ip [ OPTIONS ] OBJECT [ COMMAND [ ARGUMENTS ]]
\end{verbatim}
where \verb|OPTIONS| is a set of optional modifiers affecting the
general behaviour of the \verb|ip| utility or changing its output. All options
begin with the character \verb|'-'| and may be used in either long or abbreviated
forms. Currently, the following options are available:

\begin{itemize}
\item \verb|-V|, \verb|-Version|

--- print the version of the \verb|ip| utility and exit.

\item \verb|-s|, \verb|-stats|, \verb|-statistics|

--- output more information.
If the option
appears twice or more, the amount of information increases.
As a rule, the information is statistics or some time values.

\item \verb|-f|, \verb|-family| followed by a protocol family
identifier: \verb|inet|, \verb|inet6| or \verb|link|.

--- enforce the protocol family to use. If the option is not present,
the protocol family is guessed from other arguments. If the rest of the command
line does not give enough information to guess the family, \verb|ip| falls back to the default
one, usually \verb|inet| or \verb|any|. \verb|link| is a special family
identifier meaning that no networking protocol is involved.

\item \verb|-4|

--- shortcut for \verb|-family inet|.

\item \verb|-6|

--- shortcut for \verb|-family inet6|.

\item \verb|-0|

--- shortcut for \verb|-family link|.

\item \verb|-o|, \verb|-oneline|

--- output each record on a single line, replacing line feeds
with the \verb|'\'| character. This is convenient when you want to
count records with \verb|wc| or to \verb|grep| the output. The trivial
script \verb|rtpr| converts the output back into readable form.

\item \verb|-r|, \verb|-resolve|

--- use the system's name resolver to print DNS names instead of
host addresses.

\begin{NB}
 Do not use this option when reporting bugs or asking for advice.
\end{NB}
\begin{NB}
 \verb|ip| never uses DNS to resolve names to addresses.
\end{NB}

\end{itemize}

\verb|OBJECT| is the object to manage or to get information about.
The object types currently understood by \verb|ip| are:

\begin{itemize}
\item \verb|link| --- network device
\item \verb|address| --- protocol (IP or IPv6) address on a device
\item \verb|neighbour| --- ARP or NDISC cache entry
\item \verb|route| --- routing table entry
\item \verb|rule| --- rule in routing policy database
\item \verb|maddress| --- multicast address
\item \verb|mroute| --- multicast routing cache entry
\item \verb|tunnel| --- tunnel over IP
\end{itemize}

Again, the names of all objects may be written in full or
abbreviated form, e.g.\ \verb|address| is abbreviated as \verb|addr|
or just \verb|a|.

\verb|COMMAND| specifies the action to perform on the object.
The set of possible actions depends on the object type.
As a rule, it is possible to \verb|add|, \verb|delete| and
\verb|show| (or \verb|list|) objects, but some objects
do not allow all of these operations or have some additional commands.
The \verb|help| command is available for all objects. It prints
out a list of available commands and argument syntax conventions.

If no command is given, some default command is assumed.
Usually it is \verb|list| or, if the objects of this class
cannot be listed, \verb|help|.

\verb|ARGUMENTS| is a list of arguments to the command.
The arguments depend on the command and object. There are two types of arguments:
{\em flags\/}, consisting of a single keyword, and {\em parameters\/},
consisting of a keyword followed by a value. For convenience,
each command has some {\em default parameter\/}
whose keyword may be omitted. For example, the parameter \verb|dev| is the default
for the {\tt ip link} command, so {\tt ip link ls eth0} is equivalent
to {\tt ip link ls dev eth0}.
In the command descriptions below such parameters
are distinguished with the marker: ``(default)''.

Almost all keywords may be abbreviated to their first few letters
(or even to a single letter).
The shortcuts are convenient when \verb|ip| is used interactively,
but they are not recommended in scripts or when reporting bugs
or asking for advice. ``Officially'' allowed abbreviations are listed
in the document body.

\section{{\tt ip} --- error messages}

\verb|ip| may fail for one of the following reasons:

\begin{itemize}
\item
A syntax error on the command line: an unknown keyword, an incorrectly formatted
IP address, etc. In this case \verb|ip| prints an error message
and exits. As a rule, the error message will contain information
about the reason for the failure. Sometimes it also prints a help page.

\item
The arguments did not pass verification for self-consistency.

\item
\verb|ip| failed to compile a kernel request from the arguments
because the user didn't give enough information.

\item
The kernel returned an error to some syscall. In this case \verb|ip|
prints the error message, as it is output with \verb|perror(3)|,
prefixed with a comment and a syscall identifier.

\item
The kernel returned an error to some RTNETLINK request.
In this case \verb|ip| prints the error message, as it is output
with \verb|perror(3)| prefixed with ``RTNETLINK answers:''.

\end{itemize}

All the operations are atomic, i.e.\
if the \verb|ip| utility fails, it does not change anything
in the system. One harmful exception is the \verb|ip link| command
(Sec.\ref{IP-LINK}, p.\pageref{IP-LINK}),
which may change only some of the device parameters given
on the command line.

It is difficult to list all the error messages (especially
syntax errors). However, as a rule, their meaning is clear
from the context of the command.

The most common mistakes are:

\begin{enumerate}
\item Netlink is not configured in the kernel.
The message is:
\begin{verbatim}
Cannot open netlink socket: Invalid value
\end{verbatim}

\item RTNETLINK is not configured in the kernel. In this case
one of the following messages may be printed, depending on the command:
\begin{verbatim}
Cannot talk to rtnetlink: Connection refused
Cannot send dump request: Connection refused
\end{verbatim}

\item The \verb|CONFIG_IP_MULTIPLE_TABLES| option was not selected
when configuring the kernel. In this case any attempt to use the
\verb|ip| \verb|rule| command will fail, e.g.
\begin{verbatim}
kuznet@kaiser $ ip rule list
RTNETLINK error: Invalid argument
dump terminated
\end{verbatim}

\end{enumerate}


\section{{\tt ip link} --- network device configuration}
\label{IP-LINK}

\paragraph{Object:} A \verb|link| is a network device and the corresponding
commands display and change the state of devices.

\paragraph{Commands:} \verb|set| and \verb|show| (or \verb|list|).

\subsection{{\tt ip link set} --- change device attributes}

\paragraph{Abbreviations:} \verb|set|, \verb|s|.

\paragraph{Arguments:}

\begin{itemize}
\item \verb|dev NAME| (default)

--- \verb|NAME| specifies the network device on which to operate.

\item \verb|up| and \verb|down|

--- change the state of the device to \verb|UP| or \verb|DOWN|.

\item \verb|arp on| or \verb|arp off|

--- change the \verb|NOARP| flag on the device.

\begin{NB}
This operation is {\em not allowed\/} if the device is in state \verb|UP|,
although neither the \verb|ip| utility nor the kernel checks for this condition.
You can get unpredictable results changing this flag while the
device is running.
\end{NB}

\item \verb|multicast on| or \verb|multicast off|

--- change the \verb|MULTICAST| flag on the device.

\item \verb|dynamic on| or \verb|dynamic off|

--- change the \verb|DYNAMIC| flag on the device.

\item \verb|name NAME|

--- change the name of the device. This operation is not
recommended if the device is running or has some addresses
already configured.

\item \verb|txqueuelen NUMBER| or \verb|txqlen NUMBER|

--- change the transmit queue length of the device.

\item \verb|mtu NUMBER|

--- change the MTU of the device.

\item \verb|address LLADDRESS|

--- change the station address of the interface.

\item \verb|broadcast LLADDRESS|, \verb|brd LLADDRESS| or \verb|peer LLADDRESS|

--- change the link layer broadcast address or the peer address when
the interface is \verb|POINTOPOINT|.

\vskip 1mm
\begin{NB}
For most devices (e.g.\ for Ethernet) changing the link layer
broadcast address will break networking.
Do not use it unless you understand what this operation really does.
\end{NB}

\end{itemize}

\vskip 1mm
\begin{NB}
The {\tt ip} utility does not change the \verb|PROMISC|
or \verb|ALLMULTI| flags. These flags are considered
obsolete and should not be changed administratively.
\end{NB}

\paragraph{Warning:} If multiple parameter changes are requested,
\verb|ip| aborts immediately after any of the changes has failed.
This is the only case when \verb|ip| can move the system to
an unpredictable state. The solution is to avoid changing
several parameters with one {\tt ip link set} call.

\paragraph{Examples:}
\begin{itemize}
\item \verb|ip link set dummy address 00:00:00:00:00:01|

--- change the station address of the interface \verb|dummy|.

\item \verb|ip link set dummy up|

--- start the interface \verb|dummy|.

\end{itemize}


\subsection{{\tt ip link show} --- display device attributes}
\label{IP-LINK-SHOW}

\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|lst|, \verb|sh|, \verb|ls|,
\verb|l|.

\paragraph{Arguments:}
\begin{itemize}
\item \verb|dev NAME| (default)

--- \verb|NAME| specifies the network device to show.
If this argument is omitted all devices are listed.

\item \verb|up|

--- only display running interfaces.

\end{itemize}


\paragraph{Output format:}

\begin{verbatim}
kuznet@alisa:~ $ ip link ls eth0
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
kuznet@alisa:~ $ ip link ls sit0
5: sit0@NONE: <NOARP,UP> mtu 1480 qdisc noqueue
    link/sit 0.0.0.0 brd 0.0.0.0
kuznet@alisa:~ $ ip link ls dummy
2: dummy: <BROADCAST,NOARP> mtu 1500 qdisc noop
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
kuznet@alisa:~ $
\end{verbatim}


The number before each colon is an {\em interface index\/} or {\em ifindex\/}.
This number uniquely identifies the interface. This is followed by the {\em interface name\/}
(\verb|eth0|, \verb|sit0| etc.). The interface name is also
unique at every given moment. However, the interface may disappear from the
list (e.g.\ when the corresponding driver module is unloaded) and another
one with the same name may be created later. Besides that,
the administrator may change the name of any device with
\verb|ip| \verb|link| \verb|set| \verb|name|
to make it more intelligible.

The interface name may have another name or \verb|NONE| appended
after the \verb|@| sign. This means that this device is bound to some other
device,
i.e.\ packets sent through it are encapsulated and sent via the ``master''
device. If the name is \verb|NONE|, the master is unknown.
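
The header line described above can be split mechanically. The following
Python sketch is not part of \verb|iproute2|: the function name
\verb|parse_link_header| and the regular expression are assumptions made
for illustration and are derived only from the sample output shown above,
so other output variants may need a different pattern. It returns the
ifindex, the interface name and the name after the \verb|@| sign, if any.

\begin{verbatim}
import re

# First line of an "ip link ls" record, for example:
#   5: sit0@NONE: <NOARP,UP> mtu 1480 qdisc noqueue
# The pattern is an assumption based on the sample output only.
HEADER_RE = re.compile(r"^(\d+):\s+([^@:\s]+)(?:@([^:\s]+))?:")


def parse_link_header(line):
    """Return (ifindex, name, master) for a header line.

    master is None when no "@" suffix is present; the string
    "NONE" means the device is bound but the master is unknown.
    """
    match = HEADER_RE.match(line)
    if match is None:
        raise ValueError("not an ip link header line: %r" % (line,))
    ifindex, name, master = match.groups()
    return int(ifindex), name, master
\end{verbatim}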

Then we see the interface {\em mtu\/} (``maximum transmission unit''). This determines
the maximal size of data which can be sent as a single packet over this interface.

{\em qdisc\/} (``queuing discipline'') shows the queuing algorithm used
on the interface. In particular, \verb|noqueue| means that this interface
does not queue anything and \verb|noop| means that the interface is in blackhole
mode, i.e.\ all packets sent to it are immediately discarded.
{\em qlen\/} is the default transmit queue length of the device measured
in packets.

The interface flags are summarized in the angle brackets.

\begin{itemize}
\item \verb|UP| --- the device is turned on. It is ready to accept
packets for transmission and it may inject into the kernel packets received
from other nodes on the network.

\item \verb|LOOPBACK| --- the interface does not communicate with other
hosts. All packets sent through it will be returned
and nothing but bounced packets can be received.

\item \verb|BROADCAST| --- the device has the facility to send packets
to all hosts sharing the same link. A typical example is an Ethernet link.

\item \verb|POINTOPOINT| --- the link has only two ends with one node
attached to each end. All packets sent to this link will reach the peer
and all packets received by us came from this single peer.

If neither \verb|LOOPBACK| nor \verb|BROADCAST| nor \verb|POINTOPOINT|
is set, the interface is assumed to be NBMA (Non-Broadcast Multi-Access).
This is the most generic type of device and the most complicated one, because
a host attached to an NBMA link has no means to send to anyone
without additionally configured information.

\item \verb|MULTICAST| --- is an advisory flag indicating that the interface
is aware of multicasting, i.e.\ sending packets to some subset of neighbouring
nodes.
Broadcasting is a particular case of multicasting, where the multicast
group consists of all nodes on the link. It is important to emphasize
that software {\em must not\/} interpret the absence of this flag as the inability
to use multicasting on this interface. Any \verb|POINTOPOINT| and
\verb|BROADCAST| link is capable of multicasting by definition, because we have
direct access to all the neighbours and, hence, to any subset of them.
Certainly, the use of high bandwidth multicast transfers is not recommended
on broadcast-only links because of high expense, but it is not strictly
prohibited.

\item \verb|PROMISC| --- the device listens to and feeds to the kernel all
traffic on the link even if it is not destined for us, not broadcast
and not destined for a multicast group of which we are a member. Usually
this mode exists only on broadcast links and is used by bridges and for network
monitoring.

\item \verb|ALLMULTI| --- the device receives all multicast packets
wandering on the link. This mode is used by multicast routers.

\item \verb|NOARP| --- this flag is different from the other ones. It has
no invariant value and its interpretation depends on the network protocols
involved. As a rule, it indicates that the device needs no address
resolution and that the software or hardware knows how to deliver packets
without any help from the protocol stacks.

\item \verb|DYNAMIC| --- is an advisory flag indicating that the interface is
dynamically created and destroyed.

\item \verb|SLAVE| --- this interface is bonded to some other interfaces
to share link capacities.

\end{itemize}

\vskip 1mm
\begin{NB}
There are other flags but they are either obsolete (\verb|NOTRAILERS|)
or not implemented (\verb|DEBUG|) or specific to some devices
(\verb|MASTER|, \verb|AUTOMEDIA| and \verb|PORTSEL|). We do not discuss
them here.
\end{NB}
\begin{NB}
The values of \verb|PROMISC| and \verb|ALLMULTI| flags
shown by the \verb|ifconfig| utility and by the \verb|ip| utility
are {\em different\/}. \verb|ip link ls| shows the true device state,
while \verb|ifconfig| shows the virtual state which was set with
\verb|ifconfig| itself.
\end{NB}


The second line contains information on the link layer addresses
associated with the device. The first word (\verb|ether|, \verb|sit|)
defines the interface hardware type. This type determines the format and semantics
of the addresses and is logically part of the address.
The default format of the station address and the broadcast address
(or the peer address for point-to-point links) is a
sequence of hexadecimal bytes separated by colons, but some link
types may have their natural address format, e.g.\ addresses
of tunnels over IP are printed as dotted-quad IP addresses.

\vskip 1mm
\begin{NB}
 NBMA links have no well-defined broadcast or peer address,
 however this field may contain useful information, e.g.\
 about the address of broadcast relay or about the address of the ARP server.
\end{NB}
\begin{NB}
Multicast addresses are not shown by this command, see
\verb|ip maddr ls| in~Sec.\ref{IP-MADDR} (p.\pageref{IP-MADDR} of this
document).
\end{NB}


\paragraph{Statistics:} With the \verb|-statistics| option, \verb|ip| also
prints interface statistics:

\begin{verbatim}
kuznet@alisa:~ $ ip -s link ls eth0
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    2449949362 2786187  0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    178558497  1783945  332     0       332     35172
kuznet@alisa:~ $
\end{verbatim}
\verb|RX:| and \verb|TX:| lines summarize receiver and transmitter
statistics.
They contain:
\begin{itemize}
\item \verb|bytes| --- the total number of bytes received or transmitted
on the interface. This number wraps when the maximum value of the data type
natural for the architecture is exceeded, so continuous monitoring requires
a user-level daemon sampling it periodically.
\item \verb|packets| --- the total number of packets received or transmitted
on the interface.
\item \verb|errors| --- the total number of receiver or transmitter errors.
\item \verb|dropped| --- the total number of packets dropped due to lack
of resources.
\item \verb|overrun| --- the total number of receiver overruns resulting
in dropped packets. As a rule, if the interface is overrun, it means
serious problems in the kernel or that your machine is too slow
for this interface.
\item \verb|mcast| --- the total number of received multicast packets. This counter
is only supported by a few devices.
\item \verb|carrier| --- the total number of link media failures, e.g.\ because
of lost carrier.
\item \verb|collsns| --- the total number of collision events
on Ethernet-like media. This number may have a different sense on other
link types.
\item \verb|compressed| --- the total number of compressed packets. This is
available only for links using VJ header compression.
\end{itemize}


If the \verb|-s| option is entered twice or more,
\verb|ip| prints more detailed statistics on receiver
and transmitter errors.

\begin{verbatim}
kuznet@alisa:~ $ ip -s -s link ls eth0
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    2449949362 2786187  0       0       0       0
    RX errors: length   crc     frame   fifo    missed
               0        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    178558497  1783945  332     0       332     35172
    TX errors: aborted  fifo    window  heartbeat
               0        0       0       332
kuznet@alisa:~ $
\end{verbatim}
These error names are pure Ethernetisms. Other devices
may have nonzero values in these fields but they may be
interpreted differently.


\section{{\tt ip address} --- protocol address management}

\paragraph{Abbreviations:} \verb|address|, \verb|addr|, \verb|a|.

\paragraph{Object:} The \verb|address| is a protocol (IP or IPv6) address attached
to a network device. Each device must have at least one address
to use the corresponding protocol. It is possible to have several
different addresses attached to one device. These addresses are not
distinguished from one another, so the term {\em alias\/} is not quite appropriate
for them and we do not use it in this document.

The \verb|ip addr| command displays addresses and their properties,
adds new addresses and deletes old ones.

\paragraph{Commands:} \verb|add|, \verb|delete|, \verb|flush| and \verb|show|
(or \verb|list|).


\subsection{{\tt ip address add} --- add a new protocol address}
\label{IP-ADDR-ADD}

\paragraph{Abbreviations:} \verb|add|, \verb|a|.

\paragraph{Arguments:}

\begin{itemize}
\item \verb|dev NAME|

\noindent--- the name of the device to add the address to.

\item \verb|local ADDRESS| (default)

--- the address of the interface. The format of the address depends
on the protocol. It is a dotted quad for IP and a sequence of hexadecimal halfwords
separated by colons for IPv6.
The \verb|ADDRESS| may be followed by
a slash and a decimal number which encodes the network prefix length.


\item \verb|peer ADDRESS|

--- the address of the remote endpoint for point-to-point interfaces.
Again, the \verb|ADDRESS| may be followed by a slash and a decimal number,
encoding the network prefix length. If a peer address is specified,
the local address {\em cannot\/} have a prefix length. The network prefix is associated
with the peer rather than with the local address.


\item \verb|broadcast ADDRESS|

--- the broadcast address on the interface.

It is possible to use the special symbols \verb|'+'| and \verb|'-'|
instead of the broadcast address. In this case, the broadcast address
is derived by setting/resetting the host bits of the interface prefix.

\vskip 1mm
\begin{NB}
Unlike \verb|ifconfig|, the \verb|ip| utility {\em does not\/} set any broadcast
address unless explicitly requested.
\end{NB}


\item \verb|label NAME|

--- Each address may be tagged with a label string.
In order to preserve compatibility with Linux-2.0 net aliases,
this string must coincide with the name of the device or must be prefixed
with the device name followed by a colon.


\item \verb|scope SCOPE_VALUE|

--- the scope of the area where this address is valid.
The available scopes are listed in the file \verb|/etc/iproute2/rt_scopes|.
Predefined scope values are:

 \begin{itemize}
	\item \verb|global| --- the address is globally valid.
	\item \verb|site| --- (IPv6 only) the address is site local,
	i.e.\ it is valid inside this site.
	\item \verb|link| --- the address is link local, i.e.\
	it is valid only on this device.
	\item \verb|host| --- the address is valid only inside this host.
 \end{itemize}

Appendix~\ref{ADDR-SEL} (p.\pageref{ADDR-SEL} of this document)
contains more details on address scopes.

\end{itemize}

\paragraph{Examples:}
\begin{itemize}
\item \verb|ip addr add 127.0.0.1/8 dev lo brd + scope host|

--- add the usual loopback address to the loopback device.

\item \verb|ip addr add 10.0.0.1/24 brd + dev eth0 label eth0:Alias|

--- add the address 10.0.0.1 with prefix length 24 (i.e.\ netmask
\verb|255.255.255.0|), standard broadcast and label \verb|eth0:Alias|
to the interface \verb|eth0|.
\end{itemize}


\subsection{{\tt ip address delete} --- delete a protocol address}

\paragraph{Abbreviations:} \verb|delete|, \verb|del|, \verb|d|.

\paragraph{Arguments:} are the same as the arguments of \verb|ip addr add|.
The device name is a required argument. The rest are optional.
If no arguments are given, the first address is deleted.

\paragraph{Examples:}
\begin{itemize}
\item \verb|ip addr del 127.0.0.1/8 dev lo|

--- deletes the loopback address from the loopback device.
It would be best not to repeat this experiment.

\item Disable IP on the interface \verb|eth0|:
\begin{verbatim}
  while ip -f inet addr del dev eth0; do
    : nothing
  done
\end{verbatim}
Another method to disable IP on an interface using {\tt ip addr flush}
may be found in Sec.\ref{IP-ADDR-FLUSH}, p.\pageref{IP-ADDR-FLUSH}.

\end{itemize}


\subsection{{\tt ip address show} --- display protocol addresses}

\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|lst|, \verb|sh|, \verb|ls|,
\verb|l|.

\paragraph{Arguments:}

\begin{itemize}
\item \verb|dev NAME| (default)

--- the name of the device.

\item \verb|scope SCOPE_VAL|

--- only list addresses with this scope.

\item \verb|to PREFIX|

--- only list addresses matching this prefix.

\item \verb|label PATTERN|

--- only list addresses with labels matching the \verb|PATTERN|.
\verb|PATTERN| is an ordinary shell-style pattern.


\item \verb|dynamic| and \verb|permanent|

--- (IPv6 only) only list addresses installed due to stateless
address configuration or only list permanent (not dynamic) addresses.

\item \verb|tentative|

--- (IPv6 only) only list addresses which did not pass duplicate
address detection.

\item \verb|deprecated|

--- (IPv6 only) only list deprecated addresses.


\item \verb|primary| and \verb|secondary|

--- only list primary (or secondary) addresses.

\end{itemize}


\paragraph{Output format:}

\begin{verbatim}
kuznet@alisa:~ $ ip addr ls eth0
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
    inet 193.233.7.90/24 brd 193.233.7.255 scope global eth0
    inet6 3ffe:2400:0:1:2a0:ccff:fe66:1878/64 scope global dynamic
       valid_lft forever preferred_lft 604746sec
    inet6 fe80::2a0:ccff:fe66:1878/10 scope link
kuznet@alisa:~ $
\end{verbatim}

The first two lines are identical to the output of \verb|ip link ls|.
It is natural to interpret link layer addresses
as addresses of the protocol family \verb|AF_PACKET|.

Then the list of IP and IPv6 addresses follows, accompanied by
additional address attributes: scope value (see Sec.\ref{IP-ADDR-ADD},
p.\pageref{IP-ADDR-ADD} above), flags and the address label.

Address flags are set by the kernel and cannot be changed
administratively. Currently, the following flags are defined:

\begin{enumerate}
\item \verb|secondary|

--- the address is not used when selecting the default source address
of outgoing packets (cf.\ Appendix~\ref{ADDR-SEL}, p.\pageref{ADDR-SEL}).
An IP address becomes secondary if another address with the same
prefix bits already exists. The first address is primary.
It is the leader of the group of all secondary addresses. When the leader
is deleted, all secondaries are purged too.


\item \verb|dynamic|

--- the address was created due to stateless autoconfiguration~\cite{RFC-ADDRCONF}.
In this case the output also contains information on how long
the address remains valid. After \verb|preferred_lft| expires the address is
moved to the deprecated state. After \verb|valid_lft| expires the address
is finally invalidated.

\item \verb|deprecated|

--- the address is deprecated, i.e.\ it is still valid, but cannot
be used by newly created connections.

\item \verb|tentative|

--- the address is not used because duplicate address detection~\cite{RFC-ADDRCONF}
is still not complete or failed.

\end{enumerate}


\subsection{{\tt ip address flush} --- flush protocol addresses}
\label{IP-ADDR-FLUSH}

\paragraph{Abbreviations:} \verb|flush|, \verb|f|.

\paragraph{Description:} This command flushes the protocol addresses
selected by some criteria.

\paragraph{Arguments:} This command has the same arguments as \verb|show|.
The difference is that it does not run when no arguments are given.

\paragraph{Warning:} This command (and other \verb|flush| commands
described below) is pretty dangerous. If you make a mistake, it will
not forgive it, but will cruelly purge all the addresses.

\paragraph{Statistics:} With the \verb|-statistics| option, the command
becomes verbose. It prints out the number of deleted addresses and the number
of rounds made to flush the address list. If this option is given
twice, \verb|ip addr flush| also dumps all the deleted addresses
in the format described in the previous subsection.
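
Because \verb|flush| accepts the same arguments as \verb|show|, one cautious
habit is to list the addresses a selector matches before flushing them.
The Python sketch below only builds the two command lines and does not run
anything. The helper name \verb|flush_plan| and its \verb|family| parameter
are hypothetical and exist only for this illustration; the rejection of an
empty selector mirrors the rule stated under Arguments above.

\begin{verbatim}
def flush_plan(selector, family="-4"):
    """Build a preview command and the matching flush command.

    selector is a list of "ip addr show" arguments, for example
    ["to", "10/8"]. An empty selector is rejected, mirroring the
    fact that flush does not run without arguments.
    """
    if not selector:
        raise ValueError("refusing to flush without a selector")
    preview = ["ip", family, "addr", "show"] + list(selector)
    flush = ["ip", family, "addr", "flush"] + list(selector)
    return preview, flush
\end{verbatim}

For the selector \verb|to 10/8| this yields \verb|ip -4 addr show to 10/8|
as the preview and \verb|ip -4 addr flush to 10/8| as the destructive
command, so the first can be inspected before the second is run.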

\paragraph{Example:} Delete all the addresses from the private network
10.0.0.0/8:
\begin{verbatim}
netadm@amber:~ # ip -s -s a f to 10/8
2: dummy    inet 10.7.7.7/16 brd 10.7.255.255 scope global dummy
3: eth0    inet 10.10.7.7/16 brd 10.10.255.255 scope global eth0
4: eth1    inet 10.8.7.7/16 brd 10.8.255.255 scope global eth1

*** Round 1, deleting 3 addresses ***
*** Flush is complete after 1 round ***
netadm@amber:~ #
\end{verbatim}
Another instructive example is disabling IP on all the Ethernets:
\begin{verbatim}
netadm@amber:~ # ip -4 addr flush label "eth*"
\end{verbatim}
And the last example shows how to flush all the IPv6 addresses
acquired by the host from stateless address autoconfiguration
after you enabled forwarding or disabled autoconfiguration.
\begin{verbatim}
netadm@amber:~ # ip -6 addr flush dynamic
\end{verbatim}



\section{{\tt ip neighbour} --- neighbour/arp tables management}

\paragraph{Abbreviations:} \verb|neighbour|, \verb|neighbor|, \verb|neigh|,
\verb|n|.

\paragraph{Object:} \verb|neighbour| objects establish bindings between protocol
addresses and link layer addresses for hosts sharing the same link.
Neighbour entries are organized into tables. The IPv4 neighbour table
is known by another name --- the ARP table.

The corresponding commands display neighbour bindings
and their properties, add new neighbour entries and delete old ones.

\paragraph{Commands:} \verb|add|, \verb|change|, \verb|replace|,
\verb|delete|, \verb|flush| and \verb|show| (or \verb|list|).

\paragraph{See also:} Appendix~\ref{PROXY-NEIGH}, p.\pageref{PROXY-NEIGH}
describes how to manage proxy ARP/NDISC with the \verb|ip| utility.
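
As a rough mental model, a neighbour table can be pictured as a mapping
from the pair (device, protocol address) to the link layer address and
state of the neighbour. The Python sketch below builds such a mapping from
single-line records in the layout shown under {\tt ip neighbour show}.
The function name \verb|parse_neigh_line| is hypothetical, and the sketch
handles only the keywords \verb|dev|, \verb|lladdr|, \verb|nud| and
\verb|router| that appear in that sample output; it is not a description
of how the kernel stores the table.

\begin{verbatim}
def parse_neigh_line(line):
    """Parse one single-line record of "ip neigh ls" output.

    Returns ((dev, address), attrs), where attrs holds "lladdr"
    and "nud" when present, plus a boolean "router" flag.
    """
    words = line.split()
    address, rest = words[0], words[1:]
    attrs = {"router": False}
    dev = None
    i = 0
    while i < len(rest):
        word = rest[i]
        if word in ("dev", "lladdr", "nud") and i + 1 < len(rest):
            if word == "dev":
                dev = rest[i + 1]
            else:
                attrs[word] = rest[i + 1]
            i += 2
        elif word == "router":
            attrs["router"] = True
            i += 1
        else:
            i += 1  # skip words this sketch does not know about
    return (dev, address), attrs


# Build a table keyed by (device, address) from two sample records.
records = [
    "193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 nud reachable",
    "193.233.7.85 dev eth0 lladdr 00:e0:1e:63:39:00 nud stale",
]
table = dict(parse_neigh_line(record) for record in records)
\end{verbatim}

A lookup such as \verb|table[("eth0", "193.233.7.85")]| then returns the
attributes recorded for that neighbour.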


\subsection{{\tt ip neighbour add} --- add a new neighbour entry\\
	{\tt ip neighbour change} --- change an existing entry\\
	{\tt ip neighbour replace} --- add a new entry or change an existing one}

\paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|change|, \verb|chg|;
\verb|replace|, \verb|repl|.

\paragraph{Description:} These commands create new neighbour records
or update existing ones.

\paragraph{Arguments:}

\begin{itemize}
\item \verb|to ADDRESS| (default)

--- the protocol address of the neighbour. It is either an IPv4 or IPv6 address.

\item \verb|dev NAME|

--- the interface to which this neighbour is attached.


\item \verb|lladdr LLADDRESS|

--- the link layer address of the neighbour. \verb|LLADDRESS| can also be
\verb|null|.

\item \verb|nud NUD_STATE|

--- the state of the neighbour entry. \verb|nud| is an abbreviation for ``Neighbour
Unreachability Detection''. The state can take one of the following values:

\begin{enumerate}
\item \verb|permanent| --- the neighbour entry is valid forever and can only be removed
administratively.
\item \verb|noarp| --- the neighbour entry is valid. No attempts to validate
this entry will be made but it can be removed when its lifetime expires.
\item \verb|reachable| --- the neighbour entry is valid until the reachability
timeout expires.
\item \verb|stale| --- the neighbour entry is valid but suspicious.
This option to \verb|ip neigh| does not change the neighbour state if
it was valid and the address is not changed by this command.
\end{enumerate}

\end{itemize}

\paragraph{Examples:}
\begin{itemize}
\item \verb|ip neigh add 10.0.0.3 lladdr 0:0:0:0:0:1 dev eth0 nud perm|

--- add a permanent ARP entry for the neighbour 10.0.0.3 on the device \verb|eth0|.

\item \verb|ip neigh chg 10.0.0.3 dev eth0 nud reachable|

--- change its state to \verb|reachable|.
\end{itemize}


\subsection{{\tt ip neighbour delete} --- delete a neighbour entry}

\paragraph{Abbreviations:} \verb|delete|, \verb|del|, \verb|d|.

\paragraph{Description:} This command invalidates a neighbour entry.

\paragraph{Arguments:} The arguments are the same as with \verb|ip neigh add|,
except that \verb|lladdr| and \verb|nud| are ignored.


\paragraph{Example:}
\begin{itemize}
\item \verb|ip neigh del 10.0.0.3 dev eth0|

--- invalidate an ARP entry for the neighbour 10.0.0.3 on the device \verb|eth0|.

\end{itemize}

\begin{NB}
 The deleted neighbour entry will not disappear from the tables
 immediately. If it is in use, it cannot be deleted until the last
 client releases it. Otherwise it will be destroyed during
 the next garbage collection.
\end{NB}


\paragraph{Warning:} Attempts to delete or manually change
a \verb|noarp| entry created by the kernel may result in unpredictable behaviour.
In particular, the kernel may try to resolve this address even
on a \verb|NOARP| interface or if the address is multicast or broadcast.


\subsection{{\tt ip neighbour show} --- list neighbour entries}

\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|.

\paragraph{Description:} This command displays neighbour tables.

\paragraph{Arguments:}

\begin{itemize}

\item \verb|to ADDRESS| (default)

--- the prefix selecting the neighbours to list.

\item \verb|dev NAME|

--- only list the neighbours attached to this device.

\item \verb|unused|

--- only list neighbours which are not currently in use.

\item \verb|nud NUD_STATE|

--- only list neighbour entries in this state.
\verb|NUD_STATE| takes
values listed below or the special value \verb|all| which means all states.
This option may occur more than once. If this option is absent, \verb|ip|
lists all entries except for \verb|none| and \verb|noarp|.

\end{itemize}


\paragraph{Output format:}

\begin{verbatim}
kuznet@alisa:~ $ ip neigh ls
:: dev lo lladdr 00:00:00:00:00:00 nud noarp
fe80::200:cff:fe76:3f85 dev eth0 lladdr 00:00:0c:76:3f:85 router \
    nud stale
0.0.0.0 dev lo lladdr 00:00:00:00:00:00 nud noarp
193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 nud reachable
193.233.7.85 dev eth0 lladdr 00:e0:1e:63:39:00 nud stale
kuznet@alisa:~ $
\end{verbatim}

The first word of each line is the protocol address of the neighbour.
Then the device name follows. The rest of the line describes the contents of
the neighbour entry identified by the pair (device, address).

\verb|lladdr| is the link layer address of the neighbour.

\verb|nud| is the state of the ``neighbour unreachability detection'' machine
for this entry. The detailed description of the neighbour
state machine can be found in~\cite{RFC-NDISC}. Here is the full list
of the states with short descriptions:

\begin{enumerate}
\item\verb|none| --- the state of the neighbour is void.
\item\verb|incomplete| --- the neighbour is in the process of resolution.
\item\verb|reachable| --- the neighbour is valid and apparently reachable.
\item\verb|stale| --- the neighbour is valid, but is probably already
unreachable, so the kernel will try to check it at the first transmission.
\item\verb|delay| --- a packet has been sent to the stale neighbour and the kernel is waiting
for confirmation.
\item\verb|probe| --- the delay timer expired but no confirmation was received.
The kernel has started to probe the neighbour with ARP/NDISC messages.
\item\verb|failed| --- resolution has failed.
\item\verb|noarp| --- the neighbour is valid. No attempts to check the entry
will be made.
\item\verb|permanent| --- it is a \verb|noarp| entry, but only the administrator
may remove the entry from the neighbour table.
\end{enumerate}

The link layer address is valid in all states except for \verb|none|,
\verb|failed| and \verb|incomplete|.

IPv6 neighbours can be marked with the additional flag \verb|router|
which means that the neighbour introduced itself as an IPv6 router~\cite{RFC-NDISC}.

\paragraph{Statistics:} The \verb|-statistics| option displays some usage
statistics, e.g.:

\begin{verbatim}
kuznet@alisa:~ $ ip -s n ls 193.233.7.254
193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 ref 5 used 12/13/20 \
    nud reachable
kuznet@alisa:~ $
\end{verbatim}

Here \verb|ref| is the number of users of this entry
and \verb|used| is a triplet of time intervals in seconds
separated by slashes. In this case they show that:

\begin{enumerate}
\item the entry was used 12 seconds ago.
\item the entry was confirmed 13 seconds ago.
\item the entry was updated 20 seconds ago.
\end{enumerate}

\subsection{{\tt ip neighbour flush} --- flush neighbour entries}

\paragraph{Abbreviations:} \verb|flush|, \verb|f|.

\paragraph{Description:} This command flushes neighbour tables, selecting
entries to flush by some criteria.

\paragraph{Arguments:} This command has the same arguments as \verb|show|.
The differences are that it does not run when no arguments are given,
and that the default neighbour states to be flushed do not include
\verb|permanent| and \verb|noarp|.


\paragraph{Statistics:} With the \verb|-statistics| option, the command
becomes verbose. It prints out the number of deleted neighbours and the number
of rounds made to flush the neighbour table.
If the option is given
twice, \verb|ip neigh flush| also dumps all the deleted neighbours
in the format described in the previous subsection.

\paragraph{Example:}
\begin{verbatim}
netadm@alisa:~ # ip -s -s n f 193.233.7.254
193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 ref 5 used 12/13/20 \
    nud reachable

*** Round 1, deleting 1 entries ***
*** Flush is complete after 1 round ***
netadm@alisa:~ #
\end{verbatim}


\section{{\tt ip route} --- routing table management}
\label{IP-ROUTE}

\paragraph{Abbreviations:} \verb|route|, \verb|ro|, \verb|r|.

\paragraph{Object:} \verb|route| entries in the kernel routing tables keep
information about paths to other networked nodes.

Each route entry has a {\em key\/} consisting of a {\em prefix\/}
(i.e.\ a pair containing a network address and the length of its mask) and,
optionally, the TOS value. An IP packet matches the route if the highest
bits of its destination address are equal to the route prefix at least
up to the prefix length and if the TOS of the route is zero or equal to
the TOS of the packet.

If several routes match the packet, the following pruning rules
are used to select the best one (see~\cite{RFC1812}):
\begin{enumerate}
\item The longest matching prefix is selected. All shorter ones
are dropped.

\item If the TOS of some route with the longest prefix is equal to the TOS
of the packet, the routes with a different TOS are dropped.

If no exact TOS match was found and routes with TOS=0 exist,
the remaining routes are pruned.

Otherwise, the route lookup fails.

\item If several routes remain after the previous steps, then
the routes with the best preference values are selected.

\item If we still have several routes, then the {\em first\/} of them
is selected.

\begin{NB}
 Note the ambiguity of the last step. Unfortunately, Linux
 historically allows such a bizarre situation. The sense of the
word ``first'' depends on the order of route additions and it is practically
impossible to maintain a bundle of such routes in this order.
\end{NB}

For simplicity we will limit ourselves to the case where such a situation
is impossible and routes are uniquely identified by the triplet
\{prefix, tos, preference\}. Actually, it is impossible to create
non-unique routes with the \verb|ip| commands described in this section.

One useful exception to this rule is the default route on non-forwarding
hosts. It is ``officially'' allowed to have several fallback routes
when several routers are present on directly connected networks.
In this case, Linux-2.2 performs ``dead gateway detection''~\cite{RFC1122}
controlled by neighbour unreachability detection and by advice
from transport protocols to select a working router, so the order
of the routes is not essential. However, in this case,
fiddling with default routes manually is not recommended. Use the Router Discovery
protocol (see Appendix~\ref{EXAMPLE-SETUP}, p.\pageref{EXAMPLE-SETUP})
instead. Actually, Linux-2.2 IPv6 does not give user level applications
any access to default routes.
\end{enumerate}

Certainly, the steps above are not performed exactly
in this sequence. Instead, the routing table in the kernel is kept
in some data structure to achieve the final result
with minimal cost. However, independently of the particular
routing algorithm implemented in the kernel, we can summarize
the statements above as: a route is identified by the triplet
\{prefix, tos, preference\}. This {\em key\/} lets us locate
the route in the routing table.
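
The pruning rules above can be sketched in a few lines of Python.
This is only an illustration of the selection logic, not the algorithm
used by the kernel. The names \verb|Route|, \verb|select_route| and
\verb|target| are hypothetical, and the sketch assumes that a numerically
lower preference value is the better one.
\begin{verbatim}
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str          # network address and mask length
    tos: int = 0         # 0 matches a packet with any TOS
    preference: int = 0  # assumption: a lower value is better
    target: str = ""     # hypothetical label of the routing record


def select_route(routes, dst, tos=0):
    """Return the best route for (dst, tos) or None."""
    addr = ipaddress.ip_address(dst)
    nets = {r: ipaddress.ip_network(r.prefix) for r in routes}
    # Matching: destination lies inside the prefix and the
    # TOS of the route is zero or equal to the packet TOS.
    left = [r for r in routes
            if addr in nets[r] and r.tos in (0, tos)]
    if not left:
        return None
    # Step 1: keep only the longest matching prefix.
    longest = max(nets[r].prefixlen for r in left)
    left = [r for r in left if nets[r].prefixlen == longest]
    # Step 2: prefer an exact TOS match, else fall back to TOS=0.
    exact = [r for r in left if r.tos == tos]
    left = exact or [r for r in left if r.tos == 0]
    if not left:
        return None  # defensive: the lookup fails
    # Step 3: keep the routes with the best preference value.
    best = min(r.preference for r in left)
    left = [r for r in left if r.preference == best]
    # Step 4: if several routes remain, the first one wins.
    return left[0]
\end{verbatim}
The last step returns whichever route happens to come first in the input
list, which is exactly the ambiguity noted above. It disappears when
routes are unique in \{prefix, tos, preference\}.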

\paragraph{Route attributes:} Each route key refers to a routing
information record containing
the data required to deliver IP packets (e.g.\ output device and
next hop router) and some optional attributes (e.g.\ the path MTU or
the preferred source address when communicating with this destination).
These attributes are described in the following subsection.

\paragraph{Route types:} \label{IP-ROUTE-TYPES}
It is important that the set
of required and optional attributes depends on the route {\em type\/}.
The most important route type
is \verb|unicast|. It describes real paths to other hosts.
As a rule, common routing tables contain only such routes. However,
there are other types of routes with different semantics. The
full list of types understood by Linux-2.2 is:
\begin{itemize}
\item \verb|unicast| --- the route entry describes real paths to the
destinations covered by the route prefix.
\item \verb|unreachable| --- these destinations are unreachable. Packets
are discarded and the ICMP message {\em host unreachable\/} is generated.
The local senders get an \verb|EHOSTUNREACH| error.
\item \verb|blackhole| --- these destinations are unreachable. Packets
are discarded silently. The local senders get an \verb|EINVAL| error.
\item \verb|prohibit| --- these destinations are unreachable. Packets
are discarded and the ICMP message {\em communication administratively
prohibited\/} is generated. The local senders get an \verb|EACCES| error.
\item \verb|local| --- the destinations are assigned to this
host. The packets are looped back and delivered locally.
\item \verb|broadcast| --- the destinations are broadcast addresses.
The packets are sent as link broadcasts.
\item \verb|throw| --- a special control route used together with policy
rules (see sec.\ref{IP-RULE}, p.\pageref{IP-RULE}).
If such a route is selected, lookup
in this table is terminated, pretending that no route was found.
Without policy routing it is equivalent to the absence of the route in the routing
table. The packets are dropped and the ICMP message {\em net unreachable\/}
is generated. The local senders get an \verb|ENETUNREACH| error.
\item \verb|nat| --- a special NAT route. Destinations covered by the prefix
are considered to be dummy (or external) addresses which require translation
to real (or internal) ones before forwarding. The addresses to translate to
are selected with the attribute \verb|via|. More about NAT is
in Appendix~\ref{ROUTE-NAT}, p.\pageref{ROUTE-NAT}.
\item \verb|anycast| --- ({\em not implemented\/}) the destinations are
{\em anycast\/} addresses assigned to this host. They are mainly equivalent
to \verb|local| with one difference: such addresses are invalid when used
as the source address of any packet.
\item \verb|multicast| --- a special type used for multicast routing.
It is not present in normal routing tables.
\end{itemize}

\paragraph{Route tables:} Linux-2.2 can pack routes into several routing
tables identified by a number in the range from 1 to 255 or by
name from the file \verb|/etc/iproute2/rt_tables|. By default all normal
routes are inserted into the \verb|main| table (ID 254) and the kernel only uses
this table when calculating routes.

Actually, one other table always exists, which is invisible but
even more important. It is the \verb|local| table (ID 255). This table
consists of routes for local and broadcast addresses. The kernel maintains
this table automatically and the administrator usually need not modify it
or even look at it.

Multiple routing tables come into play when {\em policy routing\/}
is used. See sec.\ref{IP-RULE}, p.\pageref{IP-RULE}.
In this case, the table identifier effectively becomes
one more parameter, which should be added to the triplet
\{prefix, tos, preference\} to uniquely identify the route.


\subsection{{\tt ip route add} --- add a new route\\
        {\tt ip route change} --- change a route\\
        {\tt ip route replace} --- change a route or add a new one}
\label{IP-ROUTE-ADD}

\paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|change|, \verb|chg|;
 \verb|replace|, \verb|repl|.


\paragraph{Arguments:}
\begin{itemize}
\item \verb|to PREFIX| or \verb|to TYPE PREFIX| (default)

--- the destination prefix of the route. If \verb|TYPE| is omitted,
\verb|ip| assumes type \verb|unicast|. Other values of \verb|TYPE|
are listed above. \verb|PREFIX| is an IP or IPv6 address optionally followed
by a slash and the prefix length. If the length of the prefix is missing,
\verb|ip| assumes a full-length host route. There is also a special
\verb|PREFIX| --- \verb|default| --- which is equivalent to IP \verb|0/0| or
to IPv6 \verb|::/0|.

\item \verb|tos TOS| or \verb|dsfield TOS|

--- the Type Of Service (TOS) key. This key has no associated mask and
the longest match is understood as follows: first, compare the TOS
of the route and of the packet. If they are not equal, then the packet
may still match a route with a zero TOS. \verb|TOS| is either an 8-bit hexadecimal
number or an identifier from {\tt /etc/iproute2/rt\_dsfield}.


\item \verb|metric NUMBER| or \verb|preference NUMBER|

--- the preference value of the route. \verb|NUMBER| is an arbitrary 32-bit number.

\item \verb|table TABLEID|

--- the table to add this route to.
\verb|TABLEID| may be a number or a string from the file
\verb|/etc/iproute2/rt_tables|.
If this parameter is omitted,
\verb|ip| assumes the \verb|main| table, with the exception of
\verb|local|, \verb|broadcast| and \verb|nat| routes, which are
put into the \verb|local| table by default.

\item \verb|dev NAME|

--- the output device name.

\item \verb|via ADDRESS|

--- the address of the nexthop router. Actually, the sense of this field depends
on the route type. For normal \verb|unicast| routes it is either the true nexthop
router or, if it is a direct route installed in BSD compatibility mode,
it can be a local address of the interface.
For NAT routes it is the first address of the block of translated IP destinations.

\item \verb|src ADDRESS|

--- the source address to prefer when sending to the destinations
covered by the route prefix.

\item \verb|realm REALMID|

--- the realm to which this route is assigned.
\verb|REALMID| may be a number or a string from the file
\verb|/etc/iproute2/rt_realms|. Sec.\ref{RT-REALMS} (p.\pageref{RT-REALMS})
contains more information on realms.

\item \verb|mtu MTU| or \verb|mtu lock MTU|

--- the MTU along the path to the destination. If the modifier \verb|lock| is
not used, the MTU may be updated by the kernel due to Path MTU Discovery.
If the modifier \verb|lock| is used, no path MTU discovery will be tried:
all packets will be sent without the DF bit in the IPv4 case
or fragmented to the MTU for IPv6.

\item \verb|window NUMBER|

--- the maximal window for TCP to advertise to these destinations,
measured in bytes. It limits the maximal data bursts that our TCP
peers are allowed to send to us.

\item \verb|rtt NUMBER|

--- the initial RTT (``Round Trip Time'') estimate.


\item \verb|rttvar NUMBER|

--- \threeonly the initial RTT variance estimate.


\item \verb|ssthresh NUMBER|

--- \threeonly an estimate for the initial slow start threshold.


\item \verb|cwnd NUMBER|

--- \threeonly the clamp for the congestion window. It is ignored if the \verb|lock|
    flag is not used.


\item \verb|advmss NUMBER|

--- \threeonly the MSS (``Maximal Segment Size'') to advertise to these
    destinations when establishing TCP connections. If it is not given,
    Linux uses a default value calculated from the first hop device MTU.

\begin{NB}
  If the path to these destinations is asymmetric, this guess may be wrong.
\end{NB}

\item \verb|reordering NUMBER|

--- \threeonly the maximal reordering on the path to this destination.
    If it is not given, Linux uses the value selected with the \verb|sysctl|
    variable \verb|net/ipv4/tcp_reordering|.



\item \verb|nexthop NEXTHOP|

--- the nexthop of a multipath route. \verb|NEXTHOP| is a complex value
with its own syntax similar to the top level argument lists:
\begin{itemize}
\item \verb|via ADDRESS| is the nexthop router.
\item \verb|dev NAME| is the output device.
\item \verb|weight NUMBER| is a weight for this element of a multipath
route reflecting its relative bandwidth or quality.
\end{itemize}

\item \verb|scope SCOPE_VAL|

--- the scope of the destinations covered by the route prefix.
\verb|SCOPE_VAL| may be a number or a string from the file
\verb|/etc/iproute2/rt_scopes|.
If this parameter is omitted,
\verb|ip| assumes scope \verb|global| for all gatewayed \verb|unicast|
routes, scope \verb|link| for direct \verb|unicast| and \verb|broadcast| routes
and scope \verb|host| for \verb|local| routes.

\item \verb|protocol RTPROTO|

--- the routing protocol identifier of this route.
\verb|RTPROTO| may be a number or a string from the file
\verb|/etc/iproute2/rt_protos|.
If the routing protocol ID is
not given, \verb|ip| assumes protocol \verb|boot| (i.e.\
it assumes the route was added by someone who doesn't
understand what they are doing). Several protocol values have a fixed interpretation.
Namely:
\begin{itemize}
\item \verb|redirect| --- the route was installed due to an ICMP redirect.
\item \verb|kernel| --- the route was installed by the kernel during
autoconfiguration.
\item \verb|boot| --- the route was installed during the bootup sequence.
If a routing daemon starts, it will purge all such routes.
\item \verb|static| --- the route was installed by the administrator
to override dynamic routing. A routing daemon will respect such routes
and, probably, even advertise them to its peers.
\item \verb|ra| --- the route was installed by the Router Discovery protocol.
\end{itemize}
The rest of the values are not reserved and the administrator is free
to assign (or not to assign) protocol tags. At least, routing
daemons should take care of setting some unique protocol values,
e.g.\ as they are assigned in \verb|rtnetlink.h| or in the \verb|rt_protos|
database.


\item \verb|onlink|

--- pretend that the nexthop is directly attached to this link,
even if it does not match any interface prefix. One application of this
option may be found in~\cite{IP-TUNNELS}.

\item \verb|equalize|

--- allow packet by packet randomization on multipath routes.
Without this modifier, the route will be frozen to one selected
nexthop, so that load splitting will only occur on a per-flow basis.
\verb|equalize| only works if the kernel is patched.


\end{itemize}


\begin{NB}
  Actually there are more commands: \verb|prepend| does the same
  thing as classic \verb|route add|, i.e.\ adds a route, even if another
  route to the same destination exists.
  Its opposite case is \verb|append|,
  which adds the route to the end of the list. Avoid these
  features.
\end{NB}
\begin{NB}
  More sad news: IPv6 only understands the \verb|append| command correctly.
  All the others are translated into \verb|append| commands. Certainly,
  this will change in the future.
\end{NB}

\paragraph{Examples:}
\begin{itemize}
\item add a plain route to network 10.0.0/24 via gateway 193.233.7.65
\begin{verbatim}
    ip route add 10.0.0/24 via 193.233.7.65
\end{verbatim}
\item change it to a direct route via the \verb|dummy| device
\begin{verbatim}
    ip ro chg 10.0.0/24 dev dummy
\end{verbatim}
\item add a default multipath route splitting the load between \verb|ppp0|
and \verb|ppp1|
\begin{verbatim}
    ip route add default scope global nexthop dev ppp0 \
                                      nexthop dev ppp1
\end{verbatim}
Note the scope value. It is not necessary but it informs the kernel
that this route is gatewayed rather than direct. Actually, if you
know the addresses of remote endpoints it would be better to use the
\verb|via| parameter.
\item announce that the address 192.203.80.144 is not a real one, but
should be translated to 193.233.7.83 before forwarding
\begin{verbatim}
    ip route add nat 192.203.80.144 via 193.233.7.83
\end{verbatim}
Backward translation is set up with policy rules described
in the following section (sec.\ref{IP-RULE}, p.\pageref{IP-RULE}).
\end{itemize}

\subsection{{\tt ip route delete} --- delete a route}

\paragraph{Abbreviations:} \verb|delete|, \verb|del|, \verb|d|.

\paragraph{Arguments:} \verb|ip route del| has the same arguments as
\verb|ip route add|, but their semantics are a bit different.

Key values (\verb|to|, \verb|tos|, \verb|preference| and \verb|table|)
select the route to delete.
If optional attributes are present, \verb|ip|
verifies that they coincide with the attributes of the route to delete.
If no route with the given key and attributes was found, \verb|ip route del|
fails.
\begin{NB}
Linux-2.0 had the option to delete a route selected only by prefix address,
ignoring its length (i.e.\ netmask). This option no longer exists
because it was ambiguous. However, look at {\tt ip route flush}
(sec.\ref{IP-ROUTE-FLUSH}, p.\pageref{IP-ROUTE-FLUSH}) which
provides similar and even richer functionality.
\end{NB}

\paragraph{Example:}
\begin{itemize}
\item delete the multipath route created by the command in the previous subsection
\begin{verbatim}
    ip route del default scope global nexthop dev ppp0 \
                                      nexthop dev ppp1
\end{verbatim}
\end{itemize}



\subsection{{\tt ip route show} --- list routes}

\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.

\paragraph{Description:} This command displays the contents of the routing tables
or the route(s) selected by some criteria.


\paragraph{Arguments:}
\begin{itemize}
\item \verb|to SELECTOR| (default)

--- only select routes from the given range of destinations. \verb|SELECTOR|
consists of an optional modifier (\verb|root|, \verb|match| or \verb|exact|)
and a prefix. \verb|root PREFIX| selects routes with prefixes not shorter
than \verb|PREFIX|. For example, \verb|root 0/0| selects the entire routing table.
\verb|match PREFIX| selects routes with prefixes not longer than
\verb|PREFIX|. For example, \verb|match 10.0/16| selects \verb|10.0/16|,
\verb|10/8| and \verb|0/0|, but it does not select \verb|10.1/16| and
\verb|10.0.0/24|. And \verb|exact PREFIX| (or just \verb|PREFIX|)
selects routes with this exact prefix.
If none of these options
is present, \verb|ip| assumes \verb|root 0/0|, i.e.\ it lists the entire table.


\item \verb|tos TOS| or \verb|dsfield TOS|

 --- only select routes with the given TOS.


\item \verb|table TABLEID|

 --- show the routes from this table. The default setting is to show
\verb|table| \verb|main|. \verb|TABLEID| may either be the ID of a real table
or one of the special values:
  \begin{itemize}
  \item \verb|all| --- list all of the tables.
  \item \verb|cache| --- dump the routing cache.
  \end{itemize}
\begin{NB}
  IPv6 has a single table. However, splitting it into \verb|main|, \verb|local|
  and \verb|cache| is emulated by the \verb|ip| utility.
\end{NB}

\item \verb|cloned| or \verb|cached|

--- list cloned routes, i.e.\ routes which were dynamically forked from
other routes because some route attribute (e.g.\ MTU) was updated.
Actually, it is equivalent to \verb|table cache|.

\item \verb|from SELECTOR|

--- the same syntax as for \verb|to|, but it binds the source address range
rather than destinations. Note that the \verb|from| option only works with
cloned routes.

\item \verb|protocol RTPROTO|

--- only list routes of this protocol.


\item \verb|scope SCOPE_VAL|

--- only list routes with this scope.

\item \verb|type TYPE|

--- only list routes of this type.

\item \verb|dev NAME|

--- only list routes going via this device.

\item \verb|via PREFIX|

--- only list routes going via the nexthop routers selected by \verb|PREFIX|.

\item \verb|src PREFIX|

--- only list routes with preferred source addresses selected
by \verb|PREFIX|.

\item \verb|realm REALMID| or \verb|realms FROMREALM/TOREALM|

--- only list routes with these realms.

\end{itemize}

\paragraph{Examples:} Let us count routes of protocol \verb|gated/bgp|
on a router:
\begin{verbatim}
kuznet@amber:~ $ ip ro ls proto gated/bgp | wc
   1413    9891   79010
kuznet@amber:~ $
\end{verbatim}
To count the size of the routing cache, we have to use the \verb|-o| option
because cached attributes can take more than one line of output:
\begin{verbatim}
kuznet@amber:~ $ ip -o ro ls cloned | wc
    159    2543   18707
kuznet@amber:~ $
\end{verbatim}


\paragraph{Output format:} The output of this command consists
of per route records separated by line feeds.
However, some records may consist
of more than one line: in particular, this is the case when the route
is cloned or you requested additional statistics. If the
\verb|-o| option was given, then line feeds separating lines inside
records are replaced with the backslash sign.

The output has the same syntax as arguments given to {\tt ip route add},
so that it can be understood easily. For example:
\begin{verbatim}
kuznet@amber:~ $ ip ro ls 193.233.7/24
193.233.7.0/24 dev eth0 proto gated/conn scope link \
    src 193.233.7.65 realms inr.ac
kuznet@amber:~ $
\end{verbatim}

If you list cloned entries, the output contains other attributes which
are evaluated during route calculation and updated during route
lifetime. An example of the output is:
\begin{verbatim}
kuznet@amber:~ $ ip ro ls 193.233.7.82 tab cache
193.233.7.82 from 193.233.7.82 dev eth0 src 193.233.7.65 \
  realms inr.ac/inr.ac
    cache <src-direct,redirect> mtu 1500 rtt 300 iif eth0
193.233.7.82 dev eth0 src 193.233.7.65 realms inr.ac
    cache mtu 1500 rtt 300
kuznet@amber:~ $
\end{verbatim}
\begin{NB}
  \label{NB-strange-route}
  The route looks a bit strange, doesn't it? Did you notice that
  it is a path from 193.233.7.82 back to 193.233.7.82?
  Well, you will
  see in the section on \verb|ip route get| (p.\pageref{NB-nature-of-strangeness})
  how it appeared.
\end{NB}
The second line, starting with the word \verb|cache|, shows
additional attributes which normal routes do not possess.
Cached flags are summarized in angle brackets:
\begin{itemize}
\item \verb|local| --- packets are delivered locally.
It stands for loopback unicast routes, for broadcast routes
and for multicast routes, if this host is a member of the corresponding
group.

\item \verb|reject| --- the path is bad. Any attempt to use it results
in an error. See attribute \verb|error| below (p.\pageref{IP-ROUTE-GET-error}).

\item \verb|mc| --- the destination is multicast.

\item \verb|brd| --- the destination is broadcast.

\item \verb|src-direct| --- the source is on a directly connected
interface.

\item \verb|redirected| --- the route was created by an ICMP Redirect.

\item \verb|redirect| --- packets going via this route will
trigger an ICMP redirect.

\item \verb|fastroute| --- the route is eligible to be used for fastroute.

\item \verb|equalize| --- make packet by packet randomization
along this path.

\item \verb|dst-nat| --- the destination address requires translation.

\item \verb|src-nat| --- the source address requires translation.

\item \verb|masq| --- the source address requires masquerading.
This feature disappeared in Linux-2.4.

\item \verb|notify| --- ({\em not implemented}) change/deletion
of this route will trigger RTNETLINK notification.
\end{itemize}

Then some optional attributes follow:
\begin{itemize}
\item \verb|error| --- on \verb|reject| routes it is the error code
returned to local senders when they try to use this route.
These error codes are translated into ICMP error codes, sent to remote
senders, according to the rules described above in the subsection
devoted to route types (p.\pageref{IP-ROUTE-TYPES}).
\label{IP-ROUTE-GET-error}

\item \verb|expires| --- this entry will expire after this timeout.

\item \verb|iif| --- the packets for this path are expected to arrive
on this interface.
\end{itemize}

\paragraph{Statistics:} With the \verb|-statistics| option, more
information about this route is shown:
\begin{itemize}
\item \verb|users| --- the number of users of this entry.
\item \verb|age| --- shows when this route was last used.
\item \verb|used| --- the number of lookups of this route since its creation.
\end{itemize}


\subsection{{\tt ip route flush} --- flush routing tables}
\label{IP-ROUTE-FLUSH}

\paragraph{Abbreviations:} \verb|flush|, \verb|f|.

\paragraph{Description:} This command flushes routes selected
by some criteria.

\paragraph{Arguments:} The arguments have the same syntax and semantics
as the arguments of \verb|ip route show|, except that the routing tables are
purged rather than listed. The only difference is the default action: \verb|show|
dumps the entire IP main routing table but \verb|flush| prints the help page.
The reason for this difference does not require any explanation, does it?


\paragraph{Statistics:} With the \verb|-statistics| option, the command
becomes verbose. It prints out the number of deleted routes and the number
of rounds made to flush the routing table. If the option is given
twice, \verb|ip route flush| also dumps all the deleted routes
in the format described in the previous subsection.

\paragraph{Examples:} The first example flushes all the
gatewayed routes from the main table (e.g.\ after a routing daemon crash).
\begin{verbatim}
netadm@amber:~ # ip -4 ro flush scope global type unicast
\end{verbatim}
This option deserves to be put into a scriptlet \verb|routef|.
\begin{NB}
This option was described in the \verb|route(8)| man page borrowed
from BSD, but was never implemented in Linux.
\end{NB}

The second example flushes all IPv6 cloned routes:
\begin{verbatim}
netadm@amber:~ # ip -6 -s -s ro flush cache
3ffe:2400::220:afff:fef4:c5d1 via 3ffe:2400::220:afff:fef4:c5d1 \
  dev eth0 metric 0
    cache used 2 age 12sec mtu 1500 rtt 300
3ffe:2400::280:adff:feb7:8034 via 3ffe:2400::280:adff:feb7:8034 \
  dev eth0 metric 0
    cache used 2 age 15sec mtu 1500 rtt 300
3ffe:2400::280:c8ff:fe59:5bcc via 3ffe:2400::280:c8ff:fe59:5bcc \
  dev eth0 metric 0
    cache users 1 used 1 age 23sec mtu 1500 rtt 300
3ffe:2400:0:1:2a0:ccff:fe66:1878 via 3ffe:2400:0:1:2a0:ccff:fe66:1878 \
  dev eth1 metric 0
    cache used 2 age 20sec mtu 1500 rtt 300
3ffe:2400:0:1:a00:20ff:fe71:fb30 via 3ffe:2400:0:1:a00:20ff:fe71:fb30 \
  dev eth1 metric 0
    cache used 2 age 33sec mtu 1500 rtt 300
ff02::1 via ff02::1 dev eth1 metric 0
    cache users 1 used 1 age 45sec mtu 1500 rtt 300

*** Round 1, deleting 6 entries ***
*** Flush is complete after 1 round ***
netadm@amber:~ # ip -6 -s -s ro flush cache
Nothing to flush.
netadm@amber:~ #
\end{verbatim}

The third example flushes BGP routing tables after a \verb|gated|
death.
\begin{verbatim}
netadm@amber:~ # ip ro ls proto gated/bgp | wc
   1408    9856   78730
netadm@amber:~ # ip -s ro f proto gated/bgp

*** Round 1, deleting 1408 entries ***
*** Flush is complete after 1 round ***
netadm@amber:~ # ip ro f proto gated/bgp
Nothing to flush.
netadm@amber:~ # ip ro ls proto gated/bgp
netadm@amber:~ #
\end{verbatim}


\subsection{{\tt ip route get} --- get a single route}
\label{IP-ROUTE-GET}

\paragraph{Abbreviations:} \verb|get|, \verb|g|.

\paragraph{Description:} this command gets a single route to a destination
and prints its contents exactly as the kernel sees it.

\paragraph{Arguments:}
\begin{itemize}
\item \verb|to ADDRESS| (default)

--- the destination address.

\item \verb|from ADDRESS|

--- the source address.

\item \verb|tos TOS| or \verb|dsfield TOS|

--- the Type Of Service.

\item \verb|iif NAME|

--- the device from which this packet is expected to arrive.

\item \verb|oif NAME|

--- force the output device on which this packet will be routed.

\item \verb|connected|

--- if no source address (option \verb|from|) was given, look up
the route again with the source set to the preferred address returned by the first lookup.
If policy routing is used, it may be a different route.

\end{itemize}

Note that this operation is not equivalent to \verb|ip route show|.
\verb|show| shows existing routes. \verb|get| resolves them and
creates new clones if necessary. Essentially, \verb|get|
is equivalent to sending a packet along this path.
If the \verb|iif| argument is not given, the kernel creates a route
to output packets towards the requested destination.
This is equivalent to pinging the destination
with a subsequent {\tt ip route ls cache}; however, no packets are
actually sent. With the \verb|iif| argument, the kernel pretends
that a packet arrived from this interface and searches for
a path to forward the packet.

\paragraph{Output format:} This command outputs routes in the same
format as \verb|ip route ls|.

\paragraph{Examples:}
\begin{itemize}
\item Find a route to output packets to 193.233.7.82:
\begin{verbatim}
kuznet@amber:~ $ ip route get 193.233.7.82
193.233.7.82 dev eth0 src 193.233.7.65 realms inr.ac
    cache mtu 1500 rtt 300
kuznet@amber:~ $
\end{verbatim}

\item Find a route to forward packets arriving on \verb|eth0|
from 193.233.7.82 and destined for 193.233.7.82:
\begin{verbatim}
kuznet@amber:~ $ ip r g 193.233.7.82 from 193.233.7.82 iif eth0
193.233.7.82 from 193.233.7.82 dev eth0 src 193.233.7.65 \
  realms inr.ac/inr.ac
    cache <src-direct,redirect> mtu 1500 rtt 300 iif eth0
kuznet@amber:~ $
\end{verbatim}
\begin{NB}
  \label{NB-nature-of-strangeness}
  This is the command that created the funny route from 193.233.7.82
  looped back to 193.233.7.82 (cf.\ NB on~p.\pageref{NB-strange-route}).
  Note the \verb|redirect| flag on it.
\end{NB}

\item Find a multicast route for packets arriving on \verb|eth0|
from host 193.233.7.82 and destined for multicast group 224.2.127.254
(it is assumed that a multicast routing daemon is running;
in this case, it is \verb|pimd|):
\begin{verbatim}
kuznet@amber:~ $ ip r g 224.2.127.254 from 193.233.7.82 iif eth0
multicast 224.2.127.254 from 193.233.7.82 dev lo \
  src 193.233.7.65 realms inr.ac/cosmos
    cache <mc> iif eth0 Oifs: eth1 pimreg
kuznet@amber:~ $
\end{verbatim}
This route differs from the ones seen before. It contains a ``normal'' part
and a ``multicast'' part. The normal part is used to deliver (or not to
deliver) the packet to local IP listeners. In this case the router
is not a member
of this group, so the route has no \verb|local| flag and only
forwards packets. The output device for such entries is always loopback.
The multicast part consists of an additional \verb|Oifs:| list showing
the output interfaces.
\end{itemize}


It is time for a more complicated example. Let us add an invalid
gatewayed route for a destination which is really directly connected:
\begin{verbatim}
netadm@alisa:~ # ip route add 193.233.7.98 via 193.233.7.254
netadm@alisa:~ # ip route get 193.233.7.98
193.233.7.98 via 193.233.7.254 dev eth0 src 193.233.7.90
    cache mtu 1500 rtt 3072
netadm@alisa:~ #
\end{verbatim}
and probe it with ping:
\begin{verbatim}
netadm@alisa:~ # ping -n 193.233.7.98
PING 193.233.7.98 (193.233.7.98) from 193.233.7.90 : 56 data bytes
From 193.233.7.254: Redirect Host(New nexthop: 193.233.7.98)
64 bytes from 193.233.7.98: icmp_seq=0 ttl=255 time=3.5 ms
From 193.233.7.254: Redirect Host(New nexthop: 193.233.7.98)
64 bytes from 193.233.7.98: icmp_seq=1 ttl=255 time=2.2 ms
64 bytes from 193.233.7.98: icmp_seq=2 ttl=255 time=0.4 ms
64 bytes from 193.233.7.98: icmp_seq=3 ttl=255 time=0.4 ms
64 bytes from 193.233.7.98: icmp_seq=4 ttl=255 time=0.4 ms
^C
--- 193.233.7.98 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 0.4/1.3/3.5 ms
netadm@alisa:~ #
\end{verbatim}
What happened? Router 193.233.7.254 understood that we have a much
better path to the destination and sent us an ICMP redirect message.
We may retry \verb|ip route get| to see what we have in the routing
tables now:
\begin{verbatim}
netadm@alisa:~ # ip route get 193.233.7.98
193.233.7.98 dev eth0 src 193.233.7.90
    cache <redirected> mtu 1500 rtt 3072
netadm@alisa:~ #
\end{verbatim}



\section{{\tt ip rule} --- routing policy database management}
\label{IP-RULE}

\paragraph{Abbreviations:} \verb|rule|, \verb|ru|.

\paragraph{Object:} \verb|rule|s in the routing policy database control
the route selection algorithm.

Classic routing algorithms used in the Internet make routing decisions
based only on the destination address of packets (and in theory,
but not in practice, on the TOS field). The seminal review of classic
routing algorithms and their modifications can be found in~\cite{RFC1812}.

In some circumstances we want to route packets differently depending not only
on destination addresses, but also on other packet fields: source address,
IP protocol, transport protocol ports or even packet payload.
This task is called ``policy routing''.

\begin{NB}
  ``policy routing'' $\neq$ ``routing policy''.

\noindent ``policy routing'' $=$ ``cunning routing''.

\noindent ``routing policy'' $=$ ``routing tactics'' or ``routing plan''.
\end{NB}

To solve this task, the conventional destination based routing table, ordered
according to the longest match rule, is replaced with a ``routing policy
database'' (or RPDB), which selects routes
by executing some set of rules. The rules may have lots of keys of different
natures and therefore they have no natural ordering, but one imposed
by the administrator. Linux-2.2 RPDB is a linear list of rules
ordered by numeric priority value.
RPDB explicitly allows matching a few packet fields:

\begin{itemize}
\item packet source address.
\item packet destination address.
\item TOS.
\item incoming interface (which is packet metadata, rather than a packet field).
\end{itemize}

Matching IP protocols and transport ports is also possible,
indirectly, via \verb|ipchains|, by exploiting their ability
to mark some classes of packets with \verb|fwmark|. Therefore,
\verb|fwmark| is also included in the set of keys checked by rules.

Each policy routing rule consists of a {\em selector\/} and an {\em action\/}
predicate. The RPDB is scanned in the order of increasing priority.
The selector
of each rule is applied to \{source address, destination address, incoming
interface, tos, fwmark\} and, if the selector matches the packet,
the action is performed. The action predicate may return with success.
In this case, it will either give a route or failure indication
and the RPDB lookup is terminated. Otherwise, the RPDB program
continues on the next rule.

What is the action, semantically? The natural action is to select the
nexthop and the output device. This is what
Cisco IOS~\cite{IOS} does. Let us call it ``match \& set''.
The Linux-2.2 approach is more flexible. The action includes
lookups in destination-based routing tables and selecting
a route from these tables according to the classic longest match algorithm.
The ``match \& set'' approach is the simplest case of the Linux one. It is realized
when a second level routing table contains a single default route.
Recall that Linux-2.2 supports multiple tables
managed with the \verb|ip route| command, described in the previous section.

At startup time the kernel configures the default RPDB consisting of three
rules:

\begin{enumerate}
\item Priority: 0, Selector: match anything, Action: lookup routing
table \verb|local| (ID 255).
The \verb|local| table is a special routing table containing
high priority control routes for local and broadcast addresses.

Rule 0 is special. It cannot be deleted or overridden.


\item Priority: 32766, Selector: match anything, Action: lookup routing
table \verb|main| (ID 254).
The \verb|main| table is the normal routing table containing all non-policy
routes. This rule may be deleted and/or overridden with other
ones by the administrator.

\item Priority: 32767, Selector: match anything, Action: lookup routing
table \verb|default| (ID 253).
The \verb|default| table is empty.
It is reserved for some
post-processing if no previous default rules selected the packet.
This rule may also be deleted.

\end{enumerate}

Do not confuse routing tables with rules: rules point to routing tables,
several rules may refer to one routing table and some routing tables
may have no rules pointing to them. If the administrator deletes all the rules
referring to a table, the table is not used, but it still exists
and will disappear only after all the routes contained in it are deleted.


\paragraph{Rule attributes:} Each RPDB entry has additional
attributes. F.e.\ each rule has a pointer to some routing
table. NAT and masquerading rules have an attribute to select the new IP
address to translate/masquerade. Besides that, rules have some
optional attributes that routes also have, namely \verb|realms|.
These values do not override those contained in the routing tables. They
are only used if the route did not select any attributes.


\paragraph{Rule types:} The RPDB may contain rules of the following
types:
\begin{itemize}
\item \verb|unicast| --- the rule prescribes to return the route found
in the routing table referenced by the rule.
\item \verb|blackhole| --- the rule prescribes to silently drop the packet.
\item \verb|unreachable| --- the rule prescribes to generate a ``Network
is unreachable'' error.
\item \verb|prohibit| --- the rule prescribes to generate a
``Communication is administratively prohibited'' error.
\item \verb|nat| --- the rule prescribes to translate the source address
of the IP packet into some other value. More about NAT is
in Appendix~\ref{ROUTE-NAT}, p.\pageref{ROUTE-NAT}.
\end{itemize}


\paragraph{Commands:} \verb|add|, \verb|delete| and \verb|show|
(or \verb|list|).
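
\paragraph{Sketch:} The lookup procedure described in this section
(scan the rules by increasing priority, apply the selector, then perform
a longest match lookup in the table the rule points to) can be illustrated
with a small Python model. This is only a sketch under stated assumptions,
not the kernel implementation: the names \verb|Rule|, \verb|longest_match|
and \verb|rpdb_lookup| are hypothetical, the selector only covers source and
destination prefixes, and the nexthop strings are invented placeholders.
\begin{verbatim}
import ipaddress
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical model of one RPDB rule."""
    priority: int
    table: str = ""             # routing table the rule points to
    src: str = "0.0.0.0/0"      # "from" selector; default matches all
    dst: str = "0.0.0.0/0"      # "to" selector; default matches all
    action: str = "unicast"     # or blackhole, unreachable, prohibit

    def matches(self, src, dst):
        return (ipaddress.ip_address(src) in ipaddress.ip_network(self.src)
                and ipaddress.ip_address(dst)
                in ipaddress.ip_network(self.dst))


def longest_match(table, dst):
    """Return the nexthop of the most specific prefix covering dst."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, nexthop in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None
                            or net.prefixlen > best[0].prefixlen):
            best = (net, nexthop)
    return None if best is None else best[1]


def rpdb_lookup(rules, tables, src, dst):
    """Scan rules by increasing priority; stop at the first result."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if not rule.matches(src, dst):
            continue
        if rule.action != "unicast":
            return rule.action      # failure indication ends the lookup
        nexthop = longest_match(tables.get(rule.table, []), dst)
        if nexthop is not None:
            return nexthop          # a route was found; lookup ends
        # no route in this table: continue with the next rule
    return None


# Placeholder data: the nexthop strings are invented for illustration.
rules = [
    Rule(0, "local"),
    Rule(220, "inr.ruhep", src="192.203.80.0/24"),
    Rule(32766, "main"),
]
tables = {
    "local": [("193.233.7.65/32", "deliver locally")],
    "inr.ruhep": [("0.0.0.0/0", "gw-ruhep")],
    "main": [("193.233.7.0/24", "direct eth0"), ("0.0.0.0/0", "gw-main")],
}

print(rpdb_lookup(rules, tables, "192.203.80.5", "10.0.0.1"))
print(rpdb_lookup(rules, tables, "193.233.7.82", "10.0.0.1"))
\end{verbatim}
In this model a table holding a single default route reduces the rule to
the ``match \& set'' case: the selector alone determines the nexthop.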

\subsection{{\tt ip rule add} --- insert a new rule\\
	{\tt ip rule delete} --- delete a rule}
\label{IP-RULE-ADD}

\paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|delete|, \verb|del|,
	\verb|d|.

\paragraph{Arguments:}

\begin{itemize}
\item \verb|type TYPE| (default)

--- the type of this rule. The list of valid types was given in the previous
subsection.

\item \verb|from PREFIX|

--- select the source prefix to match.

\item \verb|to PREFIX|

--- select the destination prefix to match.

\item \verb|iif NAME|

--- select the incoming device to match. If the interface is loopback,
the rule only matches packets originating from this host. This means that you
may create separate routing tables for forwarded and local packets and,
hence, completely segregate them.

\item \verb|tos TOS| or \verb|dsfield TOS|

--- select the TOS value to match.

\item \verb|fwmark MARK|

--- select the \verb|fwmark| value to match.

\item \verb|priority PREFERENCE|

--- the priority of this rule. Each rule should have an explicitly
set {\em unique\/} priority value.
\begin{NB}
  Really, for historical reasons \verb|ip rule add| does not require a
  priority value and allows them to be non-unique.
  If the user does not supply a priority, it is selected by the kernel.
  If the user creates a rule with a priority value that
  already exists, the kernel does not reject the request. It adds
  the new rule before all old rules of the same priority.

  It is a mistake in design, no more. And it will be fixed one day,
  so do not rely on this feature. Use explicit priorities.
\end{NB}


\item \verb|table TABLEID|

--- the identifier of the routing table to look up if the rule selector matches.

\item \verb|realms FROM/TO|

--- Realms to select if the rule matched and the routing table lookup
succeeded. Realm \verb|TO| is only used if the route did not select
any realm.

\item \verb|nat ADDRESS|

--- The base of the IP address block to translate (for source addresses).
The \verb|ADDRESS| may be either the start of the block of NAT addresses
(selected by NAT routes) or in linux-2.2 a local host address (or even zero).
In the last case the router does not translate the packets,
but masquerades them to this address; this feature disappeared in 2.4.
More about NAT is in Appendix~\ref{ROUTE-NAT},
p.\pageref{ROUTE-NAT}.

\end{itemize}

\paragraph{Warning:} Changes to the RPDB made with these commands
do not become active immediately. It is assumed that after
a script finishes a batch of updates, it flushes the routing cache
with \verb|ip route flush cache|.

\paragraph{Examples:}
\begin{itemize}
\item Route packets with source addresses from 192.203.80.0/24
according to routing table \verb|inr.ruhep|:
\begin{verbatim}
ip ru add from 192.203.80.0/24 table inr.ruhep prio 220
\end{verbatim}

\item Translate packet source address 193.233.7.83 into 192.203.80.144
and route it according to table \#1 (actually, it is \verb|inr.ruhep|):
\begin{verbatim}
ip ru add from 193.233.7.83 nat 192.203.80.144 table 1 prio 320
\end{verbatim}

\item Delete the unused default rule:
\begin{verbatim}
ip ru del prio 32767
\end{verbatim}

\end{itemize}



\subsection{{\tt ip rule show} --- list rules}
\label{IP-RULE-SHOW}

\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.


\paragraph{Arguments:} Good news, this is one command that has no arguments.

\paragraph{Output format:}

\begin{verbatim}
kuznet@amber:~ $ ip ru ls
0:      from all lookup local
200:    from 192.203.80.0/24 to 193.233.7.0/24 lookup main
210:    from 192.203.80.0/24 to 192.203.80.0/24 lookup main
220:    from 192.203.80.0/24 lookup inr.ruhep realms inr.ruhep/radio-msu
300:    from 193.233.7.83 to 193.233.7.0/24 lookup main
310:    from 193.233.7.83 to 192.203.80.0/24 lookup main
320:    from 193.233.7.83 lookup inr.ruhep map-to 192.203.80.144
32766:  from all lookup main
kuznet@amber:~ $
\end{verbatim}

In the first column is the rule priority value followed
by a colon. Then the selectors follow. Each key is prefixed
with the same keyword that was used to create the rule.

The keyword \verb|lookup| is followed by a routing table identifier,
as it is recorded in the file \verb|/etc/iproute2/rt_tables|.

If the rule does NAT (f.e.\ rule \#320), it is shown by the keyword
\verb|map-to| followed by the start of the block of addresses to map.

The sense of this example is pretty simple. The prefixes
192.203.80.0/24 and 193.233.7.0/24 form the internal network, but
they are routed differently when the packets leave it.
Besides that, the host 193.233.7.83 is translated into
another prefix to look like 192.203.80.144 when talking
to the outer world.



\section{{\tt ip maddress} --- multicast addresses management}
\label{IP-MADDR}

\paragraph{Object:} \verb|maddress| objects are multicast addresses.

\paragraph{Commands:} \verb|add|, \verb|delete|, \verb|show| (or \verb|list|).

\subsection{{\tt ip maddress show} --- list multicast addresses}

\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.

\paragraph{Arguments:}

\begin{itemize}

\item \verb|dev NAME| (default)

--- the device name.

\end{itemize}

\paragraph{Output format:}

\begin{verbatim}
kuznet@alisa:~ $ ip maddr ls dummy
2:  dummy
    link  33:33:00:00:00:01
    link  01:00:5e:00:00:01
    inet  224.0.0.1 users 2
    inet6 ff02::1
kuznet@alisa:~ $
\end{verbatim}

The first line of the output shows the interface index and its name.
Then the multicast address list follows. Each line starts with the
protocol identifier. The word \verb|link| denotes a link layer
multicast address.

If a multicast address has more than one user, the number
of users is shown after the \verb|users| keyword.

One additional feature not present in the example above
is the \verb|static| flag, which indicates that the address was joined
with \verb|ip maddr add|. See the following subsection.



\subsection{{\tt ip maddress add} --- add a multicast address\\
	    {\tt ip maddress delete} --- delete a multicast address}

\paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|delete|, \verb|del|, \verb|d|.

\paragraph{Description:} these commands attach/detach
a static link layer multicast address to listen on the interface.
Note that it is impossible to join protocol multicast groups
statically. This command only manages link layer addresses.


\paragraph{Arguments:}

\begin{itemize}
\item \verb|address LLADDRESS| (default)

--- the link layer multicast address.

\item \verb|dev NAME|

--- the device to join/leave this multicast address.

\end{itemize}


\paragraph{Example:} Let us continue with the example from the previous subsection.

\begin{verbatim}
netadm@alisa:~ # ip maddr add 33:33:00:00:00:01 dev dummy
netadm@alisa:~ # ip -0 maddr ls dummy
2:  dummy
    link  33:33:00:00:00:01 users 2 static
    link  01:00:5e:00:00:01
netadm@alisa:~ # ip maddr del 33:33:00:00:00:01 dev dummy
\end{verbatim}

\begin{NB}
 Neither \verb|ip| nor the kernel checks for multicast address validity.
 In particular, this means that you can try to load a unicast address
 instead of a multicast address. Most drivers will ignore such addresses,
 but several (f.e.\ Tulip) will load them into their on-board filter.
 The effects may be strange. Namely, the addresses become additional
 local link addresses and, if you loaded the address of another host
 to the router, wait for duplicated packets on the wire.
 It is not a bug, but rather a hole in the API and intra-kernel interfaces.
 This feature is really more useful for traffic monitoring, but using it
 with Linux-2.2 you {\em have to\/} be sure that the host is not
 a router and, especially, that it is not a transparent proxy or masquerading
 agent.
\end{NB}



\section{{\tt ip mroute} --- multicast routing cache management}
\label{IP-MROUTE}

\paragraph{Abbreviations:} \verb|mroute|, \verb|mr|.

\paragraph{Object:} \verb|mroute| objects are multicast routing cache
entries created by a user level mrouting daemon
(f.e.\ \verb|pimd| or \verb|mrouted|).

Due to the limitations of the current interface to the multicast routing
engine, it is impossible to change \verb|mroute| objects administratively,
so we may only display them. This limitation will be removed
in the future.

\paragraph{Commands:} \verb|show| (or \verb|list|).


\subsection{{\tt ip mroute show} --- list mroute cache entries}

\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.

\paragraph{Arguments:}

\begin{itemize}
\item \verb|to PREFIX| (default)

--- the prefix selecting the destination multicast addresses to list.


\item \verb|iif NAME|

--- the interface on which multicast packets are received.


\item \verb|from PREFIX|

--- the prefix selecting the IP source addresses of the multicast route.


\end{itemize}

\paragraph{Output format:}

\begin{verbatim}
kuznet@amber:~ $ ip mroute ls
(193.232.127.6, 224.0.1.39)       Iif: unresolved
(193.232.244.34, 224.0.1.40)      Iif: unresolved
(193.233.7.65, 224.66.66.66)      Iif: eth0       Oifs: pimreg
kuznet@amber:~ $
\end{verbatim}

Each line shows one (S,G) entry in the multicast routing cache,
where S is the source address and G is the multicast group. \verb|Iif| is
the interface on which multicast packets are expected to arrive.
If the word \verb|unresolved| is there instead of the interface name,
it means that the routing daemon still hasn't resolved this entry.
The keyword \verb|Oifs| is followed by a list of output interfaces, separated
by spaces. If a multicast routing entry is created with non-trivial
TTL scope, administrative distances are appended to the device names
in the \verb|Oifs| list.

\paragraph{Statistics:} The \verb|-statistics| option also prints the
number of packets and bytes forwarded along this route and
the number of packets that arrived on the wrong interface, if this number is not zero.

\begin{verbatim}
kuznet@amber:~ $ ip -s mr ls 224.66/16
(193.233.7.65, 224.66.66.66)      Iif: eth0       Oifs: pimreg
  9383 packets, 300256 bytes
kuznet@amber:~ $
\end{verbatim}


\section{{\tt ip tunnel} --- tunnel configuration}
\label{IP-TUNNEL}

\paragraph{Abbreviations:} \verb|tunnel|, \verb|tunl|.

\paragraph{Object:} \verb|tunnel| objects are tunnels, encapsulating
packets in IPv4 packets and then sending them over the IP infrastructure.

\paragraph{Commands:} \verb|add|, \verb|delete|, \verb|change|, \verb|show|
(or \verb|list|).

\paragraph{See also:} A more informal discussion of tunneling
over IP and the \verb|ip tunnel| command can be found in~\cite{IP-TUNNELS}.

\subsection{{\tt ip tunnel add} --- add a new tunnel\\
	{\tt ip tunnel change} --- change an existing tunnel\\
	{\tt ip tunnel delete} --- destroy a tunnel}

\paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|change|, \verb|chg|;
\verb|delete|, \verb|del|, \verb|d|.


\paragraph{Arguments:}

\begin{itemize}

\item \verb|name NAME| (default)

--- select the tunnel device name.

\item \verb|mode MODE|

--- set the tunnel mode. Three modes are currently available:
	\verb|ipip|, \verb|sit| and \verb|gre|.

\item \verb|remote ADDRESS|

--- set the remote endpoint of the tunnel.

\item \verb|local ADDRESS|

--- set the fixed local address for tunneled packets.
It must be an address on another interface of this host.

\item \verb|ttl N|

--- set a fixed TTL \verb|N| on tunneled packets.
	\verb|N| is a number in the range 1--255. 0 is a special value
	meaning that packets inherit the TTL value.
	The default value is: \verb|inherit|.

\item \verb|tos T| or \verb|dsfield T|

--- set a fixed TOS \verb|T| on tunneled packets.
	The default value is: \verb|inherit|.



\item \verb|dev NAME|

--- bind the tunnel to the device \verb|NAME| so that
	tunneled packets will only be routed via this device and will
	not be able to escape to another device when the route to the endpoint changes.

\item \verb|nopmtudisc|

--- disable Path MTU Discovery on this tunnel.
	It is enabled by default. Note that a fixed TTL is incompatible
	with this option: tunnelling with a fixed TTL always performs PMTU discovery.

\item \verb|key K|, \verb|ikey K|, \verb|okey K|

--- (only GRE tunnels) use keyed GRE with key \verb|K|. \verb|K| is
	either a number or an IP address-like dotted quad.
	The \verb|key| parameter sets the key to use in both directions.
	The \verb|ikey| and \verb|okey| parameters set different keys for input and output.


\item \verb|csum|, \verb|icsum|, \verb|ocsum|

--- (only GRE tunnels) generate/require checksums for tunneled packets.
	The \verb|ocsum| flag calculates checksums for outgoing packets.
	The \verb|icsum| flag requires that all input packets have the correct
	checksum. The \verb|csum| flag is equivalent to the combination
	``\verb|icsum| \verb|ocsum|''.

\item \verb|seq|, \verb|iseq|, \verb|oseq|

--- (only GRE tunnels) serialize packets.
	The \verb|oseq| flag enables sequencing of outgoing packets.
	The \verb|iseq| flag requires that all input packets are serialized.
	The \verb|seq| flag is equivalent to the combination ``\verb|iseq| \verb|oseq|''.

\begin{NB}
 I think this option does not
 work. At least, I did not test it, did not debug it and
 do not even understand how it is supposed to work or for what
 purpose Cisco planned to use it. Do not use it.
\end{NB}


\end{itemize}

\paragraph{Example:} Create a point-to-point IPv6 tunnel with a maximal TTL of 32.
\begin{verbatim}
netadm@amber:~ # ip tunl add Cisco mode sit remote 192.31.7.104 \
    local 192.203.80.142 ttl 32
\end{verbatim}

\subsection{{\tt ip tunnel show} --- list tunnels}

\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.


\paragraph{Arguments:} None.

\paragraph{Output format:}
\begin{verbatim}
kuznet@amber:~ $ ip tunl ls Cisco
Cisco: ipv6/ip remote 192.31.7.104 local 192.203.80.142 ttl 32
kuznet@amber:~ $
\end{verbatim}
The line starts with the tunnel device name followed by a colon.
Then the tunnel mode follows. The parameters of the tunnel are listed
with the same keywords that were used when creating the tunnel.

\paragraph{Statistics:}

\begin{verbatim}
kuznet@amber:~ $ ip -s tunl ls Cisco
Cisco: ipv6/ip remote 192.31.7.104 local 192.203.80.142 ttl 32
RX: Packets    Bytes        Errors CsumErrs OutOfSeq Mcasts
    12566      1707516      0      0        0        0
TX: Packets    Bytes        Errors DeadLoop NoRoute  NoBufs
    13445      1879677      0      0        0        0
kuznet@amber:~ $
\end{verbatim}
Essentially, these numbers are the same as the numbers
printed with {\tt ip -s link show}
(sec.~\ref{IP-LINK-SHOW}, p.\pageref{IP-LINK-SHOW}) but the tags are different
to reflect that they are tunnel specific.
\begin{itemize}
\item \verb|CsumErrs| --- the total number of packets dropped
because of checksum failures for a GRE tunnel with checksumming enabled.
\item \verb|OutOfSeq| --- the total number of packets dropped
because they arrived out of sequence for a GRE tunnel with
serialization enabled.
\item \verb|Mcasts| --- the total number of multicast packets
received on a broadcast GRE tunnel.
\item \verb|DeadLoop| --- the total number of packets which were not
transmitted because the tunnel is looped back to itself.
\item \verb|NoRoute| --- the total number of packets which were not
transmitted because there is no IP route to the remote endpoint.
\item \verb|NoBufs| --- the total number of packets which were not
transmitted because the kernel failed to allocate a buffer.
\end{itemize}


\section{{\tt ip monitor} and {\tt rtmon} --- state monitoring}
\label{IP-MONITOR}

The \verb|ip| utility can monitor the state of devices, addresses
and routes continuously. This option has a slightly different format.
Namely,
the \verb|monitor| command is the first in the command line and then
the object list follows:
\begin{verbatim}
  ip monitor [ file FILE ] [ all | OBJECT-LIST ]
\end{verbatim}
\verb|OBJECT-LIST| is the list of object types that we want to monitor.
It may contain \verb|link|, \verb|address| and \verb|route|.
If no \verb|file| argument is given, \verb|ip| opens RTNETLINK,
listens on it and dumps state changes in the format described
in previous sections.

If a file name is given, it does not listen on RTNETLINK,
but opens the file containing RTNETLINK messages saved in binary format
and dumps them. Such a history file can be generated with the
\verb|rtmon| utility. This utility has a command line syntax similar to
\verb|ip monitor|.
Ideally, \verb|rtmon| should be started before
the first network configuration command is issued. F.e.\ if
you insert:
\begin{verbatim}
  rtmon file /var/log/rtmon.log
\end{verbatim}
in a startup script, you will be able to view the full history
later.

Certainly, it is possible to start \verb|rtmon| at any time.
It prepends the history with the state snapshot dumped at the moment
of starting.


\section{Route realms and policy propagation, {\tt rtacct}}
\label{RT-REALMS}

On routers using OSPF ASE or, especially, the BGP protocol, routing
tables may be huge. If we want to classify or to account for the packets
per route, we will have to keep lots of information.
Even worse, if we
want to distinguish the packets not only by their destination, but
also by their source, the task gets quadratic complexity and its solution
is physically impossible.

One approach to propagating the policy from routing protocols
to the forwarding engine has been proposed in~\cite{IOS-BGP-PP}.
Essentially, Cisco Policy Propagation via BGP is based on the fact
that dedicated routers all have the RIB (Routing Information Base)
close to the forwarding engine, so policy routing rules can
check all the route attributes, including ASPATH information
and community strings.

The Linux architecture, splitting the RIB (maintained by a user level
daemon) and the kernel based FIB (Forwarding Information Base),
does not allow such a simple approach.

This is fortunate, because there is another solution
which allows even more flexible policy and richer semantics.

Namely, routes can be clustered together in user space, based on their
attributes. F.e.\ a BGP router knows the route's ASPATH and its community;
an OSPF router knows the route tag or its area. The administrator, when adding
routes manually, also knows their nature. Provided that the number of such
aggregates (we call them {\em realms\/}) is low, the task of full
classification both by source and destination becomes quite manageable.

So each route may be assigned to a realm. It is assumed that
this identification is made by a routing daemon, but static routes
can also be handled manually with \verb|ip route| (see sec.\ref{IP-ROUTE},
p.\pageref{IP-ROUTE}).
\begin{NB}
  There is a patch to \verb|gated|, allowing classification of routes
  to realms with all the set of policy rules implemented in \verb|gated|:
  by prefix, by ASPATH, by origin, by tag etc.
\end{NB}

To facilitate the construction (f.e.\ in case the routing
daemon is not aware of realms), missing realms may be completed
with routing policy rules, see sec.~\ref{IP-RULE}, p.\pageref{IP-RULE}.

For each packet the kernel calculates a tuple of realms: source realm
and destination realm, using the following algorithm:

\begin{enumerate}
\item If the route has a realm, the destination realm of the packet is set to it.
\item If the rule has a source realm, the source realm of the packet is set to it.
If the destination realm was not inherited from the route and the rule has a destination realm,
it is also set.
\item If at least one of the realms is still unknown, the kernel finds
the reversed route to the source of the packet.
\item If the source realm is still unknown, get it from the reversed route.
\item If one of the realms is still unknown, swap the realms of reversed
routes and apply step 2 again.
\end{enumerate}

After this procedure is completed we know what realm the packet
arrived from and the realm where it is going to propagate to.
If some of the realms are unknown, they are initialized to zero
(or realm \verb|unknown|).

The main application of realms is the TC \verb|route| classifier~\cite{TC-CREF},
where they are used to help assign packets to traffic classes,
to account, police and schedule them according to this
classification.

A much simpler but still very useful application is incoming packet
accounting by realms. The kernel gathers a packet statistics summary
which can be viewed with the \verb|rtacct| utility.
\begin{verbatim}
kuznet@amber:~ $ rtacct russia
Realm      BytesTo    PktsTo     BytesFrom  PktsFrom
russia     20576778   169176     47080168   153805
kuznet@amber:~ $
\end{verbatim}
This shows that this router received 153805 packets from
the realm \verb|russia| and forwarded 169176 packets to \verb|russia|.
The realm \verb|russia| consists of routes with ASPATHs not leaving
Russia.

Note that locally originating packets are not accounted here;
\verb|rtacct| shows incoming packets only. Using the \verb|route|
classifier (see~\cite{TC-CREF}) you can get even more detailed
accounting information about outgoing packets, optionally
summarizing traffic not only by source or destination, but
by any pair of source and destination realms.


\begin{thebibliography}{99}
\addcontentsline{toc}{section}{References}
\bibitem{RFC-NDISC} T.~Narten, E.~Nordmark, W.~Simpson.
``Neighbor Discovery for IP Version 6 (IPv6)'', RFC-2461.

\bibitem{RFC-ADDRCONF} S.~Thomson, T.~Narten.
``IPv6 Stateless Address Autoconfiguration'', RFC-2462.

\bibitem{RFC1812} F.~Baker.
``Requirements for IP Version 4 Routers'', RFC-1812.

\bibitem{RFC1122} R.~T.~Braden.
``Requirements for Internet hosts --- communication layers'', RFC-1122.

\bibitem{IOS} ``Cisco IOS Release 12.0 Network Protocols
Command Reference, Part 1'' and
``Cisco IOS Release 12.0 Quality of Service Solutions
Configuration Guide: Configuring Policy-Based Routing'',\\
http://www.cisco.com/univercd/cc/td/doc/product/software/ios120.

\bibitem{IP-TUNNELS} A.~N.~Kuznetsov.
``Tunnels over IP in Linux-2.2'', \\
In: {\tt ftp://ftp.inr.ac.ru/ip-routing/iproute2-current.tar.gz}.

\bibitem{TC-CREF} A.~N.~Kuznetsov. ``TC Command Reference'',\\
In: {\tt ftp://ftp.inr.ac.ru/ip-routing/iproute2-current.tar.gz}.

\bibitem{IOS-BGP-PP} ``Cisco IOS Release 12.0 Quality of Service Solutions
Configuration Guide: Configuring QoS Policy Propagation via
Border Gateway Protocol'',\\
http://www.cisco.com/univercd/cc/td/doc/product/software/ios120.

\bibitem{RFC-DHCP} R.~Droms.
``Dynamic Host Configuration Protocol'', RFC-2131.

\end{thebibliography}




\appendix
\addcontentsline{toc}{section}{Appendix}

\section{Source address selection}
\label{ADDR-SEL}

When a host creates an IP packet, it must select some source
address. Correct source address selection is a critical procedure,
because it gives the receiver the information needed to deliver a
reply. If the source is selected incorrectly, in the best case,
the return path may differ from the forward one, which
is harmful for performance. In the worst case, when the addresses
are administratively scoped, the reply may be lost entirely.

Linux-2.2 selects source addresses using the following algorithm:

\begin{itemize}
\item
The application may select a source address explicitly with the \verb|bind(2)|
syscall or by supplying it to \verb|sendmsg(2)| via the ancillary data object
\verb|IP_PKTINFO|. In this case the kernel only checks the validity
of the address and never tries to ``improve'' an incorrect user choice,
generating an error instead.
\begin{NB}
  Never say ``Never''. The sysctl option \verb|ip_dynaddr| breaks
  this axiom. It was introduced deliberately with the purpose
  of automatically reselecting the address on hosts with dynamic dial-out interfaces.
  However, this hack {\em must not\/} be used on multihomed hosts
  and especially on routers: it would break them.
\end{NB}


\item Otherwise, IP routing tables can contain an explicit source
address hint for this destination.
The hint is set with the \verb|src| parameter
to the \verb|ip route| command, sec.\ref{IP-ROUTE}, p.\pageref{IP-ROUTE}.


\item Otherwise, the kernel searches through the list of addresses
attached to the interface through which the packets will be routed.
The search strategies are different for IP and IPv6. Namely:

\begin{itemize}
\item IPv6 searches for the first valid, not deprecated address
with the same scope as the destination.

\item IP searches for the first valid address with a scope wider
than the scope of the destination but it prefers addresses
which fall into the same subnet as the nexthop of the route
to the destination. Unlike IPv6, the scopes of IPv4 destinations
are not encoded in their addresses but are supplied
in routing tables instead (the \verb|scope| parameter to the \verb|ip route| command,
sec.\ref{IP-ROUTE}, p.\pageref{IP-ROUTE}).

\end{itemize}


\item Otherwise, if the scope of the destination is \verb|link| or \verb|host|,
the algorithm fails and returns a zero source address.

\item Otherwise, all interfaces are scanned to search for an address
with an appropriate scope. The loopback device \verb|lo| is always the first
in the search list, so that if an address with global scope (not 127.0.0.1!)
is configured on loopback, it is always preferred.

\end{itemize}


\section{Proxy ARP/NDISC}
\label{PROXY-NEIGH}

Routers may answer ARP/NDISC solicitations on behalf of other hosts.
In Linux-2.2 proxy ARP on an interface may be enabled
by setting the kernel \verb|sysctl| variable
\verb|/proc/sys/net/ipv4/conf/<dev>/proxy_arp| to 1. After this, the router
starts to answer ARP requests on the interface \verb|<dev>|, provided
the route to the requested destination does {\em not\/} go back via the same
device.
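
As a minimal sketch, the per-device switch described above could be wrapped
in a small shell helper. The function name \verb|set_proxy_arp| and the
\verb|SYSCTL_ROOT| override are hypothetical and exist only for this
illustration (the override allows a dry run against a scratch directory).
Writing under \verb|/proc/sys| requires root privileges.

```shell
#!/bin/bash
# Hypothetical helper: switch proxy ARP on (1) or off (0) for one device.
# SYSCTL_ROOT is an assumption made for this sketch; it defaults to /proc/sys.
# Args: $1 = device name, $2 = value to write (default 1).
set_proxy_arp () {
    local dev=$1 value=${2:-1}
    local file="${SYSCTL_ROOT:-/proc/sys}/net/ipv4/conf/$dev/proxy_arp"
    # Refuse to create the file: a missing entry means an unknown device.
    if [ ! -e "$file" ]; then
        echo "No such sysctl: $file" 1>&2
        return 1
    fi
    echo "$value" > "$file"
}
```

Under these assumptions, \verb|set_proxy_arp eth0| run as root would enable
proxy ARP on \verb|eth0| and \verb|set_proxy_arp eth0 0| would disable it again.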

The variable \verb|/proc/sys/net/ipv4/conf/all/proxy_arp| enables proxy
ARP on all the IP devices.

However, this approach fails in the case of IPv6 because the router
must join the solicited node multicast address to listen for the corresponding
NDISC queries. It means that proxy NDISC is possible only on a per destination
basis.

Logically, proxy ARP/NDISC is not a kernel task. It can easily be implemented
in user space. However, similar functionality was present in BSD kernels
and in Linux-2.0, so we have to preserve it at least to the extent that
is standardized in BSD.
\begin{NB}
  Linux-2.0 ARP had a feature called {\em subnet\/} proxy ARP.
  It is replaced with the sysctl flag in Linux-2.2.
\end{NB}


The \verb|ip| utility provides a way to manage proxy ARP/NDISC
with the \verb|ip neigh| command, namely:
\begin{verbatim}
  ip neigh add proxy ADDRESS [ dev NAME ]
\end{verbatim}
adds a new proxy ARP/NDISC record and
\begin{verbatim}
  ip neigh del proxy ADDRESS [ dev NAME ]
\end{verbatim}
deletes it.

If the name of the device is not given, the router will answer solicitations
for address \verb|ADDRESS| on all devices, otherwise it will only serve
the device \verb|NAME|. Even if the proxy entry is created with
\verb|ip neigh|, the router {\em will not\/} answer a query if the route
to the destination goes back via the interface from which the solicitation
was received.

It is important to emphasize that proxy entries have {\em no\/}
parameters other than these (IP/IPv6 address and optional device).
Particularly, the entry does not store any link layer address.
It always advertises the station address of the interface
on which it sends advertisements (i.e. its own station address).
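
The answering rule for proxy entries can be summarized as a small decision
function. This is only a model of the behaviour described above, not kernel
code; the function name \verb|proxy_would_answer| and its three arguments are
hypothetical.

```shell
#!/bin/bash
# Model of the proxy ARP/NDISC decision described in the text (an illustration).
# Args: $1 = device of the proxy entry ("" if it was created without dev),
#       $2 = device on which the solicitation was received,
#       $3 = device that the route to ADDRESS points to.
# Returns 0 if the router would answer, 1 otherwise.
proxy_would_answer () {
    local entry_dev=$1 in_dev=$2 route_dev=$3
    # An entry bound to a device serves only that device.
    if [ -n "$entry_dev" ] && [ "$entry_dev" != "$in_dev" ]; then
        return 1
    fi
    # Never answer if the route goes back via the receiving interface.
    if [ "$route_dev" = "$in_dev" ]; then
        return 1
    fi
    return 0
}
```

F.e.\ \verb|proxy_would_answer "" eth0 eth1| succeeds, while
\verb|proxy_would_answer "" eth0 eth0| fails because the route to the
address leads back to the interface that received the query.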

\section{Route NAT status}
\label{ROUTE-NAT}

NAT (or ``Network Address Translation'') remaps some parts
of the IP address space into other ones. Linux-2.2 route NAT is supposed
to be used to facilitate policy routing by rewriting addresses
to other routing domains or to help while renumbering sites
to another prefix.

\paragraph{What it is not:}
It is necessary to emphasize that {\em it is not supposed\/}
to be used to compress address space or to split load.
This is not missing functionality but a design principle.
Route NAT is {\em stateless\/}. It does not hold any state
about translated sessions. This means that it handles any number
of sessions flawlessly. But it also means that it is {\em static\/}.
It cannot detect the moment when the last TCP client stops
using an address. For the same reason, it will not help to split
load between several servers.
\begin{NB}
It is a pretty commonly held belief that it is useful to split load between
several servers with NAT. This is a mistake. All you get from this
is the requirement that the router keep the state of all the TCP connections
going via it. Well, if the router is so powerful, run apache on it. 8)
\end{NB}

The second feature: it does not touch packet payload,
does not try to ``improve'' broken protocols by looking
through its data and mangling it. It mangles IP addresses,
only IP addresses and nothing but IP addresses.
This, too, is not missing functionality.

To summarize: if you need to compress address space or keep
active FTP clients happy, your choice is not route NAT but masquerading,
port forwarding, NAPT etc.
\begin{NB}
By the way, you may also want to look at
\verb|http://www.suse.com/~mha/HyperNews/get/linux-ip-nat.html|
\end{NB}


\paragraph{How it works.}
Some part of the address space is reserved for dummy addresses
which will look for all the world like some host addresses
inside your network. No other hosts may use these addresses;
however, other routers may also be configured to translate them.
\begin{NB}
A great advantage of route NAT is that it may be used not
only in stub networks but in environments with arbitrarily complicated
structure. It does not firewall, it {\em forwards.}
\end{NB}
These addresses are selected by the \verb|ip route| command
(sec.\ref{IP-ROUTE-ADD}, p.\pageref{IP-ROUTE-ADD}). F.e.\
\begin{verbatim}
  ip route add nat 192.203.80.144 via 193.233.7.83
\end{verbatim}
states that the single address 192.203.80.144 is a dummy NAT address.
For all the world it looks like a host address inside our network.
For neighbouring hosts and routers it looks like the local address
of the translating router. The router answers ARP for it, advertises
this address as routed via it, etc. When the router
receives a packet destined for 192.203.80.144, it replaces
this address with 193.233.7.83 which is the address of some real
host and forwards the packet. If you need to remap
blocks of addresses, you may use a command like:
\begin{verbatim}
  ip route add nat 192.203.80.192/26 via 193.233.7.64
\end{verbatim}
This command will map a block of 64 addresses 192.203.80.192-255 to
193.233.7.64-127.

When an internal host (193.233.7.83 in the example above)
sends something to the outer world and these packets are forwarded
by our router, it should translate the source address 193.233.7.83
into 192.203.80.144.
This task is solved by setting a special
policy rule (sec.\ref{IP-RULE-ADD}, p.\pageref{IP-RULE-ADD}):
\begin{verbatim}
  ip rule add prio 320 from 193.233.7.83 nat 192.203.80.144
\end{verbatim}
This rule says that the source address 193.233.7.83
should be translated into 192.203.80.144 before forwarding.
It is important that the address after the \verb|nat| keyword
is some NAT address, declared by {\tt ip route add nat}.
If it is just a random address the router will not map to it.
\begin{NB}
The exception is when the address is a local address of this
router (or 0.0.0.0) and masquerading is configured in the linux-2.2
kernel. In this case the router will masquerade the packets as this address.
If 0.0.0.0 is selected, the result is equivalent to one
obtained with firewalling rules. Otherwise, you have the way
to order Linux to masquerade to this fixed address.
The NAT mechanism used in linux-2.4 is more flexible than
masquerading, so this feature has lost its meaning and is disabled.
\end{NB}

If the network has non-trivial internal structure, it is
useful and even necessary to add rules disabling translation
when a packet does not leave this network. Let us return to the
example from sec.\ref{IP-RULE-SHOW} (p.\pageref{IP-RULE-SHOW}).
\begin{verbatim}
300:    from 193.233.7.83 to 193.233.7.0/24 lookup main
310:    from 193.233.7.83 to 192.203.80.0/24 lookup main
320:    from 193.233.7.83 lookup inr.ruhep map-to 192.203.80.144
\end{verbatim}
This block of rules causes normal forwarding when
packets from 193.233.7.83 do not leave networks 193.233.7/24
and 192.203.80/24. Also, if the \verb|inr.ruhep| table does not
contain a route to the destination (which means that the routing
domain owning addresses from 192.203.80/24 is dead), no translation
will occur. Otherwise, the packets are translated.
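
The block mapping created by {\tt ip route add nat PREFIX via ADDRESS} can be
checked with a little address arithmetic. The sketch below assumes that
the host bits of the dummy address are carried over unchanged into the
target block, which is what the range 192.203.80.192-255 mapping to
193.233.7.64-127 suggests. The helper names \verb|ip2int|, \verb|int2ip|
and \verb|nat_map| are hypothetical; the script only computes addresses and
does not touch the routing tables.

```shell
#!/bin/bash
# Convert a dotted quad to a 32-bit integer.
ip2int () {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}
# Convert a 32-bit integer back to a dotted quad.
int2ip () {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}
# nat_map ADDR NATBLOCK/LEN VIA
# Print the address that ADDR would be rewritten to, assuming the host
# part is preserved (an assumption of this sketch).
nat_map () {
    local addr block len via mask host
    addr=$(ip2int "$1")
    block=$(ip2int "${2%/*}")
    len=${2#*/}
    via=$(ip2int "$3")
    mask=$(( (0xffffffff << (32 - len)) & 0xffffffff ))
    if [ $(( addr & mask )) -ne $(( block & mask )) ]; then
        echo "$1 is not inside $2" 1>&2
        return 1
    fi
    host=$(( addr & ~mask & 0xffffffff ))
    int2ip $(( (via & mask) | host ))
}
```

F.e.\ \verb|nat_map 192.203.80.200 192.203.80.192/26 193.233.7.64| prints
193.233.7.72: the offset 8 inside the dummy block is kept inside the real one.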

\paragraph{How to only translate selected ports:}
If you only want to translate selected ports (f.e.\ http)
and leave the rest intact, you may use \verb|ipchains|
to \verb|fwmark| a class of packets.
Suppose you did, and all the packets from 193.233.7.83
destined for port 80 are marked with marker 0x1234 in the input firewall chain.
In this case you may replace rule \#320 with:
\begin{verbatim}
320:    from 193.233.7.83 fwmark 1234 lookup main map-to 192.203.80.144
\end{verbatim}
and translation will only be enabled for outgoing http requests.

\section{Example: minimal host setup}
\label{EXAMPLE-SETUP}

The following script gives an example of a fail-safe
setup of IP (and IPv6, if it is compiled into the kernel)
in the common case of a node attached to a single broadcast
network. A more advanced script, which may be used both on multihomed
hosts and on routers, is described in the following
section.

The utilities used in the script may be found in the
directory ftp://ftp.inr.ac.ru/ip-routing/:
\begin{enumerate}
\item \verb|ip| --- package \verb|iproute2|.
\item \verb|arping| --- package \verb|iputils|.
\item \verb|rdisc| --- package \verb|iputils|.
\end{enumerate}
\begin{NB}
It also refers to a DHCP client, \verb|dhcpcd|. I should refrain from
recommending a good DHCP client to use. All that I can
say is that ISC \verb|dhcp-2.0b1pl6| patched with the patch that
can be found in the \verb|dhcp.bootp.rarp| subdirectory of
the same ftp site {\em does\/} work,
at least on Ethernet and Token Ring.
\end{NB}

\begin{verbatim}
#! /bin/bash
\end{verbatim}
\begin{flushleft}
\# {\bf Usage: \verb|ifone ADDRESS[/PREFIX-LENGTH] [DEVICE]|}\\
\# {\bf Parameters:}\\
\# \$1 --- Static IP address, optionally followed by prefix length.\\
\# \$2 --- Device name.
If it is missing, \verb|eth0| is assumed.\\
\# F.e. \verb|ifone 193.233.7.90|
\end{flushleft}
\begin{verbatim}
dev=$2
: ${dev:=eth0}
ipaddr=
\end{verbatim}
\# Parse IP address, splitting prefix length.
\begin{verbatim}
if [ "$1" != "" ]; then
  ipaddr=${1%/*}
  if [ "$1" != "$ipaddr" ]; then
    pfxlen=${1#*/}
  fi
  : ${pfxlen:=24}
fi
pfx="${ipaddr}/${pfxlen}"
\end{verbatim}

\begin{flushleft}
\# {\bf Step 0} --- enable loopback.\\
\#\\
\# This step is necessary on any networked box before attempting\\
\# to configure any other device.\\
\end{flushleft}
\begin{verbatim}
ip link set up dev lo
ip addr add 127.0.0.1/8 dev lo brd + scope host
\end{verbatim}
\begin{flushleft}
\# IPv6 autoconfigures itself on loopback.\\
\#\\
\# If the user gave loopback as the device, we add the address as an alias and exit.
\end{flushleft}
\begin{verbatim}
if [ "$dev" = "lo" ]; then
  if [ "$ipaddr" != "" -a "$ipaddr" != "127.0.0.1" ]; then
    ip address add $ipaddr dev $dev
    exit $?
  fi
  exit 0
fi
\end{verbatim}

\noindent\# {\bf Step 1} --- enable device \verb|$dev|

\begin{verbatim}
if ! ip link set up dev $dev ; then
  echo "Cannot enable interface $dev. Aborting." 1>&2
  exit 1
fi
\end{verbatim}
\begin{flushleft}
\# The interface is \verb|UP|. IPv6 started stateless autoconfiguration itself,\\
\# and its configuration finishes here. However,\\
\# IP still needs some static preconfigured address.
\end{flushleft}
\begin{verbatim}
if [ "$ipaddr" = "" ]; then
  echo "No address for $dev is configured, trying DHCP..." 1>&2
  dhcpcd
  exit $?
fi
\end{verbatim}

\begin{flushleft}
\# {\bf Step 2} --- IP Duplicate Address Detection~\cite{RFC-DHCP}.\\
\# Send two probes and wait for the result for 3 seconds.\\
\# If the interface opens slower f.e.\ due to long media detection,\\
\# you may want to increase the timeout.\\
\end{flushleft}
\begin{verbatim}
if ! arping -q -c 2 -w 3 -D -I $dev $ipaddr ; then
  echo "Address $ipaddr is busy, trying DHCP..." 1>&2
  dhcpcd
  exit $?
fi
\end{verbatim}
\begin{flushleft}
\# OK, the address is unique, we may add it on the interface.\\
\#\\
\# {\bf Step 3} --- Configure the address on the interface.
\end{flushleft}

\begin{verbatim}
if ! ip address add $pfx brd + dev $dev; then
  echo "Failed to add $pfx on $dev, trying DHCP..." 1>&2
  dhcpcd
  exit $?
fi
\end{verbatim}

\noindent\# {\bf Step 4} --- Announce our presence on the link.
\begin{verbatim}
arping -A -c 1 -I $dev $ipaddr
noarp=$?
( sleep 2;
  arping -U -c 1 -I $dev $ipaddr ) >& /dev/null </dev/null &
\end{verbatim}

\begin{flushleft}
\# {\bf Step 5} (optional) --- Add some control routes.\\
\#\\
\# 1. Prohibit link local multicast addresses.\\
\# 2. Prohibit link local (alias, limited) broadcast.\\
\# 3. Add default multicast route.
\end{flushleft}
\begin{verbatim}
ip route add unreachable 224.0.0.0/24
ip route add unreachable 255.255.255.255
if [ `ip link ls $dev | grep -c MULTICAST` -ge 1 ]; then
  ip route add 224.0.0.0/4 dev $dev scope global
fi
\end{verbatim}

\begin{flushleft}
\# {\bf Step 6} --- Add fallback default route with huge metric.\\
\# If a proxy ARP server is present on the interface, we will be\\
\# able to talk to all the Internet without further configuration.\\
\# It is not so cheap though and we still hope that this route\\
\# will be overridden by a more correct one from rdisc.\\
\# Do not perform this step if the device is not ARPable,\\
\# because dead nexthop detection does not work on such devices.
\end{flushleft}
\begin{verbatim}
if [ "$noarp" = "0" ]; then
  ip ro add default dev $dev metric 30000 scope global
fi
\end{verbatim}

\begin{flushleft}
\# {\bf Step 7} --- Restart router discovery and exit.
\end{flushleft}
\begin{verbatim}
killall -HUP rdisc || rdisc -fs
exit 0
\end{verbatim}


\section{Example: {\protect\tt ifcfg} --- interface address management}
\label{EXAMPLE-IFCFG}

This is a simplistic script replacing one option of \verb|ifconfig|,
namely, IP address management. It not only adds
addresses, but also carries out Duplicate Address Detection~\cite{RFC-DHCP},
sends unsolicited ARP to update the caches of other hosts sharing
the interface, adds some control routes and restarts Router Discovery
when it is necessary.

I strongly recommend using it {\em instead\/} of \verb|ifconfig| both
on hosts and on routers.

\begin{verbatim}
#! /bin/bash
\end{verbatim}
\begin{flushleft}
\# {\bf Usage: \verb?ifcfg DEVICE[:ALIAS] [add|del] ADDRESS[/LENGTH] [PEER]?}\\
\# {\bf Parameters:}\\
\# ---Device name.
It may have an alias suffix, separated by a colon.\\
\# ---Command: add, delete or stop.\\
\# ---IP address, optionally followed by prefix length.\\
\# ---Optional peer address for pointopoint interfaces.\\
\# F.e. \verb|ifcfg eth0 193.233.7.90/24|

\noindent\# This function determines whether this is a router or a host.\\
\# It returns 0 if the host is apparently not a router.
\end{flushleft}
\begin{verbatim}
CheckForwarding () {
  local sbase fwd dir
  sbase=/proc/sys/net/ipv4/conf
  fwd=0
  if [ -d $sbase ]; then
    for dir in $sbase/*/forwarding; do
      fwd=$(( fwd + $(cat $dir) ))
    done
  else
    fwd=2
  fi
  return $fwd
}
\end{verbatim}
\begin{flushleft}
\# This function restarts Router Discovery.\\
\end{flushleft}
\begin{verbatim}
RestartRDISC () {
  killall -HUP rdisc || rdisc -fs
}
\end{verbatim}
\begin{flushleft}
\# Calculate ABC "natural" mask length\\
\# Arg: \$1 = dotquad address
\end{flushleft}
\begin{verbatim}
ABCMaskLen () {
  local class;
  class=${1%%.*}
  if [ $class -eq 0 -o $class -ge 224 ]; then return 0
  elif [ $class -ge 192 ]; then return 24
  elif [ $class -ge 128 ]; then return 16
  else return 8 ; fi
}
\end{verbatim}


\begin{flushleft}
\# {\bf MAIN()}\\
\#\\
\# Strip alias suffix separated by colon.
\end{flushleft}
\begin{verbatim}
label="label $1"
ldev=$1
dev=${1%:*}
if [ "$dev" = "" -o "$1" = "help" ]; then
  echo "Usage: ifcfg DEV [[add|del] [ADDR[/LEN]] [PEER] | stop]" 1>&2
  echo "       add - add new address" 1>&2
  echo "       del - delete address" 1>&2
  echo "       stop - completely disable IP" 1>&2
  exit 1
fi
shift

CheckForwarding
fwd=$?
\end{verbatim}
\begin{flushleft}
\# Parse command. If it is ``stop'', flush and exit.
\end{flushleft}
\begin{verbatim}
deleting=0
case "$1" in
add) shift ;;
stop)
  if [ "$ldev" != "$dev" ]; then
    echo "Cannot stop alias $ldev" 1>&2
    exit 1;
  fi
  ip -4 addr flush dev $dev $label || exit 1
  if [ $fwd -eq 0 ]; then RestartRDISC; fi
  exit 0 ;;
del*)
  deleting=1; shift ;;
*)
esac
\end{verbatim}
\begin{flushleft}
\# Parse prefix, split prefix length, separated by slash.
\end{flushleft}
\begin{verbatim}
ipaddr=
pfxlen=
if [ "$1" != "" ]; then
  ipaddr=${1%/*}
  if [ "$1" != "$ipaddr" ]; then
    pfxlen=${1#*/}
  fi
  if [ "$ipaddr" = "" ]; then
    echo "$1 is bad IP address." 1>&2
    exit 1
  fi
fi
shift
\end{verbatim}
\begin{flushleft}
\# If peer address is present, prefix length is 32.\\
\# Otherwise, if prefix length was not given, guess it.
\end{flushleft}
\begin{verbatim}
peer=$1
if [ "$peer" != "" ]; then
  if [ "$pfxlen" != "" -a "$pfxlen" != "32" ]; then
    echo "Peer address with non-trivial netmask." 1>&2
    exit 1
  fi
  pfx="$ipaddr peer $peer"
else
  if [ "$pfxlen" = "" ]; then
    ABCMaskLen $ipaddr
    pfxlen=$?
  fi
  pfx="$ipaddr/$pfxlen"
fi
if [ "$ldev" = "$dev" -a "$ipaddr" != "" ]; then
  label=
fi
\end{verbatim}
\begin{flushleft}
\# If deletion was requested, delete the address and restart RDISC
\end{flushleft}
\begin{verbatim}
if [ $deleting -ne 0 ]; then
  ip addr del $pfx dev $dev $label || exit 1
  if [ $fwd -eq 0 ]; then RestartRDISC; fi
  exit 0
fi
\end{verbatim}
\begin{flushleft}
\# Start interface initialization.\\
\#\\
\# {\bf Step 0} --- enable device \verb|$dev|
\end{flushleft}
\begin{verbatim}
if ! ip link set up dev $dev ; then
  echo "Error: cannot enable interface $dev." \
    1>&2
  exit 1
fi
if [ "$ipaddr" = "" ]; then exit 0; fi
\end{verbatim}
\begin{flushleft}
\# {\bf Step 1} --- IP Duplicate Address Detection~\cite{RFC-DHCP}.\\
\# Send two probes and wait for the result for 3 seconds.\\
\# If the interface opens slower f.e.\ due to long media detection,\\
\# you may want to increase the timeout.\\
\end{flushleft}
\begin{verbatim}
if ! arping -q -c 2 -w 3 -D -I $dev $ipaddr ; then
  echo "Error: some host already uses address $ipaddr on $dev." 1>&2
  exit 1
fi
\end{verbatim}
\begin{flushleft}
\# OK, the address is unique. We may add it to the interface.\\
\#\\
\# {\bf Step 2} --- Configure the address on the interface.
\end{flushleft}
\begin{verbatim}
if ! ip address add $pfx brd + dev $dev $label; then
  echo "Error: failed to add $pfx on $dev." 1>&2
  exit 1
fi
\end{verbatim}
\noindent\# {\bf Step 3} --- Announce our presence on the link
\begin{verbatim}
arping -q -A -c 1 -I $dev $ipaddr
noarp=$?
( sleep 2 ;
  arping -q -U -c 1 -I $dev $ipaddr ) >& /dev/null </dev/null &
\end{verbatim}
\begin{flushleft}
\# {\bf Step 4} (optional) --- Add some control routes.\\
\#\\
\# 1. Prohibit link local multicast addresses.\\
\# 2. Prohibit link local (alias, limited) broadcast.\\
\# 3. Add default multicast route.
\end{flushleft}
\begin{verbatim}
ip route add unreachable 224.0.0.0/24 >& /dev/null
ip route add unreachable 255.255.255.255 >& /dev/null
if [ `ip link ls $dev | grep -c MULTICAST` -ge 1 ]; then
  ip route add 224.0.0.0/4 dev $dev scope global >& /dev/null
fi
\end{verbatim}
\begin{flushleft}
\# {\bf Step 5} --- Add fallback default route with huge metric.\\
\# If a proxy ARP server is present on the interface, we will be\\
\# able to talk to all the Internet without further configuration.\\
\# Do not perform this step on a router or if the device is not ARPable,\\
\# because dead nexthop detection does not work on them.
\end{flushleft}
\begin{verbatim}
if [ $fwd -eq 0 ]; then
  if [ $noarp -eq 0 ]; then
    ip ro append default dev $dev metric 30000 scope global
  elif [ "$peer" != "" ]; then
    if ping -q -c 2 -w 4 $peer ; then
      ip ro append default via $peer dev $dev metric 30001
    fi
  fi
  RestartRDISC
fi

exit 0
\end{verbatim}
\begin{flushleft}
\# End of {\bf MAIN()}
\end{flushleft}


\end{document}