
\documentstyle[12pt,twoside]{article}

\def\TITLE{IP Command Reference}
\input preamble
\begin{center}
\Large\bf IP Command Reference.
\end{center}

\begin{center}
{ \large Alexey~N.~Kuznetsov } \\
\em Institute for Nuclear Research, Moscow \\
\verb|kuznet@ms2.inr.ac.ru| \\
\rm April 14, 1999
\end{center}

\vspace{5mm}

\tableofcontents

\newpage

\section{About this document}

This document presents a comprehensive description of the \verb|ip| utility
from the \verb|iproute2| package. It is not a tutorial or user's guide.
It is a {\em dictionary\/}, not explaining terms,
but translating them into other terms, which may also be unknown to the reader.
However, the document is self-contained and the reader, provided they have a
basic networking background, will find enough information
and examples to understand and configure Linux-2.2 IP and IPv6
networking.

This document is split into sections that explain \verb|ip| commands
and options, decipher \verb|ip| output and give a few examples.
Longer examples and topics that require more elaborate
discussion are in the appendix.

The paragraphs beginning with NB contain side notes and warnings about
bugs and design drawbacks. They may be skipped at the first reading.

\section{{\tt ip} --- command syntax}

The generic form of an \verb|ip| command is:
\begin{verbatim}
ip [ OPTIONS ] OBJECT [ COMMAND [ ARGUMENTS ]]
\end{verbatim}
where \verb|OPTIONS| is a set of optional modifiers affecting the
general behaviour of the \verb|ip| utility or changing its output. All options
begin with the character \verb|'-'| and may be used in either long or abbreviated
forms. Currently, the following options are available:

\begin{itemize}
\item \verb|-V|, \verb|-Version|

--- print the version of the \verb|ip| utility and exit.

\item \verb|-s|, \verb|-stats|, \verb|-statistics|

--- output more information. If the option
appears twice or more, the amount of information increases.
As a rule, the information is statistics or some time values.

\item \verb|-f|, \verb|-family| followed by a protocol family
identifier: \verb|inet|, \verb|inet6| or \verb|link|.

--- enforce the protocol family to use. If the option is not present,
the protocol family is guessed from other arguments. If the rest of the command
line does not give enough information to guess the family, \verb|ip| falls back to the default
one, usually \verb|inet| or \verb|any|. \verb|link| is a special family
identifier meaning that no networking protocol is involved.

\item \verb|-4|

--- shortcut for \verb|-family inet|.

\item \verb|-6|

--- shortcut for \verb|-family inet6|.

\item \verb|-0|

--- shortcut for \verb|-family link|.

\item \verb|-o|, \verb|-oneline|

--- output each record on a single line, replacing line feeds
with the \verb|'\'| character. This is convenient when you want to
count records with \verb|wc| or to \verb|grep| the output. The trivial
script \verb|rtpr| converts the output back into readable form.

\item \verb|-r|, \verb|-resolve|

--- use the system's name resolver to print DNS names instead of
host addresses.

\begin{NB}
 Do not use this option when reporting bugs or asking for advice.
\end{NB}
\begin{NB}
 \verb|ip| never uses DNS to resolve names to addresses.
\end{NB}

\end{itemize}
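As a rough illustration of the \verb|-oneline| option, the following shell sketch counts records and unfolds one-line output again. The function names are hypothetical, and \verb|unfold_oneline| only assumes what a script such as \verb|rtpr| does; it is not a copy of that script.

```shell
# Count records in "ip -o ..." output read from standard input.
# With -oneline, one record occupies exactly one line.
count_records() {
    wc -l | tr -d ' '
}

# Turn each '\' separator back into a line feed (an assumption about
# what a script like rtpr does, not its actual source).
unfold_oneline() {
    tr '\\' '\n'
}

# Typical use (requires the ip utility):
#   ip -o link ls | count_records
#   ip -o link ls | unfold_oneline
```

Both functions read standard input only, so they can be tried on saved output without touching the system configuration.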
\verb|OBJECT| is the object to manage or to get information about.
The object types currently understood by \verb|ip| are:

\begin{itemize}
\item \verb|link| --- network device
\item \verb|address| --- protocol (IP or IPv6) address on a device
\item \verb|neighbour| --- ARP or NDISC cache entry
\item \verb|route| --- routing table entry
\item \verb|rule| --- rule in routing policy database
\item \verb|maddress| --- multicast address
\item \verb|mroute| --- multicast routing cache entry
\item \verb|tunnel| --- tunnel over IP
\end{itemize}

Again, the names of all objects may be written in full or
abbreviated form, e.g.\ \verb|address| is abbreviated as \verb|addr|
or just \verb|a|.

\verb|COMMAND| specifies the action to perform on the object.
The set of possible actions depends on the object type.
As a rule, it is possible to \verb|add|, \verb|delete| and
\verb|show| (or \verb|list|) objects, but some objects
do not allow all of these operations or have some additional commands.
The \verb|help| command is available for all objects. It prints
out a list of available commands and argument syntax conventions.

If no command is given, some default command is assumed.
Usually it is \verb|list| or, if the objects of this class
cannot be listed, \verb|help|.

\verb|ARGUMENTS| is a list of arguments to the command.
The arguments depend on the command and object. There are two types of arguments:
{\em flags\/}, consisting of a single keyword, and {\em parameters\/},
consisting of a keyword followed by a value. For convenience,
each command has some {\em default parameter\/}
whose keyword may be omitted. For example, the parameter \verb|dev| is the default
for the {\tt ip link} command, so {\tt ip link ls eth0} is equivalent
to {\tt ip link ls dev eth0}.
In the command descriptions below such parameters
are distinguished with the marker: ``(default)''.

Almost all keywords may be abbreviated to their first few letters
(or even to a single letter). The shortcuts are convenient when \verb|ip| is used interactively,
but they are not recommended in scripts or when reporting bugs
or asking for advice. ``Officially'' allowed abbreviations are listed
in the document body.

\section{{\tt ip} --- error messages}

\verb|ip| may fail for one of the following reasons:

\begin{itemize}
\item
A syntax error on the command line: an unknown keyword, an incorrectly formatted
IP address and so on. In this case \verb|ip| prints an error message
and exits. As a rule, the error message will contain information
about the reason for the failure. Sometimes it also prints a help page.

\item
The arguments did not pass verification for self-consistency.

\item
\verb|ip| failed to compile a kernel request from the arguments
because the user didn't give enough information.

\item
The kernel returned an error to some syscall. In this case \verb|ip|
prints the error message, as it is output with \verb|perror(3)|,
prefixed with a comment and a syscall identifier.

\item
The kernel returned an error to some RTNETLINK request.
In this case \verb|ip| prints the error message, as it is output
with \verb|perror(3)|, prefixed with ``RTNETLINK answers:''.

\end{itemize}

All the operations are atomic, i.e.\
if the \verb|ip| utility fails, it does not change anything
in the system. One harmful exception is the \verb|ip link| command
(Sec.\ref{IP-LINK}, p.\pageref{IP-LINK}),
which may change only some of the device parameters given
on the command line.

It is difficult to list all the error messages (especially
syntax errors). However, as a rule, their meaning is clear
from the context of the command.

The most common mistakes are:

\begin{enumerate}
\item Netlink is not configured in the kernel. The message is:
\begin{verbatim}
Cannot open netlink socket: Invalid value
\end{verbatim}

\item RTNETLINK is not configured in the kernel. In this case
one of the following messages may be printed, depending on the command:
\begin{verbatim}
Cannot talk to rtnetlink: Connection refused
Cannot send dump request: Connection refused
\end{verbatim}

\item The \verb|CONFIG_IP_MULTIPLE_TABLES| option was not selected
when configuring the kernel. In this case any attempt to use the
\verb|ip| \verb|rule| command will fail, e.g.
\begin{verbatim}
kuznet@kaiser $ ip rule list
RTNETLINK error: Invalid argument
dump terminated
\end{verbatim}

\end{enumerate}
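A script that wraps \verb|ip| may want to tell these failures apart. The sketch below matches only the message prefixes quoted in this section; the function name and the category names it prints are hypothetical and are not part of \verb|ip|.

```shell
# Map one line of ip's error output to a rough cause.
# The category names printed here are illustrative, not part of ip.
classify_ip_error() {
    case "$1" in
        "Cannot open netlink socket"*)
            echo "netlink-not-configured" ;;
        "Cannot talk to rtnetlink"*|"Cannot send dump request"*)
            echo "rtnetlink-not-configured" ;;
        "RTNETLINK answers:"*|"RTNETLINK error:"*)
            echo "kernel-rejected-request" ;;
        *)
            echo "other" ;;
    esac
}

# Typical use (requires the ip utility):
#   msg=$(ip rule list 2>&1 >/dev/null | head -n 1)
#   classify_ip_error "$msg"
```

Syntax errors fall into the ``other'' category here, since their wording varies too much to match by prefix.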
\section{{\tt ip link} --- network device configuration}
\label{IP-LINK}

\paragraph{Object:} A \verb|link| is a network device and the corresponding
commands display and change the state of devices.

\paragraph{Commands:} \verb|set| and \verb|show| (or \verb|list|).

\subsection{{\tt ip link set} --- change device attributes}

\paragraph{Abbreviations:} \verb|set|, \verb|s|.

\paragraph{Arguments:}

\begin{itemize}
\item \verb|dev NAME| (default)

--- \verb|NAME| specifies the network device on which to operate.

\item \verb|up| and \verb|down|

--- change the state of the device to \verb|UP| or \verb|DOWN|.

\item \verb|arp on| or \verb|arp off|

--- change the \verb|NOARP| flag on the device.

\begin{NB}
This operation is {\em not allowed\/} if the device is in state \verb|UP|.
However, neither the \verb|ip| utility nor the kernel checks for this condition.
You can get unpredictable results changing this flag while the
device is running.
\end{NB}

\item \verb|multicast on| or \verb|multicast off|

--- change the \verb|MULTICAST| flag on the device.

\item \verb|dynamic on| or \verb|dynamic off|

--- change the \verb|DYNAMIC| flag on the device.

\item \verb|name NAME|

--- change the name of the device. This operation is not
recommended if the device is running or has some addresses
already configured.

\item \verb|txqueuelen NUMBER| or \verb|txqlen NUMBER|

--- change the transmit queue length of the device.

\item \verb|mtu NUMBER|

--- change the MTU of the device.

\item \verb|address LLADDRESS|

--- change the station address of the interface.

\item \verb|broadcast LLADDRESS|, \verb|brd LLADDRESS| or \verb|peer LLADDRESS|

--- change the link layer broadcast address or the peer address when
the interface is \verb|POINTOPOINT|.

\vskip 1mm
\begin{NB}
For most devices (e.g.\ for Ethernet) changing the link layer
broadcast address will break networking.
Do not use it if you do not understand what this operation really does.
\end{NB}

\end{itemize}

\vskip 1mm
\begin{NB}
The {\tt ip} utility does not change the \verb|PROMISC|
or \verb|ALLMULTI| flags. These flags are considered
obsolete and should not be changed administratively.
\end{NB}

\paragraph{Warning:} If multiple parameter changes are requested,
\verb|ip| aborts immediately after any of the changes has failed,
leaving the earlier changes in place.
This is the only case when \verb|ip| can move the system to
an unpredictable state. The solution is to avoid changing
several parameters with one {\tt ip link set} call.

\paragraph{Examples:}
\begin{itemize}
\item \verb|ip link set dummy address 00:00:00:00:00:01|

--- change the station address of the interface \verb|dummy|.

\item \verb|ip link set dummy up|

--- start the interface \verb|dummy|.

\end{itemize}

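One way to follow the advice in the warning above is to issue a separate {\tt ip link set} call for every parameter. The following is a minimal sketch under stated assumptions: the function name and the \verb|IP| variable (which allows a dry run with \verb|IP=echo|) are hypothetical, and only keyword/value parameters such as \verb|mtu NUMBER| are handled, not single-keyword flags such as \verb|up|.

```shell
# IP may be overridden (e.g. IP=echo) for a dry run; this variable is an
# assumption of the sketch, not something ip itself reads.
IP=${IP:-ip}

# set_link_params DEV KEYWORD VALUE [KEYWORD VALUE ...]
# Applies one "ip link set" call per keyword/value pair and stops at the
# first failure, reporting which pair was rejected.
set_link_params() {
    dev=$1
    shift
    while [ "$#" -ge 2 ]; do
        if ! "$IP" link set dev "$dev" "$1" "$2"; then
            echo "ip link set $dev: '$1 $2' failed; later changes skipped" >&2
            return 1
        fi
        shift 2
    done
}

# Typical use (requires root privileges):
#   set_link_params dummy mtu 1400 txqueuelen 50
```

With one call per parameter, a failure identifies exactly which change was rejected, and every earlier change is known to have been applied.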
\subsection{{\tt ip link show} --- display device attributes}
\label{IP-LINK-SHOW}

\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|lst|, \verb|sh|, \verb|ls|,
\verb|l|.

\paragraph{Arguments:}
\begin{itemize}
\item \verb|dev NAME| (default)

--- \verb|NAME| specifies the network device to show.
If this argument is omitted, all devices are listed.

\item \verb|up|

--- only display running interfaces.

\end{itemize}

\paragraph{Output format:}

\begin{verbatim}
kuznet@alisa:~ $ ip link ls eth0
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
kuznet@alisa:~ $ ip link ls sit0
5: sit0@NONE: <NOARP,UP> mtu 1480 qdisc noqueue
    link/sit 0.0.0.0 brd 0.0.0.0
kuznet@alisa:~ $ ip link ls dummy
2: dummy: <BROADCAST,NOARP> mtu 1500 qdisc noop
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
kuznet@alisa:~ $
\end{verbatim}

The number before each colon is an {\em interface index\/} or {\em ifindex\/}.
This number uniquely identifies the interface. This is followed by the {\em interface name\/}
(\verb|eth0|, \verb|sit0| etc.). The interface name is also
unique at every given moment. However, the interface may disappear from the
list (e.g.\ when the corresponding driver module is unloaded) and another
one with the same name may be created later. Besides that,
the administrator may change the name of any device with
\verb|ip| \verb|link| \verb|set| \verb|name|
to make it more intelligible.

The interface name may have another name or \verb|NONE| appended
after the \verb|@| sign. This means that this device is bound to some other
device,
i.e.\ packets sent through it are encapsulated and sent via the ``master''
device. If the name is \verb|NONE|, the master is unknown.

Then we see the interface {\em mtu\/} (``maximum transmission unit''). This determines
the maximum size of data which can be sent as a single packet over this interface.

{\em qdisc\/} (``queuing discipline'') shows the queuing algorithm used
on the interface. In particular, \verb|noqueue| means that this interface
does not queue anything and \verb|noop| means that the interface is in blackhole
mode, i.e.\ all packets sent to it are immediately discarded.
{\em qlen\/} is the default transmit queue length of the device measured
in packets.

The interface flags are summarized in the angle brackets.

\begin{itemize}
\item \verb|UP| --- the device is turned on. It is ready to accept
packets for transmission and it may inject into the kernel packets received
from other nodes on the network.

\item \verb|LOOPBACK| --- the interface does not communicate with other
hosts. All packets sent through it will be returned
and nothing but bounced packets can be received.

\item \verb|BROADCAST| --- the device has the facility to send packets
to all hosts sharing the same link. A typical example is an Ethernet link.

\item \verb|POINTOPOINT| --- the link has only two ends with one node
attached to each end. All packets sent to this link will reach the peer
and all packets received by us came from this single peer.

If neither \verb|LOOPBACK| nor \verb|BROADCAST| nor \verb|POINTOPOINT|
is set, the interface is assumed to be NBMA (Non-Broadcast Multi-Access).
This is the most generic type of device and the most complicated one, because
the host attached to an NBMA link has no means to send to anyone
without additionally configured information.

\item \verb|MULTICAST| --- is an advisory flag indicating that the interface
is aware of multicasting, i.e.\ sending packets to some subset of neighbouring
nodes. Broadcasting is a particular case of multicasting, where the multicast
group consists of all nodes on the link. It is important to emphasize
that software {\em must not\/} interpret the absence of this flag as the inability
to use multicasting on this interface. Any \verb|POINTOPOINT| and
\verb|BROADCAST| link is multicasting by definition, because we have
direct access to all the neighbours and, hence, to any part of them.
Certainly, the use of high bandwidth multicast transfers is not recommended
on broadcast-only links because of high expense, but it is not strictly
prohibited.

\item \verb|PROMISC| --- the device listens to and feeds to the kernel all
traffic on the link even if it is not destined for us, not broadcasted
and not destined for a multicast group of which we are a member. Usually
this mode exists only on broadcast links and is used by bridges and for network
monitoring.

\item \verb|ALLMULTI| --- the device receives all multicast packets
wandering on the link. This mode is used by multicast routers.

\item \verb|NOARP| --- this flag is different from the other ones. It has
no invariant value and its interpretation depends on the network protocols
involved. As a rule, it indicates that the device needs no address
resolution and that the software or hardware knows how to deliver packets
without any help from the protocol stacks.

\item \verb|DYNAMIC| --- is an advisory flag indicating that the interface is
dynamically created and destroyed.

\item \verb|SLAVE| --- this interface is bonded to some other interfaces
to share link capacities.

\end{itemize}
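The rule for recognising NBMA links from the flags can be written down as a short shell sketch. It works on the first line of a record as printed by \verb|ip link ls|; the function names are hypothetical.

```shell
# link_has_flag LINE FLAG: succeed if FLAG appears between '<' and '>'.
link_has_flag() {
    flags=${1#*<}
    flags=${flags%%>*}
    case ",$flags," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# link_type LINE: print the link class implied by the flags. A link with
# none of LOOPBACK, BROADCAST or POINTOPOINT is treated as NBMA.
link_type() {
    for t in LOOPBACK BROADCAST POINTOPOINT; do
        if link_has_flag "$1" "$t"; then
            echo "$t"
            return 0
        fi
    done
    echo "NBMA"
}

link_type '3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100'
link_type '5: sit0@NONE: <NOARP,UP> mtu 1480 qdisc noqueue'
```

For the two sample lines, taken from the output shown earlier in this subsection, the sketch prints \verb|BROADCAST| and \verb|NBMA|.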
\vskip 1mm
\begin{NB}
There are other flags but they are either obsolete (\verb|NOTRAILERS|)
or not implemented (\verb|DEBUG|) or specific to some devices
(\verb|MASTER|, \verb|AUTOMEDIA| and \verb|PORTSEL|). We do not discuss
them here.
\end{NB}
\begin{NB}
The values of \verb|PROMISC| and \verb|ALLMULTI| flags
shown by the \verb|ifconfig| utility and by the \verb|ip| utility
are {\em different\/}. \verb|ip link ls| shows the true device state,
while \verb|ifconfig| shows the virtual state which was set with
\verb|ifconfig| itself.
\end{NB}

The second line contains information on the link layer addresses
associated with the device. The first word (\verb|ether|, \verb|sit|)
defines the interface hardware type. This type determines the format and semantics
of the addresses and is logically part of the address.
The default format of the station address and the broadcast address
(or the peer address for point-to-point links) is a
sequence of hexadecimal bytes separated by colons, but some link
types may have their natural address format, e.g.\ addresses
of tunnels over IP are printed as dotted-quad IP addresses.

\vskip 1mm
\begin{NB}
  NBMA links have no well-defined broadcast or peer address;
  however, this field may contain useful information, e.g.\
  about the address of a broadcast relay or about the address of the ARP server.
\end{NB}
\begin{NB}
Multicast addresses are not shown by this command, see
\verb|ip maddr ls| in~Sec.\ref{IP-MADDR} (p.\pageref{IP-MADDR} of this
document).
\end{NB}
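The two-line record layout described in this subsection is regular enough to be split mechanically. The sketch below is one possible \verb|awk| filter, assuming the output format shown above; the function name and the order of the output columns are hypothetical.

```shell
# Print "ifindex name master mtu qdisc linktype lladdr" for each record of
# "ip link ls" output read from standard input. "-" marks a missing field.
parse_link_records() {
    awk '
    /^[0-9]+: / {
        idx = $1;  sub(/:$/, "", idx)
        name = $2; sub(/:$/, "", name)
        master = "-"
        at = index(name, "@")
        if (at > 0) {
            master = substr(name, at + 1)
            name = substr(name, 1, at - 1)
        }
        mtu = "-"; qdisc = "-"
        for (i = 3; i < NF; i++) {
            if ($i == "mtu")   mtu = $(i + 1)
            if ($i == "qdisc") qdisc = $(i + 1)
        }
        next
    }
    $1 ~ /^link\// {
        type = $1; sub(/^link\//, "", type)
        print idx, name, master, mtu, qdisc, type, $2
    }'
}

# Typical use (requires the ip utility):
#   ip link ls | parse_link_records
```

Scripts that depend on this layout should call \verb|ip| with full keywords rather than abbreviations, as recommended earlier in this document.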
\paragraph{Statistics:} With the \verb|-statistics| option, \verb|ip| also
prints interface statistics:

\begin{verbatim}
kuznet@alisa:~ $ ip -s link ls eth0
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    2449949362 2786187  0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    178558497  1783945  332     0       332     35172
kuznet@alisa:~ $
\end{verbatim}
\verb|RX:| and \verb|TX:| lines summarize receiver and transmitter
statistics. They contain:
\begin{itemize}
\item \verb|bytes| --- the total number of bytes received or transmitted
on the interface. This number wraps around when it exceeds the maximum value
of the native data type of the architecture, so continuous monitoring requires
a user level daemon sampling it periodically.
\item \verb|packets| --- the total number of packets received or transmitted
on the interface.
\item \verb|errors| --- the total number of receiver or transmitter errors.
\item \verb|dropped| --- the total number of packets dropped due to lack
of resources.
\item \verb|overrun| --- the total number of receiver overruns resulting
in dropped packets. As a rule, if the interface is overrun, it means
serious problems in the kernel or that your machine is too slow
for this interface.
\item \verb|mcast| --- the total number of received multicast packets. This counter
is only supported by a few devices.
\item \verb|carrier| --- the total number of link media failures, e.g.\ because
of lost carrier.
\item \verb|collsns| --- the total number of collision events
on Ethernet-like media. This number may have a different sense on other
link types.
\item \verb|compressed| --- the total number of compressed packets. This is
available only for links using VJ header compression.
\end{itemize}
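A monitoring script that samples the \verb|bytes| counter has to allow for the wrap-around mentioned above. The following is a minimal sketch, assuming a 32-bit counter that wraps at most once between two samples and a shell with 64-bit arithmetic; the function name and the default width are hypothetical.

```shell
# counter_delta PREV CUR [BITS]
# Print CUR - PREV, assuming the counter wrapped at most once between the
# two samples. BITS is the assumed counter width (default 32); the shell
# must support arithmetic on integers wider than that.
counter_delta() {
    prev=$1
    cur=$2
    bits=${3:-32}
    if [ "$cur" -ge "$prev" ]; then
        echo $((cur - prev))
    else
        echo $((cur + (1 << bits) - prev))
    fi
}

counter_delta 100 350            # no wrap: prints 250
counter_delta 4294967000 200     # wrapped once: prints 496
```

If the sampling interval is long enough for the counter to wrap more than once, the difference cannot be recovered, so the interval has to be chosen with the link speed in mind.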

If the \verb|-s| option is entered twice or more,
\verb|ip| prints more detailed statistics on receiver
and transmitter errors.

\begin{verbatim}
kuznet@alisa:~ $ ip -s -s link ls eth0
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    2449949362 2786187  0       0       0       0
    RX errors: length   crc     frame   fifo    missed
               0        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    178558497  1783945  332     0       332     35172
    TX errors: aborted  fifo    window  heartbeat
               0        0       0       332
kuznet@alisa:~ $
\end{verbatim}
These error names are pure Ethernetisms. Other devices
may have nonzero values in these fields but they may be
interpreted differently.

\section{{\tt ip address} --- protocol address management}

\paragraph{Abbreviations:} \verb|address|, \verb|addr|, \verb|a|.

\paragraph{Object:} The \verb|address| is a protocol (IP or IPv6) address attached
to a network device. Each device must have at least one address
to use the corresponding protocol. It is possible to have several
different addresses attached to one device. These addresses are all
treated alike, so the term {\em alias\/} is not quite appropriate
for them and we do not use it in this document.

The \verb|ip addr| command displays addresses and their properties,
adds new addresses and deletes old ones.

\paragraph{Commands:} \verb|add|, \verb|delete|, \verb|flush| and \verb|show|
(or \verb|list|).

\subsection{{\tt ip address add} --- add a new protocol address}
\label{IP-ADDR-ADD}

\paragraph{Abbreviations:} \verb|add|, \verb|a|.

\paragraph{Arguments:}

\begin{itemize}
\item \verb|dev NAME|

\noindent--- the name of the device to add the address to.

\item \verb|local ADDRESS| (default)

--- the address of the interface. The format of the address depends
on the protocol. It is a dotted quad for IP and a sequence of hexadecimal halfwords
separated by colons for IPv6. The \verb|ADDRESS| may be followed by
a slash and a decimal number which encodes the network prefix length.

\item \verb|peer ADDRESS|

--- the address of the remote endpoint for point-to-point interfaces.
Again, the \verb|ADDRESS| may be followed by a slash and a decimal number,
encoding the network prefix length. If a peer address is specified,
the local address {\em cannot\/} have a prefix length. The network prefix is associated
with the peer rather than with the local address.

\item \verb|broadcast ADDRESS|

--- the broadcast address on the interface.

It is possible to use the special symbols \verb|'+'| and \verb|'-'|
instead of the broadcast address. In this case, the broadcast address
is derived by setting or resetting, respectively, the host bits of the interface prefix.

\vskip 1mm
\begin{NB}
Unlike \verb|ifconfig|, the \verb|ip| utility {\em does not\/} set any broadcast
address unless explicitly requested.
\end{NB}

\item \verb|label NAME|

--- Each address may be tagged with a label string.
In order to preserve compatibility with Linux-2.0 net aliases,
this string must coincide with the name of the device or must be prefixed
with the device name followed by a colon.

\item \verb|scope SCOPE_VALUE|

--- the scope of the area where this address is valid.
The available scopes are listed in the file \verb|/etc/iproute2/rt_scopes|.
Predefined scope values are:

 \begin{itemize}
	\item \verb|global| --- the address is globally valid.
	\item \verb|site| --- (IPv6 only) the address is site local,
	i.e.\ it is valid inside this site.
	\item \verb|link| --- the address is link local, i.e.\
	it is valid only on this device.
	\item \verb|host| --- the address is valid only inside this host.
 \end{itemize}

Appendix~\ref{ADDR-SEL} (p.\pageref{ADDR-SEL} of this document)
contains more details on address scopes.

\end{itemize}

\paragraph{Examples:}
\begin{itemize}
\item \verb|ip addr add 127.0.0.1/8 dev lo brd + scope host|

--- add the usual loopback address to the loopback device.

\item \verb|ip addr add 10.0.0.1/24 brd + dev eth0 label eth0:Alias|

--- add the address 10.0.0.1 with prefix length 24 (i.e.\ netmask
\verb|255.255.255.0|), standard broadcast and label \verb|eth0:Alias|
to the interface \verb|eth0|.
\end{itemize}
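The relation between a prefix length, the netmask and the broadcast address selected by \verb|brd +| can be checked with a little shell arithmetic. This is only a sketch for IPv4; the function names are hypothetical and the code does not validate its input.

```shell
# prefix_to_netmask LEN: print the dotted-quad netmask for an IPv4 prefix
# length between 0 and 32.
prefix_to_netmask() {
    p=$1
    mask=""
    for i in 1 2 3 4; do
        if [ "$p" -ge 8 ]; then
            octet=255
            p=$((p - 8))
        else
            octet=$((256 - (1 << (8 - p))))
            p=0
        fi
        mask="${mask}${mask:+.}${octet}"
    done
    echo "$mask"
}

# broadcast_plus ADDRESS LEN: print ADDRESS with all host bits set, which
# is what the '+' symbol stands for.
broadcast_plus() {
    addr=$1
    p=$2
    old_ifs=$IFS
    IFS=.
    set -- $addr
    IFS=$old_ifs
    out=""
    for o in "$@"; do
        if [ "$p" -ge 8 ]; then
            host=0
            p=$((p - 8))
        else
            host=$(( (1 << (8 - p)) - 1 ))
            p=0
        fi
        out="${out}${out:+.}$((o | host))"
    done
    echo "$out"
}

prefix_to_netmask 24          # prints 255.255.255.0
broadcast_plus 10.0.0.1 24    # prints 10.0.0.255
```

Replacing the bitwise OR with a mask that clears the host bits would give the address selected by \verb|'-'|.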

\subsection{{\tt ip address delete} --- delete a protocol address}

\paragraph{Abbreviations:} \verb|delete|, \verb|del|, \verb|d|.

\paragraph{Arguments:} coincide with the arguments of \verb|ip addr add|.
The device name is a required argument. The rest are optional.
If no arguments are given, the first address is deleted.

\paragraph{Examples:}
\begin{itemize}
\item \verb|ip addr del 127.0.0.1/8 dev lo|

--- deletes the loopback address from the loopback device.
It would be best not to repeat this experiment.

\item Disable IP on the interface \verb|eth0|:
\begin{verbatim}
  while ip -f inet addr del dev eth0; do
    : nothing
  done
\end{verbatim}
Another method to disable IP on an interface using {\tt ip addr flush}
may be found in Sec.\ref{IP-ADDR-FLUSH}, p.\pageref{IP-ADDR-FLUSH}.

\end{itemize}

\subsection{{\tt ip address show} --- display protocol addresses}

\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|lst|, \verb|sh|, \verb|ls|,
\verb|l|.

\paragraph{Arguments:}

\begin{itemize}
\item \verb|dev NAME| (default)

--- the name of the device.

\item \verb|scope SCOPE_VAL|

--- only list addresses with this scope.

\item \verb|to PREFIX|

--- only list addresses matching this prefix.

\item \verb|label PATTERN|

--- only list addresses with labels matching the \verb|PATTERN|.
\verb|PATTERN| is an ordinary shell-style pattern.

\item \verb|dynamic| and \verb|permanent|

--- (IPv6 only) only list addresses installed due to stateless
address configuration or only list permanent (not dynamic) addresses.

\item \verb|tentative|

--- (IPv6 only) only list addresses which did not pass duplicate
address detection.

\item \verb|deprecated|

--- (IPv6 only) only list deprecated addresses.

\item  \verb|primary| and \verb|secondary|

--- only list primary (or secondary) addresses.

\end{itemize}

\paragraph{Output format:}

\begin{verbatim}
kuznet@alisa:~ $ ip addr ls eth0
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
    inet 193.233.7.90/24 brd 193.233.7.255 scope global eth0
    inet6 3ffe:2400:0:1:2a0:ccff:fe66:1878/64 scope global dynamic
       valid_lft forever preferred_lft 604746sec
    inet6 fe80::2a0:ccff:fe66:1878/10 scope link
kuznet@alisa:~ $
\end{verbatim}

The first two lines coincide with the output of \verb|ip link ls|.
It is natural to interpret link layer addresses
as addresses of the protocol family \verb|AF_PACKET|.

Then the list of IP and IPv6 addresses follows, accompanied by
additional address attributes: scope value (see Sec.\ref{IP-ADDR-ADD},
p.\pageref{IP-ADDR-ADD} above), flags and the address label.

Address flags are set by the kernel and cannot be changed
administratively. Currently, the following flags are defined:

\begin{enumerate}
\item \verb|secondary|

--- the address is not used when selecting the default source address
of outgoing packets (cf.\ Appendix~\ref{ADDR-SEL}, p.\pageref{ADDR-SEL}).
An IP address becomes secondary if another address with the same
prefix bits already exists. The first address is primary.
It is the leader of the group of all secondary addresses. When the leader
is deleted, all secondaries are purged too.

\item \verb|dynamic|

--- the address was created due to stateless autoconfiguration~\cite{RFC-ADDRCONF}.
In this case the output also shows for how long
the address remains valid. After \verb|preferred_lft| expires, the address is
moved to the deprecated state. After \verb|valid_lft| expires, the address
is finally invalidated.

\item \verb|deprecated|

--- the address is deprecated, i.e.\ it is still valid, but cannot
be used by newly created connections.

\item \verb|tentative|

--- the address is not used because duplicate address detection~\cite{RFC-ADDRCONF}
is still not complete or failed.

\end{enumerate}
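For scripts that need the addresses together with their scope and flags, the output format described above can be filtered with \verb|awk|. This is a minimal sketch, assuming the layout of the sample output in this subsection; the function name is hypothetical, and a word is reported as a flag only if it is one of the four flag names listed above.

```shell
# Print "family address scope flags" for each inet/inet6 line of
# "ip addr ls" output read from standard input. Words after the scope are
# reported as flags only if they are one of the four flag names listed
# above; anything else (such as the label) is ignored.
list_addresses() {
    awk '
    $1 == "inet" || $1 == "inet6" {
        scope = "-"; flags = ""
        for (i = 3; i <= NF; i++) {
            if ($i == "scope" && i < NF) { scope = $(i + 1); i++; continue }
            if ($i ~ /^(secondary|dynamic|deprecated|tentative)$/)
                flags = flags (flags == "" ? "" : ",") $i
        }
        print $1, $2, scope, (flags == "" ? "-" : flags)
    }'
}

# Typical use (requires the ip utility):
#   ip addr ls eth0 | list_addresses
```

A label that happens to be spelled like a flag name would be misreported by this sketch, which is one more reason to treat it as an illustration only.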

\subsection{{\tt ip address flush} --- flush protocol addresses}
\label{IP-ADDR-FLUSH}

\paragraph{Abbreviations:} \verb|flush|, \verb|f|.

\paragraph{Description:} This command flushes the protocol addresses
selected by some criteria.

\paragraph{Arguments:} This command has the same arguments as \verb|show|.
The difference is that it does not run when no arguments are given.

\paragraph{Warning:} This command (and other \verb|flush| commands
described below) is pretty dangerous. If you make a mistake, it will
not forgive it, but will cruelly purge all the addresses.

\paragraph{Statistics:} With the \verb|-statistics| option, the command
becomes verbose. It prints out the number of deleted addresses and the number
of rounds made to flush the address list. If this option is given
twice, \verb|ip addr flush| also dumps all the deleted addresses
in the format described in the previous subsection.

\paragraph{Example:} Delete all the addresses from the private network
10.0.0.0/8:
\begin{verbatim}
netadm@amber:~ # ip -s -s a f to 10/8
2: dummy    inet 10.7.7.7/16 brd 10.7.255.255 scope global dummy
3: eth0    inet 10.10.7.7/16 brd 10.10.255.255 scope global eth0
4: eth1    inet 10.8.7.7/16 brd 10.8.255.255 scope global eth1

*** Round 1, deleting 3 addresses ***
*** Flush is complete after 1 round ***
netadm@amber:~ #
\end{verbatim}
Another instructive example is disabling IP on all the Ethernets:
\begin{verbatim}
netadm@amber:~ # ip -4 addr flush label "eth*"
\end{verbatim}
And the last example shows how to flush all the IPv6 addresses
acquired by the host from stateless address autoconfiguration
after you enabled forwarding or disabled autoconfiguration.
\begin{verbatim}
netadm@amber:~ # ip -6 addr flush dynamic
\end{verbatim}
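Because \verb|flush| takes the same arguments as \verb|show|, one cautious habit is to list the addresses a selector matches before flushing them. The following sketch wraps that preview step; the function name and the \verb|IP| variable (for dry runs with \verb|IP=echo|) are hypothetical.

```shell
# IP may be overridden (e.g. IP=echo) for a dry run; this variable is an
# assumption of the sketch, not something ip itself reads.
IP=${IP:-ip}

# preview_flush SELECTOR...
# Show what "ip addr flush SELECTOR..." would match, relying on flush
# accepting the same arguments as show. Refuses to run without selectors.
preview_flush() {
    if [ "$#" -eq 0 ]; then
        echo "preview_flush: refusing to run without arguments" >&2
        return 1
    fi
    "$IP" addr show "$@"
}

# Typical use:
#   preview_flush to 10/8      # inspect the list first
#   ip addr flush to 10/8      # then flush (requires root privileges)
```

The preview is not a guarantee: addresses may be added or removed between the two commands.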
830 \section{{\tt ip neighbour} --- neighbour/arp tables management}
832 \paragraph{Abbreviations:} \verb|neighbour|, \verb|neighbor|, \verb|neigh|,
833 \verb|n|.
835 \paragraph{Object:} \verb|neighbour| objects establish bindings between protocol
836 addresses and link layer addresses for hosts sharing the same link.
837 Neighbour entries are organized into tables. The IPv4 neighbour table
838 is known by another name --- the ARP table.
840 The corresponding commands display neighbour bindings
841 and their properties, add new neighbour entries and delete old ones.
843 \paragraph{Commands:} \verb|add|, \verb|change|, \verb|replace|,
844 \verb|delete|, \verb|flush| and \verb|show| (or \verb|list|).
846 \paragraph{See also:} Appendix~\ref{PROXY-NEIGH}, p.\pageref{PROXY-NEIGH}
847 describes how to manage proxy ARP/NDISC with the \verb|ip| utility.
850 \subsection{{\tt ip neighbour add} --- add a new neighbour entry\\
851 	{\tt ip neighbour change} --- change an existing entry\\
852 	{\tt ip neighbour replace} --- add a new entry or change an existing one}
854 \paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|change|, \verb|chg|;
855 \verb|replace|,	\verb|repl|.
857 \paragraph{Description:} These commands create new neighbour records
858 or update existing ones.
860 \paragraph{Arguments:}
862 \begin{itemize}
863 \item \verb|to ADDRESS| (default)
865 --- the protocol address of the neighbour. It is either an IPv4 or IPv6 address.
867 \item \verb|dev NAME|
869 --- the interface to which this neighbour is attached.
872 \item \verb|lladdr LLADDRESS|
874 --- the link layer address of the neighbour. \verb|LLADDRESS| can also be
875 \verb|null|.
877 \item \verb|nud NUD_STATE|
879 --- the state of the neighbour entry. \verb|nud| is an abbreviation for ``Neighbour
880 Unreachability Detection''. The state can take one of the following values:
882 \begin{enumerate}
\item \verb|permanent| --- the neighbour entry is valid forever and can only be removed
administratively.
885 \item \verb|noarp| --- the neighbour entry is valid. No attempts to validate
886 this entry will be made but it can be removed when its lifetime expires.
887 \item \verb|reachable| --- the neighbour entry is valid until the reachability
888 timeout expires.
889 \item \verb|stale| --- the neighbour entry is valid but suspicious.
890 This option to \verb|ip neigh| does not change the neighbour state if
891 it was valid and the address is not changed by this command.
892 \end{enumerate}
894 \end{itemize}
896 \paragraph{Examples:}
897 \begin{itemize}
898 \item \verb|ip neigh add 10.0.0.3 lladdr 0:0:0:0:0:1 dev eth0 nud perm|
900 --- add a permanent ARP entry for the neighbour 10.0.0.3 on the device \verb|eth0|.
902 \item \verb|ip neigh chg 10.0.0.3 dev eth0 nud reachable|
904 --- change its state to \verb|reachable|.
905 \end{itemize}
908 \subsection{{\tt ip neighbour delete} --- delete a neighbour entry}
910 \paragraph{Abbreviations:} \verb|delete|, \verb|del|, \verb|d|.
912 \paragraph{Description:} This command invalidates a neighbour entry.
914 \paragraph{Arguments:} The arguments are the same as with \verb|ip neigh add|,
915 except that \verb|lladdr| and \verb|nud| are ignored.
918 \paragraph{Example:}
919 \begin{itemize}
920 \item \verb|ip neigh del 10.0.0.3 dev eth0|
922 --- invalidate an ARP entry for the neighbour 10.0.0.3 on the device \verb|eth0|.
924 \end{itemize}
926 \begin{NB}
927  The deleted neighbour entry will not disappear from the tables
928  immediately. If it is in use it cannot be deleted until the last
929  client releases it. Otherwise it will be destroyed during
930  the next garbage collection.
931 \end{NB}
934 \paragraph{Warning:} Attempts to delete or manually change
935 a \verb|noarp| entry created by the kernel may result in unpredictable behaviour.
936 Particularly, the kernel may try to resolve this address even
937 on a \verb|NOARP| interface or if the address is multicast or broadcast.
940 \subsection{{\tt ip neighbour show} --- list neighbour entries}
942 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|.
\paragraph{Description:} This command displays neighbour tables.
946 \paragraph{Arguments:}
948 \begin{itemize}
950 \item \verb|to ADDRESS| (default)
952 --- the prefix selecting the neighbours to list.
954 \item \verb|dev NAME|
956 --- only list the neighbours attached to this device.
958 \item \verb|unused|
960 --- only list neighbours which are not currently in use.
962 \item \verb|nud NUD_STATE|
964 --- only list neighbour entries in this state. \verb|NUD_STATE| takes
965 values listed below or the special value \verb|all| which means all states.
966 This option may occur more than once. If this option is absent, \verb|ip|
967 lists all entries except for \verb|none| and \verb|noarp|.
969 \end{itemize}
972 \paragraph{Output format:}
974 \begin{verbatim}
975 kuznet@alisa:~ $ ip neigh ls
976 :: dev lo lladdr 00:00:00:00:00:00 nud noarp
977 fe80::200:cff:fe76:3f85 dev eth0 lladdr 00:00:0c:76:3f:85 router \
978     nud stale
979 0.0.0.0 dev lo lladdr 00:00:00:00:00:00 nud noarp
980 193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 nud reachable
981 193.233.7.85 dev eth0 lladdr 00:e0:1e:63:39:00 nud stale
982 kuznet@alisa:~ $
983 \end{verbatim}
985 The first word of each line is the protocol address of the neighbour.
986 Then the device name follows. The rest of the line describes the contents of
987 the neighbour entry identified by the pair (device, address).
989 \verb|lladdr| is the link layer address of the neighbour.
991 \verb|nud| is the state of the ``neighbour unreachability detection'' machine
992 for this entry. The detailed description of the neighbour
993 state machine can be found in~\cite{RFC-NDISC}. Here is the full list
994 of the states with short descriptions:
996 \begin{enumerate}
997 \item\verb|none| --- the state of the neighbour is void.
998 \item\verb|incomplete| --- the neighbour is in the process of resolution.
999 \item\verb|reachable| --- the neighbour is valid and apparently reachable.
1000 \item\verb|stale| --- the neighbour is valid, but is probably already
1001 unreachable, so the kernel will try to check it at the first transmission.
1002 \item\verb|delay| --- a packet has been sent to the stale neighbour and the kernel is waiting
1003 for confirmation.
1004 \item\verb|probe| --- the delay timer expired but no confirmation was received.
1005 The kernel has started to probe the neighbour with ARP/NDISC messages.
1006 \item\verb|failed| --- resolution has failed.
1007 \item\verb|noarp| --- the neighbour is valid. No attempts to check the entry
1008 will be made.
1009 \item\verb|permanent| --- it is a \verb|noarp| entry, but only the administrator
1010 may remove the entry from the neighbour table.
1011 \end{enumerate}
1013 The link layer address is valid in all states except for \verb|none|,
1014 \verb|failed| and \verb|incomplete|.
1016 IPv6 neighbours can be marked with the additional flag \verb|router|
1017 which means that the neighbour introduced itself as an IPv6 router~\cite{RFC-NDISC}.
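
As an illustration of the output format, the following Python sketch splits
one line of \verb|ip neigh ls| output into its fields. It assumes the format
shown in the listing above, with continuation lines already joined; the
function name \verb|parse_neigh| is hypothetical and is not part of
\verb|iproute2|.
\begin{verbatim}
def parse_neigh(line):
    """Split one `ip neigh ls` line into a dict (sketch only)."""
    words = line.split()
    # The first word is the protocol address of the neighbour.
    entry = {"address": words[0], "router": False}
    it = iter(words[1:])
    for word in it:
        if word == "router":
            # Flag without a value: IPv6 neighbour announced
            # itself as a router.
            entry["router"] = True
        elif word in ("dev", "lladdr", "nud"):
            # Keyword followed by exactly one value.
            entry[word] = next(it)
        # Any other word (f.e. statistics) is ignored here.
    return entry
\end{verbatim}
Note that \verb|lladdr| is simply absent from the result if the line
does not carry a link layer address.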
1019 \paragraph{Statistics:} The \verb|-statistics| option displays some usage
1020 statistics, f.e.\
1022 \begin{verbatim}
1023 kuznet@alisa:~ $ ip -s n ls 193.233.7.254
1024 193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 ref 5 used 12/13/20 \
1025     nud reachable
1026 kuznet@alisa:~ $
1027 \end{verbatim}
1029 Here \verb|ref| is the number of users of this entry
1030 and \verb|used| is a triplet of time intervals in seconds
1031 separated by slashes. In this case they show that:
1033 \begin{enumerate}
1034 \item the entry was used 12 seconds ago.
1035 \item the entry was confirmed 13 seconds ago.
1036 \item the entry was updated 20 seconds ago.
1037 \end{enumerate}
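
The \verb|ref| and \verb|used| fields can be picked out of such a line
mechanically. The Python sketch below assumes the exact field layout of the
example above; \verb|NeighUsage| and \verb|parse_usage| are hypothetical names
chosen for illustration.
\begin{verbatim}
import re
from typing import NamedTuple, Optional

class NeighUsage(NamedTuple):
    ref: int        # number of users of the entry
    used: int       # seconds since the entry was last used
    confirmed: int  # seconds since the entry was last confirmed
    updated: int    # seconds since the entry was last updated

def parse_usage(line: str) -> Optional[NeighUsage]:
    """Return usage statistics, or None if the line has none."""
    m = re.search(r"\bref (\d+) used (\d+)/(\d+)/(\d+)", line)
    if m is None:
        return None
    return NeighUsage(*map(int, m.groups()))
\end{verbatim}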
1039 \subsection{{\tt ip neighbour flush} --- flush neighbour entries}
1041 \paragraph{Abbreviations:} \verb|flush|, \verb|f|.
1043 \paragraph{Description:}This command flushes neighbour tables, selecting
1044 entries to flush by some criteria.
1046 \paragraph{Arguments:} This command has the same arguments as \verb|show|.
1047 The differences are that it does not run when no arguments are given,
1048 and that the default neighbour states to be flushed do not include
1049 \verb|permanent| and \verb|noarp|.
1052 \paragraph{Statistics:} With the \verb|-statistics| option, the command
1053 becomes verbose. It prints out the number of deleted neighbours and the number
1054 of rounds made to flush the neighbour table. If the option is given
1055 twice, \verb|ip neigh flush| also dumps all the deleted neighbours
1056 in the format described in the previous subsection.
1058 \paragraph{Example:}
1059 \begin{verbatim}
1060 netadm@alisa:~ # ip -s -s n f 193.233.7.254
1061 193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 ref 5 used 12/13/20 \
1062     nud reachable
1064 *** Round 1, deleting 1 entries ***
1065 *** Flush is complete after 1 round ***
1066 netadm@alisa:~ #
1067 \end{verbatim}
1070 \section{{\tt ip route} --- routing table management}
1071 \label{IP-ROUTE}
1073 \paragraph{Abbreviations:} \verb|route|, \verb|ro|, \verb|r|.
1075 \paragraph{Object:} \verb|route| entries in the kernel routing tables keep
1076 information about paths to other networked nodes.
1078 Each route entry has a {\em key\/} consisting of a {\em prefix\/}
1079 (i.e.\ a pair containing a network address and the length of its mask) and,
1080 optionally, the TOS value. An IP packet matches the route if the highest
1081 bits of its destination address are equal to the route prefix at least
1082 up to the prefix length and if the TOS of the route is zero or equal to
1083 the TOS of the packet.
1085 If several routes match the packet, the following pruning rules
1086 are used to select the best one (see~\cite{RFC1812}):
1087 \begin{enumerate}
1088 \item The longest matching prefix is selected. All shorter ones
1089 are dropped.
1091 \item If the TOS of some route with the longest prefix is equal to the TOS
1092 of the packet, the routes with different TOS are dropped.
1094 If no exact TOS match was found and routes with TOS=0 exist,
1095 the rest of routes are pruned.
1097 Otherwise, the route lookup fails.
1099 \item If several routes remain after the previous steps, then
1100 the routes with the best preference values are selected.
1102 \item If we still have several routes, then the {\em first\/} of them
1103 is selected.
1105 \begin{NB}
1106  Note the ambiguity of the last step. Unfortunately, Linux
1107  historically allows such a bizarre situation. The sense of the
1108 word ``first'' depends on the order of route additions and it is practically
1109 impossible to maintain a bundle of such routes in this order.
1110 \end{NB}
1112 For simplicity we will limit ourselves to the case where such a situation
1113 is impossible and routes are uniquely identified by the triplet
1114 \{prefix, tos, preference\}. Actually, it is impossible to create
1115 non-unique routes with \verb|ip| commands described in this section.
1117 One useful exception to this rule is the default route on non-forwarding
1118 hosts. It is ``officially'' allowed to have several fallback routes
1119 when several routers are present on directly connected networks.
1120 In this case, Linux-2.2 makes ``dead gateway detection''~\cite{RFC1122}
1121 controlled by neighbour unreachability detection and by advice
1122 from transport protocols to select a working router, so the order
1123 of the routes is not essential. However, in this case,
1124 fiddling with default routes manually is not recommended. Use the Router Discovery
1125 protocol (see Appendix~\ref{EXAMPLE-SETUP}, p.\pageref{EXAMPLE-SETUP})
1126 instead. Actually, Linux-2.2 IPv6 does not give user level applications
1127 any access to default routes.
1128 \end{enumerate}
1130 Certainly, the steps above are not performed exactly
1131 in this sequence. Instead, the routing table in the kernel is kept
1132 in some data structure to achieve the final result
1133 with minimal cost. However, not depending on a particular
1134 routing algorithm implemented in the kernel, we can summarize
1135 the statements above as: a route is identified by the triplet
1136 \{prefix, tos, preference\}. This {\em key\/} lets us locate
1137 the route in the routing table.
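
Purely as an illustration, the pruning steps can be written down in Python.
This is not how the kernel implements the lookup. The names \verb|Route| and
\verb|select_route| are hypothetical, and the sketch assumes that a
numerically lower preference value is the better one, as is the case for
Linux route metrics.
\begin{verbatim}
import ipaddress
from typing import NamedTuple, Optional, Sequence

class Route(NamedTuple):
    prefix: str      # written in full, f.e. "10.0.0.0/8"
    tos: int
    preference: int
    dev: str

def select_route(routes: Sequence[Route], dst: str,
                 tos: int) -> Optional[Route]:
    addr = ipaddress.ip_address(dst)
    nets = [(ipaddress.ip_network(r.prefix), r) for r in routes]
    cand = [(n, r) for n, r in nets if addr in n]
    if not cand:
        return None
    # Step 1: keep only the longest matching prefix.
    longest = max(n.prefixlen for n, _ in cand)
    left = [r for n, r in cand if n.prefixlen == longest]
    # Step 2: prefer an exact TOS match, else fall back to TOS=0,
    # else the lookup fails.
    exact = [r for r in left if r.tos == tos]
    left = exact or [r for r in left if r.tos == 0]
    if not left:
        return None
    # Step 3: keep the best preference (assumed: lowest value).
    best = min(r.preference for r in left)
    left = [r for r in left if r.preference == best]
    # Step 4: the first remaining route wins.
    return left[0]
\end{verbatim}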
1139 \paragraph{Route attributes:} Each route key refers to a routing
1140 information record containing
1141 the data required to deliver IP packets (f.e.\ output device and
next hop router) and some optional attributes (f.e.\ the path MTU or
1143 the preferred source address when communicating with this destination).
1144 These attributes are described in the following subsection.
1146 \paragraph{Route types:} \label{IP-ROUTE-TYPES}
It is important that the set
of required and optional attributes depends on the route {\em type\/}.
1149 The most important route type
1150 is \verb|unicast|. It describes real paths to other hosts.
1151 As a rule, common routing tables contain only such routes. However,
1152 there are other types of routes with different semantics. The
1153 full list of types understood by Linux-2.2 is:
1154 \begin{itemize}
1155 \item \verb|unicast| --- the route entry describes real paths to the
1156 destinations covered by the route prefix.
1157 \item \verb|unreachable| --- these destinations are unreachable. Packets
1158 are discarded and the ICMP message {\em host unreachable\/} is generated.
1159 The local senders get an \verb|EHOSTUNREACH| error.
1160 \item \verb|blackhole| --- these destinations are unreachable. Packets
1161 are discarded silently. The local senders get an \verb|EINVAL| error.
1162 \item \verb|prohibit| --- these destinations are unreachable. Packets
1163 are discarded and the ICMP message {\em communication administratively
1164 prohibited\/} is generated. The local senders get an \verb|EACCES| error.
1165 \item \verb|local| --- the destinations are assigned to this
1166 host. The packets are looped back and delivered locally.
1167 \item \verb|broadcast| --- the destinations are broadcast addresses.
1168 The packets are sent as link broadcasts.
1169 \item \verb|throw| --- a special control route used together with policy
1170 rules (see sec.\ref{IP-RULE}, p.\pageref{IP-RULE}). If such a route is selected, lookup
1171 in this table is terminated pretending that no route was found.
1172 Without policy routing it is equivalent to the absence of the route in the routing
1173 table. The packets are dropped and the ICMP message {\em net unreachable\/}
1174 is generated. The local senders get an \verb|ENETUNREACH| error.
1175 \item \verb|nat| --- a special NAT route. Destinations covered by the prefix
1176 are considered to be dummy (or external) addresses which require translation
1177 to real (or internal) ones before forwarding. The addresses to translate to
1178 are selected with the attribute \verb|via|. More about NAT is
1179 in Appendix~\ref{ROUTE-NAT}, p.\pageref{ROUTE-NAT}.
1180 \item \verb|anycast| --- ({\em not implemented\/}) the destinations are
1181 {\em anycast\/} addresses assigned to this host. They are mainly equivalent
1182 to \verb|local| with one difference: such addresses are invalid when used
1183 as the source address of any packet.
1184 \item \verb|multicast| --- a special type used for multicast routing.
1185 It is not present in normal routing tables.
1186 \end{itemize}
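
The error behaviour of the route types that do not forward packets can be
summarized as a small lookup table. The Python sketch below only restates
the list above; the names \verb|REJECT_BEHAVIOUR|, \verb|icmp_message| and
\verb|local_errno| are hypothetical.
\begin{verbatim}
import errno

# route type -> (ICMP message sent to remote senders,
#                errno returned to local senders)
REJECT_BEHAVIOUR = {
    "unreachable": ("host unreachable", errno.EHOSTUNREACH),
    # blackhole discards silently: no ICMP message at all
    "blackhole": (None, errno.EINVAL),
    "prohibit": ("communication administratively prohibited",
                 errno.EACCES),
    # throw behaves like this when no policy rule supplies
    # another table to continue the lookup in
    "throw": ("net unreachable", errno.ENETUNREACH),
}

def icmp_message(route_type):
    """ICMP message for the type, or None if none is generated."""
    return REJECT_BEHAVIOUR.get(route_type, (None, None))[0]

def local_errno(route_type):
    """errno seen by local senders, or None for other types."""
    return REJECT_BEHAVIOUR.get(route_type, (None, None))[1]
\end{verbatim}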
1188 \paragraph{Route tables:} Linux-2.2 can pack routes into several routing
1189 tables identified by a number in the range from 1 to 255 or by
1190 name from the file \verb|/etc/iproute2/rt_tables|. By default all normal
1191 routes are inserted into the \verb|main| table (ID 254) and the kernel only uses
1192 this table when calculating routes.
1194 Actually, one other table always exists, which is invisible but
1195 even more important. It is the \verb|local| table (ID 255). This table
1196 consists of routes for local and broadcast addresses. The kernel maintains
1197 this table automatically and the administrator usually need not modify it
1198 or even look at it.
1200 The multiple routing tables enter the game when {\em policy routing\/}
1201 is used. See sec.\ref{IP-RULE}, p.\pageref{IP-RULE}.
1202 In this case, the table identifier effectively becomes
1203 one more parameter, which should be added to the triplet
1204 \{prefix, tos, preference\} to uniquely identify the route.
1207 \subsection{{\tt ip route add} --- add a new route\\
1208 	{\tt ip route change} --- change a route\\
1209 	{\tt ip route replace} --- change a route or add a new one}
1210 \label{IP-ROUTE-ADD}
1212 \paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|change|, \verb|chg|;
1213 	\verb|replace|, \verb|repl|.
1216 \paragraph{Arguments:}
1217 \begin{itemize}
1218 \item \verb|to PREFIX| or \verb|to TYPE PREFIX| (default)
1220 --- the destination prefix of the route. If \verb|TYPE| is omitted,
1221 \verb|ip| assumes type \verb|unicast|. Other values of \verb|TYPE|
1222 are listed above. \verb|PREFIX| is an IP or IPv6 address optionally followed
1223 by a slash and the prefix length. If the length of the prefix is missing,
1224 \verb|ip| assumes a full-length host route. There is also a special
1225 \verb|PREFIX| --- \verb|default| --- which is equivalent to IP \verb|0/0| or
1226 to IPv6 \verb|::/0|.
1228 \item \verb|tos TOS| or \verb|dsfield TOS|
1230 --- the Type Of Service (TOS) key. This key has no associated mask and
1231 the longest match is understood as: First, compare the TOS
1232 of the route and of the packet. If they are not equal, then the packet
may still match a route with a zero TOS. \verb|TOS| is either an 8-bit hexadecimal
1234 number or an identifier from {\tt /etc/iproute2/rt\_dsfield}.
1237 \item \verb|metric NUMBER| or \verb|preference NUMBER|
--- the preference value of the route. \verb|NUMBER| is an arbitrary 32-bit number.
1241 \item \verb|table TABLEID|
1243 --- the table to add this route to.
1244 \verb|TABLEID| may be a number or a string from the file
1245 \verb|/etc/iproute2/rt_tables|. If this parameter is omitted,
1246 \verb|ip| assumes the \verb|main| table, with the exception of
1247 \verb|local|, \verb|broadcast| and \verb|nat| routes, which are
1248 put into the \verb|local| table by default.
1250 \item \verb|dev NAME|
1252 --- the output device name.
1254 \item \verb|via ADDRESS|
1256 --- the address of the nexthop router. Actually, the sense of this field depends
1257 on the route type. For normal \verb|unicast| routes it is either the true nexthop
1258 router or, if it is a direct route installed in BSD compatibility mode,
1259 it can be a local address of the interface.
1260 For NAT routes it is the first address of the block of translated IP destinations.
1262 \item \verb|src ADDRESS|
1264 --- the source address to prefer when sending to the destinations
1265 covered by the route prefix.
1267 \item \verb|realm REALMID|
1269 --- the realm to which this route is assigned.
1270 \verb|REALMID| may be a number or a string from the file
1271 \verb|/etc/iproute2/rt_realms|. Sec.\ref{RT-REALMS} (p.\pageref{RT-REALMS})
1272 contains more information on realms.
1274 \item \verb|mtu MTU| or \verb|mtu lock MTU|
1276 --- the MTU along the path to the destination. If the modifier \verb|lock| is
1277 not used, the MTU may be updated by the kernel due to Path MTU Discovery.
If the modifier \verb|lock| is used, no path MTU discovery will be tried:
all packets will be sent without the DF bit in the IPv4 case
or fragmented to the MTU for IPv6.
1282 \item \verb|window NUMBER|
1284 --- the maximal window for TCP to advertise to these destinations,
1285 measured in bytes. It limits maximal data bursts that our TCP
1286 peers are allowed to send to us.
1288 \item \verb|rtt NUMBER|
1290 --- the initial RTT (``Round Trip Time'') estimate.
1293 \item \verb|rttvar NUMBER|
1295 --- \threeonly the initial RTT variance estimate.
1298 \item \verb|ssthresh NUMBER|
1300 --- \threeonly an estimate for the initial slow start threshold.
1303 \item \verb|cwnd NUMBER|
1305 --- \threeonly the clamp for congestion window. It is ignored if the \verb|lock|
1306     flag is not used.
1309 \item \verb|advmss NUMBER|
1311 --- \threeonly the MSS (``Maximal Segment Size'') to advertise to these
1312     destinations when establishing TCP connections. If it is not given,
1313     Linux uses a default value calculated from the first hop device MTU.
1315 \begin{NB}
  If the path to these destinations is asymmetric, this guess may be wrong.
1317 \end{NB}
1319 \item \verb|reordering NUMBER|
1321 --- \threeonly Maximal reordering on the path to this destination.
1322     If it is not given, Linux uses the value selected with \verb|sysctl|
1323     variable \verb|net/ipv4/tcp_reordering|.
1327 \item \verb|nexthop NEXTHOP|
1329 --- the nexthop of a multipath route. \verb|NEXTHOP| is a complex value
1330 with its own syntax similar to the top level argument lists:
1331 \begin{itemize}
1332 \item \verb|via ADDRESS| is the nexthop router.
1333 \item \verb|dev NAME| is the output device.
1334 \item \verb|weight NUMBER| is a weight for this element of a multipath
1335 route reflecting its relative bandwidth or quality.
1336 \end{itemize}
1338 \item \verb|scope SCOPE_VAL|
1340 --- the scope of the destinations covered by the route prefix.
1341 \verb|SCOPE_VAL| may be a number or a string from the file
1342 \verb|/etc/iproute2/rt_scopes|.
1343 If this parameter is omitted,
1344 \verb|ip| assumes scope \verb|global| for all gatewayed \verb|unicast|
1345 routes, scope \verb|link| for direct \verb|unicast| and \verb|broadcast| routes
1346 and scope \verb|host| for \verb|local| routes.
1348 \item \verb|protocol RTPROTO|
1350 --- the routing protocol identifier of this route.
1351 \verb|RTPROTO| may be a number or a string from the file
1352 \verb|/etc/iproute2/rt_protos|. If the routing protocol ID is
1353 not given, \verb|ip| assumes protocol \verb|boot| (i.e.\
1354 it assumes the route was added by someone who doesn't
1355 understand what they are doing). Several protocol values have a fixed interpretation.
1356 Namely:
1357 \begin{itemize}
1358 \item \verb|redirect| --- the route was installed due to an ICMP redirect.
1359 \item \verb|kernel| --- the route was installed by the kernel during
1360 autoconfiguration.
1361 \item \verb|boot| --- the route was installed during the bootup sequence.
1362 If a routing daemon starts, it will purge all of them.
\item \verb|static| --- the route was installed by the administrator
to override dynamic routing. A routing daemon will respect such routes
and, probably, even advertise them to its peers.
\item \verb|ra| --- the route was installed by the Router Discovery protocol.
1367 \end{itemize}
1368 The rest of the values are not reserved and the administrator is free
1369 to assign (or not to assign) protocol tags. At least, routing
1370 daemons should take care of setting some unique protocol values,
1371 f.e.\ as they are assigned in \verb|rtnetlink.h| or in \verb|rt_protos|
1372 database.
1375 \item \verb|onlink|
1377 --- pretend that the nexthop is directly attached to this link,
1378 even if it does not match any interface prefix. One application of this
1379 option may be found in~\cite{IP-TUNNELS}.
1381 \item \verb|equalize|
1383 --- allow packet by packet randomization on multipath routes.
1384 Without this modifier, the route will be frozen to one selected
nexthop, so that load splitting will only occur on a per-flow basis.
1386 \verb|equalize| only works if the kernel is patched.
1389 \end{itemize}
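
The defaults that \verb|ip| applies when \verb|table| or \verb|scope| is
omitted can be sketched as follows. This Python fragment only restates the
rules given in the list above. The function names are hypothetical, the
parameter \verb|has_gateway| stands for the presence of a \verb|via|
argument, and route types whose default scope is not stated above yield
\verb|None|.
\begin{verbatim}
def default_table(route_type):
    """Table used when the `table` argument is omitted."""
    if route_type in ("local", "broadcast", "nat"):
        return "local"
    return "main"

def default_scope(route_type, has_gateway):
    """Scope used when the `scope` argument is omitted."""
    if route_type == "local":
        return "host"
    if route_type == "broadcast":
        return "link"
    if route_type == "unicast":
        # gatewayed routes are global, direct ones are link
        return "global" if has_gateway else "link"
    return None  # not specified in the text above
\end{verbatim}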
1392 \begin{NB}
1393   Actually there are more commands: \verb|prepend| does the same
1394   thing as classic \verb|route add|, i.e.\ adds a route, even if another
1395   route to the same destination exists. Its opposite case is \verb|append|,
1396   which adds the route to the end of the list. Avoid these
1397   features.
1398 \end{NB}
1399 \begin{NB}
  More sad news: IPv6 only understands the \verb|append| command correctly.
1401   All the others are translated into \verb|append| commands. Certainly,
1402   this will change in the future.
1403 \end{NB}
1405 \paragraph{Examples:}
1406 \begin{itemize}
1407 \item add a plain route to network 10.0.0/24 via gateway 193.233.7.65
1408 \begin{verbatim}
1409   ip route add 10.0.0/24 via 193.233.7.65
1410 \end{verbatim}
1411 \item change it to a direct route via the \verb|dummy| device
1412 \begin{verbatim}
1413   ip ro chg 10.0.0/24 dev dummy
1414 \end{verbatim}
1415 \item add a default multipath route splitting the load between \verb|ppp0|
1416 and \verb|ppp1|
1417 \begin{verbatim}
1418   ip route add default scope global nexthop dev ppp0 \
1419                                     nexthop dev ppp1
1420 \end{verbatim}
1421 Note the scope value. It is not necessary but it informs the kernel
1422 that this route is gatewayed rather than direct. Actually, if you
1423 know the addresses of remote endpoints it would be better to use the
1424 \verb|via| parameter.
1425 \item announce that the address 192.203.80.144 is not a real one, but
1426 should be translated to 193.233.7.83 before forwarding
1427 \begin{verbatim}
1428   ip route add nat 192.203.80.144 via 193.233.7.83
1429 \end{verbatim}
Backward translation is set up with policy rules described
1431 in the following section (sec.\ref{IP-RULE}, p.\pageref{IP-RULE}).
1432 \end{itemize}
1434 \subsection{{\tt ip route delete} --- delete a route}
1436 \paragraph{Abbreviations:} \verb|delete|, \verb|del|, \verb|d|.
1438 \paragraph{Arguments:} \verb|ip route del| has the same arguments as
1439 \verb|ip route add|, but their semantics are a bit different.
1441 Key values (\verb|to|, \verb|tos|, \verb|preference| and \verb|table|)
1442 select the route to delete. If optional attributes are present, \verb|ip|
1443 verifies that they coincide with the attributes of the route to delete.
1444 If no route with the given key and attributes was found, \verb|ip route del|
1445 fails.
1446 \begin{NB}
1447 Linux-2.0 had the option to delete a route selected only by prefix address,
1448 ignoring its length (i.e.\ netmask). This option no longer exists
1449 because it was ambiguous. However, look at {\tt ip route flush}
1450 (sec.\ref{IP-ROUTE-FLUSH}, p.\pageref{IP-ROUTE-FLUSH}) which
1451 provides similar and even richer functionality.
1452 \end{NB}
1454 \paragraph{Example:}
1455 \begin{itemize}
\item delete the multipath route created by the command in the previous subsection
1457 \begin{verbatim}
1458   ip route del default scope global nexthop dev ppp0 \
1459                                     nexthop dev ppp1
1460 \end{verbatim}
1461 \end{itemize}
1465 \subsection{{\tt ip route show} --- list routes}
1467 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.
1469 \paragraph{Description:} the command displays the contents of the routing tables
1470 or the route(s) selected by some criteria.
1473 \paragraph{Arguments:}
1474 \begin{itemize}
1475 \item \verb|to SELECTOR| (default)
1477 --- only select routes from the given range of destinations. \verb|SELECTOR|
1478 consists of an optional modifier (\verb|root|, \verb|match| or \verb|exact|)
1479 and a prefix. \verb|root PREFIX| selects routes with prefixes not shorter
1480 than \verb|PREFIX|. F.e.\ \verb|root 0/0| selects the entire routing table.
1481 \verb|match PREFIX| selects routes with prefixes not longer than
1482 \verb|PREFIX|. F.e.\ \verb|match 10.0/16| selects \verb|10.0/16|,
1483 \verb|10/8| and \verb|0/0|, but it does not select \verb|10.1/16| and
1484 \verb|10.0.0/24|. And \verb|exact PREFIX| (or just \verb|PREFIX|)
selects routes with this exact prefix. If none of these options
is present, \verb|ip| assumes \verb|root 0/0| i.e.\ it lists the entire table.
1489 \item \verb|tos TOS| or \verb|dsfield TOS|
1491  --- only select routes with the given TOS.
1494 \item \verb|table TABLEID|
 --- show the routes from the given table(s). The default setting is to show
1497 \verb|table| \verb|main|. \verb|TABLEID| may either be the ID of a real table
1498 or one of the special values:
1499   \begin{itemize}
1500   \item \verb|all| --- list all of the tables.
1501   \item \verb|cache| --- dump the routing cache.
1502   \end{itemize}
1503 \begin{NB}
1504   IPv6 has a single table. However, splitting it into \verb|main|, \verb|local|
1505   and \verb|cache| is emulated by the \verb|ip| utility.
1506 \end{NB}
1508 \item \verb|cloned| or \verb|cached|
1510 --- list cloned routes i.e.\ routes which were dynamically forked from
1511 other routes because some route attribute (f.e.\ MTU) was updated.
1512 Actually, it is equivalent to \verb|table cache|.
1514 \item \verb|from SELECTOR|
1516 --- the same syntax as for \verb|to|, but it binds the source address range
1517 rather than destinations. Note that the \verb|from| option only works with
1518 cloned routes.
1520 \item \verb|protocol RTPROTO|
1522 --- only list routes of this protocol.
1525 \item \verb|scope SCOPE_VAL|
1527 --- only list routes with this scope.
1529 \item \verb|type TYPE|
1531 --- only list routes of this type.
1533 \item \verb|dev NAME|
1535 --- only list routes going via this device.
1537 \item \verb|via PREFIX|
1539 --- only list routes going via the nexthop routers selected by \verb|PREFIX|.
1541 \item \verb|src PREFIX|
1543 --- only list routes with preferred source addresses selected
1544 by \verb|PREFIX|.
1546 \item \verb|realm REALMID| or \verb|realms FROMREALM/TOREALM|
1548 --- only list routes with these realms.
1550 \end{itemize}
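
The meaning of the \verb|root|, \verb|match| and \verb|exact| modifiers of
\verb|SELECTOR| can be illustrated with Python's standard \verb|ipaddress|
module. The function \verb|selects| is a hypothetical helper, not part of
\verb|ip|. Prefixes must be written in full here (f.e.\ \verb|10.0.0.0/16|),
because \verb|ipaddress| does not accept abbreviated forms such as
\verb|10.0/16|.
\begin{verbatim}
import ipaddress

def selects(modifier, selector, route):
    """True if `modifier selector` selects a route with prefix
    `route`. Both prefixes must be of the same IP version."""
    sel = ipaddress.ip_network(selector)
    rt = ipaddress.ip_network(route)
    if modifier == "root":
        # route prefix not shorter: it lies inside the selector
        return rt.subnet_of(sel)
    if modifier == "match":
        # route prefix not longer: it covers the selector
        return sel.subnet_of(rt)
    if modifier == "exact":
        return rt == sel
    raise ValueError("unknown modifier: " + modifier)
\end{verbatim}
With this helper, \verb|selects("match", "10.0.0.0/16", "10.0.0.0/8")| is
true and \verb|selects("match", "10.0.0.0/16", "10.0.0.0/24")| is false,
in agreement with the \verb|match 10.0/16| example above.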
1552 \paragraph{Examples:} Let us count routes of protocol \verb|gated/bgp|
1553 on a router:
1554 \begin{verbatim}
1555 kuznet@amber:~ $ ip ro ls proto gated/bgp | wc
1556    1413    9891    79010
1557 kuznet@amber:~ $
1558 \end{verbatim}
1559 To count the size of the routing cache, we have to use the \verb|-o| option
1560 because cached attributes can take more than one line of output:
1561 \begin{verbatim}
1562 kuznet@amber:~ $ ip -o ro ls cloned | wc
1563    159    2543    18707
1564 kuznet@amber:~ $
1565 \end{verbatim}
1568 \paragraph{Output format:} The output of this command consists
1569 of per route records separated by line feeds.
1570 However, some records may consist
1571 of more than one line: particularly, this is the case when the route
1572 is cloned or you requested additional statistics. If the
1573 \verb|-o| option was given, then line feeds separating lines inside
1574 records are replaced with the backslash sign.
1576 The output has the same syntax as arguments given to {\tt ip route add},
1577 so that it can be understood easily. F.e.\
1578 \begin{verbatim}
1579 kuznet@amber:~ $ ip ro ls 193.233.7/24
1580 193.233.7.0/24 dev eth0  proto gated/conn  scope link \
1581     src 193.233.7.65 realms inr.ac
1582 kuznet@amber:~ $
1583 \end{verbatim}
1585 If you list cloned entries, the output contains other attributes which
1586 are evaluated during route calculation and updated during route
1587 lifetime. An example of the output is:
1588 \begin{verbatim}
1589 kuznet@amber:~ $ ip ro ls 193.233.7.82 tab cache
1590 193.233.7.82 from 193.233.7.82 dev eth0  src 193.233.7.65 \
1591   realms inr.ac/inr.ac
1592     cache <src-direct,redirect>  mtu 1500 rtt 300 iif eth0
1593 193.233.7.82 dev eth0  src 193.233.7.65 realms inr.ac
1594     cache  mtu 1500 rtt 300
1595 kuznet@amber:~ $
1596 \end{verbatim}
1597 \begin{NB}
1598   \label{NB-strange-route}
1599   The route looks a bit strange, doesn't it? Did you notice that
  it is a path from 193.233.7.82 back to 193.233.7.82? Well, you will
1601   see in the section on \verb|ip route get| (p.\pageref{NB-nature-of-strangeness})
1602   how it appeared.
1603 \end{NB}
1604 The second line, starting with the word \verb|cache|, shows
1605 additional attributes which normal routes do not possess.
1606 Cached flags are summarized in angle brackets:
1607 \begin{itemize}
1608 \item \verb|local| --- packets are delivered locally.
1609 It stands for loopback unicast routes, for broadcast routes
1610 and for multicast routes, if this host is a member of the corresponding
1611 group.
1613 \item \verb|reject| --- the path is bad. Any attempt to use it results
1614 in an error. See attribute \verb|error| below (p.\pageref{IP-ROUTE-GET-error}).
1616 \item \verb|mc| --- the destination is multicast.
1618 \item \verb|brd| --- the destination is broadcast.
1620 \item \verb|src-direct| --- the source is on a directly connected
1621 interface.
1623 \item \verb|redirected| --- the route was created by an ICMP Redirect.
1625 \item \verb|redirect| --- packets going via this route will
1626 trigger an ICMP redirect.
1628 \item \verb|fastroute| --- the route is eligible to be used for fastroute.
1630 \item \verb|equalize| --- make packet by packet randomization
1631 along this path.
1633 \item \verb|dst-nat| --- the destination address requires translation.
1635 \item \verb|src-nat| --- the source address requires translation.
1637 \item \verb|masq| --- the source address requires masquerading.
1638 This feature disappeared in linux-2.4.
1640 \item \verb|notify| --- ({\em not implemented}) change/deletion
1641 of this route will trigger RTNETLINK notification.
1642 \end{itemize}
1644 Then some optional attributes follow:
1645 \begin{itemize}
\item \verb|error| --- on \verb|reject| routes, this is the error code
returned to local senders when they try to use this route.
1648 These error codes are translated into ICMP error codes, sent to remote
1649 senders, according to the rules described above in the subsection
1650 devoted to route types (p.\pageref{IP-ROUTE-TYPES}).
1651 \label{IP-ROUTE-GET-error}
1653 \item \verb|expires| --- this entry will expire after this timeout.
1655 \item \verb|iif| --- the packets for this path are expected to arrive
1656 on this interface.
1657 \end{itemize}
1659 \paragraph{Statistics:} With the \verb|-statistics| option, more
1660 information about this route is shown:
1661 \begin{itemize}
1662 \item \verb|users| --- the number of users of this entry.
1663 \item \verb|age| --- shows when this route was last used.
1664 \item \verb|used| --- the number of lookups of this route since its creation.
1665 \end{itemize}
1668 \subsection{{\tt ip route flush} --- flush routing tables}
1669 \label{IP-ROUTE-FLUSH}
1671 \paragraph{Abbreviations:} \verb|flush|, \verb|f|.
1673 \paragraph{Description:} this command flushes routes selected
1674 by some criteria.
\paragraph{Arguments:} the arguments have the same syntax and semantics
as the arguments of \verb|ip route show|, except that the selected routes are
purged rather than listed. The only other difference is the default action: \verb|show|
dumps the whole IP main routing table, whereas \verb|flush| prints the help page.
The reason for this difference does not require any explanation, does it?
1683 \paragraph{Statistics:} With the \verb|-statistics| option, the command
1684 becomes verbose. It prints out the number of deleted routes and the number
1685 of rounds made to flush the routing table. If the option is given
1686 twice, \verb|ip route flush| also dumps all the deleted routes
1687 in the format described in the previous subsection.
1689 \paragraph{Examples:} The first example flushes all the
gatewayed routes from the main table (e.g.\ after a routing daemon crash).
1691 \begin{verbatim}
1692 netadm@amber:~ # ip -4 ro flush scope global type unicast
1693 \end{verbatim}
1694 This option deserves to be put into a scriptlet \verb|routef|.
1695 \begin{NB}
1696 This option was described in the \verb|route(8)| man page borrowed
1697 from BSD, but was never implemented in Linux.
1698 \end{NB}
1700 The second example flushes all IPv6 cloned routes:
1701 \begin{verbatim}
1702 netadm@amber:~ # ip -6 -s -s ro flush cache
1703 3ffe:2400::220:afff:fef4:c5d1 via 3ffe:2400::220:afff:fef4:c5d1 \
1704   dev eth0  metric 0
1705     cache  used 2 age 12sec mtu 1500 rtt 300
1706 3ffe:2400::280:adff:feb7:8034 via 3ffe:2400::280:adff:feb7:8034 \
1707   dev eth0  metric 0
1708     cache  used 2 age 15sec mtu 1500 rtt 300
1709 3ffe:2400::280:c8ff:fe59:5bcc via 3ffe:2400::280:c8ff:fe59:5bcc \
1710   dev eth0  metric 0
1711     cache  users 1 used 1 age 23sec mtu 1500 rtt 300
1712 3ffe:2400:0:1:2a0:ccff:fe66:1878 via 3ffe:2400:0:1:2a0:ccff:fe66:1878 \
1713   dev eth1  metric 0
1714     cache  used 2 age 20sec mtu 1500 rtt 300
1715 3ffe:2400:0:1:a00:20ff:fe71:fb30 via 3ffe:2400:0:1:a00:20ff:fe71:fb30 \
1716   dev eth1  metric 0
1717     cache  used 2 age 33sec mtu 1500 rtt 300
1718 ff02::1 via ff02::1 dev eth1  metric 0
1719     cache  users 1 used 1 age 45sec mtu 1500 rtt 300
1721 *** Round 1, deleting 6 entries ***
1722 *** Flush is complete after 1 round ***
1723 netadm@amber:~ # ip -6 -s -s ro flush cache
1724 Nothing to flush.
1725 netadm@amber:~ #
1726 \end{verbatim}
1728 The third example flushes BGP routing tables after a \verb|gated|
1729 death.
1730 \begin{verbatim}
1731 netadm@amber:~ # ip ro ls proto gated/bgp | wc
1732    1408    9856    78730
1733 netadm@amber:~ # ip -s ro f proto gated/bgp
1735 *** Round 1, deleting 1408 entries ***
1736 *** Flush is complete after 1 round ***
1737 netadm@amber:~ # ip ro f proto gated/bgp
1738 Nothing to flush.
1739 netadm@amber:~ # ip ro ls proto gated/bgp
1740 netadm@amber:~ #
1741 \end{verbatim}
1744 \subsection{{\tt ip route get} --- get a single route}
1745 \label{IP-ROUTE-GET}
1747 \paragraph{Abbreviations:} \verb|get|, \verb|g|.
1749 \paragraph{Description:} this command gets a single route to a destination
1750 and prints its contents exactly as the kernel sees it.
1752 \paragraph{Arguments:}
1753 \begin{itemize}
1754 \item \verb|to ADDRESS| (default)
1756 --- the destination address.
1758 \item \verb|from ADDRESS|
1760 --- the source address.
1762 \item \verb|tos TOS| or \verb|dsfield TOS|
1764 --- the Type Of Service.
1766 \item \verb|iif NAME|
1768 --- the device from which this packet is expected to arrive.
1770 \item \verb|oif NAME|
1772 --- force the output device on which this packet will be routed.
1774 \item \verb|connected|
1776 --- if no source address (option \verb|from|) was given, relookup
1777 the route with the source set to the preferred address received from the first lookup.
1778 If policy routing is used, it may be a different route.
1780 \end{itemize}
1782 Note that this operation is not equivalent to \verb|ip route show|.
1783 \verb|show| shows existing routes. \verb|get| resolves them and
1784 creates new clones if necessary. Essentially, \verb|get|
1785 is equivalent to sending a packet along this path.
1786 If the \verb|iif| argument is not given, the kernel creates a route
1787 to output packets towards the requested destination.
This is equivalent to pinging the destination
and then running {\tt ip route ls cache}; however, no packets are
actually sent. With the \verb|iif| argument, the kernel pretends
1791 that a packet arrived from this interface and searches for
1792 a path to forward the packet.
1794 \paragraph{Output format:} This command outputs routes in the same
1795 format as \verb|ip route ls|.
1797 \paragraph{Examples:}
1798 \begin{itemize}
1799 \item Find a route to output packets to 193.233.7.82:
1800 \begin{verbatim}
1801 kuznet@amber:~ $ ip route get 193.233.7.82
1802 193.233.7.82 dev eth0  src 193.233.7.65 realms inr.ac
1803     cache  mtu 1500 rtt 300
1804 kuznet@amber:~ $
1805 \end{verbatim}
1807 \item Find a route to forward packets arriving on \verb|eth0|
1808 from 193.233.7.82 and destined for 193.233.7.82:
1809 \begin{verbatim}
1810 kuznet@amber:~ $ ip r g 193.233.7.82 from 193.233.7.82 iif eth0
1811 193.233.7.82 from 193.233.7.82 dev eth0  src 193.233.7.65 \
1812   realms inr.ac/inr.ac
1813     cache <src-direct,redirect>  mtu 1500 rtt 300 iif eth0
1814 kuznet@amber:~ $
1815 \end{verbatim}
1816 \begin{NB}
1817   \label{NB-nature-of-strangeness}
1818   This is the command that created the funny route from 193.233.7.82
1819   looped back to 193.233.7.82 (cf.\ NB on~p.\pageref{NB-strange-route}).
1820   Note the \verb|redirect| flag on it.
1821 \end{NB}
1823 \item Find a multicast route for packets arriving on \verb|eth0|
1824 from host 193.233.7.82 and destined for multicast group 224.2.127.254
(it is assumed that a multicast routing daemon is running;
in this case, it is \verb|pimd|):
1827 \begin{verbatim}
1828 kuznet@amber:~ $ ip r g 224.2.127.254 from 193.233.7.82 iif eth0
1829 multicast 224.2.127.254 from 193.233.7.82 dev lo  \
1830   src 193.233.7.65 realms inr.ac/cosmos
1831     cache <mc> iif eth0 Oifs: eth1 pimreg
1832 kuznet@amber:~ $
1833 \end{verbatim}
1834 This route differs from the ones seen before. It contains a ``normal'' part
1835 and a ``multicast'' part. The normal part is used to deliver (or not to
deliver) the packet to local IP listeners. In this case the router
is not a member
of this group, so the route has no \verb|local| flag and only
1839 forwards packets. The output device for such entries is always loopback.
1840 The multicast part consists of an additional \verb|Oifs:| list showing
1841 the output interfaces.
1842 \end{itemize}
1845 It is time for a more complicated example. Let us add an invalid
1846 gatewayed route for a destination which is really directly connected:
1847 \begin{verbatim}
1848 netadm@alisa:~ # ip route add 193.233.7.98 via 193.233.7.254
1849 netadm@alisa:~ # ip route get 193.233.7.98
1850 193.233.7.98 via 193.233.7.254 dev eth0  src 193.233.7.90
1851     cache  mtu 1500 rtt 3072
1852 netadm@alisa:~ #
1853 \end{verbatim}
1854 and probe it with ping:
1855 \begin{verbatim}
1856 netadm@alisa:~ # ping -n 193.233.7.98
1857 PING 193.233.7.98 (193.233.7.98) from 193.233.7.90 : 56 data bytes
1858 From 193.233.7.254: Redirect Host(New nexthop: 193.233.7.98)
1859 64 bytes from 193.233.7.98: icmp_seq=0 ttl=255 time=3.5 ms
1860 From 193.233.7.254: Redirect Host(New nexthop: 193.233.7.98)
1861 64 bytes from 193.233.7.98: icmp_seq=1 ttl=255 time=2.2 ms
1862 64 bytes from 193.233.7.98: icmp_seq=2 ttl=255 time=0.4 ms
1863 64 bytes from 193.233.7.98: icmp_seq=3 ttl=255 time=0.4 ms
1864 64 bytes from 193.233.7.98: icmp_seq=4 ttl=255 time=0.4 ms
1865 ^C
1866 --- 193.233.7.98 ping statistics ---
1867 5 packets transmitted, 5 packets received, 0% packet loss
1868 round-trip min/avg/max = 0.4/1.3/3.5 ms
1869 netadm@alisa:~ #
1870 \end{verbatim}
1871 What happened? Router 193.233.7.254 understood that we have a much
1872 better path to the destination and sent us an ICMP redirect message.
1873 We may retry \verb|ip route get| to see what we have in the routing
1874 tables now:
1875 \begin{verbatim}
1876 netadm@alisa:~ # ip route get 193.233.7.98
1877 193.233.7.98 dev eth0  src 193.233.7.90
1878     cache <redirected>  mtu 1500 rtt 3072
1879 netadm@alisa:~ #
1880 \end{verbatim}
1884 \section{{\tt ip rule} --- routing policy database management}
1885 \label{IP-RULE}
1887 \paragraph{Abbreviations:} \verb|rule|, \verb|ru|.
1889 \paragraph{Object:} \verb|rule|s in the routing policy database control
1890 the route selection algorithm.
1892 Classic routing algorithms used in the Internet make routing decisions
1893 based only on the destination address of packets (and in theory,
1894 but not in practice, on the TOS field). The seminal review of classic
1895 routing algorithms and their modifications can be found in~\cite{RFC1812}.
1897 In some circumstances we want to route packets differently depending not only
1898 on destination addresses, but also on other packet fields: source address,
1899 IP protocol, transport protocol ports or even packet payload.
1900 This task is called ``policy routing''.
1902 \begin{NB}
1903   ``policy routing'' $\neq$ ``routing policy''.
1905 \noindent	``policy routing'' $=$ ``cunning routing''.
1907 \noindent	``routing policy'' $=$ ``routing tactics'' or ``routing plan''.
1908 \end{NB}
1910 To solve this task, the conventional destination based routing table, ordered
1911 according to the longest match rule, is replaced with a ``routing policy
1912 database'' (or RPDB), which selects routes
1913 by executing some set of rules. The rules may have lots of keys of different
1914 natures and therefore they have no natural ordering, but one imposed
1915 by the administrator. Linux-2.2 RPDB is a linear list of rules
1916 ordered by numeric priority value.
1917 RPDB explicitly allows matching a few packet fields:
1919 \begin{itemize}
1920 \item packet source address.
1921 \item packet destination address.
1922 \item TOS.
1923 \item incoming interface (which is packet metadata, rather than a packet field).
1924 \end{itemize}
1926 Matching IP protocols and transport ports is also possible,
1927 indirectly, via \verb|ipchains|, by exploiting their ability
1928 to mark some classes of packets with \verb|fwmark|. Therefore,
1929 \verb|fwmark| is also included in the set of keys checked by rules.
1931 Each policy routing rule consists of a {\em selector\/} and an {\em action\/}
1932 predicate. The RPDB is scanned in the order of increasing priority. The selector
1933 of each rule is applied to \{source address, destination address, incoming
1934 interface, tos, fwmark\} and, if the selector matches the packet,
1935 the action is performed.  The action predicate may return with success.
1936 In this case, it will either give a route or failure indication
1937 and the RPDB lookup is terminated. Otherwise, the RPDB program
1938 continues on the next rule.
1940 What is the action, semantically? The natural action is to select the
1941 nexthop and the output device. This is what
1942 Cisco IOS~\cite{IOS} does. Let us call it ``match \& set''.
1943 The Linux-2.2 approach is more flexible. The action includes
1944 lookups in destination-based routing tables and selecting
1945 a route from these tables according to the classic longest match algorithm.
1946 The ``match \& set'' approach is the simplest case of the Linux one. It is realized
1947 when a second level routing table contains a single default route.
1948 Recall that Linux-2.2 supports multiple tables
1949 managed with the \verb|ip route| command, described in the previous section.
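
The ``match \& set'' case described above can be sketched with the commands
documented in this reference. This is only an illustration: the table number
100, the prefix 198.51.100.0/24, the gateway 192.0.2.1 and the device
\verb|eth1| are hypothetical values, not taken from the configuration
discussed in this document.
\begin{verbatim}
# second level table 100 holds a single default route (hypothetical values)
ip route add default via 192.0.2.1 dev eth1 table 100
# packets with a source in 198.51.100.0/24 are resolved through table 100
ip rule add from 198.51.100.0/24 table 100 prio 1000
# make the change visible to the routing cache
ip route flush cache
\end{verbatim}
Because table 100 contains nothing but the default route, every packet
selected by the rule gets the same nexthop and output device, which is
exactly what a ``match \& set'' rule would do.
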
1951 At startup time the kernel configures the default RPDB consisting of three
1952 rules:
1954 \begin{enumerate}
1955 \item Priority: 0, Selector: match anything, Action: lookup routing
1956 table \verb|local| (ID 255).
1957 The \verb|local| table is a special routing table containing
1958 high priority control routes for local and broadcast addresses.
1960 Rule 0 is special. It cannot be deleted or overridden.
1963 \item Priority: 32766, Selector: match anything, Action: lookup routing
1964 table \verb|main| (ID 254).
1965 The \verb|main| table is the normal routing table containing all non-policy
1966 routes. This rule may be deleted and/or overridden with other
1967 ones by the administrator.
1969 \item Priority: 32767, Selector: match anything, Action: lookup routing
1970 table \verb|default| (ID 253).
1971 The \verb|default| table is empty. It is reserved for some
1972 post-processing if no previous default rules selected the packet.
1973 This rule may also be deleted.
1975 \end{enumerate}
1977 Do not confuse routing tables with rules: rules point to routing tables,
1978 several rules may refer to one routing table and some routing tables
1979 may have no rules pointing to them. If the administrator deletes all the rules
1980 referring to a table, the table is not used, but it still exists
1981 and will disappear only after all the routes contained in it are deleted.
1984 \paragraph{Rule attributes:} Each RPDB entry has additional
attributes. E.g.\ each rule has a pointer to some routing
1986 table. NAT and masquerading rules have an attribute to select new IP
address to translate/masquerade. Besides that, rules share some
optional attributes with routes, namely \verb|realms|.
1989 These values do not override those contained in the routing tables. They
1990 are only used if the route did not select any attributes.
1993 \paragraph{Rule types:} The RPDB may contain rules of the following
1994 types:
1995 \begin{itemize}
1996 \item \verb|unicast| --- the rule prescribes to return the route found
1997 in the routing table referenced by the rule.
1998 \item \verb|blackhole| --- the rule prescribes to silently drop the packet.
1999 \item \verb|unreachable| --- the rule prescribes to generate a ``Network
2000 is unreachable'' error.
2001 \item \verb|prohibit| --- the rule prescribes to generate
2002 ``Communication is administratively prohibited'' error.
2003 \item \verb|nat| --- the rule prescribes to translate the source address
2004 of the IP packet into some other value. More about NAT is
2005 in Appendix~\ref{ROUTE-NAT}, p.\pageref{ROUTE-NAT}.
2006 \end{itemize}
2009 \paragraph{Commands:} \verb|add|, \verb|delete| and \verb|show|
2010 (or \verb|list|).
2012 \subsection{{\tt ip rule add} --- insert a new rule\\
2013 	{\tt ip rule delete} --- delete a rule}
2014 \label{IP-RULE-ADD}
2016 \paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|delete|, \verb|del|,
2017 	\verb|d|.
2019 \paragraph{Arguments:}
2021 \begin{itemize}
2022 \item \verb|type TYPE| (default)
2024 --- the type of this rule. The list of valid types was given in the previous
2025 subsection.
2027 \item \verb|from PREFIX|
2029 --- select the source prefix to match.
2031 \item \verb|to PREFIX|
2033 --- select the destination prefix to match.
2035 \item \verb|iif NAME|
2037 --- select the incoming device to match. If the interface is loopback,
2038 the rule only matches packets originating from this host. This means that you
2039 may create separate routing tables for forwarded and local packets and,
2040 hence, completely segregate them.
2042 \item \verb|tos TOS| or \verb|dsfield TOS|
2044 --- select the TOS value to match.
2046 \item \verb|fwmark MARK|
2048 --- select the \verb|fwmark| value to match.
2050 \item \verb|priority PREFERENCE|
2052 --- the priority of this rule. Each rule should have an explicitly
2053 set {\em unique\/} priority value.
2054 \begin{NB}
2055   Really, for historical reasons \verb|ip rule add| does not require a
2056   priority value and allows them to be non-unique.
  If the user does not supply a priority, it is selected by the kernel.
2058   If the user creates a rule with a priority value that
2059   already exists, the kernel does not reject the request. It adds
2060   the new rule before all old rules of the same priority.
  It is a mistake in design, no more. And it will be fixed one day,
2063   so do not rely on this feature. Use explicit priorities.
2064 \end{NB}
2067 \item \verb|table TABLEID|
--- the routing table identifier to look up if the rule selector matches.
2071 \item \verb|realms FROM/TO|
2073 --- Realms to select if the rule matched and the routing table lookup
2074 succeeded. Realm \verb|TO| is only used if the route did not select
2075 any realm.
2077 \item \verb|nat ADDRESS|
2079 --- The base of the IP address block to translate (for source addresses).
2080 The \verb|ADDRESS| may be either the start of the block of NAT addresses
2081 (selected by NAT routes) or in linux-2.2 a local host address (or even zero).
2082 In the last case the router does not translate the packets,
but masquerades them to this address; this feature disappeared in 2.4.
2084 More about NAT is in Appendix~\ref{ROUTE-NAT},
2085 p.\pageref{ROUTE-NAT}.
2087 \end{itemize}
2089 \paragraph{Warning:} Changes to the RPDB made with these commands
2090 do not become active immediately. It is assumed that after
2091 a script finishes a batch of updates, it flushes the routing cache
2092 with \verb|ip route flush cache|.
2094 \paragraph{Examples:}
2095 \begin{itemize}
2096 \item Route packets with source addresses from 192.203.80/24
2097 according to routing table \verb|inr.ruhep|:
2098 \begin{verbatim}
2099 ip ru add from 192.203.80.0/24 table inr.ruhep prio 220
2100 \end{verbatim}
2102 \item Translate packet source address 193.233.7.83 into 192.203.80.144
2103 and route it according to table \#1 (actually, it is \verb|inr.ruhep|):
2104 \begin{verbatim}
2105 ip ru add from 193.233.7.83 nat 192.203.80.144 table 1 prio 320
2106 \end{verbatim}
2108 \item Delete the unused default rule:
2109 \begin{verbatim}
2110 ip ru del prio 32767
2111 \end{verbatim}
2113 \end{itemize}
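
A few further sketches, built only from the arguments listed above.
The table numbers, the mark value 1 and the prefix 198.51.100.0/24 are
hypothetical; marking the packets themselves is left to the packet filter
and is not shown here.
\begin{verbatim}
# route packets carrying fwmark 1 according to table 100
ip ru add fwmark 1 table 100 prio 500
# give locally originated packets their own table 200
ip ru add iif lo table 200 prio 600
# answer packets from 198.51.100.0/24 with an
# "administratively prohibited" error
ip ru add from 198.51.100.0/24 type prohibit prio 400
# finish the batch of updates by flushing the routing cache
ip route flush cache
\end{verbatim}
Every rule carries an explicit and unique priority, as recommended in the
note on the \verb|priority| argument.
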
2117 \subsection{{\tt ip rule show} --- list rules}
2118 \label{IP-RULE-SHOW}
2120 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.
2123 \paragraph{Arguments:} Good news, this is one command that has no arguments.
2125 \paragraph{Output format:}
2127 \begin{verbatim}
2128 kuznet@amber:~ $ ip ru ls
2129 0:	from all lookup local
2130 200:	from 192.203.80.0/24 to 193.233.7.0/24 lookup main
2131 210:	from 192.203.80.0/24 to 192.203.80.0/24 lookup main
2132 220:	from 192.203.80.0/24 lookup inr.ruhep realms inr.ruhep/radio-msu
2133 300:	from 193.233.7.83 to 193.233.7.0/24 lookup main
2134 310:	from 193.233.7.83 to 192.203.80.0/24 lookup main
2135 320:	from 193.233.7.83 lookup inr.ruhep map-to 192.203.80.144
2136 32766:	from all lookup main
2137 kuznet@amber:~ $
2138 \end{verbatim}
2140 In the first column is the rule priority value followed
2141 by a colon. Then the selectors follow. Each key is prefixed
2142 with the same keyword that was used to create the rule.
2144 The keyword \verb|lookup| is followed by a routing table identifier,
2145 as it is recorded in the file \verb|/etc/iproute2/rt_tables|.
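
A file consistent with the listing above might look as follows. It is only
a sketch: the three reserved identifiers are the ones given for the default
rules earlier in this section, and the entry for \verb|inr.ruhep| assumes
that it is table \#1, as stated in the NAT example of the previous subsection.
\begin{verbatim}
# /etc/iproute2/rt_tables: table ID followed by table name
255     local
254     main
253     default
1       inr.ruhep
\end{verbatim}
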
If the rule does NAT (e.g.\ rule \#320), it is shown by the keyword
2148 \verb|map-to| followed by the start of the block of addresses to map.
The meaning of this example is pretty simple. The prefixes
2151 192.203.80.0/24 and 193.233.7.0/24 form the internal network, but
2152 they are routed differently when the packets leave it.
2153 Besides that, the host 193.233.7.83 is translated into
2154 another prefix to look like 192.203.80.144 when talking
2155 to the outer world.
2159 \section{{\tt ip maddress} --- multicast addresses management}
2160 \label{IP-MADDR}
2162 \paragraph{Object:} \verb|maddress| objects are multicast addresses.
2164 \paragraph{Commands:} \verb|add|, \verb|delete|, \verb|show| (or \verb|list|).
2166 \subsection{{\tt ip maddress show} --- list multicast addresses}
2168 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.
2170 \paragraph{Arguments:}
2172 \begin{itemize}
2174 \item \verb|dev NAME| (default)
2176 --- the device name.
2178 \end{itemize}
2180 \paragraph{Output format:}
2182 \begin{verbatim}
2183 kuznet@alisa:~ $ ip maddr ls dummy
2184 2:  dummy
2185     link  33:33:00:00:00:01
2186     link  01:00:5e:00:00:01
2187     inet  224.0.0.1 users 2
2188     inet6 ff02::1
2189 kuznet@alisa:~ $
2190 \end{verbatim}
2192 The first line of the output shows the interface index and its name.
2193 Then the multicast address list follows. Each line starts with the
protocol identifier. The word \verb|link| denotes a link layer
multicast address.
2197 If a multicast address has more than one user, the number
2198 of users is shown after the \verb|users| keyword.
2200 One additional feature not present in the example above
2201 is the \verb|static| flag, which indicates that the address was joined
2202 with \verb|ip maddr add|. See the following subsection.
2206 \subsection{{\tt ip maddress add} --- add a multicast address\\
2207 	    {\tt ip maddress delete} --- delete a multicast address}
2209 \paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|delete|, \verb|del|, \verb|d|.
2211 \paragraph{Description:} these commands attach/detach
2212 a static link layer multicast address to listen on the interface.
2213 Note that it is impossible to join protocol multicast groups
2214 statically. This command only manages link layer addresses.
2217 \paragraph{Arguments:}
2219 \begin{itemize}
2220 \item \verb|address LLADDRESS| (default)
2222 --- the link layer multicast address.
2224 \item \verb|dev NAME|
2226 --- the device to join/leave this multicast address.
2228 \end{itemize}
2231 \paragraph{Example:} Let us continue with the example from the previous subsection.
2233 \begin{verbatim}
2234 netadm@alisa:~ # ip maddr add 33:33:00:00:00:01 dev dummy
2235 netadm@alisa:~ # ip -0 maddr ls dummy
2236 2:  dummy
2237     link  33:33:00:00:00:01 users 2 static
2238     link  01:00:5e:00:00:01
2239 netadm@alisa:~ # ip maddr del 33:33:00:00:00:01 dev dummy
2240 \end{verbatim}
2242 \begin{NB}
 Neither \verb|ip| nor the kernel checks multicast addresses for validity.
 In particular, this means that you can try to load a unicast address
 instead of a multicast address. Most drivers will ignore such addresses,
 but several (e.g.\ Tulip) will load them into their on-board filter.
 The effects may be strange. Namely, the addresses become additional
 local link addresses and, if you loaded the address of another host
 to the router, expect duplicated packets on the wire.
2250  It is not a bug, but rather a hole in the API and intra-kernel interfaces.
2251  This feature is really more useful for traffic monitoring, but using it
2252  with Linux-2.2 you {\em have to\/} be sure that the host is not
2253  a router and, especially, that it is not a transparent proxy or masquerading
2254  agent.
2255 \end{NB}
2259 \section{{\tt ip mroute} --- multicast routing cache management}
2260 \label{IP-MROUTE}
2262 \paragraph{Abbreviations:} \verb|mroute|, \verb|mr|.
2264 \paragraph{Object:} \verb|mroute| objects are multicast routing cache
2265 entries created by a user level mrouting daemon
(e.g.\ \verb|pimd| or \verb|mrouted|).
2268 Due to the limitations of the current interface to the multicast routing
2269 engine, it is impossible to change \verb|mroute| objects administratively,
2270 so we may only display them. This limitation will be removed
2271 in the future.
2273 \paragraph{Commands:} \verb|show| (or \verb|list|).
2276 \subsection{{\tt ip mroute show} --- list mroute cache entries}
2278 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.
2280 \paragraph{Arguments:}
2282 \begin{itemize}
2283 \item \verb|to PREFIX| (default)
2285 --- the prefix selecting the destination multicast addresses to list.
2288 \item \verb|iif NAME|
2290 --- the interface on which multicast packets are received.
2293 \item \verb|from PREFIX|
2295 --- the prefix selecting the IP source addresses of the multicast route.
2298 \end{itemize}
2300 \paragraph{Output format:}
2302 \begin{verbatim}
2303 kuznet@amber:~ $ ip mroute ls
2304 (193.232.127.6, 224.0.1.39)      Iif: unresolved
2305 (193.232.244.34, 224.0.1.40)     Iif: unresolved
2306 (193.233.7.65, 224.66.66.66)     Iif: eth0       Oifs: pimreg
2307 kuznet@amber:~ $
2308 \end{verbatim}
2310 Each line shows one (S,G) entry in the multicast routing cache,
2311 where S is the source address and G is the multicast group. \verb|Iif| is
2312 the interface on which multicast packets are expected to arrive.
2313 If the word \verb|unresolved| is there instead of the interface name,
2314 it means that the routing daemon still hasn't resolved this entry.
The keyword \verb|Oifs| is followed by a list of output interfaces, separated
by spaces. If a multicast routing entry is created with non-trivial
TTL scope, administrative distances are appended to the device names
in the \verb|Oifs| list.
2320 \paragraph{Statistics:} The \verb|-statistics| option also prints the
2321 number of packets and bytes forwarded along this route and
2322 the number of packets that arrived on the wrong interface, if this number is not zero.
2324 \begin{verbatim}
2325 kuznet@amber:~ $ ip -s mr ls 224.66/16
2326 (193.233.7.65, 224.66.66.66)     Iif: eth0       Oifs: pimreg
2327   9383 packets, 300256 bytes
2328 kuznet@amber:~ $
2329 \end{verbatim}
2332 \section{{\tt ip tunnel} --- tunnel configuration}
2333 \label{IP-TUNNEL}
2335 \paragraph{Abbreviations:} \verb|tunnel|, \verb|tunl|.
2337 \paragraph{Object:} \verb|tunnel| objects are tunnels, encapsulating
2338 packets in IPv4 packets and then sending them over the IP infrastructure.
2340 \paragraph{Commands:} \verb|add|, \verb|delete|, \verb|change|, \verb|show|
2341 (or \verb|list|).
2343 \paragraph{See also:} A more informal discussion of tunneling
2344 over IP and the \verb|ip tunnel| command can be found in~\cite{IP-TUNNELS}.
2346 \subsection{{\tt ip tunnel add} --- add a new tunnel\\
2347 	{\tt ip tunnel change} --- change an existing tunnel\\
2348 	{\tt ip tunnel delete} --- destroy a tunnel}
2350 \paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|change|, \verb|chg|;
2351 \verb|delete|, \verb|del|, \verb|d|.
2354 \paragraph{Arguments:}
2356 \begin{itemize}
2358 \item \verb|name NAME| (default)
2360 --- select the tunnel device name.
2362 \item \verb|mode MODE|
2364 --- set the tunnel mode. Three modes are currently available:
2365 	\verb|ipip|, \verb|sit| and \verb|gre|.
2367 \item \verb|remote ADDRESS|
2369 --- set the remote endpoint of the tunnel.
2371 \item \verb|local ADDRESS|
2373 --- set the fixed local address for tunneled packets.
2374 It must be an address on another interface of this host.
2376 \item \verb|ttl N|
2378 --- set a fixed TTL \verb|N| on tunneled packets.
2379 	\verb|N| is a number in the range 1--255. 0 is a special value
2380 	meaning that packets inherit the TTL value.
2381 		The default value is: \verb|inherit|.
2383 \item \verb|tos T| or \verb|dsfield T|
2385 --- set a fixed TOS \verb|T| on tunneled packets.
2386 		The default value is: \verb|inherit|.
2390 \item \verb|dev NAME|
2392 --- bind the tunnel to the device \verb|NAME| so that
2393 	tunneled packets will only be routed via this device and will
2394 	not be able to escape to another device when the route to endpoint changes.
2396 \item \verb|nopmtudisc|
2398 --- disable Path MTU Discovery on this tunnel.
	It is enabled by default. Note that a fixed TTL is incompatible
	with this option: tunnelling with a fixed TTL always performs PMTU discovery.
2402 \item \verb|key K|, \verb|ikey K|, \verb|okey K|
2404 --- (only GRE tunnels) use keyed GRE with key \verb|K|. \verb|K| is
2405 	either a number or an IP address-like dotted quad.
2406    The \verb|key| parameter sets the key to use in both directions.
2407    The \verb|ikey| and \verb|okey| parameters set different keys for input and output.
2410 \item \verb|csum|, \verb|icsum|, \verb|ocsum|
2412 --- (only GRE tunnels) generate/require checksums for tunneled packets.
2413    The \verb|ocsum| flag calculates checksums for outgoing packets.
2414    The \verb|icsum| flag requires that all input packets have the correct
2415    checksum. The \verb|csum| flag is equivalent to the combination
2416   ``\verb|icsum| \verb|ocsum|''.
2418 \item \verb|seq|, \verb|iseq|, \verb|oseq|
2420 --- (only GRE tunnels) serialize packets.
2421    The \verb|oseq| flag enables sequencing of outgoing packets.
2422    The \verb|iseq| flag requires that all input packets are serialized.
2423    The \verb|seq| flag is equivalent to the combination ``\verb|iseq| \verb|oseq|''.
2425 \begin{NB}
2426  I think this option does not
2427 	work. At least, I did not test it, did not debug it and
2428 	do not even understand how it is supposed to work or for what
2429 	purpose Cisco planned to use it. Do not use it.
2430 \end{NB}
2433 \end{itemize}
\paragraph{Example:} Create a point-to-point IPv6 tunnel with maximal TTL of 32.
2436 \begin{verbatim}
2437 netadm@amber:~ # ip tunl add Cisco mode sit remote 192.31.7.104 \
2438     local 192.203.80.142 ttl 32
2439 \end{verbatim}
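
A keyed GRE tunnel with checksums can be set up in the same way. The
following is a sketch under stated assumptions: the device name \verb|gre1|,
the key 42 and both endpoint addresses are hypothetical, and
\verb|ip link set| is the command described in the section on \verb|ip link|.
\begin{verbatim}
# keyed GRE tunnel, checksums generated and required in both directions
ip tunnel add gre1 mode gre remote 198.51.100.2 local 192.0.2.1 \
    ttl 64 key 42 csum
# the tunnel is an ordinary network device: bring it up and look at it
ip link set gre1 up
ip tunl ls gre1
# destroy the tunnel when it is no longer needed
ip tunnel del gre1
\end{verbatim}
Since a fixed TTL is given, \verb|nopmtudisc| cannot be combined with
this example.
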
2441 \subsection{{\tt ip tunnel show} --- list tunnels}
2443 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.
2446 \paragraph{Arguments:} None.
2448 \paragraph{Output format:}
2449 \begin{verbatim}
2450 kuznet@amber:~ $ ip tunl ls Cisco
2451 Cisco: ipv6/ip  remote 192.31.7.104  local 192.203.80.142  ttl 32
2452 kuznet@amber:~ $
2453 \end{verbatim}
2454 The line starts with the tunnel device name followed by a colon.
2455 Then the tunnel mode follows. The parameters of the tunnel are listed
2456 with the same keywords that were used when creating the tunnel.
2458 \paragraph{Statistics:}
2460 \begin{verbatim}
2461 kuznet@amber:~ $ ip -s tunl ls Cisco
2462 Cisco: ipv6/ip  remote 192.31.7.104  local 192.203.80.142  ttl 32
2463 RX: Packets    Bytes        Errors CsumErrs OutOfSeq Mcasts
2464     12566      1707516      0      0        0        0
2465 TX: Packets    Bytes        Errors DeadLoop NoRoute  NoBufs
2466     13445      1879677      0      0        0        0
2467 kuznet@amber:~ $
2468 \end{verbatim}
2469 Essentially, these numbers are the same as the numbers
2470 printed with {\tt ip -s link show}
2471 (sec.\ref{IP-LINK-SHOW}, p.\pageref{IP-LINK-SHOW}) but the tags are different
2472 to reflect that they are tunnel specific.
2473 \begin{itemize}
2474 \item \verb|CsumErrs| --- the total number of packets dropped
2475 because of checksum failures for a GRE tunnel with checksumming enabled.
2476 \item \verb|OutOfSeq| --- the total number of packets dropped
2477 because they arrived out of sequence for a GRE tunnel with
2478 serialization enabled.
2479 \item \verb|Mcasts| --- the total number of multicast packets
2480 received on a broadcast GRE tunnel.
2481 \item \verb|DeadLoop| --- the total number of packets which were not
2482 transmitted because the tunnel is looped back to itself.
2483 \item \verb|NoRoute| --- the total number of packets which were not
2484 transmitted because there is no IP route to the remote endpoint.
2485 \item \verb|NoBufs| --- the total number of packets which were not
2486 transmitted because the kernel failed to allocate a buffer.
2487 \end{itemize}
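
As a rough illustration of reading these counters from a script, the sketch
below extracts one named counter from text in the format shown above.
The helper name \verb|tunnel_counter| and its calling convention are
hypothetical; only the column layout is taken from the sample listing.
\begin{verbatim}
# Hypothetical helper: print the value of counter $2 from the
# direction row $1 (RX or TX) of "ip -s tunnel show" output on stdin.
tunnel_counter () {
  awk -v dir="$1:" -v name="$2" '
    $1 == dir {               # header row, e.g. "RX: Packets Bytes ..."
      for (i = 2; i <= NF; i++) if ($i == name) col = i - 1
      want = 1; next
    }
    want {                    # the value row follows its header row
      if (col) print $col
      want = 0; col = 0
    }'
}
\end{verbatim}
It could be used as \verb"ip -s tunl ls Cisco | tunnel_counter RX OutOfSeq".
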
2490 \section{{\tt ip monitor} and {\tt rtmon} --- state monitoring}
2491 \label{IP-MONITOR}
2493 The \verb|ip| utility can monitor the state of devices, addresses
and routes continuously. This command has a slightly different format:
the \verb|monitor| keyword comes first on the command line and
the object list follows:
2498 \begin{verbatim}
2499   ip monitor [ file FILE ] [ all | OBJECT-LIST ]
2500 \end{verbatim}
2501 \verb|OBJECT-LIST| is the list of object types that we want to monitor.
2502 It may contain \verb|link|, \verb|address| and \verb|route|.
2503 If no \verb|file| argument is given, \verb|ip| opens RTNETLINK,
2504 listens on it and dumps state changes in the format described
2505 in previous sections.
2507 If a file name is given, it does not listen on RTNETLINK,
2508 but opens the file containing RTNETLINK messages saved in binary format
2509 and dumps them. Such a history file can be generated with the
2510 \verb|rtmon| utility. This utility has a command line syntax similar to
2511 \verb|ip monitor|.
2512 Ideally, \verb|rtmon| should be started before
2513 the first network configuration command is issued. F.e.\ if
2514 you insert:
2515 \begin{verbatim}
2516   rtmon file /var/log/rtmon.log
2517 \end{verbatim}
2518 in a startup script, you will be able to view the full history
2519 later.
2521 Certainly, it is possible to start \verb|rtmon| at any time.
2522 It prepends the history with the state snapshot dumped at the moment
2523 of starting.
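
A history file saved this way can later be replayed with \verb|ip monitor|.
The following is a minimal sketch, assuming the syntax given above; the
wrapper name \verb|replay_history| and the readability check are
hypothetical additions and not part of \verb|iproute2|.
\begin{verbatim}
# Hypothetical wrapper: replay a history file saved by rtmon.
# $1 is the file, the optional $2 is an object list (default: all).
replay_history () {
  file=$1
  objects=${2:-all}
  if [ ! -r "$file" ]; then
    echo "Cannot read history file $file" 1>&2
    return 1
  fi
  ip monitor file "$file" $objects
}
\end{verbatim}
For instance, \verb|replay_history /var/log/rtmon.log route| would dump only
the route changes recorded in the file.
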
2526 \section{Route realms and policy propagation, {\tt rtacct}}
2527 \label{RT-REALMS}
2529 On routers using OSPF ASE or, especially, the BGP protocol, routing
2530 tables may be huge. If we want to classify or to account for the packets
2531 per route, we will have to keep lots of information. Even worse, if we
2532 want to distinguish the packets not only by their destination, but
2533 also by their source, the task gets quadratic complexity and its solution
2534 is physically impossible.
2536 One approach to propagating the policy from routing protocols
2537 to the forwarding engine has been proposed in~\cite{IOS-BGP-PP}.
2538 Essentially, Cisco Policy Propagation via BGP is based on the fact
2539 that dedicated routers all have the RIB (Routing Information Base)
2540 close to the forwarding engine, so policy routing rules can
2541 check all the route attributes, including ASPATH information
2542 and community strings.
2544 The Linux architecture, splitting the RIB (maintained by a user level
2545 daemon) and the kernel based FIB (Forwarding Information Base),
2546 does not allow such a simple approach.
Fortunately, there is another solution
which allows even more flexible policy and richer semantics.
Namely, routes can be clustered together in user space, based on their
attributes.  F.e.\ a BGP router knows a route's ASPATH and its community;
an OSPF router knows the route tag or its area. The administrator, when adding
routes manually, also knows their nature. Provided that the number of such
aggregates (we call them {\em realms\/}) is low, the task of full
classification both by source and destination becomes quite manageable.
2558 So each route may be assigned to a realm. It is assumed that
2559 this identification is made by a routing daemon, but static routes
2560 can also be handled manually with \verb|ip route| (see sec.\ref{IP-ROUTE},
2561 p.\pageref{IP-ROUTE}).
2562 \begin{NB}
2563   There is a patch to \verb|gated|, allowing classification of routes
2564   to realms with all the set of policy rules implemented in \verb|gated|:
2565   by prefix, by ASPATH, by origin, by tag etc.
2566 \end{NB}
2568 To facilitate the construction (f.e.\ in case the routing
2569 daemon is not aware of realms), missing realms may be completed
2570 with routing policy rules, see sec.~\ref{IP-RULE}, p.\pageref{IP-RULE}.
2572 For each packet the kernel calculates a tuple of realms: source realm
2573 and destination realm, using the following algorithm:
2575 \begin{enumerate}
2576 \item If the route has a realm, the destination realm of the packet is set to it.
2577 \item If the rule has a source realm, the source realm of the packet is set to it.
2578 If the destination realm was not inherited from the route and the rule has a destination realm,
2579 it is also set.
2580 \item If at least one of the realms is still unknown, the kernel finds
2581 the reversed route to the source of the packet.
2582 \item If the source realm is still unknown, get it from the reversed route.
2583 \item If one of the realms is still unknown, swap the realms of reversed
2584 routes and apply step 2 again.
2585 \end{enumerate}
After this procedure is completed we know which realm the packet
arrived from and which realm it is going to.
2589 If some of the realms are unknown, they are initialized to zero
2590 (or realm \verb|unknown|).
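
The first four steps and the zero default can be modelled by a few lines of
shell. This is only a simplified sketch and not the kernel code: the function
\verb|resolve_realms| and its arguments are hypothetical, an empty argument
stands for an unknown realm, and step 5 is left out.
\begin{verbatim}
# $1  realm of the route used for forwarding
# $2  source realm of the matching rule, $3 its destination realm
# $4  realm of the reversed route (the route back to the source)
# Prints "SRC-REALM DST-REALM".
resolve_realms () {
  dst=$1                                    # step 1
  src=$2                                    # step 2
  if [ -z "$dst" ] && [ -n "$3" ]; then dst=$3; fi
  if [ -z "$src" ]; then src=$4; fi         # steps 3 and 4
  echo "${src:-0} ${dst:-0}"                # unknown realms become 0
}
\end{verbatim}
With a route in realm 5, a rule carrying only destination realm 7 and
a reversed route in realm 3, the sketch prints \verb|3 5|: the route wins over
the rule for the destination, and the source comes from the reversed route.
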
2592 The main application of realms is the TC \verb|route| classifier~\cite{TC-CREF},
2593 where they are used to help assign packets to traffic classes,
2594 to account, police and schedule them according to this
2595 classification.
2597 A much simpler but still very useful application is incoming packet
2598 accounting by realms. The kernel gathers a packet statistics summary
2599 which can be viewed with the \verb|rtacct| utility.
2600 \begin{verbatim}
2601 kuznet@amber:~ $ rtacct russia
2602 Realm      BytesTo    PktsTo     BytesFrom  PktsFrom
2603 russia     20576778   169176     47080168   153805
2604 kuznet@amber:~ $
2605 \end{verbatim}
2606 This shows that this router received 153805 packets from
2607 the realm \verb|russia| and forwarded 169176 packets to \verb|russia|.
2608 The realm \verb|russia| consists of routes with ASPATHs not leaving
2609 Russia.
2611 Note that locally originating packets are not accounted here,
2612 \verb|rtacct| shows incoming packets only. Using the \verb|route|
2613 classifier (see~\cite{TC-CREF}) you can get even more detailed
2614 accounting information about outgoing packets, optionally
2615 summarizing traffic not only by source or destination, but
2616 by any pair of source and destination realms.
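
The counters are easy to post-process. As an illustration, the sketch below
computes the average size of the packets received from a realm; the helper
\verb|avg_from_size| is hypothetical and relies only on the column order of
the listing above.
\begin{verbatim}
# Print the average size (in bytes, rounded down) of packets received
# from realm $1, reading rtacct output from stdin.
# Columns: Realm BytesTo PktsTo BytesFrom PktsFrom
avg_from_size () {
  awk -v realm="$1" '$1 == realm && $5 > 0 { print int($4 / $5) }'
}
\end{verbatim}
Fed with the sample output, \verb"rtacct russia | avg_from_size russia"
would print 306.
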
2619 \begin{thebibliography}{99}
2620 \addcontentsline{toc}{section}{References}
2621 \bibitem{RFC-NDISC} T.~Narten, E.~Nordmark, W.~Simpson.
2622 ``Neighbor Discovery for IP Version 6 (IPv6)'', RFC-2461.
2624 \bibitem{RFC-ADDRCONF} S.~Thomson, T.~Narten.
2625 ``IPv6 Stateless Address Autoconfiguration'', RFC-2462.
2627 \bibitem{RFC1812} F.~Baker.
2628 ``Requirements for IP Version 4 Routers'', RFC-1812.
2630 \bibitem{RFC1122} R.~T.~Braden.
2631 ``Requirements for Internet hosts --- communication layers'', RFC-1122.
2633 \bibitem{IOS} ``Cisco IOS Release 12.0 Network Protocols
2634 Command Reference, Part 1'' and
2635 ``Cisco IOS Release 12.0 Quality of Service Solutions
2636 Configuration Guide: Configuring Policy-Based Routing'',\\
2637 http://www.cisco.com/univercd/cc/td/doc/product/software/ios120.
2639 \bibitem{IP-TUNNELS} A.~N.~Kuznetsov.
2640 ``Tunnels over IP in Linux-2.2'', \\
2641 In: {\tt ftp://ftp.inr.ac.ru/ip-routing/iproute2-current.tar.gz}.
2643 \bibitem{TC-CREF} A.~N.~Kuznetsov. ``TC Command Reference'',\\
2644 In: {\tt ftp://ftp.inr.ac.ru/ip-routing/iproute2-current.tar.gz}.
2646 \bibitem{IOS-BGP-PP} ``Cisco IOS Release 12.0 Quality of Service Solutions
2647 Configuration Guide: Configuring QoS Policy Propagation via
2648 Border Gateway Protocol'',\\
2649 http://www.cisco.com/univercd/cc/td/doc/product/software/ios120.
2651 \bibitem{RFC-DHCP} R.~Droms.
``Dynamic Host Configuration Protocol'', RFC-2131.
2654 \end{thebibliography}
2659 \appendix
2660 \addcontentsline{toc}{section}{Appendix}
2662 \section{Source address selection}
2663 \label{ADDR-SEL}
2665 When a host creates an IP packet, it must select some source
2666 address. Correct source address selection is a critical procedure,
2667 because it gives the receiver the information needed to deliver a
reply. If the source is selected incorrectly, in the best case
the backward path may differ from the forward one, which
is harmful for performance. In the worst case, when the addresses
are administratively scoped, the reply may be lost entirely.
2673 Linux-2.2 selects source addresses using the following algorithm:
2675 \begin{itemize}
2676 \item
The application may select a source address explicitly with the \verb|bind(2)|
syscall or by supplying it to \verb|sendmsg(2)| via the ancillary data object
2679 \verb|IP_PKTINFO|. In this case the kernel only checks the validity
2680 of the address and never tries to ``improve'' an incorrect user choice,
2681 generating an error instead.
2682 \begin{NB}
2683  Never say ``Never''. The sysctl option \verb|ip_dynaddr| breaks
2684  this axiom. It has been made deliberately with the purpose
2685  of automatically reselecting the address on hosts with dynamic dial-out interfaces.
2686  However, this hack {\em must not\/} be used on multihomed hosts
2687  and especially on routers: it would break them.
2688 \end{NB}
2691 \item Otherwise, IP routing tables can contain an explicit source
2692 address hint for this destination. The hint is set with the \verb|src| parameter
2693 to the \verb|ip route| command, sec.\ref{IP-ROUTE}, p.\pageref{IP-ROUTE}.
2696 \item Otherwise, the kernel searches through the list of addresses
2697 attached to the interface through which the packets will be routed.
2698 The search strategies are different for IP and IPv6. Namely:
2700 \begin{itemize}
2701 \item IPv6 searches for the first valid, not deprecated address
2702 with the same scope as the destination.
2704 \item IP searches for the first valid address with a scope wider
2705 than the scope of the destination but it prefers addresses
which fall into the same subnet as the nexthop of the route
2707 to the destination. Unlike IPv6, the scopes of IPv4 destinations
2708 are not encoded in their addresses but are supplied
2709 in routing tables instead (the \verb|scope| parameter to the \verb|ip route| command,
2710 sec.\ref{IP-ROUTE}, p.\pageref{IP-ROUTE}).
2712 \end{itemize}
2715 \item Otherwise, if the scope of the destination is \verb|link| or \verb|host|,
2716 the algorithm fails and returns a zero source address.
2718 \item Otherwise, all interfaces are scanned to search for an address
2719 with an appropriate scope. The loopback device \verb|lo| is always the first
2720 in the search list, so that if an address with global scope (not 127.0.0.1!)
2721 is configured on loopback, it is always preferred.
2723 \end{itemize}
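
The fallthrough order of this list can be modelled roughly as follows.
This is a sketch only: the function \verb|select_source| and its five
arguments are hypothetical, and the kernel walks real address lists and
compares scopes instead of receiving ready-made candidates.
\begin{verbatim}
# An empty argument means "not available".
# $1  address chosen by the application (bind or IP_PKTINFO)
# $2  src hint taken from the route
# $3  suitable address found on the output interface
# $4  scope of the destination
# $5  suitable address found while scanning all interfaces
select_source () {
  if [ -n "$1" ]; then echo "$1"; return; fi
  if [ -n "$2" ]; then echo "$2"; return; fi
  if [ -n "$3" ]; then echo "$3"; return; fi
  case "$4" in
    link|host) echo 0.0.0.0; return ;;   # the algorithm fails here
  esac
  # Assumption of this sketch: print a zero address if the scan fails.
  echo "${5:-0.0.0.0}"
}
\end{verbatim}
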
2726 \section{Proxy ARP/NDISC}
2727 \label{PROXY-NEIGH}
2729 Routers may answer ARP/NDISC solicitations on behalf of other hosts.
2730 In Linux-2.2 proxy ARP on an interface may be enabled
2731 by setting the kernel \verb|sysctl| variable
2732 \verb|/proc/sys/net/ipv4/conf/<dev>/proxy_arp| to 1. After this, the router
2733 starts to answer ARP requests on the interface \verb|<dev>|, provided
2734 the route to the requested destination does {\em not\/} go back via the same
2735 device.
2737 The variable \verb|/proc/sys/net/ipv4/conf/all/proxy_arp| enables proxy
2738 ARP on all the IP devices.
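
A minimal sketch of setting these variables from a script is given below.
The helper name \verb|enable_proxy_arp| and its optional second argument,
which overrides the directory for dry runs, are hypothetical.
\begin{verbatim}
# Enable proxy ARP on device $1 ("all" for every IP device).
# $2 optionally replaces the sysctl directory, e.g. for testing.
enable_proxy_arp () {
  base=${2:-/proc/sys/net/ipv4/conf}
  if [ ! -w "$base/$1/proxy_arp" ]; then
    echo "No writable proxy_arp variable for $1" 1>&2
    return 1
  fi
  echo 1 > "$base/$1/proxy_arp"
}
\end{verbatim}
For example, \verb|enable_proxy_arp eth0| run as root would switch proxy ARP
on for \verb|eth0| only.
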
2740 However, this approach fails in the case of IPv6 because the router
2741 must join the solicited node multicast address to listen for the corresponding
2742 NDISC queries. It means that proxy NDISC is possible only on a per destination
2743 basis.
2745 Logically, proxy ARP/NDISC is not a kernel task. It can easily be implemented
2746 in user space. However, similar functionality was present in BSD kernels
2747 and in Linux-2.0, so we have to preserve it at least to the extent that
2748 is standardized in BSD.
2749 \begin{NB}
2750   Linux-2.0 ARP had a feature called {\em subnet\/} proxy ARP.
2751   It is replaced with the sysctl flag in Linux-2.2.
2752 \end{NB}
2755 The \verb|ip| utility provides a way to manage proxy ARP/NDISC
2756 with the \verb|ip neigh| command, namely:
2757 \begin{verbatim}
2758   ip neigh add proxy ADDRESS [ dev NAME ]
2759 \end{verbatim}
2760 adds a new proxy ARP/NDISC record and
2761 \begin{verbatim}
2762   ip neigh del proxy ADDRESS [ dev NAME ]
2763 \end{verbatim}
2764 deletes it.
2766 If the name of the device is not given, the router will answer solicitations
2767 for address \verb|ADDRESS| on all devices, otherwise it will only serve
2768 the device \verb|NAME|. Even if the proxy entry is created with
2769 \verb|ip neigh|, the router {\em will not\/} answer a query if the route
2770 to the destination goes back via the interface from which the solicitation
2771 was received.
2773 It is important to emphasize that proxy entries have {\em no\/}
2774 parameters other than these (IP/IPv6 address and optional device).
2775 Particularly, the entry does not store any link layer address.
2776 It always advertises the station address of the interface
on which it sends advertisements (i.e.\ its own station address).
2779 \section{Route NAT status}
2780 \label{ROUTE-NAT}
2782 NAT (or ``Network Address Translation'') remaps some parts
2783 of the IP address space into other ones. Linux-2.2 route NAT is supposed
2784 to be used to facilitate policy routing by rewriting addresses
2785 to other routing domains or to help while renumbering sites
2786 to another prefix.
2788 \paragraph{What it is not:}
2789 It is necessary to emphasize that {\em it is not supposed\/}
2790 to be used to compress address space or to split load.
2791 This is not missing functionality but a design principle.
2792 Route NAT is {\em stateless\/}. It does not hold any state
2793 about translated sessions. This means that it handles any number
2794 of sessions flawlessly. But it also means that it is {\em static\/}.
2795 It cannot detect the moment when the last TCP client stops
2796 using an address. For the same reason, it will not help to split
2797 load between several servers.
2798 \begin{NB}
2799 It is a pretty commonly held belief that it is useful to split load between
2800 several servers with NAT. This is a mistake. All you get from this
2801 is the requirement that the router keep the state of all the TCP connections
2802 going via it. Well, if the router is so powerful, run apache on it. 8)
2803 \end{NB}
2805 The second feature: it does not touch packet payload,
2806 does not try to ``improve'' broken protocols by looking
2807 through its data and mangling it. It mangles IP addresses,
2808 only IP addresses and nothing but IP addresses.
This, too, is not missing functionality.
To summarize: if you need to compress address space or keep
2812 active FTP clients happy, your choice is not route NAT but masquerading,
2813 port forwarding, NAPT etc.
2814 \begin{NB}
2815 By the way, you may also want to look at
2816 http://www.suse.com/\~mha/HyperNews/get/linux-ip-nat.html
2817 \end{NB}
2820 \paragraph{How it works.}
2821 Some part of the address space is reserved for dummy addresses
2822 which will look for all the world like some host addresses
inside your network. No other hosts may use these addresses;
however, other routers may also be configured to translate them.
2825 \begin{NB}
2826 A great advantage of route NAT is that it may be used not
2827 only in stub networks but in environments with arbitrarily complicated
2828 structure. It does not firewall, it {\em forwards.}
2829 \end{NB}
2830 These addresses are selected by the \verb|ip route| command
2831 (sec.\ref{IP-ROUTE-ADD}, p.\pageref{IP-ROUTE-ADD}). F.e.\
2832 \begin{verbatim}
2833   ip route add nat 192.203.80.144 via 193.233.7.83
2834 \end{verbatim}
2835 states that the single address 192.203.80.144 is a dummy NAT address.
2836 For all the world it looks like a host address inside our network.
2837 For neighbouring hosts and routers it looks like the local address
2838 of the translating router. The router answers ARP for it, advertises
2839 this address as routed via it, {\em et al\/}. When the router
2840 receives a packet destined for 192.203.80.144, it replaces
2841 this address with 193.233.7.83 which is the address of some real
2842 host and forwards the packet. If you need to remap
2843 blocks of addresses, you may use a command like:
2844 \begin{verbatim}
2845   ip route add nat 192.203.80.192/26 via 193.233.7.64
2846 \end{verbatim}
This command will map a block of 64 addresses 192.203.80.192-255 to
193.233.7.64-127.
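
The arithmetic behind the block mapping can be checked with a few lines of
shell. A prefix of length 26 covers $2^{32-26}=64$ addresses. The sketch
assumes that the translation keeps the offset of an address inside the block;
both helper functions are hypothetical and hardwired to the example prefixes.
\begin{verbatim}
# Number of addresses in a block with prefix length $1.
block_size () {
  echo $(( 1 << (32 - $1) ))
}

# Map an address from 192.203.80.192/26 to 193.233.7.64/26,
# keeping the offset inside the block (assumption of this sketch).
nat_map () {
  host=${1##*.}               # last octet of the dummy address
  echo "193.233.7.$(( host - 192 + 64 ))"
}
\end{verbatim}
Here \verb|block_size 26| prints 64 and \verb|nat_map 192.203.80.200|
prints 193.233.7.72.
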
2850 When an internal host (193.233.7.83 in the example above)
2851 sends something to the outer world and these packets are forwarded
2852 by our router, it should translate the source address 193.233.7.83
2853 into 192.203.80.144. This task is solved by setting a special
2854 policy rule (sec.\ref{IP-RULE-ADD}, p.\pageref{IP-RULE-ADD}):
2855 \begin{verbatim}
2856   ip rule add prio 320 from 193.233.7.83 nat 192.203.80.144
2857 \end{verbatim}
2858 This rule says that the source address 193.233.7.83
2859 should be translated into 192.203.80.144 before forwarding.
2860 It is important that the address after the \verb|nat| keyword
2861 is some NAT address, declared by {\tt ip route add nat}.
2862 If it is just a random address the router will not map to it.
2863 \begin{NB}
2864 The exception is when the address is a local address of this
2865 router (or 0.0.0.0) and masquerading is configured in the linux-2.2
2866 kernel. In this case the router will masquerade the packets as this address.
If 0.0.0.0 is selected, the result is equivalent to the one
obtained with firewalling rules. Otherwise, this gives you a way
to order Linux to masquerade to this fixed address.
The NAT mechanism used in linux-2.4 is more flexible than
masquerading, so this feature has lost its meaning and has been disabled.
2872 \end{NB}
2874 If the network has non-trivial internal structure, it is
2875 useful and even necessary to add rules disabling translation
2876 when a packet does not leave this network. Let us return to the
2877 example from sec.\ref{IP-RULE-SHOW} (p.\pageref{IP-RULE-SHOW}).
2878 \begin{verbatim}
2879 300:	from 193.233.7.83 to 193.233.7.0/24 lookup main
2880 310:	from 193.233.7.83 to 192.203.80.0/24 lookup main
2881 320:	from 193.233.7.83 lookup inr.ruhep map-to 192.203.80.144
2882 \end{verbatim}
2883 This block of rules causes normal forwarding when
2884 packets from 193.233.7.83 do not leave networks 193.233.7/24
2885 and 192.203.80/24. Also, if the \verb|inr.ruhep| table does not
2886 contain a route to the destination (which means that the routing
2887 domain owning addresses from 192.203.80/24 is dead), no translation
2888 will occur. Otherwise, the packets are translated.
2890 \paragraph{How to only translate selected ports:}
2891 If you only want to translate selected ports (f.e.\ http)
2892 and leave the rest intact, you may use \verb|ipchains|
2893 to \verb|fwmark| a class of packets.
2894 Suppose you did and all the packets from 193.233.7.83
2895 destined for port 80 are marked with marker 0x1234 in input fwchain.
2896 In this case you may replace rule \#320 with:
2897 \begin{verbatim}
2898 320:	from 193.233.7.83 fwmark 1234 lookup main map-to 192.203.80.144
2899 \end{verbatim}
2900 and translation will only be enabled for outgoing http requests.
2902 \section{Example: minimal host setup}
2903 \label{EXAMPLE-SETUP}
The following script gives an example of a fail-safe
2906 setup of IP (and IPv6, if it is compiled into the kernel)
2907 in the common case of a node attached to a single broadcast
2908 network. A more advanced script, which may be used both on multihomed
2909 hosts and on routers, is described in the following
2910 section.
2912 The utilities used in the script may be found in the
2913 directory ftp://ftp.inr.ac.ru/ip-routing/:
2914 \begin{enumerate}
2915 \item \verb|ip| --- package \verb|iproute2|.
2916 \item \verb|arping| --- package \verb|iputils|.
2917 \item \verb|rdisc| --- package \verb|iputils|.
2918 \end{enumerate}
2919 \begin{NB}
2920 It also refers to a DHCP client, \verb|dhcpcd|. I should refrain from
2921 recommending a good DHCP client to use. All that I can
2922 say is that ISC \verb|dhcp-2.0b1pl6| patched with the patch that
2923 can be found in the \verb|dhcp.bootp.rarp| subdirectory of
2924 the same ftp site {\em does\/} work,
2925 at least on Ethernet and Token Ring.
2926 \end{NB}
2928 \begin{verbatim}
2929 #! /bin/bash
2930 \end{verbatim}
2931 \begin{flushleft}
2932 \# {\bf Usage: \verb|ifone ADDRESS[/PREFIX-LENGTH] [DEVICE]|}\\
2933 \# {\bf Parameters:}\\
2934 \# \$1 --- Static IP address, optionally followed by prefix length.\\
\# \$2 --- Device name. If it is missing, \verb|eth0| is assumed.\\
2936 \# F.e. \verb|ifone 193.233.7.90|
2937 \end{flushleft}
2938 \begin{verbatim}
2939 dev=$2
2940 : ${dev:=eth0}
2941 ipaddr=
2942 \end{verbatim}
2943 \# Parse IP address, splitting prefix length.
2944 \begin{verbatim}
2945 if [ "$1" != "" ]; then
2946   ipaddr=${1%/*}
2947   if [ "$1" != "$ipaddr" ]; then
2948     pfxlen=${1#*/}
2949   fi
2950   : ${pfxlen:=24}
2951 fi
2952 pfx="${ipaddr}/${pfxlen}"
2953 \end{verbatim}
2955 \begin{flushleft}
2956 \# {\bf Step 0} --- enable loopback.\\
\# This step is necessary on any networked box before attempting\\
\# to configure any other device.\\
2960 \end{flushleft}
2961 \begin{verbatim}
2962 ip link set up dev lo
2963 ip addr add 127.0.0.1/8 dev lo brd + scope host
2964 \end{verbatim}
2965 \begin{flushleft}
\# IPv6 autoconfigures itself on loopback.\\
\# If the user gave loopback as the device, we add the address as an alias and exit.
2969 \end{flushleft}
2970 \begin{verbatim}
2971 if [ "$dev" = "lo" ]; then
2972   if [ "$ipaddr" != "" -a  "$ipaddr" != "127.0.0.1" ]; then
2973     ip address add $ipaddr dev $dev
2974     exit $?
2975   fi
2976   exit 0
2977 fi
2978 \end{verbatim}
2980 \noindent\# {\bf Step 1} --- enable device \verb|$dev|
2982 \begin{verbatim}
2983 if ! ip link set up dev $dev ; then
2984   echo "Cannot enable interface $dev. Aborting." 1>&2
2985   exit 1
2986 fi
2987 \end{verbatim}
2988 \begin{flushleft}
2989 \# The interface is \verb|UP|. IPv6 started stateless autoconfiguration itself,\\
2990 \# and its configuration finishes here. However,\\
2991 \# IP still needs some static preconfigured address.
2992 \end{flushleft}
2993 \begin{verbatim}
2994 if [ "$ipaddr" = "" ]; then
2995   echo "No address for $dev is configured, trying DHCP..." 1>&2
2996   dhcpcd
2997   exit $?
2998 fi
2999 \end{verbatim}
3001 \begin{flushleft}
3002 \# {\bf Step 2} --- IP Duplicate Address Detection~\cite{RFC-DHCP}.\\
3003 \# Send two probes and wait for result for 3 seconds.\\
\# If the interface comes up more slowly, f.e.\ due to long media detection,\\
\# you may want to increase the timeout.\\
3006 \end{flushleft}
3007 \begin{verbatim}
3008 if ! arping -q -c 2 -w 3 -D -I $dev $ipaddr ; then
3009   echo "Address $ipaddr is busy, trying DHCP..." 1>&2
3010   dhcpcd
3011   exit $?
3012 fi
3013 \end{verbatim}
3014 \begin{flushleft}
3015 \# OK, the address is unique, we may add it on the interface.\\
3017 \# {\bf Step 3} --- Configure the address on the interface.
3018 \end{flushleft}
3020 \begin{verbatim}
3021 if ! ip address add $pfx brd + dev $dev; then
3022   echo "Failed to add $pfx on $dev, trying DHCP..." 1>&2
3023   dhcpcd
3024   exit $?
3025 fi
3026 \end{verbatim}
3028 \noindent\# {\bf Step 4} --- Announce our presence on the link.
3029 \begin{verbatim}
3030 arping -A -c 1 -I $dev $ipaddr
3031 noarp=$?
3032 ( sleep 2;
3033   arping -U -c 1 -I $dev $ipaddr ) >& /dev/null </dev/null &
3034 \end{verbatim}
3036 \begin{flushleft}
3037 \# {\bf Step 5} (optional) --- Add some control routes.\\
3039 \# 1. Prohibit link local multicast addresses.\\
3040 \# 2. Prohibit link local (alias, limited) broadcast.\\
3041 \# 3. Add default multicast route.
3042 \end{flushleft}
3043 \begin{verbatim}
3044 ip route add unreachable 224.0.0.0/24
3045 ip route add unreachable 255.255.255.255
3046 if [ `ip link ls $dev | grep -c MULTICAST` -ge 1 ]; then
3047   ip route add 224.0.0.0/4 dev $dev scope global
3048 fi
3049 \end{verbatim}
3051 \begin{flushleft}
3052 \# {\bf Step 6} --- Add fallback default route with huge metric.\\
3053 \# If a proxy ARP server is present on the interface, we will be\\
3054 \# able to talk to all the Internet without further configuration.\\
3055 \# It is not so cheap though and we still hope that this route\\
\# will be overridden by a more correct one from rdisc.\\
\# Do not perform this step if the device is not ARPable,\\
\# because dead nexthop detection does not work on such devices.
3059 \end{flushleft}
3060 \begin{verbatim}
3061 if [ "$noarp" = "0" ]; then
3062   ip ro add default dev $dev metric 30000 scope global
3063 fi
3064 \end{verbatim}
3066 \begin{flushleft}
3067 \# {\bf Step 7} --- Restart router discovery and exit.
3068 \end{flushleft}
3069 \begin{verbatim}
3070 killall -HUP rdisc || rdisc -fs
3071 exit 0
3072 \end{verbatim}
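
The address parsing at the top of the script relies on shell parameter
expansion, which may be unfamiliar. The sketch below isolates that step in
a function so that it can be tried separately; the name \verb|split_prefix|
is hypothetical, while the expansions and the default length of 24 are those
of the script.
\begin{verbatim}
# Split ADDRESS[/PREFIX-LENGTH] and print "ADDRESS LENGTH".
split_prefix () {
  addr=${1%/*}              # remove the shortest suffix matching "/*"
  len=
  if [ "$1" != "$addr" ]; then
    len=${1#*/}             # remove the shortest prefix matching "*/"
  fi
  echo "$addr ${len:-24}"   # the length defaults to 24, as in the script
}
\end{verbatim}
For example, \verb|split_prefix 193.233.7.90/25| prints
\verb|193.233.7.90 25|, and without a slash the length falls back to 24.
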
3075 \section{Example: {\protect\tt ifcfg} --- interface address management}
3076 \label{EXAMPLE-IFCFG}
3078 This is a simplistic script replacing one option of \verb|ifconfig|,
3079 namely, IP address management. It not only adds
3080 addresses, but also carries out Duplicate Address Detection~\cite{RFC-DHCP},
3081 sends unsolicited ARP to update the caches of other hosts sharing
3082 the interface, adds some control routes and restarts Router Discovery
3083 when it is necessary.
3085 I strongly recommend using it {\em instead\/} of \verb|ifconfig| both
3086 on hosts and on routers.
3088 \begin{verbatim}
3089 #! /bin/bash
3090 \end{verbatim}
3091 \begin{flushleft}
3092 \# {\bf Usage: \verb?ifcfg DEVICE[:ALIAS] [add|del] ADDRESS[/LENGTH] [PEER]?}\\
3093 \# {\bf Parameters:}\\
\# ---Device name. It may have an alias suffix, separated by a colon.\\
3095 \# ---Command: add, delete or stop.\\
3096 \# ---IP address, optionally followed by prefix length.\\
3097 \# ---Optional peer address for pointopoint interfaces.\\
3098 \# F.e. \verb|ifcfg eth0 193.233.7.90/24|
\noindent\# This function determines whether this is a router or a host.\\
\# It returns 0 if the host is apparently not a router.
3102 \end{flushleft}
3103 \begin{verbatim}
3104 CheckForwarding () {
3105   local sbase fwd
3106   sbase=/proc/sys/net/ipv4/conf
3107   fwd=0
3108   if [ -d $sbase ]; then
3109     for dir in $sbase/*/forwarding; do
3110       fwd=$[$fwd + `cat $dir`]
3111     done
3112   else
3113     fwd=2
3114   fi
  return $fwd
}
3117 \end{verbatim}
3118 \begin{flushleft}
3119 \# This function restarts Router Discovery.\\
3120 \end{flushleft}
3121 \begin{verbatim}
3122 RestartRDISC () {
  killall -HUP rdisc || rdisc -fs
}
3125 \end{verbatim}
3126 \begin{flushleft}
3127 \# Calculate ABC "natural" mask length\\
3128 \# Arg: \$1 = dotquad address
3129 \end{flushleft}
3130 \begin{verbatim}
3131 ABCMaskLen () {
3132   local class;
3133   class=${1%%.*}
3134   if [ $class -eq 0 -o $class -ge 224 ]; then return 0
3135   elif [ $class -ge 192 ]; then return 24
3136   elif [ $class -ge 128 ]; then return 16
  else  return 8 ; fi
}
3139 \end{verbatim}
3142 \begin{flushleft}
3143 \# {\bf MAIN()}\\
3145 \# Strip alias suffix separated by colon.
3146 \end{flushleft}
3147 \begin{verbatim}
3148 label="label $1"
3149 ldev=$1
3150 dev=${1%:*}
3151 if [ "$dev" = "" -o "$1" = "help" ]; then
  echo "Usage: ifcfg DEV [[add|del] [ADDR[/LEN]] [PEER] | stop]" 1>&2
3153   echo "       add - add new address" 1>&2
3154   echo "       del - delete address" 1>&2
3155   echo "       stop - completely disable IP" 1>&2
3156   exit 1
3157 fi
3158 shift
3160 CheckForwarding
3161 fwd=$?
3162 \end{verbatim}
3163 \begin{flushleft}
3164 \# Parse command. If it is ``stop'', flush and exit.
3165 \end{flushleft}
3166 \begin{verbatim}
3167 deleting=0
3168 case "$1" in
3169 add) shift ;;
3170 stop)
3171   if [ "$ldev" != "$dev" ]; then
3172     echo "Cannot stop alias $ldev" 1>&2
3173     exit 1;
3174   fi
3175   ip -4 addr flush dev $dev $label || exit 1
3176   if [ $fwd -eq 0 ]; then RestartRDISC; fi
3177   exit 0 ;;
3178 del*)
3179   deleting=1; shift ;;
3181 esac
3182 \end{verbatim}
3183 \begin{flushleft}
3184 \# Parse prefix, split prefix length, separated by slash.
3185 \end{flushleft}
3186 \begin{verbatim}
3187 ipaddr=
3188 pfxlen=
3189 if [ "$1" != "" ]; then
3190   ipaddr=${1%/*}
3191   if [ "$1" != "$ipaddr" ]; then
3192     pfxlen=${1#*/}
3193   fi
3194   if [ "$ipaddr" = "" ]; then
3195     echo "$1 is bad IP address." 1>&2
3196     exit 1
3197   fi
3198 fi
3199 shift
3200 \end{verbatim}
3201 \begin{flushleft}
3202 \# If peer address is present, prefix length is 32.\\
3203 \# Otherwise, if prefix length was not given, guess it.
3204 \end{flushleft}
3205 \begin{verbatim}
3206 peer=$1
3207 if [ "$peer" != "" ]; then
3208   if [ "$pfxlen" != "" -a "$pfxlen" != "32" ]; then
3209     echo "Peer address with non-trivial netmask." 1>&2
3210     exit 1
3211   fi
3212   pfx="$ipaddr peer $peer"
3213 else
3214   if [ "$pfxlen" = "" ]; then
3215     ABCMaskLen $ipaddr
3216     pfxlen=$?
3217   fi
3218   pfx="$ipaddr/$pfxlen"
3219 fi
3220 if [ "$ldev" = "$dev" -a "$ipaddr" != "" ]; then
3221   label=
3222 fi
3223 \end{verbatim}
3224 \begin{flushleft}
3225 \# If deletion was requested, delete the address and restart RDISC
3226 \end{flushleft}
3227 \begin{verbatim}
3228 if [ $deleting -ne 0 ]; then
3229   ip addr del $pfx dev $dev $label || exit 1
3230   if [ $fwd -eq 0 ]; then RestartRDISC; fi
3231   exit 0
3232 fi
3233 \end{verbatim}
3234 \begin{flushleft}
3235 \# Start interface initialization.\\
3237 \# {\bf Step 0} --- enable device \verb|$dev|
3238 \end{flushleft}
3239 \begin{verbatim}
3240 if ! ip link set up dev $dev ; then
3241   echo "Error: cannot enable interface $dev." 1>&2
3242   exit 1
3243 fi
3244 if [ "$ipaddr" = "" ]; then exit 0; fi
3245 \end{verbatim}
3246 \begin{flushleft}
3247 \# {\bf Step 1} --- IP Duplicate Address Detection~\cite{RFC-DHCP}.\\
3248 \# Send two probes and wait for result for 3 seconds.\\
\# If the interface comes up more slowly, f.e.\ due to long media detection,\\
\# you may want to increase the timeout.\\
3251 \end{flushleft}
3252 \begin{verbatim}
3253 if ! arping -q -c 2 -w 3 -D -I $dev $ipaddr ; then
3254   echo "Error: some host already uses address $ipaddr on $dev." 1>&2
3255   exit 1
3256 fi
3257 \end{verbatim}
3258 \begin{flushleft}
3259 \# OK, the address is unique. We may add it to the interface.\\
3261 \# {\bf Step 2} --- Configure the address on the interface.
3262 \end{flushleft}
3263 \begin{verbatim}
3264 if ! ip address add $pfx brd + dev $dev $label; then
3265   echo "Error: failed to add $pfx on $dev." 1>&2
3266   exit 1
3267 fi
3268 \end{verbatim}
3269 \noindent\# {\bf Step 3} --- Announce our presence on the link
3270 \begin{verbatim}
3271 arping -q -A -c 1 -I $dev $ipaddr
3272 noarp=$?
3273 ( sleep 2 ;
3274   arping -q -U -c 1 -I $dev $ipaddr ) >& /dev/null </dev/null &
3275 \end{verbatim}
3276 \begin{flushleft}
3277 \# {\bf Step 4} (optional) --- Add some control routes.\\
3279 \# 1. Prohibit link local multicast addresses.\\
3280 \# 2. Prohibit link local (alias, limited) broadcast.\\
3281 \# 3. Add default multicast route.
3282 \end{flushleft}
3283 \begin{verbatim}
3284 ip route add unreachable 224.0.0.0/24 >& /dev/null
3285 ip route add unreachable 255.255.255.255 >& /dev/null
3286 if [ `ip link ls $dev | grep -c MULTICAST` -ge 1 ]; then
3287   ip route add 224.0.0.0/4 dev $dev scope global >& /dev/null
3288 fi
3289 \end{verbatim}
3290 \begin{flushleft}
3291 \# {\bf Step 5} --- Add fallback default route with huge metric.\\
3292 \# If a proxy ARP server is present on the interface, we will be\\
3293 \# able to talk to all the Internet without further configuration.\\
\# Do not perform this step on a router or if the device is not ARPable,\\
\# because dead nexthop detection does not work on them.
3296 \end{flushleft}
3297 \begin{verbatim}
3298 if [ $fwd -eq 0 ]; then
3299   if [ $noarp -eq 0 ]; then
3300     ip ro append default dev $dev metric 30000 scope global
3301   elif [ "$peer" != "" ]; then
3302     if ping -q -c 2 -w 4 $peer ; then
3303       ip ro append default via $peer dev $dev metric 30001
3304     fi
3305   fi
3306   RestartRDISC
3307 fi
3309 exit 0
3310 \end{verbatim}
3311 \begin{flushleft}
3312 \# End of {\bf MAIN()}
3313 \end{flushleft}
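
Two small pieces of the script can be tried in isolation. The sketch below
restates the classful mask guess and the alias stripping as functions that
print their result instead of using the return status; the names
\verb|abc_mask_len| and \verb|strip_alias| are hypothetical, and the logic
follows \verb|ABCMaskLen| and the \verb|dev| assignment of the script.
\begin{verbatim}
# Classful ("natural") prefix length for the dotted quad $1.
abc_mask_len () {
  class=${1%%.*}            # first octet
  if [ "$class" -eq 0 ] || [ "$class" -ge 224 ]; then echo 0
  elif [ "$class" -ge 192 ]; then echo 24
  elif [ "$class" -ge 128 ]; then echo 16
  else echo 8; fi
}

# Strip an alias suffix separated by a colon: eth0:1 becomes eth0.
strip_alias () {
  echo "${1%:*}"
}
\end{verbatim}
For example, \verb|abc_mask_len 193.233.7.90| prints 24, which is the length
the script would guess for \verb|ifcfg eth0 193.233.7.90|.
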
3316 \end{document}