1\documentstyle[12pt,twoside]{article}
2\def\TITLE{IP Command Reference}
3\input preamble
4\begin{center}
5\Large\bf IP Command Reference.
6\end{center}
7
8
9\begin{center}
10{ \large Alexey~N.~Kuznetsov } \\
11\em Institute for Nuclear Research, Moscow \\
12\verb|kuznet@ms2.inr.ac.ru| \\
13\rm April 14, 1999
14\end{center}
15
16\vspace{5mm}
17
18\tableofcontents
19
20\newpage
21
22\section{About this document}
23
24This document presents a comprehensive description of the \verb|ip| utility
25from the \verb|iproute2| package. It is not a tutorial or user's guide.
26It is a {\em dictionary\/}, not explaining terms,
27but translating them into other terms, which may also be unknown to the reader.
28However, the document is self-contained and the reader, provided they have a
29basic networking background, will find enough information
30and examples to understand and configure Linux-2.2 IP and IPv6
31networking.
32
33This document is split into sections explaining \verb|ip| commands
34and options, decrypting \verb|ip| output and containing a few examples.
35More voluminous examples and some topics, which require more elaborate
36discussion, are in the appendix.
37
38The paragraphs beginning with NB contain side notes, warnings about
39bugs and design drawbacks. They may be skipped at the first reading.
40
41\section{{\tt ip} --- command syntax}
42
43The generic form of an \verb|ip| command is:
44\begin{verbatim}
45ip [ OPTIONS ] OBJECT [ COMMAND [ ARGUMENTS ]]
46\end{verbatim}
47where \verb|OPTIONS| is a set of optional modifiers affecting the
48general behaviour of the \verb|ip| utility or changing its output. All options
49begin with the character \verb|'-'| and may be used in either long or abbreviated 
50forms. Currently, the following options are available:
51
52\begin{itemize}
53\item \verb|-V|, \verb|-Version|
54
55--- print the version of the \verb|ip| utility and exit.
56
57
58\item \verb|-s|, \verb|-stats|, \verb|-statistics|
59
60--- output more information. If the option
61appears twice or more, the amount of information increases.
62As a rule, the information is statistics or some time values.
63
64
65\item \verb|-f|, \verb|-family| followed by a protocol family
66identifier: \verb|inet|, \verb|inet6| or \verb|link|.
67
68--- enforce the protocol family to use. If the option is not present,
69the protocol family is guessed from other arguments. If the rest of the command
70line does not give enough information to guess the family, \verb|ip| falls back to the default
71one, usually \verb|inet| or \verb|any|. \verb|link| is a special family
72identifier meaning that no networking protocol is involved.
73
74\item \verb|-4|
75
76--- shortcut for \verb|-family inet|.
77
78\item \verb|-6|
79
80--- shortcut for \verb|-family inet6|.
81
82\item \verb|-0|
83
84--- shortcut for \verb|-family link|.
85
86
87\item \verb|-o|, \verb|-oneline|
88
89--- output each record on a single line, replacing line feeds
90with the \verb|'\'| character. This is convenient when you want to
91count records with \verb|wc| or to \verb|grep| the output. The trivial
92script \verb|rtpr| converts the output back into readable form.
93
94\item \verb|-r|, \verb|-resolve|
95
96--- use the system's name resolver to print DNS names instead of
97host addresses.
98
99\begin{NB}
100 Do not use this option when reporting bugs or asking for advice.
101\end{NB}
102\begin{NB}
103 \verb|ip| never uses DNS to resolve names to addresses.
104\end{NB}
105
106\end{itemize}
107
108\verb|OBJECT| is the object to manage or to get information about.
109The object types currently understood by \verb|ip| are:
110
111\begin{itemize}
112\item \verb|link| --- network device
113\item \verb|address| --- protocol (IP or IPv6) address on a device
114\item \verb|neighbour| --- ARP or NDISC cache entry
115\item \verb|route| --- routing table entry
116\item \verb|rule| --- rule in routing policy database
117\item \verb|maddress| --- multicast address
118\item \verb|mroute| --- multicast routing cache entry
119\item \verb|tunnel| --- tunnel over IP
120\end{itemize}
121
122Again, the names of all objects may be written in full or
123abbreviated form, f.e.\ \verb|address| is abbreviated as \verb|addr|
124or just \verb|a|.
125
126\verb|COMMAND| specifies the action to perform on the object.
127The set of possible actions depends on the object type.
128As a rule, it is possible to \verb|add|, \verb|delete| and
129\verb|show| (or \verb|list|) objects, but some objects
130do not allow all of these operations or have some additional commands.
131The \verb|help| command is available for all objects. It prints
132out a list of available commands and argument syntax conventions.
133
134If no command is given, some default command is assumed.
135Usually it is \verb|list| or, if the objects of this class
136cannot be listed, \verb|help|.
137
138\verb|ARGUMENTS| is a list of arguments to the command.
139The arguments depend on the command and object. There are two types of arguments:
140{\em flags\/}, consisting of a single keyword, and {\em parameters\/},
141consisting of a keyword followed by a value. For convenience,
142each command has some {\em default parameter\/}
143which may be omitted. F.e.\ parameter \verb|dev| is the default
144for the {\tt ip link} command, so {\tt ip link ls eth0} is equivalent
145to {\tt ip link ls dev eth0}.
146In the command descriptions below such parameters
147are distinguished with the marker: ``(default)''.
148
149Almost all keywords may be abbreviated with several first (or even single)
150letters. The shortcuts are convenient when \verb|ip| is used interactively,
151but they are not recommended in scripts or when reporting bugs
152or asking for advice. ``Officially'' allowed abbreviations are listed
153in the document body.
154
155
156
157\section{{\tt ip} --- error messages}
158
159\verb|ip| may fail for one of the following reasons:
160
161\begin{itemize}
162\item
163A syntax error on the command line: an unknown keyword, incorrectly formatted
164IP address {\em et al\/}. In this case \verb|ip| prints an error message
165and exits. As a rule, the error message will contain information
166about the reason for the failure. Sometimes it also prints a help page.
167
168\item
169The arguments did not pass verification for self-consistency.
170
171\item
172\verb|ip| failed to compile a kernel request from the arguments
173because the user didn't give enough information.
174
175\item
176The kernel returned an error to some syscall. In this case \verb|ip|
177prints the error message, as it is output with \verb|perror(3)|,
178prefixed with a comment and a syscall identifier.
179
180\item
181The kernel returned an error to some RTNETLINK request.
182In this case \verb|ip| prints the error message, as it is output
183with \verb|perror(3)| prefixed with ``RTNETLINK answers:''.
184
185\end{itemize}
186
187All the operations are atomic, i.e.\ 
188if the \verb|ip| utility fails, it does not change anything
189in the system. One harmful exception is \verb|ip link| command
190(Sec.\ref{IP-LINK}, p.\pageref{IP-LINK}),
191which may change only some of the device parameters given
192on command line.
193
194It is difficult to list all the error messages (especially
195syntax errors). However, as a rule, their meaning is clear
196from the context of the command.
197
198The most common mistakes are:
199
200\begin{enumerate}
201\item Netlink is not configured in the kernel. The message is:
202\begin{verbatim}
203Cannot open netlink socket: Invalid value
204\end{verbatim}
205
206\item RTNETLINK is not configured in the kernel. In this case
207one of the following messages may be printed, depending on the command:
208\begin{verbatim}
209Cannot talk to rtnetlink: Connection refused
210Cannot send dump request: Connection refused
211\end{verbatim}
212
213\item The \verb|CONFIG_IP_MULTIPLE_TABLES| option was not selected
214when configuring the kernel. In this case any attempt to use the
215\verb|ip| \verb|rule| command will fail, f.e.
216\begin{verbatim}
217kuznet@kaiser $ ip rule list
218RTNETLINK error: Invalid argument
219dump terminated
220\end{verbatim}
221
222\end{enumerate}
223
224
225\section{{\tt ip link} --- network device configuration}
226\label{IP-LINK}
227
228\paragraph{Object:} A \verb|link| is a network device and the corresponding
229commands display and change the state of devices.
230
231\paragraph{Commands:} \verb|set| and \verb|show| (or \verb|list|).
232
233\subsection{{\tt ip link set} --- change device attributes}
234
235\paragraph{Abbreviations:} \verb|set|, \verb|s|.
236
237\paragraph{Arguments:}
238
239\begin{itemize}
240\item \verb|dev NAME| (default)
241
242--- \verb|NAME| specifies the network device on which to operate.
243
244\item \verb|up| and \verb|down|
245
246--- change the state of the device to \verb|UP| or \verb|DOWN|.
247
248\item \verb|arp on| or \verb|arp off|
249
250--- change the \verb|NOARP| flag on the device.
251
252\begin{NB}
253This operation is {\em not allowed\/} if the device is in state \verb|UP|.
254Though neither the \verb|ip| utility nor the kernel check for this condition.
255You can get unpredictable results changing this flag while the
256device is running.
257\end{NB}
258
259\item \verb|multicast on| or \verb|multicast off|
260
261--- change the \verb|MULTICAST| flag on the device.
262
263\item \verb|dynamic on| or \verb|dynamic off|
264
265--- change the \verb|DYNAMIC| flag on the device.
266
267\item \verb|name NAME|
268
269--- change the name of the device. This operation is not
270recommended if the device is running or has some addresses
271already configured.
272
273\item \verb|txqueuelen NUMBER| or \verb|txqlen NUMBER|
274
275--- change the transmit queue length of the device.
276
277\item \verb|mtu NUMBER|
278
279--- change the MTU of the device.
280
281\item \verb|address LLADDRESS|
282
283--- change the station address of the interface.
284
285\item \verb|broadcast LLADDRESS|, \verb|brd LLADDRESS| or \verb|peer LLADDRESS|
286
287--- change the link layer broadcast address or the peer address when
288the interface is \verb|POINTOPOINT|.
289
290\vskip 1mm
291\begin{NB}
292For most devices (f.e.\ for Ethernet) changing the link layer
293broadcast address will break networking.
294Do not use it, if you do not understand what this operation really does.
295\end{NB}
296
297\end{itemize}
298
299\vskip 1mm
300\begin{NB}
301The {\tt ip} utility does not change the \verb|PROMISC| 
302or \verb|ALLMULTI| flags. These flags are considered
303obsolete and should not be changed administratively.
304\end{NB}
305
306\paragraph{Warning:} If multiple parameter changes are requested,
307\verb|ip| aborts immediately after any of the changes have failed.
308This is the only case when \verb|ip| can move the system to
309an unpredictable state. The solution is to avoid changing
310several parameters with one {\tt ip link set} call.
311
312\paragraph{Examples:}
313\begin{itemize}
314\item \verb|ip link set dummy address 00:00:00:00:00:01|
315
316--- change the station address of the interface \verb|dummy|.
317
318\item \verb|ip link set dummy up|
319
320--- start the interface \verb|dummy|.
321
322\end{itemize}
323
324
325\subsection{{\tt ip link show} --- display device attributes}
326\label{IP-LINK-SHOW}
327
328\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|lst|, \verb|sh|, \verb|ls|,
329\verb|l|.
330
331\paragraph{Arguments:}
332\begin{itemize}
333\item \verb|dev NAME| (default)
334
335--- \verb|NAME| specifies the network device to show.
336If this argument is omitted all devices are listed.
337
338\item \verb|up|
339
340--- only display running interfaces.
341
342\end{itemize}
343
344
345\paragraph{Output format:}
346
347\begin{verbatim}
348kuznet@alisa:~ $ ip link ls eth0
3493: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
350    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
351kuznet@alisa:~ $ ip link ls sit0
3525: sit0@NONE: <NOARP,UP> mtu 1480 qdisc noqueue
353    link/sit 0.0.0.0 brd 0.0.0.0
354kuznet@alisa:~ $ ip link ls dummy
3552: dummy: <BROADCAST,NOARP> mtu 1500 qdisc noop
356    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
357kuznet@alisa:~ $ 
358\end{verbatim}
359
360
361The number before each colon is an {\em interface index\/} or {\em ifindex\/}.
362This number uniquely identifies the interface. This is followed by the {\em interface name\/}
363(\verb|eth0|, \verb|sit0| etc.). The interface name is also
364unique at every given moment. However, the interface may disappear from the
365list (f.e.\ when the corresponding driver module is unloaded) and another
366one with the same name may be created later. Besides that,
367the administrator may change the name of any device with
368\verb|ip| \verb|link| \verb|set| \verb|name|
369to make it more intelligible.
370
371The interface name may have another name or \verb|NONE| appended 
372after the \verb|@| sign. This means that this device is bound to some other
373device,
374i.e.\ packets send through it are encapsulated and sent via the ``master''
375device. If the name is \verb|NONE|, the master is unknown.
376
377Then we see the interface {\em mtu\/} (``maximal transfer unit''). This determines
378the maximal size of data which can be sent as a single packet over this interface.
379
380{\em qdisc\/} (``queuing discipline'') shows the queuing algorithm used
381on the interface. Particularly, \verb|noqueue| means that this interface
382does not queue anything and \verb|noop| means that the interface is in blackhole
383mode i.e.\ all packets sent to it are immediately discarded.
384{\em qlen\/} is the default transmit queue length of the device measured
385in packets.
386
387The interface flags are summarized in the angle brackets.
388
389\begin{itemize}
390\item \verb|UP| --- the device is turned on. It is ready to accept
391packets for transmission and it may inject into the kernel packets received
392from other nodes on the network.
393
394\item \verb|LOOPBACK| --- the interface does not communicate with other
395hosts. All packets sent through it will be returned
396and nothing but bounced packets can be received.
397
398\item \verb|BROADCAST| --- the device has the facility to send packets
399to all hosts sharing the same link. A typical example is an Ethernet link.
400
401\item \verb|POINTOPOINT| --- the link has only two ends with one node
402attached to each end. All packets sent to this link will reach the peer
403and all packets received by us came from this single peer.
404
405If neither \verb|LOOPBACK| nor \verb|BROADCAST| nor \verb|POINTOPOINT|
406are set, the interface is assumed to be NMBA (Non-Broadcast Multi-Access).
407This is the most generic type of device and the most complicated one, because
408the host attached to a NBMA link has no means to send to anyone
409without additionally configured information.
410
411\item \verb|MULTICAST| --- is an advisory flag indicating that the interface
412is aware of multicasting i.e.\ sending packets to some subset of neighbouring
413nodes. Broadcasting is a particular case of multicasting, where the multicast
414group consists of all nodes on the link. It is important to emphasize
415that software {\em must not\/} interpret the absence of this flag as the inability
416to use multicasting on this interface. Any \verb|POINTOPOINT| and
417\verb|BROADCAST| link is multicasting by definition, because we have
418direct access to all the neighbours and, hence, to any part of them.
419Certainly, the use of high bandwidth multicast transfers is not recommended
420on broadcast-only links because of high expense, but it is not strictly
421prohibited.
422
423\item \verb|PROMISC| --- the device listens to and feeds to the kernel all
424traffic on the link even if it is not destined for us, not broadcasted
425and not destined for a multicast group of which we are member. Usually
426this mode exists only on broadcast links and is used by bridges and for network
427monitoring.
428
429\item \verb|ALLMULTI| --- the device receives all multicast packets
430wandering on the link. This mode is used by multicast routers.
431
432\item \verb|NOARP| --- this flag is different from the other ones. It has
433no invariant value and its interpretation depends on the network protocols
434involved. As a rule, it indicates that the device needs no address
435resolution and that the software or hardware knows how to deliver packets
436without any help from the protocol stacks.
437
438\item \verb|DYNAMIC| --- is an advisory flag indicating that the interface is
439dynamically created and destroyed.
440
441\item \verb|SLAVE| --- this interface is bonded to some other interfaces
442to share link capacities.
443
444\end{itemize}
445
446\vskip 1mm
447\begin{NB}
448There are other flags but they are either obsolete (\verb|NOTRAILERS|)
449or not implemented (\verb|DEBUG|) or specific to some devices
450(\verb|MASTER|, \verb|AUTOMEDIA| and \verb|PORTSEL|). We do not discuss
451them here.
452\end{NB}
453\begin{NB}
454The values of \verb|PROMISC| and \verb|ALLMULTI| flags
455shown by the \verb|ifconfig| utility and by the \verb|ip| utility
456are {\em different\/}. \verb|ip link ls| shows the true device state,
457while \verb|ifconfig| shows the virtual state which was set with
458\verb|ifconfig| itself.
459\end{NB}
460
461
462The second line contains information on the link layer addresses
463associated with the device. The first word (\verb|ether|, \verb|sit|)
464defines the interface hardware type. This type determines the format and semantics
465of the addresses and is logically part of the address.
466The default format of the station address and the broadcast address
467(or the peer address for pointopoint links) is a
468sequence of hexadecimal bytes separated by colons, but some link
469types may have their natural address format, f.e.\ addresses
470of tunnels over IP are printed as dotted-quad IP addresses.
471
472\vskip 1mm
473\begin{NB}
474  NBMA links have no well-defined broadcast or peer address,
475  however this field may contain useful information, f.e.\
476  about the address of broadcast relay or about the address of the ARP server.
477\end{NB}
478\begin{NB}
479Multicast addresses are not shown by this command, see
480\verb|ip maddr ls| in~Sec.\ref{IP-MADDR} (p.\pageref{IP-MADDR} of this
481document).
482\end{NB}
483
484
485\paragraph{Statistics:} With the \verb|-statistics| option, \verb|ip| also
486prints interface statistics:
487
488\begin{verbatim}
489kuznet@alisa:~ $ ip -s link ls eth0
4903: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
491    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
492    RX: bytes  packets  errors  dropped overrun mcast   
493    2449949362 2786187  0       0       0       0      
494    TX: bytes  packets  errors  dropped carrier collsns 
495    178558497  1783945  332     0       332     35172  
496kuznet@alisa:~ $
497\end{verbatim}
498\verb|RX:| and \verb|TX:| lines summarize receiver and transmitter
499statistics. They contain:
500\begin{itemize}
501\item \verb|bytes| --- the total number of bytes received or transmitted
502on the interface. This number wraps when the maximal length of the data type
503natural for the architecture is exceeded, so continuous monitoring requires
504a user level daemon snapping it periodically.
505\item \verb|packets| --- the total number of packets received or transmitted
506on the interface.
507\item \verb|errors| --- the total number of receiver or transmitter errors.
508\item \verb|dropped| --- the total number of packets dropped due to lack
509of resources.
510\item \verb|overrun| --- the total number of receiver overruns resulting
511in dropped packets. As a rule, if the interface is overrun, it means
512serious problems in the kernel or that your machine is too slow
513for this interface.
514\item \verb|mcast| --- the total number of received multicast packets. This option
515is only supported by a few devices.
516\item \verb|carrier| --- total number of link media failures f.e.\ because
517of lost carrier.
518\item \verb|collsns| --- the total number of collision events
519on Ethernet-like media. This number may have a different sense on other
520link types.
521\item \verb|compressed| --- the total number of compressed packets. This is
522available only for links using VJ header compression.
523\end{itemize}
524
525
526If the \verb|-s| option is entered twice or more,
527\verb|ip| prints more detailed statistics on receiver
528and transmitter errors.
529
530\begin{verbatim}
531kuznet@alisa:~ $ ip -s -s link ls eth0
5323: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
533    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
534    RX: bytes  packets  errors  dropped overrun mcast   
535    2449949362 2786187  0       0       0       0      
536    RX errors: length   crc     frame   fifo    missed
537               0        0       0       0       0      
538    TX: bytes  packets  errors  dropped carrier collsns 
539    178558497  1783945  332     0       332     35172  
540    TX errors: aborted  fifo    window  heartbeat
541               0        0       0       332    
542kuznet@alisa:~ $
543\end{verbatim}
544These error names are pure Ethernetisms. Other devices
545may have non zero values in these fields but they may be
546interpreted differently.
547
548
549\section{{\tt ip address} --- protocol address management}
550
551\paragraph{Abbreviations:} \verb|address|, \verb|addr|, \verb|a|.
552
553\paragraph{Object:} The \verb|address| is a protocol (IP or IPv6) address attached
554to a network device. Each device must have at least one address
555to use the corresponding protocol. It is possible to have several
556different addresses attached to one device. These addresses are not
557discriminated, so that the term {\em alias\/} is not quite appropriate
558for them and we do not use it in this document.
559
560The \verb|ip addr| command displays addresses and their properties,
561adds new addresses and deletes old ones.
562
563\paragraph{Commands:} \verb|add|, \verb|delete|, \verb|flush| and \verb|show|
564(or \verb|list|).
565
566
567\subsection{{\tt ip address add} --- add a new protocol address}
568\label{IP-ADDR-ADD}
569
570\paragraph{Abbreviations:} \verb|add|, \verb|a|.
571
572\paragraph{Arguments:}
573
574\begin{itemize}
575\item \verb|dev NAME|
576
577\noindent--- the name of the device to add the address to.
578
579\item \verb|local ADDRESS| (default)
580
581--- the address of the interface. The format of the address depends
582on the protocol. It is a dotted quad for IP and a sequence of hexadecimal halfwords
583separated by colons for IPv6. The \verb|ADDRESS| may be followed by
584a slash and a decimal number which encodes the network prefix length.
585
586
587\item \verb|peer ADDRESS|
588
589--- the address of the remote endpoint for pointopoint interfaces.
590Again, the \verb|ADDRESS| may be followed by a slash and a decimal number,
591encoding the network prefix length. If a peer address is specified,
592the local address {\em cannot\/} have a prefix length. The network prefix is associated
593with the peer rather than with the local address.
594
595
596\item \verb|broadcast ADDRESS|
597
598--- the broadcast address on the interface.
599
600It is possible to use the special symbols \verb|'+'| and \verb|'-'|
601instead of the broadcast address. In this case, the broadcast address
602is derived by setting/resetting the host bits of the interface prefix.
603
604\vskip 1mm
605\begin{NB}
606Unlike \verb|ifconfig|, the \verb|ip| utility {\em does not\/} set any broadcast
607address unless explicitly requested.
608\end{NB}
609
610
611\item \verb|label NAME|
612
613--- Each address may be tagged with a label string.
614In order to preserve compatibility with Linux-2.0 net aliases,
615this string must coincide with the name of the device or must be prefixed
616with the device name followed by colon.
617
618
619\item \verb|scope SCOPE_VALUE|
620
621--- the scope of the area where this address is valid.
622The available scopes are listed in file \verb|/etc/iproute2/rt_scopes|.
623Predefined scope values are:
624
625 \begin{itemize}
626	\item \verb|global| --- the address is globally valid.
627	\item \verb|site| --- (IPv6 only) the address is site local,
628	i.e.\ it is valid inside this site.
629	\item \verb|link| --- the address is link local, i.e.\ 
630	it is valid only on this device.
631	\item \verb|host| --- the address is valid only inside this host.
632 \end{itemize}
633
634Appendix~\ref{ADDR-SEL} (p.\pageref{ADDR-SEL} of this document)
635contains more details on address scopes.
636
637\end{itemize}
638
639\paragraph{Examples:}
640\begin{itemize}
641\item \verb|ip addr add 127.0.0.1/8 dev lo brd + scope host|
642
643--- add the usual loopback address to the loopback device.
644
645\item \verb|ip addr add 10.0.0.1/24 brd + dev eth0 label eth0:Alias|
646
647--- add the address 10.0.0.1 with prefix length 24 (i.e.\ netmask
648\verb|255.255.255.0|), standard broadcast and label \verb|eth0:Alias|
649to the interface \verb|eth0|.
650\end{itemize}
651
652
653\subsection{{\tt ip address delete} --- delete a protocol address}
654
655\paragraph{Abbreviations:} \verb|delete|, \verb|del|, \verb|d|.
656
657\paragraph{Arguments:} coincide with the arguments of \verb|ip addr add|.
658The device name is a required argument. The rest are optional.
659If no arguments are given, the first address is deleted.
660
661\paragraph{Examples:}
662\begin{itemize}
663\item \verb|ip addr del 127.0.0.1/8 dev lo|
664
665--- deletes the loopback address from the loopback device.
666It would be best not to repeat this experiment.
667
668\item Disable IP on the interface \verb|eth0|:
669\begin{verbatim}
670  while ip -f inet addr del dev eth0; do
671    : nothing
672  done
673\end{verbatim}
674Another method to disable IP on an interface using {\tt ip addr flush}
675may be found in sec.\ref{IP-ADDR-FLUSH}, p.\pageref{IP-ADDR-FLUSH}.
676
677\end{itemize}
678
679
680\subsection{{\tt ip address show} --- display protocol addresses}
681
682\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|lst|, \verb|sh|, \verb|ls|,
683\verb|l|.
684
685\paragraph{Arguments:}
686
687\begin{itemize}
688\item \verb|dev NAME| (default)
689
690--- the name of the device.
691
692\item \verb|scope SCOPE_VAL|
693
694--- only list addresses with this scope.
695
696\item \verb|to PREFIX|
697
698--- only list addresses matching this prefix.
699
700\item \verb|label PATTERN|
701
702--- only list addresses with labels matching the \verb|PATTERN|.
703\verb|PATTERN| is a usual shell style pattern.
704
705
706\item \verb|dynamic| and \verb|permanent|
707
708--- (IPv6 only) only list addresses installed due to stateless
709address configuration or only list permanent (not dynamic) addresses.
710
711\item \verb|tentative|
712
713--- (IPv6 only) only list addresses which did not pass duplicate
714address detection.
715
716\item \verb|deprecated|
717
718--- (IPv6 only) only list deprecated addresses.
719
720
721\item  \verb|primary| and \verb|secondary|
722
723--- only list primary (or secondary) addresses.
724
725\end{itemize}
726
727
728\paragraph{Output format:}
729
730\begin{verbatim}
731kuznet@alisa:~ $ ip addr ls eth0
7323: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
733    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
734    inet 193.233.7.90/24 brd 193.233.7.255 scope global eth0
735    inet6 3ffe:2400:0:1:2a0:ccff:fe66:1878/64 scope global dynamic 
736       valid_lft forever preferred_lft 604746sec
737    inet6 fe80::2a0:ccff:fe66:1878/10 scope link 
738kuznet@alisa:~ $ 
739\end{verbatim}
740
741The first two lines coincide with the output of \verb|ip link ls|.
742It is natural to interpret link layer addresses
743as addresses of the protocol family \verb|AF_PACKET|.
744
745Then the list of IP and IPv6 addresses follows, accompanied by
746additional address attributes: scope value (see Sec.\ref{IP-ADDR-ADD},
747p.\pageref{IP-ADDR-ADD} above), flags and the address label.
748
749Address flags are set by the kernel and cannot be changed
750administratively. Currently, the following flags are defined:
751
752\begin{enumerate}
753\item \verb|secondary|
754
755--- the address is not used when selecting the default source address
756of outgoing packets (Cf.\ Appendix~\ref{ADDR-SEL}, p.\pageref{ADDR-SEL}.).
757An IP address becomes secondary if another address with the same
758prefix bits already exists. The first address is primary.
759It is the leader of the group of all secondary addresses. When the leader
760is deleted, all secondaries are purged too.
761
762
763\item \verb|dynamic|
764
765--- the address was created due to stateless autoconfiguration~\cite{RFC-ADDRCONF}.
766In this case the output also contains information on times, when
767the address is still valid. After \verb|preferred_lft| expires the address is
768moved to the deprecated state. After \verb|valid_lft| expires the address
769is finally invalidated.
770
771\item \verb|deprecated|
772
773--- the address is deprecated, i.e.\ it is still valid, but cannot
774be used by newly created connections.
775
776\item \verb|tentative|
777
778--- the address is not used because duplicate address detection~\cite{RFC-ADDRCONF}
779is still not complete or failed.
780
781\end{enumerate}
782
783
784\subsection{{\tt ip address flush} --- flush protocol addresses}
785\label{IP-ADDR-FLUSH}
786
787\paragraph{Abbreviations:} \verb|flush|, \verb|f|.
788
789\paragraph{Description:}This command flushes the protocol addresses
790selected by some criteria.
791
792\paragraph{Arguments:} This command has the same arguments as \verb|show|.
793The difference is that it does not run when no arguments are given.
794
795\paragraph{Warning:} This command (and other \verb|flush| commands
796described below) is pretty dangerous. If you make a mistake, it will
797not forgive it, but will cruelly purge all the addresses.
798
799\paragraph{Statistics:} With the \verb|-statistics| option, the command
800becomes verbose. It prints out the number of deleted addresses and the number
801of rounds made to flush the address list. If this option is given
802twice, \verb|ip addr flush| also dumps all the deleted addresses
803in the format described in the previous subsection.
804
805\paragraph{Example:} Delete all the addresses from the private network
80610.0.0.0/8:
807\begin{verbatim}
808netadm@amber:~ # ip -s -s a f to 10/8
8092: dummy    inet 10.7.7.7/16 brd 10.7.255.255 scope global dummy
8103: eth0    inet 10.10.7.7/16 brd 10.10.255.255 scope global eth0
8114: eth1    inet 10.8.7.7/16 brd 10.8.255.255 scope global eth1
812
813*** Round 1, deleting 3 addresses ***
814*** Flush is complete after 1 round ***
815netadm@amber:~ # 
816\end{verbatim}
817Another instructive example is disabling IP on all the Ethernets:
818\begin{verbatim}
819netadm@amber:~ # ip -4 addr flush label "eth*"
820\end{verbatim}
821And the last example shows how to flush all the IPv6 addresses
822acquired by the host from stateless address autoconfiguration
823after you enabled forwarding or disabled autoconfiguration.
824\begin{verbatim}
825netadm@amber:~ # ip -6 addr flush dynamic
826\end{verbatim}
827
828
829
830\section{{\tt ip neighbour} --- neighbour/arp tables management}
831
832\paragraph{Abbreviations:} \verb|neighbour|, \verb|neighbor|, \verb|neigh|,
833\verb|n|.
834
835\paragraph{Object:} \verb|neighbour| objects establish bindings between protocol
836addresses and link layer addresses for hosts sharing the same link.
837Neighbour entries are organized into tables. The IPv4 neighbour table
838is known by another name --- the ARP table.
839
840The corresponding commands display neighbour bindings
841and their properties, add new neighbour entries and delete old ones.
842
843\paragraph{Commands:} \verb|add|, \verb|change|, \verb|replace|,
844\verb|delete|, \verb|flush| and \verb|show| (or \verb|list|).
845
846\paragraph{See also:} Appendix~\ref{PROXY-NEIGH}, p.\pageref{PROXY-NEIGH}
847describes how to manage proxy ARP/NDISC with the \verb|ip| utility.
848
849
850\subsection{{\tt ip neighbour add} --- add a new neighbour entry\\
851	{\tt ip neighbour change} --- change an existing entry\\
852	{\tt ip neighbour replace} --- add a new entry or change an existing one}
853
854\paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|change|, \verb|chg|;
855\verb|replace|,	\verb|repl|.
856
857\paragraph{Description:} These commands create new neighbour records
858or update existing ones.
859
860\paragraph{Arguments:}
861
862\begin{itemize}
863\item \verb|to ADDRESS| (default)
864
865--- the protocol address of the neighbour. It is either an IPv4 or IPv6 address.
866
867\item \verb|dev NAME|
868
869--- the interface to which this neighbour is attached.
870
871
872\item \verb|lladdr LLADDRESS|
873
874--- the link layer address of the neighbour. \verb|LLADDRESS| can also be
875\verb|null|. 
876
877\item \verb|nud NUD_STATE|
878
879--- the state of the neighbour entry. \verb|nud| is an abbreviation for ``Neighbour
880Unreachability Detection''. The state can take one of the following values:
881
882\begin{enumerate}
883\item \verb|permanent| --- the neighbour entry is valid forever and can be only be removed
884administratively.
885\item \verb|noarp| --- the neighbour entry is valid. No attempts to validate
886this entry will be made but it can be removed when its lifetime expires.
887\item \verb|reachable| --- the neighbour entry is valid until the reachability
888timeout expires.
889\item \verb|stale| --- the neighbour entry is valid but suspicious.
890This option to \verb|ip neigh| does not change the neighbour state if
891it was valid and the address is not changed by this command.
892\end{enumerate}
893
894\end{itemize}
895
896\paragraph{Examples:}
897\begin{itemize}
898\item \verb|ip neigh add 10.0.0.3 lladdr 0:0:0:0:0:1 dev eth0 nud perm|
899
900--- add a permanent ARP entry for the neighbour 10.0.0.3 on the device \verb|eth0|.
901
902\item \verb|ip neigh chg 10.0.0.3 dev eth0 nud reachable|
903
904--- change its state to \verb|reachable|.
905\end{itemize}
906
907
908\subsection{{\tt ip neighbour delete} --- delete a neighbour entry}
909
910\paragraph{Abbreviations:} \verb|delete|, \verb|del|, \verb|d|.
911
912\paragraph{Description:} This command invalidates a neighbour entry.
913
914\paragraph{Arguments:} The arguments are the same as with \verb|ip neigh add|,
915except that \verb|lladdr| and \verb|nud| are ignored.
916
917
918\paragraph{Example:}
919\begin{itemize}
920\item \verb|ip neigh del 10.0.0.3 dev eth0|
921
922--- invalidate an ARP entry for the neighbour 10.0.0.3 on the device \verb|eth0|.
923
924\end{itemize}
925
926\begin{NB}
927 The deleted neighbour entry will not disappear from the tables
928 immediately. If it is in use it cannot be deleted until the last
929 client releases it. Otherwise it will be destroyed during
930 the next garbage collection.
931\end{NB}
932
933
934\paragraph{Warning:} Attempts to delete or manually change
935a \verb|noarp| entry created by the kernel may result in unpredictable behaviour.
936Particularly, the kernel may try to resolve this address even
937on a \verb|NOARP| interface or if the address is multicast or broadcast.
938
939
940\subsection{{\tt ip neighbour show} --- list neighbour entries}
941
942\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|.
943
944\paragraph{Description:}This commands displays neighbour tables.
945
946\paragraph{Arguments:}
947
948\begin{itemize}
949
950\item \verb|to ADDRESS| (default)
951
952--- the prefix selecting the neighbours to list.
953
954\item \verb|dev NAME|
955
956--- only list the neighbours attached to this device.
957
958\item \verb|unused|
959
960--- only list neighbours which are not currently in use.
961
962\item \verb|nud NUD_STATE|
963
964--- only list neighbour entries in this state. \verb|NUD_STATE| takes
965values listed below or the special value \verb|all| which means all states.
966This option may occur more than once. If this option is absent, \verb|ip|
967lists all entries except for \verb|none| and \verb|noarp|.
968
969\end{itemize}
970
971
972\paragraph{Output format:}
973
974\begin{verbatim}
975kuznet@alisa:~ $ ip neigh ls
976:: dev lo lladdr 00:00:00:00:00:00 nud noarp
977fe80::200:cff:fe76:3f85 dev eth0 lladdr 00:00:0c:76:3f:85 router \
978    nud stale
9790.0.0.0 dev lo lladdr 00:00:00:00:00:00 nud noarp
980193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 nud reachable
981193.233.7.85 dev eth0 lladdr 00:e0:1e:63:39:00 nud stale
982kuznet@alisa:~ $ 
983\end{verbatim}
984
985The first word of each line is the protocol address of the neighbour.
986Then the device name follows. The rest of the line describes the contents of
987the neighbour entry identified by the pair (device, address).
988
989\verb|lladdr| is the link layer address of the neighbour.
990
991\verb|nud| is the state of the ``neighbour unreachability detection'' machine
992for this entry. The detailed description of the neighbour
993state machine can be found in~\cite{RFC-NDISC}. Here is the full list
994of the states with short descriptions:
995
996\begin{enumerate}
997\item\verb|none| --- the state of the neighbour is void.
998\item\verb|incomplete| --- the neighbour is in the process of resolution.
999\item\verb|reachable| --- the neighbour is valid and apparently reachable.
1000\item\verb|stale| --- the neighbour is valid, but is probably already
1001unreachable, so the kernel will try to check it at the first transmission.
1002\item\verb|delay| --- a packet has been sent to the stale neighbour and the kernel is waiting
1003for confirmation.
1004\item\verb|probe| --- the delay timer expired but no confirmation was received.
1005The kernel has started to probe the neighbour with ARP/NDISC messages.
1006\item\verb|failed| --- resolution has failed.
1007\item\verb|noarp| --- the neighbour is valid. No attempts to check the entry
1008will be made.
1009\item\verb|permanent| --- it is a \verb|noarp| entry, but only the administrator
1010may remove the entry from the neighbour table.
1011\end{enumerate}
1012
1013The link layer address is valid in all states except for \verb|none|,
1014\verb|failed| and \verb|incomplete|.
1015
1016IPv6 neighbours can be marked with the additional flag \verb|router|
1017which means that the neighbour introduced itself as an IPv6 router~\cite{RFC-NDISC}.
1018
1019\paragraph{Statistics:} The \verb|-statistics| option displays some usage
1020statistics, f.e.\
1021
1022\begin{verbatim}
1023kuznet@alisa:~ $ ip -s n ls 193.233.7.254
1024193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 ref 5 used 12/13/20 \
1025    nud reachable
1026kuznet@alisa:~ $ 
1027\end{verbatim}
1028
1029Here \verb|ref| is the number of users of this entry
1030and \verb|used| is a triplet of time intervals in seconds
1031separated by slashes. In this case they show that:
1032
1033\begin{enumerate}
1034\item the entry was used 12 seconds ago.
1035\item the entry was confirmed 13 seconds ago.
1036\item the entry was updated 20 seconds ago.
1037\end{enumerate}
1038
1039\subsection{{\tt ip neighbour flush} --- flush neighbour entries}
1040
1041\paragraph{Abbreviations:} \verb|flush|, \verb|f|.
1042
1043\paragraph{Description:}This command flushes neighbour tables, selecting
1044entries to flush by some criteria.
1045
1046\paragraph{Arguments:} This command has the same arguments as \verb|show|.
1047The differences are that it does not run when no arguments are given,
1048and that the default neighbour states to be flushed do not include
1049\verb|permanent| and \verb|noarp|.
1050
1051
1052\paragraph{Statistics:} With the \verb|-statistics| option, the command
1053becomes verbose. It prints out the number of deleted neighbours and the number
1054of rounds made to flush the neighbour table. If the option is given
1055twice, \verb|ip neigh flush| also dumps all the deleted neighbours
1056in the format described in the previous subsection.
1057
1058\paragraph{Example:}
1059\begin{verbatim}
1060netadm@alisa:~ # ip -s -s n f 193.233.7.254
1061193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 ref 5 used 12/13/20 \
1062    nud reachable
1063
1064*** Round 1, deleting 1 entries ***
1065*** Flush is complete after 1 round ***
1066netadm@alisa:~ # 
1067\end{verbatim}
1068
1069
1070\section{{\tt ip route} --- routing table management}
1071\label{IP-ROUTE}
1072
1073\paragraph{Abbreviations:} \verb|route|, \verb|ro|, \verb|r|.
1074
1075\paragraph{Object:} \verb|route| entries in the kernel routing tables keep
1076information about paths to other networked nodes.
1077
1078Each route entry has a {\em key\/} consisting of a {\em prefix\/}
1079(i.e.\ a pair containing a network address and the length of its mask) and,
1080optionally, the TOS value. An IP packet matches the route if the highest
1081bits of its destination address are equal to the route prefix at least
1082up to the prefix length and if the TOS of the route is zero or equal to
1083the TOS of the packet.
1084 
1085If several routes match the packet, the following pruning rules
1086are used to select the best one (see~\cite{RFC1812}):
1087\begin{enumerate}
1088\item The longest matching prefix is selected. All shorter ones
1089are dropped.
1090
1091\item If the TOS of some route with the longest prefix is equal to the TOS
1092of the packet, the routes with different TOS are dropped.
1093
1094If no exact TOS match was found and routes with TOS=0 exist,
1095the rest of routes are pruned.
1096
1097Otherwise, the route lookup fails.
1098
1099\item If several routes remain after the previous steps, then
1100the routes with the best preference values are selected.
1101
1102\item If we still have several routes, then the {\em first\/} of them
1103is selected.
1104
1105\begin{NB}
1106 Note the ambiguity of the last step. Unfortunately, Linux
1107 historically allows such a bizarre situation. The sense of the
1108word ``first'' depends on the order of route additions and it is practically
1109impossible to maintain a bundle of such routes in this order.
1110\end{NB}
1111
1112For simplicity we will limit ourselves to the case where such a situation
1113is impossible and routes are uniquely identified by the triplet
1114\{prefix, tos, preference\}. Actually, it is impossible to create
1115non-unique routes with \verb|ip| commands described in this section.
1116
1117One useful exception to this rule is the default route on non-forwarding
1118hosts. It is ``officially'' allowed to have several fallback routes
1119when several routers are present on directly connected networks.
1120In this case, Linux-2.2 makes ``dead gateway detection''~\cite{RFC1122}
1121controlled by neighbour unreachability detection and by advice
1122from transport protocols to select a working router, so the order
1123of the routes is not essential. However, in this case,
1124fiddling with default routes manually is not recommended. Use the Router Discovery
1125protocol (see Appendix~\ref{EXAMPLE-SETUP}, p.\pageref{EXAMPLE-SETUP})
1126instead. Actually, Linux-2.2 IPv6 does not give user level applications
1127any access to default routes.
1128\end{enumerate}
1129
1130Certainly, the steps above are not performed exactly
1131in this sequence. Instead, the routing table in the kernel is kept
1132in some data structure to achieve the final result
1133with minimal cost. However, not depending on a particular
1134routing algorithm implemented in the kernel, we can summarize
1135the statements above as: a route is identified by the triplet
1136\{prefix, tos, preference\}. This {\em key\/} lets us locate
1137the route in the routing table.
1138
1139\paragraph{Route attributes:} Each route key refers to a routing
1140information record containing
1141the data required to deliver IP packets (f.e.\ output device and
1142next hop router) and some optional attributes (f.e. the path MTU or
1143the preferred source address when communicating with this destination).
1144These attributes are described in the following subsection.
1145
1146\paragraph{Route types:} \label{IP-ROUTE-TYPES}
1147It is important that the set
1148of required and optional attributes depend on the route {\em type\/}.
1149The most important route type
1150is \verb|unicast|. It describes real paths to other hosts.
1151As a rule, common routing tables contain only such routes. However,
1152there are other types of routes with different semantics. The
1153full list of types understood by Linux-2.2 is:
1154\begin{itemize}
1155\item \verb|unicast| --- the route entry describes real paths to the
1156destinations covered by the route prefix.
1157\item \verb|unreachable| --- these destinations are unreachable. Packets
1158are discarded and the ICMP message {\em host unreachable\/} is generated.
1159The local senders get an \verb|EHOSTUNREACH| error.
1160\item \verb|blackhole| --- these destinations are unreachable. Packets
1161are discarded silently. The local senders get an \verb|EINVAL| error.
1162\item \verb|prohibit| --- these destinations are unreachable. Packets
1163are discarded and the ICMP message {\em communication administratively
1164prohibited\/} is generated. The local senders get an \verb|EACCES| error.
1165\item \verb|local| --- the destinations are assigned to this
1166host. The packets are looped back and delivered locally.
1167\item \verb|broadcast| --- the destinations are broadcast addresses.
1168The packets are sent as link broadcasts.
1169\item \verb|throw| --- a special control route used together with policy
1170rules (see sec.\ref{IP-RULE}, p.\pageref{IP-RULE}). If such a route is selected, lookup
1171in this table is terminated pretending that no route was found.
1172Without policy routing it is equivalent to the absence of the route in the routing
1173table. The packets are dropped and the ICMP message {\em net unreachable\/}
1174is generated. The local senders get an \verb|ENETUNREACH| error.
1175\item \verb|nat| --- a special NAT route. Destinations covered by the prefix
1176are considered to be dummy (or external) addresses which require translation
1177to real (or internal) ones before forwarding. The addresses to translate to
1178are selected with the attribute \verb|via|. More about NAT is
1179in Appendix~\ref{ROUTE-NAT}, p.\pageref{ROUTE-NAT}.
1180\item \verb|anycast| --- ({\em not implemented\/}) the destinations are
1181{\em anycast\/} addresses assigned to this host. They are mainly equivalent
1182to \verb|local| with one difference: such addresses are invalid when used
1183as the source address of any packet.
1184\item \verb|multicast| --- a special type used for multicast routing.
1185It is not present in normal routing tables.
1186\end{itemize}
1187
1188\paragraph{Route tables:} Linux-2.2 can pack routes into several routing
1189tables identified by a number in the range from 1 to 255 or by
1190name from the file \verb|/etc/iproute2/rt_tables|. By default all normal
1191routes are inserted into the \verb|main| table (ID 254) and the kernel only uses
1192this table when calculating routes.
1193
1194Actually, one other table always exists, which is invisible but
1195even more important. It is the \verb|local| table (ID 255). This table
1196consists of routes for local and broadcast addresses. The kernel maintains
1197this table automatically and the administrator usually need not modify it
1198or even look at it.
1199
1200The multiple routing tables enter the game when {\em policy routing\/}
1201is used. See sec.\ref{IP-RULE}, p.\pageref{IP-RULE}.
1202In this case, the table identifier effectively becomes
1203one more parameter, which should be added to the triplet
1204\{prefix, tos, preference\} to uniquely identify the route.
1205
1206
1207\subsection{{\tt ip route add} --- add a new route\\
1208	{\tt ip route change} --- change a route\\
1209	{\tt ip route replace} --- change a route or add a new one}
1210\label{IP-ROUTE-ADD}
1211
1212\paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|change|, \verb|chg|;
1213	\verb|replace|, \verb|repl|.
1214
1215
1216\paragraph{Arguments:}
1217\begin{itemize}
1218\item \verb|to PREFIX| or \verb|to TYPE PREFIX| (default)
1219
1220--- the destination prefix of the route. If \verb|TYPE| is omitted,
1221\verb|ip| assumes type \verb|unicast|. Other values of \verb|TYPE|
1222are listed above. \verb|PREFIX| is an IP or IPv6 address optionally followed
1223by a slash and the prefix length. If the length of the prefix is missing,
1224\verb|ip| assumes a full-length host route. There is also a special
1225\verb|PREFIX| --- \verb|default| --- which is equivalent to IP \verb|0/0| or
1226to IPv6 \verb|::/0|.
1227
1228\item \verb|tos TOS| or \verb|dsfield TOS|
1229
1230--- the Type Of Service (TOS) key. This key has no associated mask and
1231the longest match is understood as: First, compare the TOS
1232of the route and of the packet. If they are not equal, then the packet
1233may still match a route with a zero TOS. \verb|TOS| is either an 8 bit hexadecimal
1234number or an identifier from {\tt /etc/iproute2/rt\_dsfield}.
1235
1236
1237\item \verb|metric NUMBER| or \verb|preference NUMBER|
1238
1239--- the preference value of the route. \verb|NUMBER| is an arbitrary 32bit number.
1240
1241\item \verb|table TABLEID|
1242
1243--- the table to add this route to.
1244\verb|TABLEID| may be a number or a string from the file
1245\verb|/etc/iproute2/rt_tables|. If this parameter is omitted,
1246\verb|ip| assumes the \verb|main| table, with the exception of
1247\verb|local|, \verb|broadcast| and \verb|nat| routes, which are
1248put into the \verb|local| table by default.
1249
1250\item \verb|dev NAME|
1251
1252--- the output device name.
1253
1254\item \verb|via ADDRESS|
1255
1256--- the address of the nexthop router. Actually, the sense of this field depends
1257on the route type. For normal \verb|unicast| routes it is either the true nexthop
1258router or, if it is a direct route installed in BSD compatibility mode,
1259it can be a local address of the interface.
1260For NAT routes it is the first address of the block of translated IP destinations.
1261
1262\item \verb|src ADDRESS|
1263
1264--- the source address to prefer when sending to the destinations
1265covered by the route prefix.
1266
1267\item \verb|realm REALMID|
1268
1269--- the realm to which this route is assigned.
1270\verb|REALMID| may be a number or a string from the file
1271\verb|/etc/iproute2/rt_realms|. Sec.\ref{RT-REALMS} (p.\pageref{RT-REALMS})
1272contains more information on realms.
1273
1274\item \verb|mtu MTU| or \verb|mtu lock MTU|
1275
1276--- the MTU along the path to the destination. If the modifier \verb|lock| is
1277not used, the MTU may be updated by the kernel due to Path MTU Discovery.
1278If the modifier \verb|lock| is used, no path MTU discovery will be tried,
1279all packets will be sent without the DF bit in IPv4 case
1280or fragmented to MTU for IPv6.
1281
1282\item \verb|window NUMBER|
1283
1284--- the maximal window for TCP to advertise to these destinations,
1285measured in bytes. It limits maximal data bursts that our TCP
1286peers are allowed to send to us.
1287
1288\item \verb|rtt NUMBER|
1289
1290--- the initial RTT (``Round Trip Time'') estimate.
1291
1292
1293\item \verb|rttvar NUMBER|
1294
1295--- \threeonly the initial RTT variance estimate.
1296
1297
1298\item \verb|ssthresh NUMBER|
1299
1300--- \threeonly an estimate for the initial slow start threshold.
1301
1302
1303\item \verb|cwnd NUMBER|
1304
1305--- \threeonly the clamp for congestion window. It is ignored if the \verb|lock|
1306    flag is not used.
1307
1308
1309\item \verb|advmss NUMBER|
1310
1311--- \threeonly the MSS (``Maximal Segment Size'') to advertise to these
1312    destinations when establishing TCP connections. If it is not given,
1313    Linux uses a default value calculated from the first hop device MTU.
1314
1315\begin{NB}
1316  If the path to these destination is asymmetric, this guess may be wrong.
1317\end{NB}
1318
1319\item \verb|reordering NUMBER|
1320
1321--- \threeonly Maximal reordering on the path to this destination.
1322    If it is not given, Linux uses the value selected with \verb|sysctl|
1323    variable \verb|net/ipv4/tcp_reordering|.
1324
1325
1326
1327\item \verb|nexthop NEXTHOP|
1328
1329--- the nexthop of a multipath route. \verb|NEXTHOP| is a complex value
1330with its own syntax similar to the top level argument lists:
1331\begin{itemize}
1332\item \verb|via ADDRESS| is the nexthop router.
1333\item \verb|dev NAME| is the output device.
1334\item \verb|weight NUMBER| is a weight for this element of a multipath
1335route reflecting its relative bandwidth or quality.
1336\end{itemize}
1337
1338\item \verb|scope SCOPE_VAL|
1339
1340--- the scope of the destinations covered by the route prefix.
1341\verb|SCOPE_VAL| may be a number or a string from the file
1342\verb|/etc/iproute2/rt_scopes|.
1343If this parameter is omitted,
1344\verb|ip| assumes scope \verb|global| for all gatewayed \verb|unicast|
1345routes, scope \verb|link| for direct \verb|unicast| and \verb|broadcast| routes
1346and scope \verb|host| for \verb|local| routes.
1347
1348\item \verb|protocol RTPROTO|
1349
1350--- the routing protocol identifier of this route.
1351\verb|RTPROTO| may be a number or a string from the file
1352\verb|/etc/iproute2/rt_protos|. If the routing protocol ID is
1353not given, \verb|ip| assumes protocol \verb|boot| (i.e.\
1354it assumes the route was added by someone who doesn't
1355understand what they are doing). Several protocol values have a fixed interpretation.
1356Namely:
1357\begin{itemize}
1358\item \verb|redirect| --- the route was installed due to an ICMP redirect.
1359\item \verb|kernel| --- the route was installed by the kernel during
1360autoconfiguration.
1361\item \verb|boot| --- the route was installed during the bootup sequence.
1362If a routing daemon starts, it will purge all of them.
1363\item \verb|static| --- the route was installed by the administrator
1364to override dynamic routing. Routing daemon will respect them
1365and, probably, even advertise them to its peers.
1366\item \verb|ra| --- the route was installed by Router Discovery protocol.
1367\end{itemize}
1368The rest of the values are not reserved and the administrator is free
1369to assign (or not to assign) protocol tags. At least, routing
1370daemons should take care of setting some unique protocol values,
1371f.e.\ as they are assigned in \verb|rtnetlink.h| or in \verb|rt_protos|
1372database.
1373
1374
1375\item \verb|onlink|
1376
1377--- pretend that the nexthop is directly attached to this link,
1378even if it does not match any interface prefix. One application of this
1379option may be found in~\cite{IP-TUNNELS}.
1380
1381\item \verb|equalize|
1382
1383--- allow packet by packet randomization on multipath routes.
1384Without this modifier, the route will be frozen to one selected
1385nexthop, so that load splitting will only occur on per-flow base.
1386\verb|equalize| only works if the kernel is patched.
1387
1388
1389\end{itemize}
1390
1391
1392\begin{NB}
1393  Actually there are more commands: \verb|prepend| does the same
1394  thing as classic \verb|route add|, i.e.\ adds a route, even if another
1395  route to the same destination exists. Its opposite case is \verb|append|,
1396  which adds the route to the end of the list. Avoid these
1397  features.
1398\end{NB}
1399\begin{NB}
1400  More sad news, IPv6 only understands the \verb|append| command correctly.
1401  All the others are translated into \verb|append| commands. Certainly,
1402  this will change in the future.
1403\end{NB}
1404
1405\paragraph{Examples:}
1406\begin{itemize}
1407\item add a plain route to network 10.0.0/24 via gateway 193.233.7.65
1408\begin{verbatim}
1409  ip route add 10.0.0/24 via 193.233.7.65
1410\end{verbatim}
1411\item change it to a direct route via the \verb|dummy| device
1412\begin{verbatim}
1413  ip ro chg 10.0.0/24 dev dummy
1414\end{verbatim}
1415\item add a default multipath route splitting the load between \verb|ppp0|
1416and \verb|ppp1|
1417\begin{verbatim}
1418  ip route add default scope global nexthop dev ppp0 \
1419                                    nexthop dev ppp1
1420\end{verbatim}
1421Note the scope value. It is not necessary but it informs the kernel
1422that this route is gatewayed rather than direct. Actually, if you
1423know the addresses of remote endpoints it would be better to use the
1424\verb|via| parameter.
1425\item announce that the address 192.203.80.144 is not a real one, but
1426should be translated to 193.233.7.83 before forwarding
1427\begin{verbatim}
1428  ip route add nat 192.203.80.144 via 193.233.7.83
1429\end{verbatim}
1430Backward translation is setup with policy rules described
1431in the following section (sec.\ref{IP-RULE}, p.\pageref{IP-RULE}).
1432\end{itemize}
1433
1434\subsection{{\tt ip route delete} --- delete a route}
1435
1436\paragraph{Abbreviations:} \verb|delete|, \verb|del|, \verb|d|.
1437
1438\paragraph{Arguments:} \verb|ip route del| has the same arguments as
1439\verb|ip route add|, but their semantics are a bit different.
1440
1441Key values (\verb|to|, \verb|tos|, \verb|preference| and \verb|table|)
1442select the route to delete. If optional attributes are present, \verb|ip|
1443verifies that they coincide with the attributes of the route to delete.
1444If no route with the given key and attributes was found, \verb|ip route del|
1445fails.
1446\begin{NB}
1447Linux-2.0 had the option to delete a route selected only by prefix address,
1448ignoring its length (i.e.\ netmask). This option no longer exists
1449because it was ambiguous. However, look at {\tt ip route flush}
1450(sec.\ref{IP-ROUTE-FLUSH}, p.\pageref{IP-ROUTE-FLUSH}) which
1451provides similar and even richer functionality.
1452\end{NB}
1453
1454\paragraph{Example:}
1455\begin{itemize}
1456\item delete the multipath route created by the command in previous subsection
1457\begin{verbatim}
1458  ip route del default scope global nexthop dev ppp0 \
1459                                    nexthop dev ppp1
1460\end{verbatim}
1461\end{itemize}
1462
1463
1464
1465\subsection{{\tt ip route show} --- list routes}
1466
1467\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.
1468
1469\paragraph{Description:} the command displays the contents of the routing tables
1470or the route(s) selected by some criteria.
1471
1472
1473\paragraph{Arguments:}
1474\begin{itemize}
1475\item \verb|to SELECTOR| (default)
1476
1477--- only select routes from the given range of destinations. \verb|SELECTOR|
1478consists of an optional modifier (\verb|root|, \verb|match| or \verb|exact|)
1479and a prefix. \verb|root PREFIX| selects routes with prefixes not shorter
1480than \verb|PREFIX|. F.e.\ \verb|root 0/0| selects the entire routing table.
1481\verb|match PREFIX| selects routes with prefixes not longer than
1482\verb|PREFIX|. F.e.\ \verb|match 10.0/16| selects \verb|10.0/16|,
1483\verb|10/8| and \verb|0/0|, but it does not select \verb|10.1/16| and
1484\verb|10.0.0/24|. And \verb|exact PREFIX| (or just \verb|PREFIX|)
1485selects routes with this exact prefix. If neither of these options
1486are present, \verb|ip| assumes \verb|root 0/0| i.e.\ it lists the entire table.
1487
1488
1489\item \verb|tos TOS| or \verb|dsfield TOS|
1490
1491 --- only select routes with the given TOS.
1492
1493
1494\item \verb|table TABLEID|
1495
1496 --- show the routes from this table(s). The default setting is to show
1497\verb|table| \verb|main|. \verb|TABLEID| may either be the ID of a real table
1498or one of the special values:
1499  \begin{itemize}
1500  \item \verb|all| --- list all of the tables.
1501  \item \verb|cache| --- dump the routing cache.
1502  \end{itemize}
1503\begin{NB}
1504  IPv6 has a single table. However, splitting it into \verb|main|, \verb|local|
1505  and \verb|cache| is emulated by the \verb|ip| utility.
1506\end{NB}
1507
1508\item \verb|cloned| or \verb|cached|
1509
1510--- list cloned routes i.e.\ routes which were dynamically forked from
1511other routes because some route attribute (f.e.\ MTU) was updated.
1512Actually, it is equivalent to \verb|table cache|.
1513
1514\item \verb|from SELECTOR|
1515
1516--- the same syntax as for \verb|to|, but it binds the source address range
1517rather than destinations. Note that the \verb|from| option only works with
1518cloned routes.
1519
1520\item \verb|protocol RTPROTO|
1521
1522--- only list routes of this protocol.
1523
1524
1525\item \verb|scope SCOPE_VAL|
1526
1527--- only list routes with this scope.
1528
1529\item \verb|type TYPE|
1530
1531--- only list routes of this type.
1532
1533\item \verb|dev NAME|
1534
1535--- only list routes going via this device.
1536
1537\item \verb|via PREFIX|
1538
1539--- only list routes going via the nexthop routers selected by \verb|PREFIX|.
1540
1541\item \verb|src PREFIX|
1542
1543--- only list routes with preferred source addresses selected
1544by \verb|PREFIX|.
1545
1546\item \verb|realm REALMID| or \verb|realms FROMREALM/TOREALM|
1547
1548--- only list routes with these realms.
1549
1550\end{itemize}
1551
1552\paragraph{Examples:} Let us count routes of protocol \verb|gated/bgp|
1553on a router:
1554\begin{verbatim}
1555kuznet@amber:~ $ ip ro ls proto gated/bgp | wc
1556   1413    9891    79010
1557kuznet@amber:~ $
1558\end{verbatim}
1559To count the size of the routing cache, we have to use the \verb|-o| option
1560because cached attributes can take more than one line of output:
1561\begin{verbatim}
1562kuznet@amber:~ $ ip -o ro ls cloned | wc
1563   159    2543    18707
1564kuznet@amber:~ $
1565\end{verbatim}
1566
1567
1568\paragraph{Output format:} The output of this command consists
1569of per route records separated by line feeds.
1570However, some records may consist
1571of more than one line: particularly, this is the case when the route
1572is cloned or you requested additional statistics. If the
1573\verb|-o| option was given, then line feeds separating lines inside
1574records are replaced with the backslash sign.
1575
1576The output has the same syntax as arguments given to {\tt ip route add},
1577so that it can be understood easily. F.e.\
1578\begin{verbatim}
1579kuznet@amber:~ $ ip ro ls 193.233.7/24
1580193.233.7.0/24 dev eth0  proto gated/conn  scope link \
1581    src 193.233.7.65 realms inr.ac 
1582kuznet@amber:~ $
1583\end{verbatim}
1584
1585If you list cloned entries, the output contains other attributes which
1586are evaluated during route calculation and updated during route
1587lifetime. An example of the output is:
1588\begin{verbatim}
1589kuznet@amber:~ $ ip ro ls 193.233.7.82 tab cache
1590193.233.7.82 from 193.233.7.82 dev eth0  src 193.233.7.65 \
1591  realms inr.ac/inr.ac 
1592    cache <src-direct,redirect>  mtu 1500 rtt 300 iif eth0
1593193.233.7.82 dev eth0  src 193.233.7.65 realms inr.ac 
1594    cache  mtu 1500 rtt 300
1595kuznet@amber:~ $
1596\end{verbatim}
1597\begin{NB}
1598  \label{NB-strange-route}
1599  The route looks a bit strange, doesn't it? Did you notice that
1600  it is a path from 193.233.7.82 back to 193.233.82? Well, you will
1601  see in the section on \verb|ip route get| (p.\pageref{NB-nature-of-strangeness})
1602  how it appeared.
1603\end{NB}
1604The second line, starting with the word \verb|cache|, shows
1605additional attributes which normal routes do not possess.
1606Cached flags are summarized in angle brackets:
1607\begin{itemize}
1608\item \verb|local| --- packets are delivered locally.
1609It stands for loopback unicast routes, for broadcast routes
1610and for multicast routes, if this host is a member of the corresponding
1611group.
1612
1613\item \verb|reject| --- the path is bad. Any attempt to use it results
1614in an error. See attribute \verb|error| below (p.\pageref{IP-ROUTE-GET-error}).
1615
1616\item \verb|mc| --- the destination is multicast.
1617
1618\item \verb|brd| --- the destination is broadcast.
1619
1620\item \verb|src-direct| --- the source is on a directly connected
1621interface.
1622
1623\item \verb|redirected| --- the route was created by an ICMP Redirect.
1624
1625\item \verb|redirect| --- packets going via this route will 
1626trigger an ICMP redirect.
1627
1628\item \verb|fastroute| --- the route is eligible to be used for fastroute.
1629
1630\item \verb|equalize| --- make packet by packet randomization
1631along this path.
1632
1633\item \verb|dst-nat| --- the destination address requires translation.
1634
1635\item \verb|src-nat| --- the source address requires translation.
1636
1637\item \verb|masq| --- the source address requires masquerading.
1638This feature disappeared in linux-2.4.
1639
1640\item \verb|notify| --- ({\em not implemented}) change/deletion
1641of this route will trigger RTNETLINK notification.
1642\end{itemize}
1643
1644Then some optional attributes follow:
1645\begin{itemize}
1646\item \verb|error| --- on \verb|reject| routes it is error code
1647returned to local senders when they try to use this route.
1648These error codes are translated into ICMP error codes, sent to remote
1649senders, according to the rules described above in the subsection
1650devoted to route types (p.\pageref{IP-ROUTE-TYPES}).
1651\label{IP-ROUTE-GET-error}
1652
1653\item \verb|expires| --- this entry will expire after this timeout.
1654
1655\item \verb|iif| --- the packets for this path are expected to arrive
1656on this interface.
1657\end{itemize}
1658
1659\paragraph{Statistics:} With the \verb|-statistics| option, more
1660information about this route is shown:
1661\begin{itemize}
1662\item \verb|users| --- the number of users of this entry.
1663\item \verb|age| --- shows when this route was last used.
1664\item \verb|used| --- the number of lookups of this route since its creation.
1665\end{itemize}
1666
1667
1668\subsection{{\tt ip route flush} --- flush routing tables}
1669\label{IP-ROUTE-FLUSH}
1670
1671\paragraph{Abbreviations:} \verb|flush|, \verb|f|.
1672
1673\paragraph{Description:} this command flushes routes selected
1674by some criteria.
1675
1676\paragraph{Arguments:} the arguments have the same syntax and semantics
1677as the arguments of \verb|ip route show|, but routing tables are not
1678listed but purged. The only difference is the default action: \verb|show|
1679dumps all the IP main routing table but \verb|flush| prints the helper page.
1680The reason for this difference does not require any explanation, does it?
1681
1682
1683\paragraph{Statistics:} With the \verb|-statistics| option, the command
1684becomes verbose. It prints out the number of deleted routes and the number
1685of rounds made to flush the routing table. If the option is given
1686twice, \verb|ip route flush| also dumps all the deleted routes
1687in the format described in the previous subsection.
1688
1689\paragraph{Examples:} The first example flushes all the
1690gatewayed routes from the main table (f.e.\ after a routing daemon crash).
1691\begin{verbatim}
1692netadm@amber:~ # ip -4 ro flush scope global type unicast
1693\end{verbatim}
1694This option deserves to be put into a scriptlet \verb|routef|.
1695\begin{NB}
1696This option was described in the \verb|route(8)| man page borrowed
1697from BSD, but was never implemented in Linux.
1698\end{NB}
1699
1700The second example flushes all IPv6 cloned routes:
1701\begin{verbatim}
1702netadm@amber:~ # ip -6 -s -s ro flush cache
17033ffe:2400::220:afff:fef4:c5d1 via 3ffe:2400::220:afff:fef4:c5d1 \
1704  dev eth0  metric 0 
1705    cache  used 2 age 12sec mtu 1500 rtt 300
17063ffe:2400::280:adff:feb7:8034 via 3ffe:2400::280:adff:feb7:8034 \
1707  dev eth0  metric 0 
1708    cache  used 2 age 15sec mtu 1500 rtt 300
17093ffe:2400::280:c8ff:fe59:5bcc via 3ffe:2400::280:c8ff:fe59:5bcc \
1710  dev eth0  metric 0 
1711    cache  users 1 used 1 age 23sec mtu 1500 rtt 300
17123ffe:2400:0:1:2a0:ccff:fe66:1878 via 3ffe:2400:0:1:2a0:ccff:fe66:1878 \
1713  dev eth1  metric 0 
1714    cache  used 2 age 20sec mtu 1500 rtt 300
17153ffe:2400:0:1:a00:20ff:fe71:fb30 via 3ffe:2400:0:1:a00:20ff:fe71:fb30 \
1716  dev eth1  metric 0 
1717    cache  used 2 age 33sec mtu 1500 rtt 300
1718ff02::1 via ff02::1 dev eth1  metric 0 
1719    cache  users 1 used 1 age 45sec mtu 1500 rtt 300
1720
1721*** Round 1, deleting 6 entries ***
1722*** Flush is complete after 1 round ***
1723netadm@amber:~ # ip -6 -s -s ro flush cache
1724Nothing to flush.
1725netadm@amber:~ #
1726\end{verbatim}
1727
1728The third example flushes BGP routing tables after a \verb|gated|
1729death.
1730\begin{verbatim}
1731netadm@amber:~ # ip ro ls proto gated/bgp | wc
1732   1408    9856    78730
1733netadm@amber:~ # ip -s ro f proto gated/bgp
1734
1735*** Round 1, deleting 1408 entries ***
1736*** Flush is complete after 1 round ***
1737netadm@amber:~ # ip ro f proto gated/bgp
1738Nothing to flush.
1739netadm@amber:~ # ip ro ls proto gated/bgp
1740netadm@amber:~ #
1741\end{verbatim}
1742
1743
1744\subsection{{\tt ip route get} --- get a single route}
1745\label{IP-ROUTE-GET}
1746
1747\paragraph{Abbreviations:} \verb|get|, \verb|g|.
1748
1749\paragraph{Description:} this command gets a single route to a destination
1750and prints its contents exactly as the kernel sees it.
1751
1752\paragraph{Arguments:} 
1753\begin{itemize}
1754\item \verb|to ADDRESS| (default)
1755
1756--- the destination address.
1757
1758\item \verb|from ADDRESS|
1759
1760--- the source address.
1761
1762\item \verb|tos TOS| or \verb|dsfield TOS|
1763
1764--- the Type Of Service.
1765
1766\item \verb|iif NAME|
1767
1768--- the device from which this packet is expected to arrive.
1769
1770\item \verb|oif NAME|
1771
1772--- force the output device on which this packet will be routed.
1773
1774\item \verb|connected|
1775
1776--- if no source address (option \verb|from|) was given, relookup
1777the route with the source set to the preferred address received from the first lookup.
1778If policy routing is used, it may be a different route.
1779
1780\end{itemize}
1781
1782Note that this operation is not equivalent to \verb|ip route show|.
1783\verb|show| shows existing routes. \verb|get| resolves them and
1784creates new clones if necessary. Essentially, \verb|get|
1785is equivalent to sending a packet along this path.
1786If the \verb|iif| argument is not given, the kernel creates a route
1787to output packets towards the requested destination.
1788This is equivalent to pinging the destination
1789with a subsequent {\tt ip route ls cache}, however, no packets are
1790actually sent. With the \verb|iif| argument, the kernel pretends
1791that a packet arrived from this interface and searches for
1792a path to forward the packet.
1793
1794\paragraph{Output format:} This command outputs routes in the same
1795format as \verb|ip route ls|.
1796
1797\paragraph{Examples:} 
1798\begin{itemize}
1799\item Find a route to output packets to 193.233.7.82:
1800\begin{verbatim}
1801kuznet@amber:~ $ ip route get 193.233.7.82
1802193.233.7.82 dev eth0  src 193.233.7.65 realms inr.ac
1803    cache  mtu 1500 rtt 300
1804kuznet@amber:~ $
1805\end{verbatim}
1806
1807\item Find a route to forward packets arriving on \verb|eth0|
1808from 193.233.7.82 and destined for 193.233.7.82:
1809\begin{verbatim}
1810kuznet@amber:~ $ ip r g 193.233.7.82 from 193.233.7.82 iif eth0
1811193.233.7.82 from 193.233.7.82 dev eth0  src 193.233.7.65 \
1812  realms inr.ac/inr.ac 
1813    cache <src-direct,redirect>  mtu 1500 rtt 300 iif eth0
1814kuznet@amber:~ $
1815\end{verbatim}
1816\begin{NB}
1817  \label{NB-nature-of-strangeness}
1818  This is the command that created the funny route from 193.233.7.82
1819  looped back to 193.233.7.82 (cf.\ NB on~p.\pageref{NB-strange-route}).
1820  Note the \verb|redirect| flag on it.
1821\end{NB}
1822
1823\item Find a multicast route for packets arriving on \verb|eth0|
1824from host 193.233.7.82 and destined for multicast group 224.2.127.254
1825(it is assumed that a multicast routing daemon is running.
1826In this case, it is \verb|pimd|)
1827\begin{verbatim}
1828kuznet@amber:~ $ ip r g 224.2.127.254 from 193.233.7.82 iif eth0
1829multicast 224.2.127.254 from 193.233.7.82 dev lo  \
1830  src 193.233.7.65 realms inr.ac/cosmos 
1831    cache <mc> iif eth0 Oifs: eth1 pimreg
1832kuznet@amber:~ $
1833\end{verbatim}
1834This route differs from the ones seen before. It contains a ``normal'' part
1835and a ``multicast'' part. The normal part is used to deliver (or not to
1836deliver) the packet to local IP listeners. In this case the router
1837is not a member
1838of this group, so that route has no \verb|local| flag and only
1839forwards packets. The output device for such entries is always loopback.
1840The multicast part consists of an additional \verb|Oifs:| list showing
1841the output interfaces.
1842\end{itemize}
1843
1844
1845It is time for a more complicated example. Let us add an invalid
1846gatewayed route for a destination which is really directly connected:
1847\begin{verbatim}
1848netadm@alisa:~ # ip route add 193.233.7.98 via 193.233.7.254
1849netadm@alisa:~ # ip route get 193.233.7.98
1850193.233.7.98 via 193.233.7.254 dev eth0  src 193.233.7.90
1851    cache  mtu 1500 rtt 3072
1852netadm@alisa:~ #
1853\end{verbatim}
1854and probe it with ping:
1855\begin{verbatim}
1856netadm@alisa:~ # ping -n 193.233.7.98
1857PING 193.233.7.98 (193.233.7.98) from 193.233.7.90 : 56 data bytes
1858From 193.233.7.254: Redirect Host(New nexthop: 193.233.7.98)
185964 bytes from 193.233.7.98: icmp_seq=0 ttl=255 time=3.5 ms
1860From 193.233.7.254: Redirect Host(New nexthop: 193.233.7.98)
186164 bytes from 193.233.7.98: icmp_seq=1 ttl=255 time=2.2 ms
186264 bytes from 193.233.7.98: icmp_seq=2 ttl=255 time=0.4 ms
186364 bytes from 193.233.7.98: icmp_seq=3 ttl=255 time=0.4 ms
186464 bytes from 193.233.7.98: icmp_seq=4 ttl=255 time=0.4 ms
1865^C
1866--- 193.233.7.98 ping statistics ---
18675 packets transmitted, 5 packets received, 0% packet loss
1868round-trip min/avg/max = 0.4/1.3/3.5 ms
1869netadm@alisa:~ #
1870\end{verbatim}
1871What happened? Router 193.233.7.254 understood that we have a much
1872better path to the destination and sent us an ICMP redirect message.
1873We may retry \verb|ip route get| to see what we have in the routing
1874tables now:
1875\begin{verbatim}
1876netadm@alisa:~ # ip route get 193.233.7.98
1877193.233.7.98 dev eth0  src 193.233.7.90 
1878    cache <redirected>  mtu 1500 rtt 3072
1879netadm@alisa:~ #
1880\end{verbatim}
1881
1882
1883
1884\section{{\tt ip rule} --- routing policy database management}
1885\label{IP-RULE}
1886
1887\paragraph{Abbreviations:} \verb|rule|, \verb|ru|.
1888
1889\paragraph{Object:} \verb|rule|s in the routing policy database control
1890the route selection algorithm.
1891
1892Classic routing algorithms used in the Internet make routing decisions
1893based only on the destination address of packets (and in theory,
1894but not in practice, on the TOS field). The seminal review of classic
1895routing algorithms and their modifications can be found in~\cite{RFC1812}.
1896
1897In some circumstances we want to route packets differently depending not only
1898on destination addresses, but also on other packet fields: source address,
1899IP protocol, transport protocol ports or even packet payload.
1900This task is called ``policy routing''.
1901
1902\begin{NB}
1903  ``policy routing'' $\neq$ ``routing policy''.
1904
1905\noindent	``policy routing'' $=$ ``cunning routing''.
1906
1907\noindent	``routing policy'' $=$ ``routing tactics'' or ``routing plan''.
1908\end{NB}
1909
1910To solve this task, the conventional destination based routing table, ordered
1911according to the longest match rule, is replaced with a ``routing policy
1912database'' (or RPDB), which selects routes
1913by executing some set of rules. The rules may have lots of keys of different
1914natures and therefore they have no natural ordering, but one imposed
1915by the administrator. Linux-2.2 RPDB is a linear list of rules
1916ordered by numeric priority value.
1917RPDB explicitly allows matching a few packet fields:
1918
1919\begin{itemize}
1920\item packet source address.
1921\item packet destination address.
1922\item TOS.
1923\item incoming interface (which is packet metadata, rather than a packet field).
1924\end{itemize}
1925
1926Matching IP protocols and transport ports is also possible,
1927indirectly, via \verb|ipchains|, by exploiting their ability
1928to mark some classes of packets with \verb|fwmark|. Therefore,
1929\verb|fwmark| is also included in the set of keys checked by rules.
1930
1931Each policy routing rule consists of a {\em selector\/} and an {\em action\/}
1932predicate. The RPDB is scanned in the order of increasing priority. The selector
1933of each rule is applied to \{source address, destination address, incoming
1934interface, tos, fwmark\} and, if the selector matches the packet,
1935the action is performed.  The action predicate may return with success.
1936In this case, it will either give a route or failure indication
1937and the RPDB lookup is terminated. Otherwise, the RPDB program
1938continues on the next rule.
1939
1940What is the action, semantically? The natural action is to select the
1941nexthop and the output device. This is what
1942Cisco IOS~\cite{IOS} does. Let us call it ``match \& set''.
1943The Linux-2.2 approach is more flexible. The action includes
1944lookups in destination-based routing tables and selecting
1945a route from these tables according to the classic longest match algorithm.
1946The ``match \& set'' approach is the simplest case of the Linux one. It is realized
1947when a second level routing table contains a single default route.
1948Recall that Linux-2.2 supports multiple tables
1949managed with the \verb|ip route| command, described in the previous section.
1950
1951At startup time the kernel configures the default RPDB consisting of three
1952rules:
1953
1954\begin{enumerate}
1955\item Priority: 0, Selector: match anything, Action: lookup routing
1956table \verb|local| (ID 255).
1957The \verb|local| table is a special routing table containing
1958high priority control routes for local and broadcast addresses.
1959
1960Rule 0 is special. It cannot be deleted or overridden.
1961
1962
1963\item Priority: 32766, Selector: match anything, Action: lookup routing
1964table \verb|main| (ID 254).
1965The \verb|main| table is the normal routing table containing all non-policy
1966routes. This rule may be deleted and/or overridden with other
1967ones by the administrator.
1968
1969\item Priority: 32767, Selector: match anything, Action: lookup routing
1970table \verb|default| (ID 253).
1971The \verb|default| table is empty. It is reserved for some
1972post-processing if no previous default rules selected the packet.
1973This rule may also be deleted.
1974
1975\end{enumerate}
1976
1977Do not confuse routing tables with rules: rules point to routing tables,
1978several rules may refer to one routing table and some routing tables
1979may have no rules pointing to them. If the administrator deletes all the rules
1980referring to a table, the table is not used, but it still exists
1981and will disappear only after all the routes contained in it are deleted.
1982
1983
1984\paragraph{Rule attributes:} Each RPDB entry has additional
1985attributes. F.e.\ each rule has a pointer to some routing
1986table. NAT and masquerading rules have an attribute to select new IP
1987address to translate/masquerade. Besides that, rules have some
1988optional attributes, which routes have, namely \verb|realms|.
1989These values do not override those contained in the routing tables. They
1990are only used if the route did not select any attributes.
1991
1992
1993\paragraph{Rule types:} The RPDB may contain rules of the following
1994types:
1995\begin{itemize}
1996\item \verb|unicast| --- the rule prescribes to return the route found
1997in the routing table referenced by the rule.
1998\item \verb|blackhole| --- the rule prescribes to silently drop the packet.
1999\item \verb|unreachable| --- the rule prescribes to generate a ``Network
2000is unreachable'' error.
2001\item \verb|prohibit| --- the rule prescribes to generate
2002``Communication is administratively prohibited'' error.
2003\item \verb|nat| --- the rule prescribes to translate the source address
2004of the IP packet into some other value. More about NAT is
2005in Appendix~\ref{ROUTE-NAT}, p.\pageref{ROUTE-NAT}.
2006\end{itemize}
2007
2008
2009\paragraph{Commands:} \verb|add|, \verb|delete| and \verb|show|
2010(or \verb|list|).
2011
2012\subsection{{\tt ip rule add} --- insert a new rule\\
2013	{\tt ip rule delete} --- delete a rule}
2014\label{IP-RULE-ADD}
2015
2016\paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|delete|, \verb|del|,
2017	\verb|d|.
2018
2019\paragraph{Arguments:}
2020
2021\begin{itemize}
2022\item \verb|type TYPE| (default)
2023
2024--- the type of this rule. The list of valid types was given in the previous
2025subsection.
2026
2027\item \verb|from PREFIX|
2028
2029--- select the source prefix to match.
2030
2031\item \verb|to PREFIX|
2032
2033--- select the destination prefix to match.
2034
2035\item \verb|iif NAME|
2036
2037--- select the incoming device to match. If the interface is loopback,
2038the rule only matches packets originating from this host. This means that you
2039may create separate routing tables for forwarded and local packets and,
2040hence, completely segregate them.
2041
2042\item \verb|tos TOS| or \verb|dsfield TOS|
2043
2044--- select the TOS value to match.
2045
2046\item \verb|fwmark MARK|
2047
2048--- select the \verb|fwmark| value to match.
2049
2050\item \verb|priority PREFERENCE|
2051
2052--- the priority of this rule. Each rule should have an explicitly
2053set {\em unique\/} priority value.
2054\begin{NB}
2055  Really, for historical reasons \verb|ip rule add| does not require a
2056  priority value and allows them to be non-unique.
2057  If the user does not supplied a priority, it is selected by the kernel.
2058  If the user creates a rule with a priority value that
2059  already exists, the kernel does not reject the request. It adds
2060  the new rule before all old rules of the same priority.
2061
2062  It is mistake in design, no more. And it will be fixed one day,
2063  so do not rely on this feature. Use explicit priorities.
2064\end{NB}
2065
2066
2067\item \verb|table TABLEID|
2068
2069--- the routing table identifier to lookup if the rule selector matches.
2070
2071\item \verb|realms FROM/TO|
2072
2073--- Realms to select if the rule matched and the routing table lookup
2074succeeded. Realm \verb|TO| is only used if the route did not select
2075any realm.
2076
2077\item \verb|nat ADDRESS|
2078
2079--- The base of the IP address block to translate (for source addresses).
2080The \verb|ADDRESS| may be either the start of the block of NAT addresses
2081(selected by NAT routes) or in linux-2.2 a local host address (or even zero).
2082In the last case the router does not translate the packets,
2083but masquerades them to this address; this feature disappered in 2.4.
2084More about NAT is in Appendix~\ref{ROUTE-NAT},
2085p.\pageref{ROUTE-NAT}.
2086
2087\end{itemize}
2088
2089\paragraph{Warning:} Changes to the RPDB made with these commands
2090do not become active immediately. It is assumed that after
2091a script finishes a batch of updates, it flushes the routing cache
2092with \verb|ip route flush cache|.
2093
2094\paragraph{Examples:}
2095\begin{itemize}
2096\item Route packets with source addresses from 192.203.80/24
2097according to routing table \verb|inr.ruhep|:
2098\begin{verbatim}
2099ip ru add from 192.203.80.0/24 table inr.ruhep prio 220
2100\end{verbatim}
2101
2102\item Translate packet source address 193.233.7.83 into 192.203.80.144
2103and route it according to table \#1 (actually, it is \verb|inr.ruhep|):
2104\begin{verbatim}
2105ip ru add from 193.233.7.83 nat 192.203.80.144 table 1 prio 320
2106\end{verbatim}
2107
2108\item Delete the unused default rule:
2109\begin{verbatim}
2110ip ru del prio 32767
2111\end{verbatim}
2112
2113\end{itemize}
2114
2115
2116
2117\subsection{{\tt ip rule show} --- list rules}
2118\label{IP-RULE-SHOW}
2119
2120\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.
2121
2122
2123\paragraph{Arguments:} Good news, this is one command that has no arguments.
2124
2125\paragraph{Output format:}
2126
2127\begin{verbatim}
2128kuznet@amber:~ $ ip ru ls
21290:	from all lookup local 
2130200:	from 192.203.80.0/24 to 193.233.7.0/24 lookup main
2131210:	from 192.203.80.0/24 to 192.203.80.0/24 lookup main
2132220:	from 192.203.80.0/24 lookup inr.ruhep realms inr.ruhep/radio-msu
2133300:	from 193.233.7.83 to 193.233.7.0/24 lookup main
2134310:	from 193.233.7.83 to 192.203.80.0/24 lookup main
2135320:	from 193.233.7.83 lookup inr.ruhep map-to 192.203.80.144
213632766:	from all lookup main 
2137kuznet@amber:~ $
2138\end{verbatim}
2139
2140In the first column is the rule priority value followed
2141by a colon. Then the selectors follow. Each key is prefixed
2142with the same keyword that was used to create the rule.
2143
2144The keyword \verb|lookup| is followed by a routing table identifier,
2145as it is recorded in the file \verb|/etc/iproute2/rt_tables|.
2146
2147If the rule does NAT (f.e.\ rule \#320), it is shown by the keyword
2148\verb|map-to| followed by the start of the block of addresses to map.
2149
2150The sense of this example is pretty simple. The prefixes
2151192.203.80.0/24 and 193.233.7.0/24 form the internal network, but
2152they are routed differently when the packets leave it.
2153Besides that, the host 193.233.7.83 is translated into
2154another prefix to look like 192.203.80.144 when talking
2155to the outer world.
2156
2157
2158
2159\section{{\tt ip maddress} --- multicast addresses management}
2160\label{IP-MADDR}
2161
2162\paragraph{Object:} \verb|maddress| objects are multicast addresses.
2163
2164\paragraph{Commands:} \verb|add|, \verb|delete|, \verb|show| (or \verb|list|).
2165
2166\subsection{{\tt ip maddress show} --- list multicast addresses}
2167
2168\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.
2169
2170\paragraph{Arguments:}
2171
2172\begin{itemize}
2173
2174\item \verb|dev NAME| (default)
2175
2176--- the device name.
2177
2178\end{itemize}
2179
2180\paragraph{Output format:}
2181
2182\begin{verbatim}
2183kuznet@alisa:~ $ ip maddr ls dummy
21842:  dummy
2185    link  33:33:00:00:00:01
2186    link  01:00:5e:00:00:01
2187    inet  224.0.0.1 users 2
2188    inet6 ff02::1
2189kuznet@alisa:~ $ 
2190\end{verbatim}
2191
2192The first line of the output shows the interface index and its name.
2193Then the multicast address list follows. Each line starts with the
2194protocol identifier. The word \verb|link| denotes a link layer
2195multicast addresses.
2196
2197If a multicast address has more than one user, the number
2198of users is shown after the \verb|users| keyword.
2199
2200One additional feature not present in the example above
2201is the \verb|static| flag, which indicates that the address was joined
2202with \verb|ip maddr add|. See the following subsection.
2203
2204
2205
2206\subsection{{\tt ip maddress add} --- add a multicast address\\
2207	    {\tt ip maddress delete} --- delete a multicast address}
2208
2209\paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|delete|, \verb|del|, \verb|d|.
2210
2211\paragraph{Description:} these commands attach/detach
2212a static link layer multicast address to listen on the interface.
2213Note that it is impossible to join protocol multicast groups
2214statically. This command only manages link layer addresses.
2215
2216
2217\paragraph{Arguments:}
2218
2219\begin{itemize}
2220\item \verb|address LLADDRESS| (default)
2221
2222--- the link layer multicast address.
2223
2224\item \verb|dev NAME|
2225
2226--- the device to join/leave this multicast address.
2227
2228\end{itemize}
2229
2230
2231\paragraph{Example:} Let us continue with the example from the previous subsection.
2232
2233\begin{verbatim}
2234netadm@alisa:~ # ip maddr add 33:33:00:00:00:01 dev dummy
2235netadm@alisa:~ # ip -0 maddr ls dummy
22362:  dummy
2237    link  33:33:00:00:00:01 users 2 static
2238    link  01:00:5e:00:00:01
2239netadm@alisa:~ # ip maddr del 33:33:00:00:00:01 dev dummy
2240\end{verbatim}
2241
2242\begin{NB}
2243 Neither \verb|ip| nor the kernel check for multicast address validity.
2244 Particularly, this means that you can try to load a unicast address
2245 instead of a multicast address. Most drivers will ignore such addresses,
2246 but several (f.e.\ Tulip) will intern it to their on-board filter.
2247 The effects may be strange. Namely, the addresses become additional
2248 local link addresses and, if you loaded the address of another host
2249 to the router, wait for duplicated packets on the wire.
2250 It is not a bug, but rather a hole in the API and intra-kernel interfaces.
2251 This feature is really more useful for traffic monitoring, but using it
2252 with Linux-2.2 you {\em have to\/} be sure that the host is not
2253 a router and, especially, that it is not a transparent proxy or masquerading
2254 agent.
2255\end{NB}
2256
2257
2258
2259\section{{\tt ip mroute} --- multicast routing cache management}
2260\label{IP-MROUTE}
2261
2262\paragraph{Abbreviations:} \verb|mroute|, \verb|mr|.
2263
2264\paragraph{Object:} \verb|mroute| objects are multicast routing cache
2265entries created by a user level mrouting daemon
2266(f.e.\ \verb|pimd| or \verb|mrouted|).
2267
2268Due to the limitations of the current interface to the multicast routing
2269engine, it is impossible to change \verb|mroute| objects administratively,
2270so we may only display them. This limitation will be removed
2271in the future.
2272
2273\paragraph{Commands:} \verb|show| (or \verb|list|).
2274
2275
2276\subsection{{\tt ip mroute show} --- list mroute cache entries}
2277
2278\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.
2279
2280\paragraph{Arguments:}
2281
2282\begin{itemize}
2283\item \verb|to PREFIX| (default)
2284
2285--- the prefix selecting the destination multicast addresses to list.
2286
2287
2288\item \verb|iif NAME|
2289
2290--- the interface on which multicast packets are received.
2291
2292
2293\item \verb|from PREFIX|
2294
2295--- the prefix selecting the IP source addresses of the multicast route.
2296
2297
2298\end{itemize}
2299
2300\paragraph{Output format:}
2301
2302\begin{verbatim}
2303kuznet@amber:~ $ ip mroute ls
2304(193.232.127.6, 224.0.1.39)      Iif: unresolved 
2305(193.232.244.34, 224.0.1.40)     Iif: unresolved 
2306(193.233.7.65, 224.66.66.66)     Iif: eth0       Oifs: pimreg 
2307kuznet@amber:~ $ 
2308\end{verbatim}
2309
2310Each line shows one (S,G) entry in the multicast routing cache,
2311where S is the source address and G is the multicast group. \verb|Iif| is
2312the interface on which multicast packets are expected to arrive.
2313If the word \verb|unresolved| is there instead of the interface name,
2314it means that the routing daemon still hasn't resolved this entry.
2315The keyword \verb|oifs| is followed by a list of output interfaces, separated
2316by spaces. If a multicast routing entry is created with non-trivial
2317TTL scope, administrative distances are appended to the device names
2318in the \verb|oifs| list.
2319
2320\paragraph{Statistics:} The \verb|-statistics| option also prints the
2321number of packets and bytes forwarded along this route and
2322the number of packets that arrived on the wrong interface, if this number is not zero.
2323
2324\begin{verbatim}
2325kuznet@amber:~ $ ip -s mr ls 224.66/16
2326(193.233.7.65, 224.66.66.66)     Iif: eth0       Oifs: pimreg 
2327  9383 packets, 300256 bytes
2328kuznet@amber:~ $
2329\end{verbatim}
2330
2331
2332\section{{\tt ip tunnel} --- tunnel configuration}
2333\label{IP-TUNNEL}
2334
2335\paragraph{Abbreviations:} \verb|tunnel|, \verb|tunl|.
2336
2337\paragraph{Object:} \verb|tunnel| objects are tunnels, encapsulating
2338packets in IPv4 packets and then sending them over the IP infrastructure.
2339
2340\paragraph{Commands:} \verb|add|, \verb|delete|, \verb|change|, \verb|show|
2341(or \verb|list|).
2342
2343\paragraph{See also:} A more informal discussion of tunneling
2344over IP and the \verb|ip tunnel| command can be found in~\cite{IP-TUNNELS}.
2345
2346\subsection{{\tt ip tunnel add} --- add a new tunnel\\
2347	{\tt ip tunnel change} --- change an existing tunnel\\
2348	{\tt ip tunnel delete} --- destroy a tunnel}
2349
2350\paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|change|, \verb|chg|;
2351\verb|delete|, \verb|del|, \verb|d|.
2352
2353
2354\paragraph{Arguments:}
2355
2356\begin{itemize}
2357
2358\item \verb|name NAME| (default)
2359
2360--- select the tunnel device name.
2361
2362\item \verb|mode MODE|
2363
2364--- set the tunnel mode. Three modes are currently available:
2365	\verb|ipip|, \verb|sit| and \verb|gre|.
2366
2367\item \verb|remote ADDRESS|
2368
2369--- set the remote endpoint of the tunnel.
2370
2371\item \verb|local ADDRESS|
2372
2373--- set the fixed local address for tunneled packets.
2374It must be an address on another interface of this host.
2375
2376\item \verb|ttl N|
2377
2378--- set a fixed TTL \verb|N| on tunneled packets.
2379	\verb|N| is a number in the range 1--255. 0 is a special value
2380	meaning that packets inherit the TTL value. 
2381		The default value is: \verb|inherit|.
2382
2383\item \verb|tos T| or \verb|dsfield T|
2384
2385--- set a fixed TOS \verb|T| on tunneled packets.
2386		The default value is: \verb|inherit|.
2387
2388
2389
2390\item \verb|dev NAME| 
2391
2392--- bind the tunnel to the device \verb|NAME| so that
2393	tunneled packets will only be routed via this device and will
2394	not be able to escape to another device when the route to endpoint changes.
2395
2396\item \verb|nopmtudisc|
2397
2398--- disable Path MTU Discovery on this tunnel.
2399	It is enabled by default. Note that a fixed ttl is incompatible
2400	with this option: tunnelling with a fixed ttl always makes pmtu discovery.
2401
2402\item \verb|key K|, \verb|ikey K|, \verb|okey K|
2403
2404--- (only GRE tunnels) use keyed GRE with key \verb|K|. \verb|K| is
2405	either a number or an IP address-like dotted quad.
2406   The \verb|key| parameter sets the key to use in both directions.
2407   The \verb|ikey| and \verb|okey| parameters set different keys for input and output.
2408   
2409
2410\item \verb|csum|, \verb|icsum|, \verb|ocsum|
2411
2412--- (only GRE tunnels) generate/require checksums for tunneled packets.
2413   The \verb|ocsum| flag calculates checksums for outgoing packets.
2414   The \verb|icsum| flag requires that all input packets have the correct
2415   checksum. The \verb|csum| flag is equivalent to the combination
2416  ``\verb|icsum| \verb|ocsum|''.
2417
2418\item \verb|seq|, \verb|iseq|, \verb|oseq|
2419
2420--- (only GRE tunnels) serialize packets.
2421   The \verb|oseq| flag enables sequencing of outgoing packets.
2422   The \verb|iseq| flag requires that all input packets are serialized.
2423   The \verb|seq| flag is equivalent to the combination ``\verb|iseq| \verb|oseq|''.
2424
2425\begin{NB}
2426 I think this option does not
2427	work. At least, I did not test it, did not debug it and
2428	do not even understand how it is supposed to work or for what
2429	purpose Cisco planned to use it. Do not use it.
2430\end{NB}
2431
2432
2433\end{itemize}
2434
2435\paragraph{Example:} Create a pointopoint IPv6 tunnel with maximal TTL of 32.
2436\begin{verbatim}
2437netadm@amber:~ # ip tunl add Cisco mode sit remote 192.31.7.104 \
2438    local 192.203.80.142 ttl 32 
2439\end{verbatim}
2440
2441\subsection{{\tt ip tunnel show} --- list tunnels}
2442
2443\paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.
2444
2445
2446\paragraph{Arguments:} None.
2447
2448\paragraph{Output format:}
2449\begin{verbatim}
2450kuznet@amber:~ $ ip tunl ls Cisco
2451Cisco: ipv6/ip  remote 192.31.7.104  local 192.203.80.142  ttl 32 
2452kuznet@amber:~ $ 
2453\end{verbatim}
2454The line starts with the tunnel device name followed by a colon.
2455Then the tunnel mode follows. The parameters of the tunnel are listed
2456with the same keywords that were used when creating the tunnel.
2457
2458\paragraph{Statistics:}
2459
2460\begin{verbatim}
2461kuznet@amber:~ $ ip -s tunl ls Cisco
2462Cisco: ipv6/ip  remote 192.31.7.104  local 192.203.80.142  ttl 32 
2463RX: Packets    Bytes        Errors CsumErrs OutOfSeq Mcasts
2464    12566      1707516      0      0        0        0       
2465TX: Packets    Bytes        Errors DeadLoop NoRoute  NoBufs
2466    13445      1879677      0      0        0        0     
2467kuznet@amber:~ $ 
2468\end{verbatim}
2469Essentially, these numbers are the same as the numbers
2470printed with {\tt ip -s link show}
2471(sec.\ref{IP-LINK-SHOW}, p.\pageref{IP-LINK-SHOW}) but the tags are different
2472to reflect that they are tunnel specific.
2473\begin{itemize}
2474\item \verb|CsumErrs| --- the total number of packets dropped
2475because of checksum failures for a GRE tunnel with checksumming enabled.
2476\item \verb|OutOfSeq| --- the total number of packets dropped
2477because they arrived out of sequence for a GRE tunnel with
2478serialization enabled.
2479\item \verb|Mcasts| --- the total number of multicast packets
2480received on a broadcast GRE tunnel.
2481\item \verb|DeadLoop| --- the total number of packets which were not
2482transmitted because the tunnel is looped back to itself.
2483\item \verb|NoRoute| --- the total number of packets which were not
2484transmitted because there is no IP route to the remote endpoint.
2485\item \verb|NoBufs| --- the total number of packets which were not
2486transmitted because the kernel failed to allocate a buffer.
2487\end{itemize}
2488
2489
2490\section{{\tt ip monitor} and {\tt rtmon} --- state monitoring}
2491\label{IP-MONITOR}
2492
2493The \verb|ip| utility can monitor the state of devices, addresses
2494and routes continuously. This option has a slightly different format.
2495Namely,
2496the \verb|monitor| command is the first in the command line and then
2497the object list follows:
2498\begin{verbatim}
2499  ip monitor [ file FILE ] [ all | OBJECT-LIST ]
2500\end{verbatim}
2501\verb|OBJECT-LIST| is the list of object types that we want to monitor.
2502It may contain \verb|link|, \verb|address| and \verb|route|.
2503If no \verb|file| argument is given, \verb|ip| opens RTNETLINK,
2504listens on it and dumps state changes in the format described
2505in previous sections.
2506
2507If a file name is given, it does not listen on RTNETLINK,
2508but opens the file containing RTNETLINK messages saved in binary format
2509and dumps them. Such a history file can be generated with the
2510\verb|rtmon| utility. This utility has a command line syntax similar to
2511\verb|ip monitor|.
2512Ideally, \verb|rtmon| should be started before
2513the first network configuration command is issued. F.e.\ if
2514you insert:
2515\begin{verbatim}
2516  rtmon file /var/log/rtmon.log
2517\end{verbatim}
2518in a startup script, you will be able to view the full history
2519later.
2520
2521Certainly, it is possible to start \verb|rtmon| at any time.
2522It prepends the history with the state snapshot dumped at the moment
2523of starting.
2524
2525
2526\section{Route realms and policy propagation, {\tt rtacct}}
2527\label{RT-REALMS}
2528
2529On routers using OSPF ASE or, especially, the BGP protocol, routing
2530tables may be huge. If we want to classify or to account for the packets
2531per route, we will have to keep lots of information. Even worse, if we
2532want to distinguish the packets not only by their destination, but
2533also by their source, the task gets quadratic complexity and its solution
2534is physically impossible.
2535
2536One approach to propagating the policy from routing protocols
2537to the forwarding engine has been proposed in~\cite{IOS-BGP-PP}.
2538Essentially, Cisco Policy Propagation via BGP is based on the fact
2539that dedicated routers all have the RIB (Routing Information Base)
2540close to the forwarding engine, so policy routing rules can
2541check all the route attributes, including ASPATH information
2542and community strings.
2543
2544The Linux architecture, splitting the RIB (maintained by a user level
2545daemon) and the kernel based FIB (Forwarding Information Base),
2546does not allow such a simple approach.
2547
2548It is to our fortune because there is another solution
2549which allows even more flexible policy and richer semantics.
2550
2551Namely, routes can be clustered together in user space, based on their
2552attributes.  F.e.\ a BGP router knows route ASPATH, its community;
2553an OSPF router knows the route tag or its area. The administrator, when adding
2554routes manually, also knows their nature. Providing that the number of such
2555aggregates (we call them {\em realms\/}) is low, the task of full
2556classification both by source and destination becomes quite manageable.
2557
2558So each route may be assigned to a realm. It is assumed that
2559this identification is made by a routing daemon, but static routes
2560can also be handled manually with \verb|ip route| (see sec.\ref{IP-ROUTE},
2561p.\pageref{IP-ROUTE}).
2562\begin{NB}
2563  There is a patch to \verb|gated|, allowing classification of routes
2564  to realms with all the set of policy rules implemented in \verb|gated|:
2565  by prefix, by ASPATH, by origin, by tag etc.
2566\end{NB}
2567
2568To facilitate the construction (f.e.\ in case the routing
2569daemon is not aware of realms), missing realms may be completed
2570with routing policy rules, see sec.~\ref{IP-RULE}, p.\pageref{IP-RULE}.
2571
2572For each packet the kernel calculates a tuple of realms: source realm
2573and destination realm, using the following algorithm:
2574
2575\begin{enumerate}
2576\item If the route has a realm, the destination realm of the packet is set to it.
2577\item If the rule has a source realm, the source realm of the packet is set to it.
2578If the destination realm was not inherited from the route and the rule has a destination realm,
2579it is also set.
2580\item If at least one of the realms is still unknown, the kernel finds
2581the reversed route to the source of the packet.
2582\item If the source realm is still unknown, get it from the reversed route.
2583\item If one of the realms is still unknown, swap the realms of reversed
2584routes and apply step 2 again.
2585\end{enumerate}
2586
2587After this procedure is completed we know what realm the packet
2588arrived from and the realm where it is going to propagate to.
2589If some of the realms are unknown, they are initialized to zero
2590(or realm \verb|unknown|).
2591
2592The main application of realms is the TC \verb|route| classifier~\cite{TC-CREF},
2593where they are used to help assign packets to traffic classes,
2594to account, police and schedule them according to this
2595classification.
2596
2597A much simpler but still very useful application is incoming packet
2598accounting by realms. The kernel gathers a packet statistics summary
2599which can be viewed with the \verb|rtacct| utility.
2600\begin{verbatim}
2601kuznet@amber:~ $ rtacct russia
2602Realm      BytesTo    PktsTo     BytesFrom  PktsFrom   
2603russia     20576778   169176     47080168   153805     
2604kuznet@amber:~ $
2605\end{verbatim}
2606This shows that this router received 153805 packets from
2607the realm \verb|russia| and forwarded 169176 packets to \verb|russia|.
2608The realm \verb|russia| consists of routes with ASPATHs not leaving
2609Russia.
2610
2611Note that locally originating packets are not accounted here,
2612\verb|rtacct| shows incoming packets only. Using the \verb|route|
2613classifier (see~\cite{TC-CREF}) you can get even more detailed
2614accounting information about outgoing packets, optionally
2615summarizing traffic not only by source or destination, but
2616by any pair of source and destination realms.
2617
2618
2619\begin{thebibliography}{99}
2620\addcontentsline{toc}{section}{References}
2621\bibitem{RFC-NDISC} T.~Narten, E.~Nordmark, W.~Simpson.
2622``Neighbor Discovery for IP Version 6 (IPv6)'', RFC-2461.
2623
2624\bibitem{RFC-ADDRCONF} S.~Thomson, T.~Narten.
2625``IPv6 Stateless Address Autoconfiguration'', RFC-2462.
2626
2627\bibitem{RFC1812} F.~Baker.
2628``Requirements for IP Version 4 Routers'', RFC-1812.
2629
2630\bibitem{RFC1122} R.~T.~Braden.
2631``Requirements for Internet hosts --- communication layers'', RFC-1122.
2632
2633\bibitem{IOS} ``Cisco IOS Release 12.0 Network Protocols
2634Command Reference, Part 1'' and
2635``Cisco IOS Release 12.0 Quality of Service Solutions
2636Configuration Guide: Configuring Policy-Based Routing'',\\
2637http://www.cisco.com/univercd/cc/td/doc/product/software/ios120.
2638
2639\bibitem{IP-TUNNELS} A.~N.~Kuznetsov.
2640``Tunnels over IP in Linux-2.2'', \\
2641In: {\tt ftp://ftp.inr.ac.ru/ip-routing/iproute2-current.tar.gz}.
2642
2643\bibitem{TC-CREF} A.~N.~Kuznetsov. ``TC Command Reference'',\\
2644In: {\tt ftp://ftp.inr.ac.ru/ip-routing/iproute2-current.tar.gz}.
2645
2646\bibitem{IOS-BGP-PP} ``Cisco IOS Release 12.0 Quality of Service Solutions
2647Configuration Guide: Configuring QoS Policy Propagation via
2648Border Gateway Protocol'',\\
2649http://www.cisco.com/univercd/cc/td/doc/product/software/ios120.
2650
2651\bibitem{RFC-DHCP} R.~Droms.
2652``Dynamic Host Configuration Protocol.'', RFC-2131
2653
2654\end{thebibliography}
2655
2656
2657
2658
2659\appendix
2660\addcontentsline{toc}{section}{Appendix}
2661
2662\section{Source address selection}
2663\label{ADDR-SEL}
2664
2665When a host creates an IP packet, it must select some source
2666address. Correct source address selection is a critical procedure,
2667because it gives the receiver the information needed to deliver a
2668reply. If the source is selected incorrectly, in the best case,
2669the backward path may appear different to the forward one which
2670is harmful for performance. In the worst case, when the addresses
2671are administratively scoped, the reply may be lost entirely.
2672
2673Linux-2.2 selects source addresses using the following algorithm:
2674
2675\begin{itemize}
2676\item
2677The application may select a source address explicitly with \verb|bind(2)|
2678syscall or supplying it to \verb|sendmsg(2)| via the ancillary data object
2679\verb|IP_PKTINFO|. In this case the kernel only checks the validity
2680of the address and never tries to ``improve'' an incorrect user choice,
2681generating an error instead.
2682\begin{NB}
2683 Never say ``Never''. The sysctl option \verb|ip_dynaddr| breaks
2684 this axiom. It has been made deliberately with the purpose
2685 of automatically reselecting the address on hosts with dynamic dial-out interfaces.
2686 However, this hack {\em must not\/} be used on multihomed hosts
2687 and especially on routers: it would break them.
2688\end{NB}
2689
2690
2691\item Otherwise, IP routing tables can contain an explicit source
2692address hint for this destination. The hint is set with the \verb|src| parameter
2693to the \verb|ip route| command, sec.\ref{IP-ROUTE}, p.\pageref{IP-ROUTE}.
2694
2695
2696\item Otherwise, the kernel searches through the list of addresses
2697attached to the interface through which the packets will be routed.
2698The search strategies are different for IP and IPv6. Namely:
2699
2700\begin{itemize}
2701\item IPv6 searches for the first valid, not deprecated address
2702with the same scope as the destination.
2703
2704\item IP searches for the first valid address with a scope wider
2705than the scope of the destination but it prefers addresses
2706which fall to the same subnet as the nexthop of the route
2707to the destination. Unlike IPv6, the scopes of IPv4 destinations
2708are not encoded in their addresses but are supplied
2709in routing tables instead (the \verb|scope| parameter to the \verb|ip route| command,
2710sec.\ref{IP-ROUTE}, p.\pageref{IP-ROUTE}).
2711
2712\end{itemize}
2713
2714
2715\item Otherwise, if the scope of the destination is \verb|link| or \verb|host|,
2716the algorithm fails and returns a zero source address.
2717
2718\item Otherwise, all interfaces are scanned to search for an address
2719with an appropriate scope. The loopback device \verb|lo| is always the first
2720in the search list, so that if an address with global scope (not 127.0.0.1!)
2721is configured on loopback, it is always preferred.
2722
2723\end{itemize}
2724
2725
2726\section{Proxy ARP/NDISC}
2727\label{PROXY-NEIGH}
2728
2729Routers may answer ARP/NDISC solicitations on behalf of other hosts.
2730In Linux-2.2 proxy ARP on an interface may be enabled
2731by setting the kernel \verb|sysctl| variable 
2732\verb|/proc/sys/net/ipv4/conf/<dev>/proxy_arp| to 1. After this, the router
2733starts to answer ARP requests on the interface \verb|<dev>|, provided
2734the route to the requested destination does {\em not\/} go back via the same
2735device.
2736
2737The variable \verb|/proc/sys/net/ipv4/conf/all/proxy_arp| enables proxy
2738ARP on all the IP devices.
2739
2740However, this approach fails in the case of IPv6 because the router
2741must join the solicited node multicast address to listen for the corresponding
2742NDISC queries. It means that proxy NDISC is possible only on a per destination
2743basis.
2744
2745Logically, proxy ARP/NDISC is not a kernel task. It can easily be implemented
2746in user space. However, similar functionality was present in BSD kernels
2747and in Linux-2.0, so we have to preserve it at least to the extent that
2748is standardized in BSD.
2749\begin{NB}
2750  Linux-2.0 ARP had a feature called {\em subnet\/} proxy ARP.
2751  It is replaced with the sysctl flag in Linux-2.2.
2752\end{NB}
2753
2754
2755The \verb|ip| utility provides a way to manage proxy ARP/NDISC
2756with the \verb|ip neigh| command, namely:
2757\begin{verbatim}
2758  ip neigh add proxy ADDRESS [ dev NAME ]
2759\end{verbatim}
2760adds a new proxy ARP/NDISC record and
2761\begin{verbatim}
2762  ip neigh del proxy ADDRESS [ dev NAME ]
2763\end{verbatim}
2764deletes it.
2765
2766If the name of the device is not given, the router will answer solicitations
2767for address \verb|ADDRESS| on all devices, otherwise it will only serve
2768the device \verb|NAME|. Even if the proxy entry is created with
2769\verb|ip neigh|, the router {\em will not\/} answer a query if the route
2770to the destination goes back via the interface from which the solicitation
2771was received.
2772
2773It is important to emphasize that proxy entries have {\em no\/}
2774parameters other than these (IP/IPv6 address and optional device).
2775Particularly, the entry does not store any link layer address.
2776It always advertises the station address of the interface
2777on which it sends advertisements (i.e. it's own station address).
2778
2779\section{Route NAT status}
2780\label{ROUTE-NAT}
2781
2782NAT (or ``Network Address Translation'') remaps some parts
2783of the IP address space into other ones. Linux-2.2 route NAT is supposed
2784to be used to facilitate policy routing by rewriting addresses
2785to other routing domains or to help while renumbering sites
2786to another prefix.
2787
2788\paragraph{What it is not:}
2789It is necessary to emphasize that {\em it is not supposed\/}
2790to be used to compress address space or to split load.
2791This is not missing functionality but a design principle.
2792Route NAT is {\em stateless\/}. It does not hold any state
2793about translated sessions. This means that it handles any number
2794of sessions flawlessly. But it also means that it is {\em static\/}.
2795It cannot detect the moment when the last TCP client stops
2796using an address. For the same reason, it will not help to split
2797load between several servers.
2798\begin{NB}
2799It is a pretty commonly held belief that it is useful to split load between
2800several servers with NAT. This is a mistake. All you get from this
2801is the requirement that the router keep the state of all the TCP connections
2802going via it. Well, if the router is so powerful, run apache on it. 8)
2803\end{NB}
2804
2805The second feature: it does not touch packet payload,
2806does not try to ``improve'' broken protocols by looking
2807through its data and mangling it. It mangles IP addresses,
2808only IP addresses and nothing but IP addresses.
2809This also, is not missing any functionality.
2810
2811To resume: if you need to compress address space or keep
2812active FTP clients happy, your choice is not route NAT but masquerading,
2813port forwarding, NAPT etc. 
2814\begin{NB}
2815By the way, you may also want to look at
2816http://www.csn.tu-chemnitz.de/HyperNews/get/linux-ip-nat.html
2817\end{NB}
2818
2819
2820\paragraph{How it works.}
2821Some part of the address space is reserved for dummy addresses
2822which will look for all the world like some host addresses
2823inside your network. No other hosts may use these addresses,
2824however other routers may also be configured to translate them.
2825\begin{NB}
2826A great advantage of route NAT is that it may be used not
2827only in stub networks but in environments with arbitrarily complicated
2828structure. It does not firewall, it {\em forwards.}
2829\end{NB}
2830These addresses are selected by the \verb|ip route| command
2831(sec.\ref{IP-ROUTE-ADD}, p.\pageref{IP-ROUTE-ADD}). F.e.\
2832\begin{verbatim}
2833  ip route add nat 192.203.80.144 via 193.233.7.83
2834\end{verbatim}
2835states that the single address 192.203.80.144 is a dummy NAT address.
2836For all the world it looks like a host address inside our network.
2837For neighbouring hosts and routers it looks like the local address
2838of the translating router. The router answers ARP for it, advertises
2839this address as routed via it, {\em et al\/}. When the router
2840receives a packet destined for 192.203.80.144, it replaces 
2841this address with 193.233.7.83 which is the address of some real
2842host and forwards the packet. If you need to remap
2843blocks of addresses, you may use a command like:
2844\begin{verbatim}
2845  ip route add nat 192.203.80.192/26 via 193.233.7.64
2846\end{verbatim}
2847This command will map a block of 63 addresses 192.203.80.192-255 to
2848193.233.7.64-127.
2849
2850When an internal host (193.233.7.83 in the example above)
2851sends something to the outer world and these packets are forwarded
2852by our router, it should translate the source address 193.233.7.83
2853into 192.203.80.144. This task is solved by setting a special
2854policy rule (sec.\ref{IP-RULE-ADD}, p.\pageref{IP-RULE-ADD}):
2855\begin{verbatim}
2856  ip rule add prio 320 from 193.233.7.83 nat 192.203.80.144
2857\end{verbatim}
2858This rule says that the source address 193.233.7.83
2859should be translated into 192.203.80.144 before forwarding.
2860It is important that the address after the \verb|nat| keyword
2861is some NAT address, declared by {\tt ip route add nat}.
2862If it is just a random address the router will not map to it.
2863\begin{NB}
2864The exception is when the address is a local address of this
2865router (or 0.0.0.0) and masquerading is configured in the linux-2.2
2866kernel. In this case the router will masquerade the packets as this address.
2867If 0.0.0.0 is selected, the result is equivalent to one
2868obtained with firewalling rules. Otherwise, you have the way
2869to order Linux to masquerade to this fixed address.
2870NAT mechanism used in linux-2.4 is more flexible than
2871masquerading, so that this feature has lost meaning and disabled.
2872\end{NB}
2873
2874If the network has non-trivial internal structure, it is
2875useful and even necessary to add rules disabling translation
2876when a packet does not leave this network. Let us return to the
2877example from sec.\ref{IP-RULE-SHOW} (p.\pageref{IP-RULE-SHOW}).
2878\begin{verbatim}
2879300:	from 193.233.7.83 to 193.233.7.0/24 lookup main
2880310:	from 193.233.7.83 to 192.203.80.0/24 lookup main
2881320:	from 193.233.7.83 lookup inr.ruhep map-to 192.203.80.144
2882\end{verbatim}
2883This block of rules causes normal forwarding when
2884packets from 193.233.7.83 do not leave networks 193.233.7/24
2885and 192.203.80/24. Also, if the \verb|inr.ruhep| table does not
2886contain a route to the destination (which means that the routing
2887domain owning addresses from 192.203.80/24 is dead), no translation
2888will occur. Otherwise, the packets are translated.
2889
2890\paragraph{How to only translate selected ports:}
2891If you only want to translate selected ports (f.e.\ http)
2892and leave the rest intact, you may use \verb|ipchains|
2893to \verb|fwmark| a class of packets.
2894Suppose you did and all the packets from 193.233.7.83
2895destined for port 80 are marked with marker 0x1234 in input fwchain.
2896In this case you may replace rule \#320 with:
2897\begin{verbatim}
2898320:	from 193.233.7.83 fwmark 1234 lookup main map-to 192.203.80.144
2899\end{verbatim}
2900and translation will only be enabled for outgoing http requests.
2901
2902\section{Example: minimal host setup}
2903\label{EXAMPLE-SETUP}
2904
2905The following script gives an example of a fault safe
2906setup of IP (and IPv6, if it is compiled into the kernel)
2907in the common case of a node attached to a single broadcast
2908network. A more advanced script, which may be used both on multihomed
2909hosts and on routers, is described in the following
2910section.
2911
2912The utilities used in the script may be found in the
2913directory ftp://ftp.inr.ac.ru/ip-routing/:
2914\begin{enumerate}
2915\item \verb|ip| --- package \verb|iproute2|.
2916\item \verb|arping| --- package \verb|iputils|.
2917\item \verb|rdisc| --- package \verb|iputils|.
2918\end{enumerate}
2919\begin{NB}
2920It also refers to a DHCP client, \verb|dhcpcd|. I should refrain from
2921recommending a good DHCP client to use. All that I can
2922say is that ISC \verb|dhcp-2.0b1pl6| patched with the patch that
2923can be found in the \verb|dhcp.bootp.rarp| subdirectory of
2924the same ftp site {\em does\/} work,
2925at least on Ethernet and Token Ring.
2926\end{NB}
2927
2928\begin{verbatim}
2929#! /bin/bash
2930\end{verbatim}
2931\begin{flushleft}
2932\# {\bf Usage: \verb|ifone ADDRESS[/PREFIX-LENGTH] [DEVICE]|}\\
2933\# {\bf Parameters:}\\
2934\# \$1 --- Static IP address, optionally followed by prefix length.\\
2935\# \$2 --- Device name. If it is missing, \verb|eth0| is asssumed.\\
2936\# F.e. \verb|ifone 193.233.7.90|
2937\end{flushleft}
2938\begin{verbatim}
2939dev=$2
2940: ${dev:=eth0}
2941ipaddr=
2942\end{verbatim}
2943\# Parse IP address, splitting prefix length.
2944\begin{verbatim}
2945if [ "$1" != "" ]; then
2946  ipaddr=${1%/*}
2947  if [ "$1" != "$ipaddr" ]; then
2948    pfxlen=${1#*/}
2949  fi
2950  : ${pfxlen:=24}
2951fi
2952pfx="${ipaddr}/${pfxlen}"
2953\end{verbatim}
2954
2955\begin{flushleft}
2956\# {\bf Step 0} --- enable loopback.\\
2957\#\\
2958\# This step is necessary on any networked box before attempt\\
2959\# to configure any other device.\\
2960\end{flushleft}
2961\begin{verbatim}
2962ip link set up dev lo
2963ip addr add 127.0.0.1/8 dev lo brd + scope host
2964\end{verbatim}
2965\begin{flushleft}
2966\# IPv6 autoconfigure themself on loopback.\\
2967\#\\
2968\# If user gave loopback as device, we add the address as alias and exit.
2969\end{flushleft}
2970\begin{verbatim}
2971if [ "$dev" = "lo" ]; then
2972  if [ "$ipaddr" != "" -a  "$ipaddr" != "127.0.0.1" ]; then
2973    ip address add $ipaddr dev $dev
2974    exit $?
2975  fi
2976  exit 0
2977fi
2978\end{verbatim}
2979
2980\noindent\# {\bf Step 1} --- enable device \verb|$dev|
2981
2982\begin{verbatim}
2983if ! ip link set up dev $dev ; then
2984  echo "Cannot enable interface $dev. Aborting." 1>&2
2985  exit 1
2986fi
2987\end{verbatim}
2988\begin{flushleft}
2989\# The interface is \verb|UP|. IPv6 started stateless autoconfiguration itself,\\
2990\# and its configuration finishes here. However,\\
2991\# IP still needs some static preconfigured address.
2992\end{flushleft}
2993\begin{verbatim}
2994if [ "$ipaddr" = "" ]; then
2995  echo "No address for $dev is configured, trying DHCP..." 1>&2
2996  dhcpcd
2997  exit $?
2998fi
2999\end{verbatim}
3000
3001\begin{flushleft}
3002\# {\bf Step 2} --- IP Duplicate Address Detection~\cite{RFC-DHCP}.\\
3003\# Send two probes and wait for result for 3 seconds.\\
3004\# If the interface opens slower f.e.\ due to long media detection,\\
3005\# you want to increase the timeout.\\
3006\end{flushleft}
3007\begin{verbatim}
3008if ! arping -q -c 2 -w 3 -D -I $dev $ipaddr ; then
3009  echo "Address $ipaddr is busy, trying DHCP..." 1>&2
3010  dhcpcd
3011  exit $?
3012fi
3013\end{verbatim}
3014\begin{flushleft}
3015\# OK, the address is unique, we may add it on the interface.\\
3016\#\\
3017\# {\bf Step 3} --- Configure the address on the interface.
3018\end{flushleft}
3019
3020\begin{verbatim}
3021if ! ip address add $pfx brd + dev $dev; then
3022  echo "Failed to add $pfx on $dev, trying DHCP..." 1>&2
3023  dhcpcd
3024  exit $?
3025fi
3026\end{verbatim}
3027
3028\noindent\# {\bf Step 4} --- Announce our presence on the link.
3029\begin{verbatim}
3030arping -A -c 1 -I $dev $ipaddr
3031noarp=$?
3032( sleep 2;
3033  arping -U -c 1 -I $dev $ipaddr ) >& /dev/null </dev/null &
3034\end{verbatim}
3035
3036\begin{flushleft}
3037\# {\bf Step 5} (optional) --- Add some control routes.\\
3038\#\\
3039\# 1. Prohibit link local multicast addresses.\\
3040\# 2. Prohibit link local (alias, limited) broadcast.\\
3041\# 3. Add default multicast route.
3042\end{flushleft}
3043\begin{verbatim}
3044ip route add unreachable 224.0.0.0/24 
3045ip route add unreachable 255.255.255.255
3046if [ `ip link ls $dev | grep -c MULTICAST` -ge 1 ]; then
3047  ip route add 224.0.0.0/4 dev $dev scope global
3048fi
3049\end{verbatim}
3050
3051\begin{flushleft}
3052\# {\bf Step 6} --- Add fallback default route with huge metric.\\
3053\# If a proxy ARP server is present on the interface, we will be\\
3054\# able to talk to all the Internet without further configuration.\\
3055\# It is not so cheap though and we still hope that this route\\
3056\# will be overridden by more correct one by rdisc.\\
3057\# Do not make this step if the device is not ARPable,\\
3058\# because dead nexthop detection does not work on them.
3059\end{flushleft}
3060\begin{verbatim}
3061if [ "$noarp" = "0" ]; then
3062  ip ro add default dev $dev metric 30000 scope global
3063fi
3064\end{verbatim}
3065
3066\begin{flushleft}
3067\# {\bf Step 7} --- Restart router discovery and exit.
3068\end{flushleft}
3069\begin{verbatim}
3070killall -HUP rdisc || rdisc -fs
3071exit 0
3072\end{verbatim}
3073
3074
3075\section{Example: {\protect\tt ifcfg} --- interface address management}
3076\label{EXAMPLE-IFCFG}
3077
3078This is a simplistic script replacing one option of \verb|ifconfig|,
3079namely, IP address management. It not only adds
3080addresses, but also carries out Duplicate Address Detection~\cite{RFC-DHCP},
3081sends unsolicited ARP to update the caches of other hosts sharing
3082the interface, adds some control routes and restarts Router Discovery
3083when it is necessary.
3084
3085I strongly recommend using it {\em instead\/} of \verb|ifconfig| both
3086on hosts and on routers.
3087
3088\begin{verbatim}
3089#! /bin/bash
3090\end{verbatim}
3091\begin{flushleft}
3092\# {\bf Usage: \verb?ifcfg DEVICE[:ALIAS] [add|del] ADDRESS[/LENGTH] [PEER]?}\\
3093\# {\bf Parameters:}\\
3094\# ---Device name. It may have alias suffix, separated by colon.\\
3095\# ---Command: add, delete or stop.\\
3096\# ---IP address, optionally followed by prefix length.\\
3097\# ---Optional peer address for pointopoint interfaces.\\
3098\# F.e. \verb|ifcfg eth0 193.233.7.90/24|
3099
3100\noindent\# This function determines, whether it is router or host.\\
3101\# It returns 0, if the host is apparently not router.
3102\end{flushleft}
3103\begin{verbatim}
3104CheckForwarding () {
3105  local sbase fwd
3106  sbase=/proc/sys/net/ipv4/conf
3107  fwd=0
3108  if [ -d $sbase ]; then
3109    for dir in $sbase/*/forwarding; do
3110      fwd=$[$fwd + `cat $dir`]
3111    done
3112  else
3113    fwd=2
3114  fi
3115  return $fwd
3116}
3117\end{verbatim}
3118\begin{flushleft}
3119\# This function restarts Router Discovery.\\
3120\end{flushleft}
3121\begin{verbatim}
3122RestartRDISC () {
3123  killall -HUP rdisc || rdisc -fs
3124}
3125\end{verbatim}
3126\begin{flushleft}
3127\# Calculate ABC "natural" mask length\\
3128\# Arg: \$1 = dotquad address
3129\end{flushleft}
3130\begin{verbatim}
3131ABCMaskLen () {
3132  local class;
3133  class=${1%%.*}
3134  if [ $class -eq 0 -o $class -ge 224 ]; then return 0
3135  elif [ $class -ge 192 ]; then return 24
3136  elif [ $class -ge 128 ]; then return 16
3137  else  return 8 ; fi
3138}
3139\end{verbatim}
3140
3141
3142\begin{flushleft}
3143\# {\bf MAIN()}\\
3144\#\\
3145\# Strip alias suffix separated by colon.
3146\end{flushleft}
3147\begin{verbatim}
3148label="label $1"
3149ldev=$1
3150dev=${1%:*}
3151if [ "$dev" = "" -o "$1" = "help" ]; then
3152  echo "Usage: ifcfg DEV [[add|del [ADDR[/LEN]] [PEER] | stop]" 1>&2
3153  echo "       add - add new address" 1>&2
3154  echo "       del - delete address" 1>&2
3155  echo "       stop - completely disable IP" 1>&2
3156  exit 1
3157fi
3158shift
3159
3160CheckForwarding
3161fwd=$?
3162\end{verbatim}
3163\begin{flushleft}
3164\# Parse command. If it is ``stop'', flush and exit.
3165\end{flushleft}
3166\begin{verbatim}
3167deleting=0
3168case "$1" in
3169add) shift ;;
3170stop)
3171  if [ "$ldev" != "$dev" ]; then
3172    echo "Cannot stop alias $ldev" 1>&2
3173    exit 1;
3174  fi
3175  ip -4 addr flush dev $dev $label || exit 1
3176  if [ $fwd -eq 0 ]; then RestartRDISC; fi
3177  exit 0 ;;
3178del*)
3179  deleting=1; shift ;;
3180*)
3181esac
3182\end{verbatim}
3183\begin{flushleft}
3184\# Parse prefix, split prefix length, separated by slash.
3185\end{flushleft}
3186\begin{verbatim}
3187ipaddr=
3188pfxlen=
3189if [ "$1" != "" ]; then
3190  ipaddr=${1%/*}
3191  if [ "$1" != "$ipaddr" ]; then
3192    pfxlen=${1#*/}
3193  fi
3194  if [ "$ipaddr" = "" ]; then
3195    echo "$1 is bad IP address." 1>&2
3196    exit 1
3197  fi
3198fi
3199shift
3200\end{verbatim}
3201\begin{flushleft}
3202\# If peer address is present, prefix length is 32.\\
3203\# Otherwise, if prefix length was not given, guess it.
3204\end{flushleft}
3205\begin{verbatim}
3206peer=$1
3207if [ "$peer" != "" ]; then
3208  if [ "$pfxlen" != "" -a "$pfxlen" != "32" ]; then
3209    echo "Peer address with non-trivial netmask." 1>&2
3210    exit 1
3211  fi
3212  pfx="$ipaddr peer $peer"
3213else
3214  if [ "$pfxlen" = "" ]; then
3215    ABCMaskLen $ipaddr
3216    pfxlen=$?
3217  fi
3218  pfx="$ipaddr/$pfxlen"
3219fi
3220if [ "$ldev" = "$dev" -a "$ipaddr" != "" ]; then
3221  label=
3222fi
3223\end{verbatim}
3224\begin{flushleft}
3225\# If deletion was requested, delete the address and restart RDISC
3226\end{flushleft}
3227\begin{verbatim}
3228if [ $deleting -ne 0 ]; then
3229  ip addr del $pfx dev $dev $label || exit 1
3230  if [ $fwd -eq 0 ]; then RestartRDISC; fi
3231  exit 0
3232fi
3233\end{verbatim}
3234\begin{flushleft}
3235\# Start interface initialization.\\
3236\#\\
3237\# {\bf Step 0} --- enable device \verb|$dev|
3238\end{flushleft}
3239\begin{verbatim}
3240if ! ip link set up dev $dev ; then
3241  echo "Error: cannot enable interface $dev." 1>&2
3242  exit 1
3243fi
3244if [ "$ipaddr" = "" ]; then exit 0; fi
3245\end{verbatim}
3246\begin{flushleft}
3247\# {\bf Step 1} --- IP Duplicate Address Detection~\cite{RFC-DHCP}.\\
3248\# Send two probes and wait for result for 3 seconds.\\
3249\# If the interface opens slower f.e.\ due to long media detection,\\
3250\# you want to increase the timeout.\\
3251\end{flushleft}
3252\begin{verbatim}
3253if ! arping -q -c 2 -w 3 -D -I $dev $ipaddr ; then
3254  echo "Error: some host already uses address $ipaddr on $dev." 1>&2
3255  exit 1
3256fi
3257\end{verbatim}
3258\begin{flushleft}
3259\# OK, the address is unique. We may add it to the interface.\\
3260\#\\
3261\# {\bf Step 2} --- Configure the address on the interface.
3262\end{flushleft}
3263\begin{verbatim}
3264if ! ip address add $pfx brd + dev $dev $label; then
3265  echo "Error: failed to add $pfx on $dev." 1>&2
3266  exit 1
3267fi
3268\end{verbatim}
3269\noindent\# {\bf Step 3} --- Announce our presence on the link
3270\begin{verbatim}
3271arping -q -A -c 1 -I $dev $ipaddr
3272noarp=$?
3273( sleep 2 ;
3274  arping -q -U -c 1 -I $dev $ipaddr ) >& /dev/null </dev/null &
3275\end{verbatim}
3276\begin{flushleft}
3277\# {\bf Step 4} (optional) --- Add some control routes.\\
3278\#\\
3279\# 1. Prohibit link local multicast addresses.\\
3280\# 2. Prohibit link local (alias, limited) broadcast.\\
3281\# 3. Add default multicast route.
3282\end{flushleft}
3283\begin{verbatim}
3284ip route add unreachable 224.0.0.0/24 >& /dev/null 
3285ip route add unreachable 255.255.255.255 >& /dev/null
3286if [ `ip link ls $dev | grep -c MULTICAST` -ge 1 ]; then
3287  ip route add 224.0.0.0/4 dev $dev scope global >& /dev/null
3288fi
3289\end{verbatim}
3290\begin{flushleft}
3291\# {\bf Step 5} --- Add fallback default route with huge metric.\\
3292\# If a proxy ARP server is present on the interface, we will be\\
3293\# able to talk to all the Internet without further configuration.\\
3294\# Do not make this step on router or if the device is not ARPable.\\
3295\# because dead nexthop detection does not work on them.
3296\end{flushleft}
3297\begin{verbatim}
3298if [ $fwd -eq 0 ]; then
3299  if [ $noarp -eq 0 ]; then
3300    ip ro append default dev $dev metric 30000 scope global
3301  elif [ "$peer" != "" ]; then
3302    if ping -q -c 2 -w 4 $peer ; then
3303      ip ro append default via $peer dev $dev metric 30001
3304    fi
3305  fi
3306  RestartRDISC
3307fi
3308
3309exit 0
3310\end{verbatim}
3311\begin{flushleft}
3312\# End of {\bf MAIN()}
3313\end{flushleft}
3314
3315
3316\end{document}
3317