History log of /linux-master/drivers/net/ifb.c
Revision Date Author Comments
# 55c90047 27-Oct-2023 Jakub Kicinski <kuba@kernel.org>

net: fill in MODULE_DESCRIPTION()s under drivers/net/

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().

Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 068c38ad 26-Oct-2022 Thomas Gleixner <tglx@linutronix.de>

net: Remove the obsolte u64_stats_fetch_*_irq() users (drivers).

Now that the 32bit UP oddity is gone and 32bit uses always a sequence
count, there is no need for the fetch_irq() variants anymore.

Convert to the regular interface.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# a21ee5b2 27-Nov-2021 Tonghao Zhang <xiangxia.m.yue@gmail.com>

net: ifb: support ethtools stats

With this feature, we can use the ethtools to get tx/rx
queues stats. This patch, introduce the ifb_update_q_stats
helper to update the queues stats, and ifb_q_stats to simplify
the codes.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Link: https://lore.kernel.org/r/20211128014631.43627-1-xiangxia.m.yue@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 7444d706 29-Oct-2021 Arnd Bergmann <arnd@arndb.de>

ifb: fix building without CONFIG_NET_CLS_ACT

The driver no longer depends on this option, but it fails to
build if it's disabled because the skb->tc_skip_classify is
hidden behind an #ifdef:

drivers/net/ifb.c:81:8: error: no member named 'tc_skip_classify' in 'struct sk_buff'
skb->tc_skip_classify = 1;

Use the same #ifdef around the assignment.

Fixes: 046178e726c2 ("ifb: Depend on netfilter alternatively to tc")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 42df6e1d 08-Oct-2021 Lukas Wunner <lukas@wunner.de>

netfilter: Introduce egress hook

Support classifying packets with netfilter on egress to satisfy user
requirements such as:
* outbound security policies for containers (Laura)
* filtering and mangling intra-node Direct Server Return (DSR) traffic
on a load balancer (Laura)
* filtering locally generated traffic coming in through AF_PACKET,
such as local ARP traffic generated for clustering purposes or DHCP
(Laura; the AF_PACKET plumbing is contained in a follow-up commit)
* L2 filtering from ingress and egress for AVB (Audio Video Bridging)
and gPTP with nftables (Pablo)
* in the future: in-kernel NAT64/NAT46 (Pablo)

The egress hook introduced herein complements the ingress hook added by
commit e687ad60af09 ("netfilter: add netfilter ingress hook after
handle_ing() under unique static key"). A patch for nftables to hook up
egress rules from user space has been submitted separately, so users may
immediately take advantage of the feature.

Alternatively or in addition to netfilter, packets can be classified
with traffic control (tc). On ingress, packets are classified first by
tc, then by netfilter. On egress, the order is reversed for symmetry.
Conceptually, tc and netfilter can be thought of as layers, with
netfilter layered above tc.

Traffic control is capable of redirecting packets to another interface
(man 8 tc-mirred). E.g., an ingress packet may be redirected from the
host namespace to a container via a veth connection:
tc ingress (host) -> tc egress (veth host) -> tc ingress (veth container)

In this case, netfilter egress classifying is not performed when leaving
the host namespace! That's because the packet is still on the tc layer.
If tc redirects the packet to a physical interface in the host namespace
such that it leaves the system, the packet is never subjected to
netfilter egress classifying. That is only logical since it hasn't
passed through netfilter ingress classifying either.

Packets can alternatively be redirected at the netfilter layer using
nft fwd. Such a packet *is* subjected to netfilter egress classifying
since it has reached the netfilter layer.

Internally, the skb->nf_skip_egress flag controls whether netfilter is
invoked on egress by __dev_queue_xmit(). Because __dev_queue_xmit() may
be called recursively by tunnel drivers such as vxlan, the flag is
reverted to false after sch_handle_egress(). This ensures that
netfilter is applied both on the overlay and underlying network.

Interaction between tc and netfilter is possible by setting and querying
skb->mark.

If netfilter egress classifying is not enabled on any interface, it is
patched out of the data path by way of a static_key and doesn't make a
performance difference that is discernible from noise:

Before: 1537 1538 1538 1537 1538 1537 Mb/sec
After: 1536 1534 1539 1539 1539 1540 Mb/sec
Before + tc accept: 1418 1418 1418 1419 1419 1418 Mb/sec
After + tc accept: 1419 1424 1418 1419 1422 1420 Mb/sec
Before + tc drop: 1620 1619 1619 1619 1620 1620 Mb/sec
After + tc drop: 1616 1624 1625 1624 1622 1619 Mb/sec

When netfilter egress classifying is enabled on at least one interface,
a minimal performance penalty is incurred for every egress packet, even
if the interface it's transmitted over doesn't have any netfilter egress
rules configured. That is caused by checking dev->nf_hooks_egress
against NULL.

Measurements were performed on a Core i7-3615QM. Commands to reproduce:
ip link add dev foo type dummy
ip link set dev foo up
modprobe pktgen
echo "add_device foo" > /proc/net/pktgen/kpktgend_3
samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i foo -n 400000000 -m "11:11:11:11:11:11" -d 1.1.1.1

Accept all traffic with tc:
tc qdisc add dev foo clsact
tc filter add dev foo egress bpf da bytecode '1,6 0 0 0,'

Drop all traffic with tc:
tc qdisc add dev foo clsact
tc filter add dev foo egress bpf da bytecode '1,6 0 0 2,'

Apply this patch when measuring packet drops to avoid errors in dmesg:
https://lore.kernel.org/netdev/a73dda33-57f4-95d8-ea51-ed483abd6a7a@iogearbox.net/

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Laura García Liébana <nevola@gmail.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# cf9207d7 19-May-2021 Hui Tang <tanghui20@huawei.com>

ifb: remove leading spaces before tabs

There are a few leading spaces before tabs and remove it by running
the following commard:

$ find . -name '*.[ch]' | xargs sed -r -i 's/^[ ]+\t/\t/'

Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 08267523 30-Jan-2021 Emil Renner Berthing <kernel@esmil.dk>

ifb: use new tasklet API

This converts the driver to use the new tasklet API introduced in
commit 12cc923f1ccc ("tasklet: Introduce new initialization API")

Signed-off-by: Emil Renner Berthing <kernel@esmil.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# ecb8fed4 01-Nov-2020 Alexander Lobakin <alobakin@pm.me>

net: bonding, dummy, ifb, team: advertise NETIF_F_GSO_SOFTWARE

Virtual netdevs should use NETIF_F_GSO_SOFTWARE to forward GSO skbs
as-is and let the final drivers deal with them when supported.
Also remove NETIF_F_GSO_UDP_L4 from bonding and team drivers as it's
now included in the "software" list.

Suggested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 2c64605b 25-Mar-2020 Pablo Neira Ayuso <pablo@netfilter.org>

net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} build

net/netfilter/nft_fwd_netdev.c: In function ‘nft_fwd_netdev_eval’:
net/netfilter/nft_fwd_netdev.c:32:10: error: ‘struct sk_buff’ has no member named ‘tc_redirected’
pkt->skb->tc_redirected = 1;
^~
net/netfilter/nft_fwd_netdev.c:33:10: error: ‘struct sk_buff’ has no member named ‘tc_from_ingress’
pkt->skb->tc_from_ingress = 1;
^~

To avoid a direct dependency with tc actions from netfilter, wrap the
redirect bits around CONFIG_NET_REDIRECT and move helpers to
include/linux/skbuff.h. Turn on this toggle from the ifb driver, the
only existing client of these bits in the tree.

This patch adds skb_set_redirected() that sets on the redirected bit
on the skbuff, it specifies if the packet was redirect from ingress
and resets the timestamp (timestamp reset was originally missing in the
netfilter bugfix).

Fixes: bcfabee1afd99484 ("netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress")
Reported-by: noreply@ellerman.id.au
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2874c5fd 27-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# b1d2e4e0 24-May-2018 Jon Maxwell <jmaxwell37@gmail.com>

ifb: fix packets checksum

Fixup the checksum for CHECKSUM_COMPLETE when pulling skbs on RX path.
Otherwise we get splats when tc mirred is used to redirect packets to ifb.

Before fix:

nic: hw csum failure

Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 554873e5 30-Mar-2018 Kirill Tkhai <ktkhai@virtuozzo.com>

net: Do not take net_rwsem in __rtnl_link_unregister()

This function calls call_netdevice_notifier(), which also
may take net_rwsem. So, we can't use net_rwsem here.

This patch makes callers of this functions take pernet_ops_rwsem,
like register_netdevice_notifier() does. This will protect
the modifications of net_namespace_list, and allows notifiers
to take it (they won't have to care about context).

Since __rtnl_link_unregister() is used on module load
and unload (which are not frequent operations), this looks
for me better, than make all call_netdevice_notifier()
always executing in "protected net_namespace_list" context.

Also, this fixes the problem we had a deal in 328fbe747ad4
"Close race between {un, }register_netdevice_notifier and ...",
and guarantees __rtnl_link_unregister() does not skip
exitting net.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e94cd811 22-Sep-2017 Zhang Shengju <zhangshengju@cmss.chinamobile.com>

net: remove MTU limits for dummy and ifb device

These two drivers (dummy and ifb) call ether_setup(), after commit
61e84623ace3 ("net: centralize net_device min/max MTU checking"), the
range of mtu is [min_mtu, max_mtu], which is [68, 1500] by default.

These two devices should not have limits on MTU. This patch set their
min_mtu/max_mtu to 0. So that dev_set_mtu() will not check the mtu range,
and can be set with any value.

CC: Eric Dumazet <edumazet@google.com>
CC: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a8b8a889 25-Jun-2017 Matthias Schiffer <mschiffer@universe-factory.net>

net: add netlink_ext_ack argument to rtnl_link_ops.validate

Add support for extended error reporting.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cf124db5 07-May-2017 David S. Miller <davem@davemloft.net>

net: Fix inconsistent teardown and release of private netdev state.

Network devices can allocate reasources and private memory using
netdev_ops->ndo_init(). However, the release of these resources
can occur in one of two different places.

Either netdev_ops->ndo_uninit() or netdev->destructor().

The decision of which operation frees the resources depends upon
whether it is necessary for all netdev refs to be released before it
is safe to perform the freeing.

netdev_ops->ndo_uninit() presumably can occur right after the
NETDEV_UNREGISTER notifier completes and the unicast and multicast
address lists are flushed.

netdev->destructor(), on the other hand, does not run until the
netdev references all go away.

Further complicating the situation is that netdev->destructor()
almost universally does also a free_netdev().

This creates a problem for the logic in register_netdevice().
Because all callers of register_netdevice() manage the freeing
of the netdev, and invoke free_netdev(dev) if register_netdevice()
fails.

If netdev_ops->ndo_init() succeeds, but something else fails inside
of register_netdevice(), it does call ndo_ops->ndo_uninit(). But
it is not able to invoke netdev->destructor().

This is because netdev->destructor() will do a free_netdev() and
then the caller of register_netdevice() will do the same.

However, this means that the resources that would normally be released
by netdev->destructor() will not be.

Over the years drivers have added local hacks to deal with this, by
invoking their destructor parts by hand when register_netdevice()
fails.

Many drivers do not try to deal with this, and instead we have leaks.

Let's close this hole by formalizing the distinction between what
private things need to be freed up by netdev->destructor() and whether
the driver needs unregister_netdevice() to perform the free_netdev().

netdev->priv_destructor() performs all actions to free up the private
resources that used to be freed by netdev->destructor(), except for
free_netdev().

netdev->needs_free_netdev is a boolean that indicates whether
free_netdev() should be done at the end of unregister_netdevice().

Now, register_netdevice() can sanely release all resources after
ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
and netdev->priv_destructor().

And at the end of unregister_netdevice(), we invoke
netdev->priv_destructor() and optionally call free_netdev().

Signed-off-by: David S. Miller <davem@davemloft.net>


# bc31c905 07-Jan-2017 Willem de Bruijn <willemb@google.com>

net-tc: convert tc_from to tc_from_ingress and tc_redirected

The tc_from field fulfills two roles. It encodes whether a packet was
redirected by an act_mirred device and, if so, whether act_mirred was
called on ingress or egress. Split it into separate fields.

The information is needed by the special IFB loop, where packets are
taken out of the normal path by act_mirred, forwarded to IFB, then
reinjected at their original location (ingress or egress) by IFB.

The IFB device cannot use skb->tc_at_ingress, because that may have
been overwritten as the packet travels from act_mirred to ifb_xmit,
when it passes through tc_classify on the IFB egress path. Cache this
value in skb->tc_from_ingress.

That field is valid only if a packet arriving at ifb_xmit came from
act_mirred. Other packets can be crafted to reach ifb_xmit. These
must be dropped. Set tc_redirected on redirection and drop all packets
that do not have this bit set.

Both fields are set only on cloned skbs in tc actions, so original
packet sources do not have to clear the bit when reusing packets
(notably, pktgen and octeon).

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a5135bcf 07-Jan-2017 Willem de Bruijn <willemb@google.com>

net-tc: convert tc_verd to integer bitfields

Extract the remaining two fields from tc_verd and remove the __u16
completely. TC_AT and TC_FROM are converted to equivalent two-bit
integer fields tc_at and tc_from. Where possible, use existing
helper skb_at_tc_ingress when reading tc_at. Introduce helper
skb_reset_tc to clear fields.

Not documenting tc_from and tc_at, because they will be replaced
with single bit fields in follow-on patches.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e7246e12 07-Jan-2017 Willem de Bruijn <willemb@google.com>

net-tc: extract skip classify bit from tc_verd

Packets sent by the IFB device skip subsequent tc classification.
A single bit governs this state. Move it out of tc_verd in
anticipation of removing that __u16 completely.

The new bitfield tc_skip_classify temporarily uses one bit of a
hole, until tc_verd is removed completely in a follow-up patch.

Remove the bit hole comment. It could be 2, 3, 4 or 5 bits long.
With that many options, little value in documenting it.

Introduce a helper function to deduplicate the logic in the two
sites that check this bit.

The field tc_skip_classify is set only in IFB on skbs cloned in
act_mirred, so original packet sources do not have to clear the
bit when reusing packets (notably, pktgen and octeon).

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bc1f4470 06-Jan-2017 stephen hemminger <stephen@networkplumber.org>

net: make ndo_get_stats64 a void function

The network device operation for reading statistics is only called
in one place, and it ignores the return value. Having a structure
return value is potentially confusing because some future driver could
incorrectly assume that the return value was used.

Fix all drivers with ndo_get_stats64 to have a void function.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7d945796 06-May-2016 Eric Dumazet <edumazet@google.com>

ifb: support more features

When using ifb+netem on ingress on SIT/IPIP/GRE traffic,
GRO packets are not properly processed.

Segmentation should not be forced, since ifb is already adding
quite a performance hit.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9e29e21a 06-Jul-2015 Eric Dumazet <edumazet@google.com>

ifb: add multiqueue operation

Add multiqueue capabilities to ifb netdevice.

This removes last bottleneck for ingress when mq qdisc can be used
to shard load from multiple RX queues on physical device.

Tested:

# netem based setup, installed at receiver side
ETH=eth0
IFB=ifb10
EST="est 1sec 4sec" # Optional rate estimator
RTT_HALF=2ms
#REORDER=20us
#LOSS="loss 1"
TXQ=8

ip link add ifb10 numtxqueues $TXQ type ifb
ip link set dev $IFB up

tc qdisc add dev $ETH ingress 2>/dev/null

tc filter add dev $ETH parent ffff: \
protocol ip u32 match u32 0 0 flowid 1:1 \
action mirred egress redirect dev $IFB

tc qdisc del dev $IFB root 2>/dev/null

tc qdisc add dev $IFB root handle 1: mq
for i in `seq 1 $TXQ`
do
slot=$( printf %x $(( i )) )
tc qd add dev $IFB parent 1:$slot $EST netem \
limit 100000 delay $RTT_HALF $REORDER $LOSS
done

lpaa24:~# tc -s -d qd sh dev ifb10
qdisc mq 1: root
Sent 316544766 bytes 5265927 pkt (dropped 0, overlimits 0 requeues 0)
backlog 98880b 1648p requeues 0
qdisc netem 8002: parent 1:1 limit 100000 delay 2.0ms
Sent 39601416 bytes 658721 pkt (dropped 0, overlimits 0 requeues 0)
rate 38235Kbit 79657pps backlog 12240b 204p requeues 0
qdisc netem 8003: parent 1:2 limit 100000 delay 2.0ms
Sent 39472866 bytes 657227 pkt (dropped 0, overlimits 0 requeues 0)
rate 38234Kbit 79655pps backlog 10620b 176p requeues 0
qdisc netem 8004: parent 1:3 limit 100000 delay 2.0ms
Sent 39703417 bytes 659699 pkt (dropped 0, overlimits 0 requeues 0)
rate 38320Kbit 79831pps backlog 12780b 213p requeues 0
qdisc netem 8005: parent 1:4 limit 100000 delay 2.0ms
Sent 39565149 bytes 658011 pkt (dropped 0, overlimits 0 requeues 0)
rate 38174Kbit 79530pps backlog 11880b 198p requeues 0
qdisc netem 8006: parent 1:5 limit 100000 delay 2.0ms
Sent 39506078 bytes 657354 pkt (dropped 0, overlimits 0 requeues 0)
rate 38195Kbit 79571pps backlog 12480b 208p requeues 0
qdisc netem 8007: parent 1:6 limit 100000 delay 2.0ms
Sent 39675994 bytes 658849 pkt (dropped 0, overlimits 0 requeues 0)
rate 38323Kbit 79838pps backlog 12600b 210p requeues 0
qdisc netem 8008: parent 1:7 limit 100000 delay 2.0ms
Sent 39532042 bytes 658367 pkt (dropped 0, overlimits 0 requeues 0)
rate 38177Kbit 79536pps backlog 13140b 219p requeues 0
qdisc netem 8009: parent 1:8 limit 100000 delay 2.0ms
Sent 39488164 bytes 657705 pkt (dropped 0, overlimits 0 requeues 0)
rate 38192Kbit 79568pps backlog 13Kb 222p requeues 0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f40ae913 16-Apr-2015 Herbert Xu <herbert@gondor.apana.org.au>

act_mirred: Fix bogus header when redirecting from VLAN

When you redirect a VLAN device to any device, you end up with
crap in af_packet on the xmit path because hard_header_len is
not equal to skb->mac_len. So the redirected packet contains
four extra bytes at the start which then gets interpreted as
part of the MAC address.

This patch fixes this by only pushing skb->mac_len. We also
need to fix ifb because it tries to undo the pushing done by
act_mirred.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 02875878 05-Oct-2014 Eric Dumazet <edumazet@google.com>

net: better IFF_XMIT_DST_RELEASE support

Testing xmit_more support with netperf and connected UDP sockets,
I found strange dst refcount false sharing.

Current handling of IFF_XMIT_DST_RELEASE is not optimal.

Dropping dst in validate_xmit_skb() is certainly too late in case
packet was queued by cpu X but dequeued by cpu Y

The logical point to take care of drop/force is in __dev_queue_xmit()
before even taking qdisc lock.

As Julian Anastasov pointed out, need for skb_dst() might come from some
packet schedulers or classifiers.

This patch adds new helper to cleanly express needs of various drivers
or qdiscs/classifiers.

Drivers that need skb_dst() in their ndo_start_xmit() should call
following helper in their setup instead of the prior :

dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
->
netif_keep_dst(dev);

Instead of using a single bit, we use two bits, one being
eventually rebuilt in bonding/team drivers.

The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being
rebuilt in bonding/team. Eventually, we could add something
smarter later.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c835a677 14-Jul-2014 Tom Gundersen <teg@jklm.no>

net: set name_assign_type in alloc_netdev()

Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
all users to pass NET_NAME_UNKNOWN.

Coccinelle patch:

@@
expression sizeof_priv, name, setup, txqs, rxqs, count;
@@

(
-alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
+alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
|
-alloc_netdev_mq(sizeof_priv, name, setup, count)
+alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
|
-alloc_netdev(sizeof_priv, name, setup)
+alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
)

v9: move comments here from the wrong commit

Signed-off-by: Tom Gundersen <teg@jklm.no>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8dd6e147 27-Mar-2014 Vlad Yasevich <vyasevic@redhat.com>

ifb: Remove vlan acceleration from vlan_features

Do not include vlan acceleration features in vlan_features as that
precludes correct Q-in-Q operation.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 57a7744e 13-Mar-2014 Eric W. Biederman <ebiederm@xmission.com>

net: Replace u64_stats_fetch_begin_bh to u64_stats_fetch_begin_irq

Replace the bh safe variant with the hard irq safe variant.

We need a hard irq safe variant to deal with netpoll transmitting
packets from hard irq context, and we need it in most if not all of
the places using the bh safe variant.

Except on 32bit uni-processor the code is exactly the same so don't
bother with a bh variant, just have a hard irq safe variant that
everyone can use.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 827da44c 07-Oct-2013 John Stultz <john.stultz@linaro.org>

net: Explicitly initialize u64_stats_sync structures for lockdep

In order to enable lockdep on seqcount/seqlock structures, we
must explicitly initialize any locks.

The u64_stats_sync structure, uses a seqcount, and thus we need
to introduce a u64_stats_init() function and use it to initialize
the structure.

This unfortunately adds a lot of fairly trivial initialization code
to a number of drivers. But the benefit of ensuring correctness makes
this worth while.

Because these changes are required for lockdep to be enabled, and the
changes are quite trivial, I've not yet split this patch out into 30-some
separate patches, as I figured it would be better to get the various
maintainers thoughts on how to best merge this change along with
the seqcount lockdep enablement.

Feedback would be appreciated!

Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: James Morris <jmorris@namei.org>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Mirko Lindner <mlindner@marvell.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Roger Luethi <rl@hellgate.ch>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Simon Horman <horms@verge.net.au>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# f2966cd5 11-Jul-2013 dingtianhong <dingtianhong@huawei.com>

ifb: fix oops when loading the ifb failed

If __rtnl_link_register() return faild when loading the ifb, it will
take the wrong path and get oops, so fix it just like dummy.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 440d57bc 09-Jul-2013 dingtianhong <dingtianhong@huawei.com>

ifb: fix rcu_sched self-detected stalls

According to the commit 16b0dc29c1af9df341428f4c49ada4f626258082
(dummy: fix rcu_sched self-detected stalls)

Eric Dumazet fix the problem in dummy, but the ifb will occur the
same problem like the dummy modules.

Trying to "modprobe ifb numifbs=30000" triggers :

INFO: rcu_sched self-detected stall on CPU

After this splat, RTNL is locked and reboot is needed.

We must call cond_resched() to avoid this, even holding RTNL.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 28d2b136 18-Apr-2013 Patrick McHardy <kaber@trash.net>

net: vlan: announce STAG offload capability in some drivers

- macvlan: propagate STAG filtering capabilities from underlying device
- ifb: announce STAG tagging support in addition to CTAG tagging support
- veth: announce STAG tagging/stripping support in addition to CTAG support

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f646968f 18-Apr-2013 Patrick McHardy <kaber@trash.net>

net: vlan: rename NETIF_F_HW_VLAN_* feature flags to NETIF_F_HW_VLAN_CTAG_*

Rename the hardware VLAN acceleration features to include "CTAG" to indicate
that they only support CTAGs. Follow up patches will introduce 802.1ad
server provider tagging (STAGs) and require the distinction for hardware not
supporting acclerating both.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 73bf0d0e 13-Jan-2013 Eric Dumazet <edumazet@google.com>

ifb: dont hard code inet_net use

ifb should lookup devices in the appropriate namespace.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f2cedb63 14-Feb-2012 Danny Kukawka <danny.kukawka@bisect.de>

net: replace random_ether_addr() with eth_hw_addr_random()

Replace usage of random_ether_addr() with eth_hw_addr_random()
to set addr_assign_type correctly to NET_ADDR_RANDOM.

Change the trivial cases.

v2: adapt to renamed eth_hw_addr_random()

Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 34324dc2 15-Nov-2011 Michał Mirosław <mirq-linux@rere.qmqm.pl>

net: remove NETIF_F_NO_CSUM feature bit

Only distinct use is checking if NETIF_F_NOCACHE_COPY should be
enabled by default. The check heuristics is altered a bit here,
so it hits other people than before. The default shouldn't be
trusted for performance-critical cases anyway.

For all other uses NETIF_F_NO_CSUM is equivalent to NETIF_F_HW_CSUM.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 550fd08c 26-Jul-2011 Neil Horman <nhorman@tuxdriver.com>

net: Audit drivers to identify those needing IFF_TX_SKB_SHARING cleared

After the last patch, We are left in a state in which only drivers calling
ether_setup have IFF_TX_SKB_SHARING set (we assume that drivers touching real
hardware call ether_setup for their net_devices and don't hold any state in
their skbs. There are a handful of drivers that violate this assumption of
course, and need to be fixed up. This patch identifies those drivers, and marks
them as not being able to support the safe transmission of skbs by clearning the
IFF_TX_SKB_SHARING flag in priv_flags

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Karsten Keil <isdn@linux-pingi.de>
CC: "David S. Miller" <davem@davemloft.net>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Patrick McHardy <kaber@trash.net>
CC: Krzysztof Halasa <khc@pm.waw.pl>
CC: "John W. Linville" <linville@tuxdriver.com>
CC: Greg Kroah-Hartman <gregkh@suse.de>
CC: Marcel Holtmann <marcel@holtmann.org>
CC: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3b0c9cbb 20-Jun-2011 stephen hemminger <shemminger@vyatta.com>

ifb: convert to 64 bit stats

Convert input functional block device to use 64 bit stats.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a6b7a407 06-Jun-2011 Alexey Dobriyan <adobriyan@gmail.com>

net: remove interrupt.h inclusion from netdevice.h

* remove interrupt.g inclusion from netdevice.h -- not needed
* fixup fallout, add interrupt.h and hardirq.h back where needed.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1c5cae81 29-Apr-2011 Jiri Pirko <jpirko@redhat.com>

net: call dev_alloc_name from register_netdevice

Force dev_alloc_name() to be called from register_netdevice() by
dev_get_valid_name(). That allows to remove multiple explicit
dev_alloc_name() calls.

The possibility to call dev_alloc_name in advance remains.

This also fixes veth creation regresion caused by
84c49d8c3e4abefb0a41a77b25aa37ebe8d6b743

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 39980292 03-Jan-2011 Eric Dumazet <eric.dumazet@gmail.com>

ifb: add performance flags

Le lundi 03 janvier 2011 à 11:40 -0800, David Miller a écrit :
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Mon, 3 Jan 2011 20:37:03 +0100
>
> > On Sun, Jan 02, 2011 at 09:24:36PM +0100, Eric Dumazet wrote:
> >> Le mercredi 29 décembre 2010 ?? 00:07 +0100, Jarek Poplawski a écrit :
> >>
> >> > Ingress is before vlans handler so these features and the
> >> > NETIF_F_HW_VLAN_TX flag seem useful for ifb considering
> >> > dev_hard_start_xmit() checks.
> >>
> >> OK, here is v2 of the patch then, thanks everybody.
> >>
> >>
> >> [PATCH v2 net-next-2.6] ifb: add performance flags
> >>
> >> IFB can use the full set of features flags (NETIF_F_SG |
> >> NETIF_F_FRAGLIST | NETIF_F_TSO | NETIF_F_NO_CSUM | NETIF_F_HIGHDMA) to
> >> avoid unnecessary split of some packets (GRO for example)
> >>
> >> Changli suggested to also set vlan_features,
> >
> > He also suggested more GSO flags of which especially NETIF_F_TSO6
> > seems interesting (wrt GRO)?
>
> I think at least TSO6 would very much be appropriate here.

Yes, why not, I am only wondering why loopback / dummy (and others ?)
only set NETIF_F_TSO :)

Since I want to play with ECN, I might also add NETIF_F_TSO_ECN ;)

For other flags, I really doubt it can matter on ifb ?

[PATCH v3 net-next-2.6] ifb: add performance flags

IFB can use the full set of features flags (NETIF_F_SG |
NETIF_F_FRAGLIST | NETIF_F_TSO | NETIF_F_NO_CSUM | NETIF_F_HIGHDMA) to
avoid unnecessary split of some packets (GRO for example)

Changli suggested to also set vlan_features, NETIF_F_TSO6,
NETIF_F_TSO_ECN.

Jarek suggested to add NETIF_F_HW_VLAN_TX as well.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Changli Gao <xiaosuo@gmail.com>
Cc: Jarek Poplawski <jarkao2@gmail.com>
Cc: Pawel Staszewski <pstaszewski@itcare.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1a75972c 14-Dec-2010 Eric Dumazet <eric.dumazet@gmail.com>

ifb: use netif_receive_skb() instead of netif_rx()

In ri_tasklet(), we run from softirq, so can directly handle packet
through netif_receive_skb() instead of netif_rx().
There is no risk of recursion.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7edc3453 15-Dec-2010 Eric Dumazet <eric.dumazet@gmail.com>

ifb: fix a lockdep splat

After recent ifb changes, we must use lockless __skb_dequeue() since
lock is not anymore initialized.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jamal Hadi Salim <hadi@cyberus.ca>
Cc: Changli Gao <xiaosuo@gmail.com>
Acked-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 957fca95 04-Dec-2010 Changli Gao <xiaosuo@gmail.com>

ifb: use the lockless variants of skb_queue

rq and tq are both protected by tx queue lock, so we can simply use
the lockless variants of skb_queue.

skb_queue_splice_tail_init() is used instead of the open coded and slow
one.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c6350362 03-Dec-2010 Changli Gao <xiaosuo@gmail.com>

ifb: remove unused macro TX_TIMEOUT

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e1f91505 03-Dec-2010 Changli Gao <xiaosuo@gmail.com>

ifb: remove the useless debug stats

These debug stats are not exported, and become useless.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 75c1c825 04-Dec-2010 Changli Gao <xiaosuo@gmail.com>

ifb: goto resched directly if error happens and dp->tq isn't empty

If we break the loop when there are still skbs in tq and no skb in
rq, the skbs will be left in txq until new skbs are enqueued into rq.
In rare cases, no new skb is queued, then these skbs will stay in rq
forever.

After this patch, if tq isn't empty when we break the loop, we goto
resched directly.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1ae5dc34 10-May-2010 Eric Dumazet <eric.dumazet@gmail.com>

net: trans_start cleanups

Now that core network takes care of trans_start updates, dont do it
in drivers themselves, if possible. Drivers can avoid one cache miss
(on dev->trans_start) in their start_xmit() handler.

Exceptions are NETIF_F_LLTX drivers

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8964be4a 20-Nov-2009 Eric Dumazet <eric.dumazet@gmail.com>

net: rename skb->iif to skb->skb_iif

To help grep games, rename iif to skb_iif

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 05e8689c 01-Nov-2009 Eric Dumazet <eric.dumazet@gmail.com>

ifb: RCU locking avoids touching dev refcount

Avoids touching dev refcount in hotpath

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# db519144 19-Oct-2009 Eric Dumazet <eric.dumazet@gmail.com>

ifb: should not use __dev_get_by_index() without locks

At this point (ri_tasklet()), RTNL or dev_base_lock are not held,
we must use dev_get_by_index() instead of __dev_get_by_index()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 424efe9c 31-Aug-2009 Stephen Hemminger <shemminger@vyatta.com>

netdev: convert pseudo drivers to netdev_tx_t

These are all drivers that don't touch real hardware.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ec634fe3 05-Jul-2009 Patrick McHardy <kaber@trash.net>

net: convert remaining non-symbolic return values in ndo_start_xmit() functions

This patch converts the remaining occurences of raw return values to their
symbolic counterparts in ndo_start_xmit() functions that were missed by the
previous automatic conversion.

Additionally code that assumed the symbolic value of NETDEV_TX_OK to be zero
is changed to explicitly use NETDEV_TX_OK.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 93f154b5 18-May-2009 Eric Dumazet <dada1@cosmosbay.com>

net: release dst entry in dev_hard_start_xmit()

One point of contention in high network loads is the dst_release() performed
when a transmited skb is freed. This is because NIC tx completion calls
dev_kree_skb() long after original call to dev_queue_xmit(skb).

CPU cache is cold and the atomic op in dst_release() stalls. On SMP, this is
quite visible if one CPU is 100% handling softirqs for a network device,
since dst_clone() is done by other cpus, involving cache line ping pongs.

It seems right place to release dst is in dev_hard_start_xmit(), for most
devices but ones that are virtual, and some exceptions.

David Miller suggested to define a new device flag, set in alloc_netdev_mq()
(so that most devices set it at init time), and carefuly unset in devices
which dont want a NULL skb->dst in their ndo_start_xmit().

List of devices that must clear this flag is :

- loopback device, because it calls netif_rx() and quoting Patrick :
"ip_route_input() doesn't accept loopback addresses, so loopback packets
already need to have a dst_entry attached."
- appletalk/ipddp.c : needs skb->dst in its xmit function

- And all devices that call again dev_queue_xmit() from their xmit function
(as some classifiers need skb->dst) : bonding, vlan, macvlan, eql, ifb, hdlc_fr

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 00829823 20-Nov-2008 Stephen Hemminger <shemminger@vyatta.com>

netdev: add more functions to netdevice ops

This patch moves neigh_setup and hard_start_xmit into the network device ops
structure. For bisection, fix all the previously converted drivers as well.
Bonding driver took the biggest hit on this.

Added a prefetch of the hard_start_xmit in the fast path to try and reduce
any impact this would have.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8dfcdf34 19-Nov-2008 Stephen Hemminger <shemminger@vyatta.com>

ifb: convert to net_device_ops

Convert to new network device ops interface.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c3f26a26 31-Jul-2008 David S. Miller <davem@davemloft.net>

netdev: Fix lockdep warnings in multiqueue configurations.

When support for multiple TX queues were added, the
netif_tx_lock() routines we converted to iterate over
all TX queues and grab each queue's spinlock.

This causes heartburn for lockdep and it's not a healthy
thing to do with lots of TX queues anyways.

So modify this to use a top-level lock and a "frozen"
state for the individual TX queues.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 83874000 17-Jul-2008 David S. Miller <davem@davemloft.net>

pkt_sched: Kill netdev_queue lock.

We can simply use the qdisc->q.lock for all of the
qdisc tree synchronization.

Signed-off-by: David S. Miller <davem@davemloft.net>


# e8a0464c 17-Jul-2008 David S. Miller <davem@davemloft.net>

netdev: Allocate multiple queues for TX.

alloc_netdev_mq() now allocates an array of netdev_queue
structures for TX, based upon the queue_count argument.

Furthermore, all accesses to the TX queues are now vectored
through the netdev_get_tx_queue() and netdev_for_each_tx_queue()
interfaces. This makes it easy to grep the tree for all
things that want to get to a TX queue of a net device.

Problem spots which are not really multiqueue aware yet, and
only work with one queue, can easily be spotted by grepping
for all netdev_get_tx_queue() calls that pass in a zero index.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 555353cf 08-Jul-2008 David S. Miller <davem@davemloft.net>

netdev: The ingress_lock member is no longer needed.

Every qdisc is assosciated with a queue, and in the case of ingress
qdiscs that will now be netdev->rx_queue so using that queue's lock is
the thing to do.

Signed-off-by: David S. Miller <davem@davemloft.net>


# dc2b4847 08-Jul-2008 David S. Miller <davem@davemloft.net>

netdev: Move queue_lock into struct netdev_queue.

The lock is now an attribute of the device queue.

One thing to notice is that "suspicious" places
emerge which will need specific training about
multiple queue handling. They are so marked with
explicit "netdev->rx_queue" and "netdev->tx_queue"
references.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 94833dfb 20-Mar-2008 Jarek Poplawski <jarkao2@gmail.com>

[NET] ifb: set separate lockdep classes for queue locks

[ 10.536424] =======================================================
[ 10.536424] [ INFO: possible circular locking dependency detected ]
[ 10.536424] 2.6.25-rc3-devel #3
[ 10.536424] -------------------------------------------------------
[ 10.536424] swapper/0 is trying to acquire lock:
[ 10.536424] (&dev->queue_lock){-+..}, at: [<c0299b4a>]
dev_queue_xmit+0x175/0x2f3
[ 10.536424]
[ 10.536424] but task is already holding lock:
[ 10.536424] (&p->tcfc_lock){-+..}, at: [<f8a67154>] tcf_mirred+0x20/0x178
[act_mirred]
[ 10.536424]
[ 10.536424] which lock already depends on the new lock.

lockdep warns of locking order while using ifb with sch_ingress and
act_mirred: ingress_lock, tcfc_lock, queue_lock (usually queue_lock
is at the beginning). This patch is only to tell lockdep that ifb is
a different device (e.g. from eth) and has its own pair of queue
locks. (This warning is a false-positive in common scenario of using
ifb; yet there are possible situations, when this order could be
dangerous; lockdep should warn in such a case.) (With suggestions by
David S. Miller)

Reported-and-tested-by: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 09f75cd7 03-Oct-2007 Jeff Garzik <jeff@garzik.org>

[NET] drivers/net: statistics cleanup #1 -- save memory and shrink code

We now have struct net_device_stats embedded in struct net_device,
and the default ->get_stats() hook does the obvious thing for us.

Run through drivers/net/* and remove the driver-local storage of
statistics, and driver-local ->get_stats() hook where applicable.

This was just the low-hanging fruit in drivers/net; plenty more drivers
remain to be updated.

[ Resolved conflicts with napi_struct changes and fix sunqe build
regression... -DaveM ]

Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 10d024c1 17-Sep-2007 Ralf Baechle <ralf@linux-mips.org>

[NET]: Nuke SET_MODULE_OWNER macro.

It's been a useless no-op for long enough in 2.6 so I figured it's time to
remove it. The number of people that could object because they're
maintaining unified 2.4 and 2.6 drivers is probably rather small.

[ Handled drivers added by netdev tree and some missed IRDA cases... -DaveM ]

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 881d966b 17-Sep-2007 Eric W. Biederman <ebiederm@xmission.com>

[NET]: Make the device list and device lookups per namespace.

This patch makes most of the generic device layer network
namespace safe. This patch makes dev_base_head a
network namespace variable, and then it picks up
a few associated variables. The functions:
dev_getbyhwaddr
dev_getfirsthwbytype
dev_get_by_flags
dev_get_by_name
__dev_get_by_name
dev_get_by_index
__dev_get_by_index
dev_ioctl
dev_ethtool
dev_load
wireless_process_ioctl

were modified to take a network namespace argument, and
deal with it.

vlan_ioctl_set and brioctl_set were modified so their
hooks will receive a network namespace argument.

So basically anthing in the core of the network stack that was
affected to by the change of dev_base was modified to handle
multiple network namespaces. The rest of the network stack was
simply modified to explicitly use &init_net the initial network
namespace. This can be fixed when those components of the network
stack are modified to handle multiple network namespaces.

For now the ifindex generator is left global.

Fundametally ifindex numbers are per namespace, or else
we will have corner case problems with migration when
we get that far.

At the same time there are assumptions in the network stack
that the ifindex of a network device won't change. Making
the ifindex number global seems a good compromise until
the network stack can cope with ifindex changes when
you change namespaces, and the like.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0e06877c 11-Jul-2007 Patrick McHardy <kaber@trash.net>

[RTNETLINK]: rtnl_link: allow specifying initial device address

Drivers need to validate the initial addresses in their netlink attribute
validation function or manually reject them if they can't support this.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2d85cba2 11-Jul-2007 Patrick McHardy <kaber@trash.net>

[RTNETLINK]: rtnl_link API simplification

All drivers need to unregister their devices in the module unload function.
While doing so they must hold the rtnl and atomically unregister the
rtnl_link ops as well. This makes the rtnl_link_unregister function that
takes the rtnl itself completely useless.

Provide default newlink/dellink functions, make __rtnl_link_unregister and
rtnl_link_unregister unregister all devices with matching rtnl_link_ops and
change the existing users to take advantage of that.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9ba2cd65 13-Jun-2007 Patrick McHardy <kaber@trash.net>

[IFB]: Use rtnl_link API

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 62b7ffca 13-Jun-2007 Patrick McHardy <kaber@trash.net>

[IFB]: Keep ifb devices on list

Use a list instead of an array to allow creating new devices.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c01003c2 29-Mar-2007 Patrick McHardy <kaber@trash.net>

[IFB]: Fix crash on input device removal

The input_device pointer is not refcounted, which means the device may
disappear while packets are queued, causing a crash when ifb passes packets
with a stale skb->dev pointer to netif_rx().

Fix by storing the interface index instead and do a lookup where neccessary.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bcdddfb6 30-Jan-2007 Linus Torvalds <torvalds@woody.linux-foundation.org>

Revert "net: ifb error path loop fix"

This reverts commit 0c0b3ae68ec93b1db5c637d294647d1cca0df763.

Quoth David:

"Jeff, please revert

It's wrong. We had a lengthy analysis of this piece of code
several months ago, and it is correct.

Consider, if we run the loop and we get an error
the following happens:

1) attempt of ifb_init_one(i) fails, therefore we should
not try to "ifb_free_one()" on "i" since it failed
2) the loop iteration first increments "i", then it
check for error

Therefore we must decrement "i" twice before the first
free during the cleanup. One to "undo" the for() loop
increment, and one to "skip" the ifb_init_one() case which
failed."

Reported-by: David Miller <davem@davemloft.net>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0c0b3ae6 27-Jan-2007 Mariusz Kozlowski <m.kozlowski@tuxland.pl>

net: ifb error path loop fix

On error we should start freeing resources at [i-1] not [i-2].

Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 3136dcb3 01-Jan-2007 dean gaudet <dean@arctic.org>

[NET]: ifb double-counts packets

Signed-off-by: dean gaudet <dean@arctic.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8057de64 03-Oct-2006 Zach Brown <zach.brown@oracle.com>

[PATCH] pr_debug: ifb: replace missing comma to separate pr_debug arguments

ifb: replace missing comma to separate pr_debug arguments

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 6aa20a22 13-Sep-2006 Jeff Garzik <jeff@garzik.org>

drivers/net: Trim trailing whitespace

Signed-off-by: Jeff Garzik <jeff@garzik.org>


# 4a9c74e5 21-Jul-2006 Nicolas Dichtel <nicolas.dichtel@6wind.com>

[IFB] After ifb_init_one() failed, i is increased. Decrease

It before entering in the loop for freeing the other ifb devices.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6ab3d562 30-Jun-2006 Jörn Engel <joern@wohnheim.fh-wedel.de>

Remove obsolete #include <linux/config.h>

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>


# 932ff279 09-Jun-2006 Herbert Xu <herbert@gondor.apana.org.au>

[NET]: Add netif_tx_lock

Various drivers use xmit_lock internally to synchronise with their
transmission routines. They do so without setting xmit_lock_owner.
This is fine as long as netpoll is not in use.

With netpoll it is possible for deadlocks to occur if xmit_lock_owner
isn't set. This is because if a printk occurs while xmit_lock is held
and xmit_lock_owner is not set can cause netpoll to attempt to take
xmit_lock recursively.

While it is possible to resolve this by getting netpoll to use
trylock, it is suboptimal because netpoll's sole objective is to
maximise the chance of getting the printk out on the wire. So
delaying or dropping the message is to be avoided as much as possible.

So the only alternative is to always set xmit_lock_owner. The
following patch does this by introducing the netif_tx_lock family of
functions that take care of setting/unsetting xmit_lock_owner.

I renamed xmit_lock to _xmit_lock to indicate that it should not be
used directly. I didn't provide irq versions of the netif_tx_lock
functions since xmit_lock is meant to be a BH-disabling lock.

This is pretty much a straight text substitution except for a small
bug fix in winbond. It currently uses
netif_stop_queue/spin_unlock_wait to stop transmission. This is
unsafe as an IRQ can potentially wake up the queue. So it is safer to
use netif_tx_disable.

The hamradio bits used spin_lock_irq but it is unnecessary as
xmit_lock must never be taken in an IRQ handler.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 35eaa31e 23-Feb-2006 Richard Lucassen <spamtrap@lucassen.org>

[NET]: Increase default IFB device count.

The most usable number of ifb devices is 2. Change the default to 2.

Signed-off-by: Richard Lucassen <spamtrap@lucassen.org>
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 253af423 08-Jan-2006 Jamal Hadi Salim <hadi@cyberus.ca>

[NET]: Add IFB (Intermediate Functional Block) network device.

A new device to do intermidiate functional block in a system shared
manner. To use the new functionality, you need to turn on
qos/classifier actions.

The new functionality can be grouped as:

1) qdiscs/policies that are per device as opposed to system wide. ifb
allows for a device which can be redirected to thus providing an
impression of sharing.

2) Allows for queueing incoming traffic for shaping instead of
dropping.

Packets are redirected to this device using tc/action mirred redirect
construct. If they are sent to it by plain routing instead then they
will merely be dropped and the stats would indicate that.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>