History log of /linux-master/net/bridge/br_if.c
Revision Date Author Comments
# 6ca3c005 30-Jun-2023 Vladimir Oltean <vladimir.oltean@nxp.com>

net: bridge: keep ports without IFF_UNICAST_FLT in BR_PROMISC mode

According to the synchronization rules for .ndo_get_stats() as seen in
Documentation/networking/netdevices.rst, acquiring a plain spin_lock()
should not be illegal, but the bridge driver implementation makes it so.

After running these commands, I am being faced with the following
lockdep splat:

$ ip link add link swp0 name macsec0 type macsec encrypt on && ip link set swp0 up
$ ip link add dev br0 type bridge vlan_filtering 1 && ip link set br0 up
$ ip link set macsec0 master br0 && ip link set macsec0 up

========================================================
WARNING: possible irq lock inversion dependency detected
6.4.0-04295-g31b577b4bd4a #603 Not tainted
--------------------------------------------------------
swapper/1/0 just changed the state of lock:
ffff6bd348724cd8 (&br->lock){+.-.}-{3:3}, at: br_forward_delay_timer_expired+0x34/0x198
but this lock took another, SOFTIRQ-unsafe lock in the past:
(&ocelot->stats_lock){+.+.}-{3:3}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
Chain exists of:
&br->lock --> &br->hash_lock --> &ocelot->stats_lock

Possible interrupt unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&ocelot->stats_lock);
local_irq_disable();
lock(&br->lock);
lock(&br->hash_lock);
<Interrupt>
lock(&br->lock);

*** DEADLOCK ***

(details about the 3 locks skipped)

swp0 is instantiated by drivers/net/dsa/ocelot/felix.c, and this
only matters to the extent that its .ndo_get_stats64() method calls
spin_lock(&ocelot->stats_lock).

Documentation/locking/lockdep-design.rst says:

| A lock is irq-safe means it was ever used in an irq context, while a lock
| is irq-unsafe means it was ever acquired with irq enabled.

(...)

| Furthermore, the following usage based lock dependencies are not allowed
| between any two lock-classes::
|
| <hardirq-safe> -> <hardirq-unsafe>
| <softirq-safe> -> <softirq-unsafe>

Lockdep marks br->hash_lock as softirq-safe, because it is sometimes
taken in softirq context (for example br_fdb_update() which runs in
NET_RX softirq), and when it's not in softirq context it blocks softirqs
by using spin_lock_bh().

Lockdep marks ocelot->stats_lock as softirq-unsafe, because it never
blocks softirqs from running, and it is never taken from softirq
context. So it can always be interrupted by softirqs.

There is a call path through which a function that holds br->hash_lock:
fdb_add_hw_addr() will call a function that acquires ocelot->stats_lock:
ocelot_port_get_stats64(). This can be seen below:

ocelot_port_get_stats64+0x3c/0x1e0
felix_get_stats64+0x20/0x38
dsa_slave_get_stats64+0x3c/0x60
dev_get_stats+0x74/0x2c8
rtnl_fill_stats+0x4c/0x150
rtnl_fill_ifinfo+0x5cc/0x7b8
rtmsg_ifinfo_build_skb+0xe4/0x150
rtmsg_ifinfo+0x5c/0xb0
__dev_notify_flags+0x58/0x200
__dev_set_promiscuity+0xa0/0x1f8
dev_set_promiscuity+0x30/0x70
macsec_dev_change_rx_flags+0x68/0x88
__dev_set_promiscuity+0x1a8/0x1f8
__dev_set_rx_mode+0x74/0xa8
dev_uc_add+0x74/0xa0
fdb_add_hw_addr+0x68/0xd8
fdb_add_local+0xc4/0x110
br_fdb_add_local+0x54/0x88
br_add_if+0x338/0x4a0
br_add_slave+0x20/0x38
do_setlink+0x3a4/0xcb8
rtnl_newlink+0x758/0x9d0
rtnetlink_rcv_msg+0x2f0/0x550
netlink_rcv_skb+0x128/0x148
rtnetlink_rcv+0x24/0x38

the plain English explanation for it is:

The macsec0 bridge port is created without p->flags & BR_PROMISC,
because it is what br_manage_promisc() decides for a VLAN filtering
bridge with a single auto port.

As part of the br_add_if() procedure, br_fdb_add_local() is called for
the MAC address of the device, and this results in a call to
dev_uc_add() for macsec0 while the softirq-safe br->hash_lock is taken.

Because macsec0 does not have IFF_UNICAST_FLT, dev_uc_add() ends up
calling __dev_set_promiscuity() for macsec0, which is propagated by its
implementation, macsec_dev_change_rx_flags(), to the lower device: swp0.
This triggers the call path:

dev_set_promiscuity(swp0)
-> rtmsg_ifinfo()
-> dev_get_stats()
-> ocelot_port_get_stats64()

with a calling context that lockdep doesn't like (br->hash_lock held).

Normally we don't see this, because even though many drivers that can be
bridge ports don't support IFF_UNICAST_FLT, we need a driver that

(a) doesn't support IFF_UNICAST_FLT, *and*
(b) it forwards the IFF_PROMISC flag to another driver, and
(c) *that* driver implements ndo_get_stats64() using a softirq-unsafe
spinlock.

Condition (b) is necessary because the first __dev_set_rx_mode() calls
__dev_set_promiscuity() with "bool notify=false", and thus, the
rtmsg_ifinfo() code path won't be entered.

The same criteria also hold true for DSA switches which don't report
IFF_UNICAST_FLT. When the DSA master uses a spin_lock() in its
ndo_get_stats64() method, the same lockdep splat can be seen.

I think the deadlock possibility is real, even though I didn't reproduce
it, and I'm thinking of the following situation to support that claim:

fdb_add_hw_addr() runs on a CPU A, in a context with softirqs locally
disabled and br->hash_lock held, and may end up attempting to acquire
ocelot->stats_lock.

In parallel, ocelot->stats_lock is currently held by a thread B (say,
ocelot_check_stats_work()), which is interrupted while holding it by a
softirq which attempts to lock br->hash_lock.

Thread B cannot make progress because br->hash_lock is held by A. Whereas
thread A cannot make progress because ocelot->stats_lock is held by B.

When taking the issue at face value, the bridge can avoid that problem
by simply making the ports promiscuous from a code path with a saner
calling context (br->hash_lock not held). A bridge port without
IFF_UNICAST_FLT is going to become promiscuous as soon as we call
dev_uc_add() on it (which we do unconditionally), so why not be
preemptive and make it promiscuous right from the beginning, so as to
not be taken by surprise.

With this, we've broken the links between code that holds br->hash_lock
or br->lock and code that calls into the ndo_change_rx_flags() or
ndo_get_stats64() ops of the bridge port.

Fixes: 2796d0c648c9 ("bridge: Automatically manage port promiscuous mode.")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6be42ed0 19-Apr-2023 Ido Schimmel <idosch@nvidia.com>

bridge: Take per-{Port, VLAN} neighbor suppression into account

The bridge driver gates the neighbor suppression code behind an internal
per-bridge flag called 'BROPT_NEIGH_SUPPRESS_ENABLED'. The flag is set
when at least one bridge port has neighbor suppression enabled.

As a preparation for per-{Port, VLAN} neighbor suppression, make sure
the global flag is also set if per-{Port, VLAN} neighbor suppression is
enabled. That is, when the 'BR_NEIGH_VLAN_SUPPRESS' flag is set on at
least one bridge port.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e8c6cbd7 13-Feb-2023 Thomas Weißschuh <linux@weissschuh.net>

net: bridge: make kobj_type structure constant

Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.")
the driver core allows the usage of const struct kobj_type.

Take advantage of this to constify the structure definition to prevent
modification at runtime.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 02a476d9 21-Nov-2022 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

kobject: make kobject_get_ownership() take a constant kobject *

The call, kobject_get_ownership(), does not modify the kobject passed
into it, so make it const. This propagates down into the kobj_type
function callbacks so make the kobject passed into them also const,
ensuring that nothing in the kobject is being changed here.

This helps make it more obvious what calls and callbacks do, and do not,
modify structures passed to them.

Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: linux-nfs@vger.kernel.org
Cc: bridge@lists.linux-foundation.org
Cc: netdev@vger.kernel.org
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20221121094649.1556002-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# bd139381 28-Sep-2022 Steven Hsieh <steven.hsieh@broadcom.com>

net: bridge: assign path_cost for 2.5G and 5G link speed

As 2.5G, 5G ethernet ports are more common and affordable,
these ports are being used in LAN bridge devices.
STP port_cost() is missing path_cost assignment for these link speeds,
causes highest cost 100 being used.
This result in lower speed port being picked
when there is loop between 5G and 1G ports.

Original path_cost: 10G=2, 1G=4, 100m=19, 10m=100
Adjusted path_cost: 10G=2, 5G=3, 2.5G=4, 1G=5, 100m=19, 10m=100
speed greater than 10G = 1

Signed-off-by: Steven Hsieh <steven.hsieh@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 920a33cd 19-Aug-2022 Vladimir Oltean <vladimir.oltean@nxp.com>

net: bridge: move DSA master bridging restriction to DSA

When DSA gains support for multiple CPU ports in a LAG, it will become
mandatory to monitor the changeupper events for the DSA master.

In fact, there are already some restrictions to be imposed in that area,
namely that a DSA master cannot be a bridge port except in some special
circumstances.

Centralize the restrictions at the level of the DSA layer as a
preliminary step.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# d62607c3 07-Jun-2022 Jakub Kicinski <kuba@kernel.org>

net: rename reference+tracking helpers

Netdev reference helpers have a dev_ prefix for historic
reasons. Renaming the old helpers would be too much churn
but we can rename the tracking ones which are relatively
recent and should be the default for new code.

Rename:
dev_hold_track() -> netdev_hold()
dev_put_track() -> netdev_put()
dev_replace_track() -> netdev_ref_replace()

Link: https://lore.kernel.org/r/20220608043955.919359-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# ee8b7a11 05-May-2022 Jakub Kicinski <kuba@kernel.org>

net: make drivers set the TSO limit not the GSO limit

Drivers should call the TSO setting helper, GSO is controllable
by user space.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fcfb894d 12-Jan-2022 Eric Dumazet <edumazet@google.com>

net: bridge: fix net device refcount tracking issue in error path

I left one dev_put() in br_add_if() error path and sure enough
syzbot found its way.

As the tracker is allocated in new_nbp(), we must make sure
to properly free it.

We have to call dev_put_track(dev, &p->dev_tracker) before
@p object is freed, of course. This is not an issue because
br_add_if() owns a reference on @dev.

Fixes: b2dcdc7f731d ("net: bridge: add net device refcount tracker")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b2dcdc7f 06-Dec-2021 Eric Dumazet <edumazet@google.com>

net: bridge: add net device refcount tracker

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 6d872df3 19-Nov-2021 Eric Dumazet <edumazet@google.com>

net: annotate accesses to dev->gso_max_segs

dev->gso_max_segs is written under RTNL protection, or when the device is
not yet visible, but is read locklessly.

Add netif_set_gso_max_segs() helper.

Add the READ_ONCE()/WRITE_ONCE() pairs, and use netif_set_gso_max_segs()
where we can to better document what is going on.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4b66d216 19-Nov-2021 Eric Dumazet <edumazet@google.com>

net: annotate accesses to dev->gso_max_size

dev->gso_max_size is written under RTNL protection, or when the device is
not yet visible, but is read locklessly.

Add the READ_ONCE()/WRITE_ONCE() pairs, and use netif_set_gso_max_size()
where we can to better document what is going on.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cc0be1ad 14-Nov-2021 Christophe JAILLET <christophe.jaillet@wanadoo.fr>

net: bridge: Slightly optimize 'find_portno()'

The 'inuse' bitmap is local to this function. So we can use the
non-atomic '__set_bit()' to save a few cycles.

While at it, also remove some useless {}.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f6814fdc 26-Oct-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: bridge: rename br_fdb_insert to br_fdb_add_local

br_fdb_insert() is a wrapper over fdb_insert() that also takes the
bridge hash_lock.

With fdb_insert() being renamed to fdb_add_local(), rename
br_fdb_insert() to br_fdb_add_local().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 254ec036 16-Oct-2021 Kyungrok Chung <acadx0@gmail.com>

net: make use of helper netif_is_bridge_master()

Make use of netdev helper functions to improve code readability.
Replace 'dev->priv_flags & IFF_EBRIDGE' with netif_is_bridge_master(dev).

Signed-off-by: Kyungrok Chung <acadx0@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 893b1958 05-Aug-2021 Nikolay Aleksandrov <nikolay@nvidia.com>

net: bridge: fix ioctl locking

Before commit ad2f99aedf8f ("net: bridge: move bridge ioctls out of
.ndo_do_ioctl") the bridge ioctl calls were divided in two parts:
one was deviceless called by sock_ioctl and didn't expect rtnl to be held,
the other was with a device called by dev_ifsioc() and expected rtnl to be
held. After the commit above they were united in a single ioctl stub, but
it didn't take care of the locking expectations.
For sock_ioctl now we acquire (1) br_ioctl_mutex, (2) rtnl
and for dev_ifsioc we acquire (1) rtnl, (2) br_ioctl_mutex

The fix is to get a refcnt on the netdev for dev_ifsioc calls and drop rtnl
then to reacquire it in the bridge ioctl stub after br_ioctl_mutex has
been acquired. That will avoid playing locking games and make the rules
straight-forward: we always take br_ioctl_mutex first, and then rtnl.

Reported-by: syzbot+34fe5894623c4ab1b379@syzkaller.appspotmail.com
Fixes: ad2f99aedf8f ("net: bridge: move bridge ioctls out of .ndo_do_ioctl")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2f5dc00f 21-Jul-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: bridge: switchdev: let drivers inform which bridge ports are offloaded

On reception of an skb, the bridge checks if it was marked as 'already
forwarded in hardware' (checks if skb->offload_fwd_mark == 1), and if it
is, it assigns the source hardware domain of that skb based on the
hardware domain of the ingress port. Then during forwarding, it enforces
that the egress port must have a different hardware domain than the
ingress one (this is done in nbp_switchdev_allowed_egress).

Non-switchdev drivers don't report any physical switch id (neither
through devlink nor .ndo_get_port_parent_id), therefore the bridge
assigns them a hardware domain of 0, and packets coming from them will
always have skb->offload_fwd_mark = 0. So there aren't any restrictions.

Problems appear due to the fact that DSA would like to perform software
fallback for bonding and team interfaces that the physical switch cannot
offload.

+-- br0 ---+
/ / | \
/ / | \
/ | | bond0
/ | | / \
swp0 swp1 swp2 swp3 swp4

There, it is desirable that the presence of swp3 and swp4 under a
non-offloaded LAG does not preclude us from doing hardware bridging
beteen swp0, swp1 and swp2. The bandwidth of the CPU is often times high
enough that software bridging between {swp0,swp1,swp2} and bond0 is not
impractical.

But this creates an impossible paradox given the current way in which
port hardware domains are assigned. When the driver receives a packet
from swp0 (say, due to flooding), it must set skb->offload_fwd_mark to
something.

- If we set it to 0, then the bridge will forward it towards swp1, swp2
and bond0. But the switch has already forwarded it towards swp1 and
swp2 (not to bond0, remember, that isn't offloaded, so as far as the
switch is concerned, ports swp3 and swp4 are not looking up the FDB,
and the entire bond0 is a destination that is strictly behind the
CPU). But we don't want duplicated traffic towards swp1 and swp2, so
it's not ok to set skb->offload_fwd_mark = 0.

- If we set it to 1, then the bridge will not forward the skb towards
the ports with the same switchdev mark, i.e. not to swp1, swp2 and
bond0. Towards swp1 and swp2 that's ok, but towards bond0? It should
have forwarded the skb there.

So the real issue is that bond0 will be assigned the same hardware
domain as {swp0,swp1,swp2}, because the function that assigns hardware
domains to bridge ports, nbp_switchdev_add(), recurses through bond0's
lower interfaces until it finds something that implements devlink (calls
dev_get_port_parent_id with bool recurse = true). This is a problem
because the fact that bond0 can be offloaded by swp3 and swp4 in our
example is merely an assumption.

A solution is to give the bridge explicit hints as to what hardware
domain it should use for each port.

Currently, the bridging offload is very 'silent': a driver registers a
netdevice notifier, which is put on the netns's notifier chain, and
which sniffs around for NETDEV_CHANGEUPPER events where the upper is a
bridge, and the lower is an interface it knows about (one registered by
this driver, normally). Then, from within that notifier, it does a bunch
of stuff behind the bridge's back, without the bridge necessarily
knowing that there's somebody offloading that port. It looks like this:

ip link set swp0 master br0
|
v
br_add_if() calls netdev_master_upper_dev_link()
|
v
call_netdevice_notifiers
|
v
dsa_slave_netdevice_event
|
v
oh, hey! it's for me!
|
v
.port_bridge_join

What we do to solve the conundrum is to be less silent, and change the
switchdev drivers to present themselves to the bridge. Something like this:

ip link set swp0 master br0
|
v
br_add_if() calls netdev_master_upper_dev_link()
|
v bridge: Aye! I'll use this
call_netdevice_notifiers ^ ppid as the
| | hardware domain for
v | this port, and zero
dsa_slave_netdevice_event | if I got nothing.
| |
v |
oh, hey! it's for me! |
| |
v |
.port_bridge_join |
| |
+------------------------+
switchdev_bridge_port_offload(swp0, swp0)

Then stacked interfaces (like bond0 on top of swp3/swp4) would be
treated differently in DSA, depending on whether we can or cannot
offload them.

The offload case:

ip link set bond0 master br0
|
v
br_add_if() calls netdev_master_upper_dev_link()
|
v bridge: Aye! I'll use this
call_netdevice_notifiers ^ ppid as the
| | switchdev mark for
v | bond0.
dsa_slave_netdevice_event | Coincidentally (or not),
| | bond0 and swp0, swp1, swp2
v | all have the same switchdev
hmm, it's not quite for me, | mark now, since the ASIC
but my driver has already | is able to forward towards
called .port_lag_join | all these ports in hw.
for it, because I have |
a port with dp->lag_dev == bond0. |
| |
v |
.port_bridge_join |
for swp3 and swp4 |
| |
+------------------------+
switchdev_bridge_port_offload(bond0, swp3)
switchdev_bridge_port_offload(bond0, swp4)

And the non-offload case:

ip link set bond0 master br0
|
v
br_add_if() calls netdev_master_upper_dev_link()
|
v bridge waiting:
call_netdevice_notifiers ^ huh, switchdev_bridge_port_offload
| | wasn't called, okay, I'll use a
v | hwdom of zero for this one.
dsa_slave_netdevice_event : Then packets received on swp0 will
| : not be software-forwarded towards
v : swp1, but they will towards bond0.
it's not for me, but
bond0 is an upper of swp3
and swp4, but their dp->lag_dev
is NULL because they couldn't
offload it.

Basically we can draw the conclusion that the lowers of a bridge port
can come and go, so depending on the configuration of lowers for a
bridge port, it can dynamically toggle between offloaded and unoffloaded.
Therefore, we need an equivalent switchdev_bridge_port_unoffload too.

This patch changes the way any switchdev driver interacts with the
bridge. From now on, everybody needs to call switchdev_bridge_port_offload
and switchdev_bridge_port_unoffload, otherwise the bridge will treat the
port as non-offloaded and allow software flooding to other ports from
the same ASIC.

Note that these functions lay the ground for a more complex handshake
between switchdev drivers and the bridge in the future.

For drivers that will request a replay of the switchdev objects when
they offload and unoffload a bridge port (DSA, dpaa2-switch, ocelot), we
place the call to switchdev_bridge_port_unoffload() strategically inside
the NETDEV_PRECHANGEUPPER notifier's code path, and not inside
NETDEV_CHANGEUPPER. This is because the switchdev object replay helpers
need the netdev adjacency lists to be valid, and that is only true in
NETDEV_PRECHANGEUPPER.

Cc: Vadym Kochan <vkochan@marvell.com>
Cc: Taras Chornyi <tchornyi@marvell.com>
Cc: Ioana Ciornei <ioana.ciornei@nxp.com>
Cc: Lars Povlsen <lars.povlsen@microchip.com>
Cc: Steen Hegelund <Steen.Hegelund@microchip.com>
Cc: UNGLinuxDriver@microchip.com
Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch: regression
Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch
Tested-by: Horatiu Vultur <horatiu.vultur@microchip.com> # ocelot-switch
Signed-off-by: David S. Miller <davem@davemloft.net>


# 85826610 21-Jul-2021 Tobias Waldekranz <tobias@waldekranz.com>

net: bridge: switchdev: recycle unused hwdoms

Since hwdoms have only been used thus far for equality comparisons, the
bridge has used the simplest possible assignment policy; using a
counter to keep track of the last value handed out.

With the upcoming transmit offloading, we need to perform set
operations efficiently based on hwdoms, e.g. we want to answer
questions like "has this skb been forwarded to any port within this
hwdom?"

Move to a bitmap-based allocation scheme that recycles hwdoms once all
members leaves the bridge. This means that we can use a single
unsigned long to keep track of the hwdoms that have received an skb.

v1->v2: convert the typedef DECLARE_BITMAP(br_hwdom_map_t, BR_HWDOM_MAX)
into a plain unsigned long.
v2->v6: none

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f7cf972f 21-Jul-2021 Tobias Waldekranz <tobias@waldekranz.com>

net: bridge: disambiguate offload_fwd_mark

Before this change, four related - but distinct - concepts where named
offload_fwd_mark:

- skb->offload_fwd_mark: Set by the switchdev driver if the underlying
hardware has already forwarded this frame to the other ports in the
same hardware domain.

- nbp->offload_fwd_mark: An idetifier used to group ports that share
the same hardware forwarding domain.

- br->offload_fwd_mark: Counter used to make sure that unique IDs are
used in cases where a bridge contains ports from multiple hardware
domains.

- skb->cb->offload_fwd_mark: The hardware domain on which the frame
ingressed and was forwarded.

Introduce the term "hardware forwarding domain" ("hwdom") in the
bridge to denote a set of ports with the following property:

If an skb with skb->offload_fwd_mark set, is received on a port
belonging to hwdom N, that frame has already been forwarded to all
other ports in hwdom N.

By decoupling the name from "offload_fwd_mark", we can extend the
term's definition in the future - e.g. to add constraints that
describe expected egress behavior - without overloading the meaning of
"offload_fwd_mark".

- nbp->offload_fwd_mark thus becomes nbp->hwdom.

- br->offload_fwd_mark becomes br->last_hwdom.

- skb->cb->offload_fwd_mark becomes skb->cb->src_hwdom. The slight
change in naming here mandates a slight change in behavior of the
nbp_switchdev_frame_mark() function. Previously, it only set this
value in skb->cb for packets with skb->offload_fwd_mark true (ones
which were forwarded in hardware). Whereas now we always track the
incoming hwdom for all packets coming from a switchdev (even for the
packets which weren't forwarded in hardware, such as STP BPDUs, IGMP
reports etc). As all uses of skb->cb->offload_fwd_mark were already
gated behind checks of skb->offload_fwd_mark, this will not introduce
any functional change, but it paves the way for future changes where
the ingressing hwdom must be known for frames coming from a switchdev
regardless of whether they were forwarded in hardware or not
(basically, if the skb comes from a switchdev, skb->cb->src_hwdom now
always tracks which one).

A typical example where this is relevant: the switchdev has a fixed
configuration to trap STP BPDUs, but STP is not running on the bridge
and the group_fwd_mask allows them to be forwarded. Say we have this
setup:

br0
/ | \
/ | \
swp0 swp1 swp2

A BPDU comes in on swp0 and is trapped to the CPU; the driver does not
set skb->offload_fwd_mark. The bridge determines that the frame should
be forwarded to swp{1,2}. It is imperative that forward offloading is
_not_ allowed in this case, as the source hwdom is already "poisoned".

Recording the source hwdom allows this case to be handled properly.

v2->v3: added code comments
v3->v6: none

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 519133de 09-Aug-2021 Yang Yingliang <yangyingliang@huawei.com>

net: bridge: fix memleak in br_add_if()

I got a memleak report:

BUG: memory leak
unreferenced object 0x607ee521a658 (size 240):
comm "syz-executor.0", pid 955, jiffies 4294780569 (age 16.449s)
hex dump (first 32 bytes, cpu 1):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000d830ea5a>] br_multicast_add_port+0x1c2/0x300 net/bridge/br_multicast.c:1693
[<00000000274d9a71>] new_nbp net/bridge/br_if.c:435 [inline]
[<00000000274d9a71>] br_add_if+0x670/0x1740 net/bridge/br_if.c:611
[<0000000012ce888e>] do_set_master net/core/rtnetlink.c:2513 [inline]
[<0000000012ce888e>] do_set_master+0x1aa/0x210 net/core/rtnetlink.c:2487
[<0000000099d1cafc>] __rtnl_newlink+0x1095/0x13e0 net/core/rtnetlink.c:3457
[<00000000a01facc0>] rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3488
[<00000000acc9186c>] rtnetlink_rcv_msg+0x369/0xa10 net/core/rtnetlink.c:5550
[<00000000d4aabb9c>] netlink_rcv_skb+0x134/0x3d0 net/netlink/af_netlink.c:2504
[<00000000bc2e12a3>] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
[<00000000bc2e12a3>] netlink_unicast+0x4a0/0x6a0 net/netlink/af_netlink.c:1340
[<00000000e4dc2d0e>] netlink_sendmsg+0x789/0xc70 net/netlink/af_netlink.c:1929
[<000000000d22c8b3>] sock_sendmsg_nosec net/socket.c:654 [inline]
[<000000000d22c8b3>] sock_sendmsg+0x139/0x170 net/socket.c:674
[<00000000e281417a>] ____sys_sendmsg+0x658/0x7d0 net/socket.c:2350
[<00000000237aa2ab>] ___sys_sendmsg+0xf8/0x170 net/socket.c:2404
[<000000004f2dc381>] __sys_sendmsg+0xd3/0x190 net/socket.c:2433
[<0000000005feca6c>] do_syscall_64+0x37/0x90 arch/x86/entry/common.c:47
[<000000007304477d>] entry_SYSCALL_64_after_hwframe+0x44/0xae

On error path of br_add_if(), p->mcast_stats allocated in
new_nbp() need be freed, or it will be leaked.

Fixes: 1080ab95e3c7 ("net: bridge: add support for IGMP/MLD stats and export them via netlink")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Link: https://lore.kernel.org/r/20210809132023.978546-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# a019abd8 02-Jul-2021 Wolfgang Bumiller <w.bumiller@proxmox.com>

net: bridge: sync fdb to new unicast-filtering ports

Since commit 2796d0c648c9 ("bridge: Automatically manage
port promiscuous mode.")
bridges with `vlan_filtering 1` and only 1 auto-port don't
set IFF_PROMISC for unicast-filtering-capable ports.

Normally on port changes `br_manage_promisc` is called to
update the promisc flags and unicast filters if necessary,
but it cannot distinguish between *new* ports and ones
losing their promisc flag, and new ports end up not
receiving the MAC address list.

Fix this by calling `br_fdb_sync_static` in `br_add_if`
after the port promisc flags are updated and the unicast
filter was supposed to have been filled.

Fixes: 2796d0c648c9 ("bridge: Automatically manage port promiscuous mode.")
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 86a14b79 27-Oct-2020 Henrik Bjoernlund <henrik.bjoernlund@microchip.com>

bridge: cfm: Kernel space implementation of CFM. MEP create/delete.

This is the first commit of the implementation of the CFM protocol
according to 802.1Q section 12.14.

It contains MEP instance create, delete and configuration.

Connectivity Fault Management (CFM) comprises capabilities for
detecting, verifying, and isolating connectivity failures in
Virtual Bridged Networks. These capabilities can be used in
networks operated by multiple independent organizations, each
with restricted management access to each others equipment.

CFM functions are partitioned as follows:
- Path discovery
- Fault detection
- Fault verification and isolation
- Fault notification
- Fault recovery

Interface consists of these functions:
br_cfm_mep_create()
br_cfm_mep_delete()
br_cfm_mep_config_set()
br_cfm_cc_config_set()
br_cfm_cc_peer_mep_add()
br_cfm_cc_peer_mep_remove()

A MEP instance is created by br_cfm_mep_create()
-It is the Maintenance association End Point
described in 802.1Q section 19.2.
-It is created on a specific level (1-7) and is assuring
that no CFM frames are passing through this MEP on lower levels.
-It initiates and validates CFM frames on its level.
-It can only exist on a port that is related to a bridge.
-Attributes given cannot be changed until the instance is
deleted.

A MEP instance can be deleted by br_cfm_mep_delete().

A created MEP instance has attributes that can be
configured by br_cfm_mep_config_set().

A MEP Continuity Check feature can be configured by
br_cfm_cc_config_set()
The Continuity Check Receiver state machine can be
enabled and disabled.
According to 802.1Q section 19.2.8

A MEP can have Peer MEPs added and removed by
br_cfm_cc_peer_mep_add() and br_cfm_cc_peer_mep_remove()
The Continuity Check feature can maintain connectivity
status on each added Peer MEP.

Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 9eb8eff0 10-May-2020 Vladimir Oltean <vladimir.oltean@nxp.com>

net: bridge: allow enslaving some DSA master network devices

Commit 8db0a2ee2c63 ("net: bridge: reject DSA-enabled master netdevices
as bridge members") added a special check in br_if.c in order to check
for a DSA master network device with a tagging protocol configured. This
was done because back then, such devices, once enslaved in a bridge
would become inoperative and would not pass DSA tagged traffic anymore
due to br_handle_frame returning RX_HANDLER_CONSUMED.

But right now we have valid use cases which do require bridging of DSA
masters. One such example is when the DSA master ports are DSA switch
ports themselves (in a disjoint tree setup). This should be completely
equivalent, functionally speaking, from having multiple DSA switches
hanging off of the ports of a switchdev driver. So we should allow the
enslaving of DSA tagged master network devices.

Instead of the regular br_handle_frame(), install a new function
br_handle_frame_dummy() on these DSA masters, which returns
RX_HANDLER_PASS in order to call into the DSA specific tagging protocol
handlers, and lift the restriction from br_add_if.

Suggested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 65369933 26-Apr-2020 Horatiu Vultur <horatiu.vultur@microchip.com>

bridge: mrp: Integrate MRP into the bridge

To integrate MRP into the bridge, the bridge needs to do the following:
- detect if the MRP frame was received on MRP ring port in that case it would be
processed otherwise just forward it as usual.
- enable parsing of MRP
- before whenever the bridge was set up, it would set all the ports in
forwarding state. Add an extra check to not set ports in forwarding state if
the port is an MRP ring port. The reason of this change is that if the MRP
instance initially sets the port in blocked state by setting the bridge up it
would overwrite this setting.

Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2874c5fd 27-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# bdfad5ae 09-May-2019 Tobin C. Harding <tobin@kernel.org>

bridge: Fix error path for kobject_init_and_add()

Currently error return from kobject_init_and_add() is not followed by a
call to kobject_put(). This means there is a memory leak. We currently
set p to NULL so that kfree() may be called on it as a noop, the code is
arguably clearer if we move the kfree() up closer to where it is
called (instead of after goto jump).

Remove a goto label 'err1' and jump to call to kobject_put() in error
return from kobject_init_and_add() fixing the memory leak. Re-name goto
label 'put_back' to 'err1' now that we don't use err1, following current
nomenclature (err1, err2 ...). Move call to kfree out of the error
code at bottom of function up to closer to where memory was allocated.
Add comment to clarify call to kfree().

Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 35f861e3 29-Mar-2019 Julian Wiedmann <jwi@linux.ibm.com>

net: bridge: use netif_is_bridge_port()

Replace the br_port_exists() macro with its twin from netdevice.h

CC: Roopa Prabhu <roopa@cumulusnetworks.com>
CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ca935da7 13-Dec-2018 Petr Machata <petrm@mellanox.com>

net: bridge: Issue NETDEV_PRE_CHANGEADDR

When a port is attached to a bridge, the address of the bridge in
question may change as well. Even if it would not change at this
point (because the current bridge address is lower), it might end up
changing later as a result of detach of another port, which can't be
vetoed.

Therefore issue NETDEV_PRE_CHANGEADDR regardless of whether the address
will be used at this point or not, and make sure all involved parties
would agree with the change.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 169327d5 12-Dec-2018 Petr Machata <petrm@mellanox.com>

net: bridge: Propagate extack to switchdev

ndo_bridge_setlink has been updated in the previous patch to have extack
available, and changelink RTNL op has had this argument since the time
extack was added. Propagate both through the bridge driver to eventually
reach br_switchdev_port_vlan_add(), where it will be used by subsequent
patches.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 085ddc87 21-Nov-2018 Ido Schimmel <idosch@mellanox.com>

bridge: Allow querying bridge port flags

Allow querying bridge port flags so that drivers capable of performing
VxLAN learning will update the bridge driver only if learning is enabled
on its bridge port corresponding to the VxLAN device.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3341d917 26-Sep-2018 Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

net: bridge: convert mtu_set_by_user to a bit

Convert the last remaining bool option to a bit thus reducing the overall
net_bridge size further by 8 bytes.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 459479da 30-Aug-2018 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

bridge: Switch to bitmap_zalloc()

Switch to bitmap_zalloc() to show clearly what we are allocating.
Besides that it returns pointer of bitmap type instead of opaque void *.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2756f68c 23-Jul-2018 Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

net: bridge: add support for backup port

This patch adds a new port attribute - IFLA_BRPORT_BACKUP_PORT, which
allows to set a backup port to be used for known unicast traffic if the
port has gone carrier down. The backup pointer is rcu protected and set
only under RTNL, a counter is maintained so when deleting a port we know
how many other ports reference it as a backup and we remove it from all.
Also the pointer is in the first cache line which is hot at the time of
the check and thus in the common case we only add one more test.
The backup port will be used only for the non-flooding case since
it's a part of the bridge and the flooded packets will be forwarded to it
anyway. To remove the forwarding just send a 0/non-existing backup port.
This is used to avoid numerous scalability problems when using MLAG most
notably if we have thousands of fdbs one would need to change all of them
on port carrier going down which takes too long and causes a storm of fdb
notifications (and again when the port comes back up). In a Multi-chassis
Link Aggregation setup usually hosts are connected to two different
switches which act as a single logical switch. Those switches usually have
a control and backup link between them called peerlink which might be used
for communication in case a host loses connectivity to one of them.
We need a fast way to failover in case a host port goes down and currently
none of the solutions (like bond) cannot fulfill the requirements because
the participating ports are actually the "master" devices and must have the
same peerlink as their backup interface and at the same time all of them
must participate in the bridge device. As Roopa noted it's normal practice
in routing called fast re-route where a precalculated backup path is used
when the main one is down.
Another use case of this is with EVPN, having a single vxlan device which
is backup of every port. Due to the nature of master devices it's not
currently possible to use one device as a backup for many and still have
all of them participate in the bridge (which is master itself).
More detailed information about MLAG is available at the link below.
https://docs.cumulusnetworks.com/display/DOCS/Multi-Chassis+Link+Aggregation+-+MLAG

Further explanation and a diagram by Roopa:
Two switches acting in a MLAG pair are connected by the peerlink
interface which is a bridge port.

the config on one of the switches looks like the below. The other
switch also has a similar config.
eth0 is connected to one port on the server. And the server is
connected to both switches.

br0 -- team0---eth0
|
-- switch-peerlink

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 705e0dea 20-Jul-2018 Tyler Hicks <tyhicks@canonical.com>

bridge: make sure objects belong to container's owner

When creating various bridge objects in /sys/class/net/... make sure
that they belong to the container's owner instead of global root (if
they belong to a container/namespace).

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# faa1cd82 03-May-2018 Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

net: bridge: avoid duplicate notification on up/down/change netdev events

While handling netdevice events, br_device_event() sometimes uses
br_stp_(disable|enable)_port which unconditionally send a notification,
but then a second notification for the same event is sent at the end of
the br_device_event() function. To avoid sending duplicate notifications
in such cases, check if one has already been sent (i.e.
br_stp_enable/disable_port have been called).
The patch is based on a change by Satish Ashok.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e8238fc2 27-Apr-2018 Hangbin Liu <liuhangbin@gmail.com>

bridge: check iface upper dev when setting master via ioctl

When we set a bond slave's master to bridge via ioctl, we only check
the IFF_BRIDGE_PORT flag. Although we will find the slave's real master
at netdev_master_upper_dev_link() later, it already does some settings
and allocates some resources. It would be better to return as early
as possible.

v1 -> v2:
use netdev_master_upper_dev_get() instead of netdev_has_any_upper_dev()
to check if we have a master, because not all upper devs are masters,
e.g. vlan device.

Reported-by: syzbot+de73361ee4971b6e6f75@syzkaller.appspotmail.com
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 804b854d 30-Mar-2018 Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

net: bridge: disable bridge MTU auto tuning if it was set manually

As Roopa noted today the biggest source of problems when configuring
bridge and ports is that the bridge MTU keeps changing automatically on
port events (add/del/changemtu). That leads to inconsistent behaviour
and network config software needs to chase the MTU and fix it on each
such event. Let's improve on that situation and allow for the user to
set any MTU within ETH_MIN/MAX limits, but once manually configured it
is the user's responsibility to keep it correct afterwards.

In case the MTU isn't manually set - the behaviour reverts to the
previous and the bridge follows the minimum MTU.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f40aa233 30-Mar-2018 Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

net: bridge: set min MTU on port events and allow user to set max

Recently the bridge was changed to automatically set maximum MTU on port
events (add/del/changemtu) when vlan filtering is enabled, but that
actually changes behaviour in a way which breaks some setups and can lead
to packet drops. In order to still allow that maximum to be set while being
compatible, we add the ability for the user to tune the bridge MTU up to
the maximum when vlan filtering is enabled, but that has to be done
explicitly and all port events (add/del/changemtu) lead to resetting that
MTU to the minimum as before.

Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 82792a07 23-Mar-2018 Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

net: bridge: fix direct access to bridge vlan_enabled and use helper

We need to use br_vlan_enabled() helper otherwise we'll break builds
without bridge vlans:
net/bridge//br_if.c: In function ‘br_mtu’:
net/bridge//br_if.c:458:8: error: ‘const struct net_bridge’ has no
member named ‘vlan_enabled’
if (br->vlan_enabled)
^
net/bridge//br_if.c:462:1: warning: control reaches end of non-void
function [-Wreturn-type]
}
^
scripts/Makefile.build:324: recipe for target 'net/bridge//br_if.o'
failed

Fixes: 419d14af9e07 ("bridge: Allow max MTU when multiple VLANs present")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 419d14af 22-Mar-2018 Chas Williams <3chas3@gmail.com>

bridge: Allow max MTU when multiple VLANs present

If the bridge is allowing multiple VLANs, some VLANs may have
different MTUs. Instead of choosing the minimum MTU for the
bridge interface, choose the maximum MTU of the bridge members.
With this the user only needs to set a larger MTU on the member
ports that are participating in the large MTU VLANS.

Signed-off-by: Chas Williams <3chas3@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 92899063 31-Oct-2017 Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

net: bridge: add notifications for the bridge dev on vlan change

Currently the bridge device doesn't generate any notifications upon vlan
modifications on itself because it doesn't use the generic bridge
notifications.
With the recent changes we know if anything was modified in the vlan config
thus we can generate a notification when necessary for the bridge device
so add support to br_ifinfo_notify() similar to how other combined
functions are done - if port is present it takes precedence, otherwise
notify about the bridge. I've explicitly marked the locations where the
notification should be always for the port by setting bridge to NULL.
I've also taken the liberty to rearrange each modified function's local
variables in reverse xmas tree as well.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 821f1b21 06-Oct-2017 Roopa Prabhu <roopa@cumulusnetworks.com>

bridge: add new BR_NEIGH_SUPPRESS port flag to suppress arp and nd flood

This patch adds a new bridge port flag BR_NEIGH_SUPPRESS to
suppress arp and nd flood on bridge ports. It implements
rfc7432, section 10.
https://tools.ietf.org/html/rfc7432#section-10
for ethernet VPN deployments. It is similar to the existing
BR_PROXYARP* flags but has a few semantic differences to conform
to EVPN standard. Unlike the existing flags, this new flag suppresses
flood of all neigh discovery packets (arp and nd) to tunnel ports.
Supports both vlan filtering and non-vlan filtering bridges.

In case of EVPN, it is mainly used to avoid flooding
of arp and nd packets to tunnel ports like vxlan.

This patch adds netlink and sysfs support to set this bridge port
flag.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ca752be0 04-Oct-2017 David Ahern <dsahern@gmail.com>

net: bridge: Pass extack to down to netdev_master_upper_dev_link

Pass extack arg to br_add_if. Add messages for a couple of failures
and pass arg to netdev_master_upper_dev_link.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 42ab19ee 04-Oct-2017 David Ahern <dsahern@gmail.com>

net: Add extack to upper device linking

Add extack arg to netdev_upper_dev_link and netdev_master_upper_dev_link

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1f51445a 26-May-2017 Ido Schimmel <idosch@mellanox.com>

bridge: Export VLAN filtering state

It's useful for drivers supporting bridge offload to be able to query
the bridge's VLAN filtering state.

Currently, upon enslavement to a bridge master, the offloading driver
will only learn about the bridge's VLAN filtering state after the bridge
device was already linked with its slave.

Being able to query the bridge's VLAN filtering state allows such
drivers to forbid enslavement in case resource couldn't be allocated for
a VLAN-aware bridge and also choose the correct initialization routine
for the enslaved port, which is dependent on the bridge type.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 99f906e9 26-Apr-2017 Mike Manning <mmanning@brocade.com>

bridge: add per-port broadcast flood flag

Support for l2 multicast flood control was added in commit b6cb5ac8331b
("net: bridge: add per-port multicast flood flag"). It allows broadcast
as it was introduced specifically for unknown multicast flood control.
But as broadcast is a special case of multicast, this may also need to
be disabled. For this purpose, introduce a flag to disable the flooding
of received l2 broadcasts. This approach is backwards compatible and
provides flexibility in filtering for the desired packet types.

Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Mike Manning <mmanning@brocade.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b1b9d366 25-Apr-2017 Xin Long <lucien.xin@gmail.com>

bridge: move bridge multicast cleanup to ndo_uninit

During removing a bridge device, if the bridge is still up, a new mdb entry
still can be added in br_multicast_add_group() after all mdb entries are
removed in br_multicast_dev_del(). Like the path:

mld_ifc_timer_expire ->
mld_sendpack -> ...
br_multicast_rcv ->
br_multicast_add_group

The new mp's timer will be set up. If the timer expires after the bridge
is freed, it may cause use-after-free panic in br_multicast_group_expired.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
IP: [<ffffffffa07ed2c8>] br_multicast_group_expired+0x28/0xb0 [bridge]
Call Trace:
<IRQ>
[<ffffffff81094536>] call_timer_fn+0x36/0x110
[<ffffffffa07ed2a0>] ? br_mdb_free+0x30/0x30 [bridge]
[<ffffffff81096967>] run_timer_softirq+0x237/0x340
[<ffffffff8108dcbf>] __do_softirq+0xef/0x280
[<ffffffff8169889c>] call_softirq+0x1c/0x30
[<ffffffff8102c275>] do_softirq+0x65/0xa0
[<ffffffff8108e055>] irq_exit+0x115/0x120
[<ffffffff81699515>] smp_apic_timer_interrupt+0x45/0x60
[<ffffffff81697a5d>] apic_timer_interrupt+0x6d/0x80

Nikolay also found it would cause a memory leak - the mdb hash is
reallocated and not freed due to the mdb rehash.

unreferenced object 0xffff8800540ba800 (size 2048):
backtrace:
[<ffffffff816e2287>] kmemleak_alloc+0x67/0xc0
[<ffffffff81260bea>] __kmalloc+0x1ba/0x3e0
[<ffffffffa05c60ee>] br_mdb_rehash+0x5e/0x340 [bridge]
[<ffffffffa05c74af>] br_multicast_new_group+0x43f/0x6e0 [bridge]
[<ffffffffa05c7aa3>] br_multicast_add_group+0x203/0x260 [bridge]
[<ffffffffa05ca4b5>] br_multicast_rcv+0x945/0x11d0 [bridge]
[<ffffffffa05b6b10>] br_dev_xmit+0x180/0x470 [bridge]
[<ffffffff815c781b>] dev_hard_start_xmit+0xbb/0x3d0
[<ffffffff815c8743>] __dev_queue_xmit+0xb13/0xc10
[<ffffffff815c8850>] dev_queue_xmit+0x10/0x20
[<ffffffffa02f8d7a>] ip6_finish_output2+0x5ca/0xac0 [ipv6]
[<ffffffffa02fbfc6>] ip6_finish_output+0x126/0x2c0 [ipv6]
[<ffffffffa02fc245>] ip6_output+0xe5/0x390 [ipv6]
[<ffffffffa032b92c>] NF_HOOK.constprop.44+0x6c/0x240 [ipv6]
[<ffffffffa032bd16>] mld_sendpack+0x216/0x3e0 [ipv6]
[<ffffffffa032d5eb>] mld_ifc_timer_expire+0x18b/0x2b0 [ipv6]

This could happen when ip link remove a bridge or destroy a netns with a
bridge device inside.

With Nikolay's suggestion, this patch is to clean up bridge multicast in
ndo_uninit after bridge dev is shutdown, instead of br_dev_delete, so
that netif_running check in br_multicast_add_group can avoid this issue.

v1->v2:
- fix this issue by moving br_multicast_dev_del to ndo_uninit, instead
of calling dev_close in br_dev_delete.

(NOTE: Depends upon b6fe0440c637 ("bridge: implement missing ndo_uninit()"))

Fixes: e10177abf842 ("bridge: multicast: fix handling of temp and perm entries")
Reported-by: Jianwen Ji <jiji@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b6fe0440 10-Apr-2017 Ido Schimmel <idosch@mellanox.com>

bridge: implement missing ndo_uninit()

While the bridge driver implements an ndo_init(), it was missing a
symmetric ndo_uninit(), causing the different de-initialization
operations to be scattered around its dellink() and destructor().

Implement a symmetric ndo_uninit() and remove the overlapping operations
from its dellink() and destructor().

This is a prerequisite for the next patch, as it allows us to have a
proper cleanup upon changelink() failure during the bridge's newlink().

Fixes: b6677449dff6 ("bridge: netlink: call br_changelink() during br_dev_newlink()")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c6e970a0 28-Mar-2017 Andrew Lunn <andrew@lunn.ch>

net: break include loop netdevice.h, dsa.h, devlink.h

There is an include loop between netdevice.h, dsa.h, devlink.h because
of NETDEV_ALIGN, making it impossible to use devlink structures in
dsa.h.

Break this loop by taking dsa.h out of netdevice.h, add a forward
declaration of dsa_switch_tree and netdev_set_default_ethtool_ops()
function, which is what netdevice.h requires.

No longer having dsa.h in netdevice.h means the includes in dsa.h no
longer get included. This breaks a few other files which depend on
these includes. Add these directly in the affected file.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f7cdee8a 04-Feb-2017 Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

bridge: move to workqueue gc

Move the fdb garbage collector to a workqueue which fires at least 10
milliseconds apart and cleans chain by chain allowing for other tasks
to run in the meantime. When having thousands of fdbs the system is much
more responsive. Most importantly remove the need to check if the
matched entry has expired in __br_fdb_get that causes false-sharing and
is completely unnecessary if we cleanup entries, at worst we'll get 10ms
of traffic for that entry before it gets deleted.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b6cb5ac8 31-Aug-2016 Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

net: bridge: add per-port multicast flood flag

Add a per-port flag to control the unknown multicast flood, similar to the
unknown unicast flood flag and break a few long lines in the netlink flag
exports.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6bc506b4 25-Aug-2016 Ido Schimmel <idosch@mellanox.com>

bridge: switchdev: Add forward mark support for stacked devices

switchdev_port_fwd_mark_set() is used to set the 'offload_fwd_mark' of
port netdevs so that packets being flooded by the device won't be
flooded twice.

It works by assigning a unique identifier (the ifindex of the first
bridge port) to bridge ports sharing the same parent ID. This prevents
packets from being flooded twice by the same switch, but will flood
packets through bridge ports belonging to a different switch.

This method is problematic when stacked devices are taken into account,
such as VLANs. In such cases, a physical port netdev can have upper
devices being members in two different bridges, thus requiring two
different 'offload_fwd_mark's to be configured on the port netdev, which
is impossible.

The main problem is that packet and netdev marking is performed at the
physical netdev level, whereas flooding occurs between bridge ports,
which are not necessarily port netdevs.

Instead, packet and netdev marking should really be done in the bridge
driver with the switch driver only telling it which packets it already
forwarded. The bridge driver will mark such packets using the mark
assigned to the ingress bridge port and will prevent the packet from
being forwarded through any bridge port sharing the same mark (i.e.
having the same parent ID).

Remove the current switchdev 'offload_fwd_mark' implementation and
instead implement the proposed method. In addition, make rocker - the
sole user of the mark - use the proposed method.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1080ab95 28-Jun-2016 Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

net: bridge: add support for IGMP/MLD stats and export them via netlink

This patch adds stats support for the currently used IGMP/MLD types by the
bridge. The stats are per-port (plus one stat per-bridge) and per-direction
(RX/TX). The stats are exported via netlink via the new linkxstats API
(RTM_GETSTATS). In order to minimize the performance impact, a new option
is used to enable/disable the stats - multicast_stats_enabled, similar to
the recent vlan stats. Also in order to avoid multiple IGMP/MLD type
lookups and checks, we make use of the current "igmp" member of the bridge
private skb->cb region to record the type on Rx (both host-generated and
external packets pass by multicast_rcv()). We can do that since the igmp
member was used as a boolean and all the valid IGMP/MLD types are positive
values. The normal bridge fast-path is not affected at all, the only
affected paths are the flooding ones and since we make use of the IGMP/MLD
type, we can quickly determine if the packet should be counted using
cache-hot data (cb's igmp member). We add counters for:
* IGMP Queries
* IGMP Leaves
* IGMP v1/v2/v3 reports

* MLD Queries
* MLD Leaves
* MLD v1/v2 reports

These are invaluable when monitoring or debugging complex multicast setups
with bridges.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ae74f100 21-Mar-2016 Eric Dumazet <edumazet@google.com>

bridge: update max_gso_segs and max_gso_size

It can be useful to lower max_gso_segs on NIC with very low
number of TX descriptors like bcmgenet.

However, this is defeated by bridge since it does not propagate
the lower value of max_gso_segs and max_gso_size.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Petri Gynther <pgynther@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 45493d47 26-Feb-2016 Paolo Abeni <pabeni@redhat.com>

bridge: notify enslaved devices of headroom changes

On bridge needed_headroom changes, the enslaved devices are
notified via the ndo_set_rx_headroom method

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 702b26a2 24-Feb-2016 David Decotigny <decot@googlers.com>

net: bridge: use __ethtool_get_ksettings

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 08474cc1 06-Jan-2016 Elad Raz <eladr@mellanox.com>

bridge: Propagate vlan add failure to user

Disallow adding interfaces to a bridge when vlan filtering operation
failed. Send the failure code to the user.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 29bf24af 02-Dec-2015 Jiri Pirko <jiri@mellanox.com>

net: add possibility to pass information about upper device via notifier

Sometimes the drivers and other code would find it handy to know some
internal information about upper device being changed. So allow upper-code
to pass information down to notifier listeners during linking.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6dffb044 02-Dec-2015 Jiri Pirko <jiri@mellanox.com>

net: propagate upper priv via netdev_master_upper_dev_link

Eliminate netdev_master_upper_dev_link_private and pass priv directly as
a parameter of netdev_master_upper_dev_link.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 56607386 14-Oct-2015 Jiri Pirko <jiri@mellanox.com>

bridge: defer switchdev fdb del call in fdb_del_external_learn

Since spinlock is held here, defer the switchdev operation. Also, ensure
that defered switchdev ops are processed before port master device
is unlinked.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f409d0ed 12-Oct-2015 Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

bridge: vlan: move back vlan_flush

Ido Schimmel reported a problem with switchdev devices because of the
order change of del_nbp operations, more specifically the move of
nbp_vlan_flush() which deletes all vlans and frees vlgrp after the
rx_handler has been unregistered. So in order to fix this move
vlan_flush back where it was and make it destroy the rhtable after
NULLing vlgrp and waiting a grace period to make sure noone can see it.

Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 263344e6 30-Sep-2015 Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

bridge: vlan: fix possible null ptr derefs on port init and deinit

When a new port is being added we need to make vlgrp available after
rhashtable has been initialized and when removing a port we need to
flush the vlans and free the resources after we're sure noone can use
the port, i.e. after it's removed from the port list and synchronize_rcu
is executed.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e10177ab 15-Jul-2015 Satish Ashok <sashok@cumulusnetworks.com>

bridge: multicast: fix handling of temp and perm entries

When the bridge (or port) is brought down/up flush only temp entries and
leave the perm ones. Flush perm entries only when deleting the bridge
device or the associated port.

Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1ea2d020 23-Jun-2015 Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

bridge: vlan: flush the dynamically learned entries on port vlan delete

Add a new argument to br_fdb_delete_by_port which allows to specify a
vid to match when flushing entries and use it in nbp_vlan_delete() to
flush the dynamically learned entries of the vlan/port pair when removing
a vlan from a port. Before this patch only the local mac was being
removed and the dynamically learned ones were left to expire.
Note that the do_all argument is still respected and if specified, the
vid will be ignored.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4c906c27 13-Mar-2015 Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>

bridge: reset bridge mtu after deleting an interface

On adding an interface br_add_if() sets the MTU to the min of
all the interfaces. Do the same thing on removing an interface too
in br_del_if.

Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8db0a2ee 16-Jan-2015 Florian Fainelli <f.fainelli@gmail.com>

net: bridge: reject DSA-enabled master netdevices as bridge members

DSA-enabled master network devices with a switch tagging protocol should
strip the protocol specific format before handing the frame over to
higher layer.

When adding such a DSA master network device as a bridge member, we go
through the following code path when receiving a frame:

__netif_receive_skb_core
-> first ptype check against ptype_all is not returning any
handler for this skb

-> check and invoke rx_handler:
-> deliver frame to the bridge layer: br_handle_frame

DSA registers a ptype handler with the fake ETH_XDSA ethertype, which is
called *after* the bridge-layer rx_handler has run. br_handle_frame()
tries to parse the frame it received from the DSA master network device,
and will not be able to match any of its conditions and jumps straight
at the end of the end of br_handle_frame() and returns
RX_HANDLER_CONSUMED there.

Since we returned RX_HANDLER_CONSUMED, __netif_receive_skb_core() stops
RX processing for this frame and returns NET_RX_SUCCESS, so we never get
a chance to call our switch tag packet processing logic and deliver
frames to the DSA slave network devices, and so we do not get any
functional bridge members at all.

Instead of cluttering the bridge receive path with DSA-specific checks,
and rely on assumptions about how __netif_receive_skb_core() is
processing frames, we simply deny adding the DSA master network device
(conduit interface) as a bridge member, leaving only the slave DSA
network devices to be bridge members, since those will work correctly in
all circumstances.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f902e881 08-Jan-2015 Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>

bridge: Add ability to enable TSO

Currently a bridge device turns off TSO feature if no bridge ports
support it. We can always enable it, since packets can be segmented on
ports by software as well as on the bridge device.
This will reduce the number of packets processed in the bridge.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5be5a2df 03-Oct-2014 Vlad Yasevich <vyasevich@gmail.com>

bridge: Add filtering support for default_pvid

Currently when vlan filtering is turned on on the bridge, the bridge
will drop all traffic untill the user configures the filter. This
isn't very nice for ports that don't care about vlans and just
want untagged traffic.

A concept of a default_pvid was recently introduced. This patch
adds filtering support for default_pvid. Now, ports that don't
care about vlans and don't define there own filter will belong
to the VLAN of the default_pvid and continue to receive untagged
traffic.

This filtering can be disabled by setting default_pvid to 0.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 775dd692 30-Sep-2014 Florian Fainelli <f.fainelli@gmail.com>

net: bridge: add a br_set_state helper function

In preparation for being able to propagate port states to e.g: notifiers
or other kernel parts, do not manipulate the port state directly, but
instead use a helper function which will allow us to do a bit more than
just setting the state.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0f49579a 05-Sep-2014 Jiri Pirko <jiri@resnulli.us>

bridge: switch order of rx_handler reg and upper dev link

The thing is that netdev_master_upper_dev_link calls
call_netdevice_notifiers(NETDEV_CHANGEUPPER, dev). That generates rtnl
link message and during that, rtnl_link_ops->fill_slave_info is called.
But with current ordering, rx_handler and IFF_BRIDGE_PORT are not set
yet so there would have to be check for that in fill_slave_info callback.

Resolve this by reordering to similar what bonding and team does to
avoid the check.

Also add removal of IFF_BRIDGE_PORT flag into error path.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c835a677 14-Jul-2014 Tom Gundersen <teg@jklm.no>

net: set name_assign_type in alloc_netdev()

Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
all users to pass NET_NAME_UNKNOWN.

Coccinelle patch:

@@
expression sizeof_priv, name, setup, txqs, rxqs, count;
@@

(
-alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
+alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
|
-alloc_netdev_mq(sizeof_priv, name, setup, count)
+alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
|
-alloc_netdev(sizeof_priv, name, setup)
+alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
)

v9: move comments here from the wrong commit

Signed-off-by: Tom Gundersen <teg@jklm.no>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e0a47d1f 05-Jun-2014 Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>

bridge: Fix incorrect judgment of promisc

br_manage_promisc() incorrectly expects br_auto_port() to return only 0
or 1, while it actually returns flags, i.e., a subset of BR_AUTO_MASK.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 019ee792 28-May-2014 wangweidong <wangweidong1@huawei.com>

bridge: fix the unbalanced promiscuous count when add_if failed

As commit 2796d0c648c94 ("bridge: Automatically manage port
promiscuous mode."), make the add_if use dev_set_allmulti
instead of dev_set_promiscuous, so when add_if failed, we
should do dev_set_allmulti(dev, -1).

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Reviewed-by: Amos Kong <akong@redhat.com>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 025559ee 16-May-2014 stephen hemminger <stephen@networkplumber.org>

bridge: fix spelling of promiscuous

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2796d0c6 16-May-2014 Vlad Yasevich <vyasevic@redhat.com>

bridge: Automatically manage port promiscuous mode.

There exist configurations where the administrator or another management
entity has the foreknowledge of all the mac addresses of end systems
that are being bridged together.

In these environments, the administrator can statically configure known
addresses in the bridge FDB and disable flooding and learning on ports.
This makes it possible to turn off promiscuous mode on the interfaces
connected to the bridge.

Here is why disabling flooding and learning allows us to control
promiscuity:
Consider port X. All traffic coming into this port from outside the
bridge (ingress) will be either forwarded through other ports of the
bridge (egress) or dropped. Forwarding (egress) is defined by FDB
entries and by flooding in the event that no FDB entry exists.
In the event that flooding is disabled, only FDB entries define
the egress. Once learning is disabled, only static FDB entries
provided by a management entity define the egress. If we provide
information from these static FDBs to the ingress port X, then we'll
be able to accept all traffic that can be successfully forwarded and
drop all the other traffic sooner without spending CPU cycles to
process it.
Another way to define the above is as following equations:
ingress = egress + drop
expanding egress
ingress = static FDB + learned FDB + flooding + drop
disabling flooding and learning we a left with
ingress = static FDB + drop

By adding addresses from the static FDB entries to the MAC address
filter of an ingress port X, we fully define what the bridge can
process without dropping and can thus turn off promiscuous mode,
thus dropping packets sooner.

There have been suggestions that we may want to allow learning
and update the filters with learned addresses as well. This
would require mac-level authentication similar to 802.1x to
prevent attacks against the hw filters as they are limited
resource.

Additionally, if the user places the bridge device in promiscuous mode,
all ports are placed in promiscuous mode regardless of the changes
to flooding and learning.

Since the above functionality depends on full static configuration,
we have also require that vlan filtering be enabled to take
advantage of this. The reason is that the bridge has to be
able to receive and process VLAN-tagged frames and the there
are only 2 ways to accomplish this right now: promiscuous mode
or vlan filtering.

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f3a6ddf1 16-May-2014 Vlad Yasevich <vyasevic@redhat.com>

bridge: Introduce BR_PROMISC flag

Introduce a BR_PROMISC per-port flag that will help us track if the
current port is supposed to be in promiscuous mode or not. For now,
always start in promiscuous mode.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e028e4b8 16-May-2014 Vlad Yasevich <vyasevic@redhat.com>

bridge: Keep track of ports capable of automatic discovery.

By default, ports on the bridge are capable of automatic
discovery of nodes located behind the port. This is accomplished
via flooding of unknown traffic (BR_FLOOD) and learning the
mac addresses from these packets (BR_LEARNING).
If the above functionality is disabled by turning off these
flags, the port requires static configuration in the form
of static FDB entries to function properly.

This patch adds functionality to keep track of all ports
capable of automatic discovery. This will later be used
to control promiscuity settings.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a8779ec1 27-Mar-2014 Eric W. Biederman <ebiederm@xmission.com>

netpoll: Remove gfp parameter from __netpoll_setup

The gfp parameter was added in:
commit 47be03a28cc6c80e3aa2b3e8ed6d960ff0c5c0af
Author: Amerigo Wang <amwang@redhat.com>
Date: Fri Aug 10 01:24:37 2012 +0000

netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup()

slave_enable_netpoll() and __netpoll_setup() may be called
with read_lock() held, so should use GFP_ATOMIC to allocate
memory. Eric suggested to pass gfp flags to __netpoll_setup().

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

The reason for the gfp parameter was removed in:
commit c4cdef9b7183159c23c7302aaf270d64c549f557
Author: dingtianhong <dingtianhong@huawei.com>
Date: Tue Jul 23 15:25:27 2013 +0800

bonding: don't call slave_xxx_netpoll under spinlocks

The slave_xxx_netpoll will call synchronize_rcu_bh(),
so the function may schedule and sleep, it should't be
called under spinlocks.

bond_netpoll_setup() and bond_netpoll_cleanup() are always
protected by rtnl lock, it is no need to take the read lock,
as the slave list couldn't be changed outside rtnl lock.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Nothing else that calls __netpoll_setup or ndo_netpoll_setup
requires a gfp paramter, so remove the gfp parameter from both
of these functions making the code clearer.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a4b816d8 07-Feb-2014 Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>

bridge: Change local fdb entries whenever mac address of bridge device changes

Vlan code may need fdb change when changing mac address of bridge device
even if it is caused by the mac address changing of a bridge port.

Example configuration:
ip link set eth0 address 12:34:56:78:90:ab
ip link set eth1 address aa:bb:cc:dd:ee:ff
brctl addif br0 eth0
brctl addif br0 eth1 # br0 will have mac address 12:34:56:78:90:ab
bridge vlan add dev br0 vid 10 self
bridge vlan add dev eth0 vid 10
We will have fdb entry such that f->dst == NULL, f->vlan_id == 10 and
f->addr == 12:34:56:78:90:ab at this time.
Next, change the mac address of eth0 to greater value.
ip link set eth0 address ee:ff:12:34:56:78
Then, mac address of br0 will be recalculated and set to aa:bb:cc:dd:ee:ff.
However, an entry aa:bb:cc:dd:ee:ff will not be created and we will be not
able to communicate using br0 on vlan 10.

Address this issue by deleting and adding local entries whenever
changing the mac address of the bridge device.

If there already exists an entry that has the same address, for example,
in case that br_fdb_changeaddr() has already inserted it,
br_fdb_change_mac_address() will simply fail to insert it and no
duplicated entry will be made, as it was.

This approach also needs br_add_if() to call br_fdb_insert() before
br_stp_recalculate_bridge_id() so that we don't create an entry whose
dst == NULL in this function to preserve previous behavior.

Note that this is a slight change in behavior where the bridge device can
receive the traffic to the new address before calling
br_stp_recalculate_bridge_id() in br_add_if().
However, it is not a problem because we have already the address on the
new port and such a way to insert new one before recalculating bridge id
is taken in br_device_event() as well.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b86f81cc 10-Jan-2014 WANG Cong <xiyou.wangcong@gmail.com>

bridge: move br_net_exit() to br.c

And it can become static.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1a81a2e0 16-Dec-2013 tanxiaojun <tanxiaojun@huawei.com>

bridge: spelling fixes

Fix spelling errors in bridge driver.

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f8730420 07-Dec-2013 Ding Tianhong <dingtianhong@huawei.com>

bridge: flush br's address entry in fdb when remove the

bridge dev

When the following commands are executed:

brctl addbr br0
ifconfig br0 hw ether <addr>
rmmod bridge

The calltrace will occur:

[ 563.312114] device eth1 left promiscuous mode
[ 563.312188] br0: port 1(eth1) entered disabled state
[ 563.468190] kmem_cache_destroy bridge_fdb_cache: Slab cache still has objects
[ 563.468197] CPU: 6 PID: 6982 Comm: rmmod Tainted: G O 3.12.0-0.7-default+ #9
[ 563.468199] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 563.468200] 0000000000000880 ffff88010f111e98 ffffffff814d1c92 ffff88010f111eb8
[ 563.468204] ffffffff81148efd ffff88010f111eb8 0000000000000000 ffff88010f111ec8
[ 563.468206] ffffffffa062a270 ffff88010f111ed8 ffffffffa063ac76 ffff88010f111f78
[ 563.468209] Call Trace:
[ 563.468218] [<ffffffff814d1c92>] dump_stack+0x6a/0x78
[ 563.468234] [<ffffffff81148efd>] kmem_cache_destroy+0xfd/0x100
[ 563.468242] [<ffffffffa062a270>] br_fdb_fini+0x10/0x20 [bridge]
[ 563.468247] [<ffffffffa063ac76>] br_deinit+0x4e/0x50 [bridge]
[ 563.468254] [<ffffffff810c7dc9>] SyS_delete_module+0x199/0x2b0
[ 563.468259] [<ffffffff814e0922>] system_call_fastpath+0x16/0x1b
[ 570.377958] Bridge firewalling registered

--------------------------- cut here -------------------------------

The reason is that when the bridge dev's address is changed, the
br_fdb_change_mac_address() will add new address in fdb, but when
the bridge was removed, the address entry in the fdb did not free,
the bridge_fdb_cache still has objects when destroy the cache, Fix
this by flushing the bridge address entry when removing the bridge.

v2: according to the Toshiaki Makita and Vlad's suggestion, I only
delete the vlan0 entry, it still have a leak here if the vlan id
is other number, so I need to call fdb_delete_by_port(br, NULL, 1)
to flush all entries whose dst is NULL for the bridge.

Suggested-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Suggested-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b4e09b29 13-Nov-2013 Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>

bridge: Fix memory leak when deleting bridge with vlan filtering enabled

We currently don't call br_vlan_flush() when deleting a bridge, which
leads to memory leak if br->vlan_info is allocated.

Steps to reproduce:
while :
do
brctl addbr br0
bridge vlan add dev br0 vid 10 self
brctl delbr br0
done
We can observe the cache size of corresponding slab entry
(as kmalloc-2048 in SLUB) is increased.

kmemleak output:
unreferenced object 0xffff8800b68a7000 (size 2048):
comm "bridge", pid 2086, jiffies 4295774704 (age 47.656s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 48 9b 36 00 88 ff ff .........H.6....
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff815eb6ae>] kmemleak_alloc+0x4e/0xb0
[<ffffffff8116a1ca>] kmem_cache_alloc_trace+0xca/0x220
[<ffffffffa03eddd6>] br_vlan_add+0x66/0xe0 [bridge]
[<ffffffffa03e543c>] br_setlink+0x2dc/0x340 [bridge]
[<ffffffff8150e481>] rtnl_bridge_setlink+0x101/0x200
[<ffffffff8150d9d9>] rtnetlink_rcv_msg+0x99/0x260
[<ffffffff81528679>] netlink_rcv_skb+0xa9/0xc0
[<ffffffff8150d938>] rtnetlink_rcv+0x28/0x30
[<ffffffff81527ccd>] netlink_unicast+0xdd/0x190
[<ffffffff8152807f>] netlink_sendmsg+0x2ff/0x740
[<ffffffff814e8368>] sock_sendmsg+0x88/0xc0
[<ffffffff814e8ac8>] ___sys_sendmsg.part.14+0x298/0x2b0
[<ffffffff814e91de>] __sys_sendmsg+0x4e/0x90
[<ffffffff814e922e>] SyS_sendmsg+0xe/0x10
[<ffffffff81601669>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fd094808 26-Aug-2013 Florian Fainelli <f.fainelli@gmail.com>

bridge: inherit slave devices needed_headroom

Some slave devices may have set a dev->needed_headroom value which is
different than the default one, most likely in order to prepend a
hardware descriptor in front of the Ethernet frame to send. Whenever a
new slave is added to a bridge, ensure that we update the
needed_headroom value accordingly to account for the slave
needed_headroom value.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 93d8bf9f 24-Jul-2013 stephen hemminger <stephen@networkplumber.org>

bridge: cleanup netpoll code

This started out with fixing a sparse warning, then I realized that
the wrapper function br_netpoll_info could just be collapsed away
by rolling it into the enable code.

Also, eliminate unnecessary goto's

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 867a5943 05-Jun-2013 Vlad Yasevich <vyasevic@redhat.com>

bridge: Add a flag to control unicast packet flood.

Add a flag to control flood of unicast traffic. By default, flood is
on and the bridge will flood unicast traffic if it doesn't know
the destination. When the flag is turned off, unicast traffic
without an FDB will not be forwarded to the specified port.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9ba18891 05-Jun-2013 Vlad Yasevich <vyasevic@redhat.com>

bridge: Add flag to control mac learning.

Allow user to control whether mac learning is enabled on the port.
By default, mac learning is enabled. Disabling mac learning will
cause new dynamic FDB entries to not be created for a particular port.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8f3359bd 13-Apr-2013 stephen hemminger <stephen@networkplumber.org>

bridge: make user modified path cost sticky

Keep a STP port path cost value if it was set by a user.
Don't replace it with the link-speed based path cost
whenever the link goes down and comes back up.

Reported-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4a0b5ec1 01-Apr-2013 Eric Dumazet <edumazet@google.com>

bridge: remove a redundant synchronize_net()

commit 00cfec37484761 (net: add a synchronize_net() in
netdev_rx_handler_unregister())
allows us to remove the synchronized_net() call from del_nbp()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bc9a25d2 12-Feb-2013 Vlad Yasevich <vyasevic@redhat.com>

bridge: Add vlan support for local fdb entries

When VLAN is added to the port, a local fdb entry for that port
(the entry with the mac address of the port) is added for that
VLAN. This way we can correctly determine if the traffic
is for the bridge itself. If the address of the port changes,
we try to change all the local fdb entries we have for that port.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 407af329 12-Feb-2013 Vlad Yasevich <vyasevic@redhat.com>

bridge: Add netlink interface to configure vlans on bridge ports

Add a netlink interface to add and remove vlan configuration on bridge port.
The interface uses the RTM_SETLINK message and encodes the vlan
configuration inside the IFLA_AF_SPEC. It is possble to include multiple
vlans to either add or remove in a single message.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 243a2e63 12-Feb-2013 Vlad Yasevich <vyasevic@redhat.com>

bridge: Add vlan filtering infrastructure

Adds an optional infrustructure component to bridge that would allow
native vlan filtering in the bridge. Each bridge port (as well
as the bridge device) now get a VLAN bitmap. Each bit in the bitmap
is associated with a vlan id. This way if the bit corresponding to
the vid is set in the bitmap that the packet with vid is allowed to
enter and exit the port.

Write access the bitmap is protected by RTNL and read access
protected by RCU.

Vlan functionality is disabled by default.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 74fdd93f 03-Jan-2013 Jiri Pirko <jiri@resnulli.us>

bridge: remove usage of netdev_set_master()

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 576eb625 28-Dec-2012 stephen hemminger <shemminger@vyatta.com>

bridge: respect RFC2863 operational state

The bridge link detection should follow the operational state
of the lower device, rather than the carrier bit. This allows devices
like tunnels that are controlled by userspace control plane to work
with bridge STP link management.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Reviewed-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9b1536c4 19-Dec-2012 Gao feng <gaofeng@cn.fujitsu.com>

bridge: call br_netpoll_disable in br_add_if

When netdev_set_master faild in br_add_if, we should
call br_netpoll_disable to do some cleanup jobs,such
as free the memory of struct netpoll which allocated
in br_netpoll_enable.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d30362c0 09-Aug-2012 Amerigo Wang <amwang@redhat.com>

bridge: add some comments for NETDEV_RELEASE

Add comments on why we don't notify NETDEV_RELEASE.

Cc: David Miller <davem@davemloft.net>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 47be03a2 09-Aug-2012 Amerigo Wang <amwang@redhat.com>

netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup()

slave_enable_netpoll() and __netpoll_setup() may be called
with read_lock() held, so should use GFP_ATOMIC to allocate
memory. Eric suggested to pass gfp flags to __netpoll_setup().

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 149ddd83 25-Jun-2012 stephen hemminger <shemminger@vyatta.com>

bridge: Assign rtnl_link_ops to bridge devices created via ioctl (v2)

This ensures that bridges created with brctl(8) or ioctl(2) directly
also carry IFLA_LINKINFO when dumped over netlink. This also allows
to create a bridge with ioctl(2) and delete it with RTM_DELLINK.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c8f44aff 15-Nov-2011 Michał Mirosław <mirq-linux@rere.qmqm.pl>

net: introduce and use netdev_features_t for device features sets

v2: add couple missing conversions in drivers
split unexporting netdev_fix_features()
implemented %pNF
convert sock::sk_route_(no?)caps

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1ce5cce8 06-Oct-2011 stephen hemminger <shemminger@vyatta.com>

bridge: fix hang on removal of bridge via netlink

Need to cleanup bridge device timers and ports when being bridge
device is being removed via netlink.

This fixes the problem of observed when doing:
ip link add br0 type bridge
ip link set dev eth1 master br0
ip link set br0 up
ip link del br0

which would cause br0 to hang in unregister_netdev because
of leftover reference count.

Reported-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 77f98598 30-Sep-2011 stephen hemminger <shemminger@vyatta.com>

bridge: fix ordering of NEWLINK and NEWNEIGH events

When port is added to a bridge, the old code would send the new neighbor
netlink message before the subsequent new link message. This bug makes
it difficult to use the monitoring API in an application.

This code changes the ordering to add the forwarding entry
after the port is setup. One of the error checks (for invalid address)
is moved earlier in the process to avoid having to do unwind.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4bc71cb9 02-Sep-2011 Jiri Pirko <jpirko@redhat.com>

net: consolidate and fix ethtool_ops->get_settings calling

This patch does several things:
- introduces __ethtool_get_settings which is called from ethtool code and
from drivers as well. Put ASSERT_RTNL there.
- dev_ethtool_get_settings() is replaced by __ethtool_get_settings()
- changes calling in drivers so rtnl locking is respected. In
iboe_get_rate was previously ->get_settings() called unlocked. This
fixes it. Also prb_calc_retire_blk_tmo() in af_packet.c had the same
problem. Also fixed by calling __dev_get_by_index() instead of
dev_get_by_index() and holding rtnl_lock for both calls.
- introduces rtnl_lock in bnx2fc_vport_create() and fcoe_vport_create()
so bnx2fc_if_create() and fcoe_if_create() are called locked as they
are from other places.
- use __ethtool_get_settings() in bonding code

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

v2->v3:
-removed dev_ethtool_get_settings()
-added ASSERT_RTNL into __ethtool_get_settings()
-prb_calc_retire_blk_tmo - use __dev_get_by_index() and lock
around it and __ethtool_get_settings() call
v1->v2:
add missing export_symbol
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> [except FCoE bits]
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fa3df928 31-Aug-2011 Jiri Pirko <jpirko@redhat.com>

br: remove redundant check and init

Since these checks and initialization are done in
dev_ethtool_get_settings called later on, remove this redundancy.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 11f3a6bd 22-Aug-2011 Eric Dumazet <eric.dumazet@gmail.com>

bridge: fix a possible net_device leak

Jan Beulich reported a possible net_device leak in bridge code after
commit bb900b27a2f4 (bridge: allow creating bridge devices with netlink)

Reported-by: Jan Beulich <JBeulich@novell.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9be6dd65 05-Aug-2011 Andrei Warkentin <andreiw@motorola.com>

Bridge: Always send NETDEV_CHANGEADDR up on br MAC change.

This ensures the neighbor entries associated with the bridge
dev are flushed, also invalidating the associated cached L2 headers.

This means we br_add_if/br_del_if ports to implement hand-over and
not wind up with bridge packets going out with stale MAC.

This means we can also change MAC of port device and also not wind
up with bridge packets going out with stale MAC.

This builds on Stephen Hemminger's patch, also handling the br_del_if
case and the port MAC change case.

Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Andrei Warkentin <andreiw@motorola.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 56139fc5 22-Jul-2011 stephen hemminger <shemminger@vyatta.com>

bridge: notifier called with the wrong device

If a new device is added to a bridge, the ethernet address of the
bridge network device may change. When the address changes, the
appropriate callback is called, but with the wrong device argument.
The address of the bridge device (ie br0) changes not the address
of the device being passed to add_if (ie eth0).

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6df427fe 19-May-2011 Eric Dumazet <eric.dumazet@gmail.com>

net: remove synchronize_net() from netdev_set_master()

In the old days, we used to access dev->master in __netif_receive_skb()
in a rcu_read_lock section.

So one synchronize_net() call was needed in netdev_set_master() to make
sure another cpu could not use old master while/after we release it.

We now use netdev_rx_handler infrastructure and added one
synchronize_net() call in bond_release()/bond_release_all()

Remove the obsolete synchronize_net() from netdev_set_master() and add
one in bridge del_nbp() after its netdev_rx_handler_unregister() call.

This makes enslave -d a bit faster.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jiri Pirko <jpirko@redhat.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bb8ed630 19-May-2011 Amerigo Wang <amwang@redhat.com>

bridge: call NETDEV_JOIN notifiers when add a slave

In the previous patch I added NETDEV_JOIN, now
we can notify netconsole when adding a device to a bridge too.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Neil Horman <nhorman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 25db0338 27-Apr-2011 David Decotigny <decot@google.com>

ethtool: Use full 32 bit speed range in ethtool's set_settings

This makes sure the ethtool's set_settings() callback of network
drivers don't ignore the 16 most significant bits when ethtool calls
their set_settings().

All drivers compiled with make allyesconfig on x86_64 have been
updated.

Signed-off-by: David Decotigny <decot@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c4d27ef9 22-Apr-2011 Michał Mirosław <mirq-linux@rere.qmqm.pl>

bridge: convert br_features_recompute() to ndo_fix_features

Note: netdev_update_features() needs only rtnl_lock as br->port_list
is only changed while holding it.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bb900b27 04-Apr-2011 stephen hemminger <shemminger@vyatta.com>

bridge: allow creating bridge devices with netlink

Add netlink device ops to allow creating bridge device via netlink.
This works in a manner similar to vlan, macvlan and bonding.

Example:
# ip link add link dev br0 type bridge
# ip link del dev br0

The change required rearranging initializtion code to deal with
being called by create link. Most of the initialization happens
in br_dev_setup, but allocation of stats is done in ndo_init callback
to deal with allocation failure. Sysfs setup has to wait until
after the network device kobject is registered.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# edf947f1 24-Mar-2011 stephen hemminger <shemminger@vyatta.com>

bridge: notify applications if address of bridge device changes

The mac address of the bridge device may be changed when a new interface
is added to the bridge. If this happens, then the bridge needs to call
the network notifiers to tickle any other systems that care. Since bridge
can be a module, this also means exporting the notifier function.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# afc6151a 13-Feb-2011 Jiri Pirko <jpirko@redhat.com>

bridge: implement [add/del]_slave ops

add possibility to addif/delif via rtnetlink

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# acd1130e 24-Jan-2011 Michał Mirosław <mirq-linux@rere.qmqm.pl>

net: reduce and unify printk level in netdev_fix_features()

Reduce printk() levels to KERN_INFO in netdev_fix_features() as this will
be used by ethtool and might spam dmesg unnecessarily.

This converts the function to use netdev_info() instead of plain printk().

As a side effect, bonding and bridge devices will now log dropped features
on every slave device change.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 04ed3e74 24-Jan-2011 Michał Mirosław <mirq-linux@rere.qmqm.pl>

net: change netdev->features to u32

Quoting Ben Hutchings: we presumably won't be defining features that
can only be enabled on 64-bit architectures.

Occurences found by `grep -r` on net/, drivers/net, include/

[ Move features and vlan_features next to each other in
struct netdev, as per Eric Dumazet's suggestion -DaveM ]

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ec1e5610 14-Nov-2010 Eric Dumazet <eric.dumazet@gmail.com>

bridge: add RCU annotations to bridge port lookup

br_port_get() renamed to br_port_get_rtnl() to make clear RTNL is held.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b5ed54e9 14-Nov-2010 stephen hemminger <shemminger@vyatta.com>

bridge: fix RCU races with bridge port

The macro br_port_exists() is not enough protection when only
RCU is being used. There is a tiny race where other CPU has cleared port
handler hook, but is bridge port flag might still be set.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# aa7c6e5f 24-Aug-2010 stephen hemminger <shemminger@vyatta.com>

bridge: avoid ethtool on non running interface

If bridge port is offline, don't call ethtool to query speed.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 944c794d 24-Aug-2010 Stephen Hemminger <shemminger@linux-foundation.org>

bridge: fix locking comment

The carrier check is not called from work queue in current code.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f350a0a8 15-Jun-2010 Jiri Pirko <jpirko@redhat.com>

bridge: use rx_handler_data pointer to store net_bridge_port pointer

Register net_bridge_port pointer as rx_handler data pointer. As br_port is
removed from struct net_device, another netdev priv_flag is added to indicate
the device serves as a bridge port. Also rcuized pointers are now correctly
dereferenced in br_fdb.c and in netfilter parts.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 93e2c32b 09-Jun-2010 Jiri Pirko <jpirko@redhat.com>

net: add rx_handler data pointer

Add possibility to register rx_handler data pointer along with a rx_handler.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 91d2c34a 10-Jun-2010 Herbert Xu <herbert@gondor.apana.org.au>

bridge: Fix netpoll support

There are multiple problems with the newly added netpoll support:

1) Use-after-free on each netpoll packet.
2) Invoking unsafe code on netpoll/IRQ path.
3) Breaks when netpoll is enabled on the underlying device.

This patch fixes all of these problems. In particular, we now
allocate proper netpoll structures for each underlying device.

We only allow netpoll to be enabled on the bridge when all the
devices underneath it support netpoll. Once it is enabled, we
do not allow non-netpoll devices to join the bridge (until netpoll
is disabled again).

This allows us to do away with the npinfo juggling that caused
problem number 1.

Incidentally this patch fixes number 2 by bypassing unsafe code
such as multicast snooping and netfilter.

Reported-by: Qianfeng Zhang <frzhang@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ab95bfe0 01-Jun-2010 Jiri Pirko <jpirko@redhat.com>

net: replace hooks in __netif_receive_skb V5

What this patch does is it removes two receive frame hooks (for bridge and for
macvlan) from __netif_receive_skb. These are replaced them with a single
hook for both. It only supports one hook per device because it makes no
sense to do bridging and macvlan on the same device.

Then a network driver (of virtual netdev like macvlan or bridge) can register
an rx_handler for needed net device.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b3bcb72e 18-May-2010 Randy Dunlap <randy.dunlap@oracle.com>

bridge: fix build for CONFIG_SYSFS disabled

Fix build when CONFIG_SYSFS is not enabled:
net/bridge/br_if.c:136: error: 'struct net_bridge_port' has no member named 'sysfs_name'

Note: dev->name == sysfs_name except when change name is in
progress, and we are protected from that by RTNL mutex.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e0f43752 10-May-2010 Simon Arlott <simon@octiron.net>

bridge: update sysfs link names if port device names have changed

Links for each port are created in sysfs using the device
name, but this could be changed after being added to the
bridge.

As well as being unable to remove interfaces after this
occurs (because userspace tools don't recognise the new
name, and the kernel won't recognise the old name), adding
another interface with the old name to the bridge will
cause an error trying to create the sysfs link.

This fixes the problem by listening for NETDEV_CHANGENAME
notifications and renaming the link.

https://bugzilla.kernel.org/show_bug.cgi?id=12743

Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cfb478da 10-May-2010 stephen hemminger <shemminger@vyatta.com>

bridge: netpoll cleanup

Move code around so that the ifdef for NETPOLL_CONTROLLER don't have to
show up in main code path. The control functions should be in helpers
that are only compiled if needed.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c06ee961 06-May-2010 WANG Cong <amwang@redhat.com>

bridge: make bridge support netpoll

Based on the previous patch, make bridge support netpoll by:

1) implement the 2 methods to support netpoll for bridge;

2) modify netpoll during forwarding packets via bridge;

3) disable netpoll support of bridge when a netpoll-unabled device
is added to bridge;

4) enable netpoll support when all underlying devices support netpoll.

Cc: David Miller <davem@davemloft.net>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5a0e3ad6 24-Mar-2010 Tejun Heo <tj@kernel.org>

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>


# 14bb4789 02-Mar-2010 stephen hemminger <shemminger@vyatta.com>

bridge: per-cpu packet statistics (v3)

The shared packet statistics are a potential source of slow down
on bridged traffic. Convert to per-cpu array, but only keep those
statistics which change per-packet.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3fe2d7c7 28-Feb-2010 Herbert Xu <herbert@gondor.apana.org.au>

bridge: Add multicast start/stop hooks

This patch hooks up the bridge start/stop and add/delete/disable
port functions to the new multicast module.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 570930fe 04-Feb-2010 Herbert Xu <herbert@gondor.apana.org.au>

bridge: Remove unused age_list

This patch removes the unused age_list member from the net_bridge
structure.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2c8c1e72 16-Jan-2010 Alexey Dobriyan <adobriyan@gmail.com>

net: spread __net_init, __net_exit

__net_init/__net_exit are apparently not going away, so use them
to full extent.

In some cases __net_init was removed, because it was called from
__net_exit code.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ad4bb6f8 18-Nov-2009 Johannes Berg <johannes@sipsolutions.net>

cfg80211: disallow bridging managed/adhoc interfaces

A number of people have tried to add a wireless interface
(in managed mode) to a bridge and then complained that it
doesn't work. It cannot work, however, because in 802.11
networks all packets need to be acknowledged and as such
need to be sent to the right address. Promiscuous doesn't
help here. The wireless address format used for these
links has only space for three addresses, the
* transmitter, which must be equal to the sender (origin)
* receiver (on the wireless medium), which is the AP in
the case of managed mode
* the recipient (destination), which is on the APs local
network segment

In an IBSS, it is similar, but the receiver and recipient
must match and the third address is used as the BSSID.

To avoid such mistakes in the future, disallow adding a
wireless interface to a bridge.

Felix has recently added a four-address mode to the AP
and client side that can be used (after negotiating that
it is possible, which must happen out-of-band by setting
up both sides) for bridging, so allow that case.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>


# 1056bd51 05-Nov-2009 Stephen Hemminger <shemminger@vyatta.com>

bridge: prevent bridging wrong device

The bridge code assumes ethernet addressing, so be more strict in
the what is allowed. This showed up when GRE had a bug and was not
using correct address format.

Add some more comments for increased clarity.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8c56ba05 27-Oct-2009 Eric Dumazet <eric.dumazet@gmail.com>

bridge: Optimize multiple unregistration

Speedup module unloading by factorizing synchronize_rcu() calls

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 30df94f8 28-Sep-2009 Jeff Hansen <x@jeffhansen.com>

bridge: Fix double-free in br_add_if.

There is a potential double-kfree in net/bridge/br_if.c. If br_fdb_insert
fails, then the kobject is put back (which calls kfree due to the kobject
release), and then kfree is called again on the net_bridge_port. This
patch fixes the crash.

Thanks to Stephen Hemminger for the one-line fix.

Signed-off-by: Jeff Hansen <x@jeffhansen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 384912ed 31-Aug-2009 Marcel Holtmann <marcel@holtmann.org>

net: Add DEVTYPE support for Ethernet based devices

The Ethernet framing is used for a lot of devices these days. Most
prominent are WiFi and WiMAX based devices. However for userspace
application it is important to classify these devices correctly and
not only see them as Ethernet devices. The daemons like HAL, DeviceKit
or even NetworkManager with udev support tries to do the classification
in userspace with a lot trickery and extra system calls. This is not
good and actually reaches its limitations. Especially since the kernel
does know the type of the Ethernet device it is pretty stupid.

To solve this problem the underlying device type needs to be set and
then the value will be exported as DEVTYPE via uevents and available
within udev.

# cat /sys/class/net/wlan0/uevent
DEVTYPE=wlan
INTERFACE=wlan0
IFINDEX=5

This is similar to subsystems like USB and SCSI that distinguish
between hosts, devices, disks, partitions etc.

The new SET_NETDEV_DEVTYPE() is a convenience helper to set the actual
device type. All device types are free form, but for convenience the
same strings as used with RFKILL are choosen.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3982d3d2 13-Aug-2009 Fischer, Anna <anna.fischer@hp.com>

net/bridge: Add 'hairpin' port forwarding mode

This patch adds a 'hairpin' (also called 'reflective relay') mode
port configuration to the Linux Ethernet bridge kernel module.
A bridge supporting hairpin forwarding mode can send frames back
out through the port the frame was received on.

Hairpin mode is required to support basic VEPA (Virtual
Ethernet Port Aggregator) capabilities.

You can find additional information on VEPA here:
http://tech.groups.yahoo.com/group/evb/
http://www.ieee802.org/1/files/public/docs2009/new-hudson-vepa_seminar-20090514d.pdf
http://www.internet2.edu/presentations/jt2009jul/20090719-congdon.pdf

An additional patch 'bridge-utils: Add 'hairpin' port forwarding mode'
is provided to allow configuring hairpin mode from userspace tools.

Signed-off-by: Paul Congdon <paul.congdon@hp.com>
Signed-off-by: Anna Fischer <anna.fischer@hp.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c587aea9 23-Jul-2009 Xiaotian Feng <dfeng@redhat.com>

net/bridge: use kobject_put to release kobject in br_add_if error path

kobject_init_and_add will alloc memory for kobj->name, so in br_add_if
error path, simply use kobject_del will not free memory for kobj->name.
Fix by using kobject_put instead, kobject_put will internally calls
kobject_del and frees memory for kobj->name.

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cda6d377 25-Mar-2009 Stephen Hemminger <shemminger@vyatta.com>

bridge: bad error handling when adding invalid ether address

This fixes an crash when empty bond device is added to a bridge.
If an interface with invalid ethernet address (all zero) is added
to a bridge, then bridge code detects it when setting up the forward
databas entry. But the error unwind is broken, the bridge port object
can get freed twice: once when ref count went to zeo, and once by kfree.
Since object is never really accessible, just free it.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 00829823 20-Nov-2008 Stephen Hemminger <shemminger@vyatta.com>

netdev: add more functions to netdevice ops

This patch moves neigh_setup and hard_start_xmit into the network device ops
structure. For bisection, fix all the previously converted drivers as well.
Bonding driver took the biggest hit on this.

Added a prefetch of the hard_start_xmit in the fast path to try and reduce
any impact this would have.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 524ad0a7 13-Nov-2008 Wang Chen <wangchen@cn.fujitsu.com>

netdevice: safe convert to netdev_priv() #part-4

We have some reasons to kill netdev->priv:
1. netdev->priv is equal to netdev_priv().
2. netdev_priv() wraps the calculation of netdev->priv's offset, obviously
netdev_priv() is more flexible than netdev->priv.
But we cann't kill netdev->priv, because so many drivers reference to it
directly.

This patch is a safe convert for netdev->priv to netdev_priv(netdev).
Since all of the netdev->priv is only for read.
But it is too big to be sent in one mail.
I split it to 4 parts and make every part smaller than 100,000 bytes,
which is max size allowed by vger.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b63365a2 23-Oct-2008 Herbert Xu <herbert@gondor.apana.org.au>

net: Fix disjunct computation of netdev features

My change

commit e2a6b85247aacc52d6ba0d9b37a99b8d1a3e0d83
net: Enable TSO if supported by at least one device

didn't do what was intended because the netdev_compute_features
function was designed for conjunctions. So what happened was that
it would simply take the TSO status of the last constituent device.

This patch extends it to support both conjunctions and disjunctions
under the new name of netdev_increment_features.

It also adds a new function netdev_fix_features which does the
sanity checking that usually occurs upon registration. This ensures
that the computation doesn't result in an illegal combination
since this checking is absent when the change is initiated via
ethtool.

The two users of netdev_compute_features have been converted.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 712d6954 08-Sep-2008 Alexey Dobriyan <adobriyan@gmail.com>

netns bridge: cleanup bridges during netns stop

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Stephen Hemminger <shemming@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4aa678ba 08-Sep-2008 Alexey Dobriyan <adobriyan@gmail.com>

netns bridge: allow bridges in netns!

Bridge as netdevice doesn't cross netns boundaries.

Bridge ports and bridge itself live in same netns.

Notifiers are fixed.

netns propagated from userspace socket for setup and teardown.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Stephen Hemminger <shemming@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4adf0af6 30-Jul-2008 Simon Wunderlich <siwu@hrz.tu-chemnitz.de>

bridge: send correct MTU value in PMTU (revised)

When bridging interfaces with different MTUs, the bridge correctly chooses
the minimum of the MTUs of the physical devices as the bridges MTU. But
when a frame is passed which fits through the incoming, but not through
the outgoing interface, a "Fragmentation Needed" packet is generated.

However, the propagated MTU is hardcoded to 1500, which is wrong in this
situation. The sender will repeat the packet again with the same frame
size, and the same problem will occur again.

Instead of sending 1500, the (correct) MTU value of the bridge is now sent
via PMTU. To achieve this, the corresponding rtable structure is stored
in its net_bridge structure.

Modified to get rid of fake_net_device as well.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bc3f9076 14-Jul-2008 Wang Chen <wangchen@cn.fujitsu.com>

bridge: Check return of dev_set_promiscuity

dev_set_promiscuity/allmulti might overflow.
Commit: "netdevice: Fix promiscuity and allmulti overflow" in net-next makes
dev_set_promiscuity/allmulti return error number if overflow happened.

Here, we check the positive increment for promiscuity to get error return.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ab1b2046 03-Jul-2008 Patrick McHardy <kaber@trash.net>

bridge: fix use-after-free in br_cleanup_bridges()

Unregistering a bridge device may cause virtual devices stacked on the
bridge, like vlan or macvlan devices, to be unregistered as well.
br_cleanup_bridges() uses for_each_netdev_safe() to iterate over all
devices during cleanup. This is not enough however, if one of the
additionally unregistered devices is next in the list to the bridge
device, it will get freed as well and the iteration continues on
the freed element.

Restart iteration after each bridge device removal from the beginning to
fix this, similar to what rtnl_link_unregister() does.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0187bdfb 19-Jun-2008 Ben Hutchings <bhutchings@solarflare.com>

net: Disable LRO on devices that are forwarding

Large Receive Offload (LRO) is only appropriate for packets that are
destined for the host, and should be disabled if received packets may be
forwarded. It can also confuse the GSO on output.

Add dev_disable_lro() function which uses the appropriate ethtool ops to
disable LRO if enabled.

Add calls to dev_disable_lro() in br_add_if() and functions that enable
IPv4 and IPv6 forwarding.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0b040829 10-Jun-2008 Adrian Bunk <bunk@kernel.org>

net: remove CVS keywords

This patch removes CVS keywords that weren't updated for a long time
from comments.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e340a90e 04-May-2008 Pavel Emelyanov <xemul@openvz.org>

bridge: Consolidate error paths in br_add_bridge().

This actually had to be merged with the patch #1, but I decided not to
mix two changes in one patch.

There are already two calls to free_netdev() in there, so merge them
into one.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c37aa90b 04-May-2008 Pavel Emelyanov <xemul@openvz.org>

bridge: Net device leak in br_add_bridge().

In case the register_netdevice() call fails the device is leaked,
since the out: label is just rtnl_unlock()+return.

Free the device.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 43af8532 29-Apr-2008 Volodymyr G Lukiianyk <volodymyrgl@gmail.com>

bridge: fix error handling in br_add_if()

When device is added to bridge its refcnt is incremented (in new_nbp()), but if
error occurs during further br_add_if() operations this counter is not
decremented back. Fix it by adding dev_put() call in the error path.

Signed-off-by: Volodymyr G Lukiianyk <volodymyrgl@gmail.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e32cc736 17-Dec-2007 Greg Kroah-Hartman <gregkh@suse.de>

Kobject: convert net/bridge/br_if.c to use kobject_init/add_ng()

This converts the code to use the new kobject functions, cleaning up the
logic in doing so.

Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 43b98c4a 17-Dec-2007 Greg Kroah-Hartman <gregkh@suse.de>

Kobject: change net/bridge to use kobject_create_and_add

The kobject in the bridge code is only used for registering with sysfs,
not for any lifespan rules. This patch changes it to be only a pointer
and use the simpler api for this kind of thing.

Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 881d966b 17-Sep-2007 Eric W. Biederman <ebiederm@xmission.com>

[NET]: Make the device list and device lookups per namespace.

This patch makes most of the generic device layer network
namespace safe. This patch makes dev_base_head a
network namespace variable, and then it picks up
a few associated variables. The functions:
dev_getbyhwaddr
dev_getfirsthwbytype
dev_get_by_flags
dev_get_by_name
__dev_get_by_name
dev_get_by_index
__dev_get_by_index
dev_ioctl
dev_ethtool
dev_load
wireless_process_ioctl

were modified to take a network namespace argument, and
deal with it.

vlan_ioctl_set and brioctl_set were modified so their
hooks will receive a network namespace argument.

So basically anthing in the core of the network stack that was
affected to by the change of dev_base was modified to handle
multiple network namespaces. The rest of the network stack was
simply modified to explicitly use &init_net the initial network
namespace. This can be fixed when those components of the network
stack are modified to handle multiple network namespaces.

For now the ifindex generator is left global.

Fundametally ifindex numbers are per namespace, or else
we will have corner case problems with migration when
we get that far.

At the same time there are assumptions in the network stack
that the ifindex of a network device won't change. Making
the ifindex number global seems a good compromise until
the network stack can cope with ifindex changes when
you change namespaces, and the like.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b4a488d1 30-Aug-2007 Stephen Hemminger <shemminger@linux-foundation.org>

[BRIDGE]: Fix OOPS when bridging device without ethtool.

Bridge code calls ethtool to get speed. The conversion to using
only ethtool_ops broke the case of devices without ethtool_ops.
This is a new regression in 2.6.23.

Rearranged the switch to a logical order, and use gcc initializer.

Ps: speed should have been part of the network device structure from
the start rather than burying it in ethtool.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Acked-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7f353bf2 10-Aug-2007 Herbert Xu <herbert@gondor.apana.org.au>

[NET]: Share correct feature code between bridging and bonding

http://bugzilla.kernel.org/show_bug.cgi?id=8797 shows that the
bonding driver may produce bogus combinations of the checksum
flags and SG/TSO.

For example, if you bond devices with NETIF_F_HW_CSUM and
NETIF_F_IP_CSUM you'll end up with a bonding device that
has neither flag set. If both have TSO then this produces
an illegal combination.

The bridge device on the other hand has the correct code to
deal with this.

In fact, the same code can be used for both. So this patch
moves that logic into net/core/dev.c and uses it for both
bonding and bridging.

In the process I've made small adjustments such as only
setting GSO_ROBUST if at least one constituent device
supports it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 61a44b9c 31-Jul-2007 Matthew Wilcox <willy@infradead.org>

[NET]: ethtool ops are the only way

During the transition to the ethtool_ops way of doing things, we supported
calling the device's ->do_ioctl method to allow unconverted drivers to
continue working. Those days are long behind us, all in-tree drivers
use the ethtool_ops way, and so we no longer need to support this.

The bonding driver is the biggest beneficiary of this; it no longer
needs to call ioctl() as a fallback if ethtool_ops aren't supported.

Also put a proper copyright statement on ethtool.c.

Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d212f87b 27-Jun-2007 Stephen Hemminger <shemminger@linux-foundation.org>

[NET]: IPV6 checksum offloading in network devices

The existing model for checksum offload does not correctly handle
devices that can offload IPV4 and IPV6 only. The NETIF_F_HW_CSUM flag
implies device can do any arbitrary protocol.

This patch:
* adds NETIF_F_IPV6_CSUM for those devices
* fixes bnx2 and tg3 devices that need it
* add NETIF_F_IPV6_CSUM to ipv6 output (incl GSO)
* fixes assumptions about NETIF_F_ALL_CSUM in nat
* adjusts bridge union of checksumming computation

Signed-off-by: David S. Miller <davem@davemloft.net>


# 7562f876 03-May-2007 Pavel Emelianov <xemul@openvz.org>

[NET]: Rework dev_base via list_head (v3)

Cleanup of dev_base list use, with the aim to simplify making device
list per-namespace. In almost every occasion, use of dev_base variable
and dev->next pointer could be easily replaced by for_each_netdev
loop. A few most complicated places were converted to using
first_netdev()/next_netdev().

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b86c4503 22-Mar-2007 Stephen Hemminger <shemminger@linux-foundation.org>

bridge: change when netlink events go to STP

Need to tell STP daemon about more events, like any time a
device is added even when it is down.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>


# 9cde0708 21-Mar-2007 Stephen Hemminger <shemminger@linux-foundation.org>

bridge: add support for user mode STP

This patchset based on work by Aji_Srinivas@emc.com provides allows
spanning tree to be controled from userspace. Like hotplug, it
uses call_usermodehelper when spanning tree is enabled so there
is no visible API change. If call to start usermode STP fails
it falls back to existing kernel STP.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>


# de79059e 07-Mar-2007 Aji Srinivas <emc.com>

[BRIDGE]: adding new device to bridge should enable if up

One change introduced by the workqueue removal patch is that adding an
interface that is up to a bridge which is also up does not ever call
br_stp_enable_port(), leaving the port in DISABLED state until we do
ifconfig down and up or link events occur.

The following patch to the br_add_if function fixes it.
This is a regression introduced in 2.6.21.

Submitted-by: Aji_Srinivas@emc.com
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 269def7c 22-Feb-2007 Stephen Hemminger <shemminger@linux-foundation.org>

[BRIDGE]: eliminate workqueue for carrier check

Having a work queue for checking carrier leads to lots of race issues.
Simpler to just get the cost when data structure is created and
update on change.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a10d567c 13-Feb-2007 Jarek Poplawski <jarkao2@o2.pl>

[BRIDGE] br_if: Fix oops in port_carrier_check

Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
Acked-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9d6f229f 09-Feb-2007 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[NET] BRIDGE: Fix whitespace errors.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 43cb76d9 09-Apr-2002 Greg Kroah-Hartman <gregkh@suse.de>

Network: convert network devices to use struct device instead of class_device

This lets the network core have the ability to handle suspend/resume
issues, if it wants to.

Thanks to Frederik Deweerdt <frederik.deweerdt@gmail.com> for the arm
driver fixes.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# c4028958 22-Nov-2006 David Howells <dhowells@redhat.com>

WorkStruct: make allyesconfig

Fix up for make allyesconfig.

Signed-Off-By: David Howells <dhowells@redhat.com>


# 1a620698 12-Oct-2006 Stephen Hemminger <shemminger@osdl.org>

[BRIDGE]: flush forwarding table when device carrier off

Flush the forwarding table when carrier is lost. This helps for
availability because we don't want to forward to a downed device and
new packets may come in on other links.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 78eb8877 17-Aug-2006 Herbert Xu <herbert@gondor.apana.org.au>

[BRIDGE]: Disable SG/GSO if TX checksum is off

When the bridge recomputes features, it does not maintain the
constraint that SG/GSO must be off if TX checksum is off.
This patch adds that constraint.

On a completely unrelated note, I've also added TSO6 and TSO_ECN
feature bits if GSO is enabled on the underlying device through
the new NETIF_F_GSO_SOFTWARE macro.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 576a30eb 27-Jun-2006 Herbert Xu <herbert@gondor.apana.org.au>

[NET]: Added GSO header verification

When GSO packets come from an untrusted source (e.g., a Xen guest domain),
we need to verify the header integrity before passing it to the hardware.

Since the first step in GSO is to verify the header, we can reuse that
code by adding a new bit to gso_type: SKB_GSO_DODGY. Packets with this
bit set can only be fed directly to devices with the corresponding bit
NETIF_F_GSO_ROBUST. If the device doesn't have that bit, then the skb
is fed to the GSO engine which will allow the packet to be sent to the
hardware if it passes the header check.

This patch changes the sg flag to a full features flag. The same method
can be used to implement TSO ECN support. We simply have to mark packets
with CWR set with SKB_GSO_ECN so that only hardware with a corresponding
NETIF_F_TSO_ECN can accept them. The GSO engine can either fully segment
the packet, or segment the first MTU and pass the rest to the hardware for
further segmentation.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 37c3185a 22-Jun-2006 Herbert Xu <herbert@gondor.apana.org.au>

[NET]: Added GSO toggle

This patch adds a generic segmentation offload toggle that can be turned
on/off for each net device. For now it only supports in TCPv4.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2c6cc0d8 17-Jun-2006 Herbert Xu <herbert@gondor.apana.org.au>

[BRIDGE]: Add support for NETIF_F_HW_CSUM devices

As it is the bridge will only ever declare NETIF_F_IP_CSUM even if all
its constituent devices support NETIF_F_HW_CSUM. This patch fixes
this by supporting the first one out of NETIF_F_NO_CSUM,
NETIF_F_HW_CSUM, and NETIF_F_IP_CSUM that is supported by all
constituent devices.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8648b305 17-Jun-2006 Herbert Xu <herbert@gondor.apana.org.au>

[NET]: Add NETIF_F_GEN_CSUM and NETIF_F_ALL_CSUM

The current stack treats NETIF_F_HW_CSUM and NETIF_F_NO_CSUM
identically so we test for them in quite a few places. For the sake
of brevity, I'm adding the macro NETIF_F_GEN_CSUM for these two. We
also test the disjunct of NETIF_F_IP_CSUM and the other two in various
places, for that purpose I've added NETIF_F_ALL_CSUM.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 36485707 05-Jun-2006 Jiri Benc <jbenc@suse.cz>

[BRIDGE]: fix locking and memory leak in br_add_bridge

There are several bugs in error handling in br_add_bridge:
- when dev_alloc_name fails, allocated net_device is not freed
- unregister_netdev is called when rtnl lock is held
- free_netdev is called before netdev_run_todo has a chance to be run after
unregistering net_device

Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ac05202e 10-May-2006 Stephen Hemminger <shemminger@osdl.org>

[BRIDGE]: Do sysfs registration inside rtnl.

Now that netdevice sysfs registration is done as part of
register_netdevice; bridge code no longer has to be tricky when adding
it's kobjects to bridges.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cee48541 20-Mar-2006 Stephen Hemminger <shemminger@osdl.org>

[BRIDGE]: use kzalloc

Use kzalloc versus kmalloc+memset. Also don't need to do
memset() of bridge address since it is in netdev private data
that is already zero'd in alloc_netdev.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3b781fa1 20-Mar-2006 Stephen Hemminger <shemminger@osdl.org>

[BRIDGE]: use kcalloc

Use kcalloc rather than kmalloc + memset.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 125a12cc 03-Mar-2006 Stephen Hemminger <shemminger@osdl.org>

[BRIDGE]: generate kobject remove event

The earlier round of kobject/sysfs changes to bridge caused
it not to generate a uevent on removal. Don't think any application
cares (not sure about Xen) but since it generates add uevent
it should generate remove as well.

Signed-off-by: Stephen Hemminger <shemmigner@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d32439c0 03-Mar-2006 Stephen Hemminger <shemminger@osdl.org>

[BRIDGE]: port timer initialization

Initialize the STP timers for a port when it is created,
rather than when it is enabled. This will prevent future race conditions
where timer gets started before port is enabled.

Signed-off-by: Stephen Hemminger <shemmigner@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6e86b890 03-Mar-2006 Stephen Hemminger <shemminger@osdl.org>

[BRIDGE]: fix crash in STP

Bridge would crash because of uninitailized timer if STP is used and
device was inserted into a bridge before bridge was up. This got
introduced when the delayed port checking was added. Fix is to not
enable STP on port unless bridge is up.

Bugzilla: http://bugzilla.kernel.org/show_bug.cgi?id=6140
Dup: http://bugzilla.kernel.org/show_bug.cgi?id=6156

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bab1deea 09-Feb-2006 Stephen Hemminger <shemminger@osdl.org>

[BRIDGE]: fix error handling for add interface to bridge

Refactor how the bridge code interacts with kobject system.
It should still use kobjects even if not using sysfs.
Fix the error unwind handling in br_add_if.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b3f1be4b 09-Feb-2006 Stephen Hemminger <shemminger@osdl.org>

[BRIDGE]: fix for RCU and deadlock on device removal

Change Bridge receive path to correctly handle RCU removal of device
from bridge. Also fixes deadlock between carrier_check and del_nbp.
This replaces the previous deleted flag fix.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3f4cfc2d 31-Jan-2006 Stephen Hemminger <shemminger@osdl.org>

[BRIDGE]: Fix device delete race.

This is a simpler fix for the two races in bridge device removal.
The Xen race of delif and notify is managed now by a new deleted flag.
No need for barriers or other locking because of rtnl mutex.

The del_timer_sync()'s are unnecessary, because br_stp_disable_port
delete's the timers, and they will finish running before RCU callback.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 46f25dff 05-Jan-2006 Kris Katterjohn <kjak@users.sourceforge.net>

[NET]: Change 1500 to ETH_DATA_LEN in some files

These patches add the header linux/if_ether.h and change 1500 to
ETH_DATA_LEN in some files.

Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# edb5e46f 21-Dec-2005 Stephen Hemminger <shemminger@osdl.org>

[BRIDGE]: limited ethtool support

Add limited ethtool support to bridge to allow disabling
features.

Note: if underlying device does not support a feature (like checksum
offload), then the bridge device won't inherit it.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4433f420 20-Dec-2005 Stephen Hemminger <shemminger@osdl.org>

[BRIDGE]: handle speed detection after carrier changes

Speed of a interface may not be available until carrier
is detected in the case of autonegotiation. To get the correct value
we need to recheck speed after carrier event. But the check needs to
be done in a context that is similar to normal ethtool interface (can sleep).

Also, delay check for 1ms to try avoid any carrier bounce transitions.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 133747e8 23-Nov-2005 Olaf Rempel <razzor@kopf-tisch.de>

[BRIDGE]: recompute features when adding a new device

We must recompute bridge features everytime the list of underlying
devices changes, or we might end up with features that are not
supported by all devices (eg. NETIF_F_TSO)
This patch adds the missing recompute when adding a device to the bridge.

Signed-off-by: Olaf Rempel <razzor@kopf-tisch.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ab4060e8 12-Oct-2005 Stephen Hemminger <shemminger@osdl.org>

[BRIDGE]: fix race on bridge del if

This fixes the RCU race on bridge delete interface. Basically,
the network device has to be detached from the bridge in the first
step (pre-RCU), rather than later. At that point, no more bridge traffic
will come in, and the other code will not think that network device
is part of a bridge.

This should also fix the XEN test problems.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 81d35307 29-May-2005 Stephen Hemminger <shemminger@osdl.org>

[BRIDGE]: set features based on enslaved devices

Make features of the bridge pseudo-device be a subset of the underlying
devices. Motivated by Xen and others who use bridging to do failover.

Signed-off-by: Catalin BOIE <catab at umrella.ro>
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1da177e4 16-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org>

Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!