#
751de201 |
|
09-Apr-2024 |
Pablo Neira Ayuso <pablo@netfilter.org> |
netfilter: br_netfilter: skip conntrack input hook for promisc packets For historical reasons, when bridge device is in promisc mode, packets that are directed to the taps follow bridge input hook path. This patch adds a workaround to reset conntrack for these packets. Jianbo Liu reports warning splats in their test infrastructure where cloned packets reach the br_netfilter input hook to confirm the conntrack object. Scratch one bit from BR_INPUT_SKB_CB to annotate that this packet has reached the input hook because it is passed up to the bridge device to reach the taps. [ 57.571874] WARNING: CPU: 1 PID: 0 at net/bridge/br_netfilter_hooks.c:616 br_nf_local_in+0x157/0x180 [br_netfilter] [ 57.572749] Modules linked in: xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_isc si ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5ctl mlx5_core [ 57.575158] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0+ #19 [ 57.575700] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 57.576662] RIP: 0010:br_nf_local_in+0x157/0x180 [br_netfilter] [ 57.577195] Code: fe ff ff 41 bd 04 00 00 00 be 04 00 00 00 e9 4a ff ff ff be 04 00 00 00 48 89 ef e8 f3 a9 3c e1 66 83 ad b4 00 00 00 04 eb 91 <0f> 0b e9 f1 fe ff ff 0f 0b e9 df fe ff ff 48 89 df e8 b3 53 47 e1 [ 57.578722] RSP: 0018:ffff88885f845a08 EFLAGS: 00010202 [ 57.579207] RAX: 0000000000000002 RBX: ffff88812dfe8000 RCX: 0000000000000000 [ 57.579830] RDX: ffff88885f845a60 RSI: ffff8881022dc300 RDI: 0000000000000000 [ 57.580454] RBP: ffff88885f845a60 R08: 0000000000000001 R09: 0000000000000003 [ 57.581076] R10: 00000000ffff1300 R11: 0000000000000002 R12: 0000000000000000 [ 57.581695] R13: ffff8881047ffe00 R14: ffff888108dbee00 R15: ffff88814519b800 [ 57.582313] FS: 0000000000000000(0000) GS:ffff88885f840000(0000) knlGS:0000000000000000 [ 57.583040] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 57.583564] CR2: 000000c4206aa000 CR3: 0000000103847001 CR4: 0000000000370eb0 [ 57.584194] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 57.584820] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 57.585440] Call Trace: [ 57.585721] <IRQ> [ 57.585976] ? __warn+0x7d/0x130 [ 57.586323] ? br_nf_local_in+0x157/0x180 [br_netfilter] [ 57.586811] ? report_bug+0xf1/0x1c0 [ 57.587177] ? handle_bug+0x3f/0x70 [ 57.587539] ? exc_invalid_op+0x13/0x60 [ 57.587929] ? asm_exc_invalid_op+0x16/0x20 [ 57.588336] ? br_nf_local_in+0x157/0x180 [br_netfilter] [ 57.588825] nf_hook_slow+0x3d/0xd0 [ 57.589188] ? br_handle_vlan+0x4b/0x110 [ 57.589579] br_pass_frame_up+0xfc/0x150 [ 57.589970] ? br_port_flags_change+0x40/0x40 [ 57.590396] br_handle_frame_finish+0x346/0x5e0 [ 57.590837] ? ipt_do_table+0x32e/0x430 [ 57.591221] ? br_handle_local_finish+0x20/0x20 [ 57.591656] br_nf_hook_thresh+0x4b/0xf0 [br_netfilter] [ 57.592286] ? br_handle_local_finish+0x20/0x20 [ 57.592802] br_nf_pre_routing_finish+0x178/0x480 [br_netfilter] [ 57.593348] ? br_handle_local_finish+0x20/0x20 [ 57.593782] ? nf_nat_ipv4_pre_routing+0x25/0x60 [nf_nat] [ 57.594279] br_nf_pre_routing+0x24c/0x550 [br_netfilter] [ 57.594780] ? br_nf_hook_thresh+0xf0/0xf0 [br_netfilter] [ 57.595280] br_handle_frame+0x1f3/0x3d0 [ 57.595676] ? br_handle_local_finish+0x20/0x20 [ 57.596118] ? br_handle_frame_finish+0x5e0/0x5e0 [ 57.596566] __netif_receive_skb_core+0x25b/0xfc0 [ 57.597017] ? __napi_build_skb+0x37/0x40 [ 57.597418] __netif_receive_skb_list_core+0xfb/0x220 Fixes: 62e7151ae3eb ("netfilter: bridge: confirm multicast packets before passing them up the stack") Reported-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
#
f5c3eb4b |
|
27-Jan-2024 |
Linus Lüssing <linus.luessing@c0d3.blue> |
bridge: mcast: fix disabled snooping after long uptime The original idea of the delay_time check was to not apply multicast snooping too early when an MLD querier appears. And to instead wait at least for MLD reports to arrive before switching from flooding to group based, MLD snooped forwarding, to avoid temporary packet loss. However in a batman-adv mesh network it was noticed that after 248 days of uptime 32bit MIPS based devices would start to signal that they had stopped applying multicast snooping due to missing queriers - even though they were the elected querier and still sending MLD queries themselves. While time_is_before_jiffies() generally is safe against jiffies wrap-arounds, like the code comments in jiffies.h explain, it won't be able to track a difference larger than ULONG_MAX/2. With a 32bit large jiffies and one jiffies tick every 10ms (CONFIG_HZ=100) on these MIPS devices running OpenWrt this would result in a difference larger than ULONG_MAX/2 after 248 (= 2^32/100/60/60/24/2) days and time_is_before_jiffies() would then start to return false instead of true. Leading to multicast snooping not being applied to multicast packets anymore. Fix this issue by using a proper timer_list object which won't have this ULONG_MAX/2 difference limitation. Fixes: b00589af3b04 ("bridge: disable snooping if there is no querier") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20240127175033.9640-1-linus.luessing@c0d3.blue Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
a6acb535 |
|
17-Dec-2023 |
Ido Schimmel <idosch@nvidia.com> |
bridge: mdb: Add MDB bulk deletion support Implement MDB bulk deletion support in the bridge driver, allowing MDB entries to be deleted in bulk according to provided parameters. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bcc1f84e |
|
01-Dec-2023 |
Hangbin Liu <liuhangbin@gmail.com> |
docs: bridge: Add kAPI/uAPI fields Add kAPI/uAPI field for bridge doc. Update struct net_bridge_vlan comments to fix doc build warning. Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
68b380a3 |
|
25-Oct-2023 |
Ido Schimmel <idosch@nvidia.com> |
bridge: mcast: Add MDB get support Implement support for MDB get operation by looking up a matching MDB entry, allocating the skb according to the entry's size and then filling in the response. The operation is performed under the bridge multicast lock to ensure that the entry does not change between the time the reply size is determined and when the reply is filled in. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6d0259dd |
|
25-Oct-2023 |
Ido Schimmel <idosch@nvidia.com> |
bridge: mcast: Rename MDB entry get function The current name is going to conflict with the upcoming net device operation for the MDB get operation. Rename the function to br_mdb_entry_skb_get(). No functional changes intended. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bdb4dfda |
|
16-Oct-2023 |
Johannes Nixdorf <jnixdorf-oss@avm.de> |
net: bridge: Track and limit dynamically learned FDB entries A malicious actor behind one bridge port may spam the kernel with packets with a random source MAC address, each of which will create an FDB entry, each of which is a dynamic allocation in the kernel. There are roughly 2^48 different MAC addresses, further limited by the rhashtable they are stored in to 2^31. Each entry is of the type struct net_bridge_fdb_entry, which is currently 128 bytes big. This means the maximum amount of memory allocated for FDB entries is 2^31 * 128B = 256GiB, which is too much for most computers. Mitigate this by maintaining a per bridge count of those automatically generated entries in fdb_n_learned, and a limit in fdb_max_learned. If the limit is hit new entries are not learned anymore. For backwards compatibility the default setting of 0 disables the limit. User-added entries by netlink or from bridge or bridge port addresses are never blocked and do not count towards that limit. Introduce a new fdb entry flag BR_FDB_DYNAMIC_LEARNED to keep track of whether an FDB entry is included in the count. The flag is enabled for dynamically learned entries, and disabled for all other entries. This should be equivalent to BR_FDB_ADDED_BY_USER and BR_FDB_LOCAL being unset, but contrary to the two flags it can be toggled atomically. Atomicity is required here, as there are multiple callers that modify the flags, but are not under a common lock (br_fdb_update is the exception for br->hash_lock, br_fdb_external_learn_add for RTNL). Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Johannes Nixdorf <jnixdorf-oss@avm.de> Link: https://lore.kernel.org/r/20231016-fdb_limit-v5-2-32cddff87758@avm.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
38985e8c |
|
09-Oct-2023 |
Amit Cohen <amcohen@nvidia.com> |
net: Handle bulk delete policy in bridge driver The merge commit 92716869375b ("Merge branch 'br-flush-filtering'") added support for FDB flushing in bridge driver. The following patches will extend VXLAN driver to support FDB flushing as well. The netlink message for bulk delete is shared between the drivers. With the existing implementation, there is no way to prevent user from flushing with attributes that are not supported per driver. For example, when VNI will be added, user will not get an error for flush FDB entries in bridge with VNI, although this attribute is not relevant for bridge. As preparation for support of FDB flush in VXLAN driver, move the policy to be handled in bridge driver, later a new policy for VXLAN will be added in VXLAN driver. Do not pass 'vid' as part of ndo_fdb_del_bulk(), as this field is relevant only for bridge. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4d66f235 |
|
26-Jul-2023 |
YueHaibing <yuehaibing@huawei.com> |
bridge: Remove unused declaration br_multicast_set_hash_max() Since commit 19e3a9c90c53 ("net: bridge: convert multicast to generic rhashtable") this is not used, so can remove it. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20230726143141.11704-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
f2e2857b |
|
19-Jul-2023 |
Petr Machata <petrm@nvidia.com> |
net: switchdev: Add a helper to replay objects on a bridge port When a front panel joins a bridge via another netdevice (typically a LAG), the driver needs to learn about the objects configured on the bridge port. When the bridge port is offloaded by the driver for the first time, this can be achieved by passing a notifier to switchdev_bridge_port_offload(). The notifier is then invoked for the individual objects (such as VLANs) configured on the bridge, and can look for the interesting ones. Calling switchdev_bridge_port_offload() when the second port joins the bridge lower is unnecessary, but the replay is still needed. To that end, add a new function, switchdev_bridge_port_replay(), which does only the replay part of the _offload() function in exactly the same way as that function. Cc: Jiri Pirko <jiri@resnulli.us> Cc: Ivan Vecera <ivecera@redhat.com> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: bridge@lists.linux-foundation.org Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
29cfb2aa |
|
17-Jul-2023 |
Ido Schimmel <idosch@nvidia.com> |
bridge: Add backup nexthop ID support Add a new bridge port attribute that allows attaching a nexthop object ID to an skb that is redirected to a backup bridge port with VLAN tunneling enabled. Specifically, when redirecting a known unicast packet, read the backup nexthop ID from the bridge port that lost its carrier and set it in the bridge control block of the skb before forwarding it via the backup port. Note that reading the ID from the bridge port should not result in a cache miss as the ID is added next to the 'backup_port' field that was already accessed. After this change, the 'state' field still stays on the first cache line, together with other data path related fields such as 'flags and 'vlgrp': struct net_bridge_port { struct net_bridge * br; /* 0 8 */ struct net_device * dev; /* 8 8 */ netdevice_tracker dev_tracker; /* 16 0 */ struct list_head list; /* 16 16 */ long unsigned int flags; /* 32 8 */ struct net_bridge_vlan_group * vlgrp; /* 40 8 */ struct net_bridge_port * backup_port; /* 48 8 */ u32 backup_nhid; /* 56 4 */ u8 priority; /* 60 1 */ u8 state; /* 61 1 */ u16 port_no; /* 62 2 */ /* --- cacheline 1 boundary (64 bytes) --- */ [...] } __attribute__((__aligned__(8))); When forwarding an skb via a bridge port that has VLAN tunneling enabled, check if the backup nexthop ID stored in the bridge control block is valid (i.e., not zero). If so, instead of attaching the pre-allocated metadata (that only has the tunnel key set), allocate a new metadata, set both the tunnel key and the nexthop object ID and attach it to the skb. By default, do not dump the new attribute to user space as a value of zero is an invalid nexthop object ID. The above is useful for EVPN multihoming. When one of the links composing an Ethernet Segment (ES) fails, traffic needs to be redirected towards the host via one of the other ES peers. For example, if a host is multihomed to three different VTEPs, the backup port of each ES link needs to be set to the VXLAN device and the backup nexthop ID needs to point to an FDB nexthop group that includes the IP addresses of the other two VTEPs. The VXLAN driver will extract the ID from the metadata of the redirected skb, calculate its flow hash and forward it towards one of the other VTEPs. If the ID does not exist, or represents an invalid nexthop object, the VXLAN driver will drop the skb. This relieves the bridge driver from the need to validate the ID. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7b4858df |
|
29-May-2023 |
Ido Schimmel <idosch@nvidia.com> |
skbuff: bridge: Add layer 2 miss indication For EVPN non-DF (Designated Forwarder) filtering we need to be able to prevent decapsulated traffic from being flooded to a multi-homed host. Filtering of multicast and broadcast traffic can be achieved using the following flower filter: # tc filter add dev bond0 egress pref 1 proto all flower indev vxlan0 dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 action drop Unlike broadcast and multicast traffic, it is not currently possible to filter unknown unicast traffic. The classification into unknown unicast is performed by the bridge driver, but is not visible to other layers such as tc. Solve this by adding a new 'l2_miss' bit to the tc skb extension. Clear the bit whenever a packet enters the bridge (received from a bridge port or transmitted via the bridge) and set it if the packet did not match an FDB or MDB entry. If there is no skb extension and the bit needs to be cleared, then do not allocate one as no extension is equivalent to the bit being cleared. The bit is not set for broadcast packets as they never perform a lookup and therefore never incur a miss. A bit that is set for every flooded packet would also work for the current use case, but it does not allow us to differentiate between registered and unregistered multicast traffic, which might be useful in the future. To keep the performance impact to a minimum, the marking of packets is guarded by the 'tc_skb_ext_tc' static key. When 'false', the skb is not touched and an skb extension is not allocated. Instead, only a 5 bytes nop is executed, as demonstrated below for the call site in br_handle_frame(). Before the patch: ``` memset(skb->cb, 0, sizeof(struct br_input_skb_cb)); c37b09: 49 c7 44 24 28 00 00 movq $0x0,0x28(%r12) c37b10: 00 00 p = br_port_get_rcu(skb->dev); c37b12: 49 8b 44 24 10 mov 0x10(%r12),%rax memset(skb->cb, 0, sizeof(struct br_input_skb_cb)); c37b17: 49 c7 44 24 30 00 00 movq $0x0,0x30(%r12) c37b1e: 00 00 c37b20: 49 c7 44 24 38 00 00 movq $0x0,0x38(%r12) c37b27: 00 00 ``` After the patch (when static key is disabled): ``` memset(skb->cb, 0, sizeof(struct br_input_skb_cb)); c37c29: 49 c7 44 24 28 00 00 movq $0x0,0x28(%r12) c37c30: 00 00 c37c32: 49 8d 44 24 28 lea 0x28(%r12),%rax c37c37: 48 c7 40 08 00 00 00 movq $0x0,0x8(%rax) c37c3e: 00 c37c3f: 48 c7 40 10 00 00 00 movq $0x0,0x10(%rax) c37c46: 00 #ifdef CONFIG_HAVE_JUMP_LABEL_HACK static __always_inline bool arch_static_branch(struct static_key *key, bool branch) { asm_volatile_goto("1:" c37c47: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) br_tc_skb_miss_set(skb, false); p = br_port_get_rcu(skb->dev); c37c4c: 49 8b 44 24 10 mov 0x10(%r12),%rax ``` Subsequent patches will extend the flower classifier to be able to match on the new 'l2_miss' bit and enable / disable the static key when filters that match on it are added / deleted. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
3aca683e |
|
19-Apr-2023 |
Ido Schimmel <idosch@nvidia.com> |
bridge: Encapsulate data path neighbor suppression logic Currently, there are various places in the bridge data path that check whether neighbor suppression is enabled on a given bridge port. As a preparation for per-{Port, VLAN} neighbor suppression, encapsulate this logic in a function and pass the VLAN ID of the packet as an argument. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a714e3ec |
|
19-Apr-2023 |
Ido Schimmel <idosch@nvidia.com> |
bridge: Add internal flags for per-{Port, VLAN} neighbor suppression Add two internal flags that will be used to enable / disable per-{Port, VLAN} neighbor suppression: 1. 'BR_NEIGH_VLAN_SUPPRESS': A per-port flag used to indicate that per-{Port, VLAN} neighbor suppression is enabled on the bridge port. When set, 'BR_NEIGH_SUPPRESS' has no effect. 2. 'BR_VLFLAG_NEIGH_SUPPRESS_ENABLED': A per-VLAN flag used to indicate that neighbor suppression is enabled on the given VLAN. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e408336a |
|
19-Apr-2023 |
Ido Schimmel <idosch@nvidia.com> |
bridge: Pass VLAN ID to br_flood() Subsequent patches are going to add per-{Port, VLAN} neighbor suppression, which will require br_flood() to potentially suppress ARP / NS packets on a per-{Port, VLAN} basis. As a preparation, pass the VLAN ID of the packet as another argument to br_flood(). Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cc7f5022 |
|
15-Mar-2023 |
Ido Schimmel <idosch@nvidia.com> |
rtnetlink: bridge: mcast: Move MDB handlers out of bridge driver Currently, the bridge driver registers handlers for MDB netlink messages, making it impossible for other drivers to implement MDB support. As a preparation for VXLAN MDB support, move the MDB handlers out of the bridge driver to the core rtnetlink code. The rtnetlink code will call into individual drivers by invoking their previously added MDB net device operations. Note that while the diffstat is large, the change is mechanical. It moves code out of the bridge driver to rtnetlink code. Also note that a similar change was made in 2012 with commit 77162022ab26 ("net: add generic PF_BRIDGE:RTM_ FDB hooks") that moved FDB handlers out of the bridge driver to the core rtnetlink code. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c009de10 |
|
15-Mar-2023 |
Ido Schimmel <idosch@nvidia.com> |
bridge: mcast: Implement MDB net device operations Implement the previously added MDB net device operations in the bridge driver so that they could be invoked by core rtnetlink code in the next patch. The operations are identical to the existing br_mdb_{dump,add,del} functions. The '_new' suffix will be removed in the next patch. The functions are re-implemented in this patch to make the conversion in the next patch easier to review. Add dummy implementations when 'CONFIG_BRIDGE_IGMP_SNOOPING' is disabled, so that an error will be returned to user space when it is trying to add or delete an MDB entry. This is consistent with existing behavior where the bridge driver does not even register rtnetlink handlers for RTM_{NEW,DEL,GET}MDB messages when this Kconfig option is disabled. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a1aee20d |
|
02-Feb-2023 |
Petr Machata <petrm@nvidia.com> |
net: bridge: Add netlink knobs for number / maximum MDB entries The previous patch added accounting for number of MDB entries per port and per port-VLAN, and the logic to verify that these values stay within configured bounds. However it didn't provide means to actually configure those bounds or read the occupancy. This patch does that. Two new netlink attributes are added for the MDB occupancy: IFLA_BRPORT_MCAST_N_GROUPS for the per-port occupancy and BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS for the per-port-VLAN occupancy. And another two for the maximum number of MDB entries: IFLA_BRPORT_MCAST_MAX_GROUPS for the per-port maximum, and BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS for the per-port-VLAN one. Note that the two new IFLA_BRPORT_ attributes prompt bumping of RTNL_SLAVE_MAX_TYPE to size the slave attribute tables large enough. The new attributes are used like this: # ip link add name br up type bridge vlan_filtering 1 mcast_snooping 1 \ mcast_vlan_snooping 1 mcast_querier 1 # ip link set dev v1 master br # bridge vlan add dev v1 vid 2 # bridge vlan set dev v1 vid 1 mcast_max_groups 1 # bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 1 # bridge mdb add dev br port v1 grp 230.1.2.4 temp vid 1 Error: bridge: Port-VLAN is already in 1 groups, and mcast_max_groups=1. # bridge link set dev v1 mcast_max_groups 1 # bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 2 Error: bridge: Port is already in 1 groups, and mcast_max_groups=1. # bridge -d link show 5: v1@v2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br [...] [...] mcast_n_groups 1 mcast_max_groups 1 # bridge -d vlan show port vlan-id br 1 PVID Egress Untagged state forwarding mcast_router 1 v1 1 PVID Egress Untagged [...] mcast_n_groups 1 mcast_max_groups 1 2 [...] mcast_n_groups 0 mcast_max_groups 0 Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b57e8d87 |
|
02-Feb-2023 |
Petr Machata <petrm@nvidia.com> |
net: bridge: Maintain number of MDB entries in net_bridge_mcast_port The MDB maintained by the bridge is limited. When the bridge is configured for IGMP / MLD snooping, a buggy or malicious client can easily exhaust its capacity. In SW datapath, the capacity is configurable through the IFLA_BR_MCAST_HASH_MAX parameter, but ultimately is finite. Obviously a similar limit exists in the HW datapath for purposes of offloading. In order to prevent the issue of unilateral exhaustion of MDB resources, introduce two parameters in each of two contexts: - Per-port and per-port-VLAN number of MDB entries that the port is member in. - Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled) per-port-VLAN maximum permitted number of MDB entries, or 0 for no limit. The per-port multicast context is used for tracking of MDB entries for the port as a whole. This is available for all bridges. The per-port-VLAN multicast context is then only available on VLAN-filtering bridges on VLANs that have multicast snooping on. With these changes in place, it will be possible to configure MDB limit for bridge as a whole, or any one port as a whole, or any single port-VLAN. Note that unlike the global limit, exhaustion of the per-port and per-port-VLAN maximums does not cause disablement of multicast snooping. It is also permitted to configure the local limit larger than hash_max, even though that is not useful. In this patch, introduce only the accounting for number of entries, and the max field itself, but not the means to toggle the max. The next patch introduces the netlink APIs to toggle and read the values. Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
976b3858 |
|
02-Feb-2023 |
Petr Machata <petrm@nvidia.com> |
net: bridge: Add br_multicast_del_port_group() Since cleaning up the effects of br_multicast_new_port_group() just consists of delisting and freeing the memory, the function br_mdb_add_group_star_g() inlines the corresponding code. In the following patches, number of per-port and per-port-VLAN MDB entries is going to be maintained, and that counter will have to be updated. Because that logic is going to be hidden in the br_multicast module, introduce a new hook intended to again remove a newly-created group. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
60977a0c |
|
02-Feb-2023 |
Petr Machata <petrm@nvidia.com> |
net: bridge: Add extack to br_multicast_new_port_group() Make it possible to set an extack in br_multicast_new_port_group(). Eventually, this function will check for per-port and per-port-vlan MDB maximums, and will use the extack to communicate the reason for the bounce. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
61f21835 |
|
10-Dec-2022 |
Ido Schimmel <idosch@nvidia.com> |
bridge: mcast: Support replacement of MDB port group entries Now that user space can specify additional attributes of port group entries such as filter mode and source list, it makes sense to allow user space to atomically modify these attributes by replacing entries instead of forcing user space to delete the entries and add them back. Replace MDB port group entries when the 'NLM_F_REPLACE' flag is specified in the netlink message header. When a (*, G) entry is replaced, update the following attributes: Source list, state, filter mode, protocol and flags. If the entry is temporary and in EXCLUDE mode, reset the group timer to the group membership interval. If the entry is temporary and in INCLUDE mode, reset the source timers of associated sources to the group membership interval. Examples: # bridge mdb replace dev br0 port dummy10 grp 239.1.1.1 permanent source_list 192.0.2.1,192.0.2.2 filter_mode include # bridge -d -s mdb show dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.2 permanent filter_mode include proto static 0.00 dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent filter_mode include proto static 0.00 dev br0 port dummy10 grp 239.1.1.1 permanent filter_mode include source_list 192.0.2.2/0.00,192.0.2.1/0.00 proto static 0.00 # bridge mdb replace dev br0 port dummy10 grp 239.1.1.1 permanent source_list 192.0.2.1,192.0.2.3 filter_mode exclude proto zebra # bridge -d -s mdb show dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.3 permanent filter_mode include proto zebra blocked 0.00 dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent filter_mode include proto zebra blocked 0.00 dev br0 port dummy10 grp 239.1.1.1 permanent filter_mode exclude source_list 192.0.2.3/0.00,192.0.2.1/0.00 proto zebra 0.00 # bridge mdb replace dev br0 port dummy10 grp 239.1.1.1 temp source_list 192.0.2.4,192.0.2.3 filter_mode include proto bgp # bridge -d -s mdb show dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.4 temp filter_mode include proto bgp 0.00 dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.3 temp filter_mode include proto bgp 0.00 dev br0 port dummy10 grp 239.1.1.1 temp filter_mode include source_list 192.0.2.4/259.44,192.0.2.3/259.44 proto bgp 0.00 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
1d7b66a7 |
|
10-Dec-2022 |
Ido Schimmel <idosch@nvidia.com> |
bridge: mcast: Allow user space to specify MDB entry routing protocol Add the 'MDBE_ATTR_RTPORT' attribute to allow user space to specify the routing protocol of the MDB port group entry. Enforce a minimum value of 'RTPROT_STATIC' to prevent user space from using protocol values that should only be set by the kernel (e.g., 'RTPROT_KERNEL'). Maintain backward compatibility by defaulting to 'RTPROT_STATIC'. The protocol is already visible to user space in RTM_NEWMDB responses and notifications via the 'MDBA_MDB_EATTR_RTPROT' attribute. The routing protocol allows a routing daemon to distinguish between entries configured by it and those configured by the administrator. Once MDB flush is supported, the protocol can be used as a criterion according to which the flush is performed. Examples: # bridge mdb add dev br0 port dummy10 grp 239.1.1.1 permanent proto kernel Error: integer out of range. # bridge mdb add dev br0 port dummy10 grp 239.1.1.1 permanent proto static # bridge mdb add dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent proto zebra # bridge mdb add dev br0 port dummy10 grp 239.1.1.2 permanent source_list 198.51.100.1,198.51.100.2 filter_mode include proto 250 # bridge -d mdb show dev br0 port dummy10 grp 239.1.1.2 src 198.51.100.2 permanent filter_mode include proto 250 dev br0 port dummy10 grp 239.1.1.2 src 198.51.100.1 permanent filter_mode include proto 250 dev br0 port dummy10 grp 239.1.1.2 permanent filter_mode include source_list 198.51.100.2/0.00,198.51.100.1/0.00 proto 250 dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent filter_mode include proto zebra dev br0 port dummy10 grp 239.1.1.1 permanent filter_mode exclude proto static Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
b1c8fec8 |
|
10-Dec-2022 |
Ido Schimmel <idosch@nvidia.com> |
bridge: mcast: Add support for (*, G) with a source list and filter mode In preparation for allowing user space to add (*, G) entries with a source list and associated filter mode, add the necessary plumbing to handle such requests. Extend the MDB configuration structure with a currently empty source array and filter mode that is currently hard coded to EXCLUDE. Add the source entries and the corresponding (S, G) entries before making the new (*, G) port group entry visible to the data path. Handle the creation of each source entry in a similar fashion to how it is created from the data path in response to received Membership Reports: Create the source entry, arm the source timer (if needed), add a corresponding (S, G) forwarding entry and finally mark the source entry as installed (by user space). Add the (S, G) entry by populating an MDB configuration structure and calling br_mdb_add_group_sg() as if a new entry is created by user space, with the sole difference that the 'src_entry' field is set to make sure that the group timer of such entries is never armed. Note that it is not currently possible to add more than 32 source entries to a port group entry. If this proves to be a problem we can either increase 'PG_SRC_ENT_LIMIT' or avoid forcing a limit on entries created by user space. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
079afd66 |
|
10-Dec-2022 |
Ido Schimmel <idosch@nvidia.com> |
bridge: mcast: Avoid arming group timer when (S, G) corresponds to a source User space will soon be able to install a (*, G) with a source list, prompting the creation of a (S, G) entry for each source. In this case, the group timer of the (S, G) entry should never be set. Solve this by adding a new field to the MDB configuration structure that denotes whether the (S, G) corresponds to a source or not. The field will be set in a subsequent patch where br_mdb_add_group_sg() is called in order to create a (S, G) entry for each user provided source. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
a01ecb17 |
|
10-Dec-2022 |
Ido Schimmel <idosch@nvidia.com> |
bridge: mcast: Add a flag for user installed source entries There are a few places where the bridge driver differentiates between (S, G) entries installed by the kernel (in response to Membership Reports) and those installed by user space. One of them is when deleting an (S, G) entry corresponding to a source entry that is being deleted. While user space cannot currently add a source entry to a (*, G), it can add an (S, G) entry that later corresponds to a source entry created by the reception of a Membership Report. If this source entry is later deleted because its source timer expired or because the (*, G) entry is being deleted, the bridge driver will not delete the corresponding (S, G) entry if it was added by user space as permanent. This is going to be a problem when the ability to install a (*, G) with a source list is exposed to user space. In this case, when user space installs the (*, G) as permanent, then all the (S, G) entries corresponding to its source list will also be installed as permanent. When user space deletes the (*, G), all the source entries will be deleted and the expectation is that the corresponding (S, G) entries will be deleted as well. Solve this by introducing a new source entry flag denoting that the entry was installed by user space. When the entry is deleted, delete the corresponding (S, G) entry even if it was installed by user space as permanent, as the flag tells us that it was installed in response to the source entry being created. The flag will be set in a subsequent patch where source entries are created in response to user requests. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
083e3534 |
|
10-Dec-2022 |
Ido Schimmel <idosch@nvidia.com> |
bridge: mcast: Expose __br_multicast_del_group_src() Expose __br_multicast_del_group_src() which is symmetric to br_multicast_new_group_src() and does not remove the installed {S, G} forwarding entry, unlike br_multicast_del_group_src(). The function will be used in the error path when user space was able to add a new source entry, but failed to install a corresponding forwarding entry. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
fd0c6961 |
|
10-Dec-2022 |
Ido Schimmel <idosch@nvidia.com> |
bridge: mcast: Expose br_multicast_new_group_src() Currently, new group source entries are only created in response to received Membership Reports. Subsequent patches are going to allow user space to install (*, G) entries with a source list. As a preparatory step, expose br_multicast_new_group_src() so that it could later be invoked from the MDB code (i.e., br_mdb.c) that handles RTM_NEWMDB messages. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
f86c3e2c |
|
05-Dec-2022 |
Ido Schimmel <idosch@nvidia.com> |
bridge: mcast: Constify 'group' argument in br_multicast_new_port_group() The 'group' argument is not modified, so mark it as 'const'. It will allow us to constify arguments of the callers of this function in future patches. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
cb453926 |
|
05-Dec-2022 |
Ido Schimmel <idosch@nvidia.com> |
bridge: mcast: Centralize netlink attribute parsing Netlink attributes are currently passed deep in the MDB creation call chain, making it difficult to add new attributes. In addition, some validity checks are performed under the multicast lock although they can be performed before it is ever acquired. As a first step towards solving these issues, parse the RTM_{NEW,DEL}MDB messages into a configuration structure, relieving other functions from the need to handle raw netlink attributes. Subsequent patches will convert the MDB code to use this configuration structure. This is consistent with how other rtnetlink objects are handled, such as routes and nexthops. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
27fabd02 |
|
08-Nov-2022 |
Hans J. Schultz <netdev@kapio-technology.com> |
bridge: switchdev: Allow device drivers to install locked FDB entries When the bridge is offloaded to hardware, FDB entries are learned and aged-out by the hardware. Some device drivers synchronize the hardware and software FDBs by generating switchdev events towards the bridge. When a port is locked, the hardware must not learn autonomously, as otherwise any host will blindly gain authorization. Instead, the hardware should generate events regarding hosts that are trying to gain authorization and their MAC addresses should be notified by the device driver as locked FDB entries towards the bridge driver. Allow device drivers to notify the bridge driver about such entries by extending the 'switchdev_notifier_fdb_info' structure with the 'locked' bit. The bit can only be set by device drivers and not by the bridge driver. Prevent a locked entry from being installed if MAB is not enabled on the bridge port. If an entry already exists in the bridge driver, reject the locked entry if the current entry does not have the "locked" flag set or if it points to a different port. The same semantics are implemented in the software data path. Signed-off-by: Hans J. Schultz <netdev@kapio-technology.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
a35ec8e3 |
|
01-Nov-2022 |
Hans J. Schultz <netdev@kapio-technology.com> |
bridge: Add MAC Authentication Bypass (MAB) support Hosts that support 802.1X authentication are able to authenticate themselves by exchanging EAPOL frames with an authenticator (Ethernet bridge, in this case) and an authentication server. Access to the network is only granted by the authenticator to successfully authenticated hosts. The above is implemented in the bridge using the "locked" bridge port option. When enabled, link-local frames (e.g., EAPOL) can be locally received by the bridge, but all other frames are dropped unless the host is authenticated. That is, unless the user space control plane installed an FDB entry according to which the source address of the frame is located behind the locked ingress port. The entry can be dynamic, in which case learning needs to be enabled so that the entry will be refreshed by incoming traffic. There are deployments in which not all the devices connected to the authenticator (the bridge) support 802.1X. Such devices can include printers and cameras. One option to support such deployments is to unlock the bridge ports connecting these devices, but a slightly more secure option is to use MAB. When MAB is enabled, the MAC address of the connected device is used as the user name and password for the authentication. For MAB to work, the user space control plane needs to be notified about MAC addresses that are trying to gain access so that they will be compared against an allow list. This can be implemented via the regular learning process with the sole difference that learned FDB entries are installed with a new "locked" flag indicating that the entry cannot be used to authenticate the device. The flag cannot be set by user space, but user space can clear the flag by replacing the entry, thereby authenticating the device. Locked FDB entries implement the following semantics with regards to roaming, aging and forwarding: 1. Roaming: Locked FDB entries can roam to unlocked (authorized) ports, in which case the "locked" flag is cleared. FDB entries cannot roam to locked ports regardless of MAB being enabled or not. Therefore, locked FDB entries are only created if an FDB entry with the given {MAC, VID} does not already exist. This behavior prevents unauthenticated devices from disrupting traffic destined to already authenticated devices. 2. Aging: Locked FDB entries age and refresh by incoming traffic like regular entries. 3. Forwarding: Locked FDB entries forward traffic like regular entries. If user space detects an unauthorized MAC behind a locked port and wishes to prevent traffic with this MAC DA from reaching the host, it can do so using tc or a different mechanism. Enable the above behavior using a new bridge port option called "mab". It can only be enabled on a bridge port that is both locked and has learning enabled. Locked FDB entries are flushed from the port once MAB is disabled. A new option is added because there are pure 802.1X deployments that are not interested in notifications about locked FDB entries. Signed-off-by: Hans J. Schultz <netdev@kapio-technology.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
ca4567f1 |
|
05-May-2022 |
Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com> |
rtnetlink: add extack support in fdb del handlers Add extack support to .ndo_fdb_del in netdevice.h and all related methods. Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
564445fb |
|
13-Apr-2022 |
Nikolay Aleksandrov <razor@blackwall.org> |
net: bridge: fdb: add support for flush filtering based on ndm flags and state Add support for fdb flush filtering based on ndm flags and state. NDM state and flags are mapped to bridge-specific flags and matched according to the specified masks. NTF_USE is used to represent added_by_user flag since it sets it on fdb add and we don't have a 1:1 mapping for it. Only allowed bits can be set, NTF_SELF and NTF_MASTER are ignored. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1f78ee14 |
|
13-Apr-2022 |
Nikolay Aleksandrov <razor@blackwall.org> |
net: bridge: fdb: add support for fine-grained flushing Add the ability to specify exactly which fdbs to be flushed. They are described by a new structure - net_bridge_fdb_flush_desc. Currently it can match on port/bridge ifindex, vlan id and fdb flags. It is used to describe the existing dynamic fdb flush operation. Note that this flush operation doesn't treat permanent entries in a special way (fdb_delete vs fdb_delete_local), it will delete them regardless if any port is using them, so currently it can't directly replace deletes which need to handle that case, although we can extend it later for that too. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
edaef191 |
|
13-Apr-2022 |
Nikolay Aleksandrov <razor@blackwall.org> |
net: bridge: fdb: add ndo_fdb_del_bulk Add a minimal ndo_fdb_del_bulk implementation which flushes all entries. Support for more fine-grained filtering will be added in the following patches. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
122c2948 |
|
16-Mar-2022 |
Tobias Waldekranz <tobias@waldekranz.com> |
net: bridge: mst: Support setting and reporting MST port states Make it possible to change the port state in a given MSTI by extending the bridge port netlink interface (RTM_SETLINK on PF_BRIDGE).The proposed iproute2 interface would be: bridge mst set dev <PORT> msti <MSTI> state <STATE> Current states in all applicable MSTIs can also be dumped via a corresponding RTM_GETLINK. The proposed iproute interface looks like this: $ bridge mst port msti vb1 0 state forwarding 100 state disabled vb2 0 state forwarding 100 state forwarding The preexisting per-VLAN states are still valid in the MST mode (although they are read-only), and can be queried as usual if one is interested in knowing a particular VLAN's state without having to care about the VID to MSTI mapping (in this example VLAN 20 and 30 are bound to MSTI 100): $ bridge -d vlan port vlan-id vb1 10 state forwarding mcast_router 1 20 state disabled mcast_router 1 30 state disabled mcast_router 1 40 state forwarding mcast_router 1 vb2 10 state forwarding mcast_router 1 20 state forwarding mcast_router 1 30 state forwarding mcast_router 1 40 state forwarding mcast_router 1 Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
8c678d60 |
|
16-Mar-2022 |
Tobias Waldekranz <tobias@waldekranz.com> |
net: bridge: mst: Allow changing a VLAN's MSTI Allow a VLAN to move out of the CST (MSTI 0), to an independent tree. The user manages the VID to MSTI mappings via a global VLAN setting. The proposed iproute2 interface would be: bridge vlan global set dev br0 vid <VID> msti <MSTI> Changing the state in non-zero MSTIs is still not supported, but will be addressed in upcoming changes. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
ec7328b5 |
|
16-Mar-2022 |
Tobias Waldekranz <tobias@waldekranz.com> |
net: bridge: mst: Multiple Spanning Tree (MST) mode Allow the user to switch from the current per-VLAN STP mode to an MST mode. Up to this point, per-VLAN STP states where always isolated from each other. This is in contrast to the MSTP standard (802.1Q-2018, Clause 13.5), where VLANs are grouped into MST instances (MSTIs), and the state is managed on a per-MSTI level, rather that at the per-VLAN level. Perhaps due to the prevalence of the standard, many switching ASICs are built after the same model. Therefore, add a corresponding MST mode to the bridge, which we can later add offloading support for in a straight-forward way. For now, all VLANs are fixed to MSTI 0, also called the Common Spanning Tree (CST). That is, all VLANs will follow the port-global state. Upcoming changes will make this actually useful by allowing VLANs to be mapped to arbitrary MSTIs and allow individual MSTI states to be changed. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
8d23a54f |
|
15-Feb-2022 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: bridge: switchdev: differentiate new VLANs from changed ones br_switchdev_port_vlan_add() currently emits a SWITCHDEV_PORT_OBJ_ADD event with a SWITCHDEV_OBJ_ID_PORT_VLAN for 2 distinct cases: - a struct net_bridge_vlan got created - an existing struct net_bridge_vlan was modified This makes it impossible for switchdev drivers to properly balance PORT_OBJ_ADD with PORT_OBJ_DEL events, so if we want to allow that to happen, we must provide a way for drivers to distinguish between a VLAN with changed flags and a new one. Annotate struct switchdev_obj_port_vlan with a "bool changed" that distinguishes the 2 cases above. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b2dcdc7f |
|
06-Dec-2021 |
Eric Dumazet <edumazet@google.com> |
net: bridge: add net device refcount tracker Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
168fed98 |
|
28-Dec-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: mcast: fix br_multicast_ctx_vlan_global_disabled helper We need to first check if the context is a vlan one, then we need to check the global bridge multicast vlan snooping flag, and finally the vlan's multicast flag, otherwise we will unnecessarily enable vlan mcast processing (e.g. querier timers). Fixes: 7b54aaaf53cb ("net: bridge: multicast: add vlan state initialization and control") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20211228153142.536969-1-nikolay@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
f83a112b |
|
27-Dec-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: mcast: add and enforce startup query interval minimum As reported[1] if startup query interval is set too low in combination with large number of startup queries and we have multiple bridges or even a single bridge with multiple querier vlans configured we can crash the machine. Add a 1 second minimum which must be enforced by overwriting the value if set lower (i.e. without returning an error) to avoid breaking user-space. If that happens a log message is emitted to let the admin know that the startup interval has been set to the minimum. It doesn't make sense to make the startup interval lower than the normal query interval so use the same value of 1 second. The issue has been present since these intervals could be user-controlled. [1] https://lore.kernel.org/netdev/e8b9ce41-57b9-b6e2-a46a-ff9c791cf0ba@gmail.com/ Fixes: d902eee43f19 ("bridge: Add multicast count/interval sysfs entries") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
99b40610 |
|
27-Dec-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: mcast: add and enforce query interval minimum As reported[1] if query interval is set too low and we have multiple bridges or even a single bridge with multiple querier vlans configured we can crash the machine. Add a 1 second minimum which must be enforced by overwriting the value if set lower (i.e. without returning an error) to avoid breaking user-space. If that happens a log message is emitted to let the administrator know that the interval has been set to the minimum. The issue has been present since these intervals could be user-controlled. [1] https://lore.kernel.org/netdev/e8b9ce41-57b9-b6e2-a46a-ff9c791cf0ba@gmail.com/ Fixes: d902eee43f19 ("bridge: Add multicast count/interval sysfs entries") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
ae039350 |
|
29-Oct-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: bridge: switchdev: fix shim definition for br_switchdev_mdb_notify br_switchdev_mdb_notify() is conditionally compiled only when CONFIG_NET_SWITCHDEV=y and CONFIG_BRIDGE_IGMP_SNOOPING=y. It is called from br_mdb.c, which is conditionally compiled only when CONFIG_BRIDGE_IGMP_SNOOPING=y. The shim definition of br_switchdev_mdb_notify() is therefore needed for the case where CONFIG_NET_SWITCHDEV=n, however we mistakenly put it there for the case where CONFIG_BRIDGE_IGMP_SNOOPING=n. This results in build failures when CONFIG_BRIDGE_IGMP_SNOOPING=y and CONFIG_NET_SWITCHDEV=n. To fix this, put the shim definition right next to br_switchdev_fdb_notify(), which is properly guarded by NET_SWITCHDEV=n. Since this is called only from br_mdb.c, we need not take any extra safety precautions, when NET_SWITCHDEV=n and BRIDGE_IGMP_SNOOPING=n, this shim definition will be absent but nobody will be needing it. Fixes: 9776457c784f ("net: bridge: mdb: move all switchdev logic to br_switchdev.c") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20211029223606.3450523-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
829e050e |
|
28-Oct-2021 |
Ivan Vecera <ivecera@redhat.com> |
net: bridge: fix uninitialized variables when BRIDGE_CFM is disabled Function br_get_link_af_size_filtered() calls br_cfm_{,peer}_mep_count() that return a count. When BRIDGE_CFM is not enabled these functions simply return -EOPNOTSUPP but do not modify count parameter and calling function then works with uninitialized variables. Modify these inline functions to return zero in count parameter. Fixes: b6d0425b816e ("bridge: cfm: Netlink Notifications.") Cc: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9776457c |
|
27-Oct-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: bridge: mdb: move all switchdev logic to br_switchdev.c The following functions: br_mdb_complete br_switchdev_mdb_populate br_mdb_replay_one br_mdb_queue_one br_mdb_replay br_mdb_switchdev_host_port br_mdb_switchdev_host br_switchdev_mdb_notify are only accessible from code paths where CONFIG_NET_SWITCHDEV is enabled. So move them to br_switchdev.c, in order for that code to be compiled out if that config option is disabled. Note that br_switchdev.c gets build regardless of whether CONFIG_BRIDGE_IGMP_SNOOPING is enabled or not, whereas br_mdb.c only got built when CONFIG_BRIDGE_IGMP_SNOOPING was enabled. So to preserve correct compilation with CONFIG_BRIDGE_IGMP_SNOOPING being disabled, we must now place an #ifdef around these functions in br_switchdev.c. The offending bridge data structures that need this are br->multicast_lock and br->mdb_list, these are also compiled out of struct net_bridge when CONFIG_BRIDGE_IGMP_SNOOPING is turned off. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
4a6849e4 |
|
27-Oct-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: bridge: move br_vlan_replay to br_switchdev.c br_vlan_replay() is relevant only if CONFIG_NET_SWITCHDEV is enabled, so move it to br_switchdev.c. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
c5f6e5eb |
|
27-Oct-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: bridge: provide shim definition for br_vlan_flags br_vlan_replay() needs this, and we're preparing to move it to br_switchdev.c, which will be compiled regardless of whether or not CONFIG_BRIDGE_VLAN_FILTERING is enabled. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
5cda5272 |
|
26-Oct-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: bridge: move br_fdb_replay inside br_switchdev.c br_fdb_replay is only called from switchdev code paths, so it makes sense to be disabled if switchdev is not enabled in the first place. As opposed to br_mdb_replay and br_vlan_replay which might be turned off depending on bridge support for multicast and VLANs, FDB support is always on. So moving br_mdb_replay and br_vlan_replay inside br_switchdev.c would mean adding some #ifdef's in br_switchdev.c, so we keep those where they are. The reason for the movement is that in future changes there will be some code reuse between br_switchdev_fdb_notify and br_fdb_replay. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f6814fdc |
|
26-Oct-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: bridge: rename br_fdb_insert to br_fdb_add_local br_fdb_insert() is a wrapper over fdb_insert() that also takes the bridge hash_lock. With fdb_insert() being renamed to fdb_add_local(), rename br_fdb_insert() to br_fdb_add_local(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fac3cb82 |
|
14-Oct-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: mcast: use multicast_membership_interval for IGMPv3 When I added IGMPv3 support I decided to follow the RFC for computing the GMI dynamically: " 8.4. Group Membership Interval The Group Membership Interval is the amount of time that must pass before a multicast router decides there are no more members of a group or a particular source on a network. This value MUST be ((the Robustness Variable) times (the Query Interval)) plus (one Query Response Interval)." But that actually is inconsistent with how the bridge used to compute it for IGMPv2, where it was user-configurable that has a correct default value but it is up to user-space to maintain it. This would make it consistent with the other timer values which are also maintained correct by the user instead of being dynamically computed. It also changes back to the previous user-expected GMI behaviour for IGMPv3 queries which were supported before IGMPv3 was added. Note that to properly compute it dynamically we would need to add support for "Robustness Variable" which is currently missing. Reported-by: Hangbin Liu <liuhangbin@gmail.com> Fixes: 0436862e417e ("net: bridge: mcast: support for IGMPv3/MLDv2 ALLOW_NEW_SOURCES report") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f936bb42 |
|
28-Sep-2021 |
Thomas Gleixner <tglx@linutronix.de> |
net: bridge: mcast: Associate the seqcount with its protecting lock. The sequence count bridge_mcast_querier::seq is protected by net_bridge::multicast_lock but seqcount_init() does not associate the seqcount with the lock. This leads to a warning on PREEMPT_RT because preemption is still enabled. Let seqcount_init() associate the seqcount with lock that protects the write section. Remove lockdep_assert_held_once() because lockdep already checks whether the associated lock is held. Fixes: 67b746f94ff39 ("net: bridge: mcast: make sure querier port/address updates are consistent") Reported-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Mike Galbraith <efault@gmx.de> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20210928141049.593833-1-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
2796d846 |
|
20-Aug-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: vlan: convert mcast router global option to per-vlan entry The per-vlan router option controls the port/vlan and host vlan entries' mcast router config. The global option controlled only the host vlan config, but that is unnecessary and incosistent as it's not really a global vlan option, but rather bridge option to control host router config, so convert BRIDGE_VLANDB_GOPTS_MCAST_ROUTER to BRIDGE_VLANDB_ENTRY_MCAST_ROUTER which can be used to control both host vlan and port vlan mcast router config. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a53581d5 |
|
20-Aug-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: mcast: br_multicast_set_port_router takes multicast context as argument Change br_multicast_set_port_router to take port multicast context as its first argument so we can later use it to control port/vlan mcast router option. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
affce9a7 |
|
16-Aug-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: mcast: toggle also host vlan state in br_multicast_toggle_vlan When changing vlan mcast state by br_multicast_toggle_vlan it iterates over all ports and enables/disables the port mcast ctx based on the new state, but I forgot to update the host vlan (bridge master vlan entry) with the new state so it will be left out. Also that function is not used outside of br_multicast.c, so make it static. Fixes: f4b7002a7076 ("net: bridge: add vlan mcast snooping knob") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
05d6f38e |
|
16-Aug-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: vlan: account for router port lists when notifying When sending a global vlan notification we should account for the number of router ports when allocating the skb, otherwise we might end up losing notifications. Fixes: dc002875c22b ("net: bridge: vlan: use br_rports_fill_info() to export mcast router ports") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c7fa1d9b |
|
13-Aug-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: mcast: dump ipv4 querier state Add support for dumping global IPv4 querier state, we dump the state only if our own querier is enabled or there has been another external querier which has won the election. For the bridge global state we use a new attribute IFLA_BR_MCAST_QUERIER_STATE and embed the state inside. The structure is: [IFLA_BR_MCAST_QUERIER_STATE] `[BRIDGE_QUERIER_IP_ADDRESS] - ip address of the querier `[BRIDGE_QUERIER_IP_PORT] - bridge port ifindex where the querier was seen (set only if external querier) `[BRIDGE_QUERIER_IP_OTHER_TIMER] - other querier timeout Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
67b746f9 |
|
13-Aug-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: mcast: make sure querier port/address updates are consistent Use a sequence counter to make sure port/address updates can be read consistently without requiring the bridge multicast_lock. We need to zero out the port and address when the other querier has expired and we're about to select ourselves as querier. br_multicast_read_querier will be used later when dumping querier state. Updates are done only with the multicast spinlock and softirqs disabled, while reads are done from process context and from softirqs (due to notifications). Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bb18ef8e |
|
13-Aug-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: mcast: record querier port device ifindex instead of pointer Currently when a querier port is detected its net_bridge_port pointer is recorded, but it's used only for comparisons so it's fine to have stale pointer, in order to dereference and use the port pointer a proper accounting of its usage must be implemented adding unnecessary complexity. To solve the problem we can just store the netdevice ifindex instead of the port pointer and retrieve the bridge port. It is a best effort and the device needs to be validated that is still part of that bridge before use, but that is small price to pay for avoiding querier reference counting for each port/vlan. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dc002875 |
|
10-Aug-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: vlan: use br_rports_fill_info() to export mcast router ports Embed the standard multicast router port export by br_rports_fill_info() into a new global vlan attribute BRIDGE_VLANDB_GOPTS_MCAST_ROUTER_PORTS. In order to have the same format for the global bridge mcast context and the per-vlan mcast context we need a double-nesting: - BRIDGE_VLANDB_GOPTS_MCAST_ROUTER_PORTS - MDBA_ROUTER Currently we don't compare router lists, if any router port exists in the bridge mcast contexts we consider their option sets as different and export them separately. In addition we export the router port vlan id when dumping similar to the router port notification format. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a97df080 |
|
10-Aug-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: vlan: add support for mcast router global option Add support to change and retrieve global vlan multicast router state which is used for the bridge itself. We just need to pass multicast context to br_multicast_set_router instead of bridge device and the rest of the logic remains the same. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
62938182 |
|
10-Aug-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: vlan: add support for mcast querier global option Add support to change and retrieve global vlan multicast querier state. We just need to pass multicast context to br_multicast_set_querier instead of bridge device and the rest of the logic remains the same. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cb486ce9 |
|
10-Aug-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: mcast: querier and query state affect only current context type It is a minor optimization and better behaviour to make sure querier and query sending routines affect only the matching multicast context depending if vlan snooping is enabled (vlan ctx vs bridge ctx). It also avoids sending unnecessary extra query packets. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4d5b4e84 |
|
10-Aug-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: mcast: move querier state to the multicast context We need to have the querier state per multicast context in order to have per-vlan control, so remove the internal option bit and move it to the multicast context. Also annotate the lockless reads of the new variable. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
941121ee |
|
10-Aug-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: vlan: add support for mcast startup query interval global option Add support to change and retrieve global vlan multicast startup query interval option. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
42521450 |
|
10-Aug-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: vlan: add support for mcast query response interval global option Add support to change and retrieve global vlan multicast query response interval option. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d6c08aba |
|
10-Aug-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: vlan: add support for mcast query interval global option Add support to change and retrieve global vlan multicast query interval option. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cd9269d4 |
|
10-Aug-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: vlan: add support for mcast querier interval global option Add support to change and retrieve global vlan multicast querier interval option. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2da0aea2 |
|
10-Aug-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: vlan: add support for mcast membership interval global option Add support to change and retrieve global vlan multicast membership interval option. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
77f6abab |
|
10-Aug-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: vlan: add support for mcast last member interval global option Add support to change and retrieve global vlan multicast last member interval option. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
50725f6e |
|
10-Aug-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: vlan: add support for mcast startup query count global option Add support to change and retrieve global vlan multicast startup query count option. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
931ba87d |
|
10-Aug-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: vlan: add support for mcast last member count global option Add support to change and retrieve global vlan multicast last member count option. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
df271cd6 |
|
10-Aug-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: vlan: add support for mcast igmp/mld version global options Add support to change and retrieve global vlan IGMP/MLD versions. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
957e2235 |
|
03-Aug-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: make switchdev_bridge_port_{,unoffload} loosely coupled with the bridge With the introduction of explicit offloading API in switchdev in commit 2f5dc00f7a3e ("net: bridge: switchdev: let drivers inform which bridge ports are offloaded"), we started having Ethernet switch drivers calling directly into a function exported by net/bridge/br_switchdev.c, which is a function exported by the bridge driver. This means that drivers that did not have an explicit dependency on the bridge before, like cpsw and am65-cpsw, now do - otherwise it is not possible to call a symbol exported by a driver that can be built as module unless you are a module too. There was an attempt to solve the dependency issue in the form of commit b0e81817629a ("net: build all switchdev drivers as modules when the bridge is a module"). Grygorii Strashko, however, says about it: | In my opinion, the problem is a bit bigger here than just fixing the | build :( | | In case, of ^cpsw the switchdev mode is kinda optional and in many | cases (especially for testing purposes, NFS) the multi-mac mode is | still preferable mode. | | There were no such tight dependency between switchdev drivers and | bridge core before and switchdev serviced as independent, notification | based layer between them, so ^cpsw still can be "Y" and bridge can be | "M". Now for mostly every kernel build configuration the CONFIG_BRIDGE | will need to be set as "Y", or we will have to update drivers to | support build with BRIDGE=n and maintain separate builds for | networking vs non-networking testing. But is this enough? Wouldn't | it cause 'chain reaction' required to add more and more "Y" options | (like CONFIG_VLAN_8021Q)? | | PS. Just to be sure we on the same page - ARM builds will be forced | (with this patch) to have CONFIG_TI_CPSW_SWITCHDEV=m and so all our | automation testing will just fail with omap2plus_defconfig. In the light of this, it would be desirable for some configurations to avoid dependencies between switchdev drivers and the bridge, and have the switchdev mode as completely optional within the driver. Arnd Bergmann also tried to write a patch which better expressed the build time dependency for Ethernet switch drivers where the switchdev support is optional, like cpsw/am65-cpsw, and this made the drivers follow the bridge (compile as module if the bridge is a module) only if the optional switchdev support in the driver was enabled in the first place: https://patchwork.kernel.org/project/netdevbpf/patch/20210802144813.1152762-1-arnd@kernel.org/ but this still did not solve the fact that cpsw and am65-cpsw now must be built as modules when the bridge is a module - it just expressed correctly that optional dependency. But the new behavior is an apparent regression from Grygorii's perspective. So to support the use case where the Ethernet driver is built-in, NET_SWITCHDEV (a bool option) is enabled, and the bridge is a module, we need a framework that can handle the possible absence of the bridge from the running system, i.e. runtime bloatware as opposed to build-time bloatware. Luckily we already have this framework, since switchdev has been using it extensively. Events from the bridge side are transmitted to the driver side using notifier chains - this was originally done so that unrelated drivers could snoop for events emitted by the bridge towards ports that are implemented by other drivers (think of a switch driver with LAG offload that listens for switchdev events on a bonding/team interface that it offloads). There are also events which are transmitted from the driver side to the bridge side, which again are modeled using notifiers. SWITCHDEV_FDB_ADD_TO_BRIDGE is an example of this, and deals with notifying the bridge that a MAC address has been dynamically learned. So there is a precedent we can use for modeling the new framework. The difference compared to SWITCHDEV_FDB_ADD_TO_BRIDGE is that the work that the bridge needs to do when a port becomes offloaded is blocking in its nature: replay VLANs, MDBs etc. The calling context is indeed blocking (we are under rtnl_mutex), but the existing switchdev notification chain that the bridge is subscribed to is only the atomic one. So we need to subscribe the bridge to the blocking switchdev notification chain too. This patch: - keeps the driver-side perception of the switchdev_bridge_port_{,un}offload unchanged - moves the implementation of switchdev_bridge_port_{,un}offload from the bridge module into the switchdev module. - makes everybody that is subscribed to the switchdev blocking notifier chain "hear" offload & unoffload events - makes the bridge driver subscribe and handle those events - moves the bridge driver's handling of those events into 2 new functions called br_switchdev_port_{,un}offload. These functions contain in fact the core of the logic that was previously in switchdev_bridge_port_{,un}offload, just that now we go through an extra indirection layer to reach them. Unlike all the other switchdev notification structures, the structure used to carry the bridge port information, struct switchdev_notifier_brport_info, does not contain a "bool handled". This is because in the current usage pattern, we always know that a switchdev bridge port offloading event will be handled by the bridge, because the switchdev_bridge_port_offload() call was initiated by a NETDEV_CHANGEUPPER event in the first place, where info->upper_dev is a bridge. So if the bridge wasn't loaded, then the CHANGEUPPER event couldn't have happened. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b4454bc6 |
|
28-Jul-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: bridge: switchdev: replay the entire FDB for each port Currently when a switchdev port joins a bridge, we replay all FDB entries pointing towards that port or towards the bridge. However, this is insufficient in certain situations: (a) DSA, through its assisted_learning_on_cpu_port logic, snoops dynamically learned FDB entries on foreign interfaces. These are FDB entries that are pointing neither towards the newly joined switchdev port, nor towards the bridge. So these addresses would be missed when joining a bridge where a foreign interface has already learned some addresses, and they would also linger on if the DSA port leaves the bridge before the foreign interface forgets them. None of this happens if we replay the entire FDB when the port joins. (b) There is a desire to treat local FDB entries on a port (i.e. the port's termination MAC address) identically to FDB entries pointing towards the bridge itself. More details on the reason behind this in the next patch. The point is that this cannot be done given the current structure of br_fdb_replay() in this situation: ip link set swp0 master br0 # br0 inherits its MAC address from swp0 ip link set swp1 master br0 What is desirable is that when swp1 joins the bridge, br_fdb_replay() also notifies swp1 of br0's MAC address, but this won't in fact happen because the MAC address of br0 does not have fdb->dst == NULL (it doesn't point towards the bridge), but it has fdb->dst == swp0. So our current logic makes it impossible for that address to be replayed. But if we dump the entire FDB instead of just the entries with fdb->dst == swp1 and fdb->dst == NULL, then the inherited MAC address of br0 will be replayed too, which is what we need. A natural question arises: say there is an FDB entry to be replayed, like a MAC address dynamically learned on a foreign interface that belongs to a bridge where no switchdev port has joined yet. If 10 switchdev ports belonging to the same driver join this bridge, one by one, won't every port get notified 10 times of the foreign FDB entry, amounting to a total of 100 notifications for this FDB entry in the switchdev driver? Well, yes, but this is where the "void *ctx" argument for br_fdb_replay is useful: every port of the switchdev driver is notified whenever any other port requests an FDB replay, but because the replay was initiated by a different port, its context is different from the initiating port's context, so it ignores those replays. So the foreign FDB entry will be installed only 10 times, once per port. This is done so that the following 4 code paths are always well balanced: (a) addition of foreign FDB entry is replayed when port joins bridge (b) deletion of foreign FDB entry is replayed when port leaves bridge (c) addition of foreign FDB entry is notified to all ports currently in bridge (c) deletion of foreign FDB entry is notified to all ports currently in bridge Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ad2f99ae |
|
27-Jul-2021 |
Arnd Bergmann <arnd@arndb.de> |
net: bridge: move bridge ioctls out of .ndo_do_ioctl Working towards obsoleting the .ndo_do_ioctl operation entirely, stop passing the SIOCBRADDIF/SIOCBRDELIF device ioctl commands into this callback. My first attempt was to add another ndo_siocbr() callback, but as there is only a single driver that takes these commands and there is already a hook mechanism to call directly into this driver, extend this hook instead, and use it for both the deviceless and the device specific ioctl commands. Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <nikolay@nvidia.com> Cc: bridge@lists.linux-foundation.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
561d8352 |
|
27-Jul-2021 |
Arnd Bergmann <arnd@arndb.de> |
bridge: use ndo_siocdevprivate The bridge driver has an old set of ioctls using the SIOCDEVPRIVATE namespace that have never worked in compat mode and are explicitly forbidden already. Move them over to ndo_siocdevprivate and fix compat mode for these, because we can. Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <nikolay@nvidia.com> Cc: bridge@lists.linux-foundation.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c5381154 |
|
23-Jul-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: bridge: fix build when setting skb->offload_fwd_mark with CONFIG_NET_SWITCHDEV=n Switchdev support can be disabled at compile time, and in that case, struct sk_buff will not contain the offload_fwd_mark field. To make the code in br_forward.c work in both cases, we do what is done in other places and we create a helper function, with an empty shim definition, that is implemented by the br_switchdev.o translation module. This is always compiled if and only if CONFIG_NET_SWITCHDEV is y or m. Reported-by: kernel test robot <lkp@intel.com> Fixes: 472111920f1c ("net: bridge: switchdev: allow the TX data plane forwarding to be offloaded") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
47211192 |
|
22-Jul-2021 |
Tobias Waldekranz <tobias@waldekranz.com> |
net: bridge: switchdev: allow the TX data plane forwarding to be offloaded Allow switchdevs to forward frames from the CPU in accordance with the bridge configuration in the same way as is done between bridge ports. This means that the bridge will only send a single skb towards one of the ports under the switchdev's control, and expects the driver to deliver the packet to all eligible ports in its domain. Primarily this improves the performance of multicast flows with multiple subscribers, as it allows the hardware to perform the frame replication. The basic flow between the driver and the bridge is as follows: - When joining a bridge port, the switchdev driver calls switchdev_bridge_port_offload() with tx_fwd_offload = true. - The bridge sends offloadable skbs to one of the ports under the switchdev's control using skb->offload_fwd_mark = true. - The switchdev driver checks the skb->offload_fwd_mark field and lets its FDB lookup select the destination port mask for this packet. v1->v2: - convert br_input_skb_cb::fwd_hwdoms to a plain unsigned long - introduce a static key "br_switchdev_fwd_offload_used" to minimize the impact of the newly introduced feature on all the setups which don't have hardware that can make use of it - introduce a check for nbp->flags & BR_FWD_OFFLOAD to optimize cache line access - reorder nbp_switchdev_frame_mark_accel() and br_handle_vlan() in __br_forward() - do not strip VLAN on egress if forwarding offload on VLAN-aware bridge is being used - propagate errors from .ndo_dfwd_add_station() if not EOPNOTSUPP v2->v3: - replace the solution based on .ndo_dfwd_add_station with a solution based on switchdev_bridge_port_offload - rename BR_FWD_OFFLOAD to BR_TX_FWD_OFFLOAD v3->v4: rebase v4->v5: - make sure the static key is decremented on bridge port unoffload - more function and variable renaming and comments for them: br_switchdev_fwd_offload_used to br_switchdev_tx_fwd_offload br_switchdev_accels_skb to br_switchdev_frame_uses_tx_fwd_offload nbp_switchdev_frame_mark_tx_fwd to nbp_switchdev_frame_mark_tx_fwd_to_hwdom nbp_switchdev_frame_mark_accel to nbp_switchdev_frame_mark_tx_fwd_offload fwd_accel to tx_fwd_offload Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4e51bf44 |
|
21-Jul-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: bridge: move the switchdev object replay helpers to "push" mode Starting with commit 4f2673b3a2b6 ("net: bridge: add helper to replay port and host-joined mdb entries"), DSA has introduced some bridge helpers that replay switchdev events (FDB/MDB/VLAN additions and deletions) that can be lost by the switchdev drivers in a variety of circumstances: - an IP multicast group was host-joined on the bridge itself before any switchdev port joined the bridge, leading to the host MDB entries missing in the hardware database. - during the bridge creation process, the MAC address of the bridge was added to the FDB as an entry pointing towards the bridge device itself, but with no switchdev ports being part of the bridge yet, this local FDB entry would remain unknown to the switchdev hardware database. - a VLAN/FDB/MDB was added to a bridge port that is a LAG interface, before any switchdev port joined that LAG, leading to the hardware database missing those entries. - a switchdev port left a LAG that is a bridge port, while the LAG remained part of the bridge, and all FDB/MDB/VLAN entries remained installed in the hardware database of the switchdev port. Also, since commit 0d2cfbd41c4a ("net: bridge: ignore switchdev events for LAG ports which didn't request replay"), DSA introduced a method, based on a const void *ctx, to ensure that two switchdev ports under the same LAG that is a bridge port do not see the same MDB/VLAN entry being replayed twice by the bridge, once for every bridge port that joins the LAG. With so many ordering corner cases being possible, it seems unreasonable to expect a switchdev driver writer to get it right from the first try. Therefore, now that DSA has experimented with the bridge replay helpers for a little bit, we can move the code to the bridge driver where it is more readily available to all switchdev drivers. To convert the switchdev object replay helpers from "pull mode" (where the driver asks for them) to a "push mode" (where the bridge offers them automatically), the biggest problem is that the bridge needs to be aware when a switchdev port joins and leaves, even when the switchdev is only indirectly a bridge port (for example when the bridge port is a LAG upper of the switchdev). Luckily, we already have a hook for that, in the form of the newly introduced switchdev_bridge_port_offload() and switchdev_bridge_port_unoffload() calls. These offer a natural place for hooking the object addition and deletion replays. Extend the above 2 functions with: - pointers to the switchdev atomic notifier (for FDB replays) and the blocking notifier (for MDB and VLAN replays). - the "const void *ctx" argument required for drivers to be able to disambiguate between which port is targeted, when multiple ports are lowers of the same LAG that is a bridge port. Most of the drivers pass NULL to this argument, except the ones that support LAG offload and have the proper context check already in place in the switchdev blocking notifier handler. Also unexport the replay helpers, since nobody except the bridge calls them directly now. Note that: (a) we abuse the terminology slightly, because FDB entries are not "switchdev objects", but we count them as objects nonetheless. With no direct way to prove it, I think they are not modeled as switchdev objects because those can only be installed by the bridge to the hardware (as opposed to FDB entries which can be propagated in the other direction too). This is merely an abuse of terms, FDB entries are replayed too, despite not being objects. (b) the bridge does not attempt to sync port attributes to newly joined ports, just the countable stuff (the objects). The reason for this is simple: no universal and symmetric way to sync and unsync them is known. For example, VLAN filtering: what to do on unsync, disable or leave it enabled? Similarly, STP state, ageing timer, etc etc. What a switchdev port does when it becomes standalone again is not really up to the bridge's competence, and the driver should deal with it. On the other hand, replaying deletions of switchdev objects can be seen a matter of cleanup and therefore be treated by the bridge, hence this patch. We make the replay helpers opt-in for drivers, because they might not bring immediate benefits for them: - nbp_vlan_init() is called _after_ netdev_master_upper_dev_link(), so br_vlan_replay() should not do anything for the new drivers on which we call it. The existing drivers where there was even a slight possibility for there to exist a VLAN on a bridge port before they join it are already guarded against this: mlxsw and prestera deny joining LAG interfaces that are members of a bridge. - br_fdb_replay() should now notify of local FDB entries, but I patched all drivers except DSA to ignore these new entries in commit 2c4eca3ef716 ("net: bridge: switchdev: include local flag in FDB notifications"). Driver authors can lift this restriction as they wish, and when they do, they can also opt into the FDB replay functionality. - br_mdb_replay() should fix a real issue which is described in commit 4f2673b3a2b6 ("net: bridge: add helper to replay port and host-joined mdb entries"). However most drivers do not offload the SWITCHDEV_OBJ_ID_HOST_MDB to see this issue: only cpsw and am65_cpsw offload this switchdev object, and I don't completely understand the way in which they offload this switchdev object anyway. So I'll leave it up to these drivers' respective maintainers to opt into br_mdb_replay(). So most of the drivers pass NULL notifier blocks for the replay helpers, except: - dpaa2-switch which was already acked/regression-tested with the helpers enabled (and there isn't much of a downside in having them) - ocelot which already had replay logic in "pull" mode - DSA which already had replay logic in "pull" mode An important observation is that the drivers which don't currently request bridge event replays don't even have the switchdev_bridge_port_{offload,unoffload} calls placed in proper places right now. This was done to avoid unnecessary rework for drivers which might never even add support for this. For driver writers who wish to add replay support, this can be used as a tentative placement guide: https://patchwork.kernel.org/project/netdevbpf/patch/20210720134655.892334-11-vladimir.oltean@nxp.com/ Cc: Vadym Kochan <vkochan@marvell.com> Cc: Taras Chornyi <tchornyi@marvell.com> Cc: Ioana Ciornei <ioana.ciornei@nxp.com> Cc: Lars Povlsen <lars.povlsen@microchip.com> Cc: Steen Hegelund <Steen.Hegelund@microchip.com> Cc: UNGLinuxDriver@microchip.com Cc: Claudiu Manoil <claudiu.manoil@nxp.com> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2f5dc00f |
|
21-Jul-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: bridge: switchdev: let drivers inform which bridge ports are offloaded On reception of an skb, the bridge checks if it was marked as 'already forwarded in hardware' (checks if skb->offload_fwd_mark == 1), and if it is, it assigns the source hardware domain of that skb based on the hardware domain of the ingress port. Then during forwarding, it enforces that the egress port must have a different hardware domain than the ingress one (this is done in nbp_switchdev_allowed_egress). Non-switchdev drivers don't report any physical switch id (neither through devlink nor .ndo_get_port_parent_id), therefore the bridge assigns them a hardware domain of 0, and packets coming from them will always have skb->offload_fwd_mark = 0. So there aren't any restrictions. Problems appear due to the fact that DSA would like to perform software fallback for bonding and team interfaces that the physical switch cannot offload. +-- br0 ---+ / / | \ / / | \ / | | bond0 / | | / \ swp0 swp1 swp2 swp3 swp4 There, it is desirable that the presence of swp3 and swp4 under a non-offloaded LAG does not preclude us from doing hardware bridging beteen swp0, swp1 and swp2. The bandwidth of the CPU is often times high enough that software bridging between {swp0,swp1,swp2} and bond0 is not impractical. But this creates an impossible paradox given the current way in which port hardware domains are assigned. When the driver receives a packet from swp0 (say, due to flooding), it must set skb->offload_fwd_mark to something. - If we set it to 0, then the bridge will forward it towards swp1, swp2 and bond0. But the switch has already forwarded it towards swp1 and swp2 (not to bond0, remember, that isn't offloaded, so as far as the switch is concerned, ports swp3 and swp4 are not looking up the FDB, and the entire bond0 is a destination that is strictly behind the CPU). But we don't want duplicated traffic towards swp1 and swp2, so it's not ok to set skb->offload_fwd_mark = 0. - If we set it to 1, then the bridge will not forward the skb towards the ports with the same switchdev mark, i.e. not to swp1, swp2 and bond0. Towards swp1 and swp2 that's ok, but towards bond0? It should have forwarded the skb there. So the real issue is that bond0 will be assigned the same hardware domain as {swp0,swp1,swp2}, because the function that assigns hardware domains to bridge ports, nbp_switchdev_add(), recurses through bond0's lower interfaces until it finds something that implements devlink (calls dev_get_port_parent_id with bool recurse = true). This is a problem because the fact that bond0 can be offloaded by swp3 and swp4 in our example is merely an assumption. A solution is to give the bridge explicit hints as to what hardware domain it should use for each port. Currently, the bridging offload is very 'silent': a driver registers a netdevice notifier, which is put on the netns's notifier chain, and which sniffs around for NETDEV_CHANGEUPPER events where the upper is a bridge, and the lower is an interface it knows about (one registered by this driver, normally). Then, from within that notifier, it does a bunch of stuff behind the bridge's back, without the bridge necessarily knowing that there's somebody offloading that port. It looks like this: ip link set swp0 master br0 | v br_add_if() calls netdev_master_upper_dev_link() | v call_netdevice_notifiers | v dsa_slave_netdevice_event | v oh, hey! it's for me! | v .port_bridge_join What we do to solve the conundrum is to be less silent, and change the switchdev drivers to present themselves to the bridge. Something like this: ip link set swp0 master br0 | v br_add_if() calls netdev_master_upper_dev_link() | v bridge: Aye! I'll use this call_netdevice_notifiers ^ ppid as the | | hardware domain for v | this port, and zero dsa_slave_netdevice_event | if I got nothing. | | v | oh, hey! it's for me! | | | v | .port_bridge_join | | | +------------------------+ switchdev_bridge_port_offload(swp0, swp0) Then stacked interfaces (like bond0 on top of swp3/swp4) would be treated differently in DSA, depending on whether we can or cannot offload them. The offload case: ip link set bond0 master br0 | v br_add_if() calls netdev_master_upper_dev_link() | v bridge: Aye! I'll use this call_netdevice_notifiers ^ ppid as the | | switchdev mark for v | bond0. dsa_slave_netdevice_event | Coincidentally (or not), | | bond0 and swp0, swp1, swp2 v | all have the same switchdev hmm, it's not quite for me, | mark now, since the ASIC but my driver has already | is able to forward towards called .port_lag_join | all these ports in hw. for it, because I have | a port with dp->lag_dev == bond0. | | | v | .port_bridge_join | for swp3 and swp4 | | | +------------------------+ switchdev_bridge_port_offload(bond0, swp3) switchdev_bridge_port_offload(bond0, swp4) And the non-offload case: ip link set bond0 master br0 | v br_add_if() calls netdev_master_upper_dev_link() | v bridge waiting: call_netdevice_notifiers ^ huh, switchdev_bridge_port_offload | | wasn't called, okay, I'll use a v | hwdom of zero for this one. dsa_slave_netdevice_event : Then packets received on swp0 will | : not be software-forwarded towards v : swp1, but they will towards bond0. it's not for me, but bond0 is an upper of swp3 and swp4, but their dp->lag_dev is NULL because they couldn't offload it. Basically we can draw the conclusion that the lowers of a bridge port can come and go, so depending on the configuration of lowers for a bridge port, it can dynamically toggle between offloaded and unoffloaded. Therefore, we need an equivalent switchdev_bridge_port_unoffload too. This patch changes the way any switchdev driver interacts with the bridge. From now on, everybody needs to call switchdev_bridge_port_offload and switchdev_bridge_port_unoffload, otherwise the bridge will treat the port as non-offloaded and allow software flooding to other ports from the same ASIC. Note that these functions lay the ground for a more complex handshake between switchdev drivers and the bridge in the future. For drivers that will request a replay of the switchdev objects when they offload and unoffload a bridge port (DSA, dpaa2-switch, ocelot), we place the call to switchdev_bridge_port_unoffload() strategically inside the NETDEV_PRECHANGEUPPER notifier's code path, and not inside NETDEV_CHANGEUPPER. This is because the switchdev object replay helpers need the netdev adjacency lists to be valid, and that is only true in NETDEV_PRECHANGEUPPER. Cc: Vadym Kochan <vkochan@marvell.com> Cc: Taras Chornyi <tchornyi@marvell.com> Cc: Ioana Ciornei <ioana.ciornei@nxp.com> Cc: Lars Povlsen <lars.povlsen@microchip.com> Cc: Steen Hegelund <Steen.Hegelund@microchip.com> Cc: UNGLinuxDriver@microchip.com Cc: Claudiu Manoil <claudiu.manoil@nxp.com> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch: regression Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch Tested-by: Horatiu Vultur <horatiu.vultur@microchip.com> # ocelot-switch Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
85826610 |
|
21-Jul-2021 |
Tobias Waldekranz <tobias@waldekranz.com> |
net: bridge: switchdev: recycle unused hwdoms Since hwdoms have only been used thus far for equality comparisons, the bridge has used the simplest possible assignment policy; using a counter to keep track of the last value handed out. With the upcoming transmit offloading, we need to perform set operations efficiently based on hwdoms, e.g. we want to answer questions like "has this skb been forwarded to any port within this hwdom?" Move to a bitmap-based allocation scheme that recycles hwdoms once all members leaves the bridge. This means that we can use a single unsigned long to keep track of the hwdoms that have received an skb. v1->v2: convert the typedef DECLARE_BITMAP(br_hwdom_map_t, BR_HWDOM_MAX) into a plain unsigned long. v2->v6: none Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f7cf972f |
|
21-Jul-2021 |
Tobias Waldekranz <tobias@waldekranz.com> |
net: bridge: disambiguate offload_fwd_mark Before this change, four related - but distinct - concepts where named offload_fwd_mark: - skb->offload_fwd_mark: Set by the switchdev driver if the underlying hardware has already forwarded this frame to the other ports in the same hardware domain. - nbp->offload_fwd_mark: An idetifier used to group ports that share the same hardware forwarding domain. - br->offload_fwd_mark: Counter used to make sure that unique IDs are used in cases where a bridge contains ports from multiple hardware domains. - skb->cb->offload_fwd_mark: The hardware domain on which the frame ingressed and was forwarded. Introduce the term "hardware forwarding domain" ("hwdom") in the bridge to denote a set of ports with the following property: If an skb with skb->offload_fwd_mark set, is received on a port belonging to hwdom N, that frame has already been forwarded to all other ports in hwdom N. By decoupling the name from "offload_fwd_mark", we can extend the term's definition in the future - e.g. to add constraints that describe expected egress behavior - without overloading the meaning of "offload_fwd_mark". - nbp->offload_fwd_mark thus becomes nbp->hwdom. - br->offload_fwd_mark becomes br->last_hwdom. - skb->cb->offload_fwd_mark becomes skb->cb->src_hwdom. The slight change in naming here mandates a slight change in behavior of the nbp_switchdev_frame_mark() function. Previously, it only set this value in skb->cb for packets with skb->offload_fwd_mark true (ones which were forwarded in hardware). Whereas now we always track the incoming hwdom for all packets coming from a switchdev (even for the packets which weren't forwarded in hardware, such as STP BPDUs, IGMP reports etc). As all uses of skb->cb->offload_fwd_mark were already gated behind checks of skb->offload_fwd_mark, this will not introduce any functional change, but it paves the way for future changes where the ingressing hwdom must be known for frames coming from a switchdev regardless of whether they were forwarded in hardware or not (basically, if the skb comes from a switchdev, skb->cb->src_hwdom now always tracks which one). A typical example where this is relevant: the switchdev has a fixed configuration to trap STP BPDUs, but STP is not running on the bridge and the group_fwd_mask allows them to be forwarded. Say we have this setup: br0 / | \ / | \ swp0 swp1 swp2 A BPDU comes in on swp0 and is trapped to the CPU; the driver does not set skb->offload_fwd_mark. The bridge determines that the frame should be forwarded to swp{1,2}. It is imperative that forward offloading is _not_ allowed in this case, as the source hwdom is already "poisoned". Recording the source hwdom allows this case to be handled properly. v2->v3: added code comments v3->v6: none Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
58d913a3 |
|
21-Jul-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: multicast: add context support for host-joined groups Adding bridge multicast context support for host-joined groups is easy because we only need the proper timer value. We pass the already chosen context and use its timer value. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9dee572c |
|
19-Jul-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: vlan: add mcast snooping control Add a new global vlan option which controls whether multicast snooping is enabled or disabled for a single vlan. It controls the vlan private flag: BR_VLFLAG_GLOBAL_MCAST_ENABLED. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
743a53d9 |
|
19-Jul-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: vlan: add support for dumping global vlan options Add a new vlan options dump flag which causes only global vlan options to be dumped. The dumps are done only with bridge devices, ports are ignored. They support vlan compression if the options in sequential vlans are equal (currently always true). Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
47ecd2db |
|
19-Jul-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: vlan: add support for global options We can have two types of vlan options depending on context: - per-device vlan options (split in per-bridge and per-port) - global vlan options The second type wasn't supported in the bridge until now, but we need them for per-vlan multicast support, per-vlan STP support and other options which require global vlan context. They are contained in the global bridge vlan context even if the vlan is not configured on the bridge device itself. This patch adds initial netlink attributes and support for setting these global vlan options, they can only be set (RTM_NEWVLAN) and the operation must use the bridge device. Since there are no such options yet it shouldn't have any functional effect. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1e9ca456 |
|
19-Jul-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: multicast: include router port vlan id in notifications Use the port multicast context to check if the router port is a vlan and in case it is include its vlan id in the notification. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4cdd0d10 |
|
19-Jul-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: multicast: check if should use vlan mcast ctx Add helpers which check if the current bridge/port multicast context should be used (i.e. they're not disabled) and use them for Rx IGMP/MLD processing, timers and new group addition. It is important for vlans to disable processing of timer/packet after the multicast_lock is obtained if the vlan context doesn't have BR_VLFLAG_MCAST_ENABLED. There are two cases when that flag is missing: - if the vlan is getting destroyed it will be removed and timers will be stopped - if the vlan mcast snooping is being disabled Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f4b7002a |
|
19-Jul-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: add vlan mcast snooping knob Add a global knob that controls if vlan multicast snooping is enabled. The proper contexts (vlan or bridge-wide) will be chosen based on the knob when processing packets and changing bridge device state. Note that vlans have their individual mcast snooping enabled by default, but this knob is needed to turn on bridge vlan snooping. It is disabled by default. To enable the knob vlan filtering must also be enabled, it doesn't make sense to have vlan mcast snooping without vlan filtering since that would lead to inconsistencies. Disabling vlan filtering will also automatically disable vlan mcast snooping. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7b54aaaf |
|
19-Jul-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: multicast: add vlan state initialization and control Add helpers to enable/disable vlan multicast based on its flags, we need two flags because we need to know if the vlan has multicast enabled globally (user-controlled) and if it has it enabled on the specific device (bridge or port). The new private vlan flags are: - BR_VLFLAG_MCAST_ENABLED: locally enabled multicast on the device, used when removing a vlan, toggling vlan mcast snooping and controlling single vlan (kernel-controlled, valid under RTNL and multicast_lock) - BR_VLFLAG_GLOBAL_MCAST_ENABLED: globally enabled multicast for the vlan, used to control the bridge-wide vlan mcast snooping for a single vlan (user-controlled, can be checked under any context) Bridge vlan contexts are created with multicast snooping enabled by default to be in line with the current bridge snooping defaults. In order to actually activate per vlan snooping and context usage a bridge-wide knob will be added later which will default to disabled. If that knob is enabled then automatically all vlan snooping will be enabled. All vlan contexts are initialized with the current bridge multicast context defaults. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
613d61db |
|
19-Jul-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: vlan: add global and per-port multicast context Add global and per-port vlan multicast context, only initialized but still not used. No functional changes intended. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
adc47037 |
|
19-Jul-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: multicast: use multicast contexts instead of bridge or port Pass multicast context pointers to multicast functions instead of bridge/port. This would make it easier later to switch these contexts to their per-vlan versions. The patch is basically search and replace, no functional changes. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d3d065c0 |
|
19-Jul-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: multicast: factor out bridge multicast context Factor out the bridge's global multicast context into a separate structure which will later be used for per-vlan global context. No functional changes intended. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9632233e |
|
19-Jul-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: multicast: factor out port multicast context Factor out the port's multicast context into a separate structure which will later be shared for per-port,vlan context. No functional changes intended. We need the structure even if bridge multicast is not defined to pass down as pointer to forwarding functions. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
45a68787 |
|
10-Aug-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: fix flags interpretation for extern learn fdb entries Ignore fdb flags when adding port extern learn entries and always set BR_FDB_LOCAL flag when adding bridge extern learn entries. This is closest to the behaviour we had before and avoids breaking any use cases which were allowed. This patch fixes iproute2 calls which assume NUD_PERMANENT and were allowed before, example: $ bridge fdb add 00:11:22:33:44:55 dev swp1 extern_learn Extern learn entries are allowed to roam, but do not expire, so static or dynamic flags make no sense for them. Also add a comment for future reference. Fixes: eb100e0e24a2 ("net: bridge: allow to add externally learned entries from user-space") Fixes: 0541a6293298 ("net: bridge: validate the NUD_PERMANENT bit when adding an extern_learn FDB entry") Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20210810110010.43859-1-razor@blackwall.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
0541a629 |
|
01-Aug-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: bridge: validate the NUD_PERMANENT bit when adding an extern_learn FDB entry Currently it is possible to add broken extern_learn FDB entries to the bridge in two ways: 1. Entries pointing towards the bridge device that are not local/permanent: ip link add br0 type bridge bridge fdb add 00:01:02:03:04:05 dev br0 self extern_learn static 2. Entries pointing towards the bridge device or towards a port that are marked as local/permanent, however the bridge does not process the 'permanent' bit in any way, therefore they are recorded as though they aren't permanent: ip link add br0 type bridge bridge fdb add 00:01:02:03:04:05 dev br0 self extern_learn permanent Since commit 52e4bec15546 ("net: bridge: switchdev: treat local FDBs the same as entries towards the bridge"), these incorrect FDB entries can even trigger NULL pointer dereferences inside the kernel. This is because that commit made the assumption that all FDB entries that are not local/permanent have a valid destination port. For context, local / permanent FDB entries either have fdb->dst == NULL, and these point towards the bridge device and are therefore local and not to be used for forwarding, or have fdb->dst == a net_bridge_port structure (but are to be treated in the same way, i.e. not for forwarding). That assumption _is_ correct as long as things are working correctly in the bridge driver, i.e. we cannot logically have fdb->dst == NULL under any circumstance for FDB entries that are not local. However, the extern_learn code path where FDB entries are managed by a user space controller show that it is possible for the bridge kernel driver to misinterpret the NUD flags of an entry transmitted by user space, and end up having fdb->dst == NULL while not being a local entry. This is invalid and should be rejected. Before, the two commands listed above both crashed the kernel in this check from br_switchdev_fdb_notify: struct net_device *dev = info.is_local ? br->dev : dst->dev; info.is_local == false, dst == NULL. After this patch, the invalid entry added by the first command is rejected: ip link add br0 type bridge && bridge fdb add 00:01:02:03:04:05 dev br0 self extern_learn static; ip link del br0 Error: bridge: FDB entry towards bridge must be permanent. and the valid entry added by the second command is properly treated as a local address and does not crash br_switchdev_fdb_notify anymore: ip link add br0 type bridge && bridge fdb add 00:01:02:03:04:05 dev br0 self extern_learn permanent; ip link del br0 Fixes: eb100e0e24a2 ("net: bridge: allow to add externally learned entries from user-space") Reported-by: syzbot+9ba1174359adba5a5b7c@syzkaller.appspotmail.com Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20210801231730.7493-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
6eb38bf8 |
|
29-Jun-2021 |
Tobias Waldekranz <tobias@waldekranz.com> |
net: bridge: switchdev: send FDB notifications for host addresses Treat addresses added to the bridge itself in the same way as regular ports and send out a notification so that drivers may sync it down to the hardware FDB. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bbc6f2cc |
|
14-May-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: fix br_multicast_is_router stub when igmp is disabled br_multicast_is_router takes two arguments when bridge IGMP is enabled and just one when it's disabled, fix the stub to take two as well. Fixes: 1a3065a26807 ("net: bridge: mcast: prepare is-router function for mcast router split") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Acked-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a3c02e76 |
|
13-May-2021 |
Linus Lüssing <linus.luessing@c0d3.blue> |
net: bridge: mcast: split multicast router state for IPv4 and IPv6 A multicast router for IPv4 does not imply that the same host also is a multicast router for IPv6 and vice versa. To reduce multicast traffic when a host is only a multicast router for one of these two protocol families, keep router state for IPv4 and IPv6 separately. Similar to how querier state is kept separately. For backwards compatibility for netlink and switchdev notifications these two will still only notify if a port switched from either no IPv4/IPv6 multicast router to any IPv4/IPv6 multicast router or the other way round. However a full netlink MDB router dump will now also include a multicast router timeout for both IPv4 and IPv6. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1a3065a2 |
|
13-May-2021 |
Linus Lüssing <linus.luessing@c0d3.blue> |
net: bridge: mcast: prepare is-router function for mcast router split In preparation for the upcoming split of multicast router state into their IPv4 and IPv6 variants make br_multicast_is_router() protocol family aware. Note that for now br_ip6_multicast_is_router() uses the currently still common ip4_mc_router_timer for now. It will be renamed to ip6_mc_router_timer later when the split is performed. While at it also renames the "1" and "2" constants in br_multicast_is_router() to the MDB_RTR_TYPE_TEMP_QUERY and MDB_RTR_TYPE_PERM enums. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
44ebb081 |
|
13-May-2021 |
Linus Lüssing <linus.luessing@c0d3.blue> |
net: bridge: mcast: add wrappers for router node retrieval In preparation for the upcoming split of multicast router state into their IPv4 and IPv6 variants and to avoid IPv6 #ifdef clutter later add two wrapper functions for router node retrieval in the payload forwarding code. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ce6f7097 |
|
13-May-2021 |
Linus Lüssing <linus.luessing@c0d3.blue> |
net: bridge: mcast: rename multicast router lists and timers In preparation for the upcoming split of multicast router state into their IPv4 and IPv6 variants, rename the affected variable to the IPv4 version first to avoid some renames in later commits. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
58e20717 |
|
10-Jun-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: fix vlan tunnel dst null pointer dereference This patch fixes a tunnel_dst null pointer dereference due to lockless access in the tunnel egress path. When deleting a vlan tunnel the tunnel_dst pointer is set to NULL without waiting a grace period (i.e. while it's still usable) and packets egressing are dereferencing it without checking. Use READ/WRITE_ONCE to annotate the lockless use of tunnel_id, use RCU for accessing tunnel_dst and make sure it is read only once and checked in the egress path. The dst is already properly RCU protected so we don't need to do anything fancy than to make sure tunnel_id and tunnel_dst are read only once and checked in the egress path. Cc: stable@vger.kernel.org Fixes: 11538d039ac6 ("bridge: vlan dst_metadata hooks in ingress and egress paths") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ae1ea84b |
|
14-Apr-2021 |
Florian Fainelli <f.fainelli@gmail.com> |
net: bridge: propagate error code and extack from br_mc_disabled_update Some Ethernet switches might only be able to support disabling multicast snooping globally, which is an issue for example when several bridges span the same physical device and request contradictory settings. Propagate the return value of br_mc_disabled_update() such that this limitation is transmitted correctly to user-space. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bcf2766b |
|
23-Mar-2021 |
Felix Fietkau <nbd@nbd.name> |
net: bridge: resolve forwarding path for VLAN tag actions in bridge devices Depending on the VLAN settings of the bridge and the port, the bridge can either add or remove a tag. When vlan filtering is enabled, the fdb lookup also needs to know the VLAN tag/proto for the destination address To provide this, keep track of the stack of VLAN tags for the path in the lookup context Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c97f47e3 |
|
15-Feb-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: bridge: fix br_vlan_filter_toggle stub when CONFIG_BRIDGE_VLAN_FILTERING=n The prototype of br_vlan_filter_toggle was updated to include a netlink extack, but the stub definition wasn't, which results in a build error when CONFIG_BRIDGE_VLAN_FILTERING=n. Fixes: 9e781401cbfc ("net: bridge: propagate extack through store_bridge_parm") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dcbdf135 |
|
13-Feb-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: bridge: propagate extack through switchdev_port_attr_set The benefit is the ability to propagate errors from switchdev drivers for the SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING and SWITCHDEV_ATTR_ID_BRIDGE_VLAN_PROTOCOL attributes. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9e781401 |
|
13-Feb-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: bridge: propagate extack through store_bridge_parm The bridge sysfs interface stores parameters for the STP, VLAN, multicast etc subsystems using a predefined function prototype. Sometimes the underlying function being called supports a netlink extended ack message, and we ignore it. Let's expand the store_bridge_parm function prototype to include the extack, and just print it to console, but at least propagate it where applicable. Where not applicable, create a shim function in the br_sysfs_br.c file that discards the extra function argument. This patch allows us to propagate the extack argument to br_vlan_set_default_pvid, br_vlan_set_proto and br_vlan_filter_toggle, and from there, further up in br_changelink from br_netlink.c. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7a572964 |
|
13-Feb-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: bridge: remove __br_vlan_filter_toggle This function is identical with br_vlan_filter_toggle. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
078bbb85 |
|
12-Feb-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: bridge: don't print in br_switchdev_set_port_flag For the netlink interface, propagate errors through extack rather than simply printing them to the console. For the sysfs interface, we still print to the console, but at least that's one layer higher than in switchdev, which also allows us to silently ignore the offloading of flags if that is ever needed in the future. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
89268b05 |
|
26-Jan-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: multicast: add per-port EHT hosts limit Add a default limit of 512 for number of tracked EHT hosts per-port. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
d5a10222 |
|
20-Jan-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: multicast: mark IGMPv3/MLDv2 fast-leave deletes Mark groups which were deleted due to fast leave/EHT. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
474ddb37 |
|
20-Jan-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: multicast: add EHT allow/block handling Add support for IGMPv3/MLDv2 allow/block EHT handling. Similar to how the reports are processed we have 2 cases when the group is in include or exclude mode, these are processed as follows: - group include - allow: create missing entries - block: remove existing matching entries and remove the corresponding S,G entries if there are no more set host entries, then possibly delete the whole group if there are no more S,G entries - group exclude - allow - host include: create missing entries - host exclude: remove existing matching entries and remove the corresponding S,G entries if there are no more set host entries - block - host include: remove existing matching entries and remove the corresponding S,G entries if there are no more set host entries, then possibly delete the whole group if there are no more S,G entries - host exclude: create missing entries Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
8f07b831 |
|
20-Jan-2021 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: multicast: add EHT structures and definitions Add EHT structures for tracking hosts and sources per group. We keep one set for each host which has all of the host's S,G entries, and one set for each multicast source which has all hosts that have joined that S,G. For each host, source entry we record the filter_mode and we keep an expiry timer. There is also one global expiry timer per source set, it is updated with each set entry update, it will be later used to lower the set's timer instead of lowering each entry's timer separately. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
851d0a73 |
|
04-Dec-2020 |
Joseph Huang <Joseph.Huang@garmin.com> |
bridge: Fix a deadlock when enabling multicast snooping When enabling multicast snooping, bridge module deadlocks on multicast_lock if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2 network. The deadlock was caused by the following sequence: While holding the lock, br_multicast_open calls br_multicast_join_snoopers, which eventually causes IP stack to (attempt to) send out a Listener Report (in igmp6_join_group). Since the destination Ethernet address is a multicast address, br_dev_xmit feeds the packet back to the bridge via br_multicast_rcv, which in turn calls br_multicast_add_group, which then deadlocks on multicast_lock. The fix is to move the call br_multicast_join_snoopers outside of the critical section. This works since br_multicast_join_snoopers only deals with IP and does not modify any multicast data structures of the bridge, so there's no need to hold the lock. Steps to reproduce: 1. sysctl net.ipv6.conf.all.force_mld_version=1 2. have another querier 3. ip link set dev bridge type bridge mcast_snooping 0 && \ ip link set dev bridge type bridge mcast_snooping 1 < deadlock > A typical call trace looks like the following: [ 936.251495] _raw_spin_lock+0x5c/0x68 [ 936.255221] br_multicast_add_group+0x40/0x170 [bridge] [ 936.260491] br_multicast_rcv+0x7ac/0xe30 [bridge] [ 936.265322] br_dev_xmit+0x140/0x368 [bridge] [ 936.269689] dev_hard_start_xmit+0x94/0x158 [ 936.273876] __dev_queue_xmit+0x5ac/0x7f8 [ 936.277890] dev_queue_xmit+0x10/0x18 [ 936.281563] neigh_resolve_output+0xec/0x198 [ 936.285845] ip6_finish_output2+0x240/0x710 [ 936.290039] __ip6_finish_output+0x130/0x170 [ 936.294318] ip6_output+0x6c/0x1c8 [ 936.297731] NF_HOOK.constprop.0+0xd8/0xe8 [ 936.301834] igmp6_send+0x358/0x558 [ 936.305326] igmp6_join_group.part.0+0x30/0xf0 [ 936.309774] igmp6_group_added+0xfc/0x110 [ 936.313787] __ipv6_dev_mc_inc+0x1a4/0x290 [ 936.317885] ipv6_dev_mc_inc+0x10/0x18 [ 936.321677] br_multicast_open+0xbc/0x110 [bridge] [ 936.326506] br_multicast_toggle+0xec/0x140 [bridge] Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address") Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20201204235628.50653-1-Joseph.Huang@garmin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
7609ecb2 |
|
19-Nov-2020 |
Heiner Kallweit <hkallweit1@gmail.com> |
net: bridge: switch to net core statistics counters handling Use netdev->tstats instead of a member of net_bridge for storing a pointer to the per-cpu counters. This allows us to use core functionality for statistics handling. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/9bad2be2-fd84-7c6e-912f-cee433787018@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
281cc284 |
|
17-Nov-2020 |
Heiner Kallweit <hkallweit1@gmail.com> |
net: bridge: replace struct br_vlan_stats with pcpu_sw_netstats Struct br_vlan_stats duplicates pcpu_sw_netstats (apart from br_vlan_stats not defining an alignment requirement), therefore switch to using the latter one. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/04d25c3d-c5f6-3611-6d37-c2f40243dae2@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
0169b820 |
|
06-Nov-2020 |
Horatiu Vultur <horatiu.vultur@microchip.com> |
bridge: mrp: Use hlist_head instead of list_head for mrp Replace list_head with hlist_head for MRP list under the bridge. There is no need for a circular list when a linear list will work. This will also decrease the size of 'struct net_bridge'. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://lore.kernel.org/r/20201106215049.1448185-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
c43fd36f |
|
31-Oct-2020 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: bridge: mcast: fix stub definition of br_multicast_querier_exists The commit cited below has changed only the functional prototype of br_multicast_querier_exists, but forgot to do that for the stub prototype (the one where CONFIG_BRIDGE_IGMP_SNOOPING is disabled). Fixes: 955062b03fa6 ("net: bridge: mcast: add support for raw L2 multicast groups") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20201101000845.190009-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
955062b0 |
|
28-Oct-2020 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: mcast: add support for raw L2 multicast groups Extend the bridge multicast control and data path to configure routes for L2 (non-IP) multicast groups. The uapi struct br_mdb_entry union u is extended with another variant, mac_addr, which does not change the structure size, and which is valid when the proto field is zero. To be compatible with the forwarding code that is already in place, which acts as an IGMP/MLD snooping bridge with querier capabilities, we need to declare that for L2 MDB entries (for which there exists no such thing as IGMP/MLD snooping/querying), that there is always a querier. Otherwise, these entries would be flooded to all bridge ports and not just to those that are members of the L2 multicast group. Needless to say, only permanent L2 multicast groups can be installed on a bridge port. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20201028233831.610076-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
b6d0425b |
|
27-Oct-2020 |
Henrik Bjoernlund <henrik.bjoernlund@microchip.com> |
bridge: cfm: Netlink Notifications. This is the implementation of Netlink notifications out of CFM. Notifications are initiated whenever a state change happens in CFM. IFLA_BRIDGE_CFM: Points to the CFM information. IFLA_BRIDGE_CFM_MEP_STATUS_INFO: This indicate that the MEP instance status are following. IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO: This indicate that the peer MEP status are following. CFM nested attribute has the following attributes in next level. IFLA_BRIDGE_CFM_MEP_STATUS_INSTANCE: The MEP instance number of the delivered status. The type is NLA_U32. IFLA_BRIDGE_CFM_MEP_STATUS_OPCODE_UNEXP_SEEN: The MEP instance received CFM PDU with unexpected Opcode. The type is NLA_U32 (bool). IFLA_BRIDGE_CFM_MEP_STATUS_VERSION_UNEXP_SEEN: The MEP instance received CFM PDU with unexpected version. The type is NLA_U32 (bool). IFLA_BRIDGE_CFM_MEP_STATUS_RX_LEVEL_LOW_SEEN: The MEP instance received CCM PDU with MD level lower than configured level. This frame is discarded. The type is NLA_U32 (bool). IFLA_BRIDGE_CFM_CC_PEER_STATUS_INSTANCE: The MEP instance number of the delivered status. The type is NLA_U32. IFLA_BRIDGE_CFM_CC_PEER_STATUS_PEER_MEPID: The added Peer MEP ID of the delivered status. The type is NLA_U32. IFLA_BRIDGE_CFM_CC_PEER_STATUS_CCM_DEFECT: The CCM defect status. The type is NLA_U32 (bool). True means no CCM frame is received for 3.25 intervals. IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL. IFLA_BRIDGE_CFM_CC_PEER_STATUS_RDI: The last received CCM PDU RDI. The type is NLA_U32 (bool). IFLA_BRIDGE_CFM_CC_PEER_STATUS_PORT_TLV_VALUE: The last received CCM PDU Port Status TLV value field. The type is NLA_U8. IFLA_BRIDGE_CFM_CC_PEER_STATUS_IF_TLV_VALUE: The last received CCM PDU Interface Status TLV value field. The type is NLA_U8. IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEEN: A CCM frame has been received from Peer MEP. The type is NLA_U32 (bool). This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO. IFLA_BRIDGE_CFM_CC_PEER_STATUS_TLV_SEEN: A CCM frame with TLV has been received from Peer MEP. The type is NLA_U32 (bool). This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO. IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEQ_UNEXP_SEEN: A CCM frame with unexpected sequence number has been received from Peer MEP. The type is NLA_U32 (bool). When a sequence number is not one higher than previously received then it is unexpected. This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
e77824d8 |
|
27-Oct-2020 |
Henrik Bjoernlund <henrik.bjoernlund@microchip.com> |
bridge: cfm: Netlink GET status Interface. This is the implementation of CFM netlink status get information interface. Add new nested netlink attributes. These attributes are used by the user space to get status information. GETLINK: Request filter RTEXT_FILTER_CFM_STATUS: Indicating that CFM status information must be delivered. IFLA_BRIDGE_CFM: Points to the CFM information. IFLA_BRIDGE_CFM_MEP_STATUS_INFO: This indicate that the MEP instance status are following. IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO: This indicate that the peer MEP status are following. CFM nested attribute has the following attributes in next level. GETLINK RTEXT_FILTER_CFM_STATUS: IFLA_BRIDGE_CFM_MEP_STATUS_INSTANCE: The MEP instance number of the delivered status. The type is u32. IFLA_BRIDGE_CFM_MEP_STATUS_OPCODE_UNEXP_SEEN: The MEP instance received CFM PDU with unexpected Opcode. The type is u32 (bool). IFLA_BRIDGE_CFM_MEP_STATUS_VERSION_UNEXP_SEEN: The MEP instance received CFM PDU with unexpected version. The type is u32 (bool). IFLA_BRIDGE_CFM_MEP_STATUS_RX_LEVEL_LOW_SEEN: The MEP instance received CCM PDU with MD level lower than configured level. This frame is discarded. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_PEER_STATUS_INSTANCE: The MEP instance number of the delivered status. The type is u32. IFLA_BRIDGE_CFM_CC_PEER_STATUS_PEER_MEPID: The added Peer MEP ID of the delivered status. The type is u32. IFLA_BRIDGE_CFM_CC_PEER_STATUS_CCM_DEFECT: The CCM defect status. The type is u32 (bool). True means no CCM frame is received for 3.25 intervals. IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL. IFLA_BRIDGE_CFM_CC_PEER_STATUS_RDI: The last received CCM PDU RDI. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_PEER_STATUS_PORT_TLV_VALUE: The last received CCM PDU Port Status TLV value field. The type is u8. IFLA_BRIDGE_CFM_CC_PEER_STATUS_IF_TLV_VALUE: The last received CCM PDU Interface Status TLV value field. The type is u8. IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEEN: A CCM frame has been received from Peer MEP. The type is u32 (bool). This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO. IFLA_BRIDGE_CFM_CC_PEER_STATUS_TLV_SEEN: A CCM frame with TLV has been received from Peer MEP. The type is u32 (bool). This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO. IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEQ_UNEXP_SEEN: A CCM frame with unexpected sequence number has been received from Peer MEP. The type is u32 (bool). When a sequence number is not one higher than previously received then it is unexpected. This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
5e312fc0 |
|
27-Oct-2020 |
Henrik Bjoernlund <henrik.bjoernlund@microchip.com> |
bridge: cfm: Netlink GET configuration Interface. This is the implementation of CFM netlink configuration get information interface. Add new nested netlink attributes. These attributes are used by the user space to get configuration information. GETLINK: Request filter RTEXT_FILTER_CFM_CONFIG: Indicating that CFM configuration information must be delivered. IFLA_BRIDGE_CFM: Points to the CFM information. IFLA_BRIDGE_CFM_MEP_CREATE_INFO: This indicate that MEP instance create parameters are following. IFLA_BRIDGE_CFM_MEP_CONFIG_INFO: This indicate that MEP instance config parameters are following. IFLA_BRIDGE_CFM_CC_CONFIG_INFO: This indicate that MEP instance CC functionality parameters are following. IFLA_BRIDGE_CFM_CC_RDI_INFO: This indicate that CC transmitted CCM PDU RDI parameters are following. IFLA_BRIDGE_CFM_CC_CCM_TX_INFO: This indicate that CC transmitted CCM PDU parameters are following. IFLA_BRIDGE_CFM_CC_PEER_MEP_INFO: This indicate that the added peer MEP IDs are following. CFM nested attribute has the following attributes in next level. GETLINK RTEXT_FILTER_CFM_CONFIG: IFLA_BRIDGE_CFM_MEP_CREATE_INSTANCE: The created MEP instance number. The type is u32. IFLA_BRIDGE_CFM_MEP_CREATE_DOMAIN: The created MEP domain. The type is u32 (br_cfm_domain). It must be BR_CFM_PORT. This means that CFM frames are transmitted and received directly on the port - untagged. Not in a VLAN. IFLA_BRIDGE_CFM_MEP_CREATE_DIRECTION: The created MEP direction. The type is u32 (br_cfm_mep_direction). It must be BR_CFM_MEP_DIRECTION_DOWN. This means that CFM frames are transmitted and received on the port. Not in the bridge. IFLA_BRIDGE_CFM_MEP_CREATE_IFINDEX: The created MEP residence port ifindex. The type is u32 (ifindex). IFLA_BRIDGE_CFM_MEP_DELETE_INSTANCE: The deleted MEP instance number. The type is u32. IFLA_BRIDGE_CFM_MEP_CONFIG_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_MEP_CONFIG_UNICAST_MAC: The configured MEP unicast MAC address. The type is 6*u8 (array). This is used as SMAC in all transmitted CFM frames. IFLA_BRIDGE_CFM_MEP_CONFIG_MDLEVEL: The configured MEP unicast MD level. The type is u32. It must be in the range 1-7. No CFM frames are passing through this MEP on lower levels. IFLA_BRIDGE_CFM_MEP_CONFIG_MEPID: The configured MEP ID. The type is u32. It must be in the range 0-0x1FFF. This MEP ID is inserted in any transmitted CCM frame. IFLA_BRIDGE_CFM_CC_CONFIG_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_CONFIG_ENABLE: The Continuity Check (CC) functionality is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL: The CC expected receive interval of CCM frames. The type is u32 (br_cfm_ccm_interval). This is also the transmission interval of CCM frames when enabled. IFLA_BRIDGE_CFM_CC_CONFIG_EXP_MAID: The CC expected receive MAID in CCM frames. The type is CFM_MAID_LENGTH*u8. This is MAID is also inserted in transmitted CCM frames. IFLA_BRIDGE_CFM_CC_PEER_MEP_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_PEER_MEPID: The CC Peer MEP ID added. The type is u32. When a Peer MEP ID is added and CC is enabled it is expected to receive CCM frames from that Peer MEP. IFLA_BRIDGE_CFM_CC_RDI_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_RDI_RDI: The RDI that is inserted in transmitted CCM PDU. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_CCM_TX_DMAC: The transmitted CCM frame destination MAC address. The type is 6*u8 (array). This is used as DMAC in all transmitted CFM frames. IFLA_BRIDGE_CFM_CC_CCM_TX_SEQ_NO_UPDATE: The transmitted CCM frame update (increment) of sequence number is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_PERIOD: The period of time where CCM frame are transmitted. The type is u32. The time is given in seconds. SETLINK IFLA_BRIDGE_CFM_CC_CCM_TX must be done before timeout to keep transmission alive. When period is zero any ongoing CCM frame transmission will be stopped. IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV: The transmitted CCM frame update with Interface Status TLV is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV_VALUE: The transmitted Interface Status TLV value field. The type is u8. IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV: The transmitted CCM frame update with Port Status TLV is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV_VALUE: The transmitted Port Status TLV value field. The type is u8. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
2be665c3 |
|
27-Oct-2020 |
Henrik Bjoernlund <henrik.bjoernlund@microchip.com> |
bridge: cfm: Netlink SET configuration Interface. This is the implementation of CFM netlink configuration set information interface. Add new nested netlink attributes. These attributes are used by the user space to create/delete/configure CFM instances. SETLINK: IFLA_BRIDGE_CFM: Indicate that the following attributes are CFM. IFLA_BRIDGE_CFM_MEP_CREATE: This indicate that a MEP instance must be created. IFLA_BRIDGE_CFM_MEP_DELETE: This indicate that a MEP instance must be deleted. IFLA_BRIDGE_CFM_MEP_CONFIG: This indicate that a MEP instance must be configured. IFLA_BRIDGE_CFM_CC_CONFIG: This indicate that a MEP instance Continuity Check (CC) functionality must be configured. IFLA_BRIDGE_CFM_CC_PEER_MEP_ADD: This indicate that a CC Peer MEP must be added. IFLA_BRIDGE_CFM_CC_PEER_MEP_REMOVE: This indicate that a CC Peer MEP must be removed. IFLA_BRIDGE_CFM_CC_CCM_TX: This indicate that the CC transmitted CCM PDU must be configured. IFLA_BRIDGE_CFM_CC_RDI: This indicate that the CC transmitted CCM PDU RDI must be configured. CFM nested attribute has the following attributes in next level. SETLINK RTEXT_FILTER_CFM_CONFIG: IFLA_BRIDGE_CFM_MEP_CREATE_INSTANCE: The created MEP instance number. The type is u32. IFLA_BRIDGE_CFM_MEP_CREATE_DOMAIN: The created MEP domain. The type is u32 (br_cfm_domain). It must be BR_CFM_PORT. This means that CFM frames are transmitted and received directly on the port - untagged. Not in a VLAN. IFLA_BRIDGE_CFM_MEP_CREATE_DIRECTION: The created MEP direction. The type is u32 (br_cfm_mep_direction). It must be BR_CFM_MEP_DIRECTION_DOWN. This means that CFM frames are transmitted and received on the port. Not in the bridge. IFLA_BRIDGE_CFM_MEP_CREATE_IFINDEX: The created MEP residence port ifindex. The type is u32 (ifindex). IFLA_BRIDGE_CFM_MEP_DELETE_INSTANCE: The deleted MEP instance number. The type is u32. IFLA_BRIDGE_CFM_MEP_CONFIG_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_MEP_CONFIG_UNICAST_MAC: The configured MEP unicast MAC address. The type is 6*u8 (array). This is used as SMAC in all transmitted CFM frames. IFLA_BRIDGE_CFM_MEP_CONFIG_MDLEVEL: The configured MEP unicast MD level. The type is u32. It must be in the range 1-7. No CFM frames are passing through this MEP on lower levels. IFLA_BRIDGE_CFM_MEP_CONFIG_MEPID: The configured MEP ID. The type is u32. It must be in the range 0-0x1FFF. This MEP ID is inserted in any transmitted CCM frame. IFLA_BRIDGE_CFM_CC_CONFIG_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_CONFIG_ENABLE: The Continuity Check (CC) functionality is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL: The CC expected receive interval of CCM frames. The type is u32 (br_cfm_ccm_interval). This is also the transmission interval of CCM frames when enabled. IFLA_BRIDGE_CFM_CC_CONFIG_EXP_MAID: The CC expected receive MAID in CCM frames. The type is CFM_MAID_LENGTH*u8. This is MAID is also inserted in transmitted CCM frames. IFLA_BRIDGE_CFM_CC_PEER_MEP_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_PEER_MEPID: The CC Peer MEP ID added. The type is u32. When a Peer MEP ID is added and CC is enabled it is expected to receive CCM frames from that Peer MEP. IFLA_BRIDGE_CFM_CC_RDI_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_RDI_RDI: The RDI that is inserted in transmitted CCM PDU. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_CCM_TX_DMAC: The transmitted CCM frame destination MAC address. The type is 6*u8 (array). This is used as DMAC in all transmitted CFM frames. IFLA_BRIDGE_CFM_CC_CCM_TX_SEQ_NO_UPDATE: The transmitted CCM frame update (increment) of sequence number is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_PERIOD: The period of time where CCM frame are transmitted. The type is u32. The time is given in seconds. SETLINK IFLA_BRIDGE_CFM_CC_CCM_TX must be done before timeout to keep transmission alive. When period is zero any ongoing CCM frame transmission will be stopped. IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV: The transmitted CCM frame update with Interface Status TLV is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV_VALUE: The transmitted Interface Status TLV value field. The type is u8. IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV: The transmitted CCM frame update with Port Status TLV is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV_VALUE: The transmitted Port Status TLV value field. The type is u8. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
86a14b79 |
|
27-Oct-2020 |
Henrik Bjoernlund <henrik.bjoernlund@microchip.com> |
bridge: cfm: Kernel space implementation of CFM. MEP create/delete. This is the first commit of the implementation of the CFM protocol according to 802.1Q section 12.14. It contains MEP instance create, delete and configuration. Connectivity Fault Management (CFM) comprises capabilities for detecting, verifying, and isolating connectivity failures in Virtual Bridged Networks. These capabilities can be used in networks operated by multiple independent organizations, each with restricted management access to each others equipment. CFM functions are partitioned as follows: - Path discovery - Fault detection - Fault verification and isolation - Fault notification - Fault recovery Interface consists of these functions: br_cfm_mep_create() br_cfm_mep_delete() br_cfm_mep_config_set() br_cfm_cc_config_set() br_cfm_cc_peer_mep_add() br_cfm_cc_peer_mep_remove() A MEP instance is created by br_cfm_mep_create() -It is the Maintenance association End Point described in 802.1Q section 19.2. -It is created on a specific level (1-7) and is assuring that no CFM frames are passing through this MEP on lower levels. -It initiates and validates CFM frames on its level. -It can only exist on a port that is related to a bridge. -Attributes given cannot be changed until the instance is deleted. A MEP instance can be deleted by br_cfm_mep_delete(). A created MEP instance has attributes that can be configured by br_cfm_mep_config_set(). A MEP Continuity Check feature can be configured by br_cfm_cc_config_set() The Continuity Check Receiver state machine can be enabled and disabled. According to 802.1Q section 19.2.8 A MEP can have Peer MEPs added and removed by br_cfm_cc_peer_mep_add() and br_cfm_cc_peer_mep_remove() The Continuity Check feature can maintain connectivity status on each added Peer MEP. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
f323aa54 |
|
27-Oct-2020 |
Henrik Bjoernlund <henrik.bjoernlund@microchip.com> |
bridge: cfm: Add BRIDGE_CFM to Kconfig. This makes it possible to include or exclude the CFM protocol according to 802.1Q section 12.14. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
90c628dd |
|
27-Oct-2020 |
Henrik Bjoernlund <henrik.bjoernlund@microchip.com> |
net: bridge: extend the process of special frames This patch extends the processing of frames in the bridge. Currently MRP frames needs special processing and the current implementation doesn't allow a nice way to process different frame types. Therefore try to improve this by adding a list that contains frame types that need special processing. This list is iterated for each input frame and if there is a match based on frame type then these functions will be called and decide what to do with the frame. It can process the frame then the bridge doesn't need to do anything or don't process so then the bridge will do normal forwarding. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
9116ffbf |
|
22-Sep-2020 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: mcast: add support for blocked port groups When excluding S,G entries we need a way to block a particular S,G,port. The new port group flag is managed based on the source's timer as per RFCs 3376 and 3810. When a source expires and its port group is in EXCLUDE mode, it will be blocked. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8266a049 |
|
22-Sep-2020 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: mcast: handle port group filter modes We need to handle group filter mode transitions and initial state. To change a port group's INCLUDE -> EXCLUDE mode (or when we have added a new port group in EXCLUDE mode) we need to add that port to all of *,G ports' S,G entries for proper replication. When the EXCLUDE state is changed from IGMPv3 report, br_multicast_fwd_filter_exclude() must be called after the source list processing because the assumption is that all of the group's S,G entries will be created before transitioning to EXCLUDE mode, i.e. most importantly its blocked entries will already be added so it will not get automatically added to them. The transition EXCLUDE -> INCLUDE happens only when a port group timer expires, it requires us to remove that port from all of *,G ports' S,G entries where it was automatically added previously. Finally when we are adding a new S,G entry we must add all of *,G's EXCLUDE ports to it. In order to distinguish automatically added *,G EXCLUDE ports we have a new port group flag - MDB_PG_FLAGS_STAR_EXCL. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b0812368 |
|
22-Sep-2020 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: mcast: install S,G entries automatically based on reports This patch adds support for automatic install of S,G mdb entries based on the port group's source list and the source entry's timer. Once installed the S,G will be used when forwarding packets if the approprate multicast/mld versions are set. A new source flag called BR_SGRP_F_INSTALLED denotes if the source has a forwarding mdb entry installed. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
085b53c8 |
|
22-Sep-2020 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: mcast: add sg_port rhashtable To speedup S,G forward handling we need to be able to quickly find out if a port is a member of an S,G group. To do that add a global S,G port rhashtable with key: source addr, group addr, protocol, vid (all br_ip fields) and port pointer. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8f8cb77e |
|
22-Sep-2020 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: mcast: add rt_protocol field to the port group struct We need to be able to differentiate between pg entries created by user-space and the kernel when we start generating S,G entries for IGMPv3/MLDv2's fast path. User-space entries are created by default as RTPROT_STATIC and the kernel entries are RTPROT_KERNEL. Later we can allow user-space to provide the entry rt_protocol so we can differentiate between who added the entries specifically (e.g. clag, admin, frr etc). Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
88d4bd18 |
|
22-Sep-2020 |
Nikolay Aleksandrov <nikolay@nvidia.com> |
net: bridge: mdb: add support for add/del/dump of entries with source Add new mdb attributes (MDBE_ATTR_SOURCE for setting, MDBA_MDB_EATTR_SOURCE for dumping) to allow add/del and dump of mdb entries with a source address (S,G). New S,G entries are created with filter mode of MCAST_INCLUDE. The same attributes are used for IPv4 and IPv6, they're validated and parsed based on their protocol. S,G host joined entries which are added by user are not allowed yet. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e12cec65 |
|
06-Sep-2020 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: mcast: destroy all entries via gc Since each entry type has timers that can be running simultaneously we need to make sure that entries are not freed before their timers have finished. In order to do that generalize the src gc work to mcast gc work and use a callback to free the entries (mdb, port group or src). v3: add IPv6 support v2: force mcast gc on port del to make sure all port group timers have finished before freeing the bridge port Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
0436862e |
|
06-Sep-2020 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: mcast: support for IGMPv3/MLDv2 ALLOW_NEW_SOURCES report This patch adds handling for the ALLOW_NEW_SOURCES IGMPv3/MLDv2 report types and limits them only when multicast_igmp_version == 3 or multicast_mld_version == 2 respectively. Now that IGMPv3/MLDv2 handling functions will be managing timers we need to delay their activation, thus a new argument is added which controls if the timer should be updated. We also disable host IGMPv3/MLDv2 handling as it's not yet implemented and could cause inconsistent group state, the host can only join a group as EXCLUDE {} or leave it. v4: rename update_timer to igmpv2_mldv1 and use the passed value from br_multicast_add_group's callers v3: Add IPv6/MLDv2 support Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
81f19838 |
|
06-Sep-2020 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: mdb: use mdb and port entries in notifications We have to use mdb and port entries when sending mdb notifications in order to fill in all group attributes properly. Before this change we would've used a fake br_mdb_entry struct to fill in only partial information about the mdb. Now we can also reuse the mdb dump fill function and thus have only a single central place which fills the mdb attributes. v3: add IPv6 support Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
42c11ccf |
|
06-Sep-2020 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: mcast: add support for group query retransmit We need to be able to retransmit group-specific and group-and-source specific queries. The new timer takes care of those. v3: add IPv6 support Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
438ef2d0 |
|
06-Sep-2020 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: mcast: add support for group-and-source specific queries Allows br_multicast_alloc_query to build queries with the port group's source lists and sends a query for sources over and under lmqt when necessary as per RFCs 3376 and 3810 with the suppress flag set appropriately. v3: add IPv6 support Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
8b671779 |
|
06-Sep-2020 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: mcast: add support for group source list Initial functions for group source lists which are needed for IGMPv3 and MLDv2 include/exclude lists. Both IPv4 and IPv6 sources are supported. User-added mdb entries are created with exclude filter mode, we can extend that later to allow user-supplied mode. When group src entries are deleted, they're freed from a workqueue to make sure their timers are not still running. Source entries are protected by the multicast_lock and rcu. The number of src groups per port group is limited to 32. v4: use the new port group del function directly add igmpv2/mldv1 bool to denote if the entry was added in those modes, it will later replace the old update_timer bool v3: add IPv6 support v2: allow src groups to be traversed under rcu Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
681590bd |
|
06-Sep-2020 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: mcast: factor out port group del In order to avoid future errors and reduce code duplication we should factor out the port group del sequence. This allows us to have one function which takes care of all details when removing a port group. v4: set pg's fast leave flag when deleting due to fast leave move the patch before adding source lists Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
6ec0d0ee |
|
06-Sep-2020 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: mdb: arrange internal structs so fast-path fields are close Before this patch we'd need 2 cache lines for fast-path, now all used fields are in the first cache line. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
528ae84a |
|
13-Jul-2020 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: fix undefined br_vlan_can_enter_range in tunnel code If bridge vlan filtering is not defined we won't have br_vlan_can_enter_range and thus will get a compile error as was reported by Stephen and the build bot. So let's define a stub for when vlan filtering is not used. Fixes: 94339443686b ("net: bridge: notify on vlan tunnel changes done via the old api") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
df42ef22 |
|
02-Jul-2020 |
Horatiu Vultur <horatiu.vultur@microchip.com> |
bridge: mrp: Add br_mrp_fill_info Add the function br_mrp_fill_info which populates the MRP attributes regarding the status of each MRP instance. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9b14d1f8 |
|
28-Jun-2020 |
Horatiu Vultur <horatiu.vultur@microchip.com> |
bridge: mrp: Fix endian conversion and some other warnings The following sparse warnings are fixed: net/bridge/br_mrp.c:106:18: warning: incorrect type in assignment (different base types) net/bridge/br_mrp.c:106:18: expected unsigned short [usertype] net/bridge/br_mrp.c:106:18: got restricted __be16 [usertype] net/bridge/br_mrp.c:281:23: warning: incorrect type in argument 1 (different modifiers) net/bridge/br_mrp.c:281:23: expected struct list_head *entry net/bridge/br_mrp.c:281:23: got struct list_head [noderef] * net/bridge/br_mrp.c:332:28: warning: incorrect type in argument 1 (different modifiers) net/bridge/br_mrp.c:332:28: expected struct list_head *new net/bridge/br_mrp.c:332:28: got struct list_head [noderef] * net/bridge/br_mrp.c:332:40: warning: incorrect type in argument 2 (different modifiers) net/bridge/br_mrp.c:332:40: expected struct list_head *head net/bridge/br_mrp.c:332:40: got struct list_head [noderef] * net/bridge/br_mrp.c:682:29: warning: incorrect type in argument 1 (different modifiers) net/bridge/br_mrp.c:682:29: expected struct list_head const *head net/bridge/br_mrp.c:682:29: got struct list_head [noderef] * Reported-by: kernel test robot <lkp@intel.com> Fixes: 2f1a11ae11d222 ("bridge: mrp: Add MRP interface.") Fixes: 4b8d7d4c599182 ("bridge: mrp: Extend bridge interface") Fixes: 9a9f26e8f7ea30 ("bridge: mrp: Connect MRP API with the switchdev API") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
206e7323 |
|
25-Jun-2020 |
Thomas Martitz <t.martitz@avm.de> |
net: bridge: enfore alignment for ethernet address The eth_addr member is passed to ether_addr functions that require 2-byte alignment, therefore the member must be properly aligned to avoid unaligned accesses. The problem is in place since the initial merge of multicast to unicast: commit 6db6f0eae6052b70885562e1733896647ec1d807 bridge: multicast to unicast Fixes: 6db6f0eae605 ("bridge: multicast to unicast") Cc: Roopa Prabhu <roopa@cumulusnetworks.com> Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Cc: David S. Miller <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Felix Fietkau <nbd@nbd.name> Signed-off-by: Thomas Martitz <t.martitz@avm.de> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
31cbc39b |
|
23-Jun-2020 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: add option to allow activity notifications for any fdb entries This patch adds the ability to notify about activity of any entries (static, permanent or ext_learn). EVPN multihoming peers need it to properly and efficiently handle mac sync (peer active/locally active). We add a new NFEA_ACTIVITY_NOTIFY attribute which is used to dump the current activity state and to control if static entries should be monitored at all. We use 2 bits - one to activate fdb entry tracking (disabled by default) and the second to denote that an entry is inactive. We need the second bit in order to avoid multiple notifications of inactivity. Obviously this makes no difference for dynamic entries since at the time of inactivity they get deleted, while the tracked non-dynamic entries get the inactive bit set and get a notification. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9eb8eff0 |
|
10-May-2020 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: bridge: allow enslaving some DSA master network devices Commit 8db0a2ee2c63 ("net: bridge: reject DSA-enabled master netdevices as bridge members") added a special check in br_if.c in order to check for a DSA master network device with a tagging protocol configured. This was done because back then, such devices, once enslaved in a bridge would become inoperative and would not pass DSA tagged traffic anymore due to br_handle_frame returning RX_HANDLER_CONSUMED. But right now we have valid use cases which do require bridging of DSA masters. One such example is when the DSA master ports are DSA switch ports themselves (in a disjoint tree setup). This should be completely equivalent, functionally speaking, from having multiple DSA switches hanging off of the ports of a switchdev driver. So we should allow the enslaving of DSA tagged master network devices. Instead of the regular br_handle_frame(), install a new function br_handle_frame_dummy() on these DSA masters, which returns RX_HANDLER_PASS in order to call into the DSA specific tagging protocol handlers, and lift the restriction from br_add_if. Suggested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Suggested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
f78ed220 |
|
07-May-2020 |
Eric Dumazet <edumazet@google.com> |
netpoll: accept NULL np argument in netpoll_send_skb() netpoll_send_skb() callers seem to leak skb if the np pointer is NULL. While this should not happen, we can make the code more robust. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8741e184 |
|
06-May-2020 |
Jason Yan <yanaijie@huawei.com> |
net: bridge: return false in br_mrp_enabled() Fix the following coccicheck warning: net/bridge/br_private.h:1334:8-9: WARNING: return of 0/1 in function 'br_mrp_enabled' with return type bool Fixes: 6536993371fab ("bridge: mrp: Integrate MRP into the bridge") Signed-off-by: Jason Yan <yanaijie@huawei.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
419dba8a |
|
26-Apr-2020 |
Horatiu Vultur <horatiu.vultur@microchip.com> |
net: bridge: Add checks for enabling the STP. It is not possible to have the MRP and STP running at the same time on the bridge, therefore add check when enabling the STP to check if MRP is already enabled. In that case return error. Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
65369933 |
|
26-Apr-2020 |
Horatiu Vultur <horatiu.vultur@microchip.com> |
bridge: mrp: Integrate MRP into the bridge To integrate MRP into the bridge, the bridge needs to do the following: - detect if the MRP frame was received on MRP ring port in that case it would be processed otherwise just forward it as usual. - enable parsing of MRP - before whenever the bridge was set up, it would set all the ports in forwarding state. Add an extra check to not set ports in forwarding state if the port is an MRP ring port. The reason of this change is that if the MRP instance initially sets the port in blocked state by setting the bridge up it would overwrite this setting. Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4b8d7d4c |
|
26-Apr-2020 |
Horatiu Vultur <horatiu.vultur@microchip.com> |
bridge: mrp: Extend bridge interface To integrate MRP into the bridge, first the bridge needs to be aware of ports that are part of an MRP ring and which rings are on the bridge. Therefore extend bridge interface with the following: - add new flag(BR_MPP_AWARE) to the net bridge ports, this bit will be set when the port is added to an MRP instance. In this way it knows if the frame was received on MRP ring port - add new flag(BR_MRP_LOST_CONT) to the net bridge ports, this bit will be set when the port lost the continuity of MRP Test frames. - add a list of MRP instances Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
99f7c5e0 |
|
17-Mar-2020 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: vlan options: rename br_vlan_opts_eq to br_vlan_opts_eq_range It is more appropriate name as it shows the intent of why we need to check the options' state. It also allows us to give meaning to the two arguments of the function: the first is the current vlan (v_curr) being checked if it could enter the range ending in the second one (range_end). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a580c76d |
|
24-Jan-2020 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: vlan: add per-vlan state The first per-vlan option added is state, it is needed for EVPN and for per-vlan STP. The state allows to control the forwarding on per-vlan basis. The vlan state is considered only if the port state is forwarding in order to avoid conflicts and be consistent. br_allowed_egress is called only when the state is forwarding, but the ingress case is a bit more complicated due to the fact that we may have the transition between port:BR_STATE_FORWARDING -> vlan:BR_STATE_LEARNING which should still allow the bridge to learn from the packet after vlan filtering and it will be dropped after that. Also to optimize the pvid state check we keep a copy in the vlan group to avoid one lookup. The state members are modified with *_ONCE() to annotate the lockless access. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a5d29ae2 |
|
24-Jan-2020 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: vlan: add basic option setting support This patch adds support for option modification of single vlans and ranges. It allows to only modify options, i.e. skip create/delete by using the BRIDGE_VLAN_INFO_ONLY_OPTS flag. When working with a range option changes we try to pack the notifications as much as possible. v2: do full port (all vlans) notification only when creating/deleting vlans for compatibility, rework the range detection when changing options, add more verbose extack errors and check if a vlan should be used (br_vlan_should_use checks) Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7a53e718 |
|
24-Jan-2020 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: vlan: add basic option dumping support We'll be dumping the options for the whole range if they're equal. The first range vlan will be used to extract the options. The commit doesn't change anything yet it just adds the skeleton for the support. The dump will happen when the first option is added. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f545923b |
|
14-Jan-2020 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: vlan: notify on vlan add/delete/change flags Now that we can notify, send a notification on add/del or change of flags. Notifications are also compressed when possible to reduce their number and relieve user-space of extra processing, due to that we have to manually notify after each add/del in order to avoid double notifications. We try hard to notify only about the vlans which actually changed, thus a single command can result in multiple notifications about disjoint ranges if there were vlans which didn't change inside. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cf5bddb9 |
|
14-Jan-2020 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: vlan: add rtnetlink group and notify support Add a new rtnetlink group for bridge vlan notifications - RTNLGRP_BRVLAN and add support for sending vlan notifications (both single and ranges). No functional changes intended, the notification support will be used by later patches. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f26b2965 |
|
14-Jan-2020 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: vlan: add new rtm message support Add initial RTM_NEWVLAN support which can only create vlans, operating similar to the current br_afspec(). We will use it later to also change per-vlan options. Old-style (flag-based) vlan ranges are not allowed when using RTM messages, we will introduce vlan ranges later via a new nested attribute which would allow us to have all the information about a range encapsulated into a single nl attribute. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8dcea187 |
|
14-Jan-2020 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: vlan: add rtm definitions and dump support This patch adds vlan rtm definitions: - NEWVLAN: to be used for creating vlans, setting options and notifications - DELVLAN: to be used for deleting vlans - GETVLAN: used for dumping vlan information Dumping vlans which can span multiple messages is added now with basic information (vid and flags). We use nlmsg_parse() to validate the header length in order to be able to extend the message with filtering attributes later. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8f4cc940 |
|
14-Jan-2020 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: netlink: add extack error messages when processing vlans Add extack messages on vlan processing errors. We need to move the flags missing check after the "last" check since we may have "last" set but lack a range end flag in the next entry. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5a46facb |
|
14-Jan-2020 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: vlan: add helpers to check for vlan id/range validity Add helpers to check if a vlan id or range are valid. The range helper must be called when range start or end are detected. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
de179966 |
|
11-Dec-2019 |
Vivien Didelot <vivien.didelot@gmail.com> |
net: bridge: add STP xstats This adds rx_bpdu, tx_bpdu, rx_tcn, tx_tcn, transition_blk, transition_fwd xstats counters to the bridge ports copied over via netlink, providing useful information for STP. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
#
5d1fcaf3 |
|
04-Nov-2019 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: fdb: eliminate extra port state tests from fast-path When commit df1c0b8468b3 ("[BRIDGE]: Packets leaking out of disabled/blocked ports.") introduced the port state tests in br_fdb_update() it was to avoid learning/refreshing from STP BPDUs, it was also used to avoid learning/refreshing from user-space with NTF_USE. Those two tests are done for every packet entering the bridge if it's learning, but for the fast-path we already have them checked in br_handle_frame() and is unnecessary to do it again. Thus push the checks to the unlikely cases and drop them from br_fdb_update(), the new nbp_state_should_learn() helper is used to determine if the port state allows br_fdb_update() to be called. The two places which need to do it manually are: - user-space add call with NTF_USE set - link-local packet learning done in __br_handle_local_finish() Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
be0c5677 |
|
01-Nov-2019 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: fdb: br_fdb_update can take flags directly If we modify br_fdb_update() to take flags directly we can get rid of one test and one atomic bitop in the learning path. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d38c6e3d |
|
29-Oct-2019 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: fdb: convert offloaded to use bitops Convert the offloaded field to a flag and use bitops. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b5cd9f7c |
|
29-Oct-2019 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: fdb: convert added_by_external_learn to use bitops Convert the added_by_external_learn field to a flag and use bitops. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ac3ca6af |
|
29-Oct-2019 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: fdb: convert added_by_user to bitops Straight-forward convert of the added_by_user field to bitops. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e0458d9a |
|
29-Oct-2019 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: fdb: convert is_sticky to bitops Straight-forward convert of the is_sticky field to bitops. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
29e63fff |
|
29-Oct-2019 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: fdb: convert is_static to bitops Convert the is_static to bitops, make use of the combined test_and_set/clear_bit to simplify expressions in fdb_add_entry. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6869c3b0 |
|
29-Oct-2019 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: fdb: convert is_local to bitops The patch adds a new fdb flags field in the hole between the two cache lines and uses it to convert is_local to bitops. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1bc844ee |
|
17-Aug-2019 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: mdb: allow add/delete for host-joined groups Currently this is needed only for user-space compatibility, so similar object adds/deletes as the dumped ones would succeed. Later it can be used for L2 mcast MAC add/delete. v3: fix compiler warning (DaveM) v2: don't send a notification when used from user-space, arm the group timer if no ports are left after host entry del Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
091adf9b |
|
02-Aug-2019 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER Most of the bridge device's vlan init bugs come from the fact that its default pvid is created at the wrong time, way too early in ndo_init() before the device is even assigned an ifindex. It introduces a bug when the bridge's dev_addr is added as fdb during the initial default pvid creation the notification has ifindex/NDA_MASTER both equal to 0 (see example below) which really makes no sense for user-space[0] and is wrong. Usually user-space software would ignore such entries, but they are actually valid and will eventually have all necessary attributes. It makes much more sense to send a notification *after* the device has registered and has a proper ifindex allocated rather than before when there's a chance that the registration might still fail or to receive it with ifindex/NDA_MASTER == 0. Note that we can remove the fdb flush from br_vlan_flush() since that case can no longer happen. At NETDEV_REGISTER br->default_pvid is always == 1 as it's initialized by br_vlan_init() before that and at NETDEV_UNREGISTER it can be anything depending why it was called (if called due to NETDEV_REGISTER error it'll still be == 1, otherwise it could be any value changed during the device life time). For the demonstration below a small change to iproute2 for printing all fdb notifications is added, because it contained a workaround not to show entries with ifindex == 0. Command executed while monitoring: $ ip l add br0 type bridge Before (both ifindex and master == 0): $ bridge monitor fdb 36:7e:8a:b3:56:ba dev * vlan 1 master * permanent After (proper br0 ifindex): $ bridge monitor fdb e6:2a:ae:7a:b7:48 dev br0 vlan 1 master br0 permanent v4: move only the default pvid init/deinit to NETDEV_REGISTER/UNREGISTER v3: send the correct v2 patch with all changes (stub should return 0) v2: on error in br_vlan_init set br->vlgrp to NULL and return 0 in the br_vlan_bridge_event stub when bridge vlans are disabled [0] https://bugzilla.kernel.org/show_bug.cgi?id=204389 Reported-by: michael-dev <michael-dev@fami-braun.de> Fixes: 5be5a2df40f0 ("bridge: Add filtering support for default_pvid") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3247b272 |
|
30-Jul-2019 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: mcast: add delete due to fast-leave mdb flag In user-space there's no way to distinguish why an mdb entry was deleted and that is a problem for daemons which would like to keep the mdb in sync with remote ends (e.g. mlag) but would also like to converge faster. In almost all cases we'd like to age-out the remote entry for performance and convergence reasons except when fast-leave is enabled. In that case we want explicit immediate remote delete, thus add mdb flag which is set only when the entry is being deleted due to fast-leave. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3c171f49 |
|
29-May-2019 |
Pablo Neira Ayuso <pablo@netfilter.org> |
netfilter: bridge: add connection tracking system This patch adds basic connection tracking support for the bridge, including initial IPv4 support. This patch register two hooks to deal with the bridge forwarding path, one from the bridge prerouting hook to call nf_conntrack_in(); and another from the bridge postrouting hook to confirm the entry. The conntrack bridge prerouting hook defragments packets before passing them to nf_conntrack_in() to look up for an existing entry, otherwise a new entry is allocated and it is attached to the skbuff. The conntrack bridge postrouting hook confirms new conntrack entries, ie. if this is the first packet seen, then it adds the entry to the hashtable and (if needed) it refragments the skbuff into the original fragments, leaving the geometry as is if possible. Exceptions are linearized skbuffs, eg. skbuffs that are passed up to nfqueue and conntrack helpers, as well as cloned skbuff for the local delivery (eg. tcpdump), also in case of bridge port flooding (cloned skbuff too). The packet defragmentation is done through the ip_defrag() call. This forces us to save the bridge control buffer, reset the IP control buffer area and then restore it after call. This function also bumps the IP fragmentation statistics, it would be probably desiderable to have independent statistics for the bridge defragmentation/refragmentation. The maximum fragment length is stored in the control buffer and it is used to refragment the skbuff from the postrouting path. The new fraglist splitter and fragment transformer APIs are used to implement the bridge refragmentation code. The br_ip_fragment() function drops the packet in case the maximum fragment size seen is larger than the output port MTU. This patchset follows the principle that conntrack should not drop packets, so users can do it through policy via invalid state matching. Like br_netfilter, there is no refragmentation for packets that are passed up for local delivery, ie. prerouting -> input path. There are calls to nf_reset() already in several spots in the stack since time ago already, eg. af_packet, that show that skbuff fraglist handling from the netif_rx path is supported already. The helpers are called from the postrouting hook, before confirmation, from there we may see packet floods to bridge ports. Then, although unlikely, this may result in exercising the helpers many times for each clone. It would be good to explore how to pass all the packets in a list to the conntrack hook to do this handle only once for this case. Thanks to Florian Westphal for handing me over an initial patchset version to add support for conntrack bridge. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2874c5fd |
|
27-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
9c0ec2e7 |
|
18-Apr-2019 |
Mike Manning <mmanning@vyatta.att-mail.com> |
bridge: support binding vlan dev link state to vlan member bridge ports In the case of vlan filtering on bridges, the bridge may also have the corresponding vlan devices as upper devices. A vlan bridge binding mode is added to allow the link state of the vlan device to track only the state of the subset of bridge ports that are also members of the vlan, rather than that of all bridge ports. This mode is set with a vlan flag rather than a bridge sysfs so that the 8021q module is aware that it should not set the link state for the vlan device. If bridge vlan is configured, the bridge device event handling results in the link state for an upper device being set, if it is a vlan device with the vlan bridge binding mode enabled. This also sets a vlan_bridge_binding flag so that subsequent UP/DOWN/CHANGE events for the ports in that bridge result in a link state update of the vlan device if required. The link state of the vlan device is up if there is at least one bridge port that is a vlan member that is admin & oper up, otherwise its oper state is IF_OPER_LOWERLAYERDOWN. Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
223fd0ad |
|
11-Apr-2019 |
Florian Westphal <fw@strlen.de> |
bridge: broute: make broute a real ebtables table This makes broute a normal ebtables table, hooking at PREROUTING. The broute hook is removed. It uses skb->cb to signal to bridge rx handler that the skb should be routed instead of being bridged. This change is backwards compatible with ebtables as no userspace visible parts are changed. This means we can also remove the !ops test in ebt_register_table, it was only there for broute table sake. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
#
f12064d1 |
|
11-Apr-2019 |
Florian Westphal <fw@strlen.de> |
bridge: reduce size of input cb to 16 bytes Reduce size of br_input_skb_cb from 24 to 16 bytes by using bitfield for those values that can only be 0 or 1. igmp is the igmp type value, so it needs to be at least u8. Furthermore, the bridge currently relies on step-by-step initialization of br_input_skb_cb fields as the skb passes through the stack. Explicitly zero out the bridge input cb instead, this avoids having to review/validate that no BR_INPUT_SKB_CB(skb)->foo test can see a 'random' value from previous protocol cb. AFAICS all current fields are always set up before they are read again, so this is not a bug fix. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
#
35f861e3 |
|
29-Mar-2019 |
Julian Wiedmann <jwi@linux.ibm.com> |
net: bridge: use netif_is_bridge_port() Replace the br_port_exists() macro with its twin from netdevice.h CC: Roopa Prabhu <roopa@cumulusnetworks.com> CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
87b0984e |
|
16-Jan-2019 |
Petr Machata <petrm@mellanox.com> |
net: Add extack argument to ndo_fdb_add() Drivers may not be able to support certain FDB entries, and an error code is insufficient to give clear hints as to the reasons of rejection. In order to make it possible to communicate the rejection reason, extend ndo_fdb_add() with an extack argument. Adapt the existing implementations of ndo_fdb_add() to take the parameter (and ignore it). Pass the extack parameter when invoking ndo_fdb_add() from rtnl_fdb_add(). Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
27973793 |
|
08-Jan-2019 |
Ido Schimmel <idosch@mellanox.com> |
net: bridge: Fix VLANs memory leak When adding / deleting VLANs to / from a bridge port, the bridge driver first tries to propagate the information via switchdev and falls back to the 8021q driver in case the underlying driver does not support switchdev. This can result in a memory leak [1] when VXLAN and mlxsw ports are enslaved to the bridge: $ ip link set dev vxlan0 master br0 # No mlxsw ports are enslaved to 'br0', so mlxsw ignores the switchdev # notification and the bridge driver adds the VLAN on 'vxlan0' via the # 8021q driver $ bridge vlan add vid 10 dev vxlan0 pvid untagged # mlxsw port is enslaved to the bridge $ ip link set dev swp1 master br0 # mlxsw processes the switchdev notification and the 8021q driver is # skipped $ bridge vlan del vid 10 dev vxlan0 This results in 'struct vlan_info' and 'struct vlan_vid_info' being leaked, as they were allocated by the 8021q driver during VLAN addition, but never freed as the 8021q driver was skipped during deletion. Fix this by introducing a new VLAN private flag that indicates whether the VLAN was added on the port by switchdev or the 8021q driver. If the VLAN was added by the 8021q driver, then we make sure to delete it via the 8021q driver as well. [1] unreferenced object 0xffff88822d20b1e8 (size 256): comm "bridge", pid 2532, jiffies 4295216998 (age 1188.830s) hex dump (first 32 bytes): e0 42 97 ce 81 88 ff ff 00 00 00 00 00 00 00 00 .B.............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000f82d851d>] kmem_cache_alloc_trace+0x1be/0x330 [<00000000e0178b02>] vlan_vid_add+0x661/0x920 [<00000000218ebd5f>] __vlan_add+0x1be9/0x3a00 [<000000006eafa1ca>] nbp_vlan_add+0x8b3/0xd90 [<000000003535392c>] br_vlan_info+0x132/0x410 [<00000000aedaa9dc>] br_afspec+0x75c/0x870 [<00000000f5716133>] br_setlink+0x3dc/0x6d0 [<00000000aceca5e2>] rtnl_bridge_setlink+0x615/0xb30 [<00000000a2f2d23e>] rtnetlink_rcv_msg+0x3a3/0xa80 [<0000000064097e69>] netlink_rcv_skb+0x152/0x3c0 [<000000008be8d614>] rtnetlink_rcv+0x21/0x30 [<000000009ab2ca25>] netlink_unicast+0x52f/0x740 [<00000000e7d9ac96>] netlink_sendmsg+0x9c7/0xf50 [<000000005d1e2050>] sock_sendmsg+0xbe/0x120 [<00000000d51426bc>] ___sys_sendmsg+0x778/0x8f0 [<00000000b9d7b2cc>] __sys_sendmsg+0x112/0x270 unreferenced object 0xffff888227454308 (size 32): comm "bridge", pid 2532, jiffies 4295216998 (age 1188.882s) hex dump (first 32 bytes): 88 b2 20 2d 82 88 ff ff 88 b2 20 2d 82 88 ff ff .. -...... -.... 81 00 0a 00 01 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000f82d851d>] kmem_cache_alloc_trace+0x1be/0x330 [<0000000018050631>] vlan_vid_add+0x3e6/0x920 [<00000000218ebd5f>] __vlan_add+0x1be9/0x3a00 [<000000006eafa1ca>] nbp_vlan_add+0x8b3/0xd90 [<000000003535392c>] br_vlan_info+0x132/0x410 [<00000000aedaa9dc>] br_afspec+0x75c/0x870 [<00000000f5716133>] br_setlink+0x3dc/0x6d0 [<00000000aceca5e2>] rtnl_bridge_setlink+0x615/0xb30 [<00000000a2f2d23e>] rtnetlink_rcv_msg+0x3a3/0xa80 [<0000000064097e69>] netlink_rcv_skb+0x152/0x3c0 [<000000008be8d614>] rtnetlink_rcv+0x21/0x30 [<000000009ab2ca25>] netlink_unicast+0x52f/0x740 [<00000000e7d9ac96>] netlink_sendmsg+0x9c7/0xf50 [<000000005d1e2050>] sock_sendmsg+0xbe/0x120 [<00000000d51426bc>] ___sys_sendmsg+0x778/0x8f0 [<00000000b9d7b2cc>] __sys_sendmsg+0x112/0x270 Fixes: d70e42b22dd4 ("mlxsw: spectrum: Enable VxLAN enslavement to VLAN-aware bridges") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Cc: Roopa Prabhu <roopa@cumulusnetworks.com> Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Cc: bridge@lists.linux-foundation.org Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
47674562 |
|
15-Dec-2018 |
Roopa Prabhu <roopa@cumulusnetworks.com> |
bridge: support for ndo_fdb_get This patch implements ndo_fdb_get for the bridge fdb. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
169327d5 |
|
12-Dec-2018 |
Petr Machata <petrm@mellanox.com> |
net: bridge: Propagate extack to switchdev ndo_bridge_setlink has been updated in the previous patch to have extack available, and changelink RTNL op has had this argument since the time extack was added. Propagate both through the bridge driver to eventually reach br_switchdev_port_vlan_add(), where it will be used by subsequent patches. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Ivan Vecera <ivecera@redhat.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2fd527b7 |
|
12-Dec-2018 |
Petr Machata <petrm@mellanox.com> |
net: ndo_bridge_setlink: Add extack Drivers may not be able to implement a VLAN addition or reconfiguration. In those cases it's desirable to explain to the user that it was rejected (and why). To that end, add extack argument to ndo_bridge_setlink. Adapt all users to that change. Following patches will use the new argument in the bridge driver. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d08c6bc0 |
|
05-Dec-2018 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: increase multicast's default maximum number of entries bridge's default hash_max was 512 which is rather conservative, now that we're using the generic rhashtable API which autoshrinks let's increase it to 4096 and move it to a define in br_private.h. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cf332bca |
|
05-Dec-2018 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: mark hash_elasticity as obsolete Now that the bridge multicast uses the generic rhashtable interface we can drop the hash_elasticity option as that is already done for us and it's hardcoded to a maximum of RHT_ELASTICITY (16 currently). Add a warning about the obsolete option when the hash_elasticity is set. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4329596c |
|
05-Dec-2018 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: multicast: use non-bh rcu flavor The bridge multicast code has been using a mix of RCU and RCU-bh flavors sometimes in questionable way. Since we've moved to rhashtable just use non-bh RCU everywhere. In addition this simplifies freeing of objects and allows us to remove some unnecessary callback functions. v3: new patch Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
19e3a9c9 |
|
05-Dec-2018 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: convert multicast to generic rhashtable The bridge multicast code currently uses a custom resizable hashtable which predates the generic rhashtable interface. It has many shortcomings compared and duplicates functionality that is presently available via the generic rhashtable, so this patch removes the custom rhashtable implementation in favor of the kernel's generic rhashtable. The hash maximum is kept and the rhashtable's size is used to do a loose check if it's reached in which case we revert to the old behaviour and disable further bridge multicast processing. Also now we can support any hash maximum, doesn't need to be a power of 2. v3: add non-rcu br_mdb_get variant and use it where multicast_lock is held to avoid RCU splat, drop hash_max function and just set it directly v2: handle when IGMP snooping is undefined, add br_mdb_init/uninit placeholders Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
70e4272b |
|
23-Nov-2018 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: add no_linklocal_learn bool option Use the new boolopt API to add an option which disables learning from link-local packets. The default is kept as before and learning is enabled. This is a simple map from a boolopt bit to a bridge private flag that is tested before learning. v2: pass NULL for extack via sysfs Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a428afe8 |
|
23-Nov-2018 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: add support for user-controlled bool options We have been adding many new bridge options, a big number of which are boolean but still take up netlink attribute ids and waste space in the skb. Recently we discussed learning from link-local packets[1] and decided yet another new boolean option will be needed, thus introducing this API to save some bridge nl space. The API supports changing the value of multiple boolean options at once via the br_boolopt_multi struct which has an optmask (which options to set, bit per opt) and optval (options' new values). Future boolean options will only be added to the br_boolopt_id enum and then will have to be handled in br_boolopt_toggle/get. The API will automatically add the ability to change and export them via netlink, sysfs can use the single boolopt function versions to do the same. The behaviour with failing/succeeding is the same as with normal netlink option changing. If an option requires mapping to internal kernel flag or needs special configuration to be enabled then it should be handled in br_boolopt_toggle. It should also be able to retrieve an option's current state via br_boolopt_get. v2: WARN_ON() on unsupported option as that shouldn't be possible and also will help catch people who add new options without handling them for both set and get. Pass down extack so if an option desires it could set it on error and be more user-friendly. [1] https://www.spinics.net/lists/netdev/msg532698.html Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9d332e69c |
|
16-Nov-2018 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: fix vlan stats use-after-free on destruction Syzbot reported a use-after-free of the global vlan context on port vlan destruction. When I added per-port vlan stats I missed the fact that the global vlan context can be freed before the per-port vlan rcu callback. There're a few different ways to deal with this, I've chosen to add a new private flag that is set only when per-port stats are allocated so we can directly check it on destruction without dereferencing the global context at all. The new field in net_bridge_vlan uses a hole. v2: cosmetic change, move the check to br_process_vlan_info where the other checks are done v3: add change log in the patch, add private (in-kernel only) flags in a hole in net_bridge_vlan struct and use that instead of mixing user-space flags with private flags Fixes: 9163a0fc1f0c ("net: bridge: add support for per-port vlan stats") Reported-by: syzbot+04681da557a0e49a52e5@syzkaller.appspotmail.com Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5978f8a9 |
|
08-Nov-2018 |
Michał Mirosław <mirq-linux@rere.qmqm.pl> |
bridge: use __vlan_hwaccel helpers This removes assumption than vlan_tci != 0 when tag is present. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e9ba0fbc |
|
17-Oct-2018 |
Ido Schimmel <idosch@mellanox.com> |
bridge: switchdev: Allow clearing FDB entry offload indication Currently, an FDB entry only ceases being offloaded when it is deleted. This changes with VxLAN encapsulation. Devices capable of performing VxLAN encapsulation usually have only one FDB table, unlike the software data path which has two - one in the bridge driver and another in the VxLAN driver. Therefore, bridge FDB entries pointing to a VxLAN device are only offloaded if there is a corresponding entry in the VxLAN FDB. Allow clearing the offload indication in case the corresponding entry was deleted from the VxLAN FDB. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9163a0fc |
|
12-Oct-2018 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: add support for per-port vlan stats This patch adds an option to have per-port vlan stats instead of the default global stats. The option can be set only when there are no port vlans in the bridge since we need to allocate the stats if it is set when vlans are being added to ports (and respectively free them when being deleted). Also bump RTNL_MAX_TYPE as the bridge is the largest user of options. The current stats design allows us to add these without any changes to the fast-path, it all comes down to the per-vlan stats pointer which, if this option is enabled, will be allocated for each port vlan instead of using the global bridge-wide one. CC: bridge@lists.linux-foundation.org CC: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
35750b0b |
|
26-Sep-2018 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: pack net_bridge better Further reduce the size of net_bridge with 8 bytes and reduce the number of holes in it: Before: holes: 5, sum holes: 15 After: holes: 3, sum holes: 7 Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3341d917 |
|
26-Sep-2018 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: convert mtu_set_by_user to a bit Convert the last remaining bool option to a bit thus reducing the overall net_bridge size further by 8 bytes. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c69c2cd4 |
|
26-Sep-2018 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: convert neigh_suppress_enabled option to a bit Convert the neigh_suppress_enabled option to a bit. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
675779ad |
|
26-Sep-2018 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: convert mcast options to bits This patch converts the rest of the mcast options to bits. It also packs the mcast options a little better by moving multicast_mld_version to an existing hole, reducing the net_bridge size by 8 bytes. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
13cefad2 |
|
26-Sep-2018 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: convert and rename mcast disabled Convert mcast disabled to an option bit and while doing so convert the logic to check if multicast is enabled instead. That is make the logic follow the option value - if it's set then mcast is enabled and vice versa. This avoids a few confusing places where we inverted the value that's being set to follow the mcast_disabled logic. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
be3664a0 |
|
26-Sep-2018 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: convert group_addr_set option to a bit Convert group_addr_set internal bridge opt to a bit. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8df3510f |
|
26-Sep-2018 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: convert nf call options to bits No functional change, convert of nf_call_[ip|ip6|arp]tables to bits. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ae75767e |
|
26-Sep-2018 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: add bitfield for options and convert vlan opts Bridge options have usually been added as separate fields all over the net_bridge struct taking up space and ending up in different cache lines. Let's move them to a single bitfield to save up space and speedup lookups. This patch adds a simple API for option modifying and retrieving using bitops and converts the first user of the API - the bridge vlan options (vlan_enabled and vlan_stats_enabled). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1c1cb6d0 |
|
26-Sep-2018 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: make struct opening bracket consistent Currently we have a mix of opening brackets on new lines and on the same line, let's move them all on the same line. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
435f2e7c |
|
11-Sep-2018 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: add support for sticky fdb entries Add support for entries which are "sticky", i.e. will not change their port if they show up from a different one. A new ndm flag is introduced for that purpose - NTF_STICKY. We allow to set it only to non-local entries. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2756f68c |
|
23-Jul-2018 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: add support for backup port This patch adds a new port attribute - IFLA_BRPORT_BACKUP_PORT, which allows to set a backup port to be used for known unicast traffic if the port has gone carrier down. The backup pointer is rcu protected and set only under RTNL, a counter is maintained so when deleting a port we know how many other ports reference it as a backup and we remove it from all. Also the pointer is in the first cache line which is hot at the time of the check and thus in the common case we only add one more test. The backup port will be used only for the non-flooding case since it's a part of the bridge and the flooded packets will be forwarded to it anyway. To remove the forwarding just send a 0/non-existing backup port. This is used to avoid numerous scalability problems when using MLAG most notably if we have thousands of fdbs one would need to change all of them on port carrier going down which takes too long and causes a storm of fdb notifications (and again when the port comes back up). In a Multi-chassis Link Aggregation setup usually hosts are connected to two different switches which act as a single logical switch. Those switches usually have a control and backup link between them called peerlink which might be used for communication in case a host loses connectivity to one of them. We need a fast way to failover in case a host port goes down and currently none of the solutions (like bond) cannot fulfill the requirements because the participating ports are actually the "master" devices and must have the same peerlink as their backup interface and at the same time all of them must participate in the bridge device. As Roopa noted it's normal practice in routing called fast re-route where a precalculated backup path is used when the main one is down. Another use case of this is with EVPN, having a single vxlan device which is backup of every port. Due to the nature of master devices it's not currently possible to use one device as a backup for many and still have all of them participate in the bridge (which is master itself). More detailed information about MLAG is available at the link below. https://docs.cumulusnetworks.com/display/DOCS/Multi-Chassis+Link+Aggregation+-+MLAG Further explanation and a diagram by Roopa: Two switches acting in a MLAG pair are connected by the peerlink interface which is a bridge port. the config on one of the switches looks like the below. The other switch also has a similar config. eth0 is connected to one port on the server. And the server is connected to both switches. br0 -- team0---eth0 | -- switch-peerlink Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
705e0dea |
|
20-Jul-2018 |
Tyler Hicks <tyhicks@canonical.com> |
bridge: make sure objects belong to container's owner When creating various bridge objects in /sys/class/net/... make sure that they belong to the container's owner instead of global root (if they belong to a container/namespace). Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d66e4348 |
|
29-May-2018 |
Petr Machata <petrm@mellanox.com> |
net: bridge: Extract boilerplate around switchdev_port_obj_*() A call to switchdev_port_obj_add() or switchdev_port_obj_del() involves initializing a struct switchdev_obj_port_vlan, a piece of code that repeats on each call site almost verbatim. While in the current codebase there is just one duplicated add call, the follow-up patches add more of both add and del calls. Thus to remove the duplication, extract the repetition into named functions and reuse. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7d850abd |
|
24-May-2018 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: add support for port isolation This patch adds support for a new port flag - BR_ISOLATED. If it is set then isolated ports cannot communicate between each other, but they can still communicate with non-isolated ports. The same can be achieved via ACLs but they can't scale with large number of ports and also the complexity of the rules grows. This feature can be used to achieve isolated vlan functionality (similar to pvlan) as well, though currently it will be port-wide (for all vlans on the port). The new test in should_deliver uses data that is already cache hot and the new boolean is used to avoid an additional source port test in should_deliver. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
161d82de |
|
03-May-2018 |
Petr Machata <petrm@mellanox.com> |
net: bridge: Notify about !added_by_user FDB entries Do not automatically bail out on sending notifications about activity on non-user-added FDB entries. Instead, notify about this activity except for cases where the activity itself originates in a notification, to avoid sending duplicate notifications. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
faa1cd82 |
|
03-May-2018 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: avoid duplicate notification on up/down/change netdev events While handling netdevice events, br_device_event() sometimes uses br_stp_(disable|enable)_port which unconditionally send a notification, but then a second notification for the same event is sent at the end of the br_device_event() function. To avoid sending duplicate notifications in such cases, check if one has already been sent (i.e. br_stp_enable/disable_port have been called). The patch is based on a change by Satish Ashok. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4d4fd361 |
|
29-Apr-2018 |
Petr Machata <petrm@mellanox.com> |
net: bridge: Publish bridge accessor functions Add a couple new functions to allow querying FDB and vlan settings of a bridge. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
804b854d |
|
30-Mar-2018 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: disable bridge MTU auto tuning if it was set manually As Roopa noted today the biggest source of problems when configuring bridge and ports is that the bridge MTU keeps changing automatically on port events (add/del/changemtu). That leads to inconsistent behaviour and network config software needs to chase the MTU and fix it on each such event. Let's improve on that situation and allow for the user to set any MTU within ETH_MIN/MAX limits, but once manually configured it is the user's responsibility to keep it correct afterwards. In case the MTU isn't manually set - the behaviour reverts to the previous and the bridge follows the minimum MTU. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f40aa233 |
|
30-Mar-2018 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: set min MTU on port events and allow user to set max Recently the bridge was changed to automatically set maximum MTU on port events (add/del/changemtu) when vlan filtering is enabled, but that actually changes behaviour in a way which breaks some setups and can lead to packet drops. In order to still allow that maximum to be set while being compatible, we add the ability for the user to tune the bridge MTU up to the maximum when vlan filtering is enabled, but that has to be done explicitly and all port events (add/del/changemtu) lead to resetting that MTU to the minimum as before. Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
419d14af |
|
22-Mar-2018 |
Chas Williams <3chas3@gmail.com> |
bridge: Allow max MTU when multiple VLANs present If the bridge is allowing multiple VLANs, some VLANs may have different MTUs. Instead of choosing the minimum MTU for the bridge interface, choose the maximum MTU of the bridge members. With this the user only needs to set a larger MTU on the member ports that are participating in the large MTU VLANS. Signed-off-by: Chas Williams <3chas3@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
03aaa9e2 |
|
18-Jan-2018 |
Gustavo A. R. Silva <garsilva@embeddedor.com> |
bridge: return boolean instead of integer in br_multicast_is_router Return statements in functions returning bool should use true/false instead of 1/0. This issue was detected with the help of Coccinelle. Fixes: 85b352693264 ("bridge: Fix build error when IGMP_SNOOPING is not enabled") Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
eb793583 |
|
12-Dec-2017 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: use rhashtable for fdbs Before this patch the bridge used a fixed 256 element hash table which was fine for small use cases (in my tests it starts to degrade above 1000 entries), but it wasn't enough for medium or large scale deployments. Modern setups have thousands of participants in a single bridge, even only enabling vlans and adding a few thousand vlan entries will cause a few thousand fdbs to be automatically inserted per participating port. So we need to scale the fdb table considerably to cope with modern workloads, and this patch converts it to use a rhashtable for its operations thus improving the bridge scalability. Tests show the following results (10 runs each), at up to 1000 entries rhashtable is ~3% slower, at 2000 rhashtable is 30% faster, at 3000 it is 2 times faster and at 30000 it is 50 times faster. Obviously this happens because of the properties of the two constructs and is expected, rhashtable keeps pretty much a constant time even with 10000000 entries (tested), while the fixed hash table struggles considerably even above 10000. As a side effect this also reduces the net_bridge struct size from 3248 bytes to 1344 bytes. Also note that the key struct is 8 bytes. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ff0fd34e |
|
09-Nov-2017 |
Andrew Lunn <andrew@lunn.ch> |
net: bridge: Rename mglist to host_joined The boolean mglist indicates the host has joined a particular multicast group on the bridge interface. It is badly named, obscuring what is means. Rename it. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
92899063 |
|
31-Oct-2017 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: add notifications for the bridge dev on vlan change Currently the bridge device doesn't generate any notifications upon vlan modifications on itself because it doesn't use the generic bridge notifications. With the recent changes we know if anything was modified in the vlan config thus we can generate a notification when necessary for the bridge device so add support to br_ifinfo_notify() similar to how other combined functions are done - if port is present it takes precedence, otherwise notify about the bridge. I've explicitly marked the locations where the notification should be always for the port by setting bridge to NULL. I've also taken the liberty to rearrange each modified function's local variables in reverse xmas tree as well. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f418af63 |
|
27-Oct-2017 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
bridge: vlan: signal if anything changed on vlan add Before this patch there was no way to tell if the vlan add operation actually changed anything, thus we would always generate a notification on adds. Let's make the notifications more precise and generate them only if anything changed, so use the new bool parameter to signal that the vlan was updated. We cannot return an error because there are valid use cases that will be broken (e.g. overlapping range add) and also we can't risk masking errors due to calls into drivers for vlan add which can potentially return anything. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ed842fae |
|
06-Oct-2017 |
Roopa Prabhu <roopa@cumulusnetworks.com> |
bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports This patch avoids flooding and proxies ndisc packets for BR_NEIGH_SUPPRESS ports. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
057658cb |
|
06-Oct-2017 |
Roopa Prabhu <roopa@cumulusnetworks.com> |
bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports This patch avoids flooding and proxies arp packets for BR_NEIGH_SUPPRESS ports. Moves existing br_do_proxy_arp to br_do_proxy_suppress_arp to support both proxy arp and neigh suppress. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
821f1b21 |
|
06-Oct-2017 |
Roopa Prabhu <roopa@cumulusnetworks.com> |
bridge: add new BR_NEIGH_SUPPRESS port flag to suppress arp and nd flood This patch adds a new bridge port flag BR_NEIGH_SUPPRESS to suppress arp and nd flood on bridge ports. It implements rfc7432, section 10. https://tools.ietf.org/html/rfc7432#section-10 for ethernet VPN deployments. It is similar to the existing BR_PROXYARP* flags but has a few semantic differences to conform to EVPN standard. Unlike the existing flags, this new flag suppresses flood of all neigh discovery packets (arp and nd) to tunnel ports. Supports both vlan filtering and non-vlan filtering bridges. In case of EVPN, it is mainly used to avoid flooding of arp and nd packets to tunnel ports like vxlan. This patch adds netlink and sysfs support to set this bridge port flag. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ca752be0 |
|
04-Oct-2017 |
David Ahern <dsahern@gmail.com> |
net: bridge: Pass extack to down to netdev_master_upper_dev_link Pass extack arg to br_add_if. Add messages for a couple of failures and pass arg to netdev_master_upper_dev_link. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5af48b59 |
|
27-Sep-2017 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: add per-port group_fwd_mask with less restrictions We need to be able to transparently forward most link-local frames via tunnels (e.g. vxlan, qinq). Currently the bridge's group_fwd_mask has a mask which restricts the forwarding of STP and LACP, but we need to be able to forward these over tunnels and control that forwarding on a per-port basis thus add a new per-port group_fwd_mask option which only disallows mac pause frames to be forwarded (they're always dropped anyway). The patch does not change the current default situation - all of the others are still restricted unless configured for forwarding. We have successfully tested this patch with LACP and STP forwarding over VxLAN and qinq tunnels. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f1c2eddf |
|
03-Sep-2017 |
Ido Schimmel <idosch@mellanox.com> |
bridge: switchdev: Use an helper to clear forward mark Instead of using ifdef in the C file. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Suggested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Tested-by: Yotam Gigi <yotamg@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
25127759 |
|
04-Jul-2017 |
Reshetova, Elena <elena.reshetova@intel.com> |
net, bridge: convert net_bridge_vlan.refcnt from atomic_t to refcount_t refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9fe8bcec |
|
08-Jun-2017 |
Arkadi Sharshevsky <arkadis@mellanox.com> |
net: bridge: Receive notification about successful FDB offload When a new static FDB is added to the bridge a notification is sent to the driver for offload. In case of successful offload the driver should notify the bridge back, which in turn should mark the FDB as offloaded. Currently, externally learned is equivalent for being offloaded which is not correct due to the fact that FDBs which are added from user-space are also marked as externally learned. In order to specify if an FDB was successfully offloaded a new flag is introduced. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6b26b51b |
|
08-Jun-2017 |
Arkadi Sharshevsky <arkadis@mellanox.com> |
net: bridge: Add support for notifying devices about FDB add/del Currently the bridge doesn't notify the underlying devices about new FDBs learned. The FDB sync is placed on the switchdev notifier chain because devices may potentially learn FDB that are not directly related to their ports, for example: 1. Mixed SW/HW bridge - FDBs that point to the ASICs external devices should be offloaded as CPU traps in order to perform forwarding in slow path. 2. EVPN - Externally learned FDBs for the vtep device. Notification is sent only about static FDB add/del. This is done due to fact that currently this is the only scenario supported by switch drivers. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0baa10ff |
|
08-Jun-2017 |
Arkadi Sharshevsky <arkadis@mellanox.com> |
net: bridge: Add support for calling FDB external learning under rcu This is done as a preparation to moving the switchdev notifier chain to be atomic. The FDB external learning should be called under rtnl or rcu. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3922285d |
|
08-Jun-2017 |
Arkadi Sharshevsky <arkadis@mellanox.com> |
net: bridge: Add support for offloading port attributes Currently the flood, learning and learning_sync port attributes are offloaded by setting the SELF flag. Add support for offloading the flood and learning attribute through the bridge code. In case of setting an unsupported flag on a offloded port the operation will fail. The learning_sync attribute doesn't have any software representation and cannot be offloaded through the bridge code. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1f51445a |
|
26-May-2017 |
Ido Schimmel <idosch@mellanox.com> |
bridge: Export VLAN filtering state It's useful for drivers supporting bridge offload to be able to query the bridge's VLAN filtering state. Currently, upon enslavement to a bridge master, the offloading driver will only learn about the bridge's VLAN filtering state after the bridge device was already linked with its slave. Being able to query the bridge's VLAN filtering state allows such drivers to forbid enslavement in case resource couldn't be allocated for a VLAN-aware bridge and also choose the correct initialization routine for the enslaved port, which is dependent on the bridge type. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b6fe0440 |
|
10-Apr-2017 |
Ido Schimmel <idosch@mellanox.com> |
bridge: implement missing ndo_uninit() While the bridge driver implements an ndo_init(), it was missing a symmetric ndo_uninit(), causing the different de-initialization operations to be scattered around its dellink() and destructor(). Implement a symmetric ndo_uninit() and remove the overlapping operations from its dellink() and destructor(). This is a prerequisite for the next patch, as it allows us to have a proper cleanup upon changelink() failure during the bridge's newlink(). Fixes: b6677449dff6 ("bridge: netlink: call br_changelink() during br_dev_newlink()") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d12c9176 |
|
16-Mar-2017 |
WANG Cong <xiyou.wangcong@gmail.com> |
bridge: resolve a false alarm of lockdep Andrei reported a false alarm of lockdep at net/bridge/br_fdb.c:109, this is because in Andrei's case, a spin_bug() was already triggered before this, therefore the debug_locks is turned off, lockdep_is_held() is no longer accurate after that. We should use lockdep_assert_held_once() instead of lockdep_is_held() to respect debug_locks. Fixes: 410b3d48f5111 ("bridge: fdb: add proper lock checks in searching functions") Reported-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
410b3d48 |
|
13-Feb-2017 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
bridge: fdb: add proper lock checks in searching functions In order to avoid new errors add checks to br_fdb_find and fdb_find_rcu functions. The first requires hash_lock, the second obviously RCU. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bfd0aeac |
|
13-Feb-2017 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
bridge: fdb: converge fdb searching functions into one Before this patch we had 3 different fdb searching functions which was confusing. This patch reduces all of them to one - fdb_find_rcu(), and two flavors: br_fdb_find() which requires hash_lock and br_fdb_find_rcu which requires RCU. This makes it clear what needs to be used, we also remove two abusers of __br_fdb_get which called it under hash_lock and replace them with br_fdb_find(). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1214628c |
|
04-Feb-2017 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
bridge: move write-heavy fdb members in their own cache line Fdb's used and updated fields are written to on every packet forward and packet receive respectively. Thus if we are receiving packets from a particular fdb, they'll cause false-sharing with everyone who has looked it up (even if it didn't match, since mac/vid share cache line!). The "used" field is even worse since it is updated on every packet forward to that fdb, thus the standard config where X ports use a single gateway results in 100% fdb false-sharing. Note that this patch does not prevent the last scenario, but it makes it better for other bridge participants which are not using that fdb (and are only doing lookups over it). The point is with this move we make sure that only communicating parties get the false-sharing, in a later patch we'll show how to avoid that too. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f7cdee8a |
|
04-Feb-2017 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
bridge: move to workqueue gc Move the fdb garbage collector to a workqueue which fires at least 10 milliseconds apart and cleans chain by chain allowing for other tasks to run in the meantime. When having thousands of fdbs the system is much more responsive. Most importantly remove the need to check if the matched entry has expired in __br_fdb_get that causes false-sharing and is completely unnecessary if we cleanup entries, at worst we'll get 10ms of traffic for that entry before it gets deleted. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1f90c7f3 |
|
04-Feb-2017 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
bridge: modify bridge and port to have often accessed fields in one cache line Move around net_bridge so the vlan fields are in the beginning since they're checked on every packet even if vlan filtering is disabled. For the port move flags & vlan group to the beginning, so they're in the same cache line with the port's state (both flags and state are checked on each packet). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
11538d03 |
|
31-Jan-2017 |
Roopa Prabhu <roopa@cumulusnetworks.com> |
bridge: vlan dst_metadata hooks in ingress and egress paths - ingress hook: - if port is a tunnel port, use tunnel info in attached dst_metadata to map it to a local vlan - egress hook: - if port is a tunnel port, use tunnel info attached to vlan to set dst_metadata on the skb CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
efa5356b |
|
31-Jan-2017 |
Roopa Prabhu <roopa@cumulusnetworks.com> |
bridge: per vlan dst_metadata netlink support This patch adds support to attach per vlan tunnel info dst metadata. This enables bridge driver to map vlan to tunnel_info at ingress and egress. It uses the kernel dst_metadata infrastructure. The initial use case is vlan to vni bridging, but the api is generic to extend to any tunnel_info in the future: - Uapi to configure/unconfigure/dump per vlan tunnel data - netlink functions to configure vlan and tunnel_info mapping - Introduces bridge port flag BR_LWT_VLAN to enable attach/detach dst_metadata to bridged packets on ports. off by default. - changes to existing code is mainly refactor some existing vlan handling netlink code + hooks for new vlan tunnel code - I have kept the vlan tunnel code isolated in separate files. - most of the netlink vlan tunnel code is handling of vlan-tunid ranges (follows the vlan range handling code). To conserve space vlan-tunid by default are always dumped in ranges if applicable. Use case: example use for this is a vxlan bridging gateway or vtep which maps vlans to vn-segments (or vnis). iproute2 example (patched and pruned iproute2 output to just show relevant fdb entries): example shows same host mac learnt on two vni's and vlan 100 maps to vni 1000, vlan 101 maps to vni 1001 before (netdev per vni): $bridge fdb show | grep "00:02:00:00:00:03" 00:02:00:00:00:03 dev vxlan1001 vlan 101 master bridge 00:02:00:00:00:03 dev vxlan1001 dst 12.0.0.8 self 00:02:00:00:00:03 dev vxlan1000 vlan 100 master bridge 00:02:00:00:00:03 dev vxlan1000 dst 12.0.0.8 self after this patch with collect metdata in bridged mode (single netdev): $bridge fdb show | grep "00:02:00:00:00:03" 00:02:00:00:00:03 dev vxlan0 vlan 101 master bridge 00:02:00:00:00:03 dev vxlan0 src_vni 1001 dst 12.0.0.8 self 00:02:00:00:00:03 dev vxlan0 vlan 100 master bridge 00:02:00:00:00:03 dev vxlan0 src_vni 1000 dst 12.0.0.8 self CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6db6f0ea |
|
21-Jan-2017 |
Felix Fietkau <nbd@nbd.name> |
bridge: multicast to unicast Implements an optional, per bridge port flag and feature to deliver multicast packets to any host on the according port via unicast individually. This is done by copying the packet per host and changing the multicast destination MAC to a unicast one accordingly. multicast-to-unicast works on top of the multicast snooping feature of the bridge. Which means unicast copies are only delivered to hosts which are interested in it and signalized this via IGMP/MLD reports previously. This feature is intended for interface types which have a more reliable and/or efficient way to deliver unicast packets than broadcast ones (e.g. wifi). However, it should only be enabled on interfaces where no IGMPv2/MLDv1 report suppression takes place. This feature is disabled by default. The initial patch and idea is from Felix Fietkau. Signed-off-by: Felix Fietkau <nbd@nbd.name> [linus.luessing@c0d3.blue: various bug + style fixes, commit message] Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
34d8acd8 |
|
10-Dec-2016 |
Vivien Didelot <vivien.didelot@gmail.com> |
net: bridge: shorten ageing time on topology change 802.1D [1] specifies that the bridges must use a short value to age out dynamic entries in the Filtering Database for a period, once a topology change has been communicated by the root bridge. Add a bridge_ageing_time member in the net_bridge structure to store the bridge ageing time value configured by the user (ioctl/netlink/sysfs). If we are using in-kernel STP, shorten the ageing time value to twice the forward delay used by the topology when the topology change flag is set. When the flag is cleared, restore the configured ageing time. [1] "8.3.5 Notifying topology changes ", http://profesores.elo.utfsm.cl/~agv/elo309/doc/802.1D-1998.pdf Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
82dd4332 |
|
10-Dec-2016 |
Vivien Didelot <vivien.didelot@gmail.com> |
net: bridge: add helper to offload ageing time The SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME switchdev attr is actually set when initializing a bridge port, and when configuring the bridge ageing time from ioctl/netlink/sysfs. Add a __set_ageing_time helper to offload the ageing time to physical switches, and add the SWITCHDEV_F_DEFER flag since it can be called under bridge lock. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
aa2ae3e7 |
|
21-Nov-2016 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
bridge: mcast: add MLDv2 querier support This patch adds basic support for MLDv2 queries, the default is MLDv1 as before. A new multicast option - multicast_mld_version, adds the ability to change it between 1 and 2 via netlink and sysfs. The MLD option is disabled if CONFIG_IPV6 is disabled. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5e923585 |
|
21-Nov-2016 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
bridge: mcast: add IGMPv3 query support This patch adds basic support for IGMPv3 queries, the default is IGMPv2 as before. A new multicast option - multicast_igmp_version, adds the ability to change it between 2 and 3 via netlink and sysfs. The option struct member is in a 4 byte hole in net_bridge. There also a few minor style adjustments in br_multicast_new_group and br_multicast_add_group. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8addd5e7 |
|
31-Aug-2016 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: change unicast boolean to exact pkt_type Remove the unicast flag and introduce an exact pkt_type. That would help us for the upcoming per-port multicast flood flag and also slightly reduce the tests in the input fast path. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d297653d |
|
30-Aug-2016 |
Roopa Prabhu <roopa@cumulusnetworks.com> |
rtnetlink: fdb dump: optimize by saving last interface markers fdb dumps spanning multiple skb's currently restart from the first interface again for every skb. This results in unnecessary iterations on the already visited interfaces and their fdb entries. In large scale setups, we have seen this to slow down fdb dumps considerably. On a system with 30k macs we see fdb dumps spanning across more than 300 skbs. To fix the problem, this patch replaces the existing single fdb marker with three markers: netdev hash entries, netdevs and fdb index to continue where we left off instead of restarting from the first netdev. This is consistent with link dumps. In the process of fixing the performance issue, this patch also re-implements fix done by commit 472681d57a5d ("net: ndo_fdb_dump should report -EMSGSIZE to rtnl_fdb_dump") (with an internal fix from Wilson Kok) in the following ways: - change ndo_fdb_dump handlers to return error code instead of the last fdb index - use cb->args strictly for dump frag markers and not error codes. This is consistent with other dump functions. Below results were taken on a system with 1000 netdevs and 35085 fdb entries: before patch: $time bridge fdb show | wc -l 15065 real 1m11.791s user 0m0.070s sys 1m8.395s (existing code does not return all macs) after patch: $time bridge fdb show | wc -l 35085 real 0m2.017s user 0m0.113s sys 0m1.942s Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6bc506b4 |
|
25-Aug-2016 |
Ido Schimmel <idosch@mellanox.com> |
bridge: switchdev: Add forward mark support for stacked devices switchdev_port_fwd_mark_set() is used to set the 'offload_fwd_mark' of port netdevs so that packets being flooded by the device won't be flooded twice. It works by assigning a unique identifier (the ifindex of the first bridge port) to bridge ports sharing the same parent ID. This prevents packets from being flooded twice by the same switch, but will flood packets through bridge ports belonging to a different switch. This method is problematic when stacked devices are taken into account, such as VLANs. In such cases, a physical port netdev can have upper devices being members in two different bridges, thus requiring two different 'offload_fwd_mark's to be configured on the port netdev, which is impossible. The main problem is that packet and netdev marking is performed at the physical netdev level, whereas flooding occurs between bridge ports, which are not necessarily port netdevs. Instead, packet and netdev marking should really be done in the bridge driver with the switch driver only telling it which packets it already forwarded. The bridge driver will mark such packets using the mark assigned to the ingress bridge port and will prevent the packet from being forwarded through any bridge port sharing the same mark (i.e. having the same parent ID). Remove the current switchdev 'offload_fwd_mark' implementation and instead implement the proposed method. In addition, make rocker - the sole user of the mark - use the proposed method. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9e0b27fe |
|
20-Jul-2016 |
Vivien Didelot <vivien.didelot@gmail.com> |
net: bridge: br_set_ageing_time takes a clock_t Change the ageing_time type in br_set_ageing_time() from u32 to what it is expected to be, i.e. a clock_t. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
37b090e6 |
|
13-Jul-2016 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: remove _deliver functions and consolidate forward code Before this patch we had two flavors of most forwarding functions - _forward and _deliver, the difference being that the latter are used when the packets are locally originated. Instead of all this function pointer passing and code duplication, we can just pass a boolean noting that the packet was locally originated and use that to perform the necessary checks in __br_forward. This gives a minor performance improvement but more importantly consolidates the forwarding paths. Also add a kernel doc comment to explain the exported br_forward()'s arguments. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b35c5f63 |
|
13-Jul-2016 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: drop skb2/skb0 variables and use a local_rcv boolean Currently if the packet is going to be received locally we set skb0 or sometimes called skb2 variables to the original skb. This can get confusing and also we can avoid one conditional on the fast path by simply using a boolean and passing it around. Thanks to Roopa for the name suggestion. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a65056ec |
|
06-Jul-2016 |
Nikolay Aleksandrov <razor@blackwall.org> |
net: bridge: extend MLD/IGMP query stats As was suggested this patch adds support for the different versions of MLD and IGMP query types. Since the user visible structure is still in net-next we can augment it instead of adding netlink attributes. The distinction between the different IGMP/MLD query types is done as suggested in Section 7.1, RFC 3376 [1] and Section 8.1, RFC 3810 [2] based on query payload size and code for IGMP. Since all IGMP packets go through multicast_rcv() and it uses ip_mc_check_igmp/ipv6_mc_check_mld we can be sure that at least the ip/ipv6 header can be directly used. [1] https://tools.ietf.org/html/rfc3376#section-7 [2] https://tools.ietf.org/html/rfc3810#section-8.1 Suggested-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1080ab95 |
|
28-Jun-2016 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bridge: add support for IGMP/MLD stats and export them via netlink This patch adds stats support for the currently used IGMP/MLD types by the bridge. The stats are per-port (plus one stat per-bridge) and per-direction (RX/TX). The stats are exported via netlink via the new linkxstats API (RTM_GETSTATS). In order to minimize the performance impact, a new option is used to enable/disable the stats - multicast_stats_enabled, similar to the recent vlan stats. Also in order to avoid multiple IGMP/MLD type lookups and checks, we make use of the current "igmp" member of the bridge private skb->cb region to record the type on Rx (both host-generated and external packets pass by multicast_rcv()). We can do that since the igmp member was used as a boolean and all the valid IGMP/MLD types are positive values. The normal bridge fast-path is not affected at all, the only affected paths are the flooding ones and since we make use of the IGMP/MLD type, we can quickly determine if the packet should be counted using cache-hot data (cb's igmp member). We add counters for: * IGMP Queries * IGMP Leaves * IGMP v1/v2/v3 reports * MLD Queries * MLD Leaves * MLD v1/v2 reports These are invaluable when monitoring or debugging complex multicast setups with bridges. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0888d5f3 |
|
23-Jun-2016 |
daniel <daniel@dd-wrt.com> |
Bridge: Fix ipv6 mc snooping if bridge has no ipv6 address The bridge is falsly dropping ipv6 mulitcast packets if there is: 1. No ipv6 address assigned on the brigde. 2. No external mld querier present. 3. The internal querier enabled. When the bridge fails to build mld queries, because it has no ipv6 address, it slilently returns, but keeps the local querier enabled. This specific case causes confusing packet loss. Ipv6 multicast snooping can only work if: a) An external querier is present OR b) The bridge has an ipv6 address an is capable of sending own queries Otherwise it has to forward/flood the ipv6 multicast traffic, because snooping cannot work. This patch fixes the issue by adding a flag to the bridge struct that indicates that there is currently no ipv6 address assinged to the bridge and returns a false state for the local querier in __br_multicast_querier_exists(). Special thanks to Linus Lüssing. Fixes: d1d81d4c3dd8 ("bridge: check return value of ipv6_dev_get_saddr()") Signed-off-by: Daniel Danzberger <daniel@dd-wrt.com> Acked-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a60c0903 |
|
30-Apr-2016 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
bridge: netlink: export per-vlan stats Add a new LINK_XSTATS_TYPE_BRIDGE attribute and implement the RTM_GETSTATS callbacks for IFLA_STATS_LINK_XSTATS (fill_linkxstats and get_linkxstats_size) in order to export the per-vlan stats. The paddings were added because soon these fields will be needed for per-port per-vlan stats (or something else if someone beats me to it) so avoiding at least a few more netlink attributes. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6dada9b1 |
|
30-Apr-2016 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
bridge: vlan: learn to count Add support for per-VLAN Tx/Rx statistics. Every global vlan context gets allocated a per-cpu stats which is then set in each per-port vlan context for quick access. The br_allowed_ingress() common function is used to account for Rx packets and the br_handle_vlan() common function is used to account for Tx packets. Stats accounting is performed only if the bridge-wide vlan_stats_enabled option is set either via sysfs or netlink. A struct hole between vlan_enabled and vlan_proto is used for the new option so it is in the same cache line. Currently it is binary (on/off) but it is intentionally restricted to exactly 0 and 1 since other values will be used in the future for different purposes (e.g. per-port stats). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
45ebcce5 |
|
20-Apr-2016 |
Elad Raz <eladr@mellanox.com> |
bridge: mdb: Marking port-group as offloaded There is a race-condition when updating the mdb offload flag without using the mulicast_lock. This reverts commit 9e8430f8d60d98 ("bridge: mdb: Passing the port-group pointer to br_mdb module"). This patch marks offloaded MDB entry as "offload" by changing the port- group flags and marks it as MDB_PG_FLAGS_OFFLOAD. When switchdev PORT_MDB succeeded and adds a multicast group, a completion callback is been invoked "br_mdb_complete". The completion function locks the multicast_lock and finds the right net_bridge_port_group and marks it as offloaded. Fixes: 9e8430f8d60d98 ("bridge: mdb: Passing the port-group pointer to br_mdb module") Reported-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7c25b16d |
|
16-Feb-2016 |
Vivien Didelot <vivien.didelot@gmail.com> |
net: bridge: log port STP state on change Remove the shared br_log_state function and print the info directly in br_set_state, where the net_bridge_port state is actually changed. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9e8430f8 |
|
03-Feb-2016 |
Elad Raz <eladr@mellanox.com> |
bridge: mdb: Passing the port-group pointer to br_mdb module Passing the port-group to br_mdb in order to allow direct access to the structure. br_mdb will later use the structure to reflect HW reflection status via "state" variable. Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9d06b6d8 |
|
03-Feb-2016 |
Elad Raz <eladr@mellanox.com> |
bridge: mdb: Separate br_mdb_entry->state from net_bridge_port_group->state Change net_bridge_port_group 'state' member to 'flags' and define new set of flags internal to the kernel. Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f409d0ed |
|
12-Oct-2015 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
bridge: vlan: move back vlan_flush Ido Schimmel reported a problem with switchdev devices because of the order change of del_nbp operations, more specifically the move of nbp_vlan_flush() which deletes all vlans and frees vlgrp after the rx_handler has been unregistered. So in order to fix this move vlan_flush back where it was and make it destroy the rhtable after NULLing vlgrp and waiting a grace period to make sure noone can see it. Reported-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
907b1e6e |
|
12-Oct-2015 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
bridge: vlan: use proper rcu for the vlgrp member The bridge and port's vlgrp member is already used in RCU way, currently we rely on the fact that it cannot disappear while the port exists but that is error-prone and we might miss places with improper locking (either RCU or RTNL must be held to walk the vlan_list). So make it official and use RCU for vlgrp to catch offenders. Introduce proper vlgrp accessors and use them consistently throughout the code. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c62987bb |
|
08-Oct-2015 |
Scott Feldman <sfeldma@gmail.com> |
bridge: push bridge setting ageing_time down to switchdev Use SWITCHDEV_F_SKIP_EOPNOTSUPP to skip over ports in bridge that don't support setting ageing_time (or setting bridge attrs in general). If push fails, don't update ageing_time in bridge and return err to user. If push succeeds, update ageing_time in bridge and run gc_timer now to recalabrate when to run gc_timer next, based on new ageing_time. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0f963b75 |
|
04-Oct-2015 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
bridge: netlink: add support for default_pvid Add IFLA_BR_VLAN_DEFAULT_PVID to allow setting/getting bridge's default_pvid via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6be144f6 |
|
02-Oct-2015 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
bridge: vlan: use br_vlan_should_use to simplify __vlan_add/del The checks that lead to num_vlans change are always what br_vlan_should_use checks for, namely if the vlan is only a context or not and depending on that it's either not counted or counted as a real/used vlan respectively. Also give better explanation in br_vlan_should_use's comment. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
77751ee8 |
|
30-Sep-2015 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
bridge: vlan: move pvid inside net_bridge_vlan_group One obvious way to converge more code (which was also used by the previous vlan code) is to move pvid inside net_bridge_vlan_group. This allows us to simplify some and remove other port-specific functions. Also gives us the ability to simply pass the vlan group and use all of the contained information. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2594e906 |
|
25-Sep-2015 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
bridge: vlan: add per-vlan struct and move to rhashtables This patch changes the bridge vlan implementation to use rhashtables instead of bitmaps. The main motivation behind this change is that we need extensible per-vlan structures (both per-port and global) so more advanced features can be introduced and the vlan support can be extended. I've tried to break this up but the moment net_port_vlans is changed and the whole API goes away, thus this is a larger patch. A few short goals of this patch are: - Extensible per-vlan structs stored in rhashtables and a sorted list - Keep user-visible behaviour (compressed vlans etc) - Keep fastpath ingress/egress logic the same (optimizations to come later) Here's a brief list of some of the new features we'd like to introduce: - per-vlan counters - vlan ingress/egress mapping - per-vlan igmp configuration - vlan priorities - avoid fdb entries replication (e.g. local fdb scaling issues) The structure is kept single for both global and per-port entries so to avoid code duplication where possible and also because we'll soon introduce "port0 / aka bridge as port" which should simplify things further (thanks to Vlad for the suggestion!). Now we have per-vlan global rhashtable (bridge-wide) and per-vlan port rhashtable, if an entry is added to a port it'll get a pointer to its global context so it can be quickly accessed later. There's also a sorted vlan list which is used for stable walks and some user-visible behaviour such as the vlan ranges, also for error paths. VLANs are stored in a "vlan group" which currently contains the rhashtable, sorted vlan list and the number of "real" vlan entries. A good side-effect of this change is that it resembles how hw keeps per-vlan data. One important note after this change is that if a VLAN is being looked up in the bridge's rhashtable for filtering purposes (or to check if it's an existing usable entry, not just a global context) then the new helper br_vlan_should_use() needs to be used if the vlan is found. In case the lookup is done only with a port's vlan group, then this check can be skipped. Things tested so far: - basic vlan ingress/egress - pvids - untagged vlans - undef CONFIG_BRIDGE_VLAN_FILTERING - adding/deleting vlans in different scenarios (with/without global ctx, while transmitting traffic, in ranges etc) - loading/removing the module while having/adding/deleting vlans - extracting bridge vlan information (user ABI), compressed requests - adding/deleting fdbs on vlans - bridge mac change, promisc mode - default pvid change - kmemleak ON during the whole time Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0c4b51f0 |
|
15-Sep-2015 |
Eric W. Biederman <ebiederm@xmission.com> |
netfilter: Pass net into okfn This is immediately motivated by the bridge code that chains functions that call into netfilter. Without passing net into the okfns the bridge code would need to guess about the best expression for the network namespace to process packets in. As net is frequently one of the first things computed in continuation functions after netfilter has done it's job passing in the desired network namespace is in many cases a code simplification. To support this change the function dst_output_okfn is introduced to simplify passing dst_output as an okfn. For the moment dst_output_okfn just silently drops the struct net. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b22fbf22 |
|
27-Aug-2015 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
bridge: fdb: rearrange net_bridge_fdb_entry While looking into fixing the local entries scalability issue I noticed that the structure is badly arranged because vlan_id would fall in a second cache line while keeping rcu which is used only when deleting in the first, so re-arrange the structure and push rcu to the end so we can get 16 bytes which can be used for other fields (by pushing rcu fully in the second 64 byte chunk). With this change all the core necessary information when doing fdb lookups will be available in a single cache line. pahole before (note vlan_id): struct net_bridge_fdb_entry { struct hlist_node hlist; /* 0 16 */ struct net_bridge_port * dst; /* 16 8 */ struct callback_head rcu; /* 24 16 */ long unsigned int updated; /* 40 8 */ long unsigned int used; /* 48 8 */ mac_addr addr; /* 56 6 */ unsigned char is_local:1; /* 62: 7 1 */ unsigned char is_static:1; /* 62: 6 1 */ unsigned char added_by_user:1; /* 62: 5 1 */ unsigned char added_by_external_learn:1; /* 62: 4 1 */ /* XXX 4 bits hole, try to pack */ /* XXX 1 byte hole, try to pack */ /* --- cacheline 1 boundary (64 bytes) --- */ __u16 vlan_id; /* 64 2 */ /* size: 72, cachelines: 2, members: 11 */ /* sum members: 65, holes: 1, sum holes: 1 */ /* bit holes: 1, sum bit holes: 4 bits */ /* padding: 6 */ /* last cacheline: 8 bytes */ } pahole after (note vlan_id): struct net_bridge_fdb_entry { struct hlist_node hlist; /* 0 16 */ struct net_bridge_port * dst; /* 16 8 */ long unsigned int updated; /* 24 8 */ long unsigned int used; /* 32 8 */ mac_addr addr; /* 40 6 */ __u16 vlan_id; /* 46 2 */ unsigned char is_local:1; /* 48: 7 1 */ unsigned char is_static:1; /* 48: 6 1 */ unsigned char added_by_user:1; /* 48: 5 1 */ unsigned char added_by_external_learn:1; /* 48: 4 1 */ /* XXX 4 bits hole, try to pack */ /* XXX 7 bytes hole, try to pack */ struct callback_head rcu; /* 56 16 */ /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */ /* size: 72, cachelines: 2, members: 11 */ /* sum members: 65, holes: 1, sum holes: 7 */ /* bit holes: 1, sum bit holes: 4 bits */ /* last cacheline: 8 bytes */ } Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d2d427b3 |
|
27-Aug-2015 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
bridge: Add netlink support for vlan_protocol attribute This enables bridge vlan_protocol to be configured through netlink. When CONFIG_BRIDGE_VLAN_FILTERING is disabled, kernel behaves the same way as this feature is not implemented. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a7854037 |
|
07-Aug-2015 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
bridge: netlink: add support for vlan_filtering attribute This patch adds the ability to toggle the vlan filtering support via netlink. Since we're already running with rtnl in .changelink() we don't need to take any additional locks. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
949f1e39 |
|
23-Jul-2015 |
Satish Ashok <sashok@cumulusnetworks.com> |
bridge: mdb: notify on router port add and del Send notifications on router port add and del/expire, re-use the already existing MDBA_ROUTER and send NEWMDB/DELMDB netlink notifications respectively. Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a7ce45a7 |
|
20-Jul-2015 |
Nikolay Aleksandrov <razor@blackwall.org> |
bridge: mcast: fix br_multicast_dev_del warn when igmp snooping is not defined Fix: net/bridge/br_if.c: In function 'br_dev_delete': >> net/bridge/br_if.c:284:2: error: implicit declaration of function >> 'br_multicast_dev_del' [-Werror=implicit-function-declaration] br_multicast_dev_del(br); ^ cc1: some warnings being treated as errors when igmp snooping is not defined. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e10177ab |
|
15-Jul-2015 |
Satish Ashok <sashok@cumulusnetworks.com> |
bridge: multicast: fix handling of temp and perm entries When the bridge (or port) is brought down/up flush only temp entries and leave the perm ones. Flush perm entries only when deleting the bridge device or the associated port. Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
09cf0211 |
|
09-Jul-2015 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
bridge: mdb: fill state in br_mdb_notify Fill also the port group state when sending notifications. Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1ea2d020 |
|
23-Jun-2015 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
bridge: vlan: flush the dynamically learned entries on port vlan delete Add a new argument to br_fdb_delete_by_port which allows to specify a vid to match when flushing entries and use it in nbp_vlan_delete() to flush the dynamically learned entries of the vlan/port pair when removing a vlan from a port. Before this patch only the local mac was being removed and the dynamically learned ones were left to expire. Note that the do_all argument is still respected and if specified, the vid will be ignored. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
efb6de9b |
|
30-May-2015 |
Bernhard Thaler <bernhard.thaler@wvnet.at> |
netfilter: bridge: forward IPv6 fragmented packets IPv6 fragmented packets are not forwarded on an ethernet bridge with netfilter ip6_tables loaded. e.g. steps to reproduce 1) create a simple bridge like this modprobe br_netfilter brctl addbr br0 brctl addif br0 eth0 brctl addif br0 eth2 ifconfig eth0 up ifconfig eth2 up ifconfig br0 up 2) place a host with an IPv6 address on each side of the bridge set IPv6 address on host A: ip -6 addr add fd01:2345:6789:1::1/64 dev eth0 set IPv6 address on host B: ip -6 addr add fd01:2345:6789:1::2/64 dev eth0 3) run a simple ping command on host A with packets > MTU ping6 -s 4000 fd01:2345:6789:1::2 4) wait some time and run e.g. "ip6tables -t nat -nvL" on the bridge IPv6 fragmented packets traverse the bridge cleanly until somebody runs. "ip6tables -t nat -nvL". As soon as it is run (and netfilter modules are loaded) IPv6 fragmented packets do not traverse the bridge any more (you see no more responses in ping's output). After applying this patch IPv6 fragmented packets traverse the bridge cleanly in above scenario. Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> [pablo@netfilter.org: small changes to br_nf_dev_queue_xmit] Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
#
411ffb4f |
|
30-May-2015 |
Bernhard Thaler <bernhard.thaler@wvnet.at> |
netfilter: bridge: refactor frag_max_size Currently frag_max_size is member of br_input_skb_cb and copied back and forth using IPCB(skb) and BR_INPUT_SKB_CB(skb) each time it is changed or used. Attach frag_max_size to nf_bridge_info and set value in pre_routing and forward functions. Use its value in forward and xmit functions. Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
#
784b58a3 |
|
04-May-2015 |
Bernhard Thaler <bernhard.thaler@wvnet.at> |
bridge: change BR_GROUPFWD_RESTRICTED to allow forwarding of LLDP frames BR_GROUPFWD_RESTRICTED bitmask restricts users from setting values to /sys/class/net/brX/bridge/group_fwd_mask that allow forwarding of some IEEE 802.1D Table 7-10 Reserved addresses: (MAC Control) 802.3 01-80-C2-00-00-01 (Link Aggregation) 802.3 01-80-C2-00-00-02 802.1AB LLDP 01-80-C2-00-00-0E Change BR_GROUPFWD_RESTRICTED to allow to forward LLDP frames and document group_fwd_mask. e.g. echo 16384 > /sys/class/net/brX/bridge/group_fwd_mask allows to forward LLDP frames. This may be needed for bridge setups used for network troubleshooting or any other scenario where forwarding of LLDP frames is desired (e.g. bridge connecting a virtual machine to real switch transmitting LLDP frames that virtual machine needs to receive). Tested on a simple bridge setup with two interfaces and host transmitting LLDP frames on one side of this bridge (used lldpd). Setting group_fwd_mask as described above lets LLDP frames traverse bridge. Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
46c264da |
|
28-Apr-2015 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
bridge/nl: remove wrong use of NLM_F_MULTI NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact, it is sent only at the end of a dump. Libraries like libnl will wait forever for NLMSG_DONE. Fixes: e5a55a898720 ("net: create generic bridge ops") Fixes: 815cccbf10b2 ("ixgbe: add setlink, getlink support to ixgbe and ixgbevf") CC: John Fastabend <john.r.fastabend@intel.com> CC: Sathya Perla <sathya.perla@emulex.com> CC: Subbu Seetharaman <subbu.seetharaman@emulex.com> CC: Ajit Khaparde <ajit.khaparde@emulex.com> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: intel-wired-lan@lists.osuosl.org CC: Jiri Pirko <jiri@resnulli.us> CC: Scott Feldman <sfeldma@gmail.com> CC: Stephen Hemminger <stephen@networkplumber.org> CC: bridge@lists.linux-foundation.org Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7026b1dd |
|
05-Apr-2015 |
David Miller <davem@davemloft.net> |
netfilter: Pass socket pointer down through okfn(). On the output paths in particular, we have to sometimes deal with two socket contexts. First, and usually skb->sk, is the local socket that generated the frame. And second, is potentially the socket used to control a tunneling socket, such as one the encapsulates using UDP. We do not want to disassociate skb->sk when encapsulating in order to fix this, because that would break socket memory accounting. The most extreme case where this can cause huge problems is an AF_PACKET socket transmitting over a vxlan device. We hit code paths doing checks that assume they are dealing with an ipv4 socket, but are actually operating upon the AF_PACKET one. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1a4ba64d |
|
10-Mar-2015 |
Pablo Neira Ayuso <pablo@netfilter.org> |
netfilter: bridge: use rcu hook to resolve br_netfilter dependency e5de75b ("netfilter: bridge: move DNAT helper to br_netfilter") results in the following link problem: net/bridge/br_device.c:29: undefined reference to `br_nf_prerouting_finish_bridge` Moreover it creates a hard dependency between br_netfilter and the bridge core, which is what we've been trying to avoid so far. Resolve this problem by using a hook structure so we reduce #ifdef pollution and keep bridge netfilter specific code under br_netfilter.c which was the original intention. Reported-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
#
e5de75bf |
|
08-Mar-2015 |
Pablo Neira Ayuso <pablo@netfilter.org> |
netfilter: bridge: move DNAT helper to br_netfilter Only one caller, there is no need to keep this in a header. Move it to br_netfilter.c where this belongs to. Based on patch from Florian Westphal. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
#
842a9ae0 |
|
03-Mar-2015 |
Jouni Malinen <jouni@codeaurora.org> |
bridge: Extend Proxy ARP design to allow optional rules for Wi-Fi This extends the design in commit 958501163ddd ("bridge: Add support for IEEE 802.11 Proxy ARP") with optional set of rules that are needed to meet the IEEE 802.11 and Hotspot 2.0 requirements for ProxyARP. The previously added BR_PROXYARP behavior is left as-is and a new BR_PROXYARP_WIFI alternative is added so that this behavior can be configured from user space when required. In addition, this enables proxyarp functionality for unicast ARP requests for both BR_PROXYARP and BR_PROXYARP_WIFI since it is possible to use unicast as well as broadcast for these frames. The key differences in functionality: BR_PROXYARP: - uses the flag on the bridge port on which the request frame was received to determine whether to reply - block bridge port flooding completely on ports that enable proxy ARP BR_PROXYARP_WIFI: - uses the flag on the bridge port to which the target device of the request belongs - block bridge port flooding selectively based on whether the proxyarp functionality replied Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
add511b3 |
|
29-Jan-2015 |
Roopa Prabhu <roopa@cumulusnetworks.com> |
bridge: add flags argument to ndo_bridge_setlink and ndo_bridge_dellink bridge flags are needed inside ndo_bridge_setlink/dellink handlers to avoid another call to parse IFLA_AF_SPEC inside these handlers This is used later in this series Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3aeb6617 |
|
15-Jan-2015 |
Jiri Pirko <jiri@resnulli.us> |
net: replace br_fdb_external_learn_* calls with switchdev notifier events This patch benefits from newly introduced switchdev notifier and uses it to propagate fdb learn events from rocker driver to bridge. That avoids direct function calls and possible use by other listeners (ovs). Suggested-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
df8a39de |
|
13-Jan-2015 |
Jiri Pirko <jiri@resnulli.us> |
net: rename vlan_tx_* helpers since "tx" is misleading there The same macros are used for rx as well. So rename it. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
065c212a |
|
28-Nov-2014 |
Scott Feldman <sfeldma@gmail.com> |
bridge: move private brport flags to if_bridge.h so port drivers can use flags Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cf6b8e1e |
|
28-Nov-2014 |
Scott Feldman <sfeldma@gmail.com> |
bridge: add API to notify bridge driver of learned FBD on offloaded device When the swdev device learns a new mac/vlan on a port, it sends some async notification to the driver and the driver installs an FDB in the device. To give a holistic system view, the learned mac/vlan should be reflected in the bridge's FBD table, so the user, using normal iproute2 cmds, can view what is currently learned by the device. This API on the bridge driver gives a way for the swdev driver to install an FBD entry in the bridge FBD table. (And remove one). This is equivalent to the device running these cmds: bridge fdb [add|del] <mac> dev <dev> vid <vlan id> master This patch needs some extra eyeballs for review, in paricular around the locking and contexts. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f6f6424b |
|
28-Nov-2014 |
Jiri Pirko <jiri@resnulli.us> |
net: make vid as a parameter for ndo_fdb_add/ndo_fdb_del Do the work of parsing NDA_VLAN directly in rtnetlink code, pass simple u16 vid to drivers from there. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
93859b13 |
|
28-Nov-2014 |
Jiri Pirko <jiri@resnulli.us> |
bridge: convert flags in fbd entry into bitfields Suggested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
95850116 |
|
23-Oct-2014 |
Kyeyoon Park <kyeyoonp@codeaurora.org> |
bridge: Add support for IEEE 802.11 Proxy ARP This feature is defined in IEEE Std 802.11-2012, 10.23.13. It allows the AP devices to keep track of the hardware-address-to-IP-address mapping of the mobile devices within the WLAN network. The AP will learn this mapping via observing DHCP, ARP, and NS/NA frames. When a request for such information is made (i.e. ARP request, Neighbor Solicitation), the AP will respond on behalf of the associated mobile device. In the process of doing so, the AP will drop the multicast request frame that was intended to go out to the wireless medium. It was recommended at the LKS workshop to do this implementation in the bridge layer. vxlan.c is already doing something very similar. The DHCP snooping code will be added to the userspace application (hostapd) per the recommendation. This RFC commit is only for IPv4. A similar approach in the bridge layer will be taken for IPv6 as well. Signed-off-by: Kyeyoon Park <kyeyoonp@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
93fdd47e |
|
04-Oct-2014 |
Herbert Xu <herbert@gondor.apana.org.au> |
bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING As we may defragment the packet in IPv4 PRE_ROUTING and refragment it after POST_ROUTING we should save the value of frag_max_size. This is still very wrong as the bridge is supposed to leave the packets intact, meaning that the right thing to do is to use the original frag_list for fragmentation. Unfortunately we don't currently guarantee that the frag_list is left untouched throughout netfilter so until this changes this is the best we can do. There is also a spot in FORWARD where it appears that we can forward a packet without going through fragmentation, mark it so that we can fix it later. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5be5a2df |
|
03-Oct-2014 |
Vlad Yasevich <vyasevich@gmail.com> |
bridge: Add filtering support for default_pvid Currently when vlan filtering is turned on on the bridge, the bridge will drop all traffic untill the user configures the filter. This isn't very nice for ports that don't care about vlans and just want untagged traffic. A concept of a default_pvid was recently introduced. This patch adds filtering support for default_pvid. Now, ports that don't care about vlans and don't define there own filter will belong to the VLAN of the default_pvid and continue to receive untagged traffic. This filtering can be disabled by setting default_pvid to 0. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3df6bf45 |
|
03-Oct-2014 |
Vlad Yasevich <vyasevich@gmail.com> |
bridge: Simplify pvid checks. Currently, if the pvid is not set, we return an illegal vlan value even though the pvid value is set to 0. Since pvid of 0 is currently invalid, just return 0 instead. This makes the current and future checks simpler. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
96a20d9d |
|
03-Oct-2014 |
Vlad Yasevich <vyasevich@gmail.com> |
bridge: Add a default_pvid sysfs attribute This patch allows the user to set and retrieve default_pvid value. A new value can only be stored when vlan filtering is disabled. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
775dd692 |
|
30-Sep-2014 |
Florian Fainelli <f.fainelli@gmail.com> |
net: bridge: add a br_set_state helper function In preparation for being able to propagate port states to e.g: notifiers or other kernel parts, do not manipulate the port state directly, but instead use a helper function which will allow us to do a bit more than just setting the state. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
34666d46 |
|
18-Sep-2014 |
Pablo Neira Ayuso <pablo@netfilter.org> |
netfilter: bridge: move br_netfilter out of the core Jesper reported that br_netfilter always registers the hooks since this is part of the bridge core. This harms performance for people that don't need this. This patch modularizes br_netfilter so it can be rmmod'ed, thus, the hooks can be unregistered. I think the bridge netfilter should have been a separated module since the beginning, Patrick agreed on that. Note that this is breaking compatibility for users that expect that bridge netfilter is going to be available after explicitly 'modprobe bridge' or via automatic load through brctl. However, the damage can be easily undone by modprobing br_netfilter. The bridge core also spots a message to provide a clue to people that didn't notice that this has been deprecated. On top of that, the plan is that nftables will not rely on this software layer, but integrate the connection tracking into the bridge layer to enable stateful filtering and NAT, which is was bridge netfilter users seem to require. This patch still keeps the fake_dst_ops in the bridge core, since this is required by when the bridge port is initialized. So we can safely modprobe/rmmod br_netfilter anytime. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Florian Westphal <fw@strlen.de>
|
#
20adfa1a |
|
12-Sep-2014 |
Vlad Yasevich <vyasevich@gmail.com> |
bridge: Check if vlan filtering is enabled only once. The bridge code checks if vlan filtering is enabled on both ingress and egress. When the state flip happens, it is possible for the bridge to currently be forwarding packets and forwarding behavior becomes non-deterministic. Bridge may drop packets on some interfaces, but not others. This patch solves this by caching the filtered state of the packet into skb_cb on ingress. The skb_cb is guaranteed to not be over-written between the time packet entres bridge forwarding path and the time it leaves it. On egress, we can then check the cached state to see if we need to apply filtering information. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5d5eacb3 |
|
10-Jul-2014 |
Jamal Hadi Salim <jhs@mojatatu.com> |
bridge: fdb dumping takes a filter device Dumping a bridge fdb dumps every fdb entry held. With this change we are going to filter on selected bridge port. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
204177f3 |
|
10-Jun-2014 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
bridge: Support 802.1ad vlan filtering This enables us to change the vlan protocol for vlan filtering. We come to be able to filter frames on the basis of 802.1ad vlan tags through a bridge. This also changes br->group_addr if it has not been set by user. This is needed for an 802.1ad bridge. (See IEEE 802.1Q-2011 8.13.5.) Furthermore, this sets br->group_fwd_mask_required so that an 802.1ad bridge can forward the Nearest Customer Bridge group addresses except for br->group_addr, which should be passed to higher layer. To change the vlan protocol, write a protocol in sysfs: # echo 0x88a8 > /sys/class/net/br0/bridge/vlan_protocol Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f2808d22 |
|
10-Jun-2014 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
bridge: Prepare for forwarding another bridge group addresses If a bridge is an 802.1ad bridge, it must forward another bridge group addresses (the Nearest Customer Bridge group addresses). (For details, see IEEE 802.1Q-2011 8.6.3.) As user might not want group_fwd_mask to be modified by enabling 802.1ad, introduce a new mask, group_fwd_mask_required, which indicates addresses the bridge wants to forward. This will be set by enabling 802.1ad. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8580e211 |
|
10-Jun-2014 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
bridge: Prepare for 802.1ad vlan filtering support This enables a bridge to have vlan protocol informantion and allows vlan tag manipulation (retrieve, insert and remove tags) according to the vlan protocol. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2cd41431 |
|
07-Jun-2014 |
Linus Lüssing <linus.luessing@c0d3.blue> |
bridge: memorize and export selected IGMP/MLD querier port Adding bridge support to the batman-adv multicast optimization requires batman-adv knowing about the existence of bridged-in IGMP/MLD queriers to be able to reliably serve any multicast listener behind this same bridge. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
07f8ac4a |
|
07-Jun-2014 |
Linus Lüssing <linus.luessing@c0d3.blue> |
bridge: add export of multicast database adjacent to net_dev With this new, exported function br_multicast_list_adjacent(net_dev) a list of IPv4/6 addresses is returned. This list contains all multicast addresses sensed by the bridge multicast snooping feature on all bridge ports of the bridge interface of net_dev, excluding addresses from the specified net_device itself. Adding bridge support to the batman-adv multicast optimization requires batman-adv knowing about the existence of bridged-in multicast listeners to be able to reliably serve them with multicast packets. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dc4eb53a |
|
07-Jun-2014 |
Linus Lüssing <linus.luessing@c0d3.blue> |
bridge: adhere to querier election mechanism specified by RFCs MLDv1 (RFC2710 section 6), MLDv2 (RFC3810 section 7.6.2), IGMPv2 (RFC2236 section 3) and IGMPv3 (RFC3376 section 6.6.2) specify that the querier with lowest source address shall become the selected querier. So far the bridge stopped its querier as soon as it heard another querier regardless of its source address. This results in the "wrong" querier potentially becoming the active querier or a potential, unnecessary querying delay. With this patch the bridge memorizes the source address of the currently selected querier and ignores queries from queriers with a higher source address than the currently selected one. This slight optimization is supposed to make it more RFC compliant (but is rather uncritical and therefore probably not necessary to be queued for stable kernels). Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
90010b36 |
|
07-Jun-2014 |
Linus Lüssing <linus.luessing@c0d3.blue> |
bridge: rename struct bridge_mcast_query/querier The current naming of these two structs is very random, in that reversing their naming would not make any semantical difference. This patch tries to make the naming less confusing by giving them a more specific, distinguishable naming. This is also useful for the upcoming patches reintroducing the "struct bridge_mcast_querier" but for storing information about the selected querier (no matter if our own or a foreign querier). Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e0d7968a |
|
26-May-2014 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
bridge: Prevent insertion of FDB entry with disallowed vlan br_handle_local_finish() is allowing us to insert an FDB entry with disallowed vlan. For example, when port 1 and 2 are communicating in vlan 10, and even if vlan 10 is disallowed on port 3, port 3 can interfere with their communication by spoofed src mac address with vlan id 10. Note: Even if it is judged that a frame should not be learned, it should not be dropped because it is destined for not forwarding layer but higher layer. See IEEE 802.1Q-2011 8.13.10. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b1282726 |
|
20-May-2014 |
Cong Wang <xiyou.wangcong@gmail.com> |
bridge: make br_device_notifier static Merge net/bridge/br_notify.c into net/bridge/br.c, since it has only br_device_event() and br.c is small. Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d4f0e095 |
|
18-May-2014 |
Alexei Starovoitov <ast@kernel.org> |
net: bridge: fix build fix build when BRIDGE_VLAN_FILTERING is not set Fixes: 2796d0c648c94 ("bridge: Automatically manage port promiscuous mode") Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2796d0c6 |
|
16-May-2014 |
Vlad Yasevich <vyasevic@redhat.com> |
bridge: Automatically manage port promiscuous mode. There exist configurations where the administrator or another management entity has the foreknowledge of all the mac addresses of end systems that are being bridged together. In these environments, the administrator can statically configure known addresses in the bridge FDB and disable flooding and learning on ports. This makes it possible to turn off promiscuous mode on the interfaces connected to the bridge. Here is why disabling flooding and learning allows us to control promiscuity: Consider port X. All traffic coming into this port from outside the bridge (ingress) will be either forwarded through other ports of the bridge (egress) or dropped. Forwarding (egress) is defined by FDB entries and by flooding in the event that no FDB entry exists. In the event that flooding is disabled, only FDB entries define the egress. Once learning is disabled, only static FDB entries provided by a management entity define the egress. If we provide information from these static FDBs to the ingress port X, then we'll be able to accept all traffic that can be successfully forwarded and drop all the other traffic sooner without spending CPU cycles to process it. Another way to define the above is as following equations: ingress = egress + drop expanding egress ingress = static FDB + learned FDB + flooding + drop disabling flooding and learning we a left with ingress = static FDB + drop By adding addresses from the static FDB entries to the MAC address filter of an ingress port X, we fully define what the bridge can process without dropping and can thus turn off promiscuous mode, thus dropping packets sooner. There have been suggestions that we may want to allow learning and update the filters with learned addresses as well. This would require mac-level authentication similar to 802.1x to prevent attacks against the hw filters as they are limited resource. Additionally, if the user places the bridge device in promiscuous mode, all ports are placed in promiscuous mode regardless of the changes to flooding and learning. Since the above functionality depends on full static configuration, we have also require that vlan filtering be enabled to take advantage of this. The reason is that the bridge has to be able to receive and process VLAN-tagged frames and the there are only 2 ways to accomplish this right now: promiscuous mode or vlan filtering. Suggested-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f3a6ddf1 |
|
16-May-2014 |
Vlad Yasevich <vyasevic@redhat.com> |
bridge: Introduce BR_PROMISC flag Introduce a BR_PROMISC per-port flag that will help us track if the current port is supposed to be in promiscuous mode or not. For now, always start in promiscuous mode. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8db24af7 |
|
16-May-2014 |
Vlad Yasevich <vyasevic@redhat.com> |
bridge: Add functionality to sync static fdb entries to hw Add code that allows static fdb entires to be synced to the hw list for a specified port. This will be used later to program ports that can function in non-promiscuous mode. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e028e4b8 |
|
16-May-2014 |
Vlad Yasevich <vyasevic@redhat.com> |
bridge: Keep track of ports capable of automatic discovery. By default, ports on the bridge are capable of automatic discovery of nodes located behind the port. This is accomplished via flooding of unknown traffic (BR_FLOOD) and learning the mac addresses from these packets (BR_LEARNING). If the above functionality is disabled by turning off these flags, the port requires static configuration in the form of static FDB entries to function properly. This patch adds functionality to keep track of all ports capable of automatic discovery. This will later be used to control promiscuity settings. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a8779ec1 |
|
27-Mar-2014 |
Eric W. Biederman <ebiederm@xmission.com> |
netpoll: Remove gfp parameter from __netpoll_setup The gfp parameter was added in: commit 47be03a28cc6c80e3aa2b3e8ed6d960ff0c5c0af Author: Amerigo Wang <amwang@redhat.com> Date: Fri Aug 10 01:24:37 2012 +0000 netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup() slave_enable_netpoll() and __netpoll_setup() may be called with read_lock() held, so should use GFP_ATOMIC to allocate memory. Eric suggested to pass gfp flags to __netpoll_setup(). Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> The reason for the gfp parameter was removed in: commit c4cdef9b7183159c23c7302aaf270d64c549f557 Author: dingtianhong <dingtianhong@huawei.com> Date: Tue Jul 23 15:25:27 2013 +0800 bonding: don't call slave_xxx_netpoll under spinlocks The slave_xxx_netpoll will call synchronize_rcu_bh(), so the function may schedule and sleep, it should't be called under spinlocks. bond_netpoll_setup() and bond_netpoll_cleanup() are always protected by rtnl lock, it is no need to take the read lock, as the slave list couldn't be changed outside rtnl lock. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net> Nothing else that calls __netpoll_setup or ndo_netpoll_setup requires a gfp paramter, so remove the gfp parameter from both of these functions making the code clearer. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e5a727f6 |
|
23-Feb-2014 |
Joe Perches <joe@perches.com> |
bridge: Use ether_addr_copy and ETH_ALEN Convert the more obvious uses of memcpy to ether_addr_copy. There are still uses of memcpy that could be converted but these addresses are __aligned(2). Convert a couple uses of 6 in gr_private.h to ETH_ALEN. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
424bb9c9 |
|
07-Feb-2014 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
bridge: Properly check if local fdb entry can be deleted when deleting vlan Vlan codes unconditionally delete local fdb entries. We should consider the possibility that other ports have the same address and vlan. Example of problematic case: ip link set eth0 address 12:34:56:78:90:ab ip link set eth1 address aa:bb:cc:dd:ee:ff brctl addif br0 eth0 brctl addif br0 eth1 # br0 will have mac address 12:34:56:78:90:ab bridge vlan add dev eth0 vid 10 bridge vlan add dev eth1 vid 10 bridge vlan add dev br0 vid 10 self We will have fdb entry such that f->dst == eth0, f->vlan_id == 10 and f->addr == 12:34:56:78:90:ab at this time. Next, delete eth0 vlan 10. bridge vlan del dev eth0 vid 10 In this case, we still need the entry for br0, but it will be deleted. Note that br0 needs the entry even though its mac address is not set manually. To delete the entry with proper condition checking, fdb_delete_local() is suitable to use. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2b292fb4 |
|
07-Feb-2014 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
bridge: Fix the way to check if a local fdb entry can be deleted We should take into account the followings when deleting a local fdb entry. - nbp_vlan_find() can be used only when vid != 0 to check if an entry is deletable, because a fdb entry with vid 0 can exist at any time while nbp_vlan_find() always return false with vid 0. Example of problematic case: ip link set eth0 address 12:34:56:78:90:ab ip link set eth1 address 12:34:56:78:90:ab brctl addif br0 eth0 brctl addif br0 eth1 ip link set eth0 address aa:bb:cc:dd:ee:ff Then, the fdb entry 12:34:56:78:90:ab will be deleted even though the bridge port eth1 still has that address. - The port to which the bridge device is attached might needs a local entry if its mac address is set manually. Example of problematic case: ip link set eth0 address 12:34:56:78:90:ab brctl addif br0 eth0 ip link set br0 address 12:34:56:78:90:ab ip link set eth0 address aa:bb:cc:dd:ee:ff Then, the fdb still must have the entry 12:34:56:78:90:ab, but it will be deleted. We can use br->dev->addr_assign_type to check if the address is manually set or not, but I propose another approach. Since we delete and insert local entries whenever changing mac address of the bridge device, we can change dst of the entry to NULL regardless of addr_assign_type when deleting an entry associated with a certain port, and if it is found to be unnecessary later, then delete it. That is, if changing mac address of a port, the entry might be changed to its dst being NULL first, but is eventually deleted when recalculating and changing bridge id. This approach is especially useful when we want to share the code with deleting vlan in which the bridge device might want such an entry regardless of addr_assign_type, and makes things easy because we don't have to consider if mac address of the bridge device will be changed or not at the time we delete a local entry of a port, which means fdb code will not be bothered even if the bridge id calculating logic is changed in the future. Also, this change reduces inconsistent state, where frames whose dst is the mac address of the bridge, can't reach the bridge because of premature fdb entry deletion. This change reduces the possibility that the bridge device replies unreachable mac address to arp requests, which could occur during the short window between calling del_nbp() and br_stp_recalculate_bridge_id() in br_del_if(). This will effective after br_fdb_delete_by_port() starts to use the same code by following patch. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a5642ab4 |
|
07-Feb-2014 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
bridge: Fix the way to find old local fdb entries in br_fdb_changeaddr br_fdb_changeaddr() assumes that there is at most one local entry per port per vlan. It used to be true, but since commit 36fd2b63e3b4 ("bridge: allow creating/deleting fdb entries via netlink"), it has not been so. Therefore, the function might fail to search a correct previous address to be deleted and delete an arbitrary local entry if user has added local entries manually. Example of problematic case: ip link set eth0 address ee:ff:12:34:56:78 brctl addif br0 eth0 bridge fdb add 12:34:56:78:90:ab dev eth0 master ip link set eth0 address aa:bb:cc:dd:ee:ff Then, the address 12:34:56:78:90:ab might be deleted instead of ee:ff:12:34:56:78, the original mac address of eth0. Address this issue by introducing a new flag, added_by_user, to struct net_bridge_fdb_entry. Note that br_fdb_delete_by_port() has to set added_by_user to 0 in cases like: ip link set eth0 address 12:34:56:78:90:ab ip link set eth1 address aa:bb:cc:dd:ee:ff brctl addif br0 eth0 bridge fdb add aa:bb:cc:dd:ee:ff dev eth0 master brctl addif br0 eth1 brctl delif br0 eth0 In this case, kernel should delete the user-added entry aa:bb:cc:dd:ee:ff, but it also should have been added by "brctl addif br0 eth1" originally, so we don't delete it and treat it a new kernel-created entry. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b86f81cc |
|
10-Jan-2014 |
WANG Cong <xiyou.wangcong@gmail.com> |
bridge: move br_net_exit() to br.c And it can become static. Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8f84985f |
|
03-Jan-2014 |
Li RongQing <roy.qing.li@gmail.com> |
net: unify the pcpu_tstats and br_cpu_netstats as one They are same, so unify them as one, pcpu_sw_netstats. Define pcpu_sw_netstat in netdevice.h, remove pcpu_tstats from if_tunnel and remove br_cpu_netstats from br_private.h Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
31a5b837 |
|
18-Dec-2013 |
tanxiaojun <tanxiaojun@huawei.com> |
bridge: add space before '(/{', after ',', etc. Spaces required before the open parenthesis '(', before the open brace '{', after that ',' and around that '?/:'. Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
859828c0 |
|
05-Dec-2013 |
Jiri Pirko <jiri@resnulli.us> |
br: fix use of ->rx_handler_data in code executed on non-rx_handler path br_stp_rcv() is reached by non-rx_handler path. That means there is no guarantee that dev is bridge port and therefore simple NULL check of ->rx_handler_data is not enough. There is need to check if dev is really bridge port and since only rcu read lock is held here, do it by checking ->rx_handler pointer. Note that synchronize_net() in netdev_rx_handler_unregister() ensures this approach as valid. Introduced originally by: commit f350a0a87374418635689471606454abc7beaa3a "bridge: use rx_handler_data pointer to store net_bridge_port pointer" Fixed but not in the best way by: commit b5ed54e94d324f17c97852296d61a143f01b227a "bridge: fix RCU races with bridge port" Reintroduced by: commit 716ec052d2280d511e10e90ad54a86f5b5d4dcc2 "bridge: fix NULL pointer deref of br_port_get_rcu" Please apply to stable trees as well. Thanks. RH bugzilla reference: https://bugzilla.redhat.com/show_bug.cgi?id=1025770 Reported-by: Laine Stump <laine@redhat.com> Debugged-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
06499098 |
|
28-Oct-2013 |
Vlad Yasevich <vyasevic@redhat.com> |
bridge: pass correct vlan id to multicast code Currently multicast code attempts to extrace the vlan id from the skb even when vlan filtering is disabled. This can lead to mdb entries being created with the wrong vlan id. Pass the already extracted vlan id to the multicast filtering code to make the correct id is used in creation as well as lookup. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
454594f3 |
|
19-Oct-2013 |
Linus Lüssing <linus.luessing@c0d3.blue> |
Revert "bridge: only expire the mdb entry when query is received" While this commit was a good attempt to fix issues occuring when no multicast querier is present, this commit still has two more issues: 1) There are cases where mdb entries do not expire even if there is a querier present. The bridge will unnecessarily continue flooding multicast packets on the according ports. 2) Never removing an mdb entry could be exploited for a Denial of Service by an attacker on the local link, slowly, but steadily eating up all memory. Actually, this commit became obsolete with "bridge: disable snooping if there is no querier" (b00589af3b) which included fixes for a few more cases. Therefore reverting the following commits (the commit stated in the commit message plus three of its follow up fixes): ==================== Revert "bridge: update mdb expiration timer upon reports." This reverts commit f144febd93d5ee534fdf23505ab091b2b9088edc. Revert "bridge: do not call setup_timer() multiple times" This reverts commit 1faabf2aab1fdaa1ace4e8c829d1b9cf7bfec2f1. Revert "bridge: fix some kernel warning in multicast timer" This reverts commit c7e8e8a8f7a70b343ca1e0f90a31e35ab2d16de1. Revert "bridge: only expire the mdb entry when query is received" This reverts commit 9f00b2e7cf241fa389733d41b615efdaa2cb0f5b. ==================== CC: Cong Wang <amwang@redhat.com> Signed-off-by: Linus Lüssing <linus.luessing@web.de> Reviewed-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
348662a1 |
|
18-Oct-2013 |
Joe Perches <joe@perches.com> |
net: 8021q/bluetooth/bridge/can/ceph: Remove extern from function prototypes There are a mix of function prototypes with and without extern in the kernel sources. Standardize on not using extern for function prototypes. Function prototypes don't need to be written with extern. extern is assumed by the compiler. Its use is as unnecessary as using auto to declare automatic/local variables in a block. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d1c6c708 |
|
16-Oct-2013 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
bridge: Fix the way the PVID is referenced We are using the VLAN_TAG_PRESENT bit to detect whether the PVID is set or not at br_get_pvid(), while we don't care about the bit in adding/deleting the PVID, which makes it impossible to forward any incomming untagged frame with vlan_filtering enabled. Since vid 0 cannot be used for the PVID, we can use vid 0 to indicate that the PVID is not set, which is slightly more efficient than using the VLAN_TAG_PRESENT. Fix the problem by getting rid of using the VLAN_TAG_PRESENT. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Reviewed-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
716ec052 |
|
14-Sep-2013 |
Hong Zhiguo <zhiguohong@tencent.com> |
bridge: fix NULL pointer deref of br_port_get_rcu The NULL deref happens when br_handle_frame is called between these 2 lines of del_nbp: dev->priv_flags &= ~IFF_BRIDGE_PORT; /* --> br_handle_frame is called at this time */ netdev_rx_handler_unregister(dev); In br_handle_frame the return of br_port_get_rcu(dev) is dereferenced without check but br_port_get_rcu(dev) returns NULL if: !(dev->priv_flags & IFF_BRIDGE_PORT) Eric Dumazet pointed out the testing of IFF_BRIDGE_PORT is not necessary here since we're in rcu_read_lock and we have synchronize_net() in netdev_rx_handler_unregister. So remove the testing of IFF_BRIDGE_PORT and by the previous patch, make sure br_port_get_rcu is called in bridging code. Signed-off-by: Hong Zhiguo <zhiguohong@tencent.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1fb1754a |
|
14-Sep-2013 |
Hong Zhiguo <zhiguohong@tencent.com> |
bridge: use br_port_get_rtnl within rtnl lock current br_port_get_rcu is problematic in bridging path (NULL deref). Change these calls in netlink path first. Signed-off-by: Hong Zhiguo <zhiguohong@tencent.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
be4f154d |
|
12-Sep-2013 |
Herbert Xu <herbert@gondor.apana.org.au> |
bridge: Clamp forward_delay when enabling STP At some point limits were added to forward_delay. However, the limits are only enforced when STP is enabled. This created a scenario where you could have a value outside the allowed range while STP is disabled, which then stuck around even after STP is enabled. This patch fixes this by clamping the value when we enable STP. I had to move the locking around a bit to ensure that there is no window where someone could insert a value outside the range while we're in the middle of enabling STP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3c3769e6 |
|
03-Sep-2013 |
Linus Lüssing <linus.luessing@c0d3.blue> |
bridge: apply multicast snooping to IPv6 link-local, too The multicast snooping code should have matured enough to be safely applicable to IPv6 link-local multicast addresses (excluding the link-local all nodes address, ff02::1), too. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cc0fdd80 |
|
30-Aug-2013 |
Linus Lüssing <linus.luessing@c0d3.blue> |
bridge: separate querier and query timer into IGMP/IPv4 and MLD/IPv6 ones Currently we would still potentially suffer multicast packet loss if there is just either an IGMP or an MLD querier: For the former case, we would possibly drop IPv6 multicast packets, for the latter IPv4 ones. This is because we are currently assuming that if either an IGMP or MLD querier is present that the other one is present, too. This patch makes the behaviour and fix added in "bridge: disable snooping if there is no querier" (b00589af3b04) to also work if there is either just an IGMP or an MLD querier on the link: It refines the deactivation of the snooping to be protocol specific by using separate timers for the snooped IGMP and MLD queries as well as separate timers for our internal IGMP and MLD queriers. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
762a3d89 |
|
04-Aug-2013 |
stephen hemminger <stephen@networkplumber.org> |
bridge: fix rcu check warning in multicast port group Use of RCU here with out marked pointer and function doesn't match prototype with sparse. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b00589af |
|
31-Jul-2013 |
Linus Lüssing <linus.luessing@c0d3.blue> |
bridge: disable snooping if there is no querier If there is no querier on a link then we won't get periodic reports and therefore won't be able to learn about multicast listeners behind ports, potentially leading to lost multicast packets, especially for multicast listeners that joined before the creation of the bridge. These lost multicast packets can appear since c5c23260594 ("bridge: Add multicast_querier toggle and disable queries by default") in particular. With this patch we are flooding multicast packets if our querier is disabled and if we didn't detect any other querier. A grace period of the Maximum Response Delay of the querier is added to give multicast responses enough time to arrive and to be learned from before disabling the flooding behaviour again. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
93d8bf9f |
|
24-Jul-2013 |
stephen hemminger <stephen@networkplumber.org> |
bridge: cleanup netpoll code This started out with fixing a sparse warning, then I realized that the wrapper function br_netpoll_info could just be collapsed away by rolling it into the enable code. Also, eliminate unnecessary goto's Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
867a5943 |
|
05-Jun-2013 |
Vlad Yasevich <vyasevic@redhat.com> |
bridge: Add a flag to control unicast packet flood. Add a flag to control flood of unicast traffic. By default, flood is on and the bridge will flood unicast traffic if it doesn't know the destination. When the flag is turned off, unicast traffic without an FDB will not be forwarded to the specified port. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9ba18891 |
|
05-Jun-2013 |
Vlad Yasevich <vyasevic@redhat.com> |
bridge: Add flag to control mac learning. Allow user to control whether mac learning is enabled on the port. By default, mac learning is enabled. Disabling mac learning will cause new dynamic FDB entries to not be created for a particular port. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9f00b2e7 |
|
21-May-2013 |
Cong Wang <amwang@redhat.com> |
bridge: only expire the mdb entry when query is received Currently we arm the expire timer when the mdb entry is added, however, this causes problem when there is no querier sent out after that. So we should only arm the timer when a corresponding query is received, as suggested by Herbert. And he also mentioned "if there is no querier then group subscriptions shouldn't expire. There has to be at least one querier in the network for this thing to work. Otherwise it just degenerates into a non-snooping switch, which is OK." Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Adam Baker <linux@baker-net.org.uk> Signed-off-by: Cong Wang <amwang@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1c8ad5bf |
|
21-May-2013 |
Cong Wang <amwang@redhat.com> |
bridge: use the bridge IP addr as source addr for querier Quote from Adam: "If it is believed that the use of 0.0.0.0 as the IP address is what is causing strange behaviour on other devices then is there a good reason that a bridge rather than a router shouldn't be the active querier? If not then using the bridge IP address and having the querier enabled by default may be a reasonable solution (provided that our querier obeys the election rules and shuts up if it sees a query from a lower IP address that isn't 0.0.0.0). Just because a device is the elected querier for IGMP doesn't appear to mean it is required to perform any other routing functions." And introduce a new troggle for it, as suggested by Herbert. Suggested-by: Adam Baker <linux@baker-net.org.uk> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Adam Baker <linux@baker-net.org.uk> Signed-off-by: Cong Wang <amwang@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8f3359bd |
|
13-Apr-2013 |
stephen hemminger <stephen@networkplumber.org> |
bridge: make user modified path cost sticky Keep a STP port path cost value if it was set by a user. Don't replace it with the link-speed based path cost whenever the link goes down and comes back up. Reported-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fbca58a2 |
|
06-Mar-2013 |
Cong Wang <amwang@redhat.com> |
bridge: add missing vid to br_mdb_get() Obviously, vid should be considered when searching for multicast group. Cc: Vlad Yasevich <vyasevic@redhat.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Acked-by: Vlad Yasevich <vyasevich@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
35e03f3a |
|
12-Feb-2013 |
Vlad Yasevich <vyasevic@redhat.com> |
bridge: Separate egress policy bitmap Add an ability to configure a separate "untagged" egress policy to the VLAN information of the bridge. This superseeds PVID policy and makes PVID ingress-only. The policy is configured with a new flag and is represented as a port bitmap per vlan. Egress frames with a VLAN id in "untagged" policy bitmap would egress the port without VLAN header. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bc9a25d2 |
|
12-Feb-2013 |
Vlad Yasevich <vyasevic@redhat.com> |
bridge: Add vlan support for local fdb entries When VLAN is added to the port, a local fdb entry for that port (the entry with the mac address of the port) is added for that VLAN. This way we can correctly determine if the traffic is for the bridge itself. If the address of the port changes, we try to change all the local fdb entries we have for that port. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1690be63 |
|
12-Feb-2013 |
Vlad Yasevich <vyasevic@redhat.com> |
bridge: Add vlan support to static neighbors When a user adds bridge neighbors, allow him to specify VLAN id. If the VLAN id is not specified, the neighbor will be added for VLANs currently in the ports filter list. If no VLANs are configured on the port, we use vlan 0 and only add 1 entry. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b0e9a30d |
|
12-Feb-2013 |
Vlad Yasevich <vyasevic@redhat.com> |
bridge: Add vlan id to multicast groups Add vlan_id to multicasts groups so that we know which vlan each group belongs to and can correctly forward to appropriate vlan. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2ba071ec |
|
12-Feb-2013 |
Vlad Yasevich <vyasevic@redhat.com> |
bridge: Add vlan to unicast fdb entries This patch adds vlan to unicast fdb entries that are created for learned addresses (not the manually configured ones). It adds vlan id into the hash mix and uses vlan as an addditional parameter for an entry match. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
552406c4 |
|
12-Feb-2013 |
Vlad Yasevich <vyasevic@redhat.com> |
bridge: Add the ability to configure pvid A user may designate a certain vlan as PVID. This means that any ingress frame that does not contain a vlan tag is assigned to this vlan and any forwarding decisions are made with this vlan in mind. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
78851988 |
|
12-Feb-2013 |
Vlad Yasevich <vyasevic@redhat.com> |
bridge: Implement vlan ingress/egress policy with PVID. At ingress, any untagged traffic is assigned to the PVID. Any tagged traffic is filtered according to membership bitmap. At egress, if the vlan matches the PVID, the frame is sent untagged. Otherwise the frame is sent tagged. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6cbdceeb |
|
12-Feb-2013 |
Vlad Yasevich <vyasevic@redhat.com> |
bridge: Dump vlan information from a bridge port Using the RTM_GETLINK dump the vlan filter list of a given bridge port. The information depends on setting the filter flag similar to how nic VF info is dumped. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
407af329 |
|
12-Feb-2013 |
Vlad Yasevich <vyasevic@redhat.com> |
bridge: Add netlink interface to configure vlans on bridge ports Add a netlink interface to add and remove vlan configuration on bridge port. The interface uses the RTM_SETLINK message and encodes the vlan configuration inside the IFLA_AF_SPEC. It is possble to include multiple vlans to either add or remove in a single message. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
85f46c6b |
|
12-Feb-2013 |
Vlad Yasevich <vyasevic@redhat.com> |
bridge: Verify that a vlan is allowed to egress on given port When bridge forwards a frame, make sure that a frame is allowed to egress on that port. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a37b85c9 |
|
12-Feb-2013 |
Vlad Yasevich <vyasevic@redhat.com> |
bridge: Validate that vlan is permitted on ingress When a frame arrives on a port or transmitted by the bridge, if we have VLANs configured, validate that a given VLAN is allowed to enter the bridge. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
243a2e63 |
|
12-Feb-2013 |
Vlad Yasevich <vyasevic@redhat.com> |
bridge: Add vlan filtering infrastructure Adds an optional infrustructure component to bridge that would allow native vlan filtering in the bridge. Each bridge port (as well as the bridge device) now get a VLAN bitmap. Each bit in the bitmap is associated with a vlan id. This way if the bit corresponding to the vid is set in the bitmap that the packet with vid is allowed to enter and exit the port. Write access the bitmap is protected by RTNL and read access protected by RCU. Vlan functionality is disabled by default. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b2748267 |
|
10-Feb-2013 |
Jiri Pirko <jiri@resnulli.us> |
bridge: use dev->addr_assign_type to see if user change mac And remove no longer used br->flags. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fdb184d1 |
|
03-Jan-2013 |
Rami Rosen <ramirose@gmail.com> |
bridge: add empty br_mdb_init() and br_mdb_uninit() definitions. This patch adds empty br_mdb_init() and br_mdb_uninit() definitions in br_private.h to avoid build failure when CONFIG_BRIDGE_IGMP_SNOOPING is not set. These methods were moved from br_multicast.c to br_netlink.c by commit 3ec8e9f085bcaef0de1077f555c2c5102c223390 Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
63233159 |
|
19-Dec-2012 |
Vlad Yasevich <vyasevic@redhat.com> |
bridge: Do not unregister all PF_BRIDGE rtnl operations Bridge fdb and link rtnl operations are registered in core/rtnetlink. Bridge mdb operations are registred in bridge/mdb. When removing bridge module, do not unregister ALL PF_BRIDGE ops since that would remove the ops from rtnetlink as well. Do remove mdb ops when bridge is destroyed. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ccb1c31a |
|
14-Dec-2012 |
Amerigo Wang <amwang@redhat.com> |
bridge: add flags to distinguish permanent mdb entires This patch adds a flag to each mdb entry, so that we can distinguish permanent entries with temporary entries. Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cfd56754 |
|
11-Dec-2012 |
Cong Wang <amwang@redhat.com> |
bridge: add support of adding and deleting mdb entries This patch implents adding/deleting mdb entries via netlink. Currently all entries are temp, we probably need a flag to distinguish permanent entries too. Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
37a393bc |
|
11-Dec-2012 |
Cong Wang <amwang@redhat.com> |
bridge: notify mdb changes via netlink As Stephen mentioned, we need to monitor the mdb changes in user-space, so add notifications via netlink too. Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2ce297fc |
|
09-Dec-2012 |
Cong Wang <amwang@redhat.com> |
bridge: fix seq check in br_mdb_dump() In case of rehashing, introduce a global variable 'br_mdb_rehash_seq' which gets increased every time when rehashing, and assign net->dev_base_seq + br_mdb_rehash_seq to cb->seq. In theory cb->seq could be wrapped to zero, but this is not easy to fix, as net->dev_base_seq is not visible inside br_mdb_rehash(). In practice, this is rare. Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Graf <tgraf@suug.ch> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ee07c6e7 |
|
06-Dec-2012 |
Cong Wang <amwang@redhat.com> |
bridge: export multicast database via netlink V5: fix two bugs pointed out by Thomas remove seq check for now, mark it as TODO V4: remove some useless #include some coding style fix V3: drop debugging printk's update selinux perm table as well V2: drop patch 1/2, export ifindex directly Redesign netlink attributes Improve netlink seq check Handle IPv6 addr as well This patch exports bridge multicast database via netlink message type RTM_GETMDB. Similar to fdb, but currently bridge-specific. We may need to support modify multicast database too (RTM_{ADD,DEL}MDB). (Thanks to Thomas for patient reviews) Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Graf <tgraf@suug.ch> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Cong Wang <amwang@redhat.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c2d3babf |
|
05-Dec-2012 |
David S. Miller <davem@davemloft.net> |
bridge: implement multicast fast leave V3: make it a flag V2: make the toggle per-port Fast leave allows bridge to immediately stops the multicast traffic on the port receives IGMP Leave when IGMP snooping is enabled, no timeouts are observed. Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com>
|
#
50426b59 |
|
03-Dec-2012 |
Amerigo Wang <amwang@redhat.com> |
bridge: implement multicast fast leave V2: make the toggle per-port Fast leave allows bridge to immediately stops the multicast traffic on the port receives IGMP Leave when IGMP snooping is enabled, no timeouts are observed. Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1007dd1a |
|
13-Nov-2012 |
stephen hemminger <shemminger@vyatta.com> |
bridge: add root port blocking This is Linux bridge implementation of root port guard. If BPDU is received from a leaf (edge) port, it should not be elected as root port. Why would you want to do this? If using STP on a bridge and the downstream bridges are not fully trusted; this prevents a hostile guest for rerouting traffic. Why not just use netfilter? Netfilter does not track of follow spanning tree decisions. It would be difficult and error prone to try and mirror STP resolution in netfilter module. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a2e01a65 |
|
13-Nov-2012 |
stephen hemminger <shemminger@vyatta.com> |
bridge: implement BPDU blocking This is Linux bridge implementation of STP protection (Cisco BPDU guard/Juniper BPDU block). BPDU block disables the bridge port if a STP BPDU packet is received. Why would you want to do this? If running Spanning Tree on bridge, hostile devices on the network may send BPDU and cause network failure. Enabling bpdu block will detect and stop this. How to recover the port? The port will be restarted if link is brought down, or removed and reattached. For example: # ip li set dev eth0 down; ip li set dev eth0 up Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0cb2bbbe |
|
03-Nov-2012 |
Lee Jones <lee.jones@linaro.org> |
bridge: Avoid 'statement with no effect' compiler warnings Instead of issuing (0) statements when !CONFIG_SYSFS which will cause 'warning: ', we'll use inline statements instead. This will effectively do the same thing, but suppress any unnecessary warnings. Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: bridge@lists.linux-foundation.org Cc: netdev@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2469ffd7 |
|
24-Oct-2012 |
John Fastabend <john.r.fastabend@intel.com> |
net: set and query VEB/VEPA bridge mode via PF_BRIDGE Hardware switches may support enabling and disabling the loopback switch which puts the device in a VEPA mode defined in the IEEE 802.1Qbg specification. In this mode frames are not switched in the hardware but sent directly to the switch. SR-IOV capable NICs will likely support this mode I am aware of at least two such devices. Also I am told (but don't have any of this hardware available) that there are devices that only support VEPA modes. In these cases it is important at a minimum to be able to query these attributes. This patch adds an additional IFLA_BRIDGE_MODE attribute that can be set and dumped via the PF_BRIDGE:{SET|GET}LINK operations. Also anticipating bridge attributes that may be common for both embedded bridges and software bridges this adds a flags attribute IFLA_BRIDGE_FLAGS currently used to determine if the command or event is being generated to/from an embedded bridge or software bridge. Finally, the event generation is pulled out of the bridge module and into rtnetlink proper. For example using the macvlan driver in VEPA mode on top of an embedded switch requires putting the embedded switch into a VEPA mode to get the expected results. -------- -------- | VEPA | | VEPA | <-- macvlan vepa edge relays -------- -------- | | | | ------------------ | VEPA | <-- embedded switch in NIC ------------------ | | ------------------- | external switch | <-- shiny new physical ------------------- switch with VEPA support A packet sent from the macvlan VEPA at the top could be loopbacked on the embedded switch and never seen by the external switch. So in order for this to work the embedded switch needs to be set in the VEPA state via the above described commands. By making these attributes nested in IFLA_AF_SPEC we allow future extensions to be made as needed. CC: Lennert Buytenhek <buytenh@wantstofly.org> CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e5a55a89 |
|
24-Oct-2012 |
John Fastabend <john.r.fastabend@intel.com> |
net: create generic bridge ops The PF_BRIDGE:RTM_{GET|SET}LINK nlmsg family and type are currently embedded in the ./net/bridge module. This prohibits them from being used by other bridging devices. One example of this being hardware that has embedded bridging components. In order to use these nlmsg types more generically this patch adds two net_device_ops hooks. One to set link bridge attributes and another to dump the current bride attributes. ndo_bridge_setlink() ndo_bridge_getlink() CC: Lennert Buytenhek <buytenh@wantstofly.org> CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b3343a2a |
|
17-Sep-2012 |
John Fastabend <john.r.fastabend@intel.com> |
net, ixgbe: handle link local multicast addresses in SR-IOV mode In SR-IOV mode the PF driver acts as the uplink port and is used to send control packets e.g. lldpad, stp, etc. eth0.1 eth0.2 eth0 VF VF PF | | | <-- stand-in for uplink | | | -------------------------- | Embedded Switch | -------------------------- | MAC <-- uplink But the embedded switch is setup to forward multicast addresses to all interfaces both VFs and PF and onto the physical link. This results in reserved MAC addresses used by control protocols to be forwarded over the switch onto the VF. In the LLDP case the PF sends an LLDPDU and it is currently being forwarded to all the VFs who then see the PF as a peer. This is incorrect. This patch adds the multicast addresses to the RAR table in the hardware to prevent this behavior. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
edc7d573 |
|
30-Sep-2012 |
stephen hemminger <shemminger@vyatta.com> |
netlink: add attributes to fdb interface Later changes need to be able to refer to neighbour attributes when doing fdb_add. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6b6e2725 |
|
17-Sep-2012 |
stephen hemminger <shemminger@vyatta.com> |
netdev: make address const in device address management The internal functions for add/deleting addresses don't change their argument. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
47be03a2 |
|
09-Aug-2012 |
Amerigo Wang <amwang@redhat.com> |
netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup() slave_enable_netpoll() and __netpoll_setup() may be called with read_lock() held, so should use GFP_ATOMIC to allocate memory. Eric suggested to pass gfp flags to __netpoll_setup(). Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
149ddd83 |
|
25-Jun-2012 |
stephen hemminger <shemminger@vyatta.com> |
bridge: Assign rtnl_link_ops to bridge devices created via ioctl (v2) This ensures that bridges created with brctl(8) or ioctl(2) directly also carry IFLA_LINKINFO when dumped over netlink. This also allows to create a bridge with ioctl(2) and delete it with RTM_DELLINK. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
77162022 |
|
15-Apr-2012 |
John Fastabend <john.r.fastabend@intel.com> |
net: add generic PF_BRIDGE:RTM_ FDB hooks This adds two new flags NTF_MASTER and NTF_SELF that can now be used to specify where PF_BRIDGE netlink commands should be sent. NTF_MASTER sends the commands to the 'dev->master' device for parsing. Typically this will be the linux net/bridge, or open-vswitch devices. Also without any flags set the command will be handled by the master device as well so that current user space tools continue to work as expected. The NTF_SELF flag will push the PF_BRIDGE commands to the device. In the basic example below the commands are then parsed and programmed in the embedded bridge. Note if both NTF_SELF and NTF_MASTER bits are set then the command will be sent to both 'dev->master' and 'dev' this allows user space to easily keep the embedded bridge and software bridge in sync. There is a slight complication in the case with both flags set when an error occurs. To resolve this the rtnl handler clears the NTF_ flag in the netlink ack to indicate which sets completed successfully. The add/del handlers will abort as soon as any error occurs. To support this new net device ops were added to call into the device and the existing bridging code was refactored to use these. There should be no required changes in user space to support the current bridge behavior. A basic setup with a SR-IOV enabled NIC looks like this, veth0 veth2 | | ------------ | bridge0 | <---- software bridging ------------ / / ethx.y ethx VF PF \ \ <---- propagate FDB entries to HW \ \ -------------------- | Embedded Bridge | <---- hardware offloaded switching -------------------- In this case the embedded bridge must be managed to allow 'veth0' to communicate with 'ethx.y' correctly. At present drivers managing the embedded bridge either send frames onto the network which then get dropped by the switch OR the embedded bridge will flood these frames. With this patch we have a mechanism to manage the embedded bridge correctly from user space. This example is specific to SR-IOV but replacing the VF with another PF or dropping this into the DSA framework generates similar management issues. Examples session using the 'br'[1] tool to add, dump and then delete a mac address with a new "embedded" option and enabled ixgbe driver: # br fdb add 22:35:19:ac:60:59 dev eth3 # br fdb port mac addr flags veth0 22:35:19:ac:60:58 static veth0 9a:5f:81:f7:f6:ec local eth3 00:1b:21:55:23:59 local eth3 22:35:19:ac:60:59 static veth0 22:35:19:ac:60:57 static #br fdb add 22:35:19:ac:60:59 embedded dev eth3 #br fdb port mac addr flags veth0 22:35:19:ac:60:58 static veth0 9a:5f:81:f7:f6:ec local eth3 00:1b:21:55:23:59 local eth3 22:35:19:ac:60:59 static veth0 22:35:19:ac:60:57 static eth3 22:35:19:ac:60:59 local embedded #br fdb del 22:35:19:ac:60:59 embedded dev eth3 I added a couple lines to 'br' to set the flags correctly is all. It is my opinion that the merit of this patch is now embedded and SW bridges can both be modeled correctly in user space using very nearly the same message passing. [1] 'br' tool was published as an RFC here and will be renamed 'bridge' http://patchwork.ozlabs.org/patch/117664/ Thanks to Jamal Hadi Salim, Stephen Hemminger and Ben Hutchings for valuable feedback, suggestions, and review. v2: fixed api descriptions and error case with both NTF_SELF and NTF_MASTER set plus updated patch description. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c5c23260 |
|
12-Apr-2012 |
Herbert Xu <herbert@gondor.apana.org.au> |
bridge: Add multicast_querier toggle and disable queries by default Sending general queries was implemented as an optimisation to speed up convergence on start-up. In order to prevent interference with multicast routers a zero source address has to be used. Unfortunately these packets appear to cause some multicast-aware switches to misbehave, e.g., by disrupting multicast packets to us. Since the multicast snooping feature still functions without sending our own queries, this patch will change the default to not send queries. For those that need queries in order to speed up convergence on start-up, a toggle is provided to restore the previous behaviour. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
996304bb |
|
03-Apr-2012 |
Herbert Xu <herbert@gondor.apana.org.au> |
bridge: Do not send queries on multicast group leaves As it stands the bridge IGMP snooping system will respond to group leave messages with queries for remaining membership. This is both unnecessary and undesirable. First of all any multicast routers present should be doing this rather than us. What's more the queries that we send may end up upsetting other multicast snooping swithces in the system that are buggy. In fact, we can simply remove the code that send these queries because the existing membership expiry mechanism doesn't rely on them anyway. So this patch simply removes all code associated with group queries in response to group leave messages. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e6373c4c |
|
11-Dec-2011 |
Igor Maravić <igorm@etf.rs> |
net:bridge: use IS_ENABLED Use IS_ENABLED(CONFIG_FOO) instead of defined(CONFIG_FOO) || defined (CONFIG_FOO_MODULE) Signed-off-by: Igor Maravić <igorm@etf.rs> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dfd56b8b |
|
10-Dec-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: use IS_ENABLED(CONFIG_IPV6) Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
43598813 |
|
08-Dec-2011 |
stephen hemminger <shemminger@vyatta.com> |
bridge: add local MAC address to forwarding table (v2) If user has configured a MAC address that is not one of the existing ports of the bridge, then we need to add a special entry in the forwarding table. This forwarding table entry has no outgoing port so it has to be treated a little differently. The special entry is reported by the netlink interface with ifindex of bridge, but ignored by the old interface since there is no usable way to put it in the ABI. Reported-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c8f44aff |
|
15-Nov-2011 |
Michał Mirosław <mirq-linux@rere.qmqm.pl> |
net: introduce and use netdev_features_t for device features sets v2: add couple missing conversions in drivers split unexporting netdev_fix_features() implemented %pNF convert sock::sk_route_(no?)caps Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1ce5cce8 |
|
06-Oct-2011 |
stephen hemminger <shemminger@vyatta.com> |
bridge: fix hang on removal of bridge via netlink Need to cleanup bridge device timers and ports when being bridge device is being removed via netlink. This fixes the problem of observed when doing: ip link add br0 type bridge ip link set dev eth1 master br0 ip link set br0 up ip link del br0 which would cause br0 to hang in unregister_netdev because of leftover reference count. Reported-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
515853cc |
|
03-Oct-2011 |
stephen hemminger <shemminger@vyatta.com> |
bridge: allow forwarding some link local frames This is based on an earlier patch by Nick Carter with comments by David Lamparter but with some refinements. Thanks for their patience this is a confusing area with overlap of standards, user requirements, and compatibility with earlier releases. It adds a new sysfs attribute /sys/class/net/brX/bridge/group_fwd_mask that controls forwarding of frames with address of: 01-80-C2-00-00-0X The default setting has no forwarding to retain compatibility. One change from earlier releases is that forwarding of group addresses is not dependent on STP being enabled or disabled. This choice was made based on interpretation of tie 802.1 standards. I expect complaints will arise because of this, but better to follow the standard than continue acting incorrectly by default. The filtering mask is writeable, but only values that don't forward known control frames are allowed. It intentionally blocks attempts to filter control protocols. For example: writing a 8 allows forwarding 802.1X PAE addresses which is the most common request. Reported-by: David Lamparter <equinox@diac24.net> Original-patch-by: Nick Carter <ncarter100@gmail.com> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Tested-by: Benjamin Poirier <benjamin.poirier@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0c03150e |
|
22-Jul-2011 |
stephen hemminger <shemminger@vyatta.com> |
bridge: send proper message_age in config BPDU A bridge topology with three systems: +------+ +------+ | A(2) |--| B(1) | +------+ +------+ \ / +------+ | C(3) | +------+ What is supposed to happen: * bridge with the lowest ID is elected root (for example: B) * C detects that A->C is higher cost path and puts in blocking state What happens. Bridge with lowest id (B) is elected correctly as root and things start out fine initially. But then config BPDU doesn't get transmitted from A -> C. Because of that the link from A-C is transistioned to the forwarding state. The root cause of this is that the configuration messages is generated with bogus message age, and dropped before sending. In the standardmessage_age is supposed to be: the time since the generation of the Configuration BPDU by the Root that instigated the generation of this Configuration BPDU. Reimplement this by recording the timestamp (age + jiffies) when recording config information. The old code incorrectly used the time elapsed on the ageing timer which was incorrect. See also: https://bugzilla.vyatta.com/show_bug.cgi?id=7164 Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c4d27ef9 |
|
22-Apr-2011 |
Michał Mirosław <mirq-linux@rere.qmqm.pl> |
bridge: convert br_features_recompute() to ndo_fix_features Note: netdev_update_features() needs only rtnl_lock as br->port_list is only changed while holding it. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
14f98f25 |
|
04-Apr-2011 |
stephen hemminger <shemminger@vyatta.com> |
bridge: range check STP parameters Apply restrictions on STP parameters based 802.1D 1998 standard. * Fixes missing locking in set path cost ioctl * Uses common code for both ioctl and sysfs This is based on an earlier patch Sasikanth V but with overhaul. Note: 1. It does NOT enforce the restriction on the relationship max_age and forward delay or hello time because in existing implementation these are set as independant operations. 2. If STP is disabled, there is no restriction on forward delay 3. No restriction on holding time because users use Linux code to act as hub or be sticky. 4. Although standard allow 0-255, Linux only allows 0-63 for port priority because more bits are reserved for port number. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
36fd2b63 |
|
04-Apr-2011 |
stephen hemminger <shemminger@vyatta.com> |
bridge: allow creating/deleting fdb entries via netlink Use RTM_NEWNEIGH and RTM_DELNEIGH to allow updating of entries in bridge forwarding table. This allows manipulating static entries which is not possible with existing tools. Example (using bridge extensions to iproute2) # br fdb add 00:02:03:04:05:06 dev eth0 Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b078f0df |
|
04-Apr-2011 |
stephen hemminger <shemminger@vyatta.com> |
bridge: add netlink notification on forward entry changes This allows applications to query and monitor bridge forwarding table in the same method used for neighbor table. The forward table entries are returned in same structure format as used by the ioctl. If more information is desired in future, the netlink method is extensible. Example (using bridge extensions to iproute2) # br monitor Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7cd8861a |
|
04-Apr-2011 |
stephen hemminger <shemminger@vyatta.com> |
bridge: track last used time in forwarding table Adds tracking the last used time in forwarding table. Rename ageing_timer to updated to better describe it. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
edf947f1 |
|
24-Mar-2011 |
stephen hemminger <shemminger@vyatta.com> |
bridge: notify applications if address of bridge device changes The mac address of the bridge device may be changed when a new interface is added to the bridge. If this happens, then the bridge needs to call the network notifiers to tickle any other systems that care. Since bridge can be a module, this also means exporting the notifier function. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8a4eb573 |
|
11-Mar-2011 |
Jiri Pirko <jpirko@redhat.com> |
net: introduce rx_handler results and logic around that This patch allows rx_handlers to better signalize what to do next to it's caller. That makes skb->deliver_no_wcard no longer needed. kernel-doc for rx_handler_result is taken from Nicolas' patch. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8a870178 |
|
12-Feb-2011 |
Herbert Xu <herbert@gondor.apana.org.au> |
bridge: Replace mp->mglist hlist with a bool As it turns out we never need to walk through the list of multicast groups subscribed by the bridge interface itself (the only time we'd want to do that is when we shut down the bridge, in which case we simply walk through all multicast groups), we don't really need to keep an hlist for mp->mglist. This means that we can replace it with just a single bit to indicate whether the bridge interface is subscribed to a group. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
04ed3e74 |
|
24-Jan-2011 |
Michał Mirosław <mirq-linux@rere.qmqm.pl> |
net: change netdev->features to u32 Quoting Ben Hutchings: we presumably won't be defining features that can only be enabled on 64-bit architectures. Occurences found by `grep -r` on net/, drivers/net, include/ [ Move features and vlan_features next to each other in struct netdev, as per Eric Dumazet's suggestion -DaveM ] Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ec1e5610 |
|
14-Nov-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
bridge: add RCU annotations to bridge port lookup br_port_get() renamed to br_port_get_rtnl() to make clear RTNL is held. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b5ed54e9 |
|
14-Nov-2010 |
stephen hemminger <shemminger@vyatta.com> |
bridge: fix RCU races with bridge port The macro br_port_exists() is not enough protection when only RCU is being used. There is a tiny race where other CPU has cleared port handler hook, but is bridge port flag might still be set. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e8051688 |
|
14-Nov-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
bridge: add RCU annotation to bridge multicast table Add modern __rcu annotatations to bridge multicast table. Use newer hlist macros to avoid direct access to hlist internals. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4df53d8b |
|
02-Jul-2010 |
Patrick McHardy <kaber@trash.net> |
bridge: add per bridge device controls for invoking iptables Support more fine grained control of bridge netfilter iptables invocation by adding seperate brnf_call_*tables parameters for each device using the sysfs interface. Packets are passed to layer 3 netfilter when either the global parameter or the per bridge parameter is enabled. Acked-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Patrick McHardy <kaber@trash.net>
|
#
a53f4b61 |
|
01-Jul-2010 |
Paul E. McKenney <paulmck@kernel.org> |
Revert "net: Make accesses to ->br_port safe for sparse RCU" This reverts commit 81bdf5bd7349bd4523538cbd7878f334bc2bfe14, which is obsoleted by commit f350a0a87374 from the net tree.
|
#
406818ff |
|
23-Jun-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
bridge: 64bit rx/tx counters Use u64_stats_sync infrastructure to provide 64bit rx/tx counters even on 32bit hosts. It is safe to use a single u64_stats_sync for rx and tx, because BH is disabled on both, and we use per_cpu data. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9f70b0fc |
|
15-Jun-2010 |
Herbert Xu <herbert@gondor.apana.org.au> |
bridge: Add const to dummy br_netpoll_send_skb The version of br_netpoll_send_skb used when netpoll is off is missing a const thus causing a warning. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f350a0a8 |
|
15-Jun-2010 |
Jiri Pirko <jpirko@redhat.com> |
bridge: use rx_handler_data pointer to store net_bridge_port pointer Register net_bridge_port pointer as rx_handler data pointer. As br_port is removed from struct net_device, another netdev priv_flag is added to indicate the device serves as a bridge port. Also rcuized pointers are now correctly dereferenced in br_fdb.c and in netfilter parts. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
91d2c34a |
|
10-Jun-2010 |
Herbert Xu <herbert@gondor.apana.org.au> |
bridge: Fix netpoll support There are multiple problems with the newly added netpoll support: 1) Use-after-free on each netpoll packet. 2) Invoking unsafe code on netpoll/IRQ path. 3) Breaks when netpoll is enabled on the underlying device. This patch fixes all of these problems. In particular, we now allocate proper netpoll structures for each underlying device. We only allow netpoll to be enabled on the bridge when all the devices underneath it support netpoll. Once it is enabled, we do not allow non-netpoll devices to join the bridge (until netpoll is disabled again). This allows us to do away with the npinfo juggling that caused problem number 1. Incidentally this patch fixes number 2 by bypassing unsafe code such as multicast snooping and netfilter. Reported-by: Qianfeng Zhang <frzhang@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
81bdf5bd |
|
02-May-2010 |
Paul E. McKenney <paulmck@kernel.org> |
net: Make accesses to ->br_port safe for sparse RCU The new versions of the rcu_dereference() APIs requires that any pointers passed to one of these APIs be fully defined. The ->br_port field in struct net_device points to a struct net_bridge_port, which is an incomplete type. This commit therefore changes ->br_port to be a void*, and introduces a br_port() helper function to convert the type to struct net_bridge_port, and applies this new helper function where required. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: David Miller <davem@davemloft.net> Cc: Stephen Hemminger <shemminger@linux-foundation.org> Cc: Eric Dumazet <eric.dumazet@gmail.com>
|
#
ab95bfe0 |
|
01-Jun-2010 |
Jiri Pirko <jpirko@redhat.com> |
net: replace hooks in __netif_receive_skb V5 What this patch does is it removes two receive frame hooks (for bridge and for macvlan) from __netif_receive_skb. These are replaced them with a single hook for both. It only supports one hook per device because it makes no sense to do bridging and macvlan on the same device. Then a network driver (of virtual netdev like macvlan or bridge) can register an rx_handler for needed net device. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e0f43752 |
|
10-May-2010 |
Simon Arlott <simon@octiron.net> |
bridge: update sysfs link names if port device names have changed Links for each port are created in sysfs using the device name, but this could be changed after being added to the bridge. As well as being unable to remove interfaces after this occurs (because userspace tools don't recognise the new name, and the kernel won't recognise the old name), adding another interface with the old name to the bridge will cause an error trying to create the sysfs link. This fixes the problem by listening for NETDEV_CHANGENAME notifications and renaming the link. https://bugzilla.kernel.org/show_bug.cgi?id=12743 Signed-off-by: Simon Arlott <simon@fire.lp0.eu> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
28a16c97 |
|
10-May-2010 |
stephen hemminger <shemminger@vyatta.com> |
bridge: change console message interface Use one set of macro's for all bridge messages. Note: can't use netdev_XXX macro's because bridge is purely virtual and has no device parent. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cfb478da |
|
10-May-2010 |
stephen hemminger <shemminger@vyatta.com> |
bridge: netpoll cleanup Move code around so that the ifdef for NETPOLL_CONTROLLER don't have to show up in main code path. The control functions should be in helpers that are only compiled if needed. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c06ee961 |
|
06-May-2010 |
WANG Cong <amwang@redhat.com> |
bridge: make bridge support netpoll Based on the previous patch, make bridge support netpoll by: 1) implement the 2 methods to support netpoll for bridge; 2) modify netpoll during forwarding packets via bridge; 3) disable netpoll support of bridge when a netpoll-unabled device is added to bridge; 4) enable netpoll support when all underlying devices support netpoll. Cc: David Miller <davem@davemloft.net> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Stephen Hemminger <shemminger@linux-foundation.org> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
08b202b6 |
|
22-Apr-2010 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
bridge br_multicast: IPv6 MLD support. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
8ef2a9a5 |
|
17-Apr-2010 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
bridge br_multicast: Make functions less ipv4 dependent. Introduce struct br_ip{} to store ip address and protocol and make functions more generic so that we can support both IPv4 and IPv6 with less pain. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
0eae88f3 |
|
20-Apr-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: Fix various endianness glitches Sparse can help us find endianness bugs, but we need to make some cleanups to be able to more easily spot real bugs. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
14bb4789 |
|
02-Mar-2010 |
stephen hemminger <shemminger@vyatta.com> |
bridge: per-cpu packet statistics (v3) The shared packet statistics are a potential source of slow down on bridged traffic. Convert to per-cpu array, but only keep those statistics which change per-packet. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
32dec5dd |
|
15-Mar-2010 |
YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org> |
bridge br_multicast: Don't refer to BR_INPUT_SKB_CB(skb)->mrouters_only without IGMP snooping. Without CONFIG_BRIDGE_IGMP_SNOOPING, BR_INPUT_SKB_CB(skb)->mrouters_only is not appropriately initialized, so we can see garbage. A clear option to fix this is to set it even without that config, but we cannot optimize out the branch. Let's introduce a macro that returns value of mrouters_only and let it return 0 without CONFIG_BRIDGE_IGMP_SNOOPING. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7f7708f0 |
|
16-Mar-2010 |
Michael Braun <michael-dev@fami-braun.de> |
bridge: Fix br_forward crash in promiscuous mode From: Michael Braun <michael-dev@fami-braun.de> bridge: Fix br_forward crash in promiscuous mode It's a linux-next kernel from 2010-03-12 on an x86 system and it OOPs in the bridge module in br_pass_frame_up (called by br_handle_frame_finish) because brdev cannot be dereferenced (its set to a non-null value). Adding some BUG_ON statements revealed that BR_INPUT_SKB_CB(skb)->brdev == br-dev (as set in br_handle_frame_finish first) only holds until br_forward is called. The next call to br_pass_frame_up then fails. Digging deeper it seems that br_forward either frees the skb or passes it to NF_HOOK which will in turn take care of freeing the skb. The same is holds for br_pass_frame_ip. So it seems as if two independent skb allocations are required. As far as I can see, commit b33084be192ee1e347d98bb5c9e38a53d98d35e2 ("bridge: Avoid unnecessary clone on forward path") removed skb duplication and so likely causes this crash. This crash does not happen on 2.6.33. I've therefore modified br_forward the same way br_flood has been modified so that the skb is not freed if skb0 is going to be used and I can confirm that the attached patch resolves the issue for me. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
52cf25d0 |
|
18-Jan-2010 |
Emese Revfy <re.emese@gmail.com> |
Driver core: Constify struct sysfs_ops in struct kobj_type Constify struct sysfs_ops. This is part of the ops structure constification effort started by Arjan van de Ven et al. Benefits of this constification: * prevents modification of data that is shared (referenced) by many other structure instances at runtime * detects/prevents accidental (but not intentional) modification attempts on archs that enforce read-only kernel data at runtime * potentially better optimized code as the compiler can assume that the const data cannot be changed * the compiler/linker move const data into .rodata and therefore exclude them from false sharing Signed-off-by: Emese Revfy <re.emese@gmail.com> Acked-by: David Teigland <teigland@redhat.com> Acked-by: Matt Domsch <Matt_Domsch@dell.com> Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Acked-by: Hans J. Koch <hjk@linutronix.de> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Jens Axboe <jens.axboe@oracle.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
#
85b35269 |
|
01-Mar-2010 |
Sridhar Samudrala <sri@us.ibm.com> |
bridge: Fix build error when IGMP_SNOOPING is not enabled Fix the following build error when IGMP_SNOOPING is not enabled. In file included from net/bridge/br.c:24: net/bridge/br_private.h: In function 'br_multicast_is_router': net/bridge/br_private.h:361: error: 'struct net_bridge' has no member named 'multicast_router' net/bridge/br_private.h:362: error: 'struct net_bridge' has no member named 'multicast_router' net/bridge/br_private.h:363: error: 'struct net_bridge' has no member named 'multicast_router_timer' Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b195167f |
|
27-Feb-2010 |
Herbert Xu <herbert@gondor.apana.org.au> |
bridge: Add hash elasticity/max sysfs entries This patch allows the user to control the hash elasticity/max parameters. The elasticity setting does not take effect until the next new multicast group is added. At which point it is checked and if after rehashing it still can't be satisfied then snooping will be disabled. The max setting on the other hand takes effect immediately. It must be a power of two and cannot be set to a value less than the current number of multicast group entries. This is the only way to shrink the multicast hash. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
561f1103 |
|
27-Feb-2010 |
Herbert Xu <herbert@gondor.apana.org.au> |
bridge: Add multicast_snooping sysfs toggle This patch allows the user to disable IGMP snooping completely through a sysfs toggle. It also allows the user to reenable snooping when it has been automatically disabled due to hash collisions. If the collisions have not been resolved however the system will refuse to reenable snooping. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0909e117 |
|
27-Feb-2010 |
Herbert Xu <herbert@gondor.apana.org.au> |
bridge: Add multicast_router sysfs entries This patch allows the user to forcibly enable/disable ports as having multicast routers attached. A port with a multicast router will receive all multicast traffic. The value 0 disables it completely. The default is 1 which lets the system automatically detect the presence of routers (currently this is limited to picking up queries), and 2 means that the port will always receive all multicast traffic. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5cb5e947 |
|
27-Feb-2010 |
Herbert Xu <herbert@gondor.apana.org.au> |
bridge: Add multicast forwarding functions This patch adds code to perform selective multicast forwarding. We forward multicast traffic to a set of ports plus all multicast router ports. In order to avoid duplications among these two sets of ports, we order all ports by the numeric value of their pointers. The two lists are then walked in lock-step to eliminate duplicates. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
eb1d1641 |
|
27-Feb-2010 |
Herbert Xu <herbert@gondor.apana.org.au> |
bridge: Add core IGMP snooping support This patch adds the core functionality of IGMP snooping support without actually hooking it up. So this patch should be a no-op as far as the bridge's external behaviour is concerned. All the new code and data is controlled by the Kconfig option BRIDGE_IGMP_SNOOPING. A run-time toggle is also available. The multicast switching is done using an hash table that is lockless on the read-side through RCU. On the write-side the new multicast_lock is used for all operations. The hash table supports dynamic growth/rehashing. The hash table will be rehashed if any chain length exceeds a preset limit. If rehashing does not reduce the maximum chain length then snooping will be disabled. These features may be added in future (in no particular order): * IGMPv3 source support * Non-querier router detection * IPv6 Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b33084be |
|
27-Feb-2010 |
Herbert Xu <herbert@gondor.apana.org.au> |
bridge: Avoid unnecessary clone on forward path When the packet is delivered to the local bridge device we may end up cloning it unnecessarily if no bridge port can receive the packet in br_flood. This patch avoids this by moving the skb_clone into br_flood. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
68b7c895 |
|
27-Feb-2010 |
Herbert Xu <herbert@gondor.apana.org.au> |
bridge: Allow tail-call on br_pass_frame_up This patch allows tail-call on the call to br_pass_frame_up in br_handle_frame_finish. This is now possible because of the previous patch to call br_pass_frame_up last. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
570930fe |
|
04-Feb-2010 |
Herbert Xu <herbert@gondor.apana.org.au> |
bridge: Remove unused age_list This patch removes the unused age_list member from the net_bridge structure. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6fef4c0c |
|
31-Aug-2009 |
Stephen Hemminger <shemminger@vyatta.com> |
netdev: convert pseudo-devices to netdev_tx_t Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3982d3d2 |
|
13-Aug-2009 |
Fischer, Anna <anna.fischer@hp.com> |
net/bridge: Add 'hairpin' port forwarding mode This patch adds a 'hairpin' (also called 'reflective relay') mode port configuration to the Linux Ethernet bridge kernel module. A bridge supporting hairpin forwarding mode can send frames back out through the port the frame was received on. Hairpin mode is required to support basic VEPA (Virtual Ethernet Port Aggregator) capabilities. You can find additional information on VEPA here: http://tech.groups.yahoo.com/group/evb/ http://www.ieee802.org/1/files/public/docs2009/new-hudson-vepa_seminar-20090514d.pdf http://www.internet2.edu/presentations/jt2009jul/20090719-congdon.pdf An additional patch 'bridge-utils: Add 'hairpin' port forwarding mode' is provided to allow configuring hairpin mode from userspace tools. Signed-off-by: Paul Congdon <paul.congdon@hp.com> Signed-off-by: Anna Fischer <anna.fischer@hp.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
da678292 |
|
04-Jun-2009 |
Michał Mirosław <mirq-linux@rere.qmqm.pl> |
bridge: Simplify interface for ATM LANE This patch changes FDB entry check for ATM LANE bridge integration. There's no point in holding a FDB entry around SKB building. br_fdb_get()/br_fdb_put() pair are changed into single br_fdb_test_addr() hook that checks if the addr has FDB entry pointing to other port to the one the request arrived on. FDB entry refcounting is removed as it's not used anywhere else. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
712d6954 |
|
08-Sep-2008 |
Alexey Dobriyan <adobriyan@gmail.com> |
netns bridge: cleanup bridges during netns stop Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Stephen Hemminger <shemming@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4aa678ba |
|
08-Sep-2008 |
Alexey Dobriyan <adobriyan@gmail.com> |
netns bridge: allow bridges in netns! Bridge as netdevice doesn't cross netns boundaries. Bridge ports and bridge itself live in same netns. Notifiers are fixed. netns propagated from userspace socket for setup and teardown. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Stephen Hemminger <shemming@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4adf0af6 |
|
30-Jul-2008 |
Simon Wunderlich <siwu@hrz.tu-chemnitz.de> |
bridge: send correct MTU value in PMTU (revised) When bridging interfaces with different MTUs, the bridge correctly chooses the minimum of the MTUs of the physical devices as the bridges MTU. But when a frame is passed which fits through the incoming, but not through the outgoing interface, a "Fragmentation Needed" packet is generated. However, the propagated MTU is hardcoded to 1500, which is wrong in this situation. The sender will repeat the packet again with the same frame size, and the same problem will occur again. Instead of sending 1500, the (correct) MTU value of the bridge is now sent via PMTU. To achieve this, the corresponding rtable structure is stored in its net_bridge structure. Modified to get rid of fake_net_device as well. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7c85fbf0 |
|
05-Jul-2008 |
Patrick McHardy <kaber@trash.net> |
bridge: Use STP demux Use the STP demux layer for receiving STP PDUs instead of directly registering with LLC. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
92c0574f |
|
17-Jun-2008 |
Stephen Hemminger <shemminger@vyatta.com> |
bridge: make bridge address settings sticky Normally, the bridge just chooses the smallest mac address as the bridge id and mac address of bridge device. But if the administrator has explictly set the interface address then don't change it. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0b040829 |
|
10-Jun-2008 |
Adrian Bunk <bunk@kernel.org> |
net: remove CVS keywords This patch removes CVS keywords that weren't updated for a long time from comments. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a339f1c8 |
|
21-May-2008 |
Pavel Emelyanov <xemul@openvz.org> |
bridge: Use on-device stats instead of private ones. Even though bridges require 6 fields from struct net_device_stats, the on-device stats are always there, so we may just use them. The br_dev_get_stats is no longer required after this. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
43b98c4a |
|
17-Dec-2007 |
Greg Kroah-Hartman <gregkh@suse.de> |
Kobject: change net/bridge to use kobject_create_and_add The kobject in the bridge code is only used for registering with sysfs, not for any lifespan rules. This patch changes it to be only a pointer and use the simpler api for this kind of thing. Cc: Stephen Hemminger <shemminger@linux-foundation.org> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
#
881d966b |
|
17-Sep-2007 |
Eric W. Biederman <ebiederm@xmission.com> |
[NET]: Make the device list and device lookups per namespace. This patch makes most of the generic device layer network namespace safe. This patch makes dev_base_head a network namespace variable, and then it picks up a few associated variables. The functions: dev_getbyhwaddr dev_getfirsthwbytype dev_get_by_flags dev_get_by_name __dev_get_by_name dev_get_by_index __dev_get_by_index dev_ioctl dev_ethtool dev_load wireless_process_ioctl were modified to take a network namespace argument, and deal with it. vlan_ioctl_set and brioctl_set were modified so their hooks will receive a network namespace argument. So basically anthing in the core of the network stack that was affected to by the change of dev_base was modified to handle multiple network namespaces. The rest of the network stack was simply modified to explicitly use &init_net the initial network namespace. This can be fixed when those components of the network stack are modified to handle multiple network namespaces. For now the ifindex generator is left global. Fundametally ifindex numbers are per namespace, or else we will have corner case problems with migration when we get that far. At the same time there are assumptions in the network stack that the ifindex of a network device won't change. Making the ifindex number global seems a good compromise until the network stack can cope with ifindex changes when you change namespaces, and the like. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e081e1e3 |
|
16-Sep-2007 |
Herbert Xu <herbert@gondor.apana.org.au> |
[BRIDGE]: Kill clone argument to br_flood_* The clone argument is only used by one caller and that caller can clone the packet itself. This patch moves the clone call into the caller and kills the clone argument. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
87a596e0 |
|
07-Apr-2007 |
Akinobu Mita <akinobu.mita@gmail.com> |
bridge: check kmem_cache_create() error This patch checks kmem_cache_create() error and aborts loading module on failure. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
|
#
9cde0708 |
|
21-Mar-2007 |
Stephen Hemminger <shemminger@linux-foundation.org> |
bridge: add support for user mode STP This patchset based on work by Aji_Srinivas@emc.com provides allows spanning tree to be controled from userspace. Like hotplug, it uses call_usermodehelper when spanning tree is enabled so there is no visible API change. If call to start usermode STP fails it falls back to existing kernel STP. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
|
#
9cf63747 |
|
09-Apr-2007 |
Stephen Hemminger <shemminger@linux-foundation.org> |
bridge: add sysfs hook to flush forwarding table The RSTP daemon needs to be able to flush all dynamic forwarding entries in the case of topology change. This is a temporary interface. It will change to a netlink interface before RSTP daemon is officially released. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
|
#
6229e362 |
|
21-Mar-2007 |
Stephen Hemminger <shemminger@osdl.org> |
bridge: eliminate call by reference Change the bridging hook to be simple function with return value rather than modifying the skb argument. This could generate better code and is cleaner. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
|
#
32fe21c0 |
|
22-Mar-2007 |
Thomas Graf <tgraf@suug.ch> |
[BRIDGE]: Use rtnl registration interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
269def7c |
|
22-Feb-2007 |
Stephen Hemminger <shemminger@linux-foundation.org> |
[BRIDGE]: eliminate workqueue for carrier check Having a work queue for checking carrier leads to lots of race issues. Simpler to just get the cost when data structure is created and update on change. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ac062e84 |
|
22-Feb-2007 |
Stephen Hemminger <shemminger@linux-foundation.org> |
[BRIDGE]: get rid of miscdevice include The bridge hasn't used miscdevice for a long long time. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9d6f229f |
|
09-Feb-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET] BRIDGE: Fix whitespace errors. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c4028958 |
|
22-Nov-2006 |
David Howells <dhowells@redhat.com> |
WorkStruct: make allyesconfig Fix up for make allyesconfig. Signed-Off-By: David Howells <dhowells@redhat.com>
|
#
1a620698 |
|
12-Oct-2006 |
Stephen Hemminger <shemminger@osdl.org> |
[BRIDGE]: flush forwarding table when device carrier off Flush the forwarding table when carrier is lost. This helps for availability because we don't want to forward to a downed device and new packets may come in on other links. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
11dc1f36 |
|
25-May-2006 |
Stephen Hemminger <shemminger@osdl.org> |
[BRIDGE]: netlink interface for link management Add basic netlink support to the Ethernet bridge. Including: * dump interfaces in bridges * monitor link status changes * change state of bridge port For some demo programs see: http://developer.osdl.org/shemminger/prototypes/brnl.tar.gz These are to allow building a daemon that does alternative implementations of Spanning Tree Protocol. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c0909713 |
|
25-May-2006 |
Stephen Hemminger <shemminger@osdl.org> |
[BRIDGE]: fix module startup error handling Return address in use, if some other kernel code has the SAP. Propogate out error codes from netfilter registration and unwind. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fda93d92 |
|
20-Mar-2006 |
Stephen Hemminger <shemminger@osdl.org> |
[BRIDGE]: allow show/store of group multicast address Bridge's communicate with each other using Spanning Tree Protocol over a standard multicast address. There are times when testing or layering bridges over existing topologies or tunnels, when it is useful to use alternative multicast addresses for STP packets. The 802.1d standard has some unused addresses, that can be used for this. This patch is restrictive in that it only allows one of the possible addresses in the standard. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cf0f02d0 |
|
20-Mar-2006 |
Stephen Hemminger <shemminger@osdl.org> |
[BRIDGE]: use llc for receiving STP packets Use LLC for the receive path of Spanning Tree Protocol packets. This allows link local multicast packets to be received by other protocols (if they care), and uses the existing LLC code to get STP packets back into bridge code. The bridge multicast address is also checked, so bridges using other link local multicast addresses are ignored. This allows for use of different multicast addresses to define separate STP domains. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bab1deea |
|
09-Feb-2006 |
Stephen Hemminger <shemminger@osdl.org> |
[BRIDGE]: fix error handling for add interface to bridge Refactor how the bridge code interacts with kobject system. It should still use kobjects even if not using sysfs. Fix the error unwind handling in br_add_if. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b3f1be4b |
|
09-Feb-2006 |
Stephen Hemminger <shemminger@osdl.org> |
[BRIDGE]: fix for RCU and deadlock on device removal Change Bridge receive path to correctly handle RCU removal of device from bridge. Also fixes deadlock between carrier_check and del_nbp. This replaces the previous deleted flag fix. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3f4cfc2d |
|
31-Jan-2006 |
Stephen Hemminger <shemminger@osdl.org> |
[BRIDGE]: Fix device delete race. This is a simpler fix for the two races in bridge device removal. The Xen race of delif and notify is managed now by a new deleted flag. No need for barriers or other locking because of rtnl mutex. The del_timer_sync()'s are unnecessary, because br_stp_disable_port delete's the timers, and they will finish running before RCU callback. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8cbb512e |
|
21-Dec-2005 |
Stephen Hemminger <shemminger@osdl.org> |
[BRIDGE]: add version number Add version info to bridge module. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
edb5e46f |
|
21-Dec-2005 |
Stephen Hemminger <shemminger@osdl.org> |
[BRIDGE]: limited ethtool support Add limited ethtool support to bridge to allow disabling features. Note: if underlying device does not support a feature (like checksum offload), then the bridge device won't inherit it. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4433f420 |
|
20-Dec-2005 |
Stephen Hemminger <shemminger@osdl.org> |
[BRIDGE]: handle speed detection after carrier changes Speed of a interface may not be available until carrier is detected in the case of autonegotiation. To get the correct value we need to recheck speed after carrier event. But the check needs to be done in a context that is similar to normal ethtool interface (can sleep). Also, delay check for 1ms to try avoid any carrier bounce transitions. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4505a3ef |
|
21-Dec-2005 |
Stephen Hemminger <shemminger@osdl.org> |
[BRIDGE]: allow setting hardware address of bridge pseudo-dev Some people are using bridging to hide multiple machines from an ISP that restricts by MAC address. So in that case allow the bridge mac address to be set to any of the existing interfaces. I don't want to allow any arbitrary value and confuse STP. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
81d35307 |
|
29-May-2005 |
Stephen Hemminger <shemminger@osdl.org> |
[BRIDGE]: set features based on enslaved devices Make features of the bridge pseudo-device be a subset of the underlying devices. Motivated by Xen and others who use bridging to do failover. Signed-off-by: Catalin BOIE <catab at umrella.ro> Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1da177e4 |
|
16-Apr-2005 |
Linus Torvalds <torvalds@ppc970.osdl.org> |
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
|