#
55c90047 |
|
27-Oct-2023 |
Jakub Kicinski <kuba@kernel.org> |
net: fill in MODULE_DESCRIPTION()s under drivers/net/ W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
10a03c36 |
|
13-Mar-2023 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
drivers: remove struct module * setting from struct class There is no need to manually set the owner of a struct class, as the registering function does it automatically, so remove all of the explicit settings from various drivers that did so as it is unneeded. This allows us to remove this pointer entirely from this structure going forward. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230313181843.1207845-2-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
fa627348 |
|
01-Oct-2022 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
driver core: class: make namespace and get_ownership take const * The callbacks in struct class namespace() and get_ownership() do not modify the struct device passed to them, so mark the pointer as constant and fix up all callbacks in the kernel to have the correct function signature. This helps make it more obvious what calls and callbacks do, and do not, modify structures passed to them. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20221001165426.2690912-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
d57aae2e |
|
17-Sep-2022 |
Xiu Jianfeng <xiujianfeng@huawei.com> |
net: macvtap: add __init/__exit annotations to module init/exit funcs Add missing __init/__exit annotations to module init/exit funcs. Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Link: https://lore.kernel.org/r/20220917083535.20040-1-xiujianfeng@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
a0219215 |
|
27-Feb-2022 |
Sven Eckelmann <sven@narfation.org> |
macvtap: advertise link netns via netlink Assign rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is added to rtnetlink messages. This fixes iproute2 which otherwise resolved the link interface to an interface in the wrong namespace. Test commands: ip netns add nst ip link add dummy0 type dummy ip link add link macvtap0 link dummy0 type macvtap ip link set macvtap0 netns nst ip -netns nst link show macvtap0 Before: 10: macvtap0@gre0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500 link/ether 5e:8f:ae:1d:60:50 brd ff:ff:ff:ff:ff:ff After: 10: macvtap0@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500 link/ether 5e:8f:ae:1d:60:50 brd ff:ff:ff:ff:ff:ff link-netnsid 0 Reported-by: Leonardo Mörlein <freifunk@irrelefant.net> Signed-off-by: Sven Eckelmann <sven@narfation.org> Link: https://lore.kernel.org/r/20220228003240.1337426-1-sven@narfation.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
1c5b5b3f |
|
15-Oct-2021 |
Jean Sacren <sakiwit@gmail.com> |
net: macvtap: fix template string argument of device_create() call The last argument of device_create() call should be a template string. The tap_name variable should be the argument to the string, but not the argument of the call itself. We should add the template string and turn tap_name into its argument. Signed-off-by: Jean Sacren <sakiwit@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
09c434b8 |
|
19-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Add SPDX license identifier for more missed files Add SPDX license identifiers to all files which: - Have no license information of any form - Have MODULE_LICENCE("GPL*") inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
dea6e19f |
|
27-Oct-2017 |
Girish Moodalbail <girish.moodalbail@oracle.com> |
tap: reference to KVA of an unloaded module causes kernel panic The commit 9a393b5d5988 ("tap: tap as an independent module") created a separate tap module that implements tap functionality and exports interfaces that will be used by macvtap and ipvtap modules to create create respective tap devices. However, that patch introduced a regression wherein the modules macvtap and ipvtap can be removed (through modprobe -r) while there are applications using the respective /dev/tapX devices. These applications cause kernel to hold reference to /dev/tapX through 'struct cdev macvtap_cdev' and 'struct cdev ipvtap_dev' defined in macvtap and ipvtap modules respectively. So, when the application is later closed the kernel panics because we are referencing KVA that is present in the unloaded modules. ----------8<------- Example ----------8<---------- $ sudo ip li add name mv0 link enp7s0 type macvtap $ sudo ip li show mv0 |grep mv0| awk -e '{print $1 $2}' 14:mv0@enp7s0: $ cat /dev/tap14 & $ lsmod |egrep -i 'tap|vlan' macvtap 16384 0 macvlan 24576 1 macvtap tap 24576 3 macvtap $ sudo modprobe -r macvtap $ fg cat /dev/tap14 ^C <...system panics...> BUG: unable to handle kernel paging request at ffffffffa038c500 IP: cdev_put+0xf/0x30 ----------8<-----------------8<---------- The fix is to set cdev.owner to the module that creates the tap device (either macvtap or ipvtap). With this set, the operations (in fs/char_dev.c) on char device holds and releases the module through cdev_get() and cdev_put() and will not allow the module to unload prematurely. Fixes: 9a393b5d5988ea4e (tap: tap as an independent module) Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
42ab19ee |
|
04-Oct-2017 |
David Ahern <dsahern@gmail.com> |
net: Add extack to upper device linking Add extack arg to netdev_upper_dev_link and netdev_master_upper_dev_link Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fb652fdf |
|
03-Jul-2017 |
David S. Miller <davem@davemloft.net> |
macvlan/macvtap: Remove NETIF_F_UFO advertisement. It is going away. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7a3f4a18 |
|
25-Jun-2017 |
Matthias Schiffer <mschiffer@universe-factory.net> |
net: add netlink_ext_ack argument to rtnl_link_ops.newlink Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
174cd4b1 |
|
02-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> Fix up affected files that include this signal functionality via sched.h. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
9a393b5d |
|
10-Feb-2017 |
Sainath Grandhi <sainath.grandhi@intel.com> |
tap: tap as an independent module This patch makes tap a separate module for other types of virtual interfaces, for example, ipvlan to use. Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
837585a5 |
|
03-Feb-2017 |
Willem de Bruijn <willemb@google.com> |
macvtap: read vnet_hdr_size once When IFF_VNET_HDR is enabled, a virtio_net header must precede data. Data length is verified to be greater than or equal to expected header length tun->vnet_hdr_sz before copying. Macvtap functions read the value once, but unless READ_ONCE is used, the compiler may ignore this and read multiple times. Enforce a single read and locally cached value to avoid updates between test and use. Signed-off-by: Willem de Bruijn <willemb@google.com> Suggested-by: Eric Dumazet <edumazet@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6391a448 |
|
19-Jan-2017 |
Jason Wang <jasowang@redhat.com> |
virtio-net: restore VIRTIO_HDR_F_DATA_VALID on receiving Commit 501db511397f ("virtio: don't set VIRTIO_NET_HDR_F_DATA_VALID on xmit") in fact disables VIRTIO_HDR_F_DATA_VALID on receiving path too, fixing this by adding a hint (has_data_valid) and set it only on the receiving path. Cc: Rolf Neugebauer <rolf.neugebauer@docker.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Rolf Neugebauer <rolf.neugebauer@docker.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cbbd26b8 |
|
01-Nov-2016 |
Al Viro <viro@zeniv.linux.org.uk> |
[iov_iter] new primitives - copy_from_iter_full() and friends copy_from_iter_full(), copy_from_iter_full_nocache() and csum_and_copy_from_iter_full() - counterparts of copy_from_iter() et.al., advancing iterator only in case of successful full copy and returning whether it had been successful or not. Convert some obvious users. *NOTE* - do not blindly assume that something is a good candidate for those unless you are sure that not advancing iov_iter in failure case is the right thing in this case. Anything that does short read/short write kind of stuff (or is in a loop, etc.) is unlikely to be a good one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
aa196eed |
|
29-Nov-2016 |
Jason Wang <jasowang@redhat.com> |
macvtap: handle ubuf refcount correctly when meet errors We trigger uarg->callback() immediately after we decide do datacopy even if caller want to do zerocopy. This will cause the callback (vhost_net_zerocopy_callback) decrease the refcount. But when we meet an error afterwards, the error handling in vhost handle_tx() will try to decrease it again. This is wrong and fix this by delay the uarg->callback() until we're sure there's no errors. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
763dfa27 |
|
28-Nov-2016 |
Zhang Shengju <zhangshengju@cmss.chinamobile.com> |
macvtap: replace printk with netdev_err This patch replaces printk() with netdev_err() for macvtap device. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e824265d |
|
24-Nov-2016 |
Gao Feng <fgao@ikuai8.com> |
driver: macvtap: Unregister netdev rx_handler if macvtap_newlink fails The macvtap_newlink registers the netdev rx_handler firstly, but it does not unregister the handler if macvlan_common_newlink failed. Signed-off-by: Gao Feng <fgao@ikuai8.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3e9e40e7 |
|
18-Nov-2016 |
Jarno Rajahalme <jarno@ovn.org> |
virtio_net: Simplify call sites for virtio_net_hdr_{from, to}_skb(). No point storing the return value of virtio_net_hdr_to_skb() or virtio_net_hdr_from_skb() to a variable when the value is used only once as a boolean in an immediately following if statement. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
104a4933 |
|
11-Aug-2016 |
Jason Wang <jasowang@redhat.com> |
macvtap: fix use after free for skb_array during release We've clean skb_array in macvtap_put_queue() but still try to pop from it during macvtap_sock_destruct(). Fix this use after free by moving the skb array cleanup to macvtap_sock_destruct() instead. Fixes: 362899b8725b ("macvtap: switch to use skb array") Reported-by: Cornelia Huck <cornelia.huck@de.ibm.com> Tested-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0d7eacbe |
|
18-Jul-2016 |
Jason Wang <jasowang@redhat.com> |
macvtap: correctly free skb during socket destruction We should use kfree_skb() instead of kfree() to free an skb. Fixes: 362899b8725b ("macvtap: switch to use skb array") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
362899b8 |
|
15-Jul-2016 |
Jason Wang <jasowang@redhat.com> |
macvtap: switch to use skb array This patch switch to use skb array instead of sk_receive_queue to avoid spinlock contentions. Tests shows about 21% improvements for guest rx pps: Before: 1472731 pkts/s After: 1786289 pkts/s Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1b16bf42 |
|
15-Jul-2016 |
Jason Wang <jasowang@redhat.com> |
macvtap: avoid hash calculating for single queue We decide the rxq through calculating its hash which is not necessary if we only have one rx queue. So this patch skip this and just return queue 0. Test shows 22% improving on guest rx pps. Before: 1201504 pkts/s After: 1472731 pkts/s Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fd88d68b |
|
08-Jun-2016 |
Mike Rapoport <rppt@linux.vnet.ibm.com> |
macvtap: use common code for virtio_net_hdr and skb GSO conversion Replace open coded conversion between virtio_net_hdr to skb GSO info with virtio_net_hdr_{from,to}_skb Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
be0bd316 |
|
06-May-2016 |
Eric Dumazet <edumazet@google.com> |
macvtap: segmented packet is consumed If GSO packet is segmented and its segments are properly queued, we call consume_skb() instead of kfree_skb() to be drop monitor friendly. Fixes: 3e4f8b7873709 ("macvtap: Perform GSO on forwarding path.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
17af2bce |
|
04-May-2016 |
Marc Angel <marc@arista.com> |
macvtap: add namespace support to the sysfs device class When creating macvtaps that are expected to have the same ifindex in different network namespaces, only the first one will succeed. The others will fail with a sysfs_warn_dup warning due to them trying to create the following sysfs link (with 'NN' the ifindex of macvtapX): /sys/class/macvtap/tapNN -> /sys/devices/virtual/net/macvtapX/tapNN This is reproducible by running the following commands: ip netns add ns1 ip netns add ns2 ip link add veth0 type veth peer name veth1 ip link set veth0 netns ns1 ip link set veth1 netns ns2 ip netns exec ns1 ip l add link veth0 macvtap0 type macvtap ip netns exec ns2 ip l add link veth1 macvtap1 type macvtap The last command will fail with "RTNETLINK answers: File exists" (along with the kernel warning) but retrying it will work because the ifindex was incremented. The 'net' device class is isolated between network namespaces so each one has its own hierarchy of net devices. This isn't the case for the 'macvtap' device class. The problem occurs half-way through the netdev registration, when `macvtap_device_event` is called-back to create the 'tapNN' macvtap class device under the 'macvtapX' net class device. This patch adds namespace support to the 'macvtap' device class so that /sys/class/macvtap is no longer shared between net namespaces. However, making the macvtap sysfs class namespace-aware has the side effect of changing /sys/devices/virtual/net/macvtapX/tapNN into /sys/devices/virtual/net/macvtapX/macvtap/tapNN. This is due to Commit 24b1442 ("Driver-core: Always create class directories for classses that support namespaces") and the fact that class devices supporting namespaces are really not supposed to be placed directly under other class devices. To avoid breaking userland, a tapNN symlink pointing to macvtap/tapNN is created inside the macvtapX directory. Signed-off-by: Marc Angel <marc@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e96c37f1 |
|
23-Apr-2016 |
Francesco Ruggeri <fruggeri@arista.com> |
macvtap: check minor when unregistering macvtap_device_event(NETDEV_UNREGISTER) should check vlan->minor to determine if it is being invoked in the context of a macvtap_newlink that failed, for example in this code sequence: macvtap_newlink macvlan_common_newlink register_netdevice call_netdevice_notifiers(NETDEV_REGISTER, dev) macvtap_device_event(NETDEV_REGISTER) <fail here, vlan->minor = 0> rollback_registered(dev); rollback_registered_many call_netdevice_notifiers(NETDEV_UNREGISTER, dev); macvtap_device_event(NETDEV_UNREGISTER) <nothing to clean up here> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8e2ad411 |
|
08-Mar-2016 |
Willem de Bruijn <willemb@google.com> |
macvtap: always pass ethernet header in linear The stack expects link layer headers in the skb linear section. Macvtap can create skbs with llheader in frags in edge cases: when (IFF_VNET_HDR is off or vnet_hdr.hdr_len < ETH_HLEN) and prepad + len > PAGE_SIZE and vnet_hdr.flags has no or bad csum. Add checks to ensure linear is always at least ETH_HLEN. At this point, len is already ensured to be >= ETH_HLEN. For backwards compatiblity, rounds up short vnet_hdr.hdr_len. This differs from tap and packet, which return an error. Fixes b9fb9ee07e67 ("macvtap: add GSO/csum offload support") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a188222b |
|
14-Dec-2015 |
Tom Herbert <tom@herbertland.com> |
net: Rename NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK The name NETIF_F_ALL_CSUM is a misnomer. This does not correspond to the set of features for offloading all checksums. This is a mask of the checksum offload related features bits. It is incorrect to set both NETIF_F_HW_CSUM and NETIF_F_IP_CSUM or NETIF_F_IPV6 at the same time for features of a device. This patch: - Changes instances of NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK (where NETIF_F_ALL_CSUM is being used as a mask). - Changes bonding, sfc/efx, ipvlan, macvlan, vlan, and team drivers to use NEITF_F_HW_CSUM in features list instead of NETIF_F_ALL_CSUM. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9cd3e072 |
|
29-Nov-2015 |
Eric Dumazet <edumazet@google.com> |
net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA This patch is a cleanup to make following patch easier to review. Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA from (struct socket)->flags to a (struct socket_wq)->flags to benefit from RCU protection in sock_wake_async() To ease backports, we rename both constants. Two new helpers, sk_set_bit(int nr, struct sock *sk) and sk_clear_bit(int net, struct sock *sk) are added so that following patch can change their implementation. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a499a2e9 |
|
09-Nov-2015 |
Vlad Yasevich <vyasevich@gmail.com> |
macvtap: Resolve possible __might_sleep warning in macvtap_do_read() macvtap_do_read code calls macvtap_put_user while it might be set up to wait for the user. This results in the following warning: Jun 23 16:25:26 galen kernel: ------------[ cut here ]------------ Jun 23 16:25:26 galen kernel: WARNING: CPU: 0 PID: 30433 at kernel/sched/core.c: 7286 __might_sleep+0x7f/0x90() Jun 23 16:25:26 galen kernel: do not call blocking ops when !TASK_RUNNING; state =1 set at [<ffffffff810f1c1f>] prepare_to_wait+0x2f/0x90 Jun 23 16:25:26 galen kernel: CPU: 0 PID: 30433 Comm: cat Not tainted 4.1.0-rc6+ #11 Jun 23 16:25:26 galen kernel: Call Trace: Jun 23 16:25:26 galen kernel: [<ffffffff817f76ba>] dump_stack+0x4c/0x65 Jun 23 16:25:26 galen kernel: [<ffffffff810a07ca>] warn_slowpath_common+0x8a/0xc 0 Jun 23 16:25:26 galen kernel: [<ffffffff810a0846>] warn_slowpath_fmt+0x46/0x50 Jun 23 16:25:26 galen kernel: [<ffffffff810f1c1f>] ? prepare_to_wait+0x2f/0x90 Jun 23 16:25:26 galen kernel: [<ffffffff810f1c1f>] ? prepare_to_wait+0x2f/0x90 Jun 23 16:25:26 galen kernel: [<ffffffff810cdc1f>] __might_sleep+0x7f/0x90 Jun 23 16:25:26 galen kernel: [<ffffffff811f8e15>] might_fault+0x55/0xb0 Jun 23 16:25:26 galen kernel: [<ffffffff810fab9d>] ? trace_hardirqs_on_caller+0x fd/0x1c0 Jun 23 16:25:26 galen kernel: [<ffffffff813f639c>] copy_to_iter+0x7c/0x360 Jun 23 16:25:26 galen kernel: [<ffffffffa052da86>] macvtap_do_read+0x256/0x3d0 [macvtap] Jun 23 16:25:26 galen kernel: [<ffffffff810f20e0>] ? prepare_to_wait_event+0x110/0x110 Jun 23 16:25:26 galen kernel: [<ffffffffa052dcab>] macvtap_read_iter+0x2b/0x50 [macvtap] Jun 23 16:25:26 galen kernel: [<ffffffff81247f2e>] __vfs_read+0xae/0xe0 Jun 23 16:25:26 galen kernel: [<ffffffff81248526>] vfs_read+0x86/0x140 Jun 23 16:25:26 galen kernel: [<ffffffff812493b9>] SyS_read+0x49/0xb0 Jun 23 16:25:26 galen kernel: [<ffffffff8180182e>] system_call_fastpath+0x12/0x76 Jun 23 16:25:26 galen kernel: ---[ end trace 22e33f67e70c0c2a ]--- Make sure thet we call finish_wait() if we have the skb to process before trying to actually process it. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f23d538b |
|
22-Oct-2015 |
Jason Wang <jasowang@redhat.com> |
macvtap: unbreak receiving of gro skb with frag list We don't have fraglist support in TAP_FEATURES. This will lead software segmentation of gro skb with frag list. Fixes by having frag list support in TAP_FEATURES. With this patch single session of netperf receiving were restored from about 5Gb/s to about 12Gb/s on mlx4. Fixes a567dd6252 ("macvtap: simplify usage of tap_features") Cc: Vlad Yasevich <vyasevic@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3ea79249 |
|
18-Sep-2015 |
Michael S. Tsirkin <mst@redhat.com> |
macvtap: fix TUNSETSNDBUF values > 64k Upon TUNSETSNDBUF, macvtap reads the requested sndbuf size into a local variable u. commit 39ec7de7092b ("macvtap: fix uninitialized access on TUNSETIFF") changed its type to u16 (which is the right thing to do for all other macvtap ioctls), breaking all values > 64k. The value of TUNSETSNDBUF is actually a signed 32 bit integer, so the right thing to do is to read it into an int. Cc: David S. Miller <davem@davemloft.net> Fixes: 39ec7de7092b ("macvtap: fix uninitialized access on TUNSETIFF") Reported-by: Mark A. Peloquin Bisected-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com> Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c5c62f1b |
|
23-Jul-2015 |
Ivan Vecera <ivecera@redhat.com> |
macvtap: fix network header pointer for VLAN tagged pkts Network header is set with offset ETH_HLEN but it is not true for VLAN (multiple-)tagged and results in checksum issues in lower devices. v2: leave skb->protocol untouched (thx Vlad), comment added v3: moved after skb_probe_transport_header() call (thx Toshiaki) Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d5de1987 |
|
08-Jul-2015 |
Johannes Thumshirn <jthumshirn@suse.de> |
macvtap: Destroy minor_idr on module_exit Destroy minor_idr on module_exit, reclaiming the allocated memory. This was detected by the following semantic patch (written by Luis Rodriguez <mcgrof@suse.com>) <SmPL> @ defines_module_init @ declarer name module_init, module_exit; declarer name DEFINE_IDR; identifier init; @@ module_init(init); @ defines_module_exit @ identifier exit; @@ module_exit(exit); @ declares_idr depends on defines_module_init && defines_module_exit @ identifier idr; @@ DEFINE_IDR(idr); @ on_exit_calls_destroy depends on declares_idr && defines_module_exit @ identifier declares_idr.idr, defines_module_exit.exit; @@ exit(void) { ... idr_destroy(&idr); ... } @ missing_module_idr_destroy depends on declares_idr && defines_module_exit && !on_exit_calls_destroy @ identifier declares_idr.idr, defines_module_exit.exit; @@ exit(void) { ... +idr_destroy(&idr); } </SmPL> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dfe816c5 |
|
19-Jun-2015 |
Pankaj Gupta <pagupta@redhat.com> |
macvtap: Increase limit of macvtap queues Macvtap should be compatible with tuntap for maximum number of queues. commit 'baf71c5c1f80d82e92924050a60b5baaf97e3094 (tuntap: Increase the number of queues in tun.)' removes the limitations and increases number of queues in tuntap. Now, Its safe to increase number of queues in Macvtap as well. This patch also modifies 'macvtap_del_queues' function to avoid extra memory allocation in stack. Changes from v1->v2 : Michael S. Tsirkin, Jason Wang : Better way to use linked list to avoid use of extra memory in stack. Sergei Shtylyov : Specify dependent commit's summary. Signed-off-by: Pankaj Gupta <pagupta@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8b8e658b |
|
24-Apr-2015 |
Greg Kurz <groug@kaod.org> |
macvtap/tun: cross-endian support for little-endian hosts The VNET_LE flag was introduced to fix accesses to virtio 1.0 headers that are always little-endian. It can also be used to handle the special case of a legacy little-endian device implemented by a big-endian host. Let's add a flag and ioctls for big-endian devices as well. If both flags are set, little-endian wins. Since this is isn't a common usecase, the feature is controlled by a kernel config option (not set by default). Both macvtap and tun are covered by this patch since they share the same API with userland. Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
|
#
7d824109 |
|
24-Apr-2015 |
Greg Kurz <groug@kaod.org> |
virtio: add explicit big-endian support to memory accessors The current memory accessors logic is: - little endian if little_endian - native endian (i.e. no byteswap) if !little_endian If we want to fully support cross-endian vhost, we also need to be able to convert to big endian. Instead of changing the little_endian argument to some 3-value enum, this patch changes the logic to: - little endian if little_endian - big endian if !little_endian The native endian case is handled by all users with a trivial helper. This patch doesn't change any functionality, nor it does add overhead. Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
|
#
5b11e15f |
|
24-Apr-2015 |
Greg Kurz <groug@kaod.org> |
macvtap: introduce macvtap_is_little_endian() helper Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
|
#
7f460d30 |
|
13-May-2015 |
Justin Cormack <justin@myriabit.com> |
fix missing copy_from_user in macvtap Fix missing copy_from_user in macvtap SIOCSIFHWADDR ioctl. Signed-off-by: Justin Cormack <justin@netbsd.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b5082083 |
|
11-May-2015 |
Justin Cormack <justin@myriabit.com> |
macvtap add missing ioctls - fix wrapping The macvtap driver tries to emulate all the ioctls supported by a normal tun/tap driver, however it was missing the generic SIOCGIFHWADDR and SIOCSIFHWADDR ioctls to get and set the mac address that are supported by tun/tap. This patch adds these. Signed-off-by: Justin Cormack <justin@netbsd.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
11aa9c28 |
|
08-May-2015 |
Eric W. Biederman <ebiederm@xmission.com> |
net: Pass kern from net_proto_family.create to sk_alloc In preparation for changing how struct net is refcounted on kernel sockets pass the knowledge that we are creating a kernel socket from sock_create_kern through to sk_alloc. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8b86a61d |
|
17-Apr-2015 |
Johannes Berg <johannes.berg@intel.com> |
net: remove unused 'dev' argument from netif_needs_gso() In commit 04ffcb255f22 ("net: Add ndo_gso_check") Tom originally added the 'dev' argument to be able to call ndo_gso_check(). Then later, when generalizing this in commit 5f35227ea34b ("net: Generalize ndo_gso_check to ndo_features_check") Jesse removed the call to ndo_gso_check() in netif_needs_gso() by calling the new ndo_features_check() in a different place. This made the 'dev' argument unused. Remove the unused argument and go back to the code as before. Cc: Tom Herbert <therbert@google.com> Cc: Jesse Gross <jesse@nicira.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5d5d5689 |
|
03-Apr-2015 |
Al Viro <viro@zeniv.linux.org.uk> |
make new_sync_{read,write}() static All places outside of core VFS that checked ->read and ->write for being NULL or called the methods directly are gone now, so NULL {read,write} with non-NULL {read,write}_iter will do the right thing in all cases. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
1b784140 |
|
02-Mar-2015 |
Ying Xue <ying.xue@windriver.com> |
net: Remove iocb argument from sendmsg and recvmsg After TIPC doesn't depend on iocb argument in its internal implementations of sendmsg() and recvmsg() hooks defined in proto structure, no any user is using iocb argument in them at all now. Then we can drop the redundant iocb argument completely from kinds of implementations of both sendmsg() and recvmsg() in the entire networking stack. Cc: Christoph Hellwig <hch@lst.de> Suggested-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2f1d8b9e |
|
27-Feb-2015 |
Eric Dumazet <edumazet@google.com> |
macvtap: make sure neighbour code can push ethernet header Brian reported crashes using IPv6 traffic with macvtap/veth combo. I tracked the crashes in neigh_hh_output() -> memcpy(skb->data - HH_DATA_MOD, hh->hh_data, HH_DATA_MOD); Neighbour code assumes headroom to push Ethernet header is at least 16 bytes. It appears macvtap has only 14 bytes available on arches where NET_IP_ALIGN is 0 (like x86) Effect is a corruption of 2 bytes right before skb->head, and possible crashes if accessing non existing memory. This fix should also increase IPv4 performance, as paranoid code in ip_finish_output2() wont have to call skb_realloc_headroom() Reported-by: Brian Rak <brak@vultr.com> Tested-by: Brian Rak <brak@vultr.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e3e3c423 |
|
03-Feb-2015 |
Vlad Yasevich <vyasevich@gmail.com> |
Revert "drivers/net: Disable UFO through virtio" This reverts commit 3d0ad09412ffe00c9afa201d01effdb6023d09b4. Now that GSO functionality can correctly track if the fragment id has been selected and select a fragment id if necessary, we can re-enable UFO on tap/macvap and virtio devices. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
72f65107 |
|
03-Feb-2015 |
Vlad Yasevich <vyasevich@gmail.com> |
Revert "drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets" This reverts commit 5188cd44c55db3e92cd9e77a40b5baa7ed4340f7. Now that GSO layer can track if fragment id has been selected and can allocate one if necessary, we don't need to do this in tap and macvtap. This reverts most of the code and only keeps the new ipv6 fragment id generation function that is still needed. Fixes: 3d0ad09412ff (drivers/net: Disable UFO through virtio) Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
df8a39de |
|
13-Jan-2015 |
Jiri Pirko <jiri@resnulli.us> |
net: rename vlan_tx_* helpers since "tx" is misleading there The same macros are used for rx as well. So rename it. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
01b07fb3 |
|
16-Dec-2014 |
Michael S. Tsirkin <mst@redhat.com> |
macvtap: drop broken IFF_VNET_LE Use TUNSETVNETLE/TUNGETVNETLE instead. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
39ec7de7 |
|
16-Dec-2014 |
Michael S. Tsirkin <mst@redhat.com> |
macvtap: fix uninitialized access on TUNSETIFF flags field in ifreq is only 16 bit wide, but we read it as a 32 bit value. If userspace doesn't zero-initialize unused fields, this will lead to failures. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c0371da6 |
|
24-Nov-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
put iov_iter into msghdr Note that the code _using_ ->msg_iter at that point will be very unhappy with anything other than unshifted iovec-backed iov_iter. We still need to convert users to proper primitives. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
6ae7feb3 |
|
23-Nov-2014 |
Michael S. Tsirkin <mst@redhat.com> |
macvtap: TUN_VNET_LE support Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com>
|
#
f51a5e82 |
|
01-Dec-2014 |
Jason Wang <jasowang@redhat.com> |
tun/macvtap: use consume_skb() instead of kfree_skb() when needed To be more friendly with drop monitor, we should only call kfree_skb() when the packets were dropped and use consume_skb() in other cases. Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f5ff53b4 |
|
19-Jun-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
{macvtap,tun}_get_user(): switch to iov_iter allows to switch macvtap and tun from ->aio_write() to ->write_iter() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
3af0bfe5 |
|
07-Nov-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
switch macvtap to ->read_iter() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
7cc76f51 |
|
20-Nov-2014 |
Jason Wang <jasowang@redhat.com> |
macvtap: advance iov iterator when needed in macvtap_put_user() When mergeable buffer is used, vnet_hdr_sz is greater than sizeof struct virtio_net_hdr. So we need advance the iov iterators in this case. Fixes 6c36d2e26cda1ad3e2c4b90dd843825fc62fe5b4 ("macvtap: Use iovec iterators") Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6c36d2e2 |
|
07-Nov-2014 |
Herbert Xu <herbert@gondor.apana.org.au> |
macvtap: Use iovec iterators This patch removes the use of skb_copy_datagram_const_iovec in favour of the iovec iterator-based skb_copy_datagram_iter. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3ce9b20f |
|
02-Nov-2014 |
Herbert Xu <herbert@gondor.apana.org.au> |
macvtap: Fix csum_start when VLAN tags are present When VLAN is in use in macvtap_put_user, we end up setting csum_start to the wrong place. The result is that the whoever ends up doing the checksum setting will corrupt the packet instead of writing the checksum to the expected location, usually this means writing the checksum with an offset of -4. This patch fixes this by adjusting csum_start when VLAN tags are detected. Fixes: f09e2249c4f5 ("macvtap: restore vlan header on user read") Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5188cd44 |
|
30-Oct-2014 |
Ben Hutchings <ben@decadent.org.uk> |
drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets UFO is now disabled on all drivers that work with virtio net headers, but userland may try to send UFO/IPv6 packets anyway. Instead of sending with ID=0, we should select identifiers on their behalf (as we used to). Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Fixes: 916e4cf46d02 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data") Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3d0ad094 |
|
30-Oct-2014 |
Ben Hutchings <ben@decadent.org.uk> |
drivers/net: Disable UFO through virtio IPv6 does not allow fragmentation by routers, so there is no fragmentation ID in the fixed header. UFO for IPv6 requires the ID to be passed separately, but there is no provision for this in the virtio net protocol. Until recently our software implementation of UFO/IPv6 generated a new ID, but this was a bug. Now we will use ID=0 for any UFO/IPv6 packet passed through a tap, which is even worse. Unfortunately there is no distinction between UFO/IPv4 and v6 features, so disable UFO on taps and virtio_net completely until we have a proper solution. We cannot depend on VM managers respecting the tap feature flags, so keep accepting UFO packets but log a warning the first time we do this. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Fixes: 916e4cf46d02 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data") Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
04ffcb25 |
|
14-Oct-2014 |
Tom Herbert <therbert@google.com> |
net: Add ndo_gso_check Add ndo_gso_check which a device can define to indicate whether is is capable of doing GSO on a packet. This funciton would be called from the stack to determine whether software GSO is needed to be done. A driver should populate this function if it advertises GSO types for which there are combinations that it wouldn't be able to handle. For instance a device that performs UDP tunneling might only implement support for transparent Ethernet bridging type of inner packets or might have limitations on lengths of inner headers. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
40b8fe45 |
|
22-Sep-2014 |
Vlad Yasevich <vyasevich@gmail.com> |
macvtap: Fix race between device delete and open. In macvtap device delete and open calls can race and this causes a list curruption of the vlan queue_list. The race intself is triggered by the idr accessors that located the vlan device. The device is stored into and removed from the idr under both an rtnl and a mutex. However, when attempting to locate the device in idr, only a mutex is taken. As a result, once cpu perfoming a delete may take an rtnl and wait for the mutex, while another cput doing an open() will take the idr mutex first to fetch the device pointer and later take an rtnl to add a queue for the device which may have just gotten deleted. With this patch, we now hold the rtnl for the duration of the macvtap_open() call thus making sure that open will not race with delete. CC: Michael S. Tsirkin <mst@redhat.com> CC: Jason Wang <jasowang@redhat.com> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cbdb0427 |
|
29-Apr-2014 |
Vlad Yasevich <vyasevic@redhat.com> |
mactap: Fix checksum errors for non-gso packets in bridge mode The following is a problematic configuration: VM1: virtio-net device connected to macvtap0@eth0 VM2: e1000 device connect to macvtap1@eth0 The problem is is that virtio-net supports checksum offloading and thus sends the packets to the host with CHECKSUM_PARTIAL set. On the other hand, e1000 does not support any acceleration. For small TCP packets (and this includes the 3-way handshake), e1000 ends up receiving packets that only have a partial checksum set. This causes TCP to fail checksum validation and to drop packets. As a result tcp connections can not be established. Commit 3e4f8b787370978733ca6cae452720a4f0c296b8 macvtap: Perform GSO on forwarding path. fixes this issue for large packets wthat will end up undergoing GSO. This commit adds a check for the non-GSO case and attempts to compute the checksum for partially checksummed packets in the non-GSO case. CC: Daniel Lezcano <daniel.lezcano@free.fr> CC: Patrick McHardy <kaber@trash.net> CC: Andrian Nord <nightnord@gmail.com> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Michael S. Tsirkin <mst@redhat.com> CC: Jason Wang <jasowang@redhat.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a81ab36b |
|
08-Jan-2014 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
drivers/net: delete non-required instances of include <linux/init.h> None of these files are actually using any __init type directives and hence don't need to include <linux/init.h>. Most are just a left over from __devinit and __cpuinit removal, or simply due to code getting copied from one driver to the next. This covers everything under drivers/net except for wireless, which has been submitted separately. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3958afa1b |
|
15-Dec-2013 |
Tom Herbert <therbert@google.com> |
net: Change skb_get_rxhash to skb_get_hash Changing name of function as part of making the hash in skbuff to be generic property, not just for receive path. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2f6a1b66 |
|
11-Dec-2013 |
Vlad Yasevich <vyasevic@redhat.com> |
macvlan: Remove custom recieve and forward handlers Since now macvlan and macvtap use the same receive and forward handlers, we can remove them completely and use netif_rx and dev_forward_skb() directly. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6acf54f1 |
|
11-Dec-2013 |
Vlad Yasevich <vyasevic@redhat.com> |
macvtap: Add support of packet capture on macvtap device. Macvtap device currently doesn not allow a user to capture traffic on due to the fact that it steals the packets from the network stack before the skb->dev is set correctly on the receive side, and that use uses macvlan transmit path directly on the send side. As a result, we never get a change to give traffic to the taps while the correct device is set in the skb. This patch makes macvtap device behave almost exaclty like macvlan. On the send side, we switch to using dev_queue_xmit(). On the receive side, to deliver packets to macvtap, we now use netif_rx and dev_forward_skb just like macvlan. The only differnce now is that macvtap has its own rx_handler which is attached to the macvtap netdev. It is here that we now steal the packet and provide it to the socket. As a result, we can now capture traffic on the macvtap device: tcpdump -i macvtap0 It also gives us the abilit to add tc actions to the macvtap device and actually utilize different bandwidth management queues on output. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ce232ce0 |
|
10-Dec-2013 |
Jason Wang <jasowang@redhat.com> |
macvtap: signal truncated packets macvtap_put_user() never return a value grater than iov length, this in fact bypasses the truncated checking in macvtap_recvmsg(). Fix this by always returning the size of packet plus the possible vlan header to let the trunca checking work. Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bbd37626 |
|
10-Dec-2013 |
David S. Miller <davem@davemloft.net> |
net: Revert macvtap/tun truncation signalling changes. Jason Wang and Michael S. Tsirkin are still discussing how to properly fix this. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
730054da |
|
09-Dec-2013 |
Jason Wang <jasowang@redhat.com> |
macvtap: signal truncated packets macvtap_put_user() never return a value grater than iov length, this in fact bypasses the truncated checking in macvtap_recvmsg(). Fix this by always returning the size of packet plus the possible vlan header to let the truncated checking work. Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
de2aa476 |
|
10-Dec-2013 |
David S. Miller <davem@davemloft.net> |
Revert "macvtap: remove useless codes in macvtap_aio_read() and macvtap_recvmsg()" This reverts commit 41e4af69a5984a3193ba3108fb4e067b0e34dc73. MSG_TRUNC handling was broken and is going to be fixed in the 'net' tree, so revert this. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
41e4af69 |
|
06-Dec-2013 |
Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> |
macvtap: remove useless codes in macvtap_aio_read() and macvtap_recvmsg() By checking related codes, it is impossible that ret > len or total_len, so we should remove some useless coeds in both above functions. Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
55ec8e25 |
|
06-Dec-2013 |
Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> |
macvtap: remove unused parameter in macvtap_do_read() Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
359d44d7 |
|
06-Dec-2013 |
Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> |
macvtap: remove the dead branch Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e6ebc7f1 |
|
05-Dec-2013 |
Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> |
macvtap: update file current position Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
006da7b0 |
|
25-Nov-2013 |
Vlad Yasevich <vyasevic@redhat.com> |
macvtap: Do not double-count received packets Currently macvlan will count received packets after calling each vlans receive handler. Macvtap attempts to count the packet yet again when the user reads the packet from the tap socket. This code doesn't do this consistently either. Remove the counting from macvtap and let only macvlan count received packets. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cd3e22b7 |
|
25-Nov-2013 |
Jason Wang <jasowang@redhat.com> |
macvtap: fix tx_dropped counting error After commit 8ffab51b3dfc54876f145f15b351c41f3f703195 (macvlan: lockless tx path), tx stat counter were converted to percpu stat structure. So we need use to this also for tx_dropped in macvtap. Otherwise, the management won't notice the dropping packet in macvtap tx path. Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
16a3fa28 |
|
12-Nov-2013 |
Jason Wang <jasowang@redhat.com> |
macvtap: limit head length of skb allocated We currently use hdr_len as a hint of head length which is advertised by guest. But when guest advertise a very big value, it can lead to an 64K+ allocating of kmalloc() which has a very high possibility of failure when host memory is fragmented or under heavy stress. The huge hdr_len also reduce the effect of zerocopy or even disable if a gso skb is linearized in guest. To solves those issues, this patch introduces an upper limit (PAGE_SIZE) of the head, which guarantees an order 0 allocation each time. Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e5733321 |
|
16-Aug-2013 |
Vlad Yasevich <vyasevic@redhat.com> |
macvtap: Ignore tap features when VNET_HDR is off When the user turns off VNET_HDR support on the macvtap device, there is no way to provide any offload information to the user. So, it's safer to ignore offload setting then depend on the user setting them correctly. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e558b018 |
|
16-Aug-2013 |
Vlad Yasevich <vyasevic@redhat.com> |
macvtap: Correctly set tap features when IFF_VNET_HDR is disabled. When the user turns off IFF_VNET_HDR flag, attempts to change offload features via TUNSETOFFLOAD do not work. This could cause GSO packets to be delivered to the user when the user is not prepared to handle them. To solve, allow processing of TUNSETOFFLOAD when IFF_VNET_HDR is disabled. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a567dd62 |
|
16-Aug-2013 |
Vlad Yasevich <vyasevic@redhat.com> |
macvtap: simplify usage of tap_features In macvtap, tap_features specific the features of that the user has specified via ioctl(). If we treat macvtap as a macvlan+tap then we could all the tap a pseudo-device and give it other features like SG and GSO. Then we can stop using the features of lower device (macvlan) when forwarding the traffic the tap. This solves the issue of possible checksum offload mismatch between tap feature and macvlan features. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
29d79196 |
|
08-Aug-2013 |
Eric Dumazet <edumazet@google.com> |
macvtap: fix two races Since commit ac4e4af1e59e1 ("macvtap: Consistently use rcu functions"), Thomas gets two different warnings : BUG: using smp_processor_id() in preemptible [00000000] code: vhost-45891/45892 caller is macvtap_do_read+0x45c/0x600 [macvtap] CPU: 1 PID: 45892 Comm: vhost-45891 Not tainted 3.11.0-bisecttest #13 Call Trace: ([<00000000001126ee>] show_trace+0x126/0x144) [<00000000001127d2>] show_stack+0xc6/0xd4 [<000000000068bcec>] dump_stack+0x74/0xd8 [<0000000000481066>] debug_smp_processor_id+0xf6/0x114 [<000003ff802e9a18>] macvtap_do_read+0x45c/0x600 [macvtap] [<000003ff802e9c1c>] macvtap_recvmsg+0x60/0x88 [macvtap] [<000003ff80318c5e>] handle_rx+0x5b2/0x800 [vhost_net] [<000003ff8028f77c>] vhost_worker+0x15c/0x1c4 [vhost] [<000000000015f3ac>] kthread+0xd8/0xe4 [<00000000006934a6>] kernel_thread_starter+0x6/0xc [<00000000006934a0>] kernel_thread_starter+0x0/0xc And BUG: using smp_processor_id() in preemptible [00000000] code: vhost-45897/45898 caller is macvlan_start_xmit+0x10a/0x1b4 [macvlan] CPU: 1 PID: 45898 Comm: vhost-45897 Not tainted 3.11.0-bisecttest #16 Call Trace: ([<00000000001126ee>] show_trace+0x126/0x144) [<00000000001127d2>] show_stack+0xc6/0xd4 [<000000000068bdb8>] dump_stack+0x74/0xd4 [<0000000000481132>] debug_smp_processor_id+0xf6/0x114 [<000003ff802b72ca>] macvlan_start_xmit+0x10a/0x1b4 [macvlan] [<000003ff802ea69a>] macvtap_get_user+0x982/0xbc4 [macvtap] [<000003ff802ea92a>] macvtap_sendmsg+0x4e/0x60 [macvtap] [<000003ff8031947c>] handle_tx+0x494/0x5ec [vhost_net] [<000003ff8028f77c>] vhost_worker+0x15c/0x1c4 [vhost] [<000000000015f3ac>] kthread+0xd8/0xe4 [<000000000069356e>] kernel_thread_starter+0x6/0xc [<0000000000693568>] kernel_thread_starter+0x0/0xc 2 locks held by vhost-45897/45898: #0: (&vq->mutex){+.+.+.}, at: [<000003ff8031903c>] handle_tx+0x54/0x5ec [vhost_net] #1: (rcu_read_lock){.+.+..}, at: [<000003ff802ea53c>] macvtap_get_user+0x824/0xbc4 [macvtap] In the first case, macvtap_put_user() calls macvlan_count_rx() in a preempt-able context, and this is not allowed. In the second case, macvtap_get_user() calls macvlan_start_xmit() with BH enabled, and this is not allowed. Reported-by: Thomas Huth <thuth@linux.vnet.ibm.com> Bisected-by: Thomas Huth <thuth@linux.vnet.ibm.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Thomas Huth <thuth@linux.vnet.ibm.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
28d64271 |
|
08-Aug-2013 |
Eric Dumazet <edumazet@google.com> |
net: attempt high order allocations in sock_alloc_send_pskb() Adding paged frags skbs to af_unix sockets introduced a performance regression on large sends because of additional page allocations, even if each skb could carry at least 100% more payload than before. We can instruct sock_alloc_send_pskb() to attempt high order allocations. Most of the time, it does a single page allocation instead of 8. I added an additional parameter to sock_alloc_send_pskb() to let other users to opt-in for this new feature on followup patches. Tested: Before patch : $ netperf -t STREAM_STREAM STREAM STREAM TEST Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 2304 212992 212992 10.00 46861.15 After patch : $ netperf -t STREAM_STREAM STREAM STREAM TEST Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 2304 212992 212992 10.00 57981.11 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c3bdeb5c |
|
06-Aug-2013 |
Jason Wang <jasowang@redhat.com> |
net: move zerocopy_sg_from_iovec() to net/core/datagram.c To let it be reused and reduce code duplication. Also document this function. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b4bf0777 |
|
06-Aug-2013 |
Jason Wang <jasowang@redhat.com> |
net: move iov_pages() to net/core/iovec.c To let it be reused and reduce code duplication. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ece793fc |
|
17-Jul-2013 |
Jason Wang <jasowang@redhat.com> |
macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS We try to linearize part of the skb when the number of iov is greater than MAX_SKB_FRAGS. This is not enough since each single vector may occupy more than one pages, so zerocopy_sg_fromiovec() may still fail and may break the guest network. Solve this problem by calculate the pages needed for iov before trying to do zerocopy and switch to use copy instead of zerocopy if it needs more than MAX_SKB_FRAGS. This is done through introducing a new helper to count the pages for iov, and call uarg->callback() manually when switching from zerocopy to copy to notify vhost. We can do further optimization on top. This bug were introduced from b92946e2919134ebe2a4083e4302236295ea2a73 (macvtap: zerocopy: validate vectors before building skb). Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0fbe0d47 |
|
15-Jul-2013 |
Jason Wang <jasowang@redhat.com> |
macvtap: do not assume 802.1Q when send vlan packets The hard-coded 8021.q proto will break 802.1ad traffic. So switch to use vlan->proto. Cc: Basil Gor <basil.gor@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
82a19eb8 |
|
15-Jul-2013 |
Jason Wang <jasowang@redhat.com> |
macvtap: fix the missing ret value of TUNSETQUEUE Commit 441ac0fcaadc76ad09771812382345001dd2b813 (macvtap: Convert to using rtnl lock) forget to return what macvtap_ioctl_set_queue() returns to its caller. This may break multiqueue API by always falling through to TUNGETFEATURES. Cc: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
61d46bf9 |
|
09-Jul-2013 |
Jason Wang <jasowang@redhat.com> |
macvtap: correctly linearize skb when zerocopy is used Userspace may produce vectors greater than MAX_SKB_FRAGS. When we try to linearize parts of the skb to let the rest of iov to be fit in the frags, we need count copylen into linear when calling macvtap_alloc_skb() instead of partly counting it into data_len. Since this breaks zerocopy_sg_from_iovec() since its inner counter assumes nr_frags should be zero at beginning. This cause nr_frags to be increased wrongly without setting the correct frags. This bug were introduced from b92946e2919134ebe2a4083e4302236295ea2a73 (macvtap: zerocopy: validate vectors before building skb). Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3e4f8b78 |
|
25-Jun-2013 |
Vlad Yasevich <vyasevic@redhat.com> |
macvtap: Perform GSO on forwarding path. When macvtap forwards skb to its tap, it needs to check if GSO needs to be performed. This is sometimes necessary when the HW device performed GRO, but the guest reading from the tap does not support it (ex: Windows 7). Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2be5c767 |
|
25-Jun-2013 |
Vlad Yasevich <vyasevic@redhat.com> |
macvtap: Let TUNSETOFFLOAD actually controll offload features. When the user issues TUNSETOFFLOAD ioctl, macvtap does not do anything other then to verify arguments. This patch adds functionality to allow users to actually control offload features. NETIF_F_GSO and NETIF_F_GRO are always on, but the rest of the features can be controlled. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ac4e4af1 |
|
25-Jun-2013 |
Vlad Yasevich <vyasevic@redhat.com> |
macvtap: Consistently use rcu functions Currently macvtap uses rcu_bh functions in its user facing fuction macvtap_get_user() and macvtap_put_user(). However, its packet handlers use normal rcu as the rcu_read_lock() is taken in netif_receive_skb(). We can safely discontinue the usage or rcu with bh disabled. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
441ac0fc |
|
25-Jun-2013 |
Vlad Yasevich <vyasevic@redhat.com> |
macvtap: Convert to using rtnl lock Macvtap uses a private lock to protect the relationship between macvtap_queue and macvlan_dev. The private lock is not needed since the relationship is managed by user via open(), release(), and dellink() calls. dellink() already happens under rtnl, so we can safely convert open() and release(), and use it in ioctl() as well. Suggested by Eric Dumazet. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4c7ab054 |
|
23-Jun-2013 |
Michael S. Tsirkin <mst@redhat.com> |
macvtap: fix recovery from gup errors get user pages might fail partially in macvtap zero copy mode. To recover we need to put all pages that we got, but code used a wrong index resulting in double-free errors. Reported-by: Brad Hubbard <bhubbard@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f57855a5 |
|
13-Jun-2013 |
Jason Wang <jasowang@redhat.com> |
macvtap: fix uninitialized return value macvtap_ioctl_set_queue() Return -EINVAL on illegal flag instead of uninitialized value. This fixes the kbuild test warning. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d9a90a31 |
|
13-Jun-2013 |
Jason Wang <jasowang@redhat.com> |
macvtap: slient sparse warnings This patch silents the following sparse warnings: drivers/net/macvtap.c:98:9: warning: incorrect type in assignment (different address spaces) drivers/net/macvtap.c:98:9: expected struct macvtap_queue *<noident> drivers/net/macvtap.c:98:9: got struct macvtap_queue [noderef] <asn:4>*<noident> drivers/net/macvtap.c:120:9: warning: incorrect type in assignment (different address spaces) drivers/net/macvtap.c:120:9: expected struct macvtap_queue *<noident> drivers/net/macvtap.c:120:9: got struct macvtap_queue [noderef] <asn:4>*<noident> drivers/net/macvtap.c:151:22: error: incompatible types in comparison expression (different address spaces) drivers/net/macvtap.c:233:23: error: incompatible types in comparison expression (different address spaces) drivers/net/macvtap.c:243:23: error: incompatible types in comparison expression (different address spaces) drivers/net/macvtap.c:247:15: error: incompatible types in comparison expression (different address spaces) CC [M] drivers/net/macvtap.o drivers/net/macvlan.c:232:24: error: incompatible types in comparison expression (different address spaces) Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
df09b36f |
|
05-Jun-2013 |
Jason Wang <jasowang@redhat.com> |
macvtap: enable multiqueue flag To notify the userspace about our capability of multiqueue. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
815f236d |
|
05-Jun-2013 |
Jason Wang <jasowang@redhat.com> |
macvtap: add TUNSETQUEUE ioctl This patch adds TUNSETQUEUE ioctl to let userspace can temporarily disable or enable a queue of macvtap. This is used to be compatible at API layer of tuntap to simplify the userspace to manage the queues. This is done through introducing a linked list to track all taps while using vlan->taps array to only track active taps. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
376b1aab |
|
05-Jun-2013 |
Jason Wang <jasowang@redhat.com> |
macvtap: eliminate linear search Linear search were used in both get_slot() and macvtap_get_queue(), this is because: - macvtap didn't reshuffle the array of taps when create or destroy a queue, so when adding a new queue, macvtap must do linear search to find a location for the new queue. This will also complicate the TUNSETQUEUE implementation for multiqueue API. - the queue itself didn't track the queue index, so the we must do a linear search in the array to find the location of a existed queue. The solution is straightforward: reshuffle the array and introduce a queue_index to macvtap_queue. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8f475a31 |
|
05-Jun-2013 |
Jason Wang <jasowang@redhat.com> |
macvtap: introduce macvtap_get_vlan() Factor out the device holding logic to a macvtap_get_vlan(), this will be also used by multiqueue API. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
89cee917 |
|
05-Jun-2013 |
Jason Wang <jasowang@redhat.com> |
macvtap: do not add self to waitqueue if doing a nonblock read There's no need to add self to waitqueue if doing a nonblock read. This could help to avoid the spinlock contention. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ed0483fa |
|
05-Jun-2013 |
Jason Wang <jasowang@redhat.com> |
macvtap: fix a possible race between queue selection and changing queues Complier may generate codes that re-read the vlan->numvtaps during macvtap_get_queue(). This may lead a race if vlan->numvtaps were changed in the same time and which can lead unexpected result (e.g. very huge value). We need prevent the compiler from generating such codes by adding an ACCESS_ONCE() to make sure vlan->numvtaps were only read once. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
351638e7 |
|
27-May-2013 |
Jiri Pirko <jiri@resnulli.us> |
net: pass info struct via netdevice notifier So far, only net_device * could be passed along with netdevice notifier event. This patch provides a possibility to pass custom structure able to provide info that event listener needs to know. Signed-off-by: Jiri Pirko <jiri@resnulli.us> v2->v3: fix typo on simeth shortened dev_getter shortened notifier_info struct name v1->v2: fix notifier_call parameter in call_netdevice_notifier() Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
40893fd0 |
|
26-Mar-2013 |
Jason Wang <jasowang@redhat.com> |
net: switch to use skb_probe_transport_header() Switch to use the new help skb_probe_transport_header() to do the l4 header probing for untrusted sources. For packets with partial csum, the header should already been set by skb_partial_csum_set(). Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9b4d669b |
|
25-Mar-2013 |
Jason Wang <jasowang@redhat.com> |
macvtap: set transport header before passing skb to lower device Set the transport header for 1) some drivers (e.g ixgbe) needs l4 header 2) precise packet length estimation (introduced in 1def9238) needs l4 header to compute header length. For the packets with partial checksum, the patch just set the transport header to csum_start. Otherwise tries to use skb_flow_dissect() to get l4 offset, if it fails, just pretend no l4 header. Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ec09ebc1 |
|
27-Feb-2013 |
Tejun Heo <tj@kernel.org> |
macvtap: convert to idr_alloc() Convert to the much saner new idr interface. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c9af6db4 |
|
11-Feb-2013 |
Pravin B Shelar <pshelar@nicira.com> |
net: Fix possible wrong checksum generation. Patch cef401de7be8c4e (net: fix possible wrong checksum generation) fixed wrong checksum calculation but it broke TSO by defining new GSO type but not a netdev feature for that type. net_gso_ok() would not allow hardware checksum/segmentation offload of such packets without the feature. Following patch fixes TSO and wrong checksum. This patch uses same logic that Eric Dumazet used. Patch introduces new flag SKBTX_SHARED_FRAG if at least one frag can be modified by the user. but SKBTX_SHARED_FRAG flag is kept in skb shared info tx_flags rather than gso_type. tx_flags is better compared to gso_type since we can have skb with shared frag without gso packet. It does not link SHARED_FRAG to GSO, So there is no need to define netdev feature for this. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cef401de |
|
25-Jan-2013 |
Eric Dumazet <edumazet@google.com> |
net: fix possible wrong checksum generation Pravin Shelar mentioned that GSO could potentially generate wrong TX checksum if skb has fragments that are overwritten by the user between the checksum computation and transmit. He suggested to linearize skbs but this extra copy can be avoided for normal tcp skbs cooked by tcp_sendmsg(). This patch introduces a new SKB_GSO_SHARED_FRAG flag, set in skb_shinfo(skb)->gso_type if at least one frag can be modified by the user. Typical sources of such possible overwrites are {vm}splice(), sendfile(), and macvtap/tun/virtio_net drivers. Tested: $ netperf -H 7.7.8.84 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 3959.52 $ netperf -H 7.7.8.84 -t TCP_SENDFILE TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 3216.80 Performance of the SENDFILE is impacted by the extra allocation and copy, and because we use order-0 pages, while the TCP_STREAM uses bigger pages. Reported-by: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3a7f8c34 |
|
11-Aug-2012 |
Denis Efremov <yefremov.denis@gmail.com> |
macvtap: rcu_dereference outside read-lock section rcu_dereference occurs in update section. Replacement by rcu_dereference_protected in order to prevent lockdep complaint. Found by Linux Driver Verification project (linuxtesting.org) Signed-off-by: Denis Efremov <yefremov.denis@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ccf7e72b |
|
06-Jun-2012 |
Hong zhi guo <honkiko@gmail.com> |
macvtap: use prepare_to_wait/finish_wait to ensure mb instead of raw assignment to current->state Signed-off-by: Hong Zhiguo <honkiko@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f09e2249 |
|
03-May-2012 |
Basil Gor <basil.gor@gmail.com> |
macvtap: restore vlan header on user read Ethernet vlan header is not on the packet and kept in the skb->vlan_tci when it comes from lower dev. This patch inserts vlan header in user buffer during skb copy on user read. Signed-off-by: Basil Gor <basil.gor@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b92946e2 |
|
01-May-2012 |
Jason Wang <jasowang@redhat.com> |
macvtap: zerocopy: validate vectors before building skb There're several reasons that the vectors need to be validated: - Return error when caller provides vectors whose num is greater than UIO_MAXIOV. - Linearize part of skb when userspace provides vectors grater than MAX_SKB_FRAGS. - Return error when userspace provides vectors whose total length may exceed - MAX_SKB_FRAGS * PAGE_SIZE. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
01d6657b |
|
01-May-2012 |
Jason Wang <jasowang@redhat.com> |
macvtap: zerocopy: set SKBTX_DEV_ZEROCOPY only when skb is built successfully Current the SKBTX_DEV_ZEROCOPY is set unconditionally after zerocopy_sg_from_iovec(), this would lead NULL pointer when macvtap fails to build zerocopy skb because destructor_arg was not initialized. Solve this by set this flag after the skb were built successfully. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
02ce04bb |
|
01-May-2012 |
Jason Wang <jasowang@redhat.com> |
macvtap: zerocopy: put page when fail to get all requested user pages When get_user_pages_fast() fails to get all requested pages, we could not use kfree_skb() to free it as it has not been put in the skb fragments. So we need to call put_page() instead. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
4ef67ebe |
|
01-May-2012 |
Jason Wang <jasowang@redhat.com> |
macvtap: zerocopy: fix truesize underestimation As the skb fragment were pinned/built from user pages, we should account the page instead of length for truesize. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
3afc9621 |
|
01-May-2012 |
Jason Wang <jasowang@redhat.com> |
macvtap: zerocopy: fix offset calculation when building skb This patch fixes the offset calculation when building skb: - offset1 were used as skb data offset not vector offset - reset offset to zero only when we advance to next vector Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
40401530 |
|
12-Feb-2012 |
Al Viro <viro@ftp.linux.org.uk> |
security: trim security.h Trim security.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: James Morris <jmorris@namei.org>
|
#
ef0002b5 |
|
23-Nov-2011 |
Krishna Kumar <krkumar2@in.ibm.com> |
macvtap: Fix macvtap_get_queue to use rxhash first It was reported that the macvtap device selects a Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2cfa5a04 |
|
23-Nov-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: treewide use of RCU_INIT_POINTER rcu_assign_pointer(ptr, NULL) can be safely replaced by RCU_INIT_POINTER(ptr, NULL) (old rcu_assign_pointer() macro was testing the NULL value and could omit the smp_wmb(), but this had to be removed because of compiler warnings) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e09eff7f |
|
19-Oct-2011 |
Eric W. Biederman <ebiederm@xmission.com> |
macvtap: Fix the minor device number allocation On systems that create and delete lots of dynamic devices the 31bit linux ifindex fails to fit in the 16bit macvtap minor, resulting in unusable macvtap devices. I have systems running automated tests that that hit this condition in just a few days. Use a linux idr allocator to track which mavtap minor numbers are available and and to track the association between macvtap minor numbers and macvtap network devices. Remove the unnecessary unneccessary check to see if the network device we have found is indeed a macvtap device. With macvtap specific data structures it is impossible to find any other kind of networking device. Increase the macvtap minor range from 65536 to the full 20 bits that is supported by linux device numbers. It doesn't solve the original problem but there is no penalty for a larger minor device range. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9bf1907f |
|
19-Oct-2011 |
Eric W. Biederman <ebiederm@xmission.com> |
macvtap: Rewrite macvtap_newlink so the error handling works. Place macvlan_common_newlink at the end of macvtap_newlink because failing in newlink after registering your network device is not supported. Move device_create into a netdevice creation notifier. The network device notifier is the only hook that is called after the network device has been registered with the device layer and before register_network_device returns success. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2259fef0 |
|
19-Oct-2011 |
Eric W. Biederman <ebiederm@xmission.com> |
macvtap: Don't leak unreceived packets when we delete a macvtap device. To avoid leaking packets in the receive queue. Add a socket destructor that will run whenever destroy a macvtap socket. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
047af9cf |
|
19-Oct-2011 |
Eric W. Biederman <ebiederm@xmission.com> |
macvtap: Fix macvtap_open races in the zero copy enable code. To see if it is appropriate to enable the macvtap zero copy feature don't test the lowerdev network device flags. Instead test the macvtap network device flags which are a direct copy of the lowerdev flags. This is important because nothing holds a reference to lowerdev and on a very bad day we lowerdev could be a pointer to stale memory. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
99f34b38 |
|
19-Oct-2011 |
Eric W. Biederman <ebiederm@xmission.com> |
macvtap: Close a race between macvtap_open and macvtap_dellink. There is a small window in macvtap_open between looking up a networking device and calling macvtap_set_queue in which macvtap_del_queues called from macvtap_dellink. After calling macvtap_del_queues it is totally incorrect to allow macvtap_set_queue to proceed so prevent success by reporting that all of the available queues are in use. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
653fc915 |
|
18-Sep-2011 |
Jason Wang <jasowang@redhat.com> |
macvtap: fix the uninitialized var using in macvtap_alloc_skb() Commit d1b08284 use new frag API but would leave f to be used uninitialized, this patch fix it. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d1b08284 |
|
30-Aug-2011 |
Ian Campbell <Ian.Campbell@citrix.com> |
macvtap: convert to SKB paged frag API. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
97bc3633 |
|
05-Jul-2011 |
Shirley Ma <mashirle@us.ibm.com> |
macvtap: macvtapTX zero-copy support Only 128 bytes is copied, the rest of data is DMA mapped directly from userspace. Signed-off-by: Shirley Ma <xma@...ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
10a8d94a |
|
09-Jun-2011 |
Jason Wang <jasowang@redhat.com> |
virtio_net: introduce VIRTIO_NET_HDR_F_DATA_VALID There's no need for the guest to validate the checksum if it have been validated by host nics. So this patch introduces a new flag - VIRTIO_NET_HDR_F_DATA_VALID which is used to bypass the checksum examing in guest. The backend (tap/macvtap) may set this flag when met skbs with CHECKSUM_UNNECESSARY to save cpu utilization. No feature negotiation is needed as old driver just ignore this flag. Iperf shows 12%-30% performance improvement for UDP traffic. For TCP, when gro is on no difference as it produces skb with partial checksum. But when gro is disabled, 20% or even higher improvement could be measured by netperf. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ce3c8692 |
|
04-Mar-2011 |
Nicolas Kaiser <nikai@nikai.net> |
drivers/net/macvtap: fix error check 'len' is unsigned of type size_t and can't be negative. Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
13707f9e |
|
26-Jan-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
drivers/net: remove some rcu sparse warnings Add missing __rcu annotations and helpers. minor : Fix some rcu_dereference() calls in macvtap Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1ac9ad13 |
|
11-Jan-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: remove dev_txq_stats_fold() After recent changes, (percpu stats on vlan/tunnels...), we dont need anymore per struct netdev_queue tx_bytes/tx_packets/tx_dropped counters. Only remaining users are ixgbe, sch_teql, gianfar & macvlan : 1) ixgbe can be converted to use existing tx_ring counters. 2) macvlan incremented txq->tx_dropped, it can use the dev->stats.tx_dropped counter. 3) sch_teql : almost revert ab35cd4b8f42 (Use net_device internal stats) Now we have ndo_get_stats64(), use it, even for "unsigned long" fields (No need to bring back a struct net_device_stats) 4) gianfar adds a stats structure per tx queue to hold tx_bytes/tx_packets This removes a lockdep warning (and possible lockup) in rndis gadget, calling dev_get_stats() from hard IRQ context. Ref: http://www.spinics.net/lists/netdev/msg149202.html Reported-by: Neil Jones <neiljay@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jarek Poplawski <jarkao2@gmail.com> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Sandeep Gopalpet <sandeep.kumar@freescale.com> CC: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
55508d60 |
|
14-Dec-2010 |
Michał Mirosław <mirq-linux@rere.qmqm.pl> |
net: Use skb_checksum_start_offset() Replace skb->csum_start - skb_headroom(skb) with skb_checksum_start_offset(). Note for usb/smsc95xx: skb->data - skb->head == skb_headroom(skb). Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1565c7c1 |
|
04-Aug-2010 |
Krishna Kumar <krkumar2@in.ibm.com> |
macvtap: Implement multiqueue for macvtap driver Implement multiqueue facility for macvtap driver. The idea is that a macvtap device can be opened multiple times and the fd's can be used to register eg, as backend for vhost. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8a35747a |
|
21-Jul-2010 |
Herbert Xu <herbert@gondor.apana.org.au> |
macvtap: Limit packet queue length Mark Wagner reported OOM symptoms when sending UDP traffic over a macvtap link to a kvm receiver. This appears to be caused by the fact that macvtap packet queues are unlimited in length. This means that if the receiver can't keep up with the rate of flow, then we will hit OOM. Of course it gets worse if the OOM killer then decides to kill the receiver. This patch imposes a cap on the packet queue length, in the same way as the tuntap driver, using the device TX queue length. Please note that macvtap currently has no way of giving congestion notification, that means the software device TX queue cannot be used and packets will always be dropped once the macvtap driver queue fills up. This shouldn't be a great problem for the scenario where macvtap is used to feed a kvm receiver, as the traffic is most likely external in origin so congestion notification can't be applied anyway. Of course, if anybody decides to complain about guest-to-guest UDP packet loss down the track, then we may have to revisit this. Incidentally, this patch also fixes a real memory leak when macvtap_get_queue fails. Chris Wright noticed that for this patch to work, we need a non-zero TX queue length. This patch includes his work to change the default macvtap TX queue length to 500. Reported-by: Mark Wagner <mwagner@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Chris Wright <chrisw@sous-sol.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1ebed71a |
|
10-Jul-2010 |
David S. Miller <davem@davemloft.net> |
macvtap: Use dev_t for macvtap_major. Reported-by: "Robert P. J. Day" <rpjday@crashcourse.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
55afbd08 |
|
29-Apr-2010 |
Michael S. Tsirkin <mst@redhat.com> |
macvtap: add ioctl to modify vnet header size This adds TUNSETVNETHDRSZ/TUNGETVNETHDRSZ support to macvtap. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: David S. Miller <davem@davemloft.net>
|
#
43815482 |
|
29-Apr-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: sock_def_readable() and friends RCU conversion sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we need two atomic operations (and associated dirtying) per incoming packet. RCU conversion is pretty much needed : 1) Add a new structure, called "struct socket_wq" to hold all fields that will need rcu_read_lock() protection (currently: a wait_queue_head_t and a struct fasync_struct pointer). [Future patch will add a list anchor for wakeup coalescing] 2) Attach one of such structure to each "struct socket" created in sock_alloc_inode(). 3) Respect RCU grace period when freeing a "struct socket_wq" 4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct socket_wq" 5) Change sk_sleep() function to use new sk->sk_wq instead of sk->sk_sleep 6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside a rcu_read_lock() section. 7) Change all sk_has_sleeper() callers to : - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock) - Use wq_has_sleeper() to eventually wakeup tasks. - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock) 8) sock_wake_async() is modified to use rcu protection as well. 9) Exceptions : macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq" instead of dynamically allocated ones. They dont need rcu freeing. Some cleanups or followups are probably needed, (possible sk_callback_lock conversion to a spinlock for example...). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4a4771a5 |
|
25-Apr-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: use sk_sleep() Commit aa395145 (net: sk_sleep() helper) missed three files in the conversion. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
aa395145 |
|
20-Apr-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: sk_sleep() helper Define a new function to return the waitqueue of a "struct sock". static inline wait_queue_head_t *sk_sleep(struct sock *sk) { return sk->sk_sleep; } Change all read occurrences of sk_sleep by a call to this function. Needed for a future RCU conversion. sk_sleep wont be a field directly available. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5a0e3ad6 |
|
24-Mar-2010 |
Tejun Heo <tj@kernel.org> |
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
|
#
b9fb9ee0 |
|
17-Feb-2010 |
Arnd Bergmann <arnd@arndb.de> |
macvtap: add GSO/csum offload support Added flags field to macvtap_queue to enable/disable processing of virtio_net_hdr via IFF_VNET_HDR. This flag is checked to prepend virtio_net_hdr in the receive path and process/skip virtio_net_hdr in the send path. Original patch by Sridhar, further changes by Arnd. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
501c774c |
|
17-Feb-2010 |
Arnd Bergmann <arnd@arndb.de> |
net/macvtap: add vhost support This adds support for passing a macvtap file descriptor into vhost-net, much like we already do for tun/tap. Most of the new code is taken from the respective patch in the tun driver and may get consolidated in the future. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
02df55d2 |
|
17-Feb-2010 |
Arnd Bergmann <arnd@arndb.de> |
macvtap: rework object lifetime rules This reworks the change done by the previous patch in a more complete way. The original macvtap code has a number of problems resulting from the use of RCU for protecting the access to struct macvtap_queue from open files. This includes - need for GFP_ATOMIC allocations for skbs - potential deadlocks when copy_*_user sleeps - inability to work with vhost-net Changing the lifetime of macvtap_queue to always depend on the open file solves all these. The RCU reference simply moves one step down to the reference on the macvlan_dev, which we only need for nonblocking operations. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
564517e8 |
|
10-Feb-2010 |
Arnd Bergmann <arnd@arndb.de> |
net/macvtap: fix reference counting The RCU usage in the original code was broken because there are cases where we possibly sleep with rcu_read_lock held. As a fix, change the macvtap_file_get_queue to get a reference on the socket and the netdev instead of taking the full rcu_read_lock. Also, change macvtap_file_get_queue failure case to not require a subsequent macvtap_file_put_queue, as pointed out by Ed Swierk. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Ed Swierk <eswierk@aristanetworks.com> Cc: Sridhar Samudrala <sri@us.ibm.com> Acked-by: Sridhar Samudrala <sri@us.ibm.com> Acked-by: Ed Swierk <eswierk@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
20d29d7a |
|
29-Jan-2010 |
Arnd Bergmann <arnd@arndb.de> |
net: macvtap driver In order to use macvlan with qemu and other tools that require a tap file descriptor, the macvtap driver adds a small backend with a character device with the same interface as the tun driver, with a minimum set of features. Macvtap interfaces are created in the same way as macvlan interfaces using ip link, but the netif is just used as a handle for configuration and accounting, while the data goes through the chardev. Each macvtap interface has its own character device, simplifying permission management significantly over the generic tun/tap driver. Cc: Patrick McHardy <kaber@trash.net> Cc: Stephen Hemminger <shemminger@linux-foundation.org> Cc: David S. Miller" <davem@davemloft.net> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Or Gerlitz <ogerlitz@voltaire.com> Cc: netdev@vger.kernel.org Cc: bridge@lists.linux-foundation.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|