#
059a49aa |
|
03-Apr-2024 |
Breno Leitao <leitao@debian.org> |
virtio_net: Do not send RSS key if it is not supported

There is a bug when setting the RSS options in virtio_net that can break the whole machine, getting the kernel into an infinite loop.

Running the following command in any QEMU virtual machine with virtio-net will reproduce this problem:

    # ethtool -X eth0 hfunc toeplitz

This is how the problem happens:

1) ethtool_set_rxfh() calls virtnet_set_rxfh()
2) virtnet_set_rxfh() calls virtnet_commit_rss_command()
3) virtnet_commit_rss_command() populates 4 entries for the RSS scatter-gather list
4) Since the command above does not have a key, the last scatter-gather entry will be zero-sized, since rss_key_size == 0:

    sg_buf_size = vi->rss_key_size;

5) This buffer is passed to QEMU, but QEMU does not accept a buffer with zero length, and does the following in virtqueue_map_desc() (QEMU function):

    if (!sz) {
        virtio_error(vdev, "virtio: zero sized buffers are not allowed");

6) virtio_error() (also a QEMU function) sets the device as broken:

    vdev->broken = true;

7) QEMU bails out and does not respond to the kernel.
8) The kernel is waiting for the response to come back (function virtnet_send_command())
9) The kernel waits by doing the following:

    while (!virtqueue_get_buf(vi->cvq, &tmp) &&
           !virtqueue_is_broken(vi->cvq))
        cpu_relax();

10) Neither virtqueue_get_buf() nor virtqueue_is_broken() ever returns a true value, so the kernel loops here forever. Keep in mind that virtqueue_is_broken() does not look at QEMU's `vdev->broken`, so it never realizes that the virtio device is broken on the QEMU side.

Fix it by not sending RSS commands if the feature is not available in the device.

Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.")
Cc: stable@vger.kernel.org
Cc: qemu-devel@nongnu.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
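The guard described in the commit above can be sketched in user-space C. This is a minimal model, not the actual patch: `struct rss_state`, `send_rss_command()` and `set_rss_hash()` are hypothetical names invented for illustration, and the real driver works on `struct virtnet_info` and the control virtqueue. The idea shown is only that a request is rejected before a zero-length buffer can be queued.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the driver's per-device RSS state. */
struct rss_state {
    bool has_rss;               /* device negotiated RSS support */
    size_t rss_key_size;        /* 0 when the device exposes no RSS key */
    unsigned int commands_sent; /* counts commands handed to the device */
};

/* Models queuing the RSS command; a zero-sized key must never reach here. */
static int send_rss_command(struct rss_state *st)
{
    assert(st->rss_key_size > 0);
    st->commands_sent++;
    return 0;
}

/* Reject the request up front instead of sending a zero-length buffer. */
static int set_rss_hash(struct rss_state *st)
{
    if (!st->has_rss || st->rss_key_size == 0)
        return -EOPNOTSUPP;
    return send_rss_command(st);
}
```

With this shape, a device without RSS sees no command at all, so the wait loop from step 9 is never entered for that request.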
#
5da7137d |
|
29-Feb-2024 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: rename free_old_xmit_skbs to free_old_xmit Since free_old_xmit_skbs deals not only with skbs but also with xdp frames and, in later patches, xsk buffers, rename the function to free_old_xmit. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20240229072044.77388-19-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
b1dc24ab |
|
29-Feb-2024 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: unify the code for recycling the xmit ptr There are two nearly identical, independent implementations, which is inconvenient for the subsequent addition of new types. Extract a function from this code and call it uniformly to recycle old xmit ptrs. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20240229072044.77388-18-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
0d197a14 |
|
20-Jul-2023 |
Jason Wang <jasowang@redhat.com> |
virtio-net: add cond_resched() to the command waiting loop Add cond_resched() to the command waiting loop for better cooperation with the scheduler. This gives the CPU a chance to run other tasks (workqueue) instead of busy looping when preemption is not allowed on a device whose CVQ might be slow. Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230720083839.481487-3-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
|
#
b9f74252 |
|
20-Jul-2023 |
Jason Wang <jasowang@redhat.com> |
virtio-net: convert rx mode setting to use workqueue This patch converts rx mode setting to be done in a workqueue. This is required to allow sleeping while waiting for the cvq command response, since the current code is executed under the addr spin lock. Note that we need to disable and flush the workqueue during freeze, which means the rx mode setting is lost after resuming. This is not a bug in this patch, as we never try to restore the rx mode setting during resume. Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230720083839.481487-2-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
|
#
e3fe8d28 |
|
03-Jan-2024 |
Zhu Yanjun <yanjun.zhu@linux.dev> |
virtio_net: Fix "‘%d’ directive writing between 1 and 11 bytes into a region of size 10" warnings

Fix the warnings when building virtio_net driver.

"
drivers/net/virtio_net.c: In function ‘init_vqs’:
drivers/net/virtio_net.c:4551:48: warning: ‘%d’ directive writing between 1 and 11 bytes into a region of size 10 [-Wformat-overflow=]
 4551 | sprintf(vi->rq[i].name, "input.%d", i);
      |                                ^~
In function ‘virtnet_find_vqs’,
    inlined from ‘init_vqs’ at drivers/net/virtio_net.c:4645:8:
drivers/net/virtio_net.c:4551:41: note: directive argument in the range [-2147483643, 65534]
 4551 | sprintf(vi->rq[i].name, "input.%d", i);
      |                         ^~~~~~~~~~
drivers/net/virtio_net.c:4551:17: note: ‘sprintf’ output between 8 and 18 bytes into a destination of size 16
 4551 | sprintf(vi->rq[i].name, "input.%d", i);
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/virtio_net.c: In function ‘init_vqs’:
drivers/net/virtio_net.c:4552:49: warning: ‘%d’ directive writing between 1 and 11 bytes into a region of size 9 [-Wformat-overflow=]
 4552 | sprintf(vi->sq[i].name, "output.%d", i);
      |                                 ^~
In function ‘virtnet_find_vqs’,
    inlined from ‘init_vqs’ at drivers/net/virtio_net.c:4645:8:
drivers/net/virtio_net.c:4552:41: note: directive argument in the range [-2147483643, 65534]
 4552 | sprintf(vi->sq[i].name, "output.%d", i);
      |                         ^~~~~~~~~~~
drivers/net/virtio_net.c:4552:17: note: ‘sprintf’ output between 9 and 19 bytes into a destination of size 16
 4552 | sprintf(vi->sq[i].name, "output.%d", i);
"

Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://lore.kernel.org/r/20240104020902.2753599-1-yanjun.zhu@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
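The warning above comes from formatting a signed `int` into a 16-byte name buffer with an unbounded `sprintf()`. One common way to make such code safe is sketched below; it is an illustration under assumptions, not necessarily how this patch resolved the warning. `QUEUE_NAME_LEN` and `format_queue_name()` are hypothetical names, and the sketch uses the standard `snprintf()` with an unsigned index so the output is always bounded.

```c
#include <assert.h>
#include <stdio.h>

/* Matches the "destination of size 16" reported in the warning. */
#define QUEUE_NAME_LEN 16

/*
 * Format "<prefix>.<index>" into a fixed-size name buffer.
 * Returns 0 on success and -1 if the name had to be truncated.
 */
static int format_queue_name(char name[QUEUE_NAME_LEN], const char *prefix,
                             unsigned int index)
{
    int n;

    assert(name && prefix);
    /* snprintf() never writes more than QUEUE_NAME_LEN bytes, NUL included. */
    n = snprintf(name, QUEUE_NAME_LEN, "%s.%u", prefix, index);
    if (n < 0)
        return -1;
    return n < QUEUE_NAME_LEN ? 0 : -1;
}
```

Checking the return value lets the caller detect truncation instead of silently overrunning or shortening the name.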
#
d2c4f192 |
|
26-Dec-2023 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: fix missing dma unmap for resize

For rq, we have three cases getting buffers from virtio core:

1. virtqueue_get_buf{,_ctx}
2. virtqueue_detach_unused_buf
3. callback for virtqueue_resize

But in commit 295525e29a5b ("virtio_net: merge dma operations when filling mergeable buffers"), I missed the dma unmap for the #3 case.

That will leak some memory, because I did not release the pages referred to by the unused buffers.

Running a script like the following will eventually drive the system out of memory:

    while true
    do
        ethtool -G ens4 rx 128
        ethtool -G ens4 rx 256
        free -m
    done

Fixes: 295525e29a5b ("virtio_net: merge dma operations when filling mergeable buffers")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20231226094333.47740-1-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
fb6e30a7 |
|
12-Dec-2023 |
Ahmed Zaki <ahmed.zaki@intel.com> |
net: ethtool: pass a pointer to parameters to get/set_rxfh ethtool ops The get/set_rxfh ethtool ops currently takes the rxfh (RSS) parameters as direct function arguments. This will force us to change the API (and all drivers' functions) every time some new parameters are added. This is part 1/2 of the fix, as suggested in [1]: - First simplify the code by always providing a pointer to all params (indir, key and func); the fact that some of them may be NULL seems like a weird historic thing or a premature optimization. It will simplify the drivers if all pointers are always present. - Then make the functions take a dev pointer, and a pointer to a single struct wrapping all arguments. The set_* should also take an extack. Link: https://lore.kernel.org/netdev/20231121152906.2dd5f487@kernel.org/ [1] Suggested-by: Jakub Kicinski <kuba@kernel.org> Suggested-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Link: https://lore.kernel.org/r/20231213003321.605376-2-ahmed.zaki@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
62087995 |
|
11-Dec-2023 |
Heng Qi <hengqi@linux.alibaba.com> |
virtio-net: support rx netdim By comparing the traffic information in the complete napi processes, let the virtio-net driver automatically adjust the coalescing moderation parameters of each receive queue. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1db43c08 |
|
11-Dec-2023 |
Heng Qi <hengqi@linux.alibaba.com> |
virtio-net: extract virtqueue coalescing cmd for reuse Extract commands to set virtqueue coalescing parameters for reuse by ethtool -Q, vq resize and netdim. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d7180080 |
|
11-Dec-2023 |
Heng Qi <hengqi@linux.alibaba.com> |
virtio-net: separate rx/tx coalescing moderation cmds This patch separates the rx and tx global coalescing moderation commands to support netdim switches in subsequent patches. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7949c06a |
|
11-Dec-2023 |
Heng Qi <hengqi@linux.alibaba.com> |
virtio-net: returns whether napi is complete rx netdim needs to count the traffic during a complete napi process, and start updating and comparing samples to make decisions after the napi ends. Let virtqueue_napi_complete() return true if napi is done, and false otherwise. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2311e06b |
|
26-Dec-2023 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: fix missing dma unmap for resize

For rq, we have three cases getting buffers from virtio core:

1. virtqueue_get_buf{,_ctx}
2. virtqueue_detach_unused_buf
3. callback for virtqueue_resize

But in commit 295525e29a5b ("virtio_net: merge dma operations when filling mergeable buffers"), I missed the dma unmap for the #3 case.

That will leak some memory, because I did not release the pages referred to by the unused buffers.

Running a script like the following will eventually drive the system out of memory:

    while true
    do
        ethtool -G ens4 rx 128
        ethtool -G ens4 rx 256
        free -m
    done

Fixes: 295525e29a5b ("virtio_net: merge dma operations when filling mergeable buffers")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20231226094333.47740-1-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
61217d8f |
|
26-Oct-2023 |
Eric Dumazet <edumazet@google.com> |
virtio_net: use u64_stats_t infra to avoid data-races

syzbot reported a data-race in virtnet_poll / virtnet_stats [1]

u64_stats_t infra has very nice accessors that must be used to avoid potential load-store tearing.

[1]
BUG: KCSAN: data-race in virtnet_poll / virtnet_stats

read-write to 0xffff88810271b1a0 of 8 bytes by interrupt on cpu 0:
virtnet_receive drivers/net/virtio_net.c:2102 [inline]
virtnet_poll+0x6c8/0xb40 drivers/net/virtio_net.c:2148
__napi_poll+0x60/0x3b0 net/core/dev.c:6527
napi_poll net/core/dev.c:6594 [inline]
net_rx_action+0x32b/0x750 net/core/dev.c:6727
__do_softirq+0xc1/0x265 kernel/softirq.c:553
invoke_softirq kernel/softirq.c:427 [inline]
__irq_exit_rcu kernel/softirq.c:632 [inline]
irq_exit_rcu+0x3b/0x90 kernel/softirq.c:644
common_interrupt+0x7f/0x90 arch/x86/kernel/irq.c:247
asm_common_interrupt+0x26/0x40 arch/x86/include/asm/idtentry.h:636
__sanitizer_cov_trace_const_cmp8+0x0/0x80 kernel/kcov.c:306
jbd2_write_access_granted fs/jbd2/transaction.c:1174 [inline]
jbd2_journal_get_write_access+0x94/0x1c0 fs/jbd2/transaction.c:1239
__ext4_journal_get_write_access+0x154/0x3f0 fs/ext4/ext4_jbd2.c:241
ext4_reserve_inode_write+0x14e/0x200 fs/ext4/inode.c:5745
__ext4_mark_inode_dirty+0x8e/0x440 fs/ext4/inode.c:5919
ext4_evict_inode+0xaf0/0xdc0 fs/ext4/inode.c:299
evict+0x1aa/0x410 fs/inode.c:664
iput_final fs/inode.c:1775 [inline]
iput+0x42c/0x5b0 fs/inode.c:1801
do_unlinkat+0x2b9/0x4f0 fs/namei.c:4405
__do_sys_unlink fs/namei.c:4446 [inline]
__se_sys_unlink fs/namei.c:4444 [inline]
__x64_sys_unlink+0x30/0x40 fs/namei.c:4444
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

read to 0xffff88810271b1a0 of 8 bytes by task 2814 on cpu 1:
virtnet_stats+0x1b3/0x340 drivers/net/virtio_net.c:2564
dev_get_stats+0x6d/0x860 net/core/dev.c:10511
rtnl_fill_stats+0x45/0x320 net/core/rtnetlink.c:1261
rtnl_fill_ifinfo+0xd0e/0x1120 net/core/rtnetlink.c:1867
rtnl_dump_ifinfo+0x7f9/0xc20 net/core/rtnetlink.c:2240
netlink_dump+0x390/0x720 net/netlink/af_netlink.c:2266
netlink_recvmsg+0x425/0x780 net/netlink/af_netlink.c:1992
sock_recvmsg_nosec net/socket.c:1027 [inline]
sock_recvmsg net/socket.c:1049 [inline]
____sys_recvmsg+0x156/0x310 net/socket.c:2760
___sys_recvmsg net/socket.c:2802 [inline]
__sys_recvmsg+0x1ea/0x270 net/socket.c:2832
__do_sys_recvmsg net/socket.c:2842 [inline]
__se_sys_recvmsg net/socket.c:2839 [inline]
__x64_sys_recvmsg+0x46/0x50 net/socket.c:2839
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

value changed: 0x000000000045c334 -> 0x000000000045c376

Fixes: 3fa2a1df9094 ("virtio-net: per cpu 64 bit stats (v2)")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c4e33cf2 |
|
08-Oct-2023 |
Heng Qi <hengqi@linux.alibaba.com> |
virtio-net: a tiny comment update Update a comment because virtio-net now supports both VIRTIO_NET_F_NOTF_COAL and VIRTIO_NET_F_VQ_NOTF_COAL. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f61fe5f0 |
|
08-Oct-2023 |
Heng Qi <hengqi@linux.alibaba.com> |
virtio-net: fix the vq coalescing setting for vq resize According to the definition of virtqueue coalescing spec[1]: Upon disabling and re-enabling a transmit virtqueue, the device MUST set the coalescing parameters of the virtqueue to those configured through the VIRTIO_NET_CTRL_NOTF_COAL_TX_SET command, or, if the driver did not set any TX coalescing parameters, to 0. Upon disabling and re-enabling a receive virtqueue, the device MUST set the coalescing parameters of the virtqueue to those configured through the VIRTIO_NET_CTRL_NOTF_COAL_RX_SET command, or, if the driver did not set any RX coalescing parameters, to 0. We need to add this setting for vq resize (ethtool -G) where vq_reset happens. [1] https://lists.oasis-open.org/archives/virtio-dev/202303/msg00415.html Fixes: 394bd87764b6 ("virtio_net: support per queue interrupt coalesce command") Cc: Gavin Li <gavinl@nvidia.com> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bfb2b360 |
|
08-Oct-2023 |
Heng Qi <hengqi@linux.alibaba.com> |
virtio-net: fix per queue coalescing parameter setting When the user sets a non-zero coalescing parameter to 0 for a specific virtqueue, it does not work as expected, so let's fix this. Fixes: 394bd87764b6 ("virtio_net: support per queue interrupt coalesce command") Reported-by: Xiaoming Zhao <zxm377917@alibaba-inc.com> Cc: Gavin Li <gavinl@nvidia.com> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e9420838 |
|
08-Oct-2023 |
Heng Qi <hengqi@linux.alibaba.com> |
virtio-net: consistently save parameters for per-queue When using .set_coalesce interface to set all queue coalescing parameters, we need to update both per-queue and global save values. Fixes: 394bd87764b6 ("virtio_net: support per queue interrupt coalesce command") Cc: Gavin Li <gavinl@nvidia.com> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
134674c1 |
|
08-Oct-2023 |
Heng Qi <hengqi@linux.alibaba.com> |
virtio-net: fix mismatch of getting tx-frames Since virtio-net allows switching napi_tx for per txq, we have to get the specific txq's result now. Fixes: 394bd87764b6 ("virtio_net: support per queue interrupt coalesce command") Cc: Gavin Li <gavinl@nvidia.com> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3014a0d5 |
|
08-Oct-2023 |
Heng Qi <hengqi@linux.alibaba.com> |
virtio-net: initially change the value of tx-frames

Background:
1. Commit 0c465be183c7 ("virtio_net: ethtool tx napi configuration") uses tx-frames to toggle napi_tx (0 off and 1 on) if notification coalescing is not supported.
2. Commit 31c03aef9bc2 ("virtio_net: enable napi_tx by default") enables napi_tx for all txqs by default.

Status:
When virtio-net supports notification coalescing, after initialization, tx-frames is 0 and napi_tx is true.

Problem:
When the user only wants to set rx coalescing params using
    ethtool -C eth0 rx-usecs 10, or
    ethtool -Q eth0 queue_mask 0x1 -C rx-usecs 10,
these cmds will carry tx-frames as 0, so the napi_tx switching condition is satisfied. Then the user gets:
    netlink error: Device or resource busy.
The same happens when trying to set rx-frames, adaptive_rx, adaptive_tx...

How to fix:
When the notification coalescing feature is negotiated, initially make the value of tx-frames consistent with napi_tx. For compatibility with the past, using tx-frames to toggle napi_tx is still supported.

Reported-by: Xiaoming Zhao <zxm377917@alibaba-inc.com>
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
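The interaction between tx-frames and napi_tx described in the commit above can be modeled with a small user-space sketch. Everything here is a hypothetical simplification: `struct txq_model`, `txq_model_init()`, `set_coalesce()` and `set_rx_only()` are illustrative names, and the real driver compares NAPI weights rather than booleans. The sketch assumes that an rx-only ethtool request echoes back the currently stored tx-frames value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one tx queue's coalescing state. */
struct txq_model {
    bool napi_tx;           /* tx NAPI currently enabled */
    unsigned int tx_frames; /* stored tx-frames coalescing value */
    bool dev_up;            /* interface is up */
};

/* sync_tx_frames == true models the fix: tx-frames starts consistent with napi_tx. */
static void txq_model_init(struct txq_model *q, bool sync_tx_frames)
{
    q->napi_tx = true;
    q->tx_frames = sync_tx_frames ? 1 : 0;
    q->dev_up = true;
}

/* tx-frames doubles as the napi_tx toggle: 0 means off, non-zero means on. */
static int set_coalesce(struct txq_model *q, unsigned int tx_frames)
{
    bool want_napi_tx = tx_frames != 0;

    assert(q);
    if (want_napi_tx != q->napi_tx) {
        if (q->dev_up)
            return -EBUSY; /* napi_tx cannot be switched while the device is up */
        q->napi_tx = want_napi_tx;
    }
    q->tx_frames = tx_frames;
    return 0;
}

/* An rx-only request still carries the current tx-frames value. */
static int set_rx_only(struct txq_model *q)
{
    return set_coalesce(q, q->tx_frames);
}
```

In this model, initializing tx-frames to 0 while napi_tx is on makes every rx-only request look like an attempt to switch napi_tx off, which is the "Device or resource busy" case from the commit message.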
#
d12a26b7 |
|
21-Sep-2023 |
Eric Dumazet <edumazet@google.com> |
virtio_net: avoid data-races on dev->stats fields Use DEV_STATS_INC() and DEV_STATS_READ() which provide atomicity on paths that can be used concurrently. Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5720c43d |
|
26-Sep-2023 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: fix the missing of the dma cpu sync Commit 295525e29a5b ("virtio_net: merge dma operations when filling mergeable buffers") unmaps the buffer with DMA_ATTR_SKIP_CPU_SYNC when the dma->ref is zero. We do that with DMA_ATTR_SKIP_CPU_SYNC, because we do not want to do the sync for the entire page_frag. But that misses the sync for the current area. This patch does cpu sync regardless of whether the ref is zero or not. Fixes: 295525e29a5b ("virtio_net: merge dma operations when filling mergeable buffers") Reported-by: Michael Roth <michael.roth@amd.com> Closes: http://lore.kernel.org/all/20230926130451.axgodaa6tvwqs3ut@amd.com Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
295525e2 |
|
10-Aug-2023 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: merge dma operations when filling mergeable buffers

Currently, the virtio core performs a dma operation for each buffer, even though the same page may be operated on multiple times.

With this patch, the driver does the dma operation and manages the dma address based on the premapped feature of the virtio core. This way, we can perform only one dma operation for the pages of the alloc frag. This is beneficial for the iommu device.

kernel command line: intel_iommu=on iommu.passthrough=0

       | strict=0   | strict=1
Before |  775496pps | 428614pps
After  | 1109316pps | 742853pps

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20230810123057.43407-13-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
dae64749 |
|
21-Aug-2023 |
Feng Liu <feliu@nvidia.com> |
virtio_net: Introduce skb_vnet_common_hdr to avoid typecasting

The virtio_net driver currently deals with different versions and types of virtio net headers, such as virtio_net_hdr_mrg_rxbuf, virtio_net_hdr_v1_hash, etc. Due to these variations, the code relies on multiple type casts to convert memory between different structures, potentially leading to bugs when there are changes in these structures.

Introduce "struct skb_vnet_common_hdr" as a unifying header structure using a union. With this approach, various virtio net header structures can be converted by accessing different members of this structure, thus eliminating the need for type casting and reducing the risk of potential bugs.

For example, consider the following code:

    static struct sk_buff *page_to_skb(struct virtnet_info *vi,
                                       struct receive_queue *rq,
                                       struct page *page, unsigned int offset,
                                       unsigned int len, unsigned int truesize,
                                       unsigned int headroom)
    {
    [...]
        struct virtio_net_hdr_mrg_rxbuf *hdr;
    [...]
        hdr_len = vi->hdr_len;
    [...]
    ok:
        hdr = skb_vnet_hdr(skb);
        memcpy(hdr, hdr_p, hdr_len);
    [...]
    }

When the VIRTIO_NET_F_HASH_REPORT feature is enabled, hdr_len = 20, but sizeof(*hdr) is 12, so memcpy(hdr, hdr_p, hdr_len); will copy 20 bytes to the hdr, which creates a potential risk of bugs. This risk can be avoided by introducing struct skb_vnet_common_hdr.

Change log
v1->v2
feedback from Willem de Bruijn <willemdebruijn.kernel@gmail.com>
feedback from Simon Horman <horms@kernel.org>
1. change to use net-next tree.
2. move skb_vnet_common_hdr inside kernel file instead of the UAPI header.

v2->v3
feedback from Willem de Bruijn <willemdebruijn.kernel@gmail.com>
1. fix typo in commit message.
2. add original struct virtio_net_hdr into union
3. remove virtio_net_hdr_mrg_rxbuf variable in receive_buf;

Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
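The size mismatch mentioned in the commit above (12 versus 20 bytes) and the union-based remedy can be illustrated in user-space C. The structures below are simplified local copies written for this sketch, with fixed-width integer types standing in for the kernel's little-endian types; the names `vnet_hdr`, `vnet_hdr_v1`, `vnet_hdr_v1_hash`, `vnet_common_hdr` and `copy_vnet_hdr()` are hypothetical and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified legacy header: 10 bytes. */
struct vnet_hdr {
    uint8_t flags;
    uint8_t gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
};

/* Header with num_buffers (same size as the mergeable-rxbuf header): 12 bytes. */
struct vnet_hdr_v1 {
    struct vnet_hdr hdr;
    uint16_t num_buffers;
};

/* Header used when hash reporting is enabled: 20 bytes. */
struct vnet_hdr_v1_hash {
    struct vnet_hdr_v1 hdr;
    uint32_t hash_value;
    uint16_t hash_report;
    uint16_t padding;
};

/* One type large enough for every variant, so no pointer casts are needed. */
union vnet_common_hdr {
    struct vnet_hdr hdr;
    struct vnet_hdr_v1 mrg_hdr;
    struct vnet_hdr_v1_hash hash_v1_hdr;
};

/* Copy a header of any supported length; the destination always has room. */
static void copy_vnet_hdr(union vnet_common_hdr *dst, const void *src,
                          size_t hdr_len)
{
    assert(hdr_len <= sizeof(*dst));
    memcpy(dst, src, hdr_len);
}
```

Because the destination is the union, a 20-byte copy stays within the object, and each variant is reached through a named member instead of a cast.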
#
49e47a5b |
|
02-Aug-2023 |
Jakub Kicinski <kuba@kernel.org> |
net: move struct netdev_rx_queue out of netdevice.h struct netdev_rx_queue is touched in only a few places and having it defined in netdevice.h brings in the dependency on xdp.h, because struct xdp_rxq_info gets embedded in struct netdev_rx_queue. In prep for removal of xdp.h from netdevice.h move all the netdev_rx_queue stuff to a new header. We could technically break the new header up to avoid the sysfs.h include but it's so rarely included it doesn't seem to be worth it at this point. Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/r/20230803010230.1755386-3-kuba@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
8af3bf66 |
|
31-Jul-2023 |
Gavin Li <gavinl@nvidia.com> |
virtio_net: enable per queue interrupt coalesce feature Enable per queue interrupt coalesce feature bit in driver and validate its dependency with control queue. Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Heng Qi <hengqi@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20230731070656.96411-4-gavinl@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
394bd877 |
|
31-Jul-2023 |
Gavin Li <gavinl@nvidia.com> |
virtio_net: support per queue interrupt coalesce command Add interrupt_coalesce config in send_queue and receive_queue to cache user config. Send per virtqueue interrupt moderation config to underlying device in order to have more efficient interrupt moderation and cpu utilization of guest VM. Additionally, address all the VQs when updating the global configuration, as now the individual VQs configuration can diverge from the global configuration. Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Heng Qi <hengqi@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20230731070656.96411-3-gavinl@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
308d7982 |
|
31-Jul-2023 |
Gavin Li <gavinl@nvidia.com> |
virtio_net: extract interrupt coalescing settings to a structure Extract interrupt coalescing settings to a structure so that it could be reused in other data structures. Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Heng Qi <hengqi@linux.alibaba.com> Link: https://lore.kernel.org/r/20230731070656.96411-2-gavinl@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
51b81317 |
|
09-Aug-2023 |
Jason Wang <jasowang@redhat.com> |
virtio-net: set queues after driver_ok Commit 25266128fe16 ("virtio-net: fix race between set queues and probe") tries to fix the race between set queues and probe by calling _virtnet_set_queues() before DRIVER_OK is set. This violates the virtio spec. Fix this by setting queues after virtio_device_ready(). Note that rtnl needs to be held for userspace requests to change the number of queues. So we are serialized in this way. Fixes: 25266128fe16 ("virtio-net: fix race between set queues and probe") Reported-by: Dragos Tatulea <dtatulea@nvidia.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2c507ce9 |
|
10-Aug-2023 |
Hawkins Jiawei <yin31149@gmail.com> |
virtio-net: Zero max_tx_vq field for VIRTIO_NET_CTRL_MQ_HASH_CONFIG case Kernel uses `struct virtio_net_ctrl_rss` to save command-specific-data for both the VIRTIO_NET_CTRL_MQ_HASH_CONFIG and VIRTIO_NET_CTRL_MQ_RSS_CONFIG commands. According to the VirtIO standard, "Field reserved MUST contain zeroes. It is defined to make the structure to match the layout of virtio_net_rss_config structure, defined in 5.1.6.5.7.". Yet for the VIRTIO_NET_CTRL_MQ_HASH_CONFIG command case, the `max_tx_vq` field in struct virtio_net_ctrl_rss, which corresponds to the `reserved` field in struct virtio_net_hash_config, is not zeroed, thereby violating the VirtIO standard. This patch solves this problem by zeroing this field in virtnet_init_default_rss(). Cc: Andrew Melnychenko <andrew@daynix.com> Cc: stable@vger.kernel.org Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.") Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20230810110405.25558-1-yin31149@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
|
#
25266128 |
|
25-Jul-2023 |
Jason Wang <jasowang@redhat.com> |
virtio-net: fix race between set queues and probe A race was found where set_channels could be called after registering but before virtnet_set_queues() in virtnet_probe(). Fix this by moving virtnet_set_queues() before netdevice registration. While at it, use _virtnet_set_queues() to avoid holding rtnl as the device is not even registered at that time. Cc: stable@vger.kernel.org Fixes: a220871be66f ("virtio-net: correctly enable multiqueue") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20230725072049.617289-1-jasowang@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
accc1bf2 |
|
05-Jun-2023 |
Brett Creeley <brett.creeley@amd.com> |
virtio_net: use control_buf for coalesce params Commit 699b045a8e43 ("net: virtio_net: notifications coalescing support") added coalescing command support for virtio_net. However, the coalesce commands are using buffers on the stack, which is causing the device to see DMA errors. There should also be a complaint from check_for_stack() in debug_dma_map_xyz(). Fix this by adding and using coalesce params from the control_buf struct, which aligns with other commands. Cc: stable@vger.kernel.org Fixes: 699b045a8e43 ("net: virtio_net: notifications coalescing support") Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> Signed-off-by: Brett Creeley <brett.creeley@amd.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20230605195925.51625-1-brett.creeley@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
5306623a |
|
12-May-2023 |
Feng Liu <feliu@nvidia.com> |
virtio_net: Fix error unwinding of XDP initialization When initializing XDP in virtnet_open(), some rq xdp initialization may hit an error, causing the net device open to fail. However, previous rqs have already initialized XDP and enabled NAPI, which is not the expected behavior. Roll back the previous rq initialization to avoid leaks in error unwinding of init code. Also extract helper functions to disable and enable queue pairs. Use the newly introduced disable helper function in error unwinding and virtnet_close. Use the enable helper function in virtnet_open. Fixes: 754b8a21a96d ("virtio_net: setup xdp_rxq_info") Signed-off-by: Feng Liu <feliu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
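The rollback pattern described in the commit above, where queue pairs that were already enabled are disabled again when a later one fails, can be sketched as follows. This is a user-space model under stated assumptions: `struct fake_dev`, `enable_qp()`, `disable_qp()` and `open_dev()` are hypothetical stand-ins and do not reproduce the driver's helpers.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_QP 8

/* Hypothetical device model: which queue pairs are enabled, and which one fails. */
struct fake_dev {
    bool enabled[MAX_QP];
    int fail_at; /* index whose enable step fails, or -1 for none */
};

static int enable_qp(struct fake_dev *d, int i)
{
    if (i == d->fail_at)
        return -1;
    d->enabled[i] = true;
    return 0;
}

static void disable_qp(struct fake_dev *d, int i)
{
    d->enabled[i] = false;
}

/* Enable nqp queue pairs; on failure, undo the ones already enabled. */
static int open_dev(struct fake_dev *d, int nqp)
{
    int i, err = 0;

    assert(nqp >= 0 && nqp <= MAX_QP);
    for (i = 0; i < nqp; i++) {
        err = enable_qp(d, i);
        if (err < 0)
            goto err_enable_qp;
    }
    return 0;

err_enable_qp:
    /* Walk back over the pairs that succeeded, most recent first. */
    for (i--; i >= 0; i--)
        disable_qp(d, i);
    return err;
}
```

Using the same disable helper in both the error path and the close path keeps the two teardown sequences from drifting apart.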
#
b51f4113 |
|
10-May-2023 |
Yunsheng Lin <linyunsheng@huawei.com> |
net: introduce and use skb_frag_fill_page_desc() Most users use __skb_frag_set_page()/skb_frag_off_set()/ skb_frag_size_set() to fill the page desc for a skb frag. Introduce skb_frag_fill_page_desc() to do that. net/bpf/test_run.c does not call skb_frag_off_set() to set the offset, "copy_from_user(page_address(page), ...)" and 'shinfo' being part of the 'data' kzalloced in bpf_test_init() suggest that it is assuming offset to be initialized as zero, so call skb_frag_fill_page_desc() with offset being zero for this case. Also, skb_frag_set_page() is not used anymore, so remove it. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
21e26a71 |
|
08-May-2023 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: introduce virtnet_build_skb() This logic is used in multiple places, now we separate it into a helper. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
19e8c85e |
|
08-May-2023 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: introduce receive_small_build_xdp Simplify the receive_small() function by bringing the logic related to build_skb together. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
aef76506 |
|
08-May-2023 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: small: remove skip_xdp Because the skb build code is not shared between xdp and non-xdp, and the xdp code in receive_small() is simpler, "skip_xdp" is not needed and can be removed. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
7af70fc1 |
|
08-May-2023 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: small: avoid code duplication in xdp scenarios Avoid recalculating some variables (headroom and so on) when processing XDP. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
fc8ce84b |
|
08-May-2023 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: small: remove the delta In the XDP-PASS case, skb_reserve() uses "delta" to stay compatible with the non-XDP path. Now that this code is no longer shared between xdp and non-xdp, we can remove this logic. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
c5f3e72f |
|
08-May-2023 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: introduce receive_small_xdp() The purpose of this patch is to simplify the receive_small(). Separate all the logic of XDP of small into a function. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
59ba3b1a |
|
08-May-2023 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: merge: remove skip_xdp Now, the logic of merge xdp process is simple, we can remove the skip_xdp. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
d8f2835a |
|
08-May-2023 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: introduce receive_mergeable_xdp() The purpose of this patch is to simplify the receive_mergeable(). Separate all the logic of XDP into a function. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
4cb00b13 |
|
08-May-2023 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: virtnet_build_xdp_buff_mrg() auto release xdp shinfo virtnet_build_xdp_buff_mrg() now releases the xdp shinfo automatically, so the caller no longer needs to take care of the xdp shinfo. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
80f50f91 |
|
08-May-2023 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: separate the logic of freeing the rest mergeable buf This patch introduces a new function that frees the remaining mergeable buffers. A subsequent patch will reuse this function. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
bb2c1e9e |
|
08-May-2023 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: separate the logic of freeing xdp shinfo This patch introduces a new function that releases the xdp shinfo. A subsequent patch will reuse this function. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
00765f8e |
|
08-May-2023 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: introduce virtnet_xdp_handler() to seprate the logic of run xdp At present, we have two similar pieces of logic that run the XDP prog. Therefore, this patch separates out the code that executes XDP, which eases later maintenance. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
dbe4fec2 |
|
08-May-2023 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: optimize mergeable_xdp_get_buf() To ease review, the previous patch did not modify the logic. This patch adds some optimizations on top: * remove some repeated logic in this function. * add a fast check for passing without any allocation. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
ad4858be |
|
08-May-2023 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: introduce mergeable_xdp_get_buf() Separating the logic of preparation for xdp from receive_mergeable. The purpose of this is to simplify the logic of execution of XDP. The main logic here is that when headroom is insufficient, we need to allocate a new page and calculate offset. It should be noted that if there is new page, the variable page will refer to the new page. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
363d8ce4 |
|
08-May-2023 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: mergeable xdp: put old page immediately In the xdp implementation of virtio-net mergeable, it always checks whether two pages are used and selects one page to release. This complicates the handling of each action and requires care. In the entire process, we follow these principles: * If xdp_page is used (PASS, TX, Redirect), then we release the old page. * If it is a drop case, we release both. The old page obtained from buf is released inside err_xdp, and xdp_page needs to be released by us. But in fact, when we allocate a new page, we can release the old page immediately. Then only one page is in use, and we just need to release the new page in the drop case. On the drop path, err_xdp will release the variable "page", so we only need to let "page" point to the new xdp_page in advance. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
f8bb5104 |
|
03-May-2023 |
Wenliang Wang <wangwenliang.1995@bytedance.com> |
virtio_net: suppress cpu stall when free_unused_bufs For multi-queue and large ring-size use case, the following error occurred when free_unused_bufs: rcu: INFO: rcu_sched self-detected stall on CPU. Fixes: 986a4f4d452d ("virtio_net: multiqueue support") Signed-off-by: Wenliang Wang <wangwenliang.1995@bytedance.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
be50da3e |
|
09-Mar-2023 |
Jiri Pirko <jiri@resnulli.us> |
net: virtio_net: implement exact header length guest feature The virtio spec introduced a feature, VIRTIO_NET_F_GUEST_HDRLEN, which when set implies that the device benefits from knowing the exact size of the header. For compatibility, to signal to the device that the header is reliable, the driver also needs to set this feature. Without this feature set by the driver, the device has to figure out the header size itself. Quoting the original virtio spec: "hdr_len is a hint to the device as to how much of the header needs to be kept to copy into each packet" "a hint" might not make clear to the reader what is meant, whether it is "maybe like that" or "exactly like that". This feature just makes it crystal clear and lets the device count on hdr_len being filled with the exact length of the header. Also note the spec already has the following note about hdr_len: "Due to various bugs in implementations, this field is not useful as a guarantee of the transport header size." Without this feature the device needs to parse the header in core data path handling. Accurate information helps the device to eliminate such header parsing and directly use the hardware accelerators for GSO operation. virtio_net_hdr_from_skb() sets hdr_len to skb_headlen(skb), so the driver already fills in the correct value. Introduce the feature and advertise it. Note that the virtio spec also includes the following note for device implementation: "Caution should be taken by the implementation so as to prevent a malicious driver from attacking the device by setting an incorrect hdr_len." There is a plan to support this feature in our emulated device. A device of SolidRun offers this feature bit. They claim this feature will save the device a few cycles for every GSO packet.
Link: https://docs.oasis-open.org/virtio/virtio/v1.2/cs01/virtio-v1.2-cs01.html#x1-230006x3 Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20230309094559.917857-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
853618d5 |
|
14-Apr-2023 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: bugfix overflow inside xdp_linearize_page() Here we copy the data from the original buf to the new page, but we do not check whether the copy may overflow. As long as the size received (including vnethdr) is greater than 3840 (PAGE_SIZE - VIRTIO_XDP_HEADROOM), the memcpy will overflow. This is entirely possible when the MTU is large, such as 4096. In our test environment, this causes a crash. Since the crash is caused by the overwritten memory, the crash log is not meaningful, so I do not include it. Fixes: 72979a6c3590 ("virtio_net: xdp, add slowpath case for non contiguous buffers") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
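The arithmetic in this message (4096 - 256 = 3840) can be illustrated with a bounds-checked copy. This is a userspace sketch under stated assumptions, not the kernel patch: the page size and headroom constants are assumed values chosen to match the 3840 figure, `copy_into_page()` is a hypothetical helper, and any extra tailroom the real driver may reserve is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE    4096u  /* assumed page size */
#define SKETCH_XDP_HEADROOM  256u  /* assumed headroom, so 4096 - 256 = 3840 */

/*
 * Copy len bytes into a page-sized buffer after the headroom.
 * Returns true on success, false if the data would run past the page.
 */
static bool copy_into_page(unsigned char *page, const unsigned char *src,
                           size_t len)
{
    assert(page != NULL && src != NULL);

    /* The check the commit message says was missing before the memcpy. */
    if (len > SKETCH_PAGE_SIZE - SKETCH_XDP_HEADROOM)
        return false;

    memcpy(page + SKETCH_XDP_HEADROOM, src, len);
    return true;
}
```

Under these assumptions a 3840-byte payload fits exactly, while 3841 bytes or a full 4096-byte frame is rejected instead of overflowing the destination page.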
|
#
1a3bd6ea |
|
14-Mar-2023 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: free xdp shinfo frags when build_skb_from_xdp_buff() fails build_skb_from_xdp_buff() may return NULL, in this case we need to free the frags of xdp shinfo. Fixes: fab89bafa95b ("virtio-net: support multi-buffer xdp") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fa0f1ba7 |
|
14-Mar-2023 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: fix page_to_skb() miss headroom Because headroom is not passed to page_to_skb(), this causes the shinfo exceeds the range. Then the frags of shinfo are changed by other process. [ 157.724634] stack segment: 0000 [#1] PREEMPT SMP NOPTI [ 157.725358] CPU: 3 PID: 679 Comm: xdp_pass_user_f Tainted: G E 6.2.0+ #150 [ 157.726401] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/4 [ 157.727820] RIP: 0010:skb_release_data+0x11b/0x180 [ 157.728449] Code: 44 24 02 48 83 c3 01 39 d8 7e be 48 89 d8 48 c1 e0 04 41 80 7d 7e 00 49 8b 6c 04 30 79 0c 48 89 ef e8 89 b [ 157.730751] RSP: 0018:ffffc90000178b48 EFLAGS: 00010202 [ 157.731383] RAX: 0000000000000010 RBX: 0000000000000001 RCX: 0000000000000000 [ 157.732270] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff888100dd0b00 [ 157.733117] RBP: 5d5d76010f6e2408 R08: ffff888100dd0b2c R09: 0000000000000000 [ 157.734013] R10: ffffffff82effd30 R11: 000000000000a14e R12: ffff88810981ffc0 [ 157.734904] R13: ffff888100dd0b00 R14: 0000000000000002 R15: 0000000000002310 [ 157.735793] FS: 00007f06121d9740(0000) GS:ffff88842fcc0000(0000) knlGS:0000000000000000 [ 157.736794] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 157.737522] CR2: 00007ffd9a56c084 CR3: 0000000104bda001 CR4: 0000000000770ee0 [ 157.738420] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 157.739283] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 157.740146] PKRU: 55555554 [ 157.740502] Call Trace: [ 157.740843] <IRQ> [ 157.741117] kfree_skb_reason+0x50/0x120 [ 157.741613] __udp4_lib_rcv+0x52b/0x5e0 [ 157.742132] ip_protocol_deliver_rcu+0xaf/0x190 [ 157.742715] ip_local_deliver_finish+0x77/0xa0 [ 157.743280] ip_sublist_rcv_finish+0x80/0x90 [ 157.743834] ip_list_rcv_finish.constprop.0+0x16f/0x190 [ 157.744493] ip_list_rcv+0x126/0x140 [ 157.744952] __netif_receive_skb_list_core+0x29b/0x2c0 [ 157.745602] __netif_receive_skb_list+0xed/0x160 [ 157.746190] ? 
udp4_gro_receive+0x275/0x350 [ 157.746732] netif_receive_skb_list_internal+0xf2/0x1b0 [ 157.747398] napi_gro_receive+0xd1/0x210 [ 157.747911] virtnet_receive+0x75/0x1c0 [ 157.748422] virtnet_poll+0x48/0x1b0 [ 157.748878] __napi_poll+0x29/0x1b0 [ 157.749330] net_rx_action+0x27a/0x340 [ 157.749812] __do_softirq+0xf3/0x2fb [ 157.750298] do_softirq+0xa2/0xd0 [ 157.750745] </IRQ> [ 157.751563] <TASK> [ 157.752329] __local_bh_enable_ip+0x6d/0x80 [ 157.753178] virtnet_xdp_set+0x482/0x860 [ 157.754159] ? __pfx_virtnet_xdp+0x10/0x10 [ 157.755129] dev_xdp_install+0xa4/0xe0 [ 157.756033] dev_xdp_attach+0x20b/0x5e0 [ 157.756933] do_setlink+0x82e/0xc90 [ 157.757777] ? __nla_validate_parse+0x12b/0x1e0 [ 157.758744] rtnl_setlink+0xd8/0x170 [ 157.759549] ? mod_objcg_state+0xcb/0x320 [ 157.760328] ? security_capable+0x37/0x60 [ 157.761209] ? security_capable+0x37/0x60 [ 157.762072] rtnetlink_rcv_msg+0x145/0x3d0 [ 157.762929] ? ___slab_alloc+0x327/0x610 [ 157.763754] ? __alloc_skb+0x141/0x170 [ 157.764533] ? __pfx_rtnetlink_rcv_msg+0x10/0x10 [ 157.765422] netlink_rcv_skb+0x58/0x110 [ 157.766229] netlink_unicast+0x21f/0x330 [ 157.766951] netlink_sendmsg+0x240/0x4a0 [ 157.767654] sock_sendmsg+0x93/0xa0 [ 157.768434] ? sockfd_lookup_light+0x12/0x70 [ 157.769245] __sys_sendto+0xfe/0x170 [ 157.770079] ? handle_mm_fault+0xe9/0x2d0 [ 157.770859] ? preempt_count_add+0x51/0xa0 [ 157.771645] ? up_read+0x3c/0x80 [ 157.772340] ? do_user_addr_fault+0x1e9/0x710 [ 157.773166] ? kvm_read_and_reset_apf_flags+0x49/0x60 [ 157.774087] __x64_sys_sendto+0x29/0x30 [ 157.774856] do_syscall_64+0x3c/0x90 [ 157.775518] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 157.776382] RIP: 0033:0x7f06122def70 Fixes: 18117a842ab0 ("virtio-net: remove xdp related info from page_to_skb()") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cd1c604a |
|
07-Mar-2023 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: add checking sq is full inside xdp xmit If the queue of xdp xmit is not an independent queue, then when the xdp xmit used all the desc, the xmit from the __dev_queue_xmit() may encounter the following error. net ens4: Unexpected TXQ (0) queue failure: -28 This patch adds a check whether sq is full in xdp xmit. Fixes: 56434a01b12e ("virtio_net: add XDP_TX support") Reported-by: Yichun Zhang <yichun@openresty.com> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
b8ef4809 |
|
07-Mar-2023 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: separate the logic of checking whether sq is full Separate the logic of checking whether sq is full. The subsequent patch will reuse this func. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
25074a44 |
|
07-Mar-2023 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: reorder some funcs The purpose of this is to facilitate the subsequent addition of new functions without introducing a separate declaration. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
30bbf891 |
|
07-Feb-2023 |
Lorenzo Bianconi <lorenzo@kernel.org> |
virtio_net: Update xdp_features with xdp multi-buff Now virtio-net supports xdp multi-buffer so add it to xdp_features. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/bpf/60c76cd63a0246db785606e8891b925fd5c9bf06.1675763384.git.lorenzo@kernel.org
|
#
27369c9c |
|
03-Feb-2023 |
Parav Pandit <parav@nvidia.com> |
virtio-net: Maintain reverse cleanup order To make the code easier to audit, keep the device stop() sequence a mirror of the device open() sequence. Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
66c0e13a |
|
01-Feb-2023 |
Marek Majtyka <alardam@gmail.com> |
drivers: net: turn on XDP features A summary of the flags being set for various drivers is given below. Note that XDP_F_REDIRECT_TARGET and XDP_F_FRAG_TARGET are features that can be turned off and on at runtime. This means that these flags may be set and unset under RTNL lock protection by the driver. Hence, READ_ONCE must be used by code loading the flag value. Also, these flags are not used for synchronization against the availability of XDP resources on a device. It is merely a hint, and hence the read may race with the actual teardown of XDP resources on the device. This may change in the future, e.g. operations taking a reference on the XDP resources of the driver, and in turn inhibiting turning off this flag. However, for now, it can only be used as a hint to check whether device supports becoming a redirection target. Turn 'hw-offload' feature flag on for: - netronome (nfp) - netdevsim. Turn 'native' and 'zerocopy' features flags on for: - intel (i40e, ice, ixgbe, igc) - mellanox (mlx5). - stmmac - netronome (nfp) Turn 'native' features flags on for: - amazon (ena) - broadcom (bnxt) - freescale (dpaa, dpaa2, enetc) - funeth - intel (igb) - marvell (mvneta, mvpp2, octeontx2) - mellanox (mlx4) - mtk_eth_soc - qlogic (qede) - sfc - socionext (netsec) - ti (cpsw) - tap - tsnep - veth - xen - virtio_net. Turn 'basic' (tx, pass, aborted and drop) features flags on for: - netronome (nfp) - cavium (thunder) - hyperv. 
Turn 'redirect_target' feature flag on for: - amazon (ena) - broadcom (bnxt) - freescale (dpaa, dpaa2) - intel (i40e, ice, igb, ixgbe) - ti (cpsw) - marvell (mvneta, mvpp2) - sfc - socionext (netsec) - qlogic (qede) - mellanox (mlx5) - tap - veth - virtio_net - xen Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Stanislav Fomichev <sdf@google.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Marek Majtyka <alardam@gmail.com> Link: https://lore.kernel.org/r/3eca9fafb308462f7edb1f58e451d59209aa07eb.1675245258.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
981f14d4 |
|
31-Jan-2023 |
Heng Qi <hengqi@linux.alibaba.com> |
virtio-net: fix possible unsigned integer overflow When the single-buffer xdp is loaded and after xdp_linearize_page() is called, *num_buf becomes 0 and (*num_buf - 1) may overflow into a large integer in virtnet_build_xdp_buff_mrg(), resulting in unexpected packet dropping. Fixes: ef75cb51f139 ("virtio-net: build xdp_buff with multi buffers") Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20230131085004.98687-1-hengqi@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
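The wraparound described here is ordinary unsigned arithmetic: subtracting 1 from 0 yields the largest representable value, so a "too many fragments" test fires and the packet is dropped. The sketch below shows the effect and one way to guard against it. The function names and the fragment limit are assumptions for illustration, and the guarded comparison is a generic technique, not necessarily the change the kernel patch made.

```c
#include <assert.h>   /* used only by the checks in the usage note */
#include <limits.h>
#include <stdbool.h>

#define SKETCH_MAX_FRAGS 17u  /* assumed fragment limit, for illustration */

/* Buggy form: with num_buf == 0, num_buf - 1u wraps to UINT_MAX. */
static bool too_many_frags_buggy(unsigned int num_buf)
{
    return (num_buf - 1u) > SKETCH_MAX_FRAGS;
}

/* Guarded form: compare without subtracting from a possibly zero value. */
static bool too_many_frags_guarded(unsigned int num_buf)
{
    return num_buf > SKETCH_MAX_FRAGS + 1u;
}
```

For every non-zero input the two predicates agree. They differ only at `num_buf == 0`, where the buggy form reports an overflow of the fragment limit and the guarded form does not, e.g. `assert(too_many_frags_buggy(0) && !too_many_frags_guarded(0));`.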
|
#
9f62d221 |
|
27-Jan-2023 |
Laurent Vivier <lvivier@redhat.com> |
virtio_net: notify MAC address change on device initialization In virtnet_probe(), if the device doesn't provide a MAC address the driver assigns a random one. As we modify the MAC address we need to notify the device to allow it to update all the related information. The problem can be seen with vDPA and mlx5_vdpa driver as it doesn't assign a MAC address by default. The virtio_net device uses a random MAC address (we can see it with "ip link"), but we can't ping a net namespace from another one using the virtio-vdpa device because the new MAC address has not been provided to the hardware: RX packets are dropped since they don't go through the receive filters, TX packets go through unaffected. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
7c06458c |
|
27-Jan-2023 |
Laurent Vivier <lvivier@redhat.com> |
virtio_net: disable VIRTIO_NET_F_STANDBY if VIRTIO_NET_F_MAC is not set failover relies on the MAC address to pair the primary and the standby devices: "[...] the hypervisor needs to enable VIRTIO_NET_F_STANDBY feature on the virtio-net interface and assign the same MAC address to both virtio-net and VF interfaces." Documentation/networking/net_failover.rst This patch disables the STANDBY feature if the MAC address is not provided by the hypervisor. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
d0671115 |
|
22-Jan-2023 |
Parav Pandit <parav@nvidia.com> |
virtio-net: Reduce debug name field size to 16 bytes The virtio queue index can be at most 65535, so 16 bytes are enough to store the vq name with the existing string prefix. With this change, the send queue struct saves 24 bytes and the receive queue saves a whole cache line (64 bytes) per structure due to savings in alignment bytes. Pahole results before: pahole -s drivers/net/virtio_net.o | \ grep -e "send_queue" -e "receive_queue" send_queue 1112 0 receive_queue 1280 1 Pahole results after: pahole -s drivers/net/virtio_net.o | \ grep -e "send_queue" -e "receive_queue" send_queue 1088 0 receive_queue 1216 1 Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
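The claim that 16 bytes suffice can be checked by formatting the longest possible name. The sketch below assumes the names have the form "input.N" and "output.N"; the commit message only says "the existing string prefix", so these prefixes and the helper `format_vq_name()` are assumptions for illustration.

```c
#include <assert.h>   /* used only by the checks in the usage note */
#include <stdio.h>

#define VQ_NAME_LEN 16  /* field size from the commit message */

/*
 * Format a queue name as "<prefix>.<index>" into a 16-byte field.
 * Returns the length snprintf() needs (excluding the NUL), so a result
 * of VQ_NAME_LEN or more would indicate truncation.
 */
static int format_vq_name(char out[VQ_NAME_LEN], const char *prefix,
                          unsigned int index)
{
    return snprintf(out, VQ_NAME_LEN, "%s.%u", prefix, index);
}
```

With the assumed "output" prefix and the maximum index 65535, the name is "output.65535": 12 characters plus the terminating NUL, which leaves 3 spare bytes in the 16-byte field.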
|
#
eb1d929f |
|
16-Jan-2023 |
Parav Pandit <parav@nvidia.com> |
virtio_net: Reuse buffer free function virtnet_rq_free_unused_buf() helper function to free the buffer already exists. Avoid code duplication by reusing existing function. Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Parav Pandit <parav@nvidia.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fab89baf |
|
14-Jan-2023 |
Heng Qi <hengqi@linux.alibaba.com> |
virtio-net: support multi-buffer xdp Driver can pass the skb to stack by build_skb_from_xdp_buff(). Driver forwards multi-buffer packets using the send queue when XDP_TX and XDP_REDIRECT, and clears the reference of multi pages when XDP_DROP. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
18117a84 |
|
14-Jan-2023 |
Heng Qi <hengqi@linux.alibaba.com> |
virtio-net: remove xdp related info from page_to_skb() For the clear construction of xdp_buff, we remove the xdp processing interleaved with page_to_skb(). Now, the logic of xdp and building skb from xdp are separate and independent. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b26aa481 |
|
14-Jan-2023 |
Heng Qi <hengqi@linux.alibaba.com> |
virtio-net: build skb from multi-buffer xdp This converts the xdp_buff directly to a skb, including multi-buffer and single buffer xdp. We'll isolate the construction of skb based on xdp from page_to_skb(). Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
97717e8d |
|
14-Jan-2023 |
Heng Qi <hengqi@linux.alibaba.com> |
virtio-net: transmit the multi-buffer xdp This serves as the basis for XDP_TX and XDP_REDIRECT to send a multi-buffer xdp_frame. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
22174f79 |
|
14-Jan-2023 |
Heng Qi <hengqi@linux.alibaba.com> |
virtio-net: construct multi-buffer xdp in mergeable Build multi-buffer xdp using virtnet_build_xdp_buff_mrg(). For the prefilled buffer before xdp is set, we will probably use vq reset in the future. At the same time, virtio net currently uses comp pages, and bpf_xdp_frags_increase_tail() needs to calculate the tailroom of the last frag, which will involve the offset of the corresponding page and cause a negative value, so we disable tail increase by not setting xdp_rxq->frag_size. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ef75cb51 |
|
14-Jan-2023 |
Heng Qi <hengqi@linux.alibaba.com> |
virtio-net: build xdp_buff with multi buffers Support xdp for multi buffer packets in mergeable mode. Putting the first buffer as the linear part for xdp_buff, and the rest of the buffers as non-linear fragments to struct skb_shared_info in the tailroom belonging to xdp_buff. Let 'truesize' return to its literal meaning, that is, when xdp is set, it includes the length of headroom and tailroom. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
50bd14bc |
|
14-Jan-2023 |
Heng Qi <hengqi@linux.alibaba.com> |
virtio-net: update bytes calculation for xdp_frame Update relative record value for xdp_frame as basis for multi-buffer xdp transmission. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8d9bc36d |
|
14-Jan-2023 |
Heng Qi <hengqi@linux.alibaba.com> |
virtio-net: set up xdp for multi buffer packets When the xdp program sets xdp.frags, it can process multi-buffer packets over a larger MTU, so we continue to support xdp in that case. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e814b958 |
|
14-Jan-2023 |
Heng Qi <hengqi@linux.alibaba.com> |
virtio-net: fix calculation of MTU for single-buffer xdp When single-buffer xdp is loaded, the size of the buffer filled each time is 'sz = (PAGE_SIZE - headroom - tailroom)', which is the maximum packet length that the driver allows the device to pass in. Otherwise, the packet with a length greater than sz will come in, so num_buf will be greater than or equal to 2, and xdp_linearize_page() will be performed and the packet will be dropped because the total length is greater than PAGE_SIZE. So the maximum value of MTU for single-buffer xdp is 'max_sz = sz - ETH_HLEN'. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
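The two formulas in this message, 'sz = PAGE_SIZE - headroom - tailroom' and 'max_sz = sz - ETH_HLEN', can be worked through numerically. The helper below is a sketch: the function name is hypothetical, and the headroom and tailroom values used in the example are assumed numbers, not values taken from the driver.

```c
#include <assert.h>

#define SKETCH_ETH_HLEN 14u  /* Ethernet header length in bytes */

/* Largest MTU for single-buffer XDP: (page - headroom - tailroom) - ETH_HLEN. */
static unsigned int max_mtu_single_buffer_xdp(unsigned int page_size,
                                              unsigned int headroom,
                                              unsigned int tailroom)
{
    unsigned int sz;

    /* Reject inputs that would make the subtraction wrap around. */
    assert(headroom + tailroom + SKETCH_ETH_HLEN <= page_size);

    sz = page_size - headroom - tailroom;   /* buffer size filled each time */
    return sz - SKETCH_ETH_HLEN;            /* max_sz from the message */
}
```

For example, with a 4096-byte page, an assumed headroom of 256 and an assumed tailroom of 320, sz is 3520 and the resulting maximum MTU is 3506.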
|
#
484beac2 |
|
14-Jan-2023 |
Heng Qi <hengqi@linux.alibaba.com> |
virtio-net: disable the hole mechanism for xdp XDP core assumes that the frame_size of xdp_buff and the length of the frag are PAGE_SIZE. The hole may cause the processing of xdp to fail, so we disable the hole mechanism when xdp is set. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
63b11404 |
|
02-Feb-2023 |
Parav Pandit <parav@nvidia.com> |
virtio-net: Keep stop() to follow mirror sequence of open() Cited commit in fixes tag frees rxq xdp info while RQ NAPI is still enabled and packet processing may be ongoing. Follow the mirror sequence of open() in the stop() callback. This ensures that when rxq info is unregistered, no rx packet processing is ongoing. Fixes: 754b8a21a96d ("virtio_net: setup xdp_rxq_info") Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Parav Pandit <parav@nvidia.com> Link: https://lore.kernel.org/r/20230202163516.12559-1-parav@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
ad7e615f |
|
25-Jan-2023 |
Magnus Karlsson <magnus.karlsson@intel.com> |
virtio-net: execute xdp_do_flush() before napi_complete_done() Make sure that xdp_do_flush() is always executed before napi_complete_done(). This is important for two reasons. First, a redirect to an XSKMAP assumes that a call to xdp_do_redirect() from napi context X on CPU Y will be followed by a xdp_do_flush() from the same napi context and CPU. This is not guaranteed if the napi_complete_done() is executed before xdp_do_flush(), as it tells the napi logic that it is fine to schedule napi context X on another CPU. Details from a production system triggering this bug using the veth driver can be found following the first link below. The second reason is that the XDP_REDIRECT logic in itself relies on being inside a single NAPI instance through to the xdp_do_flush() call for RCU protection of all in-kernel data structures. Details can be found in the second link below. Fixes: 186b3c998c50 ("virtio-net: support XDP_REDIRECT") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/ Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
d71ebe81 |
|
16-Jan-2023 |
Jason Wang <jasowang@redhat.com> |
virtio-net: correctly enable callback during start_xmit Commit a7766ef18b33 ("virtio_net: disable cb aggressively") enables virtqueue callback via the following statement: do { if (use_napi) virtqueue_disable_cb(sq->vq); free_old_xmit_skbs(sq, false); } while (use_napi && kick && unlikely(!virtqueue_enable_cb_delayed(sq->vq))); When NAPI is used and kick is false, the callback won't be enabled here. And when the virtqueue is about to be full, the tx will be disabled, but we still don't enable the tx interrupt, which will cause a TX hang. This could be observed when using pktgen with burst enabled. To be consistent with the logic that tries to disable cb only for NAPI, fix this by trying to enable the delayed callback only when NAPI is enabled and the queue is about to be full. Fixes: a7766ef18b33 ("virtio_net: disable cb aggressively") Signed-off-by: Jason Wang <jasowang@redhat.com> Tested-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
418044e1 |
|
07-Dec-2022 |
Andrew Melnychenko <andrew@daynix.com> |
drivers/net/virtio_net.c: Added USO support. Now it is possible to enable GSO_UDP_L4 ("tx-udp-segmentation") for VirtioNet. Signed-off-by: Andrew Melnychenko <andrew@daynix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
068c38ad |
|
26-Oct-2022 |
Thomas Gleixner <tglx@linutronix.de> |
net: Remove the obsolte u64_stats_fetch_*_irq() users (drivers). Now that the 32bit UP oddity is gone and 32bit uses always a sequence count, there is no need for the fetch_irq() variants anymore. Convert to the regular interface. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
b0686565 |
|
22-Nov-2022 |
Li Zetao <lizetao1@huawei.com> |
virtio_net: Fix probe failed when modprobe virtio_net When doing the following test steps, an error was found: step 1: modprobe virtio_net succeeded # modprobe virtio_net <-- OK step 2: fault injection in register_netdevice() # modprobe -r virtio_net <-- OK # ... FAULT_INJECTION: forcing a failure. name failslab, interval 1, probability 0, space 0, times 0 CPU: 0 PID: 3521 Comm: modprobe Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), Call Trace: <TASK> ... should_failslab+0xa/0x20 ... dev_set_name+0xc0/0x100 netdev_register_kobject+0xc2/0x340 register_netdevice+0xbb9/0x1320 virtnet_probe+0x1d72/0x2658 [virtio_net] ... </TASK> virtio_net: probe of virtio0 failed with error -22 step 3: modprobe virtio_net failed # modprobe virtio_net <-- failed virtio_net: probe of virtio0 failed with error -2 The root cause of the problem is that the queues are not disabled on the error handling path when register_netdevice() fails in virtnet_probe(), resulting in an error "-ENOENT" returned in the next modprobe call in setup_vq(). virtio_pci_modern_device uses virtqueues to send or receive messages, and "queue_enable" records whether the queues are available. In vp_modern_find_vqs(), all queues will be selected and activated, but once queues are enabled there is no way to go back except reset. Fix it by resetting the virtio device on the error handling path. This makes error handling follow the same order as normal device cleanup in virtnet_remove(), which does: unregister, destroy failover, then reset. That flow is better tested than error handling, so we can be reasonably sure it works well. Fixes: 024655555021 ("virtio_net: fix use after free on allocation failure") Signed-off-by: Li Zetao <lizetao1@huawei.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20221122150046.3910638-1-lizetao1@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
4959aebb |
|
14-Sep-2022 |
Gavin Li <gavinl@nvidia.com> |
virtio-net: use mtu size as buffer length for big packets Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big packets even when GUEST_* offloads are not present on the device. However, if guest GSO is not supported, it would be sufficient to allocate segments to cover just up to the MTU size and no further. Allocating the maximum amount of segments results in a large waste of buffer space in the queue, which limits the number of packets that can be buffered and can result in reduced performance. Therefore, if guest GSO is not supported, use the MTU to calculate the optimal amount of segments required. Below are the iperf TCP test results over a Mellanox NIC, using vDPA for 1 VQ, queue size 1024, before and after the change, with the iperf server running over the virtio-net interface.
MTU (bytes)    Bandwidth before (Gbit/s)    Bandwidth after (Gbit/s)
1500           22.5                         22.4
9000           12.8                         25.9
And the results for queue size 256:
MTU (bytes)    Bandwidth before (Gbit/s)    Bandwidth after (Gbit/s)
9000           2.15                         11.9
With this patch no degradation is observed across the tests and feature bit combinations below. Results are summarized below for q depth of 1024. Interface MTU is 1500 if MTU feature is disabled. MTU is set to 9000 in other tests.
Features                  Bandwidth before (Gbit/s)    Bandwidth after (Gbit/s)
mtu off                   20.1                         20.2
mtu/indirect on           17.4                         17.3
mtu/indirect/packed on    17.2                         17.2
Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Gavi Teitz <gavi@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Message-Id: <20220914144911.56422-3-gavinl@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
|
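As a rough illustration of the sizing rule described in commit 4959aebb, the following user-space sketch picks the number of page-sized segments per big-packet buffer. The helper name and the two constants are assumptions made for this example (the real page size and MAX_SKB_FRAGS depend on the kernel configuration); it is not the driver's code.

```c
#include <stdbool.h>

/* Assumed values for illustration only; real values are config dependent. */
#define SKETCH_PAGE_SIZE     4096
#define SKETCH_MAX_SKB_FRAGS 17

/*
 * Hypothetical helper: number of page-sized segments to post for one
 * big-packet receive buffer. With guest GSO the maximum is kept;
 * without it, just enough pages to cover the MTU are used.
 */
int sketch_big_packet_frags(bool guest_gso, int mtu)
{
	if (guest_gso)
		return SKETCH_MAX_SKB_FRAGS;
	/* Round up: a partial page still needs a whole segment. */
	return (mtu + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

Under these assumed constants, a 9000-byte MTU without guest GSO needs 3 segments instead of 17, which is consistent with the larger gains the commit reports for MTU 9000.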
#
46cd26f4 |
|
14-Sep-2022 |
Gavin Li <gavinl@nvidia.com> |
virtio-net: introduce and use helper function for guest gso support checks Probe routine is already several hundred lines. Use helper function for guest gso support check. Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Gavi Teitz <gavi@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Message-Id: <20220914144911.56422-2-gavinl@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
fb3ceec1 |
|
30-Aug-2022 |
Wolfram Sang <wsa+renesas@sang-engineering.com> |
net: move from strlcpy with unused retval to strscpy Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for CAN Link: https://lore.kernel.org/r/20220830201457.7984-1-wsa+renesas@sang-engineering.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
95bb6330 |
|
11-Aug-2022 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: fix endian-ness for RSS Using native endian-ness for device supplied fields is wrong on BE platforms. Sparse warns about this. Fixes: 91f41f01d219 ("drivers/net/virtio_net: Added RSS hash report.") Cc: "Andrew Melnychenko" <andrew@daynix.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
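To make the endianness point in commit 95bb6330 concrete, here is a small user-space sketch that decodes little-endian device fields byte by byte, so the result is the same on little-endian and big-endian hosts. The function names are made up for this example; the kernel has its own conversion helpers and this is not the code from the patch.

```c
#include <stdint.h>

/*
 * Decode a 32-bit little-endian field from raw bytes. Reading the field
 * with a plain native load would give a byte-swapped value on BE hosts.
 */
uint32_t sketch_le32_to_host(const uint8_t b[4])
{
	return (uint32_t)b[0] |
	       ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) |
	       ((uint32_t)b[3] << 24);
}

/* Same idea for a 16-bit little-endian field. */
uint16_t sketch_le16_to_host(const uint8_t b[2])
{
	return (uint16_t)((uint16_t)b[0] | ((uint16_t)b[1] << 8));
}
```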
#
2e9ca760 |
|
15-Aug-2022 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: Revert "virtio_net: set the default max ring size by find_vqs()" This reverts commit 762faee5a2678559d3dc09d95f8f2c54cd0466a7. This has been reported to trip up guests on GCP (Google Cloud). The reason is that virtio_find_vqs_ctx_size is broken on legacy devices. We can in theory fix virtio_find_vqs_ctx_size but in fact the patch itself has several other issues: - It treats unknown speed as < 10G - It leaves userspace no way to find out the ring size set by hypervisor - It tests speed when link is down - It ignores the virtio spec advice: Both \field{speed} and \field{duplex} can change, thus the driver is expected to re-read these values after receiving a configuration change notification. - It is not clear the performance impact has been tested properly Revert the patch for now. Reported-by: Andres Freund <andres@anarazel.de> Link: https://lore.kernel.org/r/20220814212610.GA3690074%40roeck-us.net Link: https://lore.kernel.org/r/20220815070203.plwjx7b3cyugpdt7%40awork3.anarazel.de Link: https://lore.kernel.org/r/3df6bb82-1951-455d-a768-e9e1513eb667%40www.fastmail.com Link: https://lore.kernel.org/r/FCDC5DDE-3CDD-4B8A-916F-CA7D87B547CE%40anarazel.de Fixes: 762faee5a267 ("virtio_net: set the default max ring size by find_vqs()") Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Andres Freund <andres@anarazel.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Message-Id: <20220816053602.173815-2-mst@redhat.com>
|
#
699b045a |
|
17-Jul-2022 |
Alvaro Karsz <alvaro.karsz@solid-run.com> |
net: virtio_net: notifications coalescing support New VirtIO network feature: VIRTIO_NET_F_NOTF_COAL. Control a VirtIO network device's notification coalescing parameters using the control virtqueue. A device that supports this feature can receive VIRTIO_NET_CTRL_NOTF_COAL control commands. - VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: Ask the network device to change the following parameters: - tx_usecs: Maximum number of usecs to delay a TX notification. - tx_max_packets: Maximum number of packets to send before a TX notification. - VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: Ask the network device to change the following parameters: - rx_usecs: Maximum number of usecs to delay a RX notification. - rx_max_packets: Maximum number of packets to receive before a RX notification. VirtIO spec. patch: https://lists.oasis-open.org/archives/virtio-comment/202206/msg00100.html Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Message-Id: <20220718091102.498774-1-alvaro.karsz@solid-run.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jason Wang <jasowang@redhat.com>
|
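The TX_SET command in commit 699b045a carries two parameters, tx_usecs and tx_max_packets. The sketch below shows one way such a payload could be serialized as little-endian bytes for a control virtqueue. The struct name, the helper names and the field order are assumptions for illustration; the authoritative layout is in the VirtIO specification and the kernel UAPI headers.

```c
#include <stdint.h>

/* Hypothetical host-order view of the TX coalescing parameters. */
struct sketch_coal_tx {
	uint32_t tx_max_packets; /* packets to send before a TX notification */
	uint32_t tx_usecs;       /* usecs to delay a TX notification */
};

/* Store a 32-bit value as little-endian bytes, independent of host order. */
void sketch_put_le32(uint8_t *out, uint32_t v)
{
	out[0] = (uint8_t)(v & 0xff);
	out[1] = (uint8_t)((v >> 8) & 0xff);
	out[2] = (uint8_t)((v >> 16) & 0xff);
	out[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Serialize the payload; the field order here is an assumption. */
void sketch_coal_tx_encode(const struct sketch_coal_tx *p, uint8_t out[8])
{
	sketch_put_le32(out, p->tx_max_packets);
	sketch_put_le32(out + 4, p->tx_usecs);
}
```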
#
a335b33f |
|
01-Aug-2022 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: support set_ringparam Support set_ringparam based on virtio queue reset. Users can use ethtool -G eth0 <ring_num> to modify the ring size of virtio-net. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-43-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
ebcce492 |
|
01-Aug-2022 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: support tx queue resize This patch implements the resize function of the tx queues. Based on this function, it is possible to modify the ring num of the queue. Includes fixup: virtio_net: fix for stuck when change tx ring size with dev down When dev is set to DOWN state, napi has been disabled, if we modify the ring size at this time, we should not call napi_disable() again, which will cause stuck. And all operations are under the protection of rtnl_lock, so there is no need to consider concurrency issues. Message-Id: <20220801063902.129329-42-xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220811080258.79398-3-xuanzhuo@linux.alibaba.com> Reported-by: Kangjie Xu <kangjie.xu@linux.alibaba.com> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
6a4763e2 |
|
01-Aug-2022 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: support rx queue resize This patch implements the resize function of the rx queues. Based on this function, it is possible to modify the ring num of the queue. Includes fixup: virtio_net: fix for stuck when change rx ring size with dev down When dev is set to DOWN state, napi has been disabled, if we modify the ring size at this time, we should not call napi_disable() again, which will cause stuck. And all operations are under the protection of rtnl_lock, so there is no need to consider concurrency issues. Message-Id: <20220801063902.129329-41-xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220811080258.79398-2-xuanzhuo@linux.alibaba.com> Reported-by: Kangjie Xu <kangjie.xu@linux.alibaba.com> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
6e345f8c |
|
01-Aug-2022 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: split free_unused_bufs() This patch separates two functions for freeing sq buf and rq buf from free_unused_bufs(). When supporting the enable/disable tx/rq queue in the future, it is necessary to support separate recovery of a sq buf or a rq buf. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-40-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
8597b5dd |
|
01-Aug-2022 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: get ringparam by virtqueue_get_vring_max_size() Use virtqueue_get_vring_max_size() in virtnet_get_ringparam() to set tx,rx_max_pending. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-39-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
762faee5 |
|
01-Aug-2022 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: set the default max ring size by find_vqs() Use virtio_find_vqs_ctx_size() to specify the maximum ring size of tx, rx at the same time.
                          | rx/tx ring size
-------------------------------------------
speed == UNKNOWN or < 10G | 1024
speed < 40G               | 4096
speed >= 40G              | 8192
Call virtnet_update_settings() once before calling init_vqs() to update speed. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-38-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
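The speed-to-ring-size table in commit 762faee5 can be read as a simple threshold function. The sketch below restates it in user-space C; the function name, the use of Mbit/s as the unit and the -1 sentinel for an unknown speed are assumptions for this example. Note that the history also lists a revert of this commit (2e9ca760), which names the "unknown speed treated as < 10G" behaviour as one of its problems.

```c
/* Assumed sentinel for "link speed not known". */
#define SKETCH_SPEED_UNKNOWN (-1)

/*
 * Hypothetical helper mirroring the table in the commit message:
 * speed is in Mbit/s, the return value is the default max ring size.
 */
int sketch_default_ring_size(int speed_mbps)
{
	if (speed_mbps == SKETCH_SPEED_UNKNOWN || speed_mbps < 10000)
		return 1024;
	if (speed_mbps < 40000)
		return 4096;
	return 8192;
}
```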
#
7a542bee |
|
04-Aug-2022 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: fix memory leak inside XDP_TX with mergeable When we call xdp_convert_buff_to_frame() to get xdpf, if it returns NULL, we should check if xdp_page was allocated by xdp_linearize_page(). If it is newly allocated, it should be freed here alone. Just like any other "goto err_xdp". Fixes: 44fa2dbd4759 ("xdp: transition into using xdp_frame for ndo_xdp_xmit") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5a159128 |
|
25-Jul-2022 |
Jason Wang <jasowang@redhat.com> |
virtio-net: fix the race between refill work and close We try using cancel_delayed_work_sync() to prevent the work from enabling NAPI. This is insufficient since we don't disable the source of the refill work scheduling. This means an NAPI poll callback after cancel_delayed_work_sync() can schedule the refill work then can re-enable the NAPI that leads to use-after-free [1]. Since the work can enable NAPI, we can't simply disable NAPI before calling cancel_delayed_work_sync(). So fix this by introducing a dedicated boolean to control whether or not the work could be scheduled from NAPI. [1] ================================================================== BUG: KASAN: use-after-free in refill_work+0x43/0xd4 Read of size 2 at addr ffff88810562c92e by task kworker/2:1/42 CPU: 2 PID: 42 Comm: kworker/2:1 Not tainted 5.19.0-rc1+ #480 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: events refill_work Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0xbb/0x6ac ? _printk+0xad/0xde ? refill_work+0x43/0xd4 kasan_report+0xa8/0x130 ? refill_work+0x43/0xd4 refill_work+0x43/0xd4 process_one_work+0x43d/0x780 worker_thread+0x2a0/0x6f0 ? process_one_work+0x780/0x780 kthread+0x167/0x1a0 ? kthread_exit+0x50/0x50 ret_from_fork+0x22/0x30 </TASK> ... Fixes: b2baed69e605c ("virtio_net: set/cancel work on ndo_open/ndo_stop") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
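Commit 5a159128 describes a dedicated boolean that decides whether the NAPI poll path may schedule refill work. The following single-threaded model illustrates only the gating idea; all names are hypothetical, and it deliberately has no locking, whereas a real driver must serialize the flag check and the scheduling against the close path.

```c
#include <stdbool.h>

/* Hypothetical model of the receive-side state relevant to refill. */
struct sketch_rx {
	bool refill_enabled;    /* may the poll path schedule refill work? */
	int  refills_scheduled; /* stands in for queued delayed work */
};

void sketch_enable_refill(struct sketch_rx *rx)
{
	rx->refill_enabled = true;
}

/* Called on close, before cancelling any pending work. */
void sketch_disable_refill(struct sketch_rx *rx)
{
	rx->refill_enabled = false;
}

/*
 * Called from the modelled poll path when buffers run low.
 * Returns true only if refill work was actually scheduled.
 */
bool sketch_try_schedule_refill(struct sketch_rx *rx)
{
	if (!rx->refill_enabled)
		return false;
	rx->refills_scheduled++;
	return true;
}
```

The point of the flag is that once it is cleared, a late poll can no longer queue new work, so cancelling the outstanding work afterwards is sufficient.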
#
50c0ada6 |
|
17-Jun-2022 |
Jason Wang <jasowang@redhat.com> |
virtio-net: fix race between ndo_open() and virtio_device_ready() We currently call virtio_device_ready() after netdev registration. Since ndo_open() can be called immediately after register_netdev, this means there exists a race between ndo_open() and virtio_device_ready(): the driver may start to use the device before DRIVER_OK which violates the spec. Fix this by switching to use register_netdevice() and protect the virtio_device_ready() with rtnl_lock() to make sure ndo_open() can only be called after virtio_device_ready(). Fixes: 4baf1e33d0842 ("virtio_net: enable VQs early") Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220617072949.30734-1-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
8af52fe9 |
|
21-Jun-2022 |
Stephan Gerhold <stephan.gerhold@kernkonzept.com> |
virtio_net: fix xdp_rxq_info bug after suspend/resume The following sequence currently causes a driver bug warning when using virtio_net: # ip link set eth0 up # echo mem > /sys/power/state (or e.g. # rtcwake -s 10 -m mem) <resume> # ip link set eth0 down Missing register, driver bug WARNING: CPU: 0 PID: 375 at net/core/xdp.c:138 xdp_rxq_info_unreg+0x58/0x60 Call trace: xdp_rxq_info_unreg+0x58/0x60 virtnet_close+0x58/0xac __dev_close_many+0xac/0x140 __dev_change_flags+0xd8/0x210 dev_change_flags+0x24/0x64 do_setlink+0x230/0xdd0 ... This happens because virtnet_freeze() frees the receive_queue completely (including struct xdp_rxq_info) but does not call xdp_rxq_info_unreg(). Similarly, virtnet_restore() sets up the receive_queue again but does not call xdp_rxq_info_reg(). Actually, parts of virtnet_freeze_down() and virtnet_restore_up() are almost identical to virtnet_close() and virtnet_open(): only the calls to xdp_rxq_info_(un)reg() are missing. This means that we can fix this easily and avoid such problems in the future by just calling virtnet_close()/open() from the freeze/restore handlers. Aside from adding the missing xdp_rxq_info calls the only difference is that the refill work is only cancelled if netif_running(). However, this should not make any functional difference since the refill work should only be active if the network interface is actually up. Fixes: 754b8a21a96d ("virtio_net: setup xdp_rxq_info") Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20220621114845.3650258-1-stephan.gerhold@kernkonzept.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
d484735d |
|
06-May-2022 |
Jakub Kicinski <kuba@kernel.org> |
net: virtio: switch to netif_napi_add_weight() virtio netdev driver uses a custom napi weight, switch to the new API for setting custom weight. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8d602e1a |
|
04-May-2022 |
Jakub Kicinski <kuba@kernel.org> |
net: move snowflake callers to netif_napi_add_tx_weight() Make the drivers with custom tx napi weight call netif_napi_add_tx_weight(). Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20220504163725.550782-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
acb16b39 |
|
25-Apr-2022 |
Nikolay Aleksandrov <razor@blackwall.org> |
virtio_net: fix wrong buf address calculation when using xdp We received a report[1] of kernel crashes when Cilium is used in XDP mode with virtio_net after updating to newer kernels. After investigating the reason it turned out that when using mergeable bufs with an XDP program which adjusts xdp.data or xdp.data_meta page_to_buf() calculates the build_skb address wrong because the offset can become less than the headroom so it gets the address of the previous page (-X bytes depending on how lower offset is): page_to_skb: page addr ffff9eb2923e2000 buf ffff9eb2923e1ffc offset 252 headroom 256 This is a pr_err() I added in the beginning of page_to_skb which clearly shows offset that is less than headroom by adding 4 bytes of metadata via an xdp prog. The calculations done are: receive_mergeable(): headroom = VIRTIO_XDP_HEADROOM; // VIRTIO_XDP_HEADROOM == 256 bytes offset = xdp.data - page_address(xdp_page) - vi->hdr_len - metasize; page_to_skb(): p = page_address(page) + offset; ... buf = p - headroom; Now buf goes -4 bytes from the page's starting address as can be seen above which is set as skb->head and skb->data by build_skb later. Depending on what's done with the skb (when it's freed most often) we get all kinds of corruptions and BUG_ON() triggers in mm[2]. We have to recalculate the new headroom after the xdp program has run, similar to how offset and len are recalculated. Headroom is directly related to data_hard_start, data and data_meta, so we use them to get the new size. 
The result is correct (similar pr_err() in page_to_skb, one case of xdp_page and one case of virtnet buf): a) Case with 4 bytes of metadata [ 115.949641] page_to_skb: page addr ffff8b4dcfad2000 offset 252 headroom 252 [ 121.084105] page_to_skb: page addr ffff8b4dcf018000 offset 20732 headroom 252 b) Case of pushing data +32 bytes [ 153.181401] page_to_skb: page addr ffff8b4dd0c4d000 offset 288 headroom 288 [ 158.480421] page_to_skb: page addr ffff8b4dd00b0000 offset 24864 headroom 288 c) Case of pushing data -33 bytes [ 835.906830] page_to_skb: page addr ffff8b4dd3270000 offset 223 headroom 223 [ 840.839910] page_to_skb: page addr ffff8b4dcdd68000 offset 12511 headroom 223 Offset and headroom are equal because offset points to the start of reserved bytes for the virtio_net header which are at buf start + headroom, while data points at buf start + vnet hdr size + headroom so when data or data_meta are adjusted by the xdp prog both the headroom size and the offset change equally. We can use data_hard_start to compute the new headroom after the xdp prog (linearized / page start case, the virtnet buf case is similar just with bigger base offset): xdp.data_hard_start = page_address + vnet_hdr xdp.data = page_address + vnet_hdr + headroom new headroom after xdp prog = xdp.data - xdp.data_hard_start - metasize An example reproducer xdp prog[3] is below. [1] https://github.com/cilium/cilium/issues/19453 [2] Two of the many traces: [ 40.437400] BUG: Bad page state in process swapper/0 pfn:14940 [ 40.916726] BUG: Bad page state in process systemd-resolve pfn:053b7 [ 41.300891] kernel BUG at include/linux/mm.h:720! 
[ 41.301801] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 41.302784] CPU: 1 PID: 1181 Comm: kubelet Kdump: loaded Tainted: G B W 5.18.0-rc1+ #37 [ 41.304458] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014 [ 41.306018] RIP: 0010:page_frag_free+0x79/0xe0 [ 41.306836] Code: 00 00 75 ea 48 8b 07 a9 00 00 01 00 74 e0 48 8b 47 48 48 8d 50 ff a8 01 48 0f 45 fa eb d0 48 c7 c6 18 b8 30 a6 e8 d7 f8 fc ff <0f> 0b 48 8d 78 ff eb bc 48 8b 07 a9 00 00 01 00 74 3a 66 90 0f b6 [ 41.310235] RSP: 0018:ffffac05c2a6bc78 EFLAGS: 00010292 [ 41.311201] RAX: 000000000000003e RBX: 0000000000000000 RCX: 0000000000000000 [ 41.312502] RDX: 0000000000000001 RSI: ffffffffa6423004 RDI: 00000000ffffffff [ 41.313794] RBP: ffff993c98823600 R08: 0000000000000000 R09: 00000000ffffdfff [ 41.315089] R10: ffffac05c2a6ba68 R11: ffffffffa698ca28 R12: ffff993c98823600 [ 41.316398] R13: ffff993c86311ebc R14: 0000000000000000 R15: 000000000000005c [ 41.317700] FS: 00007fe13fc56740(0000) GS:ffff993cdd900000(0000) knlGS:0000000000000000 [ 41.319150] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 41.320152] CR2: 000000c00008a000 CR3: 0000000014908000 CR4: 0000000000350ee0 [ 41.321387] Call Trace: [ 41.321819] <TASK> [ 41.322193] skb_release_data+0x13f/0x1c0 [ 41.322902] __kfree_skb+0x20/0x30 [ 41.343870] tcp_recvmsg_locked+0x671/0x880 [ 41.363764] tcp_recvmsg+0x5e/0x1c0 [ 41.384102] inet_recvmsg+0x42/0x100 [ 41.406783] ? sock_recvmsg+0x1d/0x70 [ 41.428201] sock_read_iter+0x84/0xd0 [ 41.445592] ? 0xffffffffa3000000 [ 41.462442] new_sync_read+0x148/0x160 [ 41.479314] ? 
0xffffffffa3000000 [ 41.496937] vfs_read+0x138/0x190 [ 41.517198] ksys_read+0x87/0xc0 [ 41.535336] do_syscall_64+0x3b/0x90 [ 41.551637] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 41.568050] RIP: 0033:0x48765b [ 41.583955] Code: e8 4a 35 fe ff eb 88 cc cc cc cc cc cc cc cc e8 fb 7a fe ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30 [ 41.632818] RSP: 002b:000000c000a2f5b8 EFLAGS: 00000212 ORIG_RAX: 0000000000000000 [ 41.664588] RAX: ffffffffffffffda RBX: 000000c000062000 RCX: 000000000048765b [ 41.681205] RDX: 0000000000005e54 RSI: 000000c000e66000 RDI: 0000000000000016 [ 41.697164] RBP: 000000c000a2f608 R08: 0000000000000001 R09: 00000000000001b4 [ 41.713034] R10: 00000000000000b6 R11: 0000000000000212 R12: 00000000000000e9 [ 41.728755] R13: 0000000000000001 R14: 000000c000a92000 R15: ffffffffffffffff [ 41.744254] </TASK> [ 41.758585] Modules linked in: br_netfilter bridge veth netconsole virtio_net and [ 33.524802] BUG: Bad page state in process systemd-network pfn:11e60 [ 33.528617] page ffffe05dc0147b00 ffffe05dc04e7a00 ffff8ae9851ec000 (1) len 82 offset 252 metasize 4 hroom 0 hdr_len 12 data ffff8ae9851ec10c data_meta ffff8ae9851ec108 data_end ffff8ae9851ec14e [ 33.529764] page:000000003792b5ba refcount:0 mapcount:-512 mapping:0000000000000000 index:0x0 pfn:0x11e60 [ 33.532463] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff) [ 33.532468] raw: 000fffffc0000000 0000000000000000 dead000000000122 0000000000000000 [ 33.532470] raw: 0000000000000000 0000000000000000 00000000fffffdff 0000000000000000 [ 33.532471] page dumped because: nonzero mapcount [ 33.532472] Modules linked in: br_netfilter bridge veth netconsole virtio_net [ 33.532479] CPU: 0 PID: 791 Comm: systemd-network Kdump: loaded Not tainted 5.18.0-rc1+ #37 [ 33.532482] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014 [ 33.532484] Call Trace: [ 33.532496] <TASK> [ 33.532500] 
dump_stack_lvl+0x45/0x5a [ 33.532506] bad_page.cold+0x63/0x94 [ 33.532510] free_pcp_prepare+0x290/0x420 [ 33.532515] free_unref_page+0x1b/0x100 [ 33.532518] skb_release_data+0x13f/0x1c0 [ 33.532524] kfree_skb_reason+0x3e/0xc0 [ 33.532527] ip6_mc_input+0x23c/0x2b0 [ 33.532531] ip6_sublist_rcv_finish+0x83/0x90 [ 33.532534] ip6_sublist_rcv+0x22b/0x2b0 [3] XDP program to reproduce(xdp_pass.c): #include <linux/bpf.h> #include <bpf/bpf_helpers.h> SEC("xdp_pass") int xdp_pkt_pass(struct xdp_md *ctx) { bpf_xdp_adjust_head(ctx, -(int)32); return XDP_PASS; } char _license[] SEC("license") = "GPL"; compile: clang -O2 -g -Wall -target bpf -c xdp_pass.c -o xdp_pass.o load on virtio_net: ip link set enp1s0 xdpdrv obj xdp_pass.o sec xdp_pass CC: stable@vger.kernel.org CC: Jason Wang <jasowang@redhat.com> CC: Xuan Zhuo <xuanzhuo@linux.alibaba.com> CC: Daniel Borkmann <daniel@iogearbox.net> CC: "Michael S. Tsirkin" <mst@redhat.com> CC: virtualization@lists.linux-foundation.org Fixes: 8fb7da9e9907 ("virtio_net: get build_skb() buf by data ptr") Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20220425103703.3067292-1-razor@blackwall.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
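The headroom arithmetic in commit acb16b39 can be checked with a few lines of user-space C. In this sketch every position is a byte offset from the start of the page, the helper names are invented for the example, and the 12-byte header length is taken from the `hdr_len 12` value in the quoted log. It reproduces the three cases (a), (b), (c) from the message and the original underflow of 4 bytes.

```c
#include <stddef.h>

/*
 * New headroom after the XDP program has run, following the formula in
 * the commit message: xdp.data - xdp.data_hard_start - metasize.
 * Arguments are offsets from the page start, in bytes.
 */
ptrdiff_t sketch_new_headroom(ptrdiff_t data_hard_start, ptrdiff_t data,
			      ptrdiff_t metasize)
{
	return data - data_hard_start - metasize;
}

/*
 * Offset of buf relative to the page start, following page_to_skb():
 * p = page + offset; buf = p - headroom. A negative result means buf
 * points into the previous page.
 */
ptrdiff_t sketch_buf_offset(ptrdiff_t offset, ptrdiff_t headroom)
{
	return offset - headroom;
}
```

With the stale headroom of 256 and an offset of 252 the buffer start lands 4 bytes before the page, while the recomputed headroom of 252 puts it exactly at the page start.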
#
c1170820 |
|
28-Mar-2022 |
Andrew Melnychenko <andrew@daynix.com> |
drivers/net/virtio_net: Added RSS hash report control. Now it's possible to control supported hashflows. Added hashflow set/get callbacks. Also, disabling RXH_IP_SRC/DST for TCP would disable them for UDP. TCP and UDP support only: ethtool -U eth0 rx-flow-hash tcp4 sd RXH_IP_SRC + RXH_IP_DST ethtool -U eth0 rx-flow-hash tcp4 sdfn RXH_IP_SRC + RXH_IP_DST + RXH_L4_B_0_1 + RXH_L4_B_2_3 Disabling happens because VirtioNET hashtype for IP doesn't check L4 proto, it works for all IP packets(TCP, UDP, ICMP, etc.). For TCP and UDP, it's possible to set IP+PORT hashes. But disabling IP hashes will disable them for TCP and UDP simultaneously. It's possible to set IP+PORT for TCP/UDP and disable/enable IP for everything else(UDP, ICMP, etc.). Signed-off-by: Andrew Melnychenko <andrew@daynix.com> Link: https://lore.kernel.org/r/20220328175336.10802-5-andrew@daynix.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
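Commit c1170820 states that TCP and UDP flows accept only two field combinations: source plus destination address ("sd"), or those plus both L4 port halves ("sdfn"). A minimal sketch of such a check is shown here. The flag names and bit positions are placeholders invented for this example, not the real ethtool RXH_* constants, and the function is not the driver's validation routine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag bits for illustration; not the ethtool values. */
#define SK_IP_SRC   (1u << 0) /* 's': source address */
#define SK_IP_DST   (1u << 1) /* 'd': destination address */
#define SK_L4_B_0_1 (1u << 2) /* 'f': first two L4 bytes (source port) */
#define SK_L4_B_2_3 (1u << 3) /* 'n': next two L4 bytes (destination port) */

/* Accept exactly the two combinations named in the commit message. */
bool sketch_l4_hash_fields_ok(uint32_t fields)
{
	const uint32_t sd = SK_IP_SRC | SK_IP_DST;
	const uint32_t sdfn = sd | SK_L4_B_0_1 | SK_L4_B_2_3;

	return fields == sd || fields == sdfn;
}
```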
#
91f41f01 |
|
28-Mar-2022 |
Andrew Melnychenko <andrew@daynix.com> |
drivers/net/virtio_net: Added RSS hash report. Added features for RSS hash report. If hash is provided - it sets to skb. Added checks if rss and/or hash are enabled together. Signed-off-by: Andrew Melnychenko <andrew@daynix.com> Link: https://lore.kernel.org/r/20220328175336.10802-4-andrew@daynix.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
c7114b12 |
|
28-Mar-2022 |
Andrew Melnychenko <andrew@daynix.com> |
drivers/net/virtio_net: Added basic RSS support. Added features for RSS. Added initialization, RXHASH feature and ethtool ops. By default RSS/RXHASH is disabled. Virtio RSS "IPv6 extensions" hashes disabled. Added ethtools ops to set key and indirection table. Signed-off-by: Andrew Melnychenko <andrew@daynix.com> Link: https://lore.kernel.org/r/20220328175336.10802-3-andrew@daynix.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
c1ddc42d |
|
28-Mar-2022 |
Andrew Melnychenko <andrew@daynix.com> |
drivers/net/virtio_net: Fixed padded vheader to use v1 with hash. The header v1 provides additional info about RSS. Added changes to computing proper header length. In the next patches, the header may contain RSS hash info for the hash population. Signed-off-by: Andrew Melnychenko <andrew@daynix.com> Link: https://lore.kernel.org/r/20220328175336.10802-2-andrew@daynix.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
4f50ef15 |
|
13-Feb-2022 |
Michael Catanzaro <mcatanzaro.kernel@gmail.com> |
virtio_net: Fix code indent error This patch fixes the checkpatch.pl warning: ERROR: code indent should use tabs where possible #3453: FILE: drivers/net/virtio_net.c:3453: ret = register_virtio_driver(&virtio_net_driver);$ An unnecessary newline was also removed, making line 3453 now 3452. Signed-off-by: Michael Catanzaro <mcatanzaro.kernel@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9b51d9d8 |
|
14-Aug-2021 |
Yury Norov <yury.norov@gmail.com> |
cpumask: replace cpumask_next_* with cpumask_first_* where appropriate cpumask_first() is a more effective analogue of 'next' version if n == -1 (which means start == 0). This patch replaces 'next' with 'first' where things look trivial. There's no cpumask_first_zero() function, so create it. Signed-off-by: Yury Norov <yury.norov@gmail.com> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
|
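The equivalence that commit 9b51d9d8 relies on, "first" being the same as "next" started from -1, can be illustrated on a plain 64-bit mask. The helpers below are stand-ins written for this example and are unrelated to the kernel's cpumask implementation; they only show that both searches begin at bit 0.

```c
#include <stdint.h>

#define SK_NBITS 64

/* Index of the lowest set bit, or SK_NBITS if the mask is empty. */
int sk_mask_first(uint64_t mask)
{
	for (int i = 0; i < SK_NBITS; i++)
		if (mask & (UINT64_C(1) << i))
			return i;
	return SK_NBITS;
}

/* Index of the lowest set bit strictly above n, or SK_NBITS if none. */
int sk_mask_next(int n, uint64_t mask)
{
	for (int i = n + 1; i < SK_NBITS; i++)
		if (mask & (UINT64_C(1) << i))
			return i;
	return SK_NBITS;
}
```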
#
d9679d00 |
|
13-Oct-2021 |
Michael S. Tsirkin <mst@redhat.com> |
virtio: wrap config->reset calls This will enable cleanups down the road. The idea is to disable cbs, then add "flush_queued_cbs" callback as a parameter, this way drivers can flush any work queued after callbacks have been disabled. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20211013105226.20225-1-mst@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
c8064e5b |
|
30-Nov-2021 |
Paolo Abeni <pabeni@redhat.com> |
bpf: Let bpf_warn_invalid_xdp_action() report more info In non trivial scenarios, the action id alone is not sufficient to identify the program causing the warning. Before the previous patch, the generated stack-trace pointed out at least the involved device driver. Let's additionally include the program name and id, and the relevant device name. If the user needs additional infos, he can fetch them via a kernel probe, leveraging the arguments added here. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/ddb96bb975cbfddb1546cf5da60e77d5100b533c.1638189075.git.pabeni@redhat.com
|
#
74624944 |
|
18-Nov-2021 |
Hao Chen <chenhao288@hisilicon.com> |
ethtool: extend ringparam setting/getting API with rx_buf_len Add two new parameters kernel_ringparam and extack for .get_ringparam and .set_ringparam to extend more ring params through netlink. Signed-off-by: Hao Chen <chenhao288@hisilicon.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5337824f |
|
16-Nov-2021 |
Eric Dumazet <edumazet@google.com> |
net: annotate accesses to queue->trans_start In following patches, dev_watchdog() will no longer stop all queues. It will read queue->trans_start locklessly. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
053c9e18 |
|
15-Dec-2021 |
Wenliang Wang <wangwenliang.1995@bytedance.com> |
virtio_net: fix rx_drops stat for small pkts We found the stat of rx drops for small pkts does not increment when build_skb fail, it's not coherent with other mode's rx drops stat. Signed-off-by: Wenliang Wang <wangwenliang.1995@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fcfb65f8 |
|
24-Nov-2021 |
Michael S. Tsirkin <mst@redhat.com> |
Revert "virtio-net: don't let virtio core to validate used length" This reverts commit 816625c13652cef5b2c49082d652875da6f2ad7a. Attempts to validate length in the core did not work out. We'll drop them, so revert the dependent changes in drivers. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
816625c1 |
|
26-Oct-2021 |
Jason Wang <jasowang@redhat.com> |
virtio-net: don't let virtio core to validate used length For RX virtqueue, the used length is validated in all the three paths (big, small and mergeable). For control vq, we never try to use used length. So this patch forbids the core to validate the used length. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20211027022107.14357-3-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
fc02e8cb |
|
09-Oct-2021 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: clarify tailroom logic Make tailroom math follow the same logic as everything else, subtracting values in the order in which things are laid out in the buffer. Tested-by: Corentin Noël <corentin.noel@collabora.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
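To illustrate what "subtracting values in the order in which things are laid out in the buffer" in commit fc02e8cb can look like, the sketch below assumes a layout of headroom, then the header padded to hdr_padded_len, then len bytes of packet data, then tailroom. Both the layout and the helper name are assumptions for this example and may not match the driver's exact expression.

```c
#include <stddef.h>

/*
 * Assumed buffer layout, front to back:
 *   [headroom][header, padded to hdr_padded_len][len bytes of data][tailroom]
 * Tailroom is what remains of truesize after subtracting each region in
 * that order. A negative result would mean the data does not fit.
 */
ptrdiff_t sketch_tailroom(size_t truesize, size_t headroom,
			  size_t hdr_padded_len, size_t len)
{
	return (ptrdiff_t)truesize - (ptrdiff_t)headroom -
	       (ptrdiff_t)hdr_padded_len - (ptrdiff_t)len;
}
```

Using the padded header length rather than the raw header length accounts for any padding between the header and the data.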
#
f2edaa4a |
|
27-Oct-2021 |
Jakub Kicinski <kuba@kernel.org> |
net: virtio: use eth_hw_addr_set() Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Even though the current code uses dev->addr_len, we can switch to eth_hw_addr_set() instead of dev_addr_set(). The netdev is always allocated by alloc_etherdev_mq() and there are at least two places which assume Ethernet address: - the line below calling eth_hw_addr_random() - virtnet_set_mac_address() -> eth_commit_mac_addr_change() Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20211027152012.3393077-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
6213f07c |
|
09-Oct-2021 |
Li RongQing <lirongqing@baidu.com> |
virtio_net: skip RCU read lock by checking xdp_enabled of vi Networking benchmarks show that __rcu_read_lock and __rcu_read_unlock take some CPU cycles. We can partially avoid calling them in the virtio RX path by checking xdp_enabled of vi, since XDP is disabled most of the time. Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a520794b |
|
17-Sep-2021 |
Tony Lu <tony.ly@linux.alibaba.com> |
virtio_net: introduce TX timeout watchdog This implements the ndo_tx_timeout handler and adds the timeout count to the stats. When something goes wrong while sending packets, we can see tx timeout events and the total timeout counter. We have suffered send timeout issues caused by hung backends. With this, we can find the details and collect the counters with monitoring systems. Signed-off-by: Tony Lu <tony.ly@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9ce4e3d6 |
|
18-Sep-2021 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: use netdev_warn_once to output warn when without enough queues This warning is output when virtnet does not have enough queues, but it only needs to be printed once to inform the user of this situation. It is not necessary to print it every time. If the user loads xdp frequently, this log appears too much. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
732b74d6 |
|
09-Oct-2021 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio-net: fix for skb_over_panic inside big mode commit 126285651b7f ("Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net") accidentally reverted the effect of commit 1a8024239da ("virtio-net: fix for skb_over_panic inside big mode") on drivers/net/virtio_net.c As a result, users of crosvm (which is using large packet mode) are experiencing crashes with 5.14-rc1 and above that do not occur with 5.13. Crash trace: [ 61.346677] skbuff: skb_over_panic: text:ffffffff881ae2c7 len:3762 put:3762 head:ffff8a5ec8c22000 data:ffff8a5ec8c22010 tail:0xec2 end:0xec0 dev:<NULL> [ 61.369192] kernel BUG at net/core/skbuff.c:111! [ 61.372840] invalid opcode: 0000 [#1] SMP PTI [ 61.374892] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.14.0-rc1 linux-v5.14-rc1-for-mesa-ci.tar.bz2 #1 [ 61.376450] Hardware name: ChromiumOS crosvm, BIOS 0 .. [ 61.393635] Call Trace: [ 61.394127] <IRQ> [ 61.394488] skb_put.cold+0x10/0x10 [ 61.395095] page_to_skb+0xf7/0x410 [ 61.395689] receive_buf+0x81/0x1660 [ 61.396228] ? netif_receive_skb_list_internal+0x1ad/0x2b0 [ 61.397180] ? napi_gro_flush+0x97/0xe0 [ 61.397896] ? detach_buf_split+0x67/0x120 [ 61.398573] virtnet_poll+0x2cf/0x420 [ 61.399197] __napi_poll+0x25/0x150 [ 61.399764] net_rx_action+0x22f/0x280 [ 61.400394] __do_softirq+0xba/0x257 [ 61.401012] irq_exit_rcu+0x8e/0xb0 [ 61.401618] common_interrupt+0x7b/0xa0 [ 61.402270] </IRQ> See https://lore.kernel.org/r/5edaa2b7c2fe4abd0347b8454b2ac032b6694e2c.camel%40collabora.com for the report. Apply the original 1a8024239da ("virtio-net: fix for skb_over_panic inside big mode") again, the original logic still holds: In virtio-net's large packet mode, there is a hole in the space behind buf. hdr_padded_len - hdr_len We must take this into account when calculating tailroom. 
Cc: Greg KH <gregkh@linuxfoundation.org> Fixes: fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom") Fixes: 126285651b7f ("Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reported-by: Corentin Noël <corentin.noel@collabora.com> Tested-by: Corentin Noël <corentin.noel@collabora.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
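The tailroom reasoning in this message can be sketched in plain userspace C. This is a minimal model, not the driver code: `struct rx_layout` and both function names are hypothetical, and the only thing taken from the commit text is that the hole `hdr_padded_len - hdr_len` must be subtracted along with everything else laid out in the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one receive buffer; not a kernel structure. */
struct rx_layout {
	size_t truesize;       /* bytes allocated for the buffer */
	size_t headroom;       /* bytes reserved in front of the header */
	size_t hdr_len;        /* virtio-net header length */
	size_t hdr_padded_len; /* header length including padding (big mode) */
	size_t payload_len;    /* packet bytes that follow the header */
};

/* Tailroom computed as if the header were not padded (ignores the hole). */
static long tailroom_ignoring_hole(const struct rx_layout *l)
{
	return (long)l->truesize - (long)l->headroom -
	       (long)l->hdr_len - (long)l->payload_len;
}

/* Tailroom that subtracts each region in buffer order, hole included. */
static long tailroom_with_hole(const struct rx_layout *l)
{
	/* The padded header can never be shorter than the header itself. */
	assert(l->hdr_padded_len >= l->hdr_len);
	return (long)l->truesize - (long)l->headroom -
	       (long)l->hdr_padded_len - (long)l->payload_len;
}
```

The two results differ by exactly `hdr_padded_len - hdr_len`, which is the amount by which the unfixed calculation overestimates the space left behind the data.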
|
#
afd92d82 |
|
17-Sep-2021 |
Jason Wang <jasowang@redhat.com> |
virtio-net: fix pages leaking when building skb in big mode We try to use build_skb() if we have sufficient tailroom. But we forget to release the unused pages chained via private in big mode, which leaks pages. Fix this by releasing the pages after building the skb in big mode. Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Fixes: fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom") Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3dcc1edc |
|
26-Aug-2021 |
Li RongQing <lirongqing@baidu.com> |
virtio_net: reduce raw_smp_processor_id() calling in virtnet_xdp_get_sq When there are not enough queues, virtnet_xdp_get_sq() calls smp_processor_id() and raw_smp_processor_id() once each. Because virtnet_xdp_get_sq() is called in non-preemptible context, it is safe to call smp_processor_id() just once. Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f3ccfda1 |
|
20-Aug-2021 |
Yufeng Mo <moyufeng@huawei.com> |
ethtool: extend coalesce setting uAPI with CQE mode In order to support more coalesce parameters through netlink, add two new parameters, kernel_coal and extack, for .set_coalesce and .get_coalesce, so that some extra info can be returned to the user with the netlink API. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
a0d1d0f4 |
|
03-Aug-2021 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
virtio_net: Replace deprecated CPU-hotplug functions. The functions get_online_cpus() and put_online_cpus() have been deprecated during the CPU hotplug rework. They map directly to cpus_read_lock() and cpus_read_unlock(). Replace deprecated CPU-hotplug functions with the official version. The behavior remains unchanged. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: virtualization@lists.linux-foundation.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
c32325b8 |
|
02-Aug-2021 |
Jakub Kicinski <kuba@kernel.org> |
virtio-net: realign page_to_skb() after merges We ended up merging two versions of the same patch set: commit 8fb7da9e9907 ("virtio_net: get build_skb() buf by data ptr") commit 5c37711d9f27 ("virtio-net: fix for unable to handle page fault for address") into net, and commit 7bf64460e3b2 ("virtio-net: get build_skb() buf by data ptr") commit 6c66c147b9a4 ("virtio-net: fix for unable to handle page fault for address") into net-next. Redo the merge from commit 126285651b7f ("Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net"), so that the most recent code remains. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dbcf24d1 |
|
17-Aug-2021 |
Jason Wang <jasowang@redhat.com> |
virtio-net: use NETIF_F_GRO_HW instead of NETIF_F_LRO Commit a02e8964eaf92 ("virtio-net: ethtool configurable LRO") maps LRO to virtio guest offloading features and allows the administrator to enable and disable those features via ethtool. This leads to several issues: - For a device that doesn't support control guest offloads, the "LRO" can't be disabled triggering WARN in dev_disable_lro() when turning off LRO or when enabling forwarding bridging etc. - For a device that supports control guest offloads, the guest offloads are disabled in cases of bridging, forwarding etc slowing down the traffic. Fix this by using NETIF_F_GRO_HW instead. Though the spec does not guarantee packets to be re-segmented as the original ones, we can add that to the spec, possibly with a flag for devices to differentiate between GRO and LRO. Further, we never advertised LRO historically before a02e8964eaf92 ("virtio-net: ethtool configurable LRO") and so bridged/forwarded configs effectively always relied on virtio receive offloads behaving like GRO - thus even if this breaks any configs it is at least not a regression. Fixes: a02e8964eaf92 ("virtio-net: ethtool configurable LRO") Acked-by: Michael S. Tsirkin <mst@redhat.com> Reported-by: Ivan <ivan@prestigetransportation.com> Tested-by: Ivan <ivan@prestigetransportation.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
222722bc |
|
09-Jul-2021 |
Yunjian Wang <wangyunjian@huawei.com> |
virtio_net: check virtqueue_add_sgs() return value As virtqueue_add_sgs() can fail, we should check the return value. Addresses-Coverity-ID: 1464439 ("Unchecked return value") Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a7766ef1 |
|
12-Apr-2021 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: disable cb aggressively There are currently two cases where we poll TX vq not in response to a callback: start xmit and rx napi. We currently do this with callbacks enabled which can cause extra interrupts from the card. This used not to be a big issue as we ran with interrupts disabled, but that is no longer the case, and in some cases the rate of spurious interrupts is so high that Linux detects this and actually kills the interrupt. Fix up by disabling the callbacks before polling the tx vq. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
22bc63c5 |
|
12-Apr-2021 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: move txq wakeups under tx q lock We currently check num_free outside tx q lock which is unsafe: new packets can arrive meanwhile and there won't be space in the queue. The result is a spurious queue wakeup, causing overhead and even packet drops. Move the check under the lock to fix that. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
5a2f966d |
|
12-Apr-2021 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: move tx vq operation under tx queue lock It's unsafe to operate a vq from multiple threads. Unfortunately this is exactly what we do when invoking clean tx poll from rx napi. Same happens with napi-tx even without the opportunistic cleaning from the receive interrupt: that races with processing the vq in start_xmit. As a fix move everything that deals with the vq to under tx lock. Fixes: b92f1e6751a6 ("virtio-net: transmit napi") Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
3f2869ca |
|
17-May-2021 |
Xie Yongji <xieyongji@bytedance.com> |
virtio_net: Fix error handling in virtnet_restore() Do some cleanups in virtnet_restore() when virtnet_cpu_notif_add() fails. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Link: https://lore.kernel.org/r/20210517084516.332-1-xieyongji@bytedance.com Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
a2f7dc00 |
|
23-Jun-2021 |
Xianting Tian <xianting_tian@126.com> |
virtio_net: Use virtio_find_vqs_ctx() helper virtio_find_vqs_ctx() is defined but currently never called; this is the right place to use it. Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
85eb1389 |
|
05-Jun-2021 |
Xianting Tian <xianting.tian@linux.alibaba.com> |
virtio_net: Remove BUG() to avoid machine dead We should not directly BUG() when there is a hdr error; it is better to print a message when such an error happens. Currently, the caller of xmit_skb() already does so. Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ad993a95 |
|
31-May-2021 |
Xie Yongji <xieyongji@bytedance.com> |
virtio-net: Add validation for used length This adds validation for used length (might come from an untrusted device) to avoid data corruption or loss. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210531135852.113-1-xieyongji@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
7bf64460 |
|
13-May-2021 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio-net: get build_skb() buf by data ptr In the case of merge, the page passed into page_to_skb() may be a head page, not the page where the current data is located. So when trying to get the buf where the data is located, you should directly use the pointer(p) to get the address corresponding to the page. At the same time, the offset of the data in the page should also be obtained using offset_in_page(). This patch solves this problem. But if you don’t use this patch, the original code can also run, because if the page is not the page of the current data, the calculated tailroom will be less than 0, and will not enter the logic of build_skb() . The significance of this patch is to modify this logical problem, allowing more situations to use build_skb(). Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6c66c147 |
|
13-May-2021 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio-net: fix for unable to handle page fault for address In merge mode, when xdp is enabled, if the headroom of buf is smaller than virtnet_get_headroom(), xdp_linearize_page() will be called but the variable of "headroom" is still 0, which leads to wrong logic after entering page_to_skb(). [ 16.600944] BUG: unable to handle page fault for address: ffffecbfff7b43c8[ 16.602175] #PF: supervisor read access in kernel mode [ 16.603350] #PF: error_code(0x0000) - not-present page [ 16.604200] PGD 0 P4D 0 [ 16.604686] Oops: 0000 [#1] SMP PTI [ 16.605306] CPU: 4 PID: 715 Comm: sh Tainted: G B 5.12.0+ #312 [ 16.606429] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/04 [ 16.608217] RIP: 0010:unmap_page_range+0x947/0xde0 [ 16.609014] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065 [ 16.611863] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286 [ 16.612720] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359 [ 16.613853] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005 [ 16.614976] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030 [ 16.616124] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f [ 16.617276] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000 [ 16.618423] FS: 0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000 [ 16.619738] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 16.620670] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0 [ 16.621792] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 16.622920] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 16.624047] Call Trace: [ 16.624525] ? 
release_pages+0x24d/0x730 [ 16.625209] unmap_single_vma+0xa9/0x130 [ 16.625885] unmap_vmas+0x76/0xf0 [ 16.626480] exit_mmap+0xa0/0x210 [ 16.627129] mmput+0x67/0x180 [ 16.627673] do_exit+0x3d1/0xf10 [ 16.628259] ? do_user_addr_fault+0x231/0x840 [ 16.629000] do_group_exit+0x53/0xd0 [ 16.629631] __x64_sys_exit_group+0x1d/0x20 [ 16.630354] do_syscall_64+0x3c/0x80 [ 16.630988] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 16.631828] RIP: 0033:0x7f1a043d0191 [ 16.632464] Code: Unable to access opcode bytes at RIP 0x7f1a043d0167. [ 16.633502] RSP: 002b:00007ffe3d993308 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 [ 16.634737] RAX: ffffffffffffffda RBX: 00007f1a044c9490 RCX: 00007f1a043d0191 [ 16.635857] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000 [ 16.636986] RBP: 0000000000000000 R08: ffffffffffffff88 R09: 0000000000000001 [ 16.638120] R10: 0000000000000008 R11: 0000000000000246 R12: 00007f1a044c9490 [ 16.639245] R13: 0000000000000001 R14: 00007f1a044c9968 R15: 0000000000000000 [ 16.640408] Modules linked in: [ 16.640958] CR2: ffffecbfff7b43c8 [ 16.641557] ---[ end trace bc4891c6ce46354c ]--- [ 16.642335] RIP: 0010:unmap_page_range+0x947/0xde0 [ 16.643135] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065 [ 16.645983] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286 [ 16.646845] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359 [ 16.647970] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005 [ 16.649091] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030 [ 16.650250] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f [ 16.651394] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000 [ 16.652529] FS: 0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000 [ 16.653887] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 16.654841] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0 [ 16.655992] DR0: 
0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 16.657150] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 16.658290] Kernel panic - not syncing: Fatal exception [ 16.659613] Kernel Offset: disabled [ 16.660234] ---[ end Kernel panic - not syncing: Fatal exception ]--- Fixes: fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1a802423 |
|
03-Jun-2021 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio-net: fix for skb_over_panic inside big mode In virtio-net's large packet mode, there is a hole in the space behind buf. hdr_padded_len - hdr_len We must take this into account when calculating tailroom. [ 44.544385] skb_put.cold (net/core/skbuff.c:5254 (discriminator 1) net/core/skbuff.c:5252 (discriminator 1)) [ 44.544864] page_to_skb (drivers/net/virtio_net.c:485) [ 44.545361] receive_buf (drivers/net/virtio_net.c:849 drivers/net/virtio_net.c:1131) [ 44.545870] ? netif_receive_skb_list_internal (net/core/dev.c:5714) [ 44.546628] ? dev_gro_receive (net/core/dev.c:6103) [ 44.547135] ? napi_complete_done (./include/linux/list.h:35 net/core/dev.c:5867 net/core/dev.c:5862 net/core/dev.c:6565) [ 44.547672] virtnet_poll (drivers/net/virtio_net.c:1427 drivers/net/virtio_net.c:1525) [ 44.548251] __napi_poll (net/core/dev.c:6985) [ 44.548744] net_rx_action (net/core/dev.c:7054 net/core/dev.c:7139) [ 44.549264] __do_softirq (./arch/x86/include/asm/jump_label.h:19 ./include/linux/jump_label.h:200 ./include/trace/events/irq.h:142 kernel/softirq.c:560) [ 44.549762] irq_exit_rcu (kernel/softirq.c:433 kernel/softirq.c:637 kernel/softirq.c:649) [ 44.551384] common_interrupt (arch/x86/kernel/irq.c:240 (discriminator 13)) [ 44.551991] ? asm_common_interrupt (./arch/x86/include/asm/idtentry.h:638) [ 44.552654] asm_common_interrupt (./arch/x86/include/asm/idtentry.h:638) Fixes: fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reported-by: Corentin Noël <corentin.noel@collabora.com> Tested-by: Corentin Noël <corentin.noel@collabora.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8fb7da9e |
|
01-Jun-2021 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio_net: get build_skb() buf by data ptr In the case of merge, the page passed into page_to_skb() may be a head page, not the page where the current data is located. So when trying to get the buf where the data is located, we should get buf based on headroom instead of offset. This patch solves this problem. But if you don't use this patch, the original code can also run, because if the page is not the page of the current data, the calculated tailroom will be less than 0, and will not enter the logic of build_skb() . The significance of this patch is to modify this logical problem, allowing more situations to use build_skb(). Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5c37711d |
|
01-Jun-2021 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio-net: fix for unable to handle page fault for address In merge mode, when xdp is enabled, if the headroom of buf is smaller than virtnet_get_headroom(), xdp_linearize_page() will be called but the variable of "headroom" is still 0, which leads to wrong logic after entering page_to_skb(). [ 16.600944] BUG: unable to handle page fault for address: ffffecbfff7b43c8[ 16.602175] #PF: supervisor read access in kernel mode [ 16.603350] #PF: error_code(0x0000) - not-present page [ 16.604200] PGD 0 P4D 0 [ 16.604686] Oops: 0000 [#1] SMP PTI [ 16.605306] CPU: 4 PID: 715 Comm: sh Tainted: G B 5.12.0+ #312 [ 16.606429] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/04 [ 16.608217] RIP: 0010:unmap_page_range+0x947/0xde0 [ 16.609014] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065 [ 16.611863] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286 [ 16.612720] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359 [ 16.613853] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005 [ 16.614976] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030 [ 16.616124] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f [ 16.617276] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000 [ 16.618423] FS: 0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000 [ 16.619738] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 16.620670] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0 [ 16.621792] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 16.622920] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 16.624047] Call Trace: [ 16.624525] ? 
release_pages+0x24d/0x730 [ 16.625209] unmap_single_vma+0xa9/0x130 [ 16.625885] unmap_vmas+0x76/0xf0 [ 16.626480] exit_mmap+0xa0/0x210 [ 16.627129] mmput+0x67/0x180 [ 16.627673] do_exit+0x3d1/0xf10 [ 16.628259] ? do_user_addr_fault+0x231/0x840 [ 16.629000] do_group_exit+0x53/0xd0 [ 16.629631] __x64_sys_exit_group+0x1d/0x20 [ 16.630354] do_syscall_64+0x3c/0x80 [ 16.630988] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 16.631828] RIP: 0033:0x7f1a043d0191 [ 16.632464] Code: Unable to access opcode bytes at RIP 0x7f1a043d0167. [ 16.633502] RSP: 002b:00007ffe3d993308 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 [ 16.634737] RAX: ffffffffffffffda RBX: 00007f1a044c9490 RCX: 00007f1a043d0191 [ 16.635857] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000 [ 16.636986] RBP: 0000000000000000 R08: ffffffffffffff88 R09: 0000000000000001 [ 16.638120] R10: 0000000000000008 R11: 0000000000000246 R12: 00007f1a044c9490 [ 16.639245] R13: 0000000000000001 R14: 00007f1a044c9968 R15: 0000000000000000 [ 16.640408] Modules linked in: [ 16.640958] CR2: ffffecbfff7b43c8 [ 16.641557] ---[ end trace bc4891c6ce46354c ]--- [ 16.642335] RIP: 0010:unmap_page_range+0x947/0xde0 [ 16.643135] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065 [ 16.645983] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286 [ 16.646845] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359 [ 16.647970] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005 [ 16.649091] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030 [ 16.650250] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f [ 16.651394] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000 [ 16.652529] FS: 0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000 [ 16.653887] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 16.654841] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0 [ 16.655992] DR0: 
0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 16.657150] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 16.658290] Kernel panic - not syncing: Fatal exception [ 16.659613] Kernel Offset: disabled [ 16.660234] ---[ end Kernel panic - not syncing: Fatal exception ]--- Fixes: fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
122b84a1 |
|
01-May-2021 |
Max Gurtovoy <mgurtovoy@nvidia.com> |
virtio-net: don't allocate control_buf if not supported Not all virtio_net devices support the ctrl queue feature. Thus, there is no need to allocate unused resources. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Link: https://lore.kernel.org/r/20210502093319.61313-1-mgurtovoy@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
f80bd740 |
|
22-Apr-2021 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio-net: fix use-after-free in skb_gro_receive When "headroom" > 0, the actual allocated memory space is the entire page, so the address of the page should be used when passing it to build_skb(). BUG: KASAN: use-after-free in skb_gro_receive (net/core/skbuff.c:4260) Write of size 16 at addr ffff88811619fffc by task kworker/u9:0/534 CPU: 2 PID: 534 Comm: kworker/u9:0 Not tainted 5.12.0-rc7-custom-16372-gb150be05b806 #3382 Hardware name: QEMU MSN2700, BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: xprtiod xs_stream_data_receive_workfn [sunrpc] Call Trace: <IRQ> dump_stack (lib/dump_stack.c:122) print_address_description.constprop.0 (mm/kasan/report.c:233) kasan_report.cold (mm/kasan/report.c:400 mm/kasan/report.c:416) skb_gro_receive (net/core/skbuff.c:4260) tcp_gro_receive (net/ipv4/tcp_offload.c:266 (discriminator 1)) tcp4_gro_receive (net/ipv4/tcp_offload.c:316) inet_gro_receive (net/ipv4/af_inet.c:1545 (discriminator 2)) dev_gro_receive (net/core/dev.c:6075) napi_gro_receive (net/core/dev.c:6168 net/core/dev.c:6198) receive_buf (drivers/net/virtio_net.c:1151) virtio_net virtnet_poll (drivers/net/virtio_net.c:1415 drivers/net/virtio_net.c:1519) virtio_net __napi_poll (net/core/dev.c:6964) net_rx_action (net/core/dev.c:7033 net/core/dev.c:7118) __do_softirq (./arch/x86/include/asm/jump_label.h:25 ./include/linux/jump_label.h:200 ./include/trace/events/irq.h:142 kernel/softirq.c:346) irq_exit_rcu (kernel/softirq.c:221 kernel/softirq.c:422 kernel/softirq.c:434) common_interrupt (arch/x86/kernel/irq.c:240 (discriminator 14)) </IRQ> Fixes: fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reported-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
af39c8f7 |
|
20-Apr-2021 |
Eric Dumazet <edumazet@google.com> |
virtio-net: fix use-after-free in page_to_skb() KASAN/syzbot had 4 reports, one of them being: BUG: KASAN: slab-out-of-bounds in memcpy include/linux/fortify-string.h:191 [inline] BUG: KASAN: slab-out-of-bounds in page_to_skb+0x5cf/0xb70 drivers/net/virtio_net.c:480 Read of size 12 at addr ffff888014a5f800 by task systemd-udevd/8445 CPU: 0 PID: 8445 Comm: systemd-udevd Not tainted 5.12.0-rc8-next-20210419-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:233 __kasan_report mm/kasan/report.c:419 [inline] kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:436 check_region_inline mm/kasan/generic.c:180 [inline] kasan_check_range+0x13d/0x180 mm/kasan/generic.c:186 memcpy+0x20/0x60 mm/kasan/shadow.c:65 memcpy include/linux/fortify-string.h:191 [inline] page_to_skb+0x5cf/0xb70 drivers/net/virtio_net.c:480 receive_mergeable drivers/net/virtio_net.c:1009 [inline] receive_buf+0x2bc0/0x6250 drivers/net/virtio_net.c:1119 virtnet_receive drivers/net/virtio_net.c:1411 [inline] virtnet_poll+0x568/0x10b0 drivers/net/virtio_net.c:1516 __napi_poll+0xaf/0x440 net/core/dev.c:6962 napi_poll net/core/dev.c:7029 [inline] net_rx_action+0x801/0xb40 net/core/dev.c:7116 __do_softirq+0x29b/0x9fe kernel/softirq.c:559 invoke_softirq kernel/softirq.c:433 [inline] __irq_exit_rcu+0x136/0x200 kernel/softirq.c:637 irq_exit_rcu+0x5/0x20 kernel/softirq.c:649 common_interrupt+0xa4/0xd0 arch/x86/kernel/irq.c:240 Fixes: fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Jason Wang 
<jasowang@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: virtualization@lists.linux-foundation.org Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f5d7872a |
|
20-Apr-2021 |
Eric Dumazet <edumazet@google.com> |
virtio-net: restrict build_skb() use to some arches build_skb() is supposed to be followed by skb_reserve(skb, NET_IP_ALIGN), so that IP headers are word-aligned. (Best practice is to reserve NET_IP_ALIGN+NET_SKB_PAD, but the NET_SKB_PAD part is only a performance optimization if tunnel encaps are added.) Unfortunately virtio_net has not provisioned this reserve. We can only use build_skb() for arches where NET_IP_ALIGN == 0 We might refine this later, with enough testing. Fixes: fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Jason Wang <jasowang@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: virtualization@lists.linux-foundation.org Signed-off-by: David S. Miller <davem@davemloft.net>
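The conditions under which page_to_skb() may take the build_skb() path, as described in this commit and in fb32856b16ad, can be written as a single predicate. The sketch below is a userspace model under stated assumptions: `can_use_build_skb` is a hypothetical name, `net_ip_align` is passed as a parameter rather than read from the architecture's `NET_IP_ALIGN`, and `shinfo_size` stands in for the space that skb_shared_info needs. `GOOD_COPY_LEN` is 128, as stated in commit 0f6925b3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GOOD_COPY_LEN 128 /* small-packet copy threshold named in the log */

/*
 * Hypothetical predicate: true when the packet may be wrapped in place
 * instead of being copied into a freshly allocated skb head.
 */
static bool can_use_build_skb(int net_ip_align, size_t len,
			      long tailroom, size_t shinfo_size)
{
	assert(net_ip_align >= 0);

	/* No reserve was provisioned, so only arches with alignment 0 qualify. */
	if (net_ip_align != 0)
		return false;

	/* Small packets are still copied so the page can be reused. */
	if (len <= GOOD_COPY_LEN)
		return false;

	/* The shared info must fit behind the data; tailroom may be negative. */
	return tailroom >= (long)shinfo_size;
}
```

Keeping tailroom signed matters here: the messages for 7bf64460 and 8fb7da9e note that a miscalculated tailroom comes out below zero, which this predicate treats as "do not use build_skb()".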
|
#
fb32856b |
|
16-Apr-2021 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio-net: page_to_skb() use build_skb when there's sufficient tailroom In page_to_skb(), if we have enough tailroom to save skb_shared_info, we can use build_skb to create skb directly. No need to alloc for additional space. And it can save a 'frags slot', which is very friendly to GRO. Here, if the payload of the received packet is too small (less than GOOD_COPY_LEN), we still choose to copy it directly to the space got by napi_alloc_skb. So we can reuse these pages. Testing Machine: The four queues of the network card are bound to the cpu1. Test command: for ((i=0;i<5;++i)); do sockperf tp --ip 192.168.122.64 -m 1000 -t 150& done The size of the UDP packet is 1000, so in the case of this patch, there will always be enough tailroom to use build_skb. The sent UDP packet will be discarded because there is no port to receive it. The irqsoftd of the machine is 100%, we observe the received quantity displayed by sar -n DEV 1: no build_skb: 956864.00 rxpck/s build_skb: 1158465.00 rxpck/s Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
044ab86d |
|
18-Mar-2021 |
Antoine Tenart <atenart@kernel.org> |
net: move the xps maps to an array Move the xps maps (xps_cpus_map and xps_rxqs_map) to an array in net_device. That will simplify the code a lot, removing the need for lots of if/else conditionals as the correct map will be available using its offset in the array. This should not modify the xps maps behaviour in any way. Suggested-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fdc13979 |
|
07-Mar-2021 |
Lorenzo Bianconi <lorenzo@kernel.org> |
bpf, devmap: Move drop error path to devmap for XDP_REDIRECT We want to change the current ndo_xdp_xmit drop semantics because it will allow us to implement better queue overflow handling. This is working towards the larger goal of a XDP TX queue-hook. Move XDP_REDIRECT error path handling from each XDP ethernet driver to devmap code. According to the new APIs, the driver running the ndo_xdp_xmit pointer will break the tx loop whenever the hw reports a tx error, and it will just return the number of successfully transmitted frames to the devmap caller. It will be devmap's responsibility to free dropped frames. Move each XDP ndo_xdp_xmit capable driver to the new APIs: - veth - virtio-net - mvneta - mvpp2 - socionext - amazon ena - bnxt - freescale (dpaa2, dpaa) - xen-frontend - qede - ice - igb - ixgbe - i40e - mlx5 - ti (cpsw, cpsw-new) - tun - sfc Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Reviewed-by: Camelia Groza <camelia.groza@nxp.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Shay Agroskin <shayagr@amazon.com> Link: https://lore.kernel.org/bpf/ed670de24f951cfd77590decf0229a0ad7fd12f6.1615201152.git.lorenzo@kernel.org
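The division of labour described here (the driver stops at the first transmit error and reports how many frames went out, the caller frees the rest) can be illustrated with a small userspace model. Every name below is hypothetical: `struct frame`, `driver_xdp_xmit`, `caller_flush` and `limited_queue_xmit` are stand-ins for illustration and do not reproduce any kernel or devmap function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical frame; 'freed' stands in for returning the frame's memory. */
struct frame {
	bool freed;
};

/* Hypothetical per-frame transmit hook; returns false on a tx error. */
typedef bool (*xmit_one_fn)(struct frame *f, void *ctx);

/* Driver side: break out of the loop on the first error, return the count. */
static int driver_xdp_xmit(struct frame **frames, int n,
			   xmit_one_fn xmit_one, void *ctx)
{
	int sent = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		if (!xmit_one(frames[i], ctx))
			break;
		sent++;
	}
	return sent;
}

/* Caller side: everything from index 'sent' onwards was dropped; free it. */
static int caller_flush(struct frame **frames, int n,
			xmit_one_fn xmit_one, void *ctx)
{
	int sent = driver_xdp_xmit(frames, n, xmit_one, ctx);

	for (int i = sent; i < n; i++)
		frames[i]->freed = true;
	return sent;
}

/* Example hook: accepts frames until the slot counter in ctx reaches zero. */
static bool limited_queue_xmit(struct frame *f, void *ctx)
{
	int *slots = ctx;

	(void)f;
	if (*slots == 0)
		return false;
	(*slots)--;
	return true;
}
```

With four frames and two free slots, `caller_flush` returns 2 and marks only the last two frames as freed, so the driver never has to free anything on the error path.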
|
#
d7a9a01b |
|
16-Mar-2021 |
Alexander Duyck <alexanderduyck@fb.com> |
virtio_net: Update driver to use ethtool_sprintf Update the code to replace instances of snprintf and a pointer update with just calling ethtool_sprintf. Also replace the char pointer with a u8 pointer to avoid having to recast the pointer type. Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
97c2c69e |
|
09-Mar-2021 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio-net: support XDP when not more queues The number of queues implemented by many virtio backends is limited, which is a problem especially on machines with a large number of CPUs. In this case, it is often impossible to allocate a separate queue for XDP_TX/XDP_REDIRECT, so XDP cannot be loaded at all, even if the program does not use XDP_TX/XDP_REDIRECT. This patch allows XDP_TX/XDP_REDIRECT to run by reusing the existing SQ with __netif_tx_lock() held when there are not enough queues. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ab5bd583 |
|
18-Feb-2021 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
virtio-net: Support IFF_TX_SKB_NO_LINEAR flag Virtio net supports the case where the skb linear space is empty, so add priv_flags. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210218204908.5455-4-alobakin@pm.me
|
#
0f6925b3 |
|
02-Apr-2021 |
Eric Dumazet <edumazet@google.com> |
virtio_net: Do not pull payload in skb->head Xuan Zhuo reported that commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs") brought a ~10% performance drop. The reason for the performance drop was that GRO was forced to chain sk_buff (using skb_shinfo(skb)->frag_list), which uses more memory but also causes packet consumers to incur a lot of overhead handling all the tiny skbs. It turns out that virtio_net page_to_skb() has a wrong strategy: it allocates skbs with GOOD_COPY_LEN (128) bytes in skb->head, then copies 128 bytes from the page, before feeding the packet to the GRO stack. This was suboptimal before commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs") because GRO was using 2 frags per MSS, meaning we were not packing MSS with 100% efficiency. The fix is to pull only the ethernet header in page_to_skb(). Then, we change virtio_net_hdr_to_skb() to pull the missing headers, instead of assuming they were already pulled by callers. This fixes the performance regression, but could also allow virtio_net to accept packets with more than 128 bytes of headers. Many thanks to Xuan Zhuo for his report, and his tests/help. Fixes: 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs") Reported-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://www.spinics.net/lists/netdev/msg731397.html Co-Developed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: virtualization@lists.linux-foundation.org Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
95efabf0 |
|
19-Nov-2020 |
Gustavo A. R. Silva <gustavoars@kernel.org> |
virtio_net: Fix fall-through warnings for Clang In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning by explicitly adding a goto statement instead of letting the code fall through to the next case. Link: https://github.com/KSPP/linux/issues/115 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/cb9b9534572bc476f4fb7b49a73dc8646b780c84.1605896060.git.gustavoars@kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
be9df4af |
|
22-Dec-2020 |
Lorenzo Bianconi <lorenzo@kernel.org> |
net, xdp: Introduce xdp_prepare_buff utility routine Introduce xdp_prepare_buff utility routine to initialize per-descriptor xdp_buff fields (e.g. xdp_buff pointers). Rely on xdp_prepare_buff() in all XDP capable drivers. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Shay Agroskin <shayagr@amazon.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Acked-by: Camelia Groza <camelia.groza@nxp.com> Acked-by: Marcin Wojtas <mw@semihalf.com> Link: https://lore.kernel.org/bpf/45f46f12295972a97da8ca01990b3e71501e9d89.1608670965.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
43b5169d |
|
22-Dec-2020 |
Lorenzo Bianconi <lorenzo@kernel.org> |
net, xdp: Introduce xdp_init_buff utility routine Introduce xdp_init_buff utility routine to initialize xdp_buff fields const over NAPI iterations (e.g. frame_sz or rxq pointer). Rely on xdp_init_buff in all XDP capable drivers. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Shay Agroskin <shayagr@amazon.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Acked-by: Camelia Groza <camelia.groza@nxp.com> Acked-by: Marcin Wojtas <mw@semihalf.com> Link: https://lore.kernel.org/bpf/7f8329b6da1434dc2b05a77f2e800b29628a8913.1608670965.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
de33212f |
|
22-Dec-2020 |
Jeff Dike <jdike@akamai.com> |
virtio_net: Fix recursive call to cpus_read_lock() virtnet_set_channels can recursively call cpus_read_lock if CONFIG_XPS and CONFIG_HOTPLUG are enabled. The path is: virtnet_set_channels - calls get_online_cpus(), which is a trivial wrapper around cpus_read_lock() netif_set_real_num_tx_queues netif_reset_xps_queues_gt netif_reset_xps_queues - calls cpus_read_lock() This call chain and potential deadlock happens when the number of TX queues is reduced. This commit removes the netif_set_real_num_[tr]x_queues calls from inside the get/put_online_cpus section, as they don't require that it be held. Fixes: 47be24796c13 ("virtio-net: fix the set affinity bug when CPU IDs are not consecutive") Signed-off-by: Jeff Dike <jdike@akamai.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20201223025421.671-1-jdike@akamai.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
411ea23a |
|
04-Dec-2020 |
Dan Carpenter <dan.carpenter@oracle.com> |
virtio_net: Fix error code in probe() Set a negative error code instead of returning success if the MTU has been changed to something invalid. Fixes: fe36cbe0671e ("virtio_net: clear MTU when out of range") Reported-by: Robert Buhren <robert.buhren@sect.tu-berlin.de> Reported-by: Felicitas Hetzelt <file@sect.tu-berlin.de> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/X8pGVJSeeCdII1Ys@mwanda Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
|
#
b02e5a0e |
|
30-Nov-2020 |
Björn Töpel <bjorn@kernel.org> |
xsk: Propagate napi_id to XDP socket Rx path Add napi_id to the xdp_rxq_info structure, and make sure the XDP socket picks up the napi_id in the Rx path. The napi_id is used to find the corresponding NAPI structure for socket busy polling. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/bpf/20201130185205.196029-7-bjorn.topel@gmail.com
|
#
cf8691cb |
|
21-Oct-2020 |
Michael S. Tsirkin <mst@redhat.com> |
Revert "virtio-net: ethtool configurable RXCSUM" This reverts commit 3618ad2a7c0e78e4258386394d5d5f92a3dbccf8. When control vq is not negotiated, that commit causes a crash: [ 72.229171] kernel BUG at drivers/net/virtio_net.c:1667! [ 72.230266] invalid opcode: 0000 [#1] PREEMPT SMP [ 72.231172] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc8-02934-g3618ad2a7c0e7 #1 [ 72.231172] EIP: virtnet_send_command+0x120/0x140 [ 72.231172] Code: 00 0f 94 c0 8b 7d f0 65 33 3d 14 00 00 00 75 1c 8d 65 f4 5b 5e 5f 5d c3 66 90 be 01 00 00 00 e9 6e ff ff ff 8d b6 00 +00 00 00 <0f> 0b e8 d9 bb 82 00 eb 17 8d b4 26 00 00 00 00 8d b4 26 00 00 00 [ 72.231172] EAX: 0000000d EBX: f72895c0 ECX: 00000017 EDX: 00000011 [ 72.231172] ESI: f7197800 EDI: ed69bd00 EBP: ed69bcf4 ESP: ed69bc98 [ 72.231172] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010246 [ 72.231172] CR0: 80050033 CR2: 00000000 CR3: 02c84000 CR4: 000406f0 [ 72.231172] Call Trace: [ 72.231172] ? __virt_addr_valid+0x45/0x60 [ 72.231172] ? ___cache_free+0x51f/0x760 [ 72.231172] ? kobject_uevent_env+0xf4/0x560 [ 72.231172] virtnet_set_guest_offloads+0x4d/0x80 [ 72.231172] virtnet_set_features+0x85/0x120 [ 72.231172] ? virtnet_set_guest_offloads+0x80/0x80 [ 72.231172] __netdev_update_features+0x27a/0x8e0 [ 72.231172] ? kobject_uevent+0xa/0x20 [ 72.231172] ? netdev_register_kobject+0x12c/0x160 [ 72.231172] register_netdevice+0x4fe/0x740 [ 72.231172] register_netdev+0x1c/0x40 [ 72.231172] virtnet_probe+0x728/0xb60 [ 72.231172] ? _raw_spin_unlock+0x1d/0x40 [ 72.231172] ? virtio_vdpa_get_status+0x1c/0x20 [ 72.231172] virtio_dev_probe+0x1c6/0x271 [ 72.231172] really_probe+0x195/0x2e0 [ 72.231172] driver_probe_device+0x26/0x60 [ 72.231172] device_driver_attach+0x49/0x60 [ 72.231172] __driver_attach+0x46/0xc0 [ 72.231172] ? 
device_driver_attach+0x60/0x60 [ 72.231172] bus_add_driver+0x197/0x1c0 [ 72.231172] driver_register+0x66/0xc0 [ 72.231172] register_virtio_driver+0x1b/0x40 [ 72.231172] virtio_net_driver_init+0x61/0x86 [ 72.231172] ? veth_init+0x14/0x14 [ 72.231172] do_one_initcall+0x76/0x2e4 [ 72.231172] ? rdinit_setup+0x2a/0x2a [ 72.231172] do_initcalls+0xb2/0xd5 [ 72.231172] kernel_init_freeable+0x14f/0x179 [ 72.231172] ? rest_init+0x100/0x100 [ 72.231172] kernel_init+0xd/0xe0 [ 72.231172] ret_from_fork+0x1c/0x30 [ 72.231172] Modules linked in: [ 72.269563] ---[ end trace a6ebc4afea0e6cb1 ]--- The reason is that virtnet_set_features now calls virtnet_set_guest_offloads unconditionally, it used to only call it when there is something to configure. If device does not have a control vq, everything breaks. Revert the original commit for now. Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com> Fixes: 3618ad2a7c0e7 ("virtio-net: ethtool configurable RXCSUM") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20201021142944.13615-1-mst@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
3618ad2a |
|
11-Oct-2020 |
Tonghao Zhang <xiangxia.m.yue@gmail.com> |
virtio-net: ethtool configurable RXCSUM Allow user configuring RXCSUM separately with ethtool -K, reusing the existing virtnet_set_guest_offloads helper that configures RXCSUM for XDP. This is conditional on VIRTIO_NET_F_CTRL_GUEST_OFFLOADS. If Rx checksum is disabled, LRO should also be disabled. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20201012015820.62042-1-xiangxia.m.yue@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
1a03b8a3 |
|
28-Sep-2020 |
Tonghao Zhang <xiangxia.m.yue@gmail.com> |
virtio-net: don't disable guest csum when disable LRO Open vSwitch and Linux bridge disable LRO on an interface when it is added to them. Currently, disabling LRO also disables the virtio-net guest csum, which reduces forwarding performance. Fixes: a02e8964eaf9 ("virtio-net: ethtool configurable LRO") Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5198d545 |
|
09-Sep-2020 |
Jakub Kicinski <kuba@kernel.org> |
net: remove napi_hash_del() from driver-facing API We allow drivers to call napi_hash_del() before calling netif_napi_del() to batch RCU grace periods. This makes the API asymmetric and leaks internal implementation details. Soon we will want the grace period to protect more than just the NAPI hash table. Restructure the API and have drivers call a new function - __netif_napi_del() if they want to take care of RCU waits. Note that only core was checking the return status from napi_hash_del() so the new helper does not report if the NAPI was actually deleted. Some notes on driver oddness: - veth observed the grace period before calling netif_napi_del() but that should not matter - myri10ge observed normal RCU flavor - bnx2x and enic did not actually observe the grace period (unless they did so implicitly) - virtio_net and enic only unhashed Rx NAPIs The last two points seem to indicate that the calls to napi_hash_del() were a left over rather than an optimization. Regardless, it's easy enough to correct them. This patch may introduce extra synchronize_net() calls for interfaces which set NAPI_STATE_NO_BUSY_POLL and depend on free_netdev() to call netif_napi_del(). This seems inevitable since we want to use RCU for netpoll dev->napi_list traversal, and almost no drivers set IFF_DISABLE_NETPOLL. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
df561f66 |
|
23-Aug-2020 |
Gustavo A. R. Silva <gustavoars@kernel.org> |
treewide: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
|
#
64ffa39d |
|
05-Aug-2020 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: use LE accessors for speed/duplex Speed and duplex config fields depend on VIRTIO_NET_F_SPEED_DUPLEX which being 63>31 depends on VIRTIO_F_VERSION_1. Accordingly, use LE accessors for these fields. Reported-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
e8407fde |
|
22-Jul-2020 |
Andrii Nakryiko <andriin@fb.com> |
bpf, xdp: Remove XDP_QUERY_PROG and XDP_QUERY_PROG_HW XDP commands Now that BPF program/link management is centralized in generic net_device code, kernel code never queries program id from drivers, so XDP_QUERY_PROG/XDP_QUERY_PROG_HW commands are unnecessary. This patch removes all the implementations of those commands in the kernel, along with xdp_attachment_query(). This patch was compile-tested on allyesconfig. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200722064603.3350758-10-andriin@fb.com
|
#
1b698fa5 |
|
28-May-2020 |
Lorenzo Bianconi <lorenzo@kernel.org> |
xdp: Rename convert_to_xdp_frame in xdp_convert_buff_to_frame In order to use the standard 'xdp' prefix, rename the convert_to_xdp_frame utility routine to xdp_convert_buff_to_frame and replace all the occurrences. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/6344f739be0d1a08ab2b9607584c4d5478c8c083.1590698295.git.lorenzo@kernel.org
|
#
9ce6146e |
|
13-May-2020 |
Jesper Dangaard Brouer <brouer@redhat.com> |
virtio_net: Add XDP frame size in two code paths The virtio_net driver is running inside the guest-OS. There are two XDP receive code-paths in virtio_net, namely receive_small() and receive_mergeable(). The receive_big() function does not support XDP. In receive_small() the frame size is available in buflen. The buffer backing these frames are allocated in add_recvbuf_small() with same size, except for the headroom, but tailroom have reserved room for skb_shared_info. The headroom is encoded in ctx pointer as a value. In receive_mergeable() the frame size is more dynamic. There are two basic cases: (1) buffer size is based on a exponentially weighted moving average (see DECLARE_EWMA) of packet length. Or (2) in case virtnet_get_headroom() have any headroom then buffer size is PAGE_SIZE. The ctx pointer is this time used for encoding two values; the buffer len "truesize" and headroom. In case (1) if the rx buffer size is underestimated, the packet will have been split over more buffers (num_buf info in virtio_net_hdr_mrg_rxbuf placed in top of buffer area). If that happens the XDP path does a xdp_linearize_page operation. V3: Adjust frame_sz in receive_mergeable() case, spotted by Jason Wang. The code is really hard to follow, so some hints to reviewers. The receive_mergeable() case gets frames that were allocated in add_recvbuf_mergeable() which uses headroom=virtnet_get_headroom(), and 'buf' ptr is advanced this headroom. The headroom can only be 0 or VIRTIO_XDP_HEADROOM, as virtnet_get_headroom is really simple: static unsigned int virtnet_get_headroom(struct virtnet_info *vi) { return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0; } As frame_sz is an offset size from xdp.data_hard_start, reviewers should notice how this is calculated in receive_mergeable(): int offset = buf - page_address(page); [...] 
data = page_address(xdp_page) + offset; xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len; The calculated offset will always be VIRTIO_XDP_HEADROOM when reaching this code. Thus, xdp.data_hard_start will be page-start address plus vi->hdr_len. Given this xdp.frame_sz need to be reduced with vi->hdr_len size. IMHO a followup patch should cleanup this code to make it easier to maintain and understand, but it is outside the scope of this patchset. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/bpf/158945344436.97035.9445115070189151680.stgit@firesoul
|
#
01c32598 |
|
07-May-2020 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: fix lockdep warning on 32 bit When we fill up a receive VQ, try_fill_recv currently tries to count kicks using a 64 bit stats counter. Turns out, on a 32 bit kernel that uses a seqcount. Sequence counts are "lock" constructs where you need to make sure that writers are serialized. In turn, this means that we mustn't run two try_fill_recv concurrently. Which of course we don't. We do run try_fill_recv sometimes from a softirq napi context, and sometimes from a fully preemptible context, but the latter always runs with napi disabled. However, when it comes to the seqcount, lockdep is trying to enforce the rule that the same lock isn't accessed from preemptible and softirq context - it doesn't know about napi being enabled/disabled. This causes a false-positive warning: WARNING: inconsistent lock state ... inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. As a work around, shut down the warning by switching to u64_stats_update_begin_irqsave - that works by disabling interrupts on 32 bit only, is a NOP on 64 bit. Reported-by: Thomas Gleixner <tglx@linutronix.de> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a51e5206 |
|
04-Mar-2020 |
Jakub Kicinski <kuba@kernel.org> |
virtio_net: reject unsupported coalescing params Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver correctly rejects all unsupported parameters. As a side effect of these changes the error code for unsupported params changes from EINVAL to EOPNOTSUPP. v2: correctly handle rx-frames (and adjust the commit msg) v3: adjust commit message for new error code and member name Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9aedc6e2 |
|
28-Feb-2020 |
Cris Forno <cforno12@linux.vnet.ibm.com> |
net/ethtool: Introduce link_ksettings API for virtual network devices With the ethtool_virtdev_set_link_ksettings function in core/ethtool.c, ibmveth, netvsc, and virtio now use the core's helper function. Functionality changes that pertain to the ibmveth driver include: 1. Changed the initial hardcoded link speed to 1GB. 2. Added support for allowing a user to change the reported link speed via ethtool. Functionality changes to the netvsc driver include: 1. When netvsc_get_link_ksettings is called, it will defer to the VF device if it exists to pull accelerated networking values, otherwise pull default or user-defined values. 2. Similarly, if netvsc_set_link_ksettings is called and a VF device exists, the real values of speed and duplex are changed. Signed-off-by: Cris Forno <cforno12@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
503d539a |
|
24-Feb-2020 |
Yuya Kusakabe <yuya.kusakabe@gmail.com> |
virtio_net: Add XDP meta data support Implement support for transferring XDP meta data into skb for virtio_net driver; before calling into the program, xdp.data_meta points to xdp.data, where on program return with pass verdict, we call into skb_metadata_set(). Tested with the script at https://github.com/higebu/virtio_net-xdp-metadata-test. Signed-off-by: Yuya Kusakabe <yuya.kusakabe@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/bpf/20200225033212.437563-2-yuya.kusakabe@gmail.com
|
#
f1d4884d |
|
24-Feb-2020 |
Yuya Kusakabe <yuya.kusakabe@gmail.com> |
virtio_net: Keep vnet header zeroed if XDP is loaded for small buffer We do not want to care about the vnet header in receive_small() if XDP is loaded, since we can not know whether or not the packet is modified by XDP. Fixes: f6b10209b90d ("virtio-net: switch to use build_skb() for small buffer") Signed-off-by: Yuya Kusakabe <yuya.kusakabe@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/bpf/20200225033212.437563-1-yuya.kusakabe@gmail.com
|
#
9719c6b9 |
|
26-Jan-2020 |
John Fastabend <john.fastabend@gmail.com> |
bpf, xdp: virtio_net use access ptr macro for xdp enable check virtio_net currently relies on rcu critical section to access the xdp program in its xdp_xmit handler. However, the pointer to the xdp program is only used to do a NULL pointer comparison to determine if xdp is enabled or not. Use rcu_access_pointer() instead of rcu_dereference() to reflect this. Then later when we drop rcu_read critical section virtio_net will not need any special handling. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/1580084042-11598-3-git-send-email-john.fastabend@gmail.com
|
#
1d233886 |
|
16-Jan-2020 |
Toke Høiland-Jørgensen <toke@redhat.com> |
xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths Since the bulk queue used by XDP_REDIRECT now lives in struct net_device, we can re-use the bulking for the non-map version of the bpf_redirect() helper. This is a simple matter of having xdp_do_redirect_slow() queue the frame on the bulk queue instead of sending it out with __bpf_tx_xdp(). Unfortunately we can't make the bpf_redirect() helper return an error if the ifindex doesn't exist (as bpf_redirect_map() does), because we don't have a reference to the network namespace of the ingress device at the time the helper is called. So we have to leave it as-is and keep the device lookup in xdp_do_redirect_slow(). This leaves less reason to have the non-map redirect code in a separate function, so we get rid of the xdp_do_redirect_slow() function entirely. This does lose us the tracepoint disambiguation, but fortunately the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint entry structures. This means both can contain a map index, so we can just amend the tracepoint definitions so we always emit the xdp_redirect(_err) tracepoints, but with the map ID only populated if a map is present. This means we retire the xdp_redirect_map(_err) tracepoints entirely, but keep the definitions around in case someone is still listening for them. With this change, the performance of the xdp_redirect sample program goes from 5Mpps to 8.4Mpps (a 68% increase). Since the flush functions are no longer map-specific, rename the flush() functions to drop _map from their names. One of the renamed functions is the xdp_do_flush_map() callback used in all the xdp-enabled drivers. To keep from having to update all drivers, use a #define to keep the old name working, and only update the virtual drivers in this patch.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/157918768505.1458396.17518057312953572912.stgit@toke.dk
|
#
85192dbf |
|
17-Nov-2019 |
Andrii Nakryiko <andriin@fb.com> |
bpf: Convert bpf_prog refcnt to atomic64_t Similarly to bpf_map's refcnt/usercnt, convert bpf_prog's refcnt to atomic64 and remove artificial 32k limit. This allows to make bpf_prog's refcounting non-failing, simplifying logic of users of bpf_prog_add/bpf_prog_inc. Validated compilation by running allyesconfig kernel build. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191117172806.2195367-3-andriin@fb.com
|
#
895b5c9f |
|
29-Sep-2019 |
Florian Westphal <fw@strlen.de> |
netfilter: drop bridge nf reset from nf_reset commit 174e23810cd31 ("sk_buff: drop all skb extensions on free and skb scrubbing") made napi recycle always drop skb extensions. The additional skb_ext_del() that is performed via nf_reset on napi skb recycle is not needed anymore. Most nf_reset() calls in the stack are there so queued skb won't block 'rmmod nf_conntrack' indefinitely. This removes the skb_ext_del from nf_reset, and renames it to a more fitting nf_reset_ct(). In a few selected places, add a call to skb_ext_reset to make sure that no active extensions remain. I am submitting this for "net", because we're still early in the release cycle. The patch applies to net-next too, but I think the rename causes needless divergence between those trees. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
#
718be6ba |
|
19-Aug-2019 |
? jiang <jiangkidd@hotmail.com> |
virtio-net: lower min ring num_free for efficiency This change lowers ring buffer reclaim threshold from 1/2*queue to budget for better performance. According to our test with qemu + dpdk, packet dropping happens when the guest is not able to provide free buffer in avail ring timely with default 1/2*queue. The value in the patch has been tested and does show better performance. Test setup: iperf3 to generate packets to guest (total 30mins, pps 400k, UDP) avg packets drop before: 2842 avg packets drop after: 360(-87.3%) Further, current code suffers from a starvation problem: the amount of work done by try_fill_recv is not bounded by the budget parameter, thus (with large queues) once in a while userspace gets blocked for a long time while queue is being refilled. Trigger refills earlier to make sure the amount of work to do is limited. Signed-off-by: jiangkidd <jiangkidd@hotmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
31c03aef |
|
12-Jun-2019 |
Willem de Bruijn <willemb@google.com> |
virtio_net: enable napi_tx by default NAPI tx mode improves TCP behavior by enabling TCP small queues (TSQ). TSQ reduces queuing ("bufferbloat") and burstiness. Previous measurements have shown significant improvement for TCP_STREAM style workloads. Such as those in commit 86a5df1495cc ("Merge branch 'virtio-net-tx-napi'"). There has been uncertainty about smaller possible regressions in latency due to increased reliance on tx interrupts. The above results did not show that, nor did I observe this when rerunning TCP_RR on Linux 5.1 this week on a pair of guests in the same rack. This may be subject to other settings, notably interrupt coalescing. In the unlikely case of regression, we have landed a credible runtime solution. Ethtool can configure it with -C tx-frames [0|1] as of commit 0c465be183c7 ("virtio_net: ethtool tx napi configuration"). NAPI tx mode has been the default in Google Container-Optimized OS (COS) for over half a year, as of release M70 in October 2018, without any negative reports. Link: https://marc.info/?l=linux-netdev&m=149305618416472 Link: https://lwn.net/Articles/507065/ Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1ccea77e |
|
19-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13 Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details [based] [from] [clk] [highbank] [c] you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 355 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com> Reviewed-by: Steve Winslow <swinslow@gmail.com> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190519154041.837383322@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
7934b481 |
|
02-Apr-2019 |
Yuval Shaia <yuval.shaia@oracle.com> |
virtio-net: Fix some minor formatting errors Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6221333a |
|
03-Apr-2019 |
Yuval Shaia <yuval.shaia@oracle.com> |
virtio-net: Remove inclusion of pci.h This header is not in use - remove it. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6b16f9ee |
|
01-Apr-2019 |
Florian Westphal <fw@strlen.de> |
net: move skb->xmit_more hint to softnet data There are two reasons for this. First, the xmit_more flag conceptually doesn't fit into the skb, as xmit_more is not a property related to the skb. Its only a hint to the driver that the stack is about to transmit another packet immediately. Second, it was only done this way to not have to pass another argument to ndo_start_xmit(). We can place xmit_more in the softnet data, next to the device recursion. The recursion counter is already written to on each transmit. The "more" indicator is placed right next to it. Drivers can use the netdev_xmit_more() helper instead of skb->xmit_more to check the "more packets coming" hint. skb->xmit_more is retained (but always 0) to not cause build breakage. This change takes care of the simple s/skb->xmit_more/netdev_xmit_more()/ conversions. Remaining drivers are converted in the next patches. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
310974fa |
|
18-Mar-2019 |
Peter Xu <peterx@redhat.com> |
virtio_net: remove hcpu from virtnet_clean_affinity The variable is never used. CC: Michael S. Tsirkin <mst@redhat.com> CC: Jason Wang <jasowang@redhat.com> CC: virtualization@lists.linux-foundation.org CC: netdev@vger.kernel.org CC: linux-kernel@vger.kernel.org Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
546f2897 |
|
31-Jan-2019 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
virtio_net: Account for tx bytes and packets on sending xdp_frames Previously virtnet_xdp_xmit() did not account for device tx counters, which caused confusions. To be consistent with SKBs, account them on freeing xdp_frames. Reported-by: David Ahern <dsahern@gmail.com> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5050471d |
|
28-Jan-2019 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
virtio_net: Differentiate sk_buff and xdp_frame on freeing We do not reset or free up unused buffers when enabling/disabling XDP, so it can happen that xdp_frames are freed after disabling XDP or sk_buffs are freed after enabling XDP on xdp tx queues. Thus we need to handle both forms (xdp_frames and sk_buffs) regardless of XDP setting. One way to trigger this problem is to disable XDP when napi_tx is enabled. In that case, virtnet_xdp_set() calls virtnet_napi_enable() which kicks NAPI. The NAPI handler will call virtnet_poll_cleantx() which invokes free_old_xmit_skbs() for queues which have been used by XDP. Note that even with this change we need to keep skipping free_old_xmit_skbs() from NAPI handlers when XDP is enabled, because XDP tx queues do not acquire queue locks. - v2: Use napi_consume_skb() instead of dev_consume_skb_any() Fixes: 4941d472bf95 ("virtio-net: do not reset during XDP set") Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
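A common way to tell two buffer types apart when a single opaque token comes back from a queue is to tag the pointer itself. The following is a minimal userspace sketch of that idea, not the driver's code: the names `XDP_TAG`, `tag_xdp()`, `is_xdp()` and `untag_xdp()` are hypothetical, and the sketch assumes every buffer is at least 2-byte aligned so that bit 0 of a valid pointer is free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tag bit: assumes every buffer is at least 2-byte aligned,
 * so bit 0 of a valid pointer is always zero and can carry the type. */
#define XDP_TAG ((uintptr_t)1)

/* Mark a pointer as referring to an XDP frame before queuing it. */
static inline void *tag_xdp(void *frame)
{
	assert(((uintptr_t)frame & XDP_TAG) == 0);
	return (void *)((uintptr_t)frame | XDP_TAG);
}

/* True if the token was produced by tag_xdp(). */
static inline bool is_xdp(const void *token)
{
	return ((uintptr_t)token & XDP_TAG) != 0;
}

/* Recover the original frame pointer from a tagged token. */
static inline void *untag_xdp(void *token)
{
	return (void *)((uintptr_t)token & ~XDP_TAG);
}
```

With such a scheme the free path can check `is_xdp()` on each completed token and pick the matching release routine, regardless of whether XDP is currently enabled.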
|
#
07b344f4 |
|
28-Jan-2019 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
virtio_net: Use xdp_return_frame to free xdp_frames on destroying vqs put_page() can work as a fallback for freeing xdp_frames, but the appropriate way is to use xdp_return_frame(). Fixes: cac320c850ef ("virtio_net: convert to use generic xdp_frame and xdp_return_frame API") Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
03aa6d34 |
|
28-Jan-2019 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
virtio_net: Don't process redirected XDP frames when XDP is disabled Commit 8dcc5b0ab0ec ("virtio_net: fix ndo_xdp_xmit crash towards dev not ready for XDP") tried to avoid access to unexpected sq while XDP is disabled, but was not complete. There was a small window which causes out of bounds sq access in virtnet_xdp_xmit() while disabling XDP. An example case with curr_queue_pairs = 6 (2 for SKB and 4 for XDP) and online_cpu_num = xdp_queue_pairs = 4 when XDP is enabled, where CPU 0 is disabling XDP while CPU 1 is processing redirected XDP frames: 1) CPU 1 enters virtnet_xdp_xmit(). 2) CPU 0, in virtnet_xdp_set() -> _virtnet_set_queues(), sets curr_queue_pairs to 2. 3) CPU 1 checks that rq->xdp_prog is not NULL and calls virtnet_xdp_sq(vi), which computes qp = curr_queue_pairs - xdp_queue_pairs + smp_processor_id() = 2 - 4 + 1 = -1 and then sq = &vi->sq[qp], an out of bounds access. 4) CPU 0 sets xdp_queue_pairs to 0 and rq->xdp_prog = NULL. Basically we should not change curr_queue_pairs and xdp_queue_pairs while someone can read the values. Thus, when disabling XDP, assign NULL to rq->xdp_prog first, and wait for RCU grace period, then change xxx_queue_pairs. Note that we need to keep the current order when enabling XDP though. - v2: Make rcu_assign_pointer/synchronize_net conditional instead of _virtnet_set_queues. Fixes: 186b3c998c50 ("virtio-net: support XDP_REDIRECT") Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
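The index arithmetic in the example can be checked in isolation. The sketch below reproduces only the formula quoted in the message; `xdp_sq_index()` and the guard `sq_index_valid()` are hypothetical helper names for illustration, and the guard is not the fix the commit applies (the commit reorders the updates and waits for an RCU grace period).

```c
#include <assert.h>
#include <stdbool.h>

/* Index computed as in the commit message:
 * curr_queue_pairs - xdp_queue_pairs + cpu. */
static int xdp_sq_index(int curr_queue_pairs, int xdp_queue_pairs, int cpu)
{
	return curr_queue_pairs - xdp_queue_pairs + cpu;
}

/* Hypothetical guard: true only if the index addresses a valid queue. */
static bool sq_index_valid(int idx, int max_queue_pairs)
{
	assert(max_queue_pairs >= 0);
	return idx >= 0 && idx < max_queue_pairs;
}
```

With the consistent values (6, 4) CPU 1 maps to queue 3, while the torn read (2, 4) yields -1, which is why the two fields must not change while a reader can still observe them.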
|
#
1667c08a |
|
28-Jan-2019 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
virtio_net: Fix out of bounds access of sq When XDP is disabled, curr_queue_pairs + smp_processor_id() can be larger than max_queue_pairs. There is no guarantee that we have enough XDP send queues dedicated for each cpu when XDP is disabled, so do not count drops on sq in that case. Fixes: 5b8f3c8d30a6 ("virtio_net: Add XDP related stats") Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
188313c1 |
|
28-Jan-2019 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
virtio_net: Fix not restoring real_num_rx_queues When _virtnet_set_queues() failed we did not restore real_num_rx_queues. Fix this by placing the change of real_num_rx_queues after _virtnet_set_queues(). This order is also in line with virtnet_set_channels(). Fixes: 4941d472bf95 ("virtio-net: do not reset during XDP set") Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
534da5e8 |
|
28-Jan-2019 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
virtio_net: Don't call free_old_xmit_skbs for xdp_frames When napi_tx is enabled, virtnet_poll_cleantx() called free_old_xmit_skbs() even for xdp send queue. This is bogus since the queue has xdp_frames, not sk_buffs. It mangled the device tx bytes counters, because skb->len is a meaningless value there, and even triggered an oops due to a general protection fault on freeing them. Since xdp send queues do not acquire locks, old xdp_frames should be freed only in virtnet_xdp_xmit(), so just skip free_old_xmit_skbs() for xdp send queues. Similarly virtnet_poll_tx() called free_old_xmit_skbs(). This NAPI handler is called even without calling start_xmit() because cb for tx is by default enabled. Once the handler is called, it enabled the cb again, and then the handler would be called again. We don't need this handler for XDP, so don't enable cb as well as not calling free_old_xmit_skbs(). Also, we need to disable tx NAPI when disabling XDP, so virtnet_poll_tx() can safely access curr_queue_pairs and xdp_queue_pairs, which are not atomically updated while disabling XDP. Fixes: b92f1e6751a6 ("virtio-net: transmit napi") Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi") Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8be4d9a4 |
|
28-Jan-2019 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
virtio_net: Don't enable NAPI when interface is down Commit 4e09ff536284 ("virtio-net: disable NAPI only when enabled during XDP set") tried to fix inappropriate NAPI enabling/disabling when !netif_running(), but was not complete. On error path virtio_net could enable NAPI even when !netif_running(). This can cause enabling NAPI twice on virtnet_open(), which would trigger BUG_ON() in napi_enable(). Fixes: 4941d472bf95b ("virtio-net: do not reset during XDP set") Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
df133f3f |
|
17-Jan-2019 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: bulk free tx skbs Use napi_consume_skb() to get bulk free. Note that napi_consume_skb is safe to call in a non-napi context as long as the napi_budget flag is correct. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
133bbb18 |
|
17-Jan-2019 |
Willem de Bruijn <willemb@google.com> |
virtio-net: per-queue RPS config On multiqueue network devices, RPS maps are configured independently for each receive queue through /sys/class/net/$DEV/queues/rx-*. On virtio-net currently all packets use the map from rx-0, because the real rx queue is not known at time of map lookup by get_rps_cpu. Call skb_record_rx_queue in the driver rx path to make lookup work. Recording the receive queue has ramifications beyond RPS, such as in sticky load balancing decisions for sockets (skb_tx_hash) and XPS. Reported-by: Mark Hlady <mhlady@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a02e8964 |
|
20-Dec-2018 |
Willem de Bruijn <willemb@google.com> |
virtio-net: ethtool configurable LRO Virtio-net devices negotiate LRO support with the host. Display the initially negotiated state with ethtool -k. Also allow configuring it with ethtool -K, reusing the existing virtnet_set_guest_offloads helper that configures LRO for XDP. This is conditional on VIRTIO_NET_F_CTRL_GUEST_OFFLOADS. Virtio-net negotiates TSO4 and TSO6 separately, but ethtool does not distinguish between the two. Display LRO as on only if any offload is active. RTNL is held while calling virtnet_set_features, same as on the path from virtnet_xdp_set. Changes v1 -> v2 - allow ethtool config (-K) only if VIRTIO_NET_F_CTRL_GUEST_OFFLOADS - show LRO as enabled if any LRO variant is enabled - do not allow configuration while XDP is active - differentiate current features from the capable set, to restore on XDP down only those features that were active on XDP up - move test out of VIRTIO_NET_F_CSUM/TSO branch, which is tx only Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
436c9453 |
|
28-Nov-2018 |
Jason Wang <jasowang@redhat.com> |
virtio-net: keep vnet header zeroed after processing XDP We copy the vnet header unconditionally in page_to_skb(). This is wrong since XDP may modify the packet data. So let's keep a zeroed vnet header so as not to confuse the conversion between vnet header and skb metadata. In the future, we should be able to detect whether or not the packet was modified and keep using the vnet header when the packet was not touched. Fixes: f600b6905015 ("virtio_net: Add XDP support") Reported-by: Pavel Popa <pashinho1990@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
18ba58e1 |
|
21-Nov-2018 |
Jason Wang <jasowang@redhat.com> |
virtio-net: fail XDP set if guest csum is negotiated We don't support partial csumed packet since its metadata will be lost or incorrect during XDP processing. So fail the XDP set if guest_csum feature is negotiated. Fixes: f600b6905015 ("virtio_net: Add XDP support") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Pavel Popa <pashinho1990@gmail.com> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e59ff2c4 |
|
21-Nov-2018 |
Jason Wang <jasowang@redhat.com> |
virtio-net: disable guest csum during XDP set We don't disable VIRTIO_NET_F_GUEST_CSUM if XDP was set. This means we can receive partial csumed packets with metadata kept in the vnet_hdr. This may have several side effects: - It could be overridden by header adjustment, thus it might not be correct after XDP processing. - There's no way to pass such metadata information through XDP_REDIRECT to another driver. - XDP does not support checksum offload right now. So simply disable guest csum if possible in the case of XDP. Fixes: 3f93522ffab2d ("virtio-net: switch off offloads on demand if possible on XDP set") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Pavel Popa <pashinho1990@gmail.com> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
05c998b7 |
|
17-Oct-2018 |
Ake Koomsin <ake@igel.co.jp> |
virtio_net: avoid using netif_tx_disable() for serializing tx routine Commit 713a98d90c5e ("virtio-net: serialize tx routine during reset") introduces netif_tx_disable() after netif_device_detach() in order to avoid use-after-free of tx queues. However, there are two issues. 1) Its operation is redundant with netif_device_detach() in case the interface is running. 2) In case the interface is not running before suspending and resuming, the tx does not get resumed by netif_device_attach(). This results in losing network connectivity. It is better to use netif_tx_lock_bh()/netif_tx_unlock_bh() instead for serializing tx routine during reset. This also preserves the symmetry of netif_device_detach() and netif_device_attach(). Fixes: 713a98d90c5e ("virtio-net: serialize tx routine during reset") Signed-off-by: Ake Koomsin <ake@igel.co.jp> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0c465be1 |
|
08-Oct-2018 |
Jason Wang <jasowang@redhat.com> |
virtio_net: ethtool tx napi configuration Implement ethtool .set_coalesce (-C) and .get_coalesce (-c) handlers. Interrupt moderation is currently not supported, so these accept and display the default settings of 0 usec and 1 frame. Toggle tx napi through setting tx-frames. So as to not interfere with possible future interrupt moderation, value 1 means tx napi while value 0 means not. Only allow the switching when device is down for simplicity. Link: https://patchwork.ozlabs.org/patch/948149/ Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
260dd2c3 |
|
27-Sep-2018 |
Eric Dumazet <edumazet@google.com> |
virtio_net: remove ndo_poll_controller As diagnosed by Song Liu, ndo_poll_controller() can be very dangerous on loaded hosts, since the cpu calling ndo_poll_controller() might steal all NAPI contexts (for all RX/TX queues of the NIC). This capture can last for an unlimited amount of time, since one cpu is generally not able to drain all the queues under load. virtio_net uses NAPI for TX completions, so we better let core networking stack call the napi->poll() to avoid the capture. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1150827b |
|
13-Aug-2018 |
YueHaibing <yuehaibing@huawei.com> |
virtio_net: remove duplicated include from virtio_net.c Remove duplicated include linux/netdevice.h Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2ca653d6 |
|
09-Aug-2018 |
Caleb Raitto <caraitto@google.com> |
virtio_net: Stripe queue affinities across cores. Always set the affinity hint, even if #cpu != #vq. Handle the case where #cpu > #vq (including when #cpu % #vq != 0) and when #vq > #cpu (including when #vq % #cpu != 0). Signed-off-by: Caleb Raitto <caraitto@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Jon Olson <jonolson@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
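The cases listed above (#cpu > #vq with a remainder, and #vq > #cpu) can be illustrated with one plausible striping scheme. This is a userspace sketch under stated assumptions, not necessarily the driver's exact logic: CPUs are numbered 0..num_cpu-1 with no holes, `stripe_affinity()`, `MAX_Q` and `MAX_CPU` are hypothetical names, and a plain byte matrix stands in for a cpumask.

```c
#include <assert.h>

/* Hypothetical fixed bounds keep the sketch simple. */
#define MAX_Q 8
#define MAX_CPU 16

/* Assign CPUs 0..num_cpu-1 to queues in contiguous groups.
 * owner[q][c] is set to 1 if CPU c is in queue q's affinity set. */
static void stripe_affinity(int num_cpu, int num_vq,
			    unsigned char owner[MAX_Q][MAX_CPU])
{
	int stride, stragglers, cpu = 0;

	assert(num_cpu > 0 && num_cpu <= MAX_CPU);
	assert(num_vq > 0 && num_vq <= MAX_Q);

	/* Each queue gets at least one CPU. */
	stride = num_cpu / num_vq > 1 ? num_cpu / num_vq : 1;
	/* Leftover CPUs when #cpu % #vq != 0 go to the first queues. */
	stragglers = num_cpu >= num_vq ? num_cpu % num_vq : 0;

	for (int q = 0; q < num_vq; q++) {
		int group = stride + (q < stragglers ? 1 : 0);

		for (int c = 0; c < MAX_CPU; c++)
			owner[q][c] = 0;
		for (int j = 0; j < group; j++) {
			owner[q][cpu] = 1;
			cpu = (cpu + 1) % num_cpu;	/* wrap when #vq > #cpu */
		}
	}
}
```

For 5 CPUs and 2 queues this gives queue 0 the CPUs {0, 1, 2} and queue 1 the CPUs {3, 4}; for 2 CPUs and 4 queues the queues alternate between CPU 0 and CPU 1.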
|
#
19e226e8 |
|
09-Aug-2018 |
Caleb Raitto <caraitto@google.com> |
virtio: Make vp_set_vq_affinity() take a mask. Make vp_set_vq_affinity() take a cpumask instead of taking a single CPU. If there are fewer queues than cores, queue affinity should be able to map to multiple cores. Link: https://patchwork.ozlabs.org/patch/948149/ Suggested-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Caleb Raitto <caraitto@google.com> Acked-by: Gonglei <arei.gonglei@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4d99f660 |
|
08-Aug-2018 |
Andrei Vagin <avagin@gmail.com> |
net: allow to call netif_reset_xps_queues() under cpus_read_lock The definition of static_key_slow_inc() has cpus_read_lock in place. In the virtio_net driver, XPS queues are initialized after setting the queue:cpu affinity in virtnet_set_affinity() which is already protected within cpus_read_lock. Lockdep prints a warning when we are trying to acquire cpus_read_lock when it is already held. This patch adds an ability to call __netif_set_xps_queue under cpus_read_lock(). Acked-by: Jason Wang <jasowang@redhat.com> ============================================ WARNING: possible recursive locking detected 4.18.0-rc3-next-20180703+ #1 Not tainted -------------------------------------------- swapper/0/1 is trying to acquire lock: 00000000cf973d46 (cpu_hotplug_lock.rw_sem){++++}, at: static_key_slow_inc+0xe/0x20 but task is already holding lock: 00000000cf973d46 (cpu_hotplug_lock.rw_sem){++++}, at: init_vqs+0x513/0x5a0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(cpu_hotplug_lock.rw_sem); lock(cpu_hotplug_lock.rw_sem); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by swapper/0/1: #0: 00000000244bc7da (&dev->mutex){....}, at: __driver_attach+0x5a/0x110 #1: 00000000cf973d46 (cpu_hotplug_lock.rw_sem){++++}, at: init_vqs+0x513/0x5a0 #2: 000000005cd8463f (xps_map_mutex){+.+.}, at: __netif_set_xps_queue+0x8d/0xc60 v2: move cpus_read_lock() out of __netif_set_xps_queue() Cc: "Nambiar, Amritha" <amritha.nambiar@intel.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Fixes: 8af2c06ff4b1 ("net-sysfs: Add interface for Rx queue(s) map per Tx queue") Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b633d440 |
|
04-Aug-2018 |
Gustavo A. R. Silva <gustavo@embeddedor.com> |
virtio-net: mark expected switch fall-throughs In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Addresses-Coverity-ID: 1402059 ("Missing break in switch") Addresses-Coverity-ID: 1402060 ("Missing break in switch") Addresses-Coverity-ID: 1402061 ("Missing break in switch") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d46eeeaf |
|
31-Jul-2018 |
Jason Wang <jasowang@redhat.com> |
virtio-net: get rid of unnecessary container of rq stats We don't maintain tx counters in rx stats any more. There's no need for an extra container of rq stats. Cc: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ca9e83b4 |
|
31-Jul-2018 |
Jason Wang <jasowang@redhat.com> |
virtio-net: correctly update XDP_TX counters Commit 5b8f3c8d30a6 ("virtio_net: Add XDP related stats") tries to count TX XDP stats in virtnet_receive(). This will cause several issues: - virtnet_xdp_sq() was called without checking whether or not XDP is set. This may cause out of bound access when there are not enough txqs for XDP. - Stats were updated even if there's no XDP/XDP_TX. Fix this by reusing virtnet_xdp_xmit() for XDP_TX, which can count the TX XDP counters itself, and remove the unnecessary tx stats embedded in rx stats. Reported-by: syzbot+604f8271211546f5b3c7@syzkaller.appspotmail.com Fixes: 5b8f3c8d30a6 ("virtio_net: Add XDP related stats") Cc: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ecbc42ca |
|
23-Jul-2018 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
virtio_net: Fix incosistent received bytes counter When received packets are dropped in virtio_net driver, received packets counter is incremented but bytes counter is not. As a result, for instance if we drop all packets by XDP, only received is counted and bytes stays 0, which looks inconsistent. IMHO received packets/bytes should be counted if packets are produced by the hypervisor, like what common NICs on physical machines are doing. So fix the bytes counter. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
461f03dc |
|
23-Jul-2018 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
virtio_net: Add kick stats So we can infer the number of VM-Exits. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5b8f3c8d |
|
23-Jul-2018 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
virtio_net: Add XDP related stats Add counters below: * Tx - xdp_tx: frames sent by ndo_xdp_xmit or XDP_TX. - xdp_tx_drops: dropped frames out of xdp_tx ones. * Rx - xdp_packets: frames went through xdp program. - xdp_tx: XDP_TX frames. - xdp_redirects: XDP_REDIRECT frames. - xdp_drops: any dropped frames out of xdp_packets ones. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2a43565c |
|
23-Jul-2018 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
virtio_net: Factor out the logic to determine xdp sq Make sure to use the same logic in all places to determine xdp sq. This is useful for xdp counters which the following commit will introduce as well. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2c4a2f7d |
|
23-Jul-2018 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
virtio_net: Make drop counter per-queue Since XDP was introduced, the drop counter can be updated much more frequently than before, as XDP_DROP increments the counter. Thus for performance analysis a per-queue drop counter would be useful. Also this avoids cache contention and races on updating the counter. It is currently racy because napi handlers read-modify-write it without any locks. There are more counters in dev->stats that are racy, but I left them per-device, because they are rarely updated and are not worth making per-queue counters IMHO. To fix them we need atomic ops or some kind of locks. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
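The per-queue idea can be sketched in a few lines: each queue owns its counter, so the read-modify-write in one handler never touches memory that another handler writes, and the device-wide figure is derived by summing. This is an illustrative userspace sketch only; `NUM_QUEUES`, `struct rq_stats`, `rq_count_drop()` and `total_drops()` are hypothetical names, and the sketch leaves out whatever synchronization the driver uses to give readers a consistent snapshot.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_QUEUES 4	/* hypothetical queue count for the sketch */

/* One counter block per receive queue; each is written only by the
 * handler that owns that queue, so writers never share a counter. */
struct rq_stats {
	uint64_t drops;
};

static struct rq_stats rq_stats[NUM_QUEUES];

/* Called from the owning queue's handler only. */
static void rq_count_drop(int q)
{
	assert(q >= 0 && q < NUM_QUEUES);
	rq_stats[q].drops++;
}

/* Reader side: the device-wide total is the sum over queues. */
static uint64_t total_drops(void)
{
	uint64_t sum = 0;

	for (int q = 0; q < NUM_QUEUES; q++)
		sum += rq_stats[q].drops;
	return sum;
}
```

Besides removing the write race, keeping the counters separate lets a per-queue view show which queue is dropping.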
|
#
a0929a44 |
|
23-Jul-2018 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
virtio_net: Use temporary storage for accounting rx stats The purpose is to keep receive_buf arguments simple when more per-queue counter items are added later. Also XDP_TX related sq counters will be updated in the following changes so create a container struct virtnet_rx_stats which will include both rq and sq statistics. For now it only covers rq stats. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7d9d60fd |
|
23-Jul-2018 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
virtio_net: Fix incosistent received bytes counter When received packets are dropped in virtio_net driver, received packets counter is incremented but bytes counter is not. As a result, for instance if we drop all packets by XDP, only received is counted and bytes stays 0, which looks inconsistent. IMHO received packets/bytes should be counted if packets are produced by the hypervisor, like what common NICs on physical machines are doing. So fix the bytes counter. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6b867589 |
|
11-Jul-2018 |
Jakub Kicinski <kuba@kernel.org> |
xdp: don't make drivers report attachment mode prog_attached of struct netdev_bpf should have been superseded by simply setting prog_id long time ago, but we kept it around to allow offloading drivers to communicate attachment mode (drv vs hw). Subsequently drivers were also allowed to report back attachment flags (prog_flags), and since nowadays only programs attached with XDP_FLAGS_HW_MODE can get offloaded, we can tell the attachment mode from the flags driver reports. Remove prog_attached member. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
2471c75e |
|
26-Jun-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
virtio_net: split XDP_TX kick and XDP_REDIRECT map flushing The driver was combining XDP_TX virtqueue_kick and XDP_REDIRECT map flushing (xdp_do_flush_map). This is suboptimal, these two flush operations should be kept separate. The suboptimal behavior was introduced in commit 9267c430c6b6 ("virtio-net: add missing virtqueue kick when flushing packets"). Fixes: 9267c430c6b6 ("virtio-net: add missing virtqueue kick when flushing packets") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6396bb22 |
|
12-Jun-2018 |
Kees Cook <keescook@chromium.org> |
treewide: kzalloc() -> kcalloc() The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a, b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kcalloc(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...)
| - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) 
| kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) 
| - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
|
#
6da2ec56 |
|
12-Jun-2018 |
Kees Cook <keescook@chromium.org> |
treewide: kmalloc() -> kmalloc_array() The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a, b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(char) * COUNT + COUNT , ...) | kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) 
| - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) 
| kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) | kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) | kmalloc(sizeof(TYPE) * C2, ...) | kmalloc(C1 * C2 * C3, ...) | kmalloc(C1 * C2, ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) 
| - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) | - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) | - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
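The point of the 2-factor form is that the multiplication can be checked for overflow before any memory is handed out. A minimal userspace sketch of that idea follows; `demo_malloc_array()` is a hypothetical helper built on `malloc()`, not the kernel implementation.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace analogue of a 2-factor allocator: refuse the
 * request when n * size would overflow size_t, instead of silently
 * allocating a buffer that is too small.
 */
static void *demo_malloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* product would wrap around */
	return malloc(n * size);
}
```

With an open-coded `malloc(n * size)` the first request in the test below would wrap and succeed with a tiny buffer; with the checked form it fails cleanly.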
|
#
fd3a8862 |
|
06-Jun-2018 |
Willem de Bruijn <willemb@google.com> |
net: in virtio_net_hdr only add VLAN_HLEN to csum_start if payload holds vlan Tun, tap, virtio, packet and uml vector all use struct virtio_net_hdr to communicate packet metadata to userspace. For skbuffs with vlan, the first two return the packet as it may have existed on the wire, inserting the VLAN tag in the user buffer. Then virtio_net_hdr.csum_start needs to be adjusted by VLAN_HLEN bytes. Commit f09e2249c4f5 ("macvtap: restore vlan header on user read") added this feature to macvtap. Commit 3ce9b20f1971 ("macvtap: Fix csum_start when VLAN tags are present") then fixed up csum_start. Virtio, packet and uml do not insert the vlan header in the user buffer. When introducing virtio_net_hdr_from_skb to deduplicate filling in the virtio_net_hdr, the variant from macvtap which adds VLAN_HLEN was applied uniformly, breaking csum offset for packets with vlan on virtio and packet. Make insertion of VLAN_HLEN optional. Convert the callers to pass it when needed. Fixes: e858fae2b0b8f4 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion") Fixes: 1276f24eeef2 ("packet: use common code for virtio_net_hdr and skb GSO conversion") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
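The adjustment described above can be sketched in a few lines. This is an illustration only: `demo_csum_start()` and `DEMO_VLAN_HLEN` are hypothetical names, and the 4-byte value mirrors the size of an 802.1Q tag.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_VLAN_HLEN 4	/* size of one 802.1Q VLAN tag in bytes */

/*
 * Only shift csum_start when the VLAN tag was actually inserted into
 * the buffer handed to userspace (the tun/tap case in the message).
 */
static uint16_t demo_csum_start(uint16_t csum_start, bool payload_has_vlan)
{
	return (uint16_t)(csum_start + (payload_has_vlan ? DEMO_VLAN_HLEN : 0));
}
```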
|
#
16ec025a |
|
05-Jun-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
virtio_net: remove ndo_xdp_flush call virtnet_xdp_flush Remove the ndo_xdp_flush call implementation virtnet_xdp_flush as no callers of ndo_xdp_flush are left. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
2fa3c8a8 |
|
31-May-2018 |
Tonghao Zhang <xiangxia.m.yue@gmail.com> |
net: virtio: simplify the virtnet_find_vqs Use the common free path when returning successfully. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5d274cb4 |
|
31-May-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
virtio_net: implement flush flag for ndo_xdp_xmit When passed the XDP_XMIT_FLUSH flag virtnet_xdp_xmit now performs the same virtqueue_kick as virtnet_xdp_flush. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
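A rough sketch of the flag handling follows, assuming a transmit routine that queues frames and only notifies the device when asked to. All `demo_` and `DEMO_` names are hypothetical stand-ins, and the kick counter replaces the real virtqueue notification.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_XMIT_FLUSH		(1U << 0)	/* stand-in for XDP_XMIT_FLUSH */
#define DEMO_XMIT_FLAGS_MASK	DEMO_XMIT_FLUSH

struct demo_queue {
	int queued;	/* frames placed on the ring */
	int kicks;	/* device notifications issued */
};

/* Queue n frames; kick the device only when the flush flag is set. */
static int demo_xdp_xmit(struct demo_queue *q, int n, uint32_t flags)
{
	if (flags & ~DEMO_XMIT_FLAGS_MASK)
		return -EINVAL;	/* unknown flags are rejected */

	q->queued += n;
	if (flags & DEMO_XMIT_FLUSH)
		q->kicks++;
	return n;
}
```

A caller can then submit several batches without the flag and set it only on the last one, so that a single notification covers all of them.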
|
#
42b33468 |
|
31-May-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
xdp: add flags argument to ndo_xdp_xmit API This patch only changes the API and rejects any use of flags. This is an intermediate step that allows us to implement the flush flag operation later, for each individual driver in a separate patch. The plan is to implement the flush operation via the XDP_XMIT_FLUSH flag and then remove XDP_XMIT_FLAGS_NONE when done. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
4b8e6ac4 |
|
30-May-2018 |
Wei Yongjun <weiyongjun1@huawei.com> |
virtio_net: fix error return code in virtnet_probe() Fix to return a negative error code from the failover create fail error handling case instead of 0, as done elsewhere in this function. Fixes: ba5e4426e80e ("virtio_net: Extend virtio to use VF datapath when available") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ba5e4426 |
|
24-May-2018 |
Sridhar Samudrala <sridhar.samudrala@intel.com> |
virtio_net: Extend virtio to use VF datapath when available This patch enables virtio_net to switch over to a VF datapath when the STANDBY feature is enabled and a VF netdev is present with the same MAC address. It allows live migration of a VM with a direct attached VF without the need to set up a bond/team between a VF and virtio net device in the guest. It uses the API that is exported by the net_failover driver to create and destroy a master failover netdev. When the STANDBY feature is enabled, an additional netdev (failover netdev) is created that acts as a master device and tracks the state of the 2 lower netdevs. The original virtio_net netdev is marked as 'standby' netdev and a passthru device with the same MAC is registered as 'primary' netdev. The hypervisor needs to unplug the VF device from the guest on the source host and reset the MAC filter of the VF to initiate failover of datapath to virtio before starting the migration. After the migration is completed, the destination hypervisor sets the MAC filter on the VF and plugs it back to the guest to switch over to VF datapath. This patch is based on the discussion initiated by Jesse on this thread. https://marc.info/?l=linux-virtualization&m=151189725224231&w=2 Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9805069d |
|
24-May-2018 |
Sridhar Samudrala <sridhar.samudrala@intel.com> |
virtio_net: Introduce VIRTIO_NET_F_STANDBY feature bit This feature bit can be used by hypervisor to indicate virtio_net device to act as a standby for another device with the same MAC address. VIRTIO_NET_F_STANDBY is defined as bit 62 as it is a device feature bit. Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
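As an illustration of how a numbered feature bit is tested against a 64-bit feature mask, here is a small sketch. The bit number 62 comes from the message above; the helper and macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_VIRTIO_NET_F_STANDBY 62	/* bit number from the commit message */

/* Return true when the given feature bit is set in a 64-bit mask. */
static bool demo_has_feature(uint64_t features, unsigned int bit)
{
	return (features >> bit) & 1ULL;
}
```

Note that the mask has to be built with a 64-bit constant (`1ULL << 62`); a plain `1 << 62` would overflow an `int`.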
|
#
735fc405 |
|
24-May-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
xdp: change ndo_xdp_xmit API to support bulking This patch changes the API for ndo_xdp_xmit to support bulking xdp_frames. When the kernel is compiled with CONFIG_RETPOLINE, XDP sees a huge slowdown. Most of the slowdown is caused by DMA API indirect function calls, but also the net_device->ndo_xdp_xmit() call. Benchmarking the patch with CONFIG_RETPOLINE, using xdp_redirect_map with a single flow/core test (CPU E5-1650 v4 @ 3.60GHz), showed improved performance: for driver ixgbe: 6,042,682 pps -> 6,853,768 pps = +811,086 pps for driver i40e : 6,187,169 pps -> 6,724,519 pps = +537,350 pps With frames available as a bulk inside the driver ndo_xdp_xmit call, further optimizations are possible, like bulk DMA-mapping for TX. Testing without CONFIG_RETPOLINE shows the same performance for physical NIC drivers. The virtual NIC driver tun sees a huge performance boost, as it can avoid doing per frame producer locking, but instead amortize the locking cost over the bulk. V2: Fix compile errors reported by kbuild test robot <lkp@intel.com> V4: Isolated ndo, driver changes and callers. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
3d62b2a0 |
|
21-May-2018 |
Jason Wang <jasowang@redhat.com> |
virtio-net: fix leaking page for gso packet during mergeable XDP We need to drop refcnt to xdp_page if we see a gso packet. Otherwise it will be leaked. Fixing this by moving the check of gso packet above the linearizing logic. While at it, remove useless comment as well. Cc: John Fastabend <john.fastabend@gmail.com> Fixes: 72979a6c3590 ("virtio_net: xdp, add slowpath case for non contiguous buffers") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
850e088d |
|
21-May-2018 |
Jason Wang <jasowang@redhat.com> |
virtio-net: correctly check num_buf during err path If we successfully linearize the packet, num_buf will be set to zero. This may confuse the error handling path, which assumes num_buf is at least 1, and can lead the code to try to pop the descriptor of the next buffer. Fix this by checking num_buf against 1 before decreasing. Fixes: 4941d472bf95 ("virtio-net: do not reset during XDP set") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
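A sketch of why the comparison matters follows. It assumes the old error path used a pre-decrement countdown loop; that loop shape, the function names, and the `avail` counter standing in for buffers left in the virtqueue are all assumptions made for illustration.

```c
/* Old shape (assumed): with num_buf == 0 the counter goes negative,
 * the condition stays true, and buffers of the next packet get popped. */
static int demo_drain_old(int num_buf, int avail)
{
	int popped = 0;

	while (--num_buf) {
		if (avail == 0)
			break;	/* nothing left to pop */
		avail--;
		popped++;
	}
	return popped;
}

/* Fixed shape: compare against 1 before decreasing, so num_buf == 0
 * pops nothing at all. */
static int demo_drain_new(int num_buf, int avail)
{
	int popped = 0;

	while (num_buf-- > 1) {
		if (avail == 0)
			break;
		avail--;
		popped++;
	}
	return popped;
}
```

Both variants agree for num_buf >= 1; they only differ in the num_buf == 0 case that linearizing produces.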
|
#
5d458a13 |
|
21-May-2018 |
Jason Wang <jasowang@redhat.com> |
virtio-net: correctly transmit XDP buff after linearizing We should not go for the error path after successfully transmitting an XDP buffer after linearizing, since the error path may try to pop and drop the next packet and increase the drop counters. Fix this by simply dropping the refcnt of the original page and going for the xmit path. Fixes: 72979a6c3590 ("virtio_net: xdp, add slowpath case for non contiguous buffers") Cc: John Fastabend <john.fastabend@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6890418b |
|
21-May-2018 |
Jason Wang <jasowang@redhat.com> |
virtio-net: correctly redirect linearized packet After a linearized packet was redirected by XDP, we should not go for the err path, which will try to pop buffers for the next packet and increase the drop counter. Fix this by just dropping the page refcnt for the original page. Fixes: 186b3c998c50 ("virtio-net: support XDP_REDIRECT") Reported-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
aaa64527 |
|
22-Apr-2018 |
Nikita V. Shirokov <tehnerd@tehnerd.com> |
bpf: fix virtio-net's length calc for XDP_PASS In commit 6870de435b90 ("bpf: make virtio compatible w/ bpf_xdp_adjust_tail") I didn't account for vi->hdr_len during the new packet's length calculation after bpf_prog_run in receive_mergeable. Because of this, all packets, if they were passed to the kernel, were truncated by 12 bytes. Fixes: 6870de435b90 ("bpf: make virtio compatible w/ bpf_xdp_adjust_tail") Reported-by: David Ahern <dsahern@gmail.com> Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
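The arithmetic can be sketched as follows, assuming the buffer starts with the virtio header followed by the packet data. `demo_pass_len()` is a hypothetical helper, and the 12-byte header length in the test is taken from the truncation size quoted in the message.

```c
#include <stddef.h>

/*
 * Length of the buffer handed to the stack after the XDP program ran:
 * the (possibly adjusted) packet data plus the virtio header in front
 * of it. Leaving out hdr_len truncates the result by that many bytes.
 */
static size_t demo_pass_len(const unsigned char *data,
			    const unsigned char *data_end,
			    size_t hdr_len)
{
	return (size_t)(data_end - data) + hdr_len;
}
```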
|
#
f4ee703a |
|
18-Apr-2018 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: sparse annotation fix offloads is a buffer in virtio format, should use the __virtio64 tag. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d7fad4c8 |
|
18-Apr-2018 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: fix adding vids on big-endian Programming vids (adding or removing them) still passes guest-endian values in the DMA buffer. That's wrong if guest is big-endian and when virtio 1 is enabled. Note: this is on top of a previous patch: virtio_net: split out ctrl buffer Fixes: 9465a7a6f ("virtio_net: enable v1.0 support") Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
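To illustrate the byte-order rule, here is a userspace sketch that stores a 16-bit VLAN id into a device buffer. It assumes, as the message implies, that a virtio 1 device expects little-endian fields while a legacy device takes guest-native order; `demo_put_virtio16()` is a hypothetical stand-in for the kernel's conversion helpers.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Store a 16-bit value for the device: little-endian when virtio 1 is
 * negotiated, guest-native byte order otherwise.
 */
static void demo_put_virtio16(unsigned char out[2], uint16_t val,
			      bool virtio_v1)
{
	if (virtio_v1) {
		out[0] = (unsigned char)(val & 0xff);	/* low byte first */
		out[1] = (unsigned char)(val >> 8);
	} else {
		memcpy(out, &val, sizeof(val));		/* native order */
	}
}
```

On a little-endian guest both branches produce the same bytes, which is why the bug only showed up on big-endian guests.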
|
#
12e57169 |
|
18-Apr-2018 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: split out ctrl buffer When sending control commands, virtio net sets up several buffers for DMA. The buffers are all part of the net device which means it's actually allocated by kvmalloc so it's in theory (on extreme memory pressure) possible to get a vmalloc'ed buffer which on some platforms means we can't DMA there. Fix up by moving the DMA buffers into a separate structure. Reported-by: Mikulas Patocka <mpatocka@redhat.com> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6870de43 |
|
17-Apr-2018 |
Nikita V. Shirokov <tehnerd@tehnerd.com> |
bpf: make virtio compatible w/ bpf_xdp_adjust_tail w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as well (only "decrease" of pointer's location is going to be supported). changing of this pointer will change packet's size. for virtio driver we need to adjust XDP_PASS handling by recalculating length of the packet if it was passed to the TCP/IP stack Reviewed-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
44fa2dbd |
|
17-Apr-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
xdp: transition into using xdp_frame for ndo_xdp_xmit Changing API ndo_xdp_xmit to take a struct xdp_frame instead of struct xdp_buff. This brings xdp_return_frame and ndo_xdp_xmit in sync. This builds towards changing the API further to become a bulk API, because xdp_buff is not a queue-able object while xdp_frame is. V4: Adjust for commit 59655a5b6c83 ("tuntap: XDP_TX can use native XDP") V7: Adjust for commit d9314c474d4f ("i40e: add support for XDP_REDIRECT") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
03993094 |
|
17-Apr-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
xdp: transition into using xdp_frame for return API Changing the xdp_return_frame() API to take a struct xdp_frame as its argument seems like a natural choice. But there are some subtle performance details here that need extra care, which is a deliberate choice. De-referencing xdp_frame on a remote CPU during DMA-TX completion results in the cache-line changing to the "Shared" state. Later, when the page is reused for RX, this xdp_frame cache-line is written, which changes the state to "Modified". This situation already happens (naturally) for virtio_net, tun and cpumap, as the xdp_frame pointer is the queued object. In tun and cpumap, the ptr_ring is used for efficiently transferring cache-lines (with pointers) between CPUs. Thus, the only option is to de-reference xdp_frame. It is only the ixgbe driver that had an optimization, in which it can avoid doing the de-reference of xdp_frame. The driver already has a TX-ring queue, which (in case of remote DMA-TX completion) has to be transferred between CPUs anyhow. In this data area, we stored a struct xdp_mem_info and a data pointer, which allowed us to avoid de-referencing xdp_frame. To compensate for this, a prefetchw is used for telling the cache coherency protocol about our access pattern. My benchmarks show that this prefetchw is enough to compensate the ixgbe driver. V7: Adjust for commit d9314c474d4f ("i40e: add support for XDP_REDIRECT") V8: Adjust for commit bd658dda4237 ("net/mlx5e: Separate dma base address and offset in dma_sync call") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8d5d8852 |
|
17-Apr-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
xdp: rhashtable with allocator ID to pointer mapping Use the IDA infrastructure for getting a cyclic increasing ID number, that is used for keeping track of each registered allocator per RX-queue xdp_rxq_info. Instead of using the IDR infrastructure, which uses a radix tree, use a dynamic rhashtable for creating the ID to pointer lookup table, because this is faster. The problem that is being solved here is that the xdp_rxq_info pointer (stored in xdp_buff) cannot be used directly, as the guaranteed lifetime is too short. The info is needed on a (potentially) remote CPU during DMA-TX completion time. In an xdp_frame the xdp_mem_info is stored, when it got converted from an xdp_buff, which is sufficient for the simple page refcnt based recycle schemes. For more advanced allocators there is a need to store a pointer to the registered allocator. Thus, there is a need to guard the lifetime or validity of the allocator pointer, which is done through this rhashtable ID map to pointer. The removal and validity of the allocator and helper struct xdp_mem_allocator is guarded by RCU. The allocator will be created by the driver, and registered with xdp_rxq_info_reg_mem_model(). It is up for debate who is responsible for freeing the allocator pointer or invoking the allocator destructor function. In any case, this must happen via RCU freeing. V4: Per req of Jason Wang - Use xdp_rxq_info_reg_mem_model() in all drivers implementing XDP_REDIRECT, even though it's not strictly necessary when allocator==NULL for type MEM_TYPE_PAGE_SHARED (given it's zero). V6: Per req of Alex Duyck - Introduce rhashtable_lookup() call in later patch V8: Address sparse should be static warnings (from kbuild test robot) Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cac320c8 |
|
17-Apr-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
virtio_net: convert to use generic xdp_frame and xdp_return_frame API The virtio_net driver assumes XDP frames are always released based on page refcnt (via put_page). Thus, it only queues the XDP data pointer address and uses virt_to_head_page() to retrieve struct page. Use the XDP return API to get away from such assumptions. Instead queue an xdp_frame, which allows us to use the xdp_return_frame API when releasing the frame. V8: Avoid endianness issues (found by kbuild test robot) V9: Change __virtnet_xdp_xmit from bool to int return value (found by Dan Carpenter) Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9267c430 |
|
13-Apr-2018 |
Jason Wang <jasowang@redhat.com> |
virtio-net: add missing virtqueue kick when flushing packets We tend to batch submitting packets during XDP_TX. This requires kicking the virtqueue after a batch. We tried to do it through xdp_do_flush_map(), which only makes sense for devmap, not XDP_TX. So explicitly kick the virtqueue in this case. Reported-by: Kimitoshi Takahashi <ktaka@nii.ac.jp> Tested-by: Kimitoshi Takahashi <ktaka@nii.ac.jp> Cc: Daniel Borkmann <daniel@iogearbox.net> Fixes: 186b3c998c50 ("virtio-net: support XDP_REDIRECT") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bda7fab5 |
|
22-Mar-2018 |
Jay Vosburgh <jay.vosburgh@canonical.com> |
virtio-net: Fix operstate for virtio when no VIRTIO_NET_F_STATUS The operstate update logic will leave an interface in the default UNKNOWN operstate if the interface carrier state never changes from the default carrier up state set at creation. This includes the case of an explicit call to netif_carrier_on, as the carrier on to on transition has no effect on operstate. This affects virtio-net for the case that the virtio peer does not support VIRTIO_NET_F_STATUS (the feature that provides carrier state updates). Without this feature, the virtio specification states that "the link should be assumed active," so, logically, the operstate should be UP instead of UNKNOWN. This has impact on user space applications that use the operstate to make availability decisions for the interface. Resolve this by changing the virtio probe logic slightly to call netif_carrier_off for both the "with" and "without" VIRTIO_NET_F_STATUS cases, and then the existing call to netif_carrier_on for the "without" case will cause an operstate transition. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3cc81a9a |
|
02-Mar-2018 |
Jason Wang <jasowang@redhat.com> |
virtio-net: re enable XDP_REDIRECT for mergeable buffer XDP_REDIRECT support for mergeable buffer was removed since commit 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable() case"). This is because we don't reserve enough tailroom for struct skb_shared_info which breaks XDP assumption. So this patch fixes this by reserving enough tailroom and using fixed size of rx buffer. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
51568d69 |
|
02-Mar-2018 |
Jason Wang <jasowang@redhat.com> |
virtio-net: re enable XDP_REDIRECT for mergeable buffer XDP_REDIRECT support for mergeable buffer was removed since commit 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable() case"). This is because we don't reserve enough tailroom for struct skb_shared_info which breaks XDP assumption. So this patch fixes this by reserving enough tailroom and using fixed size of rx buffer. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4e09ff53 |
|
28-Feb-2018 |
Jason Wang <jasowang@redhat.com> |
virtio-net: disable NAPI only when enabled during XDP set We try to disable NAPI to prevent a single XDP TX queue being used by multiple cpus. But we don't check if the device is up (NAPI is enabled), which could result in a stall because of an infinite wait in napi_disable(). Fix this by checking the device state through netif_running() first. Fixes: 4941d472bf95b ("virtio-net: do not reset during XDP set") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
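The guard can be pictured with a toy model. The struct, its fields, and `demo_xdp_set()` are hypothetical; the counter stands in for the napi_disable() call, which would wait forever on a device that was never brought up.

```c
#include <stdbool.h>

struct demo_netdev {
	bool running;			/* stand-in for netif_running() */
	int napi_disable_calls;		/* how often NAPI was disabled */
};

/* Only quiesce NAPI when the device is up, i.e. NAPI was enabled. */
static void demo_xdp_set(struct demo_netdev *dev)
{
	if (dev->running)
		dev->napi_disable_calls++;	/* stands in for napi_disable() */

	/* ... the XDP program would be swapped here ... */
}
```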
|
#
8dcc5b0a |
|
20-Feb-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
virtio_net: fix ndo_xdp_xmit crash towards dev not ready for XDP When a driver implements the ndo_xdp_xmit() function, there is (currently) no generic way to determine whether it is safe to call. It is e.g. unsafe to call the driver's ndo_xdp_xmit if it has not allocated the needed XDP TX queues yet. This is the case for virtio_net, which first allocates the XDP TX queues once an XDP/bpf prog is attached (in virtnet_xdp_set()). Thus, a crash will occur for virtio_net when redirecting to another virtio_net device's ndo_xdp_xmit, which has not attached an XDP prog. The sample xdp_redirect_map tries to attach a dummy XDP prog to take this into account, but it can also easily fail if the virtio_net (or actually the underlying vhost driver) has not allocated enough extra queues for the device. Allocating more queues is currently a manual config. Hint for libvirt XML add: <driver name='vhost' queues='16'> <host mrg_rxbuf='off'/> <guest tso4='off' tso6='off' ecn='off' ufo='off'/> </driver> The solution in this patch is to check that the device has loaded an XDP/bpf prog before proceeding. This is similar to the check performed in driver ixgbe. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
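The check described above amounts to an early return before the TX queues are touched. A minimal sketch follows; the struct, the function name, and the choice of -ENXIO as the error code are assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>

struct demo_target {
	const void *xdp_prog;	/* NULL until a program is attached */
	int sent;		/* frames accepted for transmission */
};

/*
 * Refuse to transmit towards a device that has no XDP program loaded,
 * because its XDP TX queues have not been allocated yet.
 */
static int demo_redirect_xmit(struct demo_target *dev)
{
	if (dev->xdp_prog == NULL)
		return -ENXIO;

	dev->sent++;
	return 0;
}
```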
|
#
11b7d897 |
|
20-Feb-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
virtio_net: fix memory leak in XDP_REDIRECT XDP_REDIRECT calling xdp_do_redirect() can fail for multiple reasons (which can be inspected by tracepoints). The current semantics is that on failure the driver calling xdp_do_redirect() must handle freeing or recycling the page associated with this frame. This can be seen as an optimization, as drivers usually have an optimized XDP_DROP code path for frame recycling in place already. The virtio_net driver didn't handle when xdp_do_redirect() failed. This caused a memory leak as the page refcnt wasn't decremented on failures. The function __virtnet_xdp_xmit() did handle one type of failure, when the xmit queue virtqueue_add_outbuf() is full, which "hides" releasing a refcnt on the page. Instead the function __virtnet_xdp_xmit() must follow API of xdp_do_redirect(), which on errors leave it up to the caller to free the page, of the failed send operation. Fixes: 186b3c998c50 ("virtio-net: support XDP_REDIRECT") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
95dbe9e7 |
|
20-Feb-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
virtio_net: fix XDP code path in receive_small() When configuring virtio_net to use the code path 'receive_small()', in-order to get correct XDP_REDIRECT support, I discovered TCP packets would get silently dropped when loading an XDP program action XDP_PASS. The bug seems to be that receive_small() when XDP is loaded check that hdr->hdr.flags is zero, which seems wrong as hdr.flags contains the flags VIRTIO_NET_HDR_F_* : #define VIRTIO_NET_HDR_F_NEEDS_CSUM 1 /* Use csum_start, csum_offset */ #define VIRTIO_NET_HDR_F_DATA_VALID 2 /* Csum is valid */ TCP got dropped as it had the VIRTIO_NET_HDR_F_DATA_VALID flag set. The flags that are relevant here are the VIRTIO_NET_HDR_GSO_* flags stored in hdr->hdr.gso_type. Thus, the fix is just check that none of the gso_type flags have been set. Fixes: bb91accf2733 ("virtio-net: XDP support for small buffers") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
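The difference between the two checks can be shown with a reduced header. The flag values are the ones quoted in the message; the struct, the function names, and the assumption that "no GSO" is encoded as a zero gso_type are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_HDR_F_NEEDS_CSUM	1	/* value quoted in the message */
#define DEMO_HDR_F_DATA_VALID	2	/* value quoted in the message */
#define DEMO_HDR_GSO_NONE	0	/* assumed: no GSO type set */

struct demo_net_hdr {
	uint8_t flags;
	uint8_t gso_type;
};

/* Old check: rejects any packet with a checksum flag set. */
static bool demo_xdp_allowed_old(const struct demo_net_hdr *hdr)
{
	return hdr->flags == 0;
}

/* Fixed check: only GSO packets are unsuitable for XDP. */
static bool demo_xdp_allowed_new(const struct demo_net_hdr *hdr)
{
	return hdr->gso_type == DEMO_HDR_GSO_NONE;
}
```

A TCP packet carrying the data-valid flag and no GSO type fails the old check but passes the new one, which matches the silent drops described above.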
|
#
7324f539 |
|
20-Feb-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
virtio_net: disable XDP_REDIRECT in receive_mergeable() case The virtio_net code has three different RX code paths in receive_buf(). Two of these code paths can handle XDP, but one of them is broken for at least XDP_REDIRECT. Function(1): receive_big() does not support XDP. Function(2): receive_small() supports XDP fully and uses build_skb(). Function(3): receive_mergeable() has broken XDP_REDIRECT and uses napi_alloc_skb(). The simple explanation is that receive_mergeable() is broken because it uses napi_alloc_skb(), which violates XDP, given that XDP assumes packet header+data in a single page and enough tail room for skb_shared_info. The longer explanation is that receive_mergeable() tries to work around and satisfy these XDP requirements, e.g. by having a function xdp_linearize_page() that allocates and memcpys RX buffers around (in case the packet is scattered across multiple RX buffers). This does currently satisfy XDP_PASS, XDP_DROP and XDP_TX (but only because we have not implemented bpf_xdp_adjust_tail yet). The XDP_REDIRECT action combined with cpumap is broken, and causes hard-to-debug crashes. The main issue is that the RX packet does not have the needed tail-room (SKB_DATA_ALIGN(skb_shared_info)), causing skb_shared_info to overlap the next packet's head-room (in which cpumap stores info). Reproducing depends on the packet payload length and on whether the RX-buffer size happened to have tail-room for skb_shared_info or not. But to make this even harder to troubleshoot, the RX-buffer size is dynamically changed at runtime based on an Exponentially Weighted Moving Average (EWMA) over the packet length, when refilling RX rings. This patch only disables XDP_REDIRECT support in the receive_mergeable() case, because it can cause a real crash. IMHO we should consider NOT supporting XDP in receive_mergeable() at all, because the principles behind XDP are to gain speed by (1) code simplicity, (2) sacrificing memory and (3) where possible moving runtime checks to setup time. 
These principles are clearly being violated in receive_mergeable(), which e.g. tracks the average buffer size at runtime to save memory. In the longer run, we should consider introducing a separate receive function when attaching an XDP program, and also change the memory model to be compatible with XDP when attaching an XDP prog. Fixes: 186b3c998c50 ("virtio-net: support XDP_REDIRECT") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
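The tail-room condition described in this commit can be illustrated with a small user-space sketch. It is not driver code: the cache-line size, the SHINFO_SIZE value and the helper names data_align() and has_xdp_tailroom() are assumptions chosen for illustration, and the real sizes depend on kernel configuration and architecture.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for illustration only. */
#define CACHE_LINE  64u
#define SHINFO_SIZE 320u /* hypothetical sizeof(struct skb_shared_info) */

/* Round len up to a multiple of the cache line, in the spirit of
 * SKB_DATA_ALIGN(). */
static size_t data_align(size_t len)
{
	return (len + CACHE_LINE - 1) & ~(size_t)(CACHE_LINE - 1);
}

/* True if a buffer of buf_len bytes can hold the headroom and the packet
 * and still leave aligned room for the shared info at its tail. */
static bool has_xdp_tailroom(size_t buf_len, size_t headroom, size_t pkt_len)
{
	size_t needed = headroom + pkt_len + data_align(SHINFO_SIZE);

	return needed <= buf_len;
}
```

Because the mergeable RX buffer size follows an EWMA of packet lengths, buf_len varies between refills, so the same packet may pass this check on one buffer and fail it on another.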
|
#
d7dfc5cf |
|
16-Jan-2018 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
virtio_net: Add ethtool stats The main purpose of this patch is adding a way of checking per-queue stats. It's useful to debug performance problems on multiqueue environment. $ ethtool -S ens10 NIC statistics: rx_queue_0_packets: 2090408 rx_queue_0_bytes: 3164825094 rx_queue_1_packets: 2082531 rx_queue_1_bytes: 3152932314 tx_queue_0_packets: 2770841 tx_queue_0_bytes: 4194955474 tx_queue_1_packets: 3084697 tx_queue_1_bytes: 4670196372 This change converts existing per-cpu stats structure into per-queue one. This should not impact on performance since each queue counter is not updated concurrently by multiple cpus. Performance numbers: - Guest has 2 vcpus and 2 queues - Guest runs netserver - Host runs 100-flow super_netperf Before After Diff UDP_STREAM 18byte 86.22 87.00 +0.90% UDP_STREAM 1472byte 4055.27 4042.18 -0.32% TCP_STREAM 16956.32 16890.63 -0.39% UDP_RR 178667.11 185862.70 +4.03% TCP_RR 128473.04 124985.81 -2.71% Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
faa9b39f |
|
05-Jan-2018 |
Jason Baron <jbaron@akamai.com> |
virtio_net: propagate linkspeed/duplex settings from the hypervisor The ability to set speed and duplex for virtio_net is useful in various scenarios as described here: 16032be virtio_net: add ethtool support for set and get of settings However, it would be nice to be able to set this from the hypervisor, such that virtio_net doesn't require custom guest ethtool commands. Introduce a new feature flag, VIRTIO_NET_F_SPEED_DUPLEX, which allows the hypervisor to export a linkspeed and duplex setting. The user can subsequently overwrite it later if desired via: 'ethtool -s'. Note that VIRTIO_NET_F_SPEED_DUPLEX is defined as bit 63, the intention is that device feature bits are to grow down from bit 63, since the transports are starting from bit 24 and growing up. Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: virtio-dev@lists.oasis-open.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
754b8a21 |
|
03-Jan-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
virtio_net: setup xdp_rxq_info The virtio_net driver doesn't dynamically change the RX-ring queue layout and backing pages, but instead rejects XDP setup if all the conditions for XDP are not met. Thus, the xdp_rxq_info also remains fairly static. This allows us to simply add the reg/unreg to net_device open/close functions. Driver hook points for xdp_rxq_info: * reg : virtnet_open * unreg: virtnet_close V3: - bugfix, also setup xdp.rxq in receive_mergeable() - Tested bpf-sample prog inside guest on a virtio_net device Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: virtualization@lists.linux-foundation.org Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
fdaa767a |
|
06-Dec-2017 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
virtio_net: Disable interrupts if napi_complete_done rescheduled napi Since commit 39e6c8208d7b ("net: solve a NAPI race") napi has been able to be rescheduled within napi_complete_done() even in non-busypoll case, but virtnet_poll() always enabled interrupts before complete, and when napi was rescheduled within napi_complete_done() it did not disable interrupts. This caused more interrupts when event idx is disabled. According to commit cbdadbbf0c79 ("virtio_net: fix race in RX VQ processing") we cannot place virtqueue_enable_cb_prepare() after NAPI_STATE_SCHED is cleared, so disable interrupts again if napi_complete_done() returned false. Tested with vhost-user of OVS 2.7 on host, which does not have the event idx feature. * Before patch: $ netperf -t UDP_STREAM -H 192.168.150.253 -l 60 -- -m 1472 MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.150.253 () port 0 AF_INET Socket Message Elapsed Messages Size Size Time Okay Errors Throughput bytes bytes secs # # 10^6bits/sec 212992 1472 60.00 32763206 0 6430.32 212992 60.00 23384299 4589.56 Interrupts on guest: 9872369 Packets/interrupt: 2.37 * After patch $ netperf -t UDP_STREAM -H 192.168.150.253 -l 60 -- -m 1472 MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.150.253 () port 0 AF_INET Socket Message Elapsed Messages Size Size Time Okay Errors Throughput bytes bytes secs # # 10^6bits/sec 212992 1472 60.00 32794646 0 6436.49 212992 60.00 32793501 6436.27 Interrupts on guest: 4941299 Packets/interrupt: 6.64 Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
03e9f8a0 |
|
03-Dec-2017 |
Yunjian Wang <wangyunjian@huawei.com> |
virtio_net: fix return value check in receive_mergeable() The function virtqueue_get_buf_ctx() can return NULL, so the return value 'buf' needs to be checked against NULL, not the value 'ctx'. Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
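The pattern this fix enforces can be sketched outside the kernel as follows. The types and functions here (struct mock_vq, mock_get_buf_ctx(), receive_one()) are hypothetical stand-ins, not the virtio API; they only model a getter that returns the buffer and writes the context through an out-parameter.

```c
#include <stddef.h>

/* Hypothetical stand-in for a virtqueue holding at most one buffer. */
struct mock_vq {
	void *buf;
	void *ctx;
};

/* Returns the buffer, or NULL when the queue is empty. On NULL, *ctx is
 * left untouched, so it may still hold a stale value. */
static void *mock_get_buf_ctx(struct mock_vq *vq, unsigned int *len, void **ctx)
{
	void *buf = vq->buf;

	if (!buf)
		return NULL;
	*len = 64;
	*ctx = vq->ctx;
	vq->buf = NULL;
	return buf;
}

/* Returns the received length, or -1 if no buffer was available. */
static int receive_one(struct mock_vq *vq)
{
	unsigned int len = 0;
	void *ctx = NULL;
	void *buf = mock_get_buf_ctx(vq, &len, &ctx);

	if (!buf) /* test the return value, not ctx */
		return -1;
	return (int)len;
}
```

In this model a valid buffer may carry a NULL context, and an empty queue leaves ctx unchanged, so only the returned pointer reliably signals whether a buffer was obtained.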
|
#
453f85d4 |
|
15-Nov-2017 |
Mel Gorman <mgorman@techsingularity.net> |
mm: remove __GFP_COLD As the page free path makes no distinction between cache hot and cold pages, there is no real useful ordering of pages in the free list that allocation requests can take advantage of. Judging from the users of __GFP_COLD, it is likely that a number of them are the result of copying other sites instead of actually measuring the impact. Remove the __GFP_COLD parameter which simplifies a number of paths in the page allocator. This is potentially controversial but bear in mind that the size of the per-cpu pagelists versus modern cache sizes means that the whole per-cpu list can often fit in the L3 cache. Hence, there is only a potential benefit for microbenchmarks that alloc/free pages in a tight loop. It's even worse when THP is taken into account which has little or no chance of getting a cache-hot page as the per-cpu list is bypassed and the zeroing of multiple pages will thrash the cache anyway. The truncate microbenchmarks are not shown as this patch affects the allocation path and not the free path. A page fault microbenchmark was tested but it showed no significant difference which is not surprising given that the __GFP_COLD branches are a minuscule percentage of the fault path. Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f4e63525 |
|
03-Nov-2017 |
Jakub Kicinski <kuba@kernel.org> |
net: bpf: rename ndo_xdp to ndo_bpf ndo_xdp is a control path callback for setting up XDP in the driver. We can reuse it for other forms of communication between the eBPF stack and the drivers. Rename the callback and associated structures and definitions. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
de8f3a83 |
|
24-Sep-2017 |
Daniel Borkmann <daniel@iogearbox.net> |
bpf: add meta pointer for direct access This work enables generic transfer of metadata from XDP into skb. The basic idea is that we can make use of the fact that the resulting skb must be linear and already comes with a larger headroom for supporting bpf_xdp_adjust_head(), which mangles xdp->data. Here, we base our work on a similar principle and introduce a small helper bpf_xdp_adjust_meta() for adjusting a new pointer called xdp->data_meta. Thus, the packet has a flexible and programmable room for meta data, followed by the actual packet data. struct xdp_buff is therefore laid out that we first point to data_hard_start, then data_meta directly prepended to data followed by data_end marking the end of packet. bpf_xdp_adjust_head() takes into account whether we have meta data already prepended and if so, memmove()s this along with the given offset provided there's enough room. xdp->data_meta is optional and programs are not required to use it. The rationale is that when we process the packet in XDP (e.g. as DoS filter), we can push further meta data along with it for the XDP_PASS case, and give the guarantee that a clsact ingress BPF program on the same device can pick this up for further post-processing. Since we work with skb there, we can also set skb->mark, skb->priority or other skb meta data out of BPF, thus having this scratch space generic and programmable allows for more flexibility than defining a direct 1:1 transfer of potentially new XDP members into skb (it's also more efficient as we don't need to initialize/handle each of such new members). The facility also works together with GRO aggregation. The scratch space at the head of the packet can be multiple of 4 byte up to 32 byte large. Drivers not yet supporting xdp->data_meta can simply be set up with xdp->data_meta as xdp->data + 1 as bpf_xdp_adjust_meta() will detect this and bail out, such that the subsequent match against xdp->data for later access is guaranteed to fail. 
The verifier treats xdp->data_meta/xdp->data the same way as we treat xdp->data/xdp->data_end pointer comparisons. The requirement for doing the compare against xdp->data is that it hasn't been modified from its original address we got from ctx access. It may have a range marking already from prior successful xdp->data/xdp->data_end pointer comparisons though. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
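The constraints stated in this commit (meta data directly prepended to data, a multiple of 4 bytes, at most 32 bytes, and data_meta set past data when a driver does not support it) can be modelled in user space roughly as below. This is a sketch, not the kernel helper: struct xdp_buff_model and adjust_meta() are illustrative names, and the error codes are assumptions (user-space ENOTSUP is used in place of any kernel-internal code).

```c
#include <errno.h>
#include <stdint.h>

/* Simplified model of the pointers described in the commit message. */
struct xdp_buff_model {
	uint8_t *data_hard_start;
	uint8_t *data_meta;
	uint8_t *data;
	uint8_t *data_end;
};

/* Move data_meta by offset bytes (negative grows the meta area). */
static int adjust_meta(struct xdp_buff_model *xdp, int offset)
{
	uint8_t *meta;
	long metalen;

	/* Drivers without support set data_meta past data (data + 1). */
	if (xdp->data_meta > xdp->data)
		return -ENOTSUP;

	meta = xdp->data_meta + offset;
	if (meta < xdp->data_hard_start || meta > xdp->data)
		return -EINVAL;

	/* Meta area must be a multiple of 4 bytes and at most 32 bytes. */
	metalen = xdp->data - meta;
	if ((metalen & 3) || metalen > 32)
		return -EACCES;

	xdp->data_meta = meta;
	return 0;
}
```

With data_meta equal to data, a call with offset -8 reserves eight bytes of meta data immediately in front of the packet, while offsets that produce an unaligned or oversized area are rejected and leave the pointers unchanged.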
|
#
dd543797 |
|
22-Sep-2017 |
Jason Wang <jasowang@redhat.com> |
virtio-net: correctly set xdp_xmit for mergeable buffer We should set xdp_xmit only when xdp_do_redirect() succeeds. Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
186b3c99 |
|
19-Sep-2017 |
Jason Wang <jasowang@redhat.com> |
virtio-net: support XDP_REDIRECT This patch tries to add XDP_REDIRECT for virtio-net. The changes are not complex, as we can use the existing XDP_TX helpers for most of the work. The rest is passing the XDP_TX to the NAPI handler to implement batching. Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
31240345 |
|
19-Sep-2017 |
Jason Wang <jasowang@redhat.com> |
virtio-net: add packet len average only when needed during XDP There's no need to add packet len average in the case of XDP_PASS since it will be done soon after skb is created. Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9457642a |
|
19-Sep-2017 |
Jason Wang <jasowang@redhat.com> |
virtio-net: remove unnecessary parameter of virtnet_xdp_xmit() CC: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dadc0736 |
|
24-Aug-2017 |
Eric Dumazet <edumazet@google.com> |
virtio_net: be drop monitor friendly This change is needed to not fool drop monitor. (perf record ... -e skb:kfree_skb ) Packets were properly sent and are consumed after TX completion. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
718ad681 |
|
18-Aug-2017 |
stephen hemminger <stephen@networkplumber.org> |
net: drop unused attribute argument from sysfs queue funcs The show and store functions don't need/use the attribute. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a4a76503 |
|
15-Aug-2017 |
stephen hemminger <stephen@networkplumber.org> |
virtio: put paren around sizeof Kernel coding style is to put paren around operand of sizeof. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7acd4329 |
|
12-Aug-2017 |
Colin Ian King <colin.king@canonical.com> |
virtio-net: make array guest_offloads static The array guest_offloads is local to the source and does not need to be in global scope, so make it static. Also tweak formatting. Cleans up sparse warnings: symbol 'guest_offloads' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1daa8790 |
|
31-Jul-2017 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: fix truesize for mergeable buffers Seth Forshee noticed a performance degradation with some workloads. This turns out to be due to packet drops. Euan Kemp noticed that this is because we drop all packets where length exceeds the truesize, but for some packets we add in extra memory without updating the truesize. This in turn was kept around unchanged from ab7db91705e95 ("virtio-net: auto-tune mergeable rx buffer size for improved performance"). That commit had an internal reason not to account for the extra space: not enough bits to do it. No longer true so let's account for the allocated length exactly. Many thanks to Seth Forshee for the report and bisecting and Euan Kemp for debugging the issue. Fixes: 680557cf79f8 ("virtio_net: rework mergeable buffer handling") Reported-by: Euan Kemp <euan.kemp@coreos.com> Tested-by: Euan Kemp <euan.kemp@coreos.com> Reported-by: Seth Forshee <seth.forshee@canonical.com> Tested-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
67a75194 |
|
25-Jul-2017 |
Arnd Bergmann <arnd@arndb.de> |
virtio-net: mark PM functions as __maybe_unused After removing the reset function, the freeze and restore functions are now unused when CONFIG_PM_SLEEP is disabled: drivers/net/virtio_net.c:1881:12: error: 'virtnet_restore_up' defined but not used [-Werror=unused-function] static int virtnet_restore_up(struct virtio_device *vdev) drivers/net/virtio_net.c:1859:13: error: 'virtnet_freeze_down' defined but not used [-Werror=unused-function] static void virtnet_freeze_down(struct virtio_device *vdev) A more robust way to do this is to remove the #ifdef around the callers and instead mark them as __maybe_unused. The compiler will now just silently drop the unused code. Fixes: 4941d472bf95 ("virtio-net: do not reset during XDP set") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cfa0ebc9 |
|
24-Jul-2017 |
Andrew Jones <drjones@redhat.com> |
virtio-net: fix module unloading Unregister the driver before removing multi-instance hotplug callbacks. This order avoids the warning issued from __cpuhp_remove_state_cpuslocked when the number of remaining instances isn't yet zero. Fixes: 8017c279196a ("net/virtio-net: Convert to hotplug state machine") Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
3f93522f |
|
19-Jul-2017 |
Jason Wang <jasowang@redhat.com> |
virtio-net: switch off offloads on demand if possible on XDP set Current XDP implementation wants guest offloads feature to be disabled on device. This is inconvenient and means guest can't benefit from offloads if XDP is not used. This patch tries to address this limitation by disabling the offloads on demand through control guest offloads. Guest offloads will be disabled and enabled on demand on XDP set. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4941d472 |
|
19-Jul-2017 |
Jason Wang <jasowang@redhat.com> |
virtio-net: do not reset during XDP set We currently reset the device during XDP set, the main reason is that we allocate more headroom with XDP (for header adjustment). This works but causes network downtime for users. Previous patches encoded the headroom in the buffer context, this makes it possible to detect the case where a buffer with headroom insufficient for XDP is added to the queue and XDP is enabled afterwards. Upon detection, we handle this case by copying the packet (slow, but it's a temporary condition). Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
192f68cf |
|
19-Jul-2017 |
Jason Wang <jasowang@redhat.com> |
virtio-net: switch to use new ctx API for small buffer Use ctx API to store headroom for small buffers. Following patches will retrieve this info and use it for XDP. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
28b39bc7 |
|
19-Jul-2017 |
Jason Wang <jasowang@redhat.com> |
virtio-net: pack headroom into ctx for mergeable buffers Pack headroom into ctx - this way when we get a buffer we can figure out the actual headroom that was allocated for the buffer. Will be helpful to optimize switching between XDP and non-XDP modes which have different headroom requirements. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
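Packing two small integers into the pointer-sized context value can be sketched as follows. The shift of 22 bits and the function names are assumptions for illustration; the commit message does not state how the bits are split.

```c
#include <stdint.h>

/* Assumed split: low 22 bits hold truesize, the bits above hold headroom. */
#define CTX_HEADROOM_SHIFT 22

/* Encode truesize and headroom into one opaque context pointer. */
static void *len_to_ctx(unsigned int truesize, unsigned int headroom)
{
	uintptr_t v = ((uintptr_t)headroom << CTX_HEADROOM_SHIFT) | truesize;

	return (void *)v;
}

/* Recover the headroom that was allocated for this buffer. */
static unsigned int ctx_to_headroom(void *ctx)
{
	return (unsigned int)((uintptr_t)ctx >> CTX_HEADROOM_SHIFT);
}

/* Recover the true size of this buffer. */
static unsigned int ctx_to_truesize(void *ctx)
{
	return (unsigned int)((uintptr_t)ctx & ((1u << CTX_HEADROOM_SHIFT) - 1));
}
```

Since each buffer carries its own headroom, a receive path can tell whether a buffer posted before XDP was enabled has enough headroom, without tracking that state elsewhere.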
|
#
e078de03 |
|
03-Jul-2017 |
David S. Miller <davem@davemloft.net> |
virtio_net: Remove references to NETIF_F_UFO. It is going away. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
55281621 |
|
07-Jul-2017 |
Jason Wang <jasowang@redhat.com> |
virtio-net: fix leaking of ctx array Fixes: commit d45b897b11ea ("virtio_net: allow specifying context for rx") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
713a98d9 |
|
27-Jun-2017 |
Jason Wang <jasowang@redhat.com> |
virtio-net: serialize tx routine during reset We don't hold any tx lock when trying to disable TX during reset. This can lead to a use-after-free, since ndo_start_xmit() tries to access a virtqueue which has already been freed. Fix this by calling netif_tx_disable() before freeing the vqs, which makes sure there is no tx after the vqs are freed. Reported-by: Jean-Philippe Menil <jpmenil@gmail.com> Tested-by: Jean-Philippe Menil <jpmenil@gmail.com> Fixes commit f600b6905015 ("virtio_net: Add XDP support") Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Robert McCabe <robert.mccabe@rockwellcollins.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5b0e6629 |
|
15-Jun-2017 |
Martin KaFai Lau <kafai@fb.com> |
bpf: virtio_net: Report bpf_prog ID during XDP_QUERY_PROG Add support to virtio_net to report bpf_prog ID during XDP_QUERY_PROG. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jason Wang <jasowang@redhat.com> Acked-by: Alexei Starovoitov <ast@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
59ae1d12 |
|
16-Jun-2017 |
Johannes Berg <johannes.berg@intel.com> |
networking: introduce and use skb_put_data() A common pattern with skb_put() is to just want to memcpy() some data into the new space, introduce skb_put_data() for this. An spatch similar to the one for skb_put_zero() converts many of the places using it: @@ identifier p, p2; expression len, skb, data; type t, t2; @@ ( -p = skb_put(skb, len); +p = skb_put_data(skb, data, len); | -p = (t)skb_put(skb, len); +p = skb_put_data(skb, data, len); ) ( p2 = (t2)p; -memcpy(p2, data, len); | -memcpy(p, data, len); ) @@ type t, t2; identifier p, p2; expression skb, data; @@ t *p; ... ( -p = skb_put(skb, sizeof(t)); +p = skb_put_data(skb, data, sizeof(t)); | -p = (t *)skb_put(skb, sizeof(t)); +p = skb_put_data(skb, data, sizeof(t)); ) ( p2 = (t2)p; -memcpy(p2, data, sizeof(*p)); | -memcpy(p, data, sizeof(*p)); ) @@ expression skb, len, data; @@ -memcpy(skb_put(skb, len), data, len); +skb_put_data(skb, data, len); (again, manually post-processed to retain some comments) Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e2fcad58 |
|
03-Jun-2017 |
Jason A. Donenfeld <Jason@zx2c4.com> |
virtio_net: check return value of skb_to_sgvec always Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f0c3192c |
|
02-Jun-2017 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: lower limit on buffer size commit d85b758f72b0 ("virtio_net: fix support for small rings") was supposed to increase the buffer size for small rings but had an unintentional side effect of decreasing it for large rings. This seems to break some setups - it's not yet clear why, but increasing buffer size back to what it was before helps. Fixes: d85b758f72b0 ("virtio_net: fix support for small rings") Reported-by: Mikulas Patocka <mpatocka@redhat.com> Reported-by: "J. Bruce Fields" <bfields@fieldses.org> Tested-by: Mikulas Patocka <mpatocka@redhat.com> Tested-by: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2836b4f2 |
|
23-May-2017 |
Vlad Yasevich <vyasevich@gmail.com> |
virtio-net: enable TSO/checksum offloads for Q-in-Q vlans Since virtio does not provide its own ndo_features_check handler, TSO, and now checksum offload, are disabled for stacked vlans. Re-enable the support and let the host take care of it. This restores/improves Guest-to-Guest performance over Q-in-Q vlans. Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
56da5fd0 |
|
05-Apr-2017 |
Dan Carpenter <dan.carpenter@oracle.com> |
virtio_net: tidy a couple debug statements We are printing a decimal value for truesize so we shouldn't use an "0x" prefix. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
5f24df09 |
|
29-Mar-2017 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: don't reset twice on XDP on/off We already do a reset once in remove_vq_common - there appears to be no point in doing another one when we add/remove XDP. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
d85b758f7 |
|
08-Mar-2017 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: fix support for small rings When ring size is small (<32 entries) making buffers smaller means a full ring might not be able to hold enough buffers to fit a single large packet. Make sure a ring full of buffers is large enough to allow at least one packet of max size. Fixes: 2613af0ed18a ("virtio_net: migrate mergeable rx buffers to page frag allocators") Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
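The sizing rule in this commit (a full ring must be able to hold one packet of maximum size) amounts to a rounded-up division with a lower bound. The sketch below is an assumption-laden illustration: min_buf_len(), its parameters and the floor value are hypothetical, and header overhead is ignored.

```c
/* Integer division rounding up. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Smallest per-buffer length such that ring_size buffers together can hold
 * max_packet_len bytes, never going below floor_len. */
static unsigned int min_buf_len(unsigned int max_packet_len,
				unsigned int ring_size,
				unsigned int floor_len)
{
	unsigned int per_buf = DIV_ROUND_UP(max_packet_len, ring_size);

	return per_buf > floor_len ? per_buf : floor_len;
}
```

For a 16-entry ring and a 65535-byte maximum packet this gives 4096 bytes per buffer, while for a 256-entry ring the floor dominates, which matches the commit's point that only small rings need larger buffers.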
|
#
e377fcc8 |
|
06-Mar-2017 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: reduce alignment for buffers We don't need to align length to any particular value anymore. Aligning to L1 cache size probably still makes sense to reduce false sharing. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
680557cf |
|
06-Mar-2017 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: rework mergeable buffer handling Use the new _ctx virtio API to maintain true length for each buffer. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
d45b897b |
|
06-Mar-2017 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: allow specifying context for rx With mergeable buffers we never use s/g for rx, so allow specifying context in that case. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
4d463c4d |
|
02-May-2017 |
Daniel Borkmann <daniel@iogearbox.net> |
xdp: use common helper for netlink extended ack reporting Small follow-up to d74a32acd59a ("xdp: use netlink extended ACK reporting") in order to let drivers all use the same NL_SET_ERR_MSG_MOD() helper macro for reporting. This also ensures that we consistently add the driver's prefix for dumping the report in user space to indicate that the error message is driver specific and not coming from core code. Furthermore, NL_SET_ERR_MSG_MOD() now reuses NL_SET_ERR_MSG() and thus makes all macros check the pointer as suggested. References: https://www.spinics.net/lists/netdev/msg433267.html Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9b2bbdb2 |
|
06-Mar-2017 |
Michael S. Tsirkin <mst@redhat.com> |
virtio: wrap find_vqs We are going to add more parameters to find_vqs, let's wrap the call so we don't need to tweak all drivers every time. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
9861ce03 |
|
30-Apr-2017 |
Jakub Kicinski <kuba@kernel.org> |
virtio_net: make use of extended ack message reporting Try to carry error messages to the user via the netlink extended ack message attribute. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1d11e732 |
|
27-Apr-2017 |
Willem de Bruijn <willemb@google.com> |
virtio-net: use netif_tx_napi_add for tx napi Avoid hashing the tx napi struct into napi_hash[], which is used for busy polling receive queues. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
78a57b48 |
|
25-Apr-2017 |
Willem de Bruijn <willemb@google.com> |
virtio-net: on tx, only call napi_disable if tx napi is on As of tx napi, device down (`ip link set dev $dev down`) hangs unless tx napi is enabled. Else napi_enable is not called, so napi_disable will spin on test_and_set_bit NAPI_STATE_SCHED. Only call napi_disable if tx napi is enabled. Fixes: 5a719c2552ca ("virtio-net: transmit napi") Reported-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bdb12e0d |
|
24-Apr-2017 |
Willem de Bruijn <willemb@google.com> |
virtio-net: keep tx interrupts disabled unless kick Tx napi mode increases the rate of transmit interrupts. Suppress some by masking interrupts while more packets are expected. The interrupts will be reenabled before the last packet is sent. This optimization reduces the throughput drop with tx napi for unidirectional flows such as UDP_STREAM that do not benefit from cleaning tx completions in the receive napi handler. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7b0411ef |
|
24-Apr-2017 |
Willem de Bruijn <willemb@google.com> |
virtio-net: clean tx descriptors from rx napi Amortize the cost of virtual interrupts by doing both rx and tx work on reception of a receive interrupt if tx napi is enabled. With VIRTIO_F_EVENT_IDX, this suppresses most explicit tx completion interrupts for bidirectional workloads. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ea7735d9 |
|
24-Apr-2017 |
Willem de Bruijn <willemb@google.com> |
virtio-net: move free_old_xmit_skbs An upcoming patch will call free_old_xmit_skbs indirectly from virtnet_poll. Move the function above this to avoid having to introduce a forward declaration. This is a pure move: no code changes. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b92f1e67 |
|
24-Apr-2017 |
Willem de Bruijn <willemb@google.com> |
virtio-net: transmit napi Convert virtio-net to a standard napi tx completion path. This enables better TCP pacing using TCP small queues and increases single stream throughput. The virtio-net driver currently cleans tx descriptors on transmission of new packets in ndo_start_xmit. Latency depends on new traffic, so is unbounded. To avoid deadlock when a socket reaches its snd limit, packets are orphaned on transmission. This breaks socket backpressure, including TSQ. Napi increases the number of interrupts generated compared to the current model, which keeps interrupts disabled as long as the ring has enough free descriptors. Keep tx napi optional and disabled for now. Follow-on patches will reduce the interrupt cost. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e4e8452a |
|
24-Apr-2017 |
Willem de Bruijn <willemb@google.com> |
virtio-net: napi helper functions Prepare virtio-net for tx napi by converting existing napi code to use helper functions. This also deduplicates some logic. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fe36cbe0 |
|
29-Mar-2017 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: clear MTU when out of range virtio attempts to clear the MTU feature bit if the value is out of the supported range, but this has no real effect since FEATURES_OK has already been set. Fix this up by checking the MTU in the new validate callback. Fixes: 14de9d114a82 ("virtio-net: Add initial MTU advice feature") Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
2e123b44 |
|
07-Mar-2017 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: enable big packets for large MTU values If one enables e.g. jumbo frames without mergeable buffers, packets won't fit in 1500 byte buffers we use. Switch to big packet mode instead. TODO: make sizing more exact, possibly extend small packet mode to use larger pages. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
ebb6b4b1 |
|
21-Mar-2017 |
Philippe Reynes <tremyfr@gmail.com> |
net: virtio_net: use new api ethtool_{get|set}_link_ksettings The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
eb1e011a |
|
15-Feb-2017 |
Johannes Berg <johannes.berg@intel.com> |
average: change to declare precision, not factor Declaring the factor is counter-intuitive, and people are prone to using small(-ish) values even when that makes no sense. Change the DECLARE_EWMA() macro to take the fractional precision, in bits, rather than a factor, and update all users. While at it, add some more documentation. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
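What "fractional precision, in bits" means for a fixed-point EWMA can be seen in a small stand-alone model. This is a simplified sketch, not the DECLARE_EWMA() macro: the 8 fractional bits, the weight of 1/8 and the names struct ewma, ewma_add() and ewma_read() are assumptions for illustration.

```c
/* Fixed-point running average; internal holds value << EWMA_PRECISION. */
struct ewma {
	unsigned long internal;
};

#define EWMA_PRECISION    8 /* fractional bits (assumed) */
#define EWMA_WEIGHT_SHIFT 3 /* new samples weigh 1/8 (assumed) */

/* Fold a new sample into the average; the first sample seeds it. */
static void ewma_add(struct ewma *e, unsigned long val)
{
	unsigned long internal = e->internal;

	if (!internal) {
		e->internal = val << EWMA_PRECISION;
		return;
	}
	/* internal = internal * 7/8 + (val << precision) * 1/8 */
	e->internal = (((internal << EWMA_WEIGHT_SHIFT) - internal) +
		       (val << EWMA_PRECISION)) >> EWMA_WEIGHT_SHIFT;
}

/* Return the integer part of the average. */
static unsigned long ewma_read(const struct ewma *e)
{
	return e->internal >> EWMA_PRECISION;
}
```

Seeding with 100 and then adding 200 yields an internal value of 28800, which reads back as 112; the fractional half is kept in the low bits rather than lost, which is the information the precision parameter controls.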
|
#
fb5e31d9 |
|
05-Feb-2017 |
Christoph Hellwig <hch@lst.de> |
virtio: allow drivers to request IRQ affinity when creating VQs Add a struct irq_affinity pointer to the find_vqs methods, which if set is used to tell the PCI layer to create the MSI-X vectors for our I/O virtqueues with the proper affinity from the start. Compared to after-the-fact affinity hints, this gives us an instantly working setup and allows allocating the irq descriptors node-local, avoiding interconnect traffic. Last but not least, this will allow blk-mq queues to be created based on the interrupt affinity for storage drivers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
f6b10209 |
|
21-Feb-2017 |
Jason Wang <jasowang@redhat.com> |
virtio-net: switch to use build_skb() for small buffer This patch switch to use build_skb() for small buffer which can have better performance for both TCP and XDP (since we can work at page before skb creation). It also remove lots of XDP codes since both mergeable and small buffer use page frag during refill now. Before | After XDP_DROP(xdp1) 64B : 11.1Mpps | 14.4Mpps Tested with xdp1/xdp2/xdp_ip_tx_tunnel and netperf. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
017b29c3 |
|
19-Feb-2017 |
Jason Wang <jasowang@redhat.com> |
virito-net: set queues after reset during xdp_set We set queues before reset which will cause a crash[1]. This is because is_xdp_raw_buffer_queue() depends on the old xdp queue pairs number to do the correct detection. So fix this by - passing xdp queue pairs and current queue pairs to virtnet_reset() - change vi->xdp_qp after reset but before refill, to make sure both free_unused_bufs() and refill can make correct detection of XDP. - remove the duplicated queue pairs setting before virtnet_reset() since we will do it inside virtnet_reset() [1] [ 74.328168] general protection fault: 0000 [#1] SMP [ 74.328625] Modules linked in: nfsd xfs libcrc32c virtio_net virtio_pci [ 74.329117] CPU: 0 PID: 2849 Comm: xdp2 Not tainted 4.10.0-rc7+ #499 [ 74.329577] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014 [ 74.330424] task: ffff88007a894000 task.stack: ffffc90004388000 [ 74.330844] RIP: 0010:skb_release_head_state+0x28/0x80 [ 74.331298] RSP: 0018:ffffc9000438b8d0 EFLAGS: 00010206 [ 74.331676] RAX: 0000000000000000 RBX: ffff88007ad96300 RCX: 0000000000000000 [ 74.332217] RDX: ffff88007fc137a8 RSI: ffff88007fc0db28 RDI: 0001bf00000001be [ 74.332758] RBP: ffffc9000438b8d8 R08: 000000000005008f R09: 00000000000005f9 [ 74.333274] R10: ffff88007d001700 R11: ffffffff820a8a4d R12: ffff88007ad96300 [ 74.333787] R13: 0000000000000002 R14: ffff880036604000 R15: 000077ff80000000 [ 74.334308] FS: 00007fc70d8a7b40(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 [ 74.334891] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 74.335314] CR2: 00007fff4144a710 CR3: 000000007ab56000 CR4: 00000000003406f0 [ 74.335830] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 74.336373] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 74.336895] Call Trace: [ 74.337086] skb_release_all+0xd/0x30 [ 74.337356] consume_skb+0x2c/0x90 [ 74.337607] free_unused_bufs+0x1ff/0x270 [virtio_net] [ 
74.337988] ? vp_synchronize_vectors+0x3b/0x60 [virtio_pci] [ 74.338398] virtnet_xdp+0x21e/0x440 [virtio_net] [ 74.338741] dev_change_xdp_fd+0x101/0x140 [ 74.339048] do_setlink+0xcf4/0xd20 [ 74.339304] ? symcmp+0xf/0x20 [ 74.339529] ? mls_level_isvalid+0x52/0x60 [ 74.339828] ? mls_range_isvalid+0x43/0x50 [ 74.340135] ? nla_parse+0xa0/0x100 [ 74.340400] rtnl_setlink+0xd4/0x120 [ 74.340664] ? cpumask_next_and+0x30/0x50 [ 74.340966] rtnetlink_rcv_msg+0x7f/0x1f0 [ 74.341259] ? sock_has_perm+0x59/0x60 [ 74.341586] ? napi_consume_skb+0xe2/0x100 [ 74.342010] ? rtnl_newlink+0x890/0x890 [ 74.342435] netlink_rcv_skb+0x92/0xb0 [ 74.342846] rtnetlink_rcv+0x23/0x30 [ 74.343277] netlink_unicast+0x162/0x210 [ 74.343677] netlink_sendmsg+0x2db/0x390 [ 74.343968] sock_sendmsg+0x33/0x40 [ 74.344233] SYSC_sendto+0xee/0x160 [ 74.344482] ? SYSC_bind+0xb0/0xe0 [ 74.344806] ? sock_alloc_file+0x92/0x110 [ 74.345106] ? fd_install+0x20/0x30 [ 74.345360] ? sock_map_fd+0x3f/0x60 [ 74.345586] SyS_sendto+0x9/0x10 [ 74.345790] entry_SYSCALL_64_fastpath+0x1a/0xa9 [ 74.346086] RIP: 0033:0x7fc70d1b8f6d [ 74.346312] RSP: 002b:00007fff4144a708 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 74.346785] RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007fc70d1b8f6d [ 74.347244] RDX: 000000000000002c RSI: 00007fff4144a720 RDI: 0000000000000003 [ 74.347683] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000 [ 74.348544] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff4144bd90 [ 74.349082] R13: 0000000000000002 R14: 0000000000000002 R15: 00007fff4144cda0 [ 74.349607] Code: 00 00 00 55 48 89 e5 53 48 89 fb 48 8b 7f 58 48 85 ff 74 0e 40 f6 c7 01 74 3d 48 c7 43 58 00 00 00 00 48 8b 7b 68 48 85 ff 74 05 <f0> ff 0f 74 20 48 8b 43 60 48 85 c0 74 14 65 8b 15 f3 ab 8d 7e [ 74.351008] RIP: skb_release_head_state+0x28/0x80 RSP: ffffc9000438b8d0 [ 74.351625] ---[ end trace fe6e19fd11cfc80b ]--- Fixes: 2de2f7f40ef9 ("virtio_net: XDP support for adjust_head") Cc: John Fastabend 
<john.fastabend@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
61845d20 |
|
16-Feb-2017 |
Jason Wang <jasowang@redhat.com> |
virtio-net: batch stats updating We already have counters for sent/recv packets and sent/recv bytes. Doing a batched update to reduce the number of u64_stats_update_begin/end(). Take care not to bother with stats update when called speculatively. Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2de2f7f4 |
|
02-Feb-2017 |
John Fastabend <john.fastabend@gmail.com> |
virtio_net: XDP support for adjust_head Add support for XDP adjust head by allocating a 256B header region that XDP programs can grow into. This is only enabled when a XDP program is loaded. In order to ensure that we do not have to unwind queue headroom push queue setup below bpf_prog_add. It reads better to do a prog ref unwind vs another queue setup call. At the moment this code must do a full reset to ensure old buffers without headroom on program add or with headroom on program removal are not used incorrectly in the datapath. Ideally we would only have to disable/enable the RX queues being updated but there is no API to do this at the moment in virtio so use the big hammer. In practice it is likely not that big of a problem as this will only happen when XDP is enabled/disabled changing programs does not require the reset. There is some risk that the driver may either have an allocation failure or for some reason fail to correctly negotiate with the underlying backend in this case the driver will be left uninitialized. I have not seen this ever happen on my test systems and for what its worth this same failure case can occur from probe and other contexts in virtio framework. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9fe7bfce |
|
02-Feb-2017 |
John Fastabend <john.fastabend@gmail.com> |
virtio_net: refactor freeze/restore logic into virtnet reset logic For XDP we will need to reset the queues to allow for buffer headroom to be configured. In order to do this we need to essentially run the freeze()/restore() code path. Unfortunately the locking requirements between the freeze/restore and reset paths are different however so we can not simply reuse the code. This patch refactors the code path and adds a reset helper routine. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
722d8283 |
|
02-Feb-2017 |
John Fastabend <john.fastabend@gmail.com> |
virtio_net: remove duplicate queue pair binding in XDP Factor out qp assignment. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0354e4d1 |
|
02-Feb-2017 |
John Fastabend <john.fastabend@gmail.com> |
virtio_net: factor out xdp handler for readability At this point the do_xdp_prog is mostly if/else branches handling the different modes of virtio_net. So remove it and handle running the program in the per mode handlers. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
47315329 |
|
02-Feb-2017 |
John Fastabend <john.fastabend@gmail.com> |
virtio_net: wrap rtnl_lock in test for calling with lock already held For XDP use case and to allow ethtool reset tests it is useful to be able to use reset paths from contexts where rtnl lock is already held. This requires updating virtnet_set_queues and free_receive_bufs, the two places where rtnl_lock is taken in virtio_net. To do this we use the following pattern, _foo(...) { do stuff } foo(...) { rtnl_lock(); _foo(...); rtnl_unlock()}; this allows us to use freeze()/restore() flow from both contexts. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
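The locked/unlocked split described in this message can be sketched in userspace as follows. This is only an illustration under stated assumptions: a pthread mutex stands in for the rtnl lock, and every `demo_*` name is hypothetical (the commit uses a leading underscore for the unlocked variant; a `_nolock` suffix is used here instead).

```c
#include <assert.h>
#include <pthread.h>

/* Stand-in for the rtnl lock; hypothetical, for illustration only. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int demo_queue_pairs = 1;

/* Unlocked worker: the caller must already hold demo_lock. */
int demo_set_queues_nolock(int queue_pairs)
{
	if (queue_pairs < 1)
		return -1;
	demo_queue_pairs = queue_pairs;
	return 0;
}

/* Locked wrapper for callers that do not hold demo_lock. */
int demo_set_queues(int queue_pairs)
{
	int err;

	pthread_mutex_lock(&demo_lock);
	err = demo_set_queues_nolock(queue_pairs);
	pthread_mutex_unlock(&demo_lock);
	return err;
}

/*
 * A reset-style path that takes the lock once and then calls the
 * unlocked worker directly, so the lock is never acquired twice.
 */
int demo_reset(int queue_pairs)
{
	int err;

	pthread_mutex_lock(&demo_lock);
	err = demo_set_queues_nolock(1); /* fall back to one pair first */
	if (!err)
		err = demo_set_queues_nolock(queue_pairs);
	pthread_mutex_unlock(&demo_lock);
	return err;
}
```

Keeping the real work in the unlocked function means both entry points share one implementation, and only the wrapper decides whether the lock has to be taken.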
|
#
4d6308aa |
|
04-Feb-2017 |
Eric Dumazet <edumazet@google.com> |
virtio_net: exploit napi_complete_done() return value Since commit 364b6055738b ("net: busy-poll: return busypolling status to drivers"), napi_complete_done() returns a boolean that can be used by drivers to conditionally rearm interrupts. This patch changes virtio_net to use this boolean to avoid a bit of overhead for busy-poll users. Jason reports about 1.1% improvement for 1 byte TCP_RR (burst 100). Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ceef438d |
|
02-Feb-2017 |
Eric Dumazet <edumazet@google.com> |
virtio_net: remove custom busy_poll Generic NAPI busy polling allows us to remove custom implementations found in drivers. It is possible further optimization could be done by testing napi_complete_done() return value. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
529ec6ac |
|
25-Jan-2017 |
Jakub Kicinski <kuba@kernel.org> |
virtio_net: reject XDP programs using header adjustment commit 17bedab27231 ("bpf: xdp: Allow head adjustment in XDP prog") added a new XDP helper to prepend and remove data from a frame. Make virtio_net reject programs making use of this helper until proper support is added. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b68df015 |
|
25-Jan-2017 |
John Fastabend <john.fastabend@gmail.com> |
virtio_net: use dev_kfree_skb for small buffer XDP receive In the small buffer case during driver unload we currently use put_page instead of dev_kfree_skb. Resolve this by adding a check for virtnet mode when checking XDP queue type. Also name the function so that the code reads correctly to match the additional check. Fixes: bb91accf2733 ("virtio-net: XDP support for small buffers") Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a67edbf4 |
|
24-Jan-2017 |
Daniel Borkmann <daniel@iogearbox.net> |
bpf: add initial bpf tracepoints This work adds a number of tracepoints to paths that are either considered slow-path or exception-like states, where monitoring or inspecting them would be desirable. For bpf(2) syscall, tracepoints have been placed for main commands when they succeed. In XDP case, tracepoint is for exceptions, that is, f.e. on abnormal BPF program exit such as unknown or XDP_ABORTED return code, or when error occurs during XDP_TX action and the packet could not be forwarded. Both have been split into separate event headers, and can be further extended. Worst case, if they unexpectedly should get into our way in future, they can also be removed [1]. Of course, these tracepoints (like any other) can be analyzed by eBPF itself, etc. Example output: # ./perf record -a -e bpf:* sleep 10 # ./perf script sock_example 6197 [005] 283.980322: bpf:bpf_map_create: map type=ARRAY ufd=4 key=4 val=8 max=256 flags=0 sock_example 6197 [005] 283.980721: bpf:bpf_prog_load: prog=a5ea8fa30ea6849c type=SOCKET_FILTER ufd=5 sock_example 6197 [005] 283.988423: bpf:bpf_prog_get_type: prog=a5ea8fa30ea6849c type=SOCKET_FILTER sock_example 6197 [005] 283.988443: bpf:bpf_map_lookup_elem: map type=ARRAY ufd=4 key=[06 00 00 00] val=[00 00 00 00 00 00 00 00] [...] sock_example 6197 [005] 288.990868: bpf:bpf_map_lookup_elem: map type=ARRAY ufd=4 key=[01 00 00 00] val=[14 00 00 00 00 00 00 00] swapper 0 [005] 289.338243: bpf:bpf_prog_put_rcu: prog=a5ea8fa30ea6849c type=SOCKET_FILTER [1] https://lwn.net/Articles/705270/ Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d0fa28f0 |
|
23-Jan-2017 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: fix PAGE_SIZE > 64k I don't have any guests with PAGE_SIZE > 64k but the code seems to be clearly broken in that case as PAGE_SIZE / MERGEABLE_BUFFER_ALIGN will need more than 8 bit and so the code in mergeable_ctx_to_buf_address does not give us the actual true size. Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
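The overflow described here can be illustrated with a small sketch. It assumes, purely for the example, that the buffer size is stored as a count of 256-byte units minus one in an 8-bit field; the alignment value, the minus-one encoding and the `demo_*` names are assumptions for illustration, not taken from the driver source.

```c
#include <assert.h>

/* Assumed alignment unit for this example only. */
#define DEMO_ALIGN 256u

/* Pack a buffer size into an 8-bit field as (units - 1). */
unsigned int demo_encode_truesize(unsigned int truesize)
{
	return (truesize / DEMO_ALIGN - 1) & 0xffu;
}

/* Recover the buffer size from the 8-bit field. */
unsigned int demo_decode_truesize(unsigned int field)
{
	return (field + 1) * DEMO_ALIGN;
}
```

Under these assumptions a 64 KiB buffer encodes as 255 and round-trips correctly, while a 128 KiB buffer would need the value 511, which is truncated to 255 and decodes back as 64 KiB.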
|
#
6391a448 |
|
19-Jan-2017 |
Jason Wang <jasowang@redhat.com> |
virtio-net: restore VIRTIO_HDR_F_DATA_VALID on receiving Commit 501db511397f ("virtio: don't set VIRTIO_NET_HDR_F_DATA_VALID on xmit") in fact disables VIRTIO_HDR_F_DATA_VALID on receiving path too, fixing this by adding a hint (has_data_valid) and set it only on the receiving path. Cc: Rolf Neugebauer <rolf.neugebauer@docker.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Rolf Neugebauer <rolf.neugebauer@docker.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bc1f4470 |
|
06-Jan-2017 |
stephen hemminger <stephen@networkplumber.org> |
net: make ndo_get_stats64 a void function The network device operation for reading statistics is only called in one place, and it ignores the return value. Having a structure return value is potentially confusing because some future driver could incorrectly assume that the return value was used. Fix all drivers with ndo_get_stats64 to have a void function. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
801822d1 |
|
23-Dec-2016 |
Shyam Saini <mayhs11saini@gmail.com> |
net: Use kmemdup instead of kmalloc and memcpy when some other buffer is immediately copied into allocated region. Replace calls to kmalloc followed by a memcpy with a direct call to kmemdup. Signed-off-by: Shyam Saini <mayhs11saini@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
73c1b41e |
|
21-Dec-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Cleanup state names When the state names got added a script was used to add the extra argument to the calls. The script basically converted the state constant to a string, but the cleanup to convert these strings into meaningful ones did not happen. Replace all the useless strings with 'subsys/xxx/yyy:state' strings which are used in all the other places already. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/20161221192112.085444152@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
bb91accf |
|
23-Dec-2016 |
Jason Wang <jasowang@redhat.com> |
virtio-net: XDP support for small buffers Commit f600b6905015 ("virtio_net: Add XDP support") leaves the case of small receive buffer untouched. This will confuse the user who want to set XDP but use small buffers. Other than forbid XDP in small buffer mode, let's make it work. XDP then can only work at skb->data since virtio-net create skbs during refill, this is sub optimal which could be optimized in the future. Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c47a43d3 |
|
23-Dec-2016 |
Jason Wang <jasowang@redhat.com> |
virtio-net: remove big packet XDP codes Now we in fact don't allow XDP for big packets, remove its codes. Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
92502fe8 |
|
23-Dec-2016 |
Jason Wang <jasowang@redhat.com> |
virtio-net: forbid XDP when VIRTIO_NET_F_GUEST_UFO is support When VIRTIO_NET_F_GUEST_UFO is negotiated, host could still send UFO packet that exceeds a single page which could not be handled correctly by XDP. So this patch forbids setting XDP when GUEST_UFO is supported. While at it, forbid XDP for ECN (which comes only from GRO) too to prevent user from misconfiguration. Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5c33474d |
|
23-Dec-2016 |
Jason Wang <jasowang@redhat.com> |
virtio-net: make rx buf size estimation works for XDP We don't update ewma rx buf size in the case of XDP. This will lead to underestimation of rx buf size, which causes the host to produce more than one buffer. This will greatly increase the possibility of XDP page linearization. Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b00f70b0 |
|
23-Dec-2016 |
Jason Wang <jasowang@redhat.com> |
virtio-net: unbreak csumed packets for XDP_PASS We drop csumed packet when do XDP for packets. This breaks XDP_PASS when GUEST_CSUM is supported. Fix this by allowing csum flag to be set. With this patch, simple TCP works for XDP_PASS. Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1830f893 |
|
23-Dec-2016 |
Jason Wang <jasowang@redhat.com> |
virtio-net: correctly handle XDP_PASS for linearized packets When XDP_PASS were determined for linearized packets, we try to get new buffers in the virtqueue and build skbs from them. This is wrong, we should create skbs based on existed buffers instead. Fixing them by creating skb based on xdp_page. With this patch "ping 192.168.100.4 -s 3900 -M do" works for XDP_PASS. Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
56a86f84 |
|
23-Dec-2016 |
Jason Wang <jasowang@redhat.com> |
virtio-net: fix page miscount during XDP linearizing We don't put page during linearizing, which would cause a leak when xmit through XDP_TX or the packet exceeds PAGE_SIZE. Fix them by put page accordingly. Also decrease the number of buffers during linearizing to make sure caller can free buffers correctly when packet exceeds PAGE_SIZE. With this patch, we won't get OOM after linearizing a huge number of packets. Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
275be061 |
|
23-Dec-2016 |
Jason Wang <jasowang@redhat.com> |
virtio-net: correctly xmit linearized page on XDP_TX After we linearize page, we should xmit this page instead of the page of first buffer which may lead unexpected result. With this patch, we can see correct packet during XDP_TX. Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
73b62bd0 |
|
23-Dec-2016 |
Jason Wang <jasowang@redhat.com> |
virtio-net: remove the warning before XDP linearizing Since we use EWMA to estimate the size of rx buffer. When rx buffer size is underestimated, it's usual to have a packet with more than one buffers. Consider this is not a bug, remove the warning and correct the comment before XDP linearizing. Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
72979a6c |
|
15-Dec-2016 |
John Fastabend <john.fastabend@gmail.com> |
virtio_net: xdp, add slowpath case for non contiguous buffers virtio_net XDP support expects receive buffers to be contiguous. If this is not the case we enable a slowpath to allow connectivity to continue but at a significant performance overhead associated with linearizing data. To make it painfully aware to users that XDP is running in a degraded mode we throw an xdp buffer error. To linearize packets we allocate a page and copy the segments of the data, including the header, into it. After this the page can be handled by XDP code flow as normal. Then depending on the return code the page is either freed or sent to the XDP xmit path. There is no attempt to optimize this path. This case is being handled simply as a precaution in case some unknown backend were to generate packets in this form. To test this I had to hack qemu and force it to generate these packets. I do not expect this case to be generated by "real" backends. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
56434a01 |
|
15-Dec-2016 |
John Fastabend <john.fastabend@gmail.com> |
virtio_net: add XDP_TX support This adds support for the XDP_TX action to virtio_net. When an XDP program is run and returns the XDP_TX action the virtio_net XDP implementation will transmit the packet on a TX queue that aligns with the current CPU that the XDP packet was processed on. Before sending the packet the header is zeroed. Also XDP is expected to handle checksum correctly so no checksum offload support is provided. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
672aafd5 |
|
15-Dec-2016 |
John Fastabend <john.fastabend@gmail.com> |
virtio_net: add dedicated XDP transmit queues XDP requires using isolated transmit queues to avoid interference with normal networking stack (BQL, NETDEV_TX_BUSY, etc). This patch adds a XDP queue per cpu when a XDP program is loaded and does not expose the queues to the OS via the normal API call to netif_set_real_num_tx_queues(). This way the stack will never push an skb to these queues. However virtio/vhost/qemu implementation only allows for creating TX/RX queue pairs at this time so creating only TX queues was not possible. And because the associated RX queues are being created I went ahead and exposed these to the stack and let the backend use them. This creates more RX queues visible to the network stack than TX queues which is worth mentioning but does not cause any issues as far as I can tell. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f600b690 |
|
15-Dec-2016 |
John Fastabend <john.fastabend@gmail.com> |
virtio_net: Add XDP support This adds XDP support to virtio_net. Some requirements must be met for XDP to be enabled depending on the mode. First it will only be supported with LRO disabled so that data is not pushed across multiple buffers. Second the MTU must be less than a page size to avoid having to handle XDP across multiple pages. If mergeable receive is enabled this patch only supports the case where header and data are in the same buf which we can check when a packet is received by looking at num_buf. If the num_buf is greater than 1 and a XDP program is loaded the packet is dropped and a warning is thrown. When any_header_sg is set this does not happen and both header and data is put in a single buffer as expected so we check this when XDP programs are loaded. Subsequent patches will process the packet in a degraded mode to ensure connectivity and correctness is not lost even if backend pushes packets into multiple buffers. If big packets mode is enabled and MTU/LRO conditions above are met then XDP is allowed. This patch was tested with qemu with vhost=on and vhost=off where mergeable and big_packet modes were forced via hard coding feature negotiation. Multiple buffers per packet was forced via a small test patch to vhost.c in the vhost=on qemu mode. Suggested-by: Shrijeet Mukherjee <shrijeet@gmail.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a220871b |
|
12-Dec-2016 |
Jason Wang <jasowang@redhat.com> |
virtio-net: correctly enable multiqueue Commit 4490001029012539937ff02778fe6180613fa949 ("virtio-net: enable multiqueue by default") blindly set the affinity instead of queues during probe which can cause a mismatch of #queues between guest and host. This patch fixes it by setting queues. Reported-by: Theodore Ts'o <tytso@mit.edu> Tested-by: Theodore Ts'o <tytso@mit.edu> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Michael S. Tsirkin <mst@redhat.com> Fixes: 449000102901 ("virtio-net: enable multiqueue by default") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e37e2ff3 |
|
05-Dec-2016 |
Andy Lutomirski <luto@kernel.org> |
virtio-net: Fix DMA-from-the-stack in virtnet_set_mac_address() With CONFIG_VMAP_STACK=y, virtnet_set_mac_address() can be passed a pointer to the stack and it will OOPS. Copy the address to the heap to prevent the crash. Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Laura Abbott <labbott@redhat.com> Reported-by: zbyszek@in.waw.pl Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
44900010 |
|
24-Nov-2016 |
Jason Wang <jasowang@redhat.com> |
virtio-net: enable multiqueue by default We use single queue even if multiqueue is enabled and let admin to enable it through ethtool later. This is used to avoid possible regression (small packet TCP stream transmission). But looks like an overkill since: - single queue user can disable multiqueue when launching qemu - brings extra troubles for the management since it needs extra admin tool in guest to enable multiqueue - multiqueue performs much better than single queue in most of the cases So this patch enables multiqueue by default: if #queues is less than or equal to #vcpu, enable as much as queue pairs; if #queues is greater than #vcpu, enable #vcpu queue pairs. Cc: Hannes Frederic Sowa <hannes@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Neil Horman <nhorman@redhat.com> Cc: Jeremy Eder <jeder@redhat.com> Cc: Marko Myllynen <myllynen@redhat.com> Cc: Maxime Coquelin <maxime.coquelin@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
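The default described in this message amounts to taking the smaller of the two counts. A minimal sketch, assuming a hypothetical helper name and plain unsigned counts rather than the driver's own fields:

```c
#include <assert.h>

/*
 * Hypothetical helper: pick the number of queue pairs to enable by
 * default. If the device offers no more queue pairs than there are
 * vCPUs, enable all of them; otherwise enable one pair per vCPU.
 */
unsigned int demo_default_queue_pairs(unsigned int max_queue_pairs,
				      unsigned int online_vcpus)
{
	return max_queue_pairs <= online_vcpus ? max_queue_pairs
					       : online_vcpus;
}
```

For example, a device offering 8 queue pairs in a 4-vCPU guest would start with 4 pairs enabled under this rule.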
|
#
963abe5c |
|
15-Nov-2016 |
Eric Dumazet <edumazet@google.com> |
virtio-net: add a missing synchronize_net() It seems many drivers do not respect napi_hash_del() contract. When napi_hash_del() is used before netif_napi_del(), an RCU grace period is needed before freeing NAPI object. Fixes: 91815639d880 ("virtio-net: rx busy polling support") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f3358507 |
|
03-Nov-2016 |
Michael S. Tsirkin <mst@redhat.com> |
virtio-net: drop legacy features in virtio 1 mode Virtio 1.0 spec says VIRTIO_F_ANY_LAYOUT and VIRTIO_NET_F_GSO are legacy-only feature bits. Do not negotiate them in virtio 1 mode. Note this is a spec violation so we need to backport it to stable/downstream kernels. Cc: stable@vger.kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
93a205ee |
|
25-Oct-2016 |
Aaron Conole <aconole@redhat.com> |
virtio-net: Update the mtu code to match virtio spec The virtio committee recently ratified a change, VIRTIO-152, which defines the mtu field to be 'max' MTU, not simply desired MTU. This commit brings the virtio-net device in compliance with VIRTIO-152. Additionally, drop the max_mtu branch - it cannot be taken since the u16 returned by virtio_cread16 will never exceed the initial value of max_mtu. Signed-off-by: Aaron Conole <aconole@redhat.com> Acked-by: "Michael S. Tsirkin" <mst@redhat.com> Acked-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d0c2c997 |
|
20-Oct-2016 |
Jarod Wilson <jarod@redhat.com> |
net: use core MTU range checking in virt drivers hyperv_net: - set min/max_mtu, per Haiyang, after rndis_filter_device_add virtio_net: - set min/max_mtu - remove virtnet_change_mtu vmxnet3: - set min/max_mtu xen-netback: - min_mtu = 0, max_mtu = 65517 xen-netfront: - min_mtu = 0, max_mtu = 65535 unisys/visor: - clean up defines a little to not clash with network core or add redundant definitions CC: netdev@vger.kernel.org CC: virtualization@lists.linux-foundation.org CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: "Michael S. Tsirkin" <mst@redhat.com> CC: Shrikrishna Khare <skhare@vmware.com> CC: "VMware, Inc." <pv-drivers@vmware.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Paul Durrant <paul.durrant@citrix.com> CC: David Kershner <david.kershner@unisys.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8017c279 |
|
12-Aug-2016 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
net/virtio-net: Convert to hotplug state machine Install the callbacks via the state machine. The driver supports multiple instances and therefore the new cpuhp_state_add_instance_nocalls() infrastructure is used. The driver currently uses get_online_cpus() to avoid missing a CPU hotplug event while invoking virtnet_set_affinity(). This could be avoided by using cpuhp_state_add_instance() variant which holds the hotplug lock and invokes callback during registration. This is more or less a 1:1 conversion of the current code. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Mark Rutland <mark.rutland@arm.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: netdev@vger.kernel.org Cc: Will Deacon <will.deacon@arm.com> Cc: virtualization@lists.linux-foundation.org Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/1471024183-12666-7-git-send-email-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
a725ee3e |
|
18-Jul-2016 |
Andy Lutomirski <luto@kernel.org> |
virtio-net: Remove more stack DMA VLAN and MQ control was doing DMA from the stack. Fix it. Cc: Michael S. Tsirkin <mst@redhat.com> Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org> Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d1dc06dc |
|
13-Jun-2016 |
Mike Rapoport <rppt@linux.vnet.ibm.com> |
virtio_net: fix csum generation for virtio-net devices The commit e858fae2b0b8 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion") replaced the tun code for header manipulation with the generic helpers. While doing so, it implicitly moved the skb_partial_csum_set() invocation after eth_type_trans(), which invalidates the current gso start/offset values. Fix it by moving the helper invocation before the mac pulling. Fixes: e858fae2b0b8 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion") Reported-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e858fae2 |
|
08-Jun-2016 |
Mike Rapoport <rppt@linux.vnet.ibm.com> |
virtio_net: use common code for virtio_net_hdr and skb GSO conversion Replace open coded conversion between virtio_net_hdr to skb GSO info with virtio_net_hdr_{from,to}_skb Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
14de9d11 |
|
03-Jun-2016 |
Aaron Conole <aconole@redhat.com> |
virtio-net: Add initial MTU advice feature This commit adds the feature bit and associated mtu device entry for the virtio network device. When a virtio device comes up, it checks the feature bit for the VIRTIO_NET_F_MTU feature. If such feature bit is enabled, the driver will read the advised MTU and use it as the initial value. Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
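The probe-time decision described in the MTU advice commit above can be sketched as a small helper. This is an illustration under stated assumptions, not the driver's code: `initial_mtu()` and its parameters are hypothetical, and the feature bit number 3 for VIRTIO_NET_F_MTU is taken from the virtio-net specification.

```c
#include <stdint.h>

#define VIRTIO_NET_F_MTU 3 /* feature bit number, per the virtio-net spec */

/*
 * Hypothetical helper: use the device's advised MTU only when the
 * feature bit was negotiated, otherwise keep the default.
 */
static unsigned int initial_mtu(uint64_t features, uint16_t advised_mtu,
                                unsigned int default_mtu)
{
    if (features & (1ULL << VIRTIO_NET_F_MTU))
        return advised_mtu;
    return default_mtu;
}
```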
#
f00e35e2 |
|
30-May-2016 |
wangyunjian <wangyunjian@huawei.com> |
virtio_net: fix virtnet_open and virtnet_probe competing for try_fill_recv virtnet_open() and virtnet_probe() may both execute try_fill_recv() at the same time. The VQ in virtqueue_add() is not well protected, and a BUG_ON is triggered when virtio_net.ko is being removed. Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c67f5db8 |
|
17-Mar-2016 |
Paolo Abeni <pabeni@redhat.com> |
virtio_net: replace netdev_alloc_skb_ip_align() with napi_alloc_skb() This gives small but noticeable rx performance improvement (2-3%) and will allow exploiting future napi improvement. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
0cf3ace9 |
|
07-Feb-2016 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
virtio_net: validate ethtool port setting and explain the user validation We should validate the port setting that we got from the user and check if it's what we've set it to (PORT_OTHER), also add explanation that ignoring advertising is good as long as we don't have autonegotiation. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
16032be5 |
|
02-Feb-2016 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
virtio_net: add ethtool support for set and get of settings

This patch allows the user to set and retrieve speed and duplex of the virtio_net device via ethtool. Having this functionality is very helpful for simulating different environments and also enables the virtio_net device to participate in operations where proper speed and duplex are required (e.g. currently bonding lacp mode requires full duplex). Custom speed and duplex are not allowed, the user-supplied settings are validated before applying.

Example:
  $ ethtool eth1
  Settings for eth1:
  ...
  Speed: Unknown!
  Duplex: Unknown! (255)
  $ ethtool -s eth1 speed 1000 duplex full
  $ ethtool eth1
  Settings for eth1:
  ...
  Speed: 1000Mb/s
  Duplex: Full

Based on a patch by Roopa Prabhu.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2ac46030 |
|
15-Nov-2015 |
Michael S. Tsirkin <mst@redhat.com> |
virtio-net: Stop doing DMA from the stack Once virtio starts using the DMA API, we won't be able to safely DMA from the stack. virtio-net does a couple of config DMA requests from small stack buffers -- switch to using dynamically-allocated memory. This should have no effect on any performance-critical code paths. Reported-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Andy Lutomirski <luto@kernel.org>
|
#
93d05d4a |
|
18-Nov-2015 |
Eric Dumazet <edumazet@google.com> |
net: provide generic busy polling to all NAPI drivers NAPI drivers no longer need to observe a particular protocol to benefit from busy polling (CONFIG_NET_RX_BUSY_POLL=y) napi_hash_add() and napi_hash_del() are automatically called from core networking stack, respectively from netif_napi_add() and netif_napi_del() This patch depends on free_netdev() and netif_napi_del() being called from process context, which seems to be the norm. Drivers might still prefer to call napi_hash_del() on their own, since they might combine all the rcu grace periods into a single one, knowing their NAPI structures lifetime, while core networking stack has no idea of a possible combining. Once this patch proves to not bring serious regressions, we will cleanup drivers to either remove napi_hash_del() or provide appropriate rcu grace periods combining. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
93f93a44 |
|
18-Nov-2015 |
Eric Dumazet <edumazet@google.com> |
net: move skb_mark_napi_id() into core networking stack We would like to automatically provide busy polling support to all NAPI drivers, without them having to implement anything. skb_mark_napi_id() can be called from napi_gro_receive() and napi_get_frags(). Few drivers are still calling skb_mark_napi_id() because they use netif_receive_skb(). They should eventually call napi_gro_receive() instead. I will leave this to drivers maintainers. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
547c890c |
|
27-Aug-2015 |
Jason Wang <jasowang@redhat.com> |
virtio-net: avoid unnecessary sg initialzation

Usually an skb does not have up to MAX_SKB_FRAGS frags, so there is no need to initialize the unused part of sg. This patch initializes the sg based on the real number of entries it will use:
- during xmit, it can be inferred from nr_frags and can_push.
- for small receive buffer, it will also be 2.

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5377d758 |
|
19-Aug-2015 |
Johannes Berg <johannes@sipsolutions.net> |
virtio_net: use DECLARE_EWMA Instead of using the out-of-line EWMA calculation, use DECLARE_EWMA() to create static inlines. On x86/64 this results in no change in code size for me, but reduces the struct receive_queue size by the two unsigned long values that store the parameters. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
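The size saving mentioned in the DECLARE_EWMA commit above comes from turning the averaging parameters into compile-time constants. A minimal sketch of that idea follows; `SKETCH_DECLARE_EWMA` is a hypothetical macro written for this example (it is not the kernel's DECLARE_EWMA and uses plain division rather than shifts), and the constants 1 and 64 are illustrative.

```c
/*
 * Hypothetical macro: FACTOR and WEIGHT are baked into the generated
 * inline functions, so the struct stores only the running value.
 */
#define SKETCH_DECLARE_EWMA(name, FACTOR, WEIGHT)                              \
struct ewma_##name { unsigned long internal; };                                \
static inline void ewma_##name##_init(struct ewma_##name *e)                   \
{                                                                              \
    e->internal = 0;                                                           \
}                                                                              \
static inline unsigned long ewma_##name##_read(const struct ewma_##name *e)    \
{                                                                              \
    return e->internal / (FACTOR);                                             \
}                                                                              \
static inline void ewma_##name##_add(struct ewma_##name *e, unsigned long val) \
{                                                                              \
    e->internal = e->internal                                                  \
        ? (e->internal * ((WEIGHT) - 1) + val * (FACTOR)) / (WEIGHT)           \
        : val * (FACTOR);                                                      \
}

/* Instantiate an average over packet lengths (illustrative constants). */
SKETCH_DECLARE_EWMA(pkt_len, 1, 64)
```

Each instantiation produces its own `struct ewma_<name>` holding a single `unsigned long`, where an out-of-line implementation would have to carry the factor and weight alongside the value.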
#
48900cb6 |
|
04-Aug-2015 |
Jason Wang <jasowang@redhat.com> |
virtio-net: drop NETIF_F_FRAGLIST virtio declares support for NETIF_F_FRAGLIST, but assumes that there are at most MAX_SKB_FRAGS + 2 fragments which isn't always true with a fraglist. A longer fraglist in the skb will make the call to skb_to_sgvec overflow the sg array, leading to memory corruption. Drop NETIF_F_FRAGLIST so we only get what we can handle. Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0fbd050a |
|
31-Jul-2015 |
Eric Dumazet <edumazet@google.com> |
virtio_net: add gro capability

Straightforward patch to add GRO processing to virtio_net. napi_complete_done() usage allows more aggressive aggregation, opted-in by setting /sys/class/net/xxx/gro_flush_timeout

Tested:

Setting /sys/class/net/xxx/gro_flush_timeout to 1000 nsec, Rick Jones reported following results.

One VM of each on a pair of OpenStack compute nodes with E5-2650Lv3 CPUs and Intel 82599ES-based NICs. So, two "before" and two "after" VMs. The OpenStack compute nodes were running OpenStack Kilo, with VxLAN encapsulation being used through OVS so no GRO coming-up the host stack. The compute nodes themselves were running a 3.14-based kernel.

Single-stream netperf, CPU utilizations and thus service demands are based on intra-guest reported CPU.

Throughput Mbit/s, bigger is better
                     Min    Median  Average  Max
4.2.0-rc3+           1364   1686    1678     1938
4.2.0-rc3+flush1k    1824   2269    2275     2647

Send Service Demand, smaller is better
                     Min    Median  Average  Max
4.2.0-rc3+           0.236  0.558   0.524    0.802
4.2.0-rc3+flush1k    0.176  0.503   0.471    0.738

Receive Service Demand, smaller is better.
                     Min    Median  Average  Max
4.2.0-rc3+           1.906  2.188   2.191    2.531
4.2.0-rc3+flush1k    0.448  0.529   0.533    0.692

Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Rick Jones <rick.jones2@hp.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
75993300 |
|
15-Jul-2015 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: don't require ANY_LAYOUT with VERSION_1 ANY_LAYOUT is a compatibility feature. It's implied for VERSION_1 devices, and non-transitional devices might not offer it. Change code to behave accordingly. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
60302ff6 |
|
02-Apr-2015 |
Michael S. Tsirkin <mst@redhat.com> |
virtio: document queue state logic commit d631b94e7a15277858ec5f88d674d93080506999 virtio: change comment in transmit started clarifying the logic behind queue state management, but introduced an inaccuracy: TX_BUSY does not cause a BUG message. Clean this up some more, explaining the tradeoffs in detail. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
faadb05f |
|
26-Mar-2015 |
Li RongQing <roy.qing.li@gmail.com> |
virtio: simplify the using of received in virtnet_poll received is 0, no need to minus it and use "+=" to reassign it Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d631b94e |
|
24-Mar-2015 |
stephen hemminger <stephen@networkplumber.org> |
virtio: change comment in transmit The original comment was not really informative or funny as well as sexist. Replace it with a better explanation of why the driver does stop and what the impacts are. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ab3971b1 |
|
11-Mar-2015 |
Jason Wang <jasowang@redhat.com> |
virtio-net: correctly delete napi hash

We don't delete napi from hash list during module exit. This will cause the following panic when doing module load and unload:

BUG: unable to handle kernel paging request at 0000004e00000075
IP: [<ffffffff816bd01b>] napi_hash_add+0x6b/0xf0
PGD 3c5d5067 PUD 0
Oops: 0000 [#1] SMP
...
Call Trace:
[<ffffffffa0a5bfb7>] init_vqs+0x107/0x490 [virtio_net]
[<ffffffffa0a5c9f2>] virtnet_probe+0x562/0x791815639d880be [virtio_net]
[<ffffffff8139e667>] virtio_dev_probe+0x137/0x200
[<ffffffff814c7f2a>] driver_probe_device+0x7a/0x250
[<ffffffff814c81d3>] __driver_attach+0x93/0xa0
[<ffffffff814c8140>] ? __device_attach+0x40/0x40
[<ffffffff814c6053>] bus_for_each_dev+0x63/0xa0
[<ffffffff814c7a79>] driver_attach+0x19/0x20
[<ffffffff814c76f0>] bus_add_driver+0x170/0x220
[<ffffffffa0a60000>] ? 0xffffffffa0a60000
[<ffffffff814c894f>] driver_register+0x5f/0xf0
[<ffffffff8139e41b>] register_virtio_driver+0x1b/0x30
[<ffffffffa0a60010>] virtio_net_driver_init+0x10/0x12 [virtio_net]

This patch fixes this by deleting napi from the hash list in virtnet_free_queues(). It also stops deleting napi in virtnet_freeze(), since that path calls virtnet_free_queues(), which has already done this.

Fixes 91815639d880 ("virtio-net: rx busy polling support")
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e3e3c423 |
|
03-Feb-2015 |
Vlad Yasevich <vyasevich@gmail.com> |
Revert "drivers/net: Disable UFO through virtio" This reverts commit 3d0ad09412ffe00c9afa201d01effdb6023d09b4. Now that GSO functionality can correctly track if the fragment id has been selected and select a fragment id if necessary, we can re-enable UFO on tap/macvap and virtio devices. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
074c3582 |
|
24-Jun-2014 |
Jacob Keller <jacob.e.keller@intel.com> |
virtio_net: add software timestamp support This patch enables the use of software timestamping via the virtio_net driver. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
6ba42248 |
|
12-Jan-2015 |
Michael S. Tsirkin <mst@redhat.com> |
virtio/net: verify device has config space Some devices might not implement config space access (e.g. remoteproc used not to - before 3.9). virtio/net needs config space access so make it fail gracefully if not there. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
41f2f127 |
|
23-Dec-2014 |
Jason Wang <jasowang@redhat.com> |
virtio-net: don't do header check for dodgy gso packets There's no need to do header check for virtio-net since: - Host sets dodgy for all gso packets from guest and check the header. - Host should be prepared for all kinds of evil packets from guest, since malicious guest can send any kinds of packet. So this patch sets NETIF_F_GSO_ROBUST for virtio-net to skip the check. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8acdf999 |
|
19-Dec-2014 |
Herbert Xu <herbert@gondor.apana.org.au> |
virtio_net: Fix napi poll list corruption The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less interrupt masking in NAPI) breaks virtio_net in an insidious way. It is now required that if the entire budget is consumed when poll returns, the napi poll_list must remain empty. However, like some other drivers virtio_net tries to do a last-ditch check and if there is more work it will call napi_schedule and then immediately process some of this new work. Should the entire budget be consumed while processing such new work then we will violate the new caller contract. This patch fixes this by not touching any work when we reschedule in virtio_net. The worst part of this bug is that the list corruption causes other napi users to be moved off-list. In my case I was chasing a stall in IPsec (IPsec uses netif_rx) and I only belatedly realised that it was virtio_net which caused the stall even though the virtio_net poll was still functioning perfectly after IPsec stalled. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
51cdc381 |
|
01-Dec-2014 |
Michael S. Tsirkin <mst@redhat.com> |
virtio: drop VIRTIO_F_VERSION_1 from drivers Core activates this bit automatically now, drop it from drivers that set it explicitly. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
9465a7a6 |
|
23-Oct-2014 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: enable v1.0 support Now that we have completed 1.0 support, enable it in our driver. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
7e93a02f |
|
26-Nov-2014 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: disable mac write for virtio 1.0 The spec states that mac in config space is only driver-writable in the legacy case. Fence writing it in virtnet_set_mac_address() in the virtio 1.0 case. Suggested-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
|
#
d04302b3 |
|
23-Oct-2014 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: bigger header when VERSION_1 is set With VERSION_1 virtio_net uses same header size whether mergeable buffers are enabled or not. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Jason Wang <jasowang@redhat.com>
|
#
bcff3162 |
|
23-Oct-2014 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: stricter short buffer length checks Our buffer length check is not strict enough for mergeable buffers: buffer can still be shorter that header + address by 2 bytes. Fix that up. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Jason Wang <jasowang@redhat.com>
|
#
012873d0 |
|
24-Oct-2014 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: get rid of virtio_net_hdr/skb_vnet_hdr virtio 1.0 doesn't use virtio_net_hdr anymore, and in fact, it's not really useful since virtio_net_hdr_mrg_rxbuf includes that as the first field anyway. Let's drop it, precalculate header len and store within vi instead. This way we can also remove struct skb_vnet_hdr. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Jason Wang <jasowang@redhat.com>
|
#
946fa564 |
|
23-Oct-2014 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: pass vi around Too many places poke at [rs]q->vq->vdev->priv just to get the vi structure. Let's just pass the pointer around: seems cleaner, and might even be faster. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
|
#
fdd819b2 |
|
07-Oct-2014 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: v1.0 endianness Based on patches by Rusty Russell, Cornelia Huck. Note: more code changes are needed for 1.0 support (due to different header size). So we don't advertize support for 1.0 yet. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
892d6eb1 |
|
20-Nov-2014 |
Jason Wang <jasowang@redhat.com> |
virtio-net: validate features during probe We currently trigger BUG when VIRTIO_NET_F_CTRL_VQ is not set but one of features depending on it is. That's not a friendly way to report errors to hypervisors. Let's check, and fail probe instead. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Cc: Wanlong Gao <gaowanlong@cn.fujitsu.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3d0ad094 |
|
30-Oct-2014 |
Ben Hutchings <ben@decadent.org.uk> |
drivers/net: Disable UFO through virtio IPv6 does not allow fragmentation by routers, so there is no fragmentation ID in the fixed header. UFO for IPv6 requires the ID to be passed separately, but there is no provision for this in the virtio net protocol. Until recently our software implementation of UFO/IPv6 generated a new ID, but this was a bug. Now we will use ID=0 for any UFO/IPv6 packet passed through a tap, which is even worse. Unfortunately there is no distinction between UFO/IPv4 and v6 features, so disable UFO on taps and virtio_net completely until we have a proper solution. We cannot depend on VM managers respecting the tap feature flags, so keep accepting UFO packets but log a warning the first time we do this. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Fixes: 916e4cf46d02 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data") Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4b7fd2e6 |
|
15-Oct-2014 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: fix use after free commit 0b725a2ca61bedc33a2a63d0451d528b268cf975 net: Remove ndo_xmit_flush netdev operation, use signalling instead. added code that looks at skb->xmit_more after the skb has been put in TX VQ. Since some paths process the ring and free the skb immediately, this can cause use after free. Fix by storing xmit_more in a local variable. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
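The fix described above follows a general pattern: copy any field you still need before handing the object to code that may free it. A minimal user-space sketch of that pattern follows; `struct pkt`, `enqueue_and_maybe_free()` and `xmit()` are hypothetical stand-ins, not the driver's functions.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical miniature of an skb: only the field this example needs. */
struct pkt {
    bool xmit_more;
};

/* Stand-in for queueing to the TX ring; here the packet is freed at once. */
static void enqueue_and_maybe_free(struct pkt *p)
{
    free(p);
}

/* Returns true if the device should be notified (kicked) now. */
static bool xmit(struct pkt *p)
{
    bool kick = !p->xmit_more;  /* read the hint while p is still ours */

    enqueue_and_maybe_free(p);  /* p must not be dereferenced after this */
    return kick;
}
```

Reading `p->xmit_more` after the enqueue call would be the use after free the commit describes; the local copy removes that dependency.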
#
e53fbd11 |
|
14-Oct-2014 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: enable VQs early on restore virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after restore returns, virtio net violated this rule by using receive VQs within restore. To fix, call virtio_device_ready before using VQs. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
02465555 |
|
14-Oct-2014 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: fix use after free on allocation failure In the extremely unlikely event that driver initialization fails after RX buffers are added, virtio net frees RX buffers while VQs are still active, potentially causing device to use a freed buffer. To fix, reset device first - same as we do on device removal. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
4baf1e33 |
|
14-Oct-2014 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: enable VQs early virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after probe returns, virtio net violated this rule by using receive VQs within probe. To fix, call virtio_device_ready before using VQs. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
507613bf |
|
14-Oct-2014 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: minor cleanup goto done; done: return; is ugly, it was put there to make diff review easier. replace by open-coded return. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
080c6373 |
|
14-Oct-2014 |
Michael S. Tsirkin <mst@redhat.com> |
virtio-net: drop config_mutex config_mutex served two purposes: prevent multiple concurrent config change handlers, and synchronize access to config_enable flag. Since commit dbf2576e37da0fcc7aacbfbb9fd5d3de7888a3c1 workqueue: make all workqueues non-reentrant all workqueues are non-reentrant, and config_enable is now gone. Get rid of the unnecessary lock. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
102a2786 |
|
14-Oct-2014 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: drop config_enable Now that virtio core ensures config changes don't arrive during probing, drop config_enable flag in virtio net. On removal, flush is now sufficient to guarantee that no change work is queued. This help simplify the driver, and will allow setting DRIVER_OK earlier without losing config change notifications. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
a5835440 |
|
10-Sep-2014 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio_net: pass well-formed sgs to virtqueue_add_*()

This is the only driver which doesn't hand virtqueue_add_inbuf and virtqueue_add_outbuf a well-formed, well-terminated sg. Fix it, so we can make virtio_add_* simpler.

pktgen results:
  modprobe pktgen
  echo 'add_device eth0' > /proc/net/pktgen/kpktgend_0
  echo nowait 1 > /proc/net/pktgen/eth0
  echo count 1000000 > /proc/net/pktgen/eth0
  echo clone_skb 100000 > /proc/net/pktgen/eth0
  echo dst_mac 4e:14:25:a9:30:ac > /proc/net/pktgen/eth0
  echo dst 192.168.1.2 > /proc/net/pktgen/eth0
  for i in `seq 20`; do echo start > /proc/net/pktgen/pgctrl; tail -n1 /proc/net/pktgen/eth0; done

Before:
  746547-793084(786421+/-9.6e+03)pps 346-367(364.4+/-4.4)Mb/sec (346397808-367990976(3.649e+08+/-4.5e+06)bps) errors: 0

After:
  767390-792966(785159+/-6.5e+03)pps 356-367(363.75+/-2.9)Mb/sec (356068960-367936224(3.64314e+08+/-3e+06)bps) errors: 0

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c89fcfd4 |
|
28-Aug-2014 |
David S. Miller <davem@davemloft.net> |
virtio_net: flush when in xmit_more mode and under descriptor pressure Mirror the changes made to ixgbe in commit 2367a17390138f68b3aa28f2f220b8d7ff8d91f4 ("ixgbe: flush when in xmit_more mode and under descriptor pressure") Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0b725a2c |
|
25-Aug-2014 |
David S. Miller <davem@davemloft.net> |
net: Remove ndo_xmit_flush netdev operation, use signalling instead. As reported by Jesper Dangaard Brouer, for high packet rates the overhead of having another indirect call in the TX path is non-trivial. There is the indirect call itself, and then there is all of the reloading of the state to refetch the tail pointer value and then write the device register. Move to a more passive scheme, which requires very light modifications to the device drivers. The signal is a new skb->xmit_more value, if it is non-zero it means that more SKBs are pending to be transmitted on the same queue as the current SKB. And therefore, the driver may elide the tail pointer update. Right now skb->xmit_more is always zero. Signed-off-by: David S. Miller <davem@davemloft.net>
|
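The signalling scheme described above can be sketched with a toy transmit queue that notifies the device only when no further packets are pending. This is an illustration only: `struct txq` and `txq_xmit()` are hypothetical, and the "kick" counter stands in for the tail pointer update or device notification.

```c
#include <stdbool.h>

/* Hypothetical transmit queue: counts queued packets and notifications. */
struct txq {
    unsigned int pending; /* packets queued since the last kick */
    unsigned int kicks;   /* number of device notifications issued */
};

/* Queue one packet; notify the device only if no more packets follow. */
static void txq_xmit(struct txq *q, bool xmit_more)
{
    q->pending++;
    if (!xmit_more) {
        q->kicks++;      /* one notification covers the whole batch */
        q->pending = 0;
    }
}
```

Sending three packets with the hint set on the first two results in a single notification instead of three, which is the saving the commit is after.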
#
c223a078 |
|
23-Aug-2014 |
David S. Miller <davem@davemloft.net> |
virtio_net: Support netdev_ops->ndo_xmit_flush() Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
91815639 |
|
23-Jul-2014 |
Jason Wang <jasowang@redhat.com> |
virtio-net: rx busy polling support Add basic support for rx busy polling. Instead of introducing new states and a spinlock to synchronize between NAPI and the polling method, this patch just reuses the NAPI state to avoid extra overhead on the fast path and to simplify the code. Test was done between a kvm guest and an external host. Two hosts were connected through 40gb mlx4 cards. With both busy_poll and busy_read set to 50 in guest, 1 byte netperf tcp_rr shows 127% improvement: transaction rate was increased from 8353.33 to 18966.87. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2ffa7598 |
|
23-Jul-2014 |
Jason Wang <jasowang@redhat.com> |
virtio-net: introduce virtnet_receive() Move common receive logic to a new helper virtnet_receive(). It will also be used by rx busy polling method. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7ad24ea4 |
|
10-May-2014 |
Wilfried Klaebe <w-lkml@lebenslange-mailadresse.de> |
net: get rid of SET_ETHTOOL_OPS

Dave Miller mentioned he'd like to see SET_ETHTOOL_OPS gone. This does that.

Mostly done via coccinelle script:
  @@
  struct ethtool_ops *ops;
  struct net_device *dev;
  @@
  - SET_ETHTOOL_OPS(dev, ops);
  + dev->ethtool_ops = ops;

Compile tested only, but I'd seriously wonder if this broke anything.

Suggested-by: Dave Miller <davem@davemloft.net>
Signed-off-by: Wilfried Klaebe <w-lkml@lebenslange-mailadresse.de>
Acked-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6ebbc1a6 |
|
29-Apr-2014 |
Zhangjie \(HZ\) <zhangjie14@huawei.com> |
virtio-net: Set needed_headroom for virtio-net when VIRTIO_F_ANY_LAYOUT is true

This is a small supplement for commit e7428e95a06fb516fac1308bd0e176e27c0b9287 ("virtio-net: put virtio-net header inline with data").

TCP packets have enough room to put the virtio-net header in, but UDP packets do not. By setting dev->needed_headroom for the virtio-net device, UDP packets can have enough room.

For UDP packets, the sk_buff is allocated in __ip_append_data(). The size is "alloclen + hh_len + 15", and "hh_len = LL_RESERVED_SPACE(rt->dst.dev);". The macro is defined as follows:

  #define LL_RESERVED_SPACE(dev) \
  ((((dev)->hard_header_len+(dev)->needed_headroom)\
  &~(HH_DATA_MOD - 1)) + HH_DATA_MOD)

By default, for UDP packets, only 16 bytes are reserved after the skb is allocated, and 2 bytes remain after the MAC header is set. That is not enough to put the virtio-net header in. If we set dev->needed_headroom to 12 or 10 (depending on whether mergeable_rx_bufs is on or off), more room can be reserved. Then there is enough room for UDP packets to put the header in.

Test results are listed below:
guest and host: suse11sp3, netperf, intel 2.4GHz

+-------+---------+---------+---------+---------+
|       |        old        |        new        |
+-------+---------+---------+---------+---------+
| UDP   | Gbit/s  | pps     | Gbit/s  | pps     |
| 64    | 0.57    | 692232  | 0.61    | 742420  |
| 256   | 1.60    | 686860  | 1.71    | 733331  |
| 512   | 2.92    | 674576  | 3.07    | 710446  |
| 1024  | 4.99    | 598977  | 5.17    | 620821  |
| 1460  | 5.68    | 483757  | 7.16    | 610519  |
| 4096  | 6.98    | 637468  | 7.21    | 658471  |
+-------+---------+---------+---------+---------+

Signed-off-by: Zhang Jie <zhangjie14@huawei.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
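The headroom arithmetic in the commit above can be checked with a small standalone version of the macro. This sketch assumes HH_DATA_MOD is 16 and an Ethernet hard_header_len of 14; `LL_RESERVED_SPACE_SKETCH` and `reserved_space()` are hypothetical names that take plain integers instead of a net_device.

```c
#define HH_DATA_MOD 16 /* assumption: the kernel's alignment unit */

/* Same arithmetic as the macro quoted in the commit, on plain integers. */
#define LL_RESERVED_SPACE_SKETCH(hard_header_len, needed_headroom) \
    ((((hard_header_len) + (needed_headroom)) & ~(HH_DATA_MOD - 1)) + HH_DATA_MOD)

/* Bytes reserved in front of the payload for link-layer headers. */
static unsigned int reserved_space(unsigned int hard_header_len,
                                   unsigned int needed_headroom)
{
    return LL_RESERVED_SPACE_SKETCH(hard_header_len, needed_headroom);
}
```

With no needed_headroom, `reserved_space(14, 0)` is 16, leaving 2 bytes after the 14-byte MAC header. With a needed_headroom of 12 or 10 the result is 32, leaving 18 bytes, which is enough for either header size.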
#
c18e9cd6 |
|
17-Apr-2014 |
Amos Kong <akong@redhat.com> |
virtio_net: zero is an invald queue_pairs number Execute "ethtool -L eth0 combined 0" in guest, if multiqueue is enabled, virtnet_send_command() will return -EINVAL error, there is a validation in QEMU. But if multiqueue is disabled, virtnet_set_queues() will just return zero (success). We should return error for this situation. Signed-off-by: Amos Kong <akong@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
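The validation this commit asks for amounts to rejecting zero (and out-of-range) queue pair counts regardless of whether multiqueue is enabled. A minimal sketch follows; `validate_queue_pairs()` is a hypothetical helper, not the driver's function.

```c
#include <errno.h>

/*
 * Hypothetical check: a request is valid only if it asks for at least
 * one queue pair and no more than the device supports.
 */
static int validate_queue_pairs(unsigned int requested,
                                unsigned int max_queue_pairs)
{
    if (requested == 0 || requested > max_queue_pairs)
        return -EINVAL;
    return 0;
}
```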
#
681daee2 |
|
25-Mar-2014 |
Jason Wang <jasowang@redhat.com> |
virtio-net: correct error handling of virtqueue_kick()

Current error handling of virtqueue_kick() was wrong in two places:
- The skb was freed immediately when virtqueue_kick() failed during xmit. This may lead to a double free, since the skb was not detached from the virtqueue.
- try_fill_recv() returned false when virtqueue_kick() failed. This leads to unnecessary rescheduling of the refill work.

Actually, it's safe to just ignore the kick failure in those two places. So this patch fixes this by partially reverting commit 67975901183799af8e93ec60e322f9e2a1940b9b.

Fixes 67975901183799af8e93ec60e322f9e2a1940b9b (virtio_net: verify if virtqueue_kick() succeeded).
Cc: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
85e94525 |
|
15-Mar-2014 |
Eric W. Biederman <ebiederm@xmission.com> |
virtio_net: Call dev_kfree_skb_any instead of dev_kfree_skb. Replace dev_kfree_skb with dev_kfree_skb_any in start_xmit which can be called in hard irq and other contexts. start_xmit only frees skbs that it is dropping. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
57a7744e |
|
13-Mar-2014 |
Eric W. Biederman <ebiederm@xmission.com> |
net: Replace u64_stats_fetch_begin_bh to u64_stats_fetch_begin_irq Replace the bh safe variant with the hard irq safe variant. We need a hard irq safe variant to deal with netpoll transmitting packets from hard irq context, and we need it in most if not all of the places using the bh safe variant. Except on 32bit uni-processor the code is exactly the same so don't bother with a bh variant, just have a hard irq safe variant that everyone can use. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a7c58146 |
|
12-Mar-2014 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio_net: don't crash if virtqueue is broken. A bad implementation of virtio might cause us to mark the virtqueue broken: we'll dev_err() in that case, and the device is useless, but let's not BUG_ON(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
0e7ede80 |
|
20-Feb-2014 |
Jason Wang <jasowang@redhat.com> |
virtio-net: alloc big buffers also when guest can receive UFO We should alloc big buffers also when guest can receive UFO packets to let the big packets fit into guest rx buffer. Fixes 5c5167515d80f78f6bb538492c423adcae31ad65 (virtio-net: Allow UFO feature to be set and advertised.) Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fbf28d78 |
|
16-Jan-2014 |
Michael Dalton <mwdalton@google.com> |
virtio-net: initial rx sysfs support, export mergeable rx buffer size Add initial support for per-rx queue sysfs attributes to virtio-net. If mergeable packet buffers are enabled, adds a read-only mergeable packet buffer size sysfs attribute for each RX queue. Suggested-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael Dalton <mwdalton@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ab7db917 |
|
16-Jan-2014 |
Michael Dalton <mwdalton@google.com> |
virtio-net: auto-tune mergeable rx buffer size for improved performance Commit 2613af0ed18a ("virtio_net: migrate mergeable rx buffers to page frag allocators") changed the mergeable receive buffer size from PAGE_SIZE to MTU-size, introducing a single-stream regression for benchmarks with large average packet size. There is no single optimal buffer size for all workloads. For workloads with packet size <= MTU bytes, MTU + virtio-net header-sized buffers are preferred as larger buffers reduce the TCP window due to SKB truesize. However, single-stream workloads with large average packet sizes have higher throughput if larger (e.g., PAGE_SIZE) buffers are used. This commit auto-tunes the mergeable receiver buffer packet size by choosing the packet buffer size based on an EWMA of the recent packet sizes for the receive queue. Packet buffer sizes range from MTU_SIZE + virtio-net header len to PAGE_SIZE. This improves throughput for large packet workloads, as any workload with average packet size >= PAGE_SIZE will use PAGE_SIZE buffers. These optimizations interact positively with recent commit ba275241030c ("virtio-net: coalesce rx frags when possible during rx"), which coalesces adjacent RX SKB fragments in virtio_net. The coalescing optimizations benefit buffers of any size. Benchmarks taken from an average of 5 netperf 30-second TCP_STREAM runs between two QEMU VMs on a single physical machine. Each VM has two VCPUs with all offloads & vhost enabled. All VMs and vhost threads run in a single 4 CPU cgroup cpuset, using cgroups to ensure that other processes in the system will not be scheduled on the benchmark CPUs. Trunk includes SKB rx frag coalescing. net-next w/ virtio_net before 2613af0ed18a (PAGE_SIZE bufs): 14642.85Gb/s net-next (MTU-size bufs): 13170.01Gb/s net-next + auto-tune: 14555.94Gb/s Jason Wang also reported a throughput increase on mlx4 from 22Gb/s using MTU-sized buffers to about 26Gb/s using auto-tuning. 
Signed-off-by: Michael Dalton <mwdalton@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
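The auto-tuning idea in the message above can be sketched as a moving average that is clamped to the allowed buffer range. This is a minimal sketch, not the driver code: every name and constant here (the 1/8 weight, the 12-byte header, the 4096-byte page) is an assumption chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative assumptions, not values taken from the driver. */
#define SKETCH_PAGE_SIZE   4096UL
#define SKETCH_MTU         1500UL
#define SKETCH_HDR_LEN     12UL	/* assumed virtio-net header length */
#define SKETCH_EWMA_WEIGHT 8UL	/* each new sample contributes 1/8 */

struct rx_avg {
	unsigned long avg;	/* 0 means "no samples yet" */
};

/* Fold one received packet length into the moving average. */
static void rx_avg_add(struct rx_avg *a, unsigned long sample)
{
	assert(a != NULL);

	if (a->avg == 0)
		a->avg = sample;
	else
		a->avg = (a->avg * (SKETCH_EWMA_WEIGHT - 1) + sample) /
			 SKETCH_EWMA_WEIGHT;
}

/* Pick the next buffer size: average + header, clamped to [MTU + hdr, page]. */
static unsigned long rx_buf_len(const struct rx_avg *a)
{
	unsigned long min_len = SKETCH_MTU + SKETCH_HDR_LEN;
	unsigned long len = a->avg + SKETCH_HDR_LEN;

	if (len < min_len)
		len = min_len;
	if (len > SKETCH_PAGE_SIZE)
		len = SKETCH_PAGE_SIZE;
	return len;
}
```

Small-packet workloads therefore stay at the MTU-sized minimum, while a stream of large packets pushes the average up until the page-sized cap applies.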
|
#
fb51879d |
|
16-Jan-2014 |
Michael Dalton <mwdalton@google.com> |
virtio-net: use per-receive queue page frag alloc for mergeable bufs The virtio-net driver currently uses netdev_alloc_frag() for GFP_ATOMIC mergeable rx buffer allocations. This commit migrates virtio-net to use per-receive queue page frags for GFP_ATOMIC allocation. This change unifies mergeable rx buffer memory allocation, which now will use skb_refill_frag() for both atomic and GFP-WAIT buffer allocations. To address fragmentation concerns, if after buffer allocation there is too little space left in the page frag to allocate a subsequent buffer, the remaining space is added to the current allocated buffer so that the remaining space can be used to store packet data. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael Dalton <mwdalton@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
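The fragmentation rule in the last sentence above can be sketched with a toy page frag. This is a simplified model under stated assumptions: struct frag and frag_take() are hypothetical, the "refill" just resets the offset instead of allocating a page, and no alignment is applied.

```c
#include <assert.h>
#include <stddef.h>

/* Toy page frag: a region of 'size' bytes consumed from 'offset' upward. */
struct frag {
	unsigned int offset;
	unsigned int size;
};

/*
 * Carve a buffer of 'len' bytes out of the frag and return the length
 * actually handed out. If the space left after this allocation is too small
 * for another 'len'-byte buffer, the leftover is appended to this buffer so
 * it can still hold packet data instead of being wasted.
 */
static unsigned int frag_take(struct frag *f, unsigned int len)
{
	unsigned int hole;

	assert(f != NULL && len > 0 && len <= f->size);

	if (f->offset + len > f->size)
		f->offset = 0;	/* stand-in for refilling with a fresh page */

	f->offset += len;
	hole = f->size - f->offset;
	if (hole < len) {
		len += hole;
		f->offset += hole;
	}
	return len;
}
```

For a 4096-byte frag and 1536-byte requests, the first buffer is 1536 bytes and the second absorbs the 1024-byte tail, giving 2560 bytes.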
|
#
be121f46 |
|
15-Jan-2014 |
Jason Wang <jasowang@redhat.com> |
virtio-net: drop rq->max and rq->num It looks like there's no need for those two fields: - Unless there's a failure for the first refill try, rq->max should be always equal to the vring size. - rq->num is only used to determine the condition that we need to do the refill, we could check vq->num_free instead. - rq->num was required to be increased or decreased explicitly after each get/put, which results in a bad API. So this patch removes them both to make the code simpler. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6cd4ce00 |
|
29-Dec-2013 |
Jason Wang <jasowang@redhat.com> |
virtio-net: fix refill races during restore During restore, try_fill_recv() was called with neither the napi lock held nor napi disabled. This can lead to two try_fill_recv() calls running at the same time. Fix this by refilling before trying to enable napi. Fixes 0741bcb5584f9e2390ae6261573c4de8314999f2 (virtio: net: Add freeze, restore handlers to support S4). Cc: Amit Shah <amit.shah@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
788a8b6d |
|
09-Dec-2013 |
stephen hemminger <stephen@networkplumber.org> |
virtio_net: spelling fixes Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d24bae32 |
|
09-Dec-2013 |
stephen hemminger <stephen@networkplumber.org> |
virtio_net: remove unused parameter to send_command All the code passes NULL for the last sg list (in). Simplify by just removing it. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
98bfd23c |
|
05-Dec-2013 |
Michael Dalton <mwdalton@google.com> |
virtio-net: free bufs correctly on invalid packet length When a packet with invalid length arrives, ensure that the packet is freed correctly if mergeable packet buffers and big packets (GUEST_TSO4) are both enabled. Signed-off-by: Michael Dalton <mwdalton@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Andrew Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d4fb84ee |
|
05-Dec-2013 |
Andrey Vagin <avagin@openvz.org> |
virtio: delete napi structures from netdev before releasing memory free_netdev calls netif_napi_del too, but it's too late, because napi structures are placed on vi->rq. netif_napi_add() is called from virtnet_alloc_queues. general protection fault: 0000 [#1] SMP Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: ip6table_filter ip6_tables iptable_filter ip_tables virtio_balloon pcspkr virtio_net(-) i2c_pii CPU: 1 PID: 347 Comm: rmmod Not tainted 3.13.0-rc2+ #171 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff8800b779c420 ti: ffff8800379e0000 task.ti: ffff8800379e0000 RIP: 0010:[<ffffffff81322e19>] [<ffffffff81322e19>] __list_del_entry+0x29/0xd0 RSP: 0018:ffff8800379e1dd0 EFLAGS: 00010a83 RAX: 6b6b6b6b6b6b6b6b RBX: ffff8800379c2fd0 RCX: dead000000200200 RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000001 RDI: ffff8800379c2fd0 RBP: ffff8800379e1dd0 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: ffff8800379c2f90 R13: ffff880037839160 R14: 0000000000000000 R15: 00000000013352f0 FS: 00007f1400e34740(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007f464124c763 CR3: 00000000b68cf000 CR4: 00000000000006e0 Stack: ffff8800379e1df0 ffffffff8155beab 6b6b6b6b6b6b6b2b ffff8800378391c0 ffff8800379e1e18 ffffffff8156499b ffff880037839be0 ffff880037839d20 ffff88003779d3f0 ffff8800379e1e38 ffffffffa003477c ffff88003779d388 Call Trace: [<ffffffff8155beab>] netif_napi_del+0x1b/0x80 [<ffffffff8156499b>] free_netdev+0x8b/0x110 [<ffffffffa003477c>] virtnet_remove+0x7c/0x90 [virtio_net] [<ffffffff813ae323>] virtio_dev_remove+0x23/0x80 [<ffffffff813f62ef>] __device_release_driver+0x7f/0xf0 [<ffffffff813f6ca0>] driver_detach+0xc0/0xd0 [<ffffffff813f5f28>] bus_remove_driver+0x58/0xd0 [<ffffffff813f72ec>] driver_unregister+0x2c/0x50 [<ffffffff813ae65e>] unregister_virtio_driver+0xe/0x10 [<ffffffffa0036942>] virtio_net_driver_exit+0x10/0x6ce [virtio_net] 
[<ffffffff810d7cf2>] SyS_delete_module+0x172/0x220 [<ffffffff810a732d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff810f5d4c>] ? __audit_syscall_entry+0x9c/0xf0 [<ffffffff81677f69>] system_call_fastpath+0x16/0x1b Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 RIP [<ffffffff81322e19>] __list_del_entry+0x29/0xd0 RSP <ffff8800379e1dd0> ---[ end trace d5931cd3f87c9763 ]--- Fixes: 986a4f4d452d (virtio_net: multiqueue support) Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: Andrey Vagin <avagin@openvz.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fa9fac17 |
|
05-Dec-2013 |
Andrey Vagin <avagin@openvz.org> |
virtio-net: determine type of bufs correctly free_unused_bufs must check vi->mergeable_rx_bufs before vi->big_packets, because we use this sequence in other places. Otherwise we allocate buffer of one type, then free it as another type. general protection fault: 0000 [#1] SMP Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: ip6table_filter ip6_tables iptable_filter ip_tables pcspkr virtio_balloon virtio_net(-) i2c_pii CPU: 0 PID: 400 Comm: rmmod Not tainted 3.13.0-rc2+ #170 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff8800b6d2a210 ti: ffff8800aed32000 task.ti: ffff8800aed32000 RIP: 0010:[<ffffffffa00345f3>] [<ffffffffa00345f3>] free_unused_bufs+0xc3/0x190 [virtio_net] RSP: 0018:ffff8800aed33dd8 EFLAGS: 00010202 RAX: ffff8800b1fe2c00 RBX: ffff8800b66a7240 RCX: 6b6b6b6b6b6b6b6b RDX: 6b6b6b6b6b6b6b6b RSI: ffff8800b8419a68 RDI: ffff8800b66a1148 RBP: ffff8800aed33e00 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: ffff8800b66a1148 R14: 0000000000000000 R15: 000077ff80000000 FS: 00007fc4f9c4e740(0000) GS:ffff8800bfa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007f63f432f000 CR3: 00000000b6538000 CR4: 00000000000006f0 Stack: ffff8800b66a7240 ffff8800b66a7380 ffff8800377bd3f0 0000000000000000 00000000023302f0 ffff8800aed33e18 ffffffffa00346e2 ffff8800b66a7240 ffff8800aed33e38 ffffffffa003474d ffff8800377bd388 ffff8800377bd390 Call Trace: [<ffffffffa00346e2>] remove_vq_common+0x22/0x40 [virtio_net] [<ffffffffa003474d>] virtnet_remove+0x4d/0x90 [virtio_net] [<ffffffff813ae303>] virtio_dev_remove+0x23/0x80 [<ffffffff813f62cf>] __device_release_driver+0x7f/0xf0 [<ffffffff813f6c80>] driver_detach+0xc0/0xd0 [<ffffffff813f5f08>] bus_remove_driver+0x58/0xd0 [<ffffffff813f72cc>] driver_unregister+0x2c/0x50 [<ffffffff813ae63e>] unregister_virtio_driver+0xe/0x10 [<ffffffffa0036852>] virtio_net_driver_exit+0x10/0x7be [virtio_net] 
[<ffffffff810d7cf2>] SyS_delete_module+0x172/0x220 [<ffffffff810a732d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff810f5d4c>] ? __audit_syscall_entry+0x9c/0xf0 [<ffffffff81677f69>] system_call_fastpath+0x16/0x1b Code: c0 74 55 0f 1f 44 00 00 80 7b 30 00 74 7a 48 8b 50 30 4c 89 e6 48 03 73 20 48 85 d2 0f 84 bb 00 00 00 66 0f RIP [<ffffffffa00345f3>] free_unused_bufs+0xc3/0x190 [virtio_net] RSP <ffff8800aed33dd8> ---[ end trace edb570ea923cce9c ]--- Fixes: 2613af0ed18a (virtio_net: migrate mergeable rx buffers to page frag allocators) Cc: Michael Dalton <mwdalton@google.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: Andrey Vagin <avagin@openvz.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
adf8d3ff |
|
06-Dec-2013 |
Jeff Kirsher <jeffrey.t.kirsher@intel.com> |
drivers/net/*: Fix FSF address in file headers Several files refer to an old address for the Free Software Foundation in the file header comment. Resolve by replacing the address with the URL <http://www.gnu.org/licenses/> so that we do not have to keep updating the header comments anytime the address changes. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Veaceslav Falico <vfalico@redhat.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Paul Mackerras <paulus@samba.org> CC: Ian Campbell <ian.campbell@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Rusty Russell <rusty@rustcorp.com.au> CC: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f121159d |
|
28-Nov-2013 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: make all RX paths handle errors consistently receive mergeable now handles errors internally. Do the same for the big and small packet paths, otherwise the logic is too hard to follow. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8fc3b9e9 |
|
28-Nov-2013 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: fix error handling for mergeable buffers Eric Dumazet noticed that if we encounter an error when processing a mergeable buffer, we don't dequeue all of the buffers from this packet, the result is almost sure to be loss of networking. Jason Wang noticed that we also leak a page and that we don't decrement the rq buf count, so we won't repost buffers (a resource leak). Fix both issues. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael Dalton <mwdalton@google.com> Reported-by: Eric Dumazet <edumazet@google.com> Reported-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
99e872ae |
|
29-Nov-2013 |
Thomas Huth <thuth@linux.vnet.ibm.com> |
virtio_net: Fixed a trivial typo (fitler --> filter) "MAC filter" sounds more reasonable than "MAC fitler". Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0f13b66b |
|
18-Nov-2013 |
Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> |
net, virtio_net: replace the magic value It is more appropriate to use # of queue pairs currently used by the driver instead of a magic value. Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5061de36 |
|
14-Nov-2013 |
Michael Dalton <mwdalton@google.com> |
virtio-net: mergeable buffer size should include virtio-net header Commit 2613af0ed18a ("virtio_net: migrate mergeable rx buffers to page frag allocators") changed the mergeable receive buffer size from PAGE_SIZE to MTU-size. However, the merge buffer size does not take into account the size of the virtio-net header. Consequently, packets that are MTU-size will take two buffers instead of one (to store the virtio-net header), substantially decreasing the throughput of MTU-size traffic due to TCP window / SKB truesize effects. This commit changes the mergeable buffer size to include the virtio-net header. The buffer size is cacheline-aligned because skb_page_frag_refill will not automatically align the requested size. Benchmarks taken from an average of 5 netperf 30-second TCP_STREAM runs between two QEMU VMs on a single physical machine. Each VM has two VCPUs and vhost enabled. All VMs and vhost threads run in a single 4 CPU cgroup cpuset, using cgroups to ensure that other processes in the system will not be scheduled on the benchmark CPUs. Transmit offloads and mergeable receive buffers are enabled, but guest_tso4 / guest_csum are explicitly disabled to force MTU-sized packets on the receiver. net-next trunk before 2613af0ed18a (PAGE_SIZE buf): 3861.08Gb/s net-next trunk (MTU 1500 - packet uses two buf due to size bug): 4076.62Gb/s net-next trunk (MTU 1480 - packet fits in one buf): 6301.34Gb/s net-next trunk w/ size fix (MTU 1500 - packet fits in one buf): 6445.44Gb/s Suggested-by: Eric Northup <digitaleric@google.com> Signed-off-by: Michael Dalton <mwdalton@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
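The size bug described above comes down to a few lines of arithmetic. The sketch below assumes a 14-byte Ethernet header, a 4-byte VLAN tag, a 12-byte virtio-net header and 64-byte cachelines; these constants and all helper names are assumptions for illustration, not values read from the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative assumptions. */
#define SKETCH_ETH_HLEN   14u
#define SKETCH_VLAN_HLEN  4u
#define SKETCH_HDR_LEN    12u	/* assumed virtio-net header length */
#define SKETCH_CACHELINE  64u

/* Round x up to a multiple of a, where a is a power of two. */
static unsigned int align_up(unsigned int x, unsigned int a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Buffer length that ignores the virtio-net header (the buggy sizing). */
static unsigned int buf_len_without_hdr(unsigned int mtu)
{
	return SKETCH_ETH_HLEN + SKETCH_VLAN_HLEN + mtu;
}

/* Buffer length that includes the header and is cacheline-aligned. */
static unsigned int buf_len_with_hdr(unsigned int mtu)
{
	return align_up(buf_len_without_hdr(mtu) + SKETCH_HDR_LEN,
			SKETCH_CACHELINE);
}

/* Bytes the device writes for one untagged frame: header plus frame. */
static unsigned int rx_bytes(unsigned int mtu)
{
	return SKETCH_HDR_LEN + SKETCH_ETH_HLEN + mtu;
}

/* Number of receive buffers needed to hold 'bytes'. */
static unsigned int buffers_needed(unsigned int bytes, unsigned int buf_len)
{
	return (bytes + buf_len - 1) / buf_len;
}
```

Under these assumptions a 1500-byte MTU frame needs 1526 bytes, which spills over a 1518-byte buffer but fits a 1536-byte one, while a 1480-byte MTU frame (1506 bytes) fits either way, matching the pattern in the benchmark rows.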
|
#
827da44c |
|
07-Oct-2013 |
John Stultz <john.stultz@linaro.org> |
net: Explicitly initialize u64_stats_sync structures for lockdep In order to enable lockdep on seqcount/seqlock structures, we must explicitly initialize any locks. The u64_stats_sync structure, uses a seqcount, and thus we need to introduce a u64_stats_init() function and use it to initialize the structure. This unfortunately adds a lot of fairly trivial initialization code to a number of drivers. But the benefit of ensuring correctness makes this worth while. Because these changes are required for lockdep to be enabled, and the changes are quite trivial, I've not yet split this patch out into 30-some separate patches, as I figured it would be better to get the various maintainers thoughts on how to best merge this change along with the seqcount lockdep enablement. Feedback would be appreciated! Signed-off-by: John Stultz <john.stultz@linaro.org> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: James Morris <jmorris@namei.org> Cc: Jesse Gross <jesse@nicira.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Mirko Lindner <mlindner@marvell.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Roger Luethi <rl@hellgate.ch> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Simon Horman <horms@verge.net.au> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Wensong Zhang <wensong@linux-vs.org> Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
9bb8ca86 |
|
05-Nov-2013 |
Jason Wang <jasowang@redhat.com> |
virtio-net: switch to use XPS to choose txq We used to use a percpu structure vq_index to record the cpu to queue mapping, this is suboptimal since it duplicates the work of XPS and loses all other XPS functionality such as allowing user to configure their own transmission steering strategy. So this patch switches to use XPS and suggest a default mapping when the number of cpus is equal to the number of queues. With XPS support, there's no need for keeping per-cpu vq_index and .ndo_select_queue(), so they were removed also. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ba275241 |
|
01-Nov-2013 |
Jason Wang <jasowang@redhat.com> |
virtio-net: coalesce rx frags when possible during rx Commit 2613af0ed18a11d5c566a81f9a6510b73180660a (virtio_net: migrate mergeable rx buffers to page frag allocators) tries to increase the payload/truesize for MTU-sized traffic. But this introduces extra overhead for received GSO packets because of the frag list. This commit tries to reduce this issue by coalescing the rx frags when possible during rx. Test results show about a 15% improvement on full size GSO packet receiving (and even better than before commit 2613af0ed18a11d5c566a81f9a6510b73180660a). Before this commit: ./netperf -H 192.168.100.4 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.4 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 20303.87 After this commit: ./netperf -H 192.168.100.4 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.4 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 23841.26 Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Michael Dalton <mwdalton@google.com> Cc: Eric Dumazet <edumazet@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ec9debbd |
|
29-Oct-2013 |
Jason Wang <jasowang@redhat.com> |
virtio-net: correctly handle cpu hotplug notifier during resuming commit 3ab098df35f8b98b6553edc2e40234af512ba877 (virtio-net: don't respond to cpu hotplug notifier if we're not ready) tries to bypass the cpu hotplug notifier by checking config_enable and doing nothing if it is false. So it needs to take the config_lock mutex, which may happen in atomic context and leads to the following warnings: [ 622.944441] CPU0 attaching NULL sched-domain. [ 622.944446] CPU1 attaching NULL sched-domain. [ 622.944485] CPU0 attaching NULL sched-domain. [ 622.950795] BUG: sleeping function called from invalid context at kernel/mutex.c:616 [ 622.950796] in_atomic(): 1, irqs_disabled(): 1, pid: 10, name: migration/1 [ 622.950796] no locks held by migration/1/10. [ 622.950798] CPU: 1 PID: 10 Comm: migration/1 Not tainted 3.12.0-rc5-wl-01249-gb91e82d #317 [ 622.950799] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 622.950802] 0000000000000000 ffff88001d42dba0 ffffffff81a32f22 ffff88001bfb9c70 [ 622.950803] ffff88001d42dbb0 ffffffff810edb02 ffff88001d42dc38 ffffffff81a396ed [ 622.950805] 0000000000000046 ffff88001d42dbe8 ffffffff810e861d 0000000000000000 [ 622.950805] Call Trace: [ 622.950810] [<ffffffff81a32f22>] dump_stack+0x54/0x74 [ 622.950815] [<ffffffff810edb02>] __might_sleep+0x112/0x114 [ 622.950817] [<ffffffff81a396ed>] mutex_lock_nested+0x3c/0x3c6 [ 622.950818] [<ffffffff810e861d>] ? up+0x39/0x3e [ 622.950821] [<ffffffff8153ea7c>] ? acpi_os_signal_semaphore+0x21/0x2d [ 622.950824] [<ffffffff81565ed1>] ? 
acpi_ut_release_mutex+0x5e/0x62 [ 622.950828] [<ffffffff816d04ec>] virtnet_cpu_callback+0x33/0x87 [ 622.950830] [<ffffffff81a42576>] notifier_call_chain+0x3c/0x5e [ 622.950832] [<ffffffff810e86a8>] __raw_notifier_call_chain+0xe/0x10 [ 622.950835] [<ffffffff810c5556>] __cpu_notify+0x20/0x37 [ 622.950836] [<ffffffff810c5580>] cpu_notify+0x13/0x15 [ 622.950838] [<ffffffff81a237cd>] take_cpu_down+0x27/0x3a [ 622.950841] [<ffffffff81136289>] stop_machine_cpu_stop+0x93/0xf1 [ 622.950842] [<ffffffff81136167>] cpu_stopper_thread+0xa0/0x12f [ 622.950844] [<ffffffff811361f6>] ? cpu_stopper_thread+0x12f/0x12f [ 622.950847] [<ffffffff81119710>] ? lock_release_holdtime.part.7+0xa3/0xa8 [ 622.950848] [<ffffffff81135e4b>] ? cpu_stop_should_run+0x3f/0x47 [ 622.950850] [<ffffffff810ea9b0>] smpboot_thread_fn+0x1c5/0x1e3 [ 622.950852] [<ffffffff810ea7eb>] ? lg_global_unlock+0x67/0x67 [ 622.950854] [<ffffffff810e36b7>] kthread+0xd8/0xe0 [ 622.950857] [<ffffffff81a3bfad>] ? wait_for_common+0x12f/0x164 [ 622.950859] [<ffffffff810e35df>] ? kthread_create_on_node+0x124/0x124 [ 622.950861] [<ffffffff81a45ffc>] ret_from_fork+0x7c/0xb0 [ 622.950862] [<ffffffff810e35df>] ? kthread_create_on_node+0x124/0x124 [ 622.950876] smpboot: CPU 1 is now offline [ 623.194556] SMP alternatives: lockdep: fixing up alternatives [ 623.194559] smpboot: Booting Node 0 Processor 1 APIC 0x1 ... A correct fix is to unregister the hotcpu notifier during restore and register a new one in resume. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Tested-by: Fengguang Wu <fengguang.wu@intel.com> Cc: Wanlong Gao <gaowanlong@cn.fujitsu.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2613af0e |
|
28-Oct-2013 |
Michael Dalton <mwdalton@google.com> |
virtio_net: migrate mergeable rx buffers to page frag allocators The virtio_net driver's mergeable receive buffer allocator uses 4KB packet buffers. For MTU-sized traffic, SKB truesize is > 4KB but only ~1500 bytes of the buffer is used to store packet data, reducing the effective TCP window size substantially. This patch addresses the performance concerns with mergeable receive buffers by allocating MTU-sized packet buffers using page frag allocators. If more than MAX_SKB_FRAGS buffers are needed, the SKB frag_list is used. Signed-off-by: Michael Dalton <mwdalton@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
047b9b94 |
|
28-Oct-2013 |
Heinz Graalfs <graalfs@linux.vnet.ibm.com> |
virtio_net: verify if queue is broken after virtqueue_get_buf() If a virtqueue_get_buf() call returns a NULL pointer a possibly endless while loop should be avoided by checking for a broken virtqueue. Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
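The loop shape described above can be sketched against a mock queue. Everything below is hypothetical scaffolding (struct mock_vq, mock_get_buf(), wait_for_reply()); it only illustrates why the wait needs a second exit condition and does not reproduce the kernel's virtqueue API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock queue standing in for a virtqueue; all names are hypothetical. */
struct mock_vq {
	int polls_until_reply;	/* empty polls before a reply; < 0 = never */
	int polls_until_broken;	/* polls before the queue breaks; < 0 = never */
	bool broken;
};

/* Poll once: true if a reply is available, false otherwise. */
static bool mock_get_buf(struct mock_vq *vq)
{
	if (vq->polls_until_reply == 0)
		return true;
	if (vq->polls_until_reply > 0)
		vq->polls_until_reply--;
	if (vq->polls_until_broken > 0 && --vq->polls_until_broken == 0)
		vq->broken = true;
	return false;
}

/*
 * Busy-wait for a reply, but give up once the queue is marked broken.
 * Without the 'broken' check, a queue that never replies spins forever.
 */
static bool wait_for_reply(struct mock_vq *vq)
{
	assert(vq != NULL);

	while (!mock_get_buf(vq)) {
		if (vq->broken)
			return false;
	}
	return true;
}
```

The caller can then treat a false return as a failed command instead of hanging in the loop.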
|
#
67975901 |
|
28-Oct-2013 |
Heinz Graalfs <graalfs@linux.vnet.ibm.com> |
virtio_net: verify if virtqueue_kick() succeeded Verify if a host kick succeeded by checking return value of virtqueue_kick(). Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
35ed159b |
|
14-Oct-2013 |
Jason Wang <jasowang@redhat.com> |
virtio-net: refill only when device is up during setting queues We used to schedule the refill work unconditionally after changing the number of queues. This may lead to an issue if the device is not up. Since we only try to cancel the work in ndo_stop(), the refill work may keep running after the device has been removed. Fix this by only scheduling the work when the device is up. The bug was introduced by commit 9b9cd8024a2882e896c65222aa421d461354e3f2. (virtio-net: fix the race between channels setting and refill) Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3ab098df |
|
14-Oct-2013 |
Jason Wang <jasowang@redhat.com> |
virtio-net: don't respond to cpu hotplug notifier if we're not ready We're trying to re-configure the affinity unconditionally in the cpu hotplug callback. This may lead to issues when resuming from s3/s4 since - virt queues haven't been allocated at that time. - it's unnecessary since the thaw method will re-configure the affinity. Fix this issue by checking config_enable and doing nothing if we're not ready. The bug was introduced by commit 8de4b2f3ae90c8fc0f17eeaab87d5a951b66ee17 (virtio-net: reset virtqueue affinity when doing cpu hotplug). Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Wanlong Gao <gaowanlong@cn.fujitsu.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
855e0c52 |
|
14-Oct-2013 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: use size-based config accessors. This lets the transport do endian conversion if necessary, and insulates the drivers from the difference. Most drivers can use the simple helpers virtio_cread() and virtio_cwrite(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
89107000 |
|
16-Sep-2013 |
Aaron Lu <aaron.lu@intel.com> |
virtio: pm: use CONFIG_PM_SLEEP instead of CONFIG_PM The freeze and restore functions defined in virtio drivers are used for suspend and hibernate, so CONFIG_PM_SLEEP is more appropriate than CONFIG_PM. This patch replaces all CONFIG_PM with CONFIG_PM_SLEEP for virtio drivers that implement freeze and restore callbacks. Signed-off-by: Aaron Lu <aaron.lu@intel.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
4f49129b |
|
27-Aug-2013 |
Thomas Huth <thuth@linux.vnet.ibm.com> |
virtio-net: Set RXCSUM feature if GUEST_CSUM is available If the VIRTIO_NET_F_GUEST_CSUM virtio feature is available, the guest does not have to calculate the checksums on all received packets. This is pretty much the same feature as RX checksum offloading on real network cards, so the virtio-net driver should report this by setting the NETIF_F_RXCSUM flag. When the user now runs "ethtool -k", he or she can see whether the virtio-net interface has to calculate RX checksums or not. Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e7428e95 |
|
24-Jul-2013 |
Michael S. Tsirkin <mst@redhat.com> |
virtio-net: put virtio net header inline with data For small packets we can simplify xmit processing by linearizing buffers with the header: most packets seem to have enough head room we can use for this purpose. Since existing hypervisors require that header is the first s/g element, we need a feature bit for this. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cbdadbbf |
|
08-Jul-2013 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: fix race in RX VQ processing virtio net called virtqueue_enable_cb on RX path after napi_complete, so with NAPI_STATE_SCHED clear - outside the implicit napi lock. This violates the requirement to synchronize virtqueue_enable_cb wrt virtqueue_add_buf. In particular, used event can move backwards, causing us to lose interrupts. In a debug build, this can trigger panic within START_USE. Jason Wang reports that he can trigger the races artificially, by adding udelay() in virtqueue_enable_cb() after virtio_mb(). However, we must call napi_complete to clear NAPI_STATE_SCHED before polling the virtqueue for used buffers, otherwise napi_schedule_prep in a callback will fail, causing us to lose RX events. To fix, call virtqueue_enable_cb_prepare with NAPI_STATE_SCHED set (under napi lock), later call virtqueue_poll with NAPI_STATE_SCHED clear (outside the lock). Reported-by: Jason Wang <jasowang@redhat.com> Tested-by: Jason Wang <jasowang@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9b9cd802 |
|
03-Jul-2013 |
Jason Wang <jasowang@redhat.com> |
virtio-net: fix the race between channels setting and refill Commit 55257d72bd1c51f25106350f4983ec19f62ed1fa (virtio-net: fill only rx queues which are being used) tries to refill on demand when changing the number of channels by calling try_refill_recv() directly. This may race with: - the refill work, which may do the refill at the same time - the try_refill_recv() called in bh, since napi was not disabled. This may lead the guest to complain while setting channels: virtio_net virtio0: input.1:id 0 is not a head! Solve this issue by scheduling a refill work, which guarantees that refills are serialized. Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
e4166625 |
|
21-May-2013 |
Jason Wang <jasowang@redhat.com> |
virtio_net: enable napi for all possible queues during open Commit 55257d72bd1c51f25106350f4983ec19f62ed1fa (virtio-net: fill only rx queues which are being used) only does the napi enabling during open for curr_queue_pairs. This breaks multiqueue receiving, since the napi of new queues is still disabled after changing the number of queues. This patch fixes this by enabling napi for all possible queues during open. Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d34710e3 |
|
09-May-2013 |
Amerigo Wang <amwang@redhat.com> |
virtio_net: use default napi weight by default Since commit 82dc3c63c692b1e1d5937 ("net: introduce NAPI_POLL_WEIGHT") we warn drivers when they use napi weight higher than NAPI_POLL_WEIGHT, but virtio_net still uses 128 by default. This patch changes its default value to NAPI_POLL_WEIGHT. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
55257d72 |
|
28-Apr-2013 |
Sasha Levin <sasha.levin@oracle.com> |
virtio-net: fill only rx queues which are being used Due to MQ support we may allocate a whole bunch of rx queues but never use them. With this patch we'll save the space used by the receive buffers until they are actually in use: sh-4.2# free -h total used free shared buffers cached Mem: 490M 35M 455M 0B 0B 4.1M -/+ buffers/cache: 31M 459M Swap: 0B 0B 0B sh-4.2# ethtool -L eth0 combined 8 sh-4.2# free -h total used free shared buffers cached Mem: 490M 162M 327M 0B 0B 4.1M -/+ buffers/cache: 158M 331M Swap: 0B 0B 0B Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
80d5c368 |
|
18-Apr-2013 |
Patrick McHardy <kaber@trash.net> |
net: vlan: prepare for 802.1ad VLAN filtering offload Change the rx_{add,kill}_vid callbacks to take a protocol argument in preparation of 802.1ad support. The protocol argument used so far is always htons(ETH_P_8021Q). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f646968f |
|
18-Apr-2013 |
Patrick McHardy <kaber@trash.net> |
net: vlan: rename NETIF_F_HW_VLAN_* feature flags to NETIF_F_HW_VLAN_CTAG_* Rename the hardware VLAN acceleration features to include "CTAG" to indicate that they only support CTAGs. Follow-up patches will introduce 802.1ad service provider tagging (STAGs) and require the distinction for hardware not supporting accelerating both. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4fda8302 |
|
10-Apr-2013 |
Jason Wang <jasowang@redhat.com> |
virtio-net: initialize vlan_features There's nothing that prevents passing the device features of virtio_net to its vlan device. So this patch simply passes those to the vlan device to benefit from advanced features. Netperf shows better sending performance for the vlan device since TSO can work on vlan now. before: netperf -H 192.168.5.2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.5.2 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 4162.35 after: netperf -H 192.168.5.2 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.5.2 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 9365.42 Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9d0ca6ed |
|
21-Mar-2013 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: remove obsolete virtqueue_get_queue_index() You can access it directly now, since 3.8: v3.7-rc1-13-g06ca287 'virtio: move queue_index and num_free fields into core struct virtqueue.' Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9dc7b9e4 |
|
19-Mar-2013 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio_net: use simplified virtqueue accessors. We never add buffers with input and output parts, so use the new accessors. Cc: "Michael S. Tsirkin" <mst@redhat.com> Reviewed-by: Asias He <asias@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
f7bc9594 |
|
19-Mar-2013 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio_net: use virtqueue_add_sgs[] for command buffers. It's a bit cleaner to hand multiple sgs, rather than one big one. Cc: "Michael S. Tsirkin" <mst@redhat.com> Tested-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
c9af6db4 |
|
11-Feb-2013 |
Pravin B Shelar <pshelar@nicira.com> |
net: Fix possible wrong checksum generation. Patch cef401de7be8c4e (net: fix possible wrong checksum generation) fixed wrong checksum calculation but it broke TSO by defining new GSO type but not a netdev feature for that type. net_gso_ok() would not allow hardware checksum/segmentation offload of such packets without the feature. Following patch fixes TSO and wrong checksum. This patch uses same logic that Eric Dumazet used. Patch introduces new flag SKBTX_SHARED_FRAG if at least one frag can be modified by the user. but SKBTX_SHARED_FRAG flag is kept in skb shared info tx_flags rather than gso_type. tx_flags is better compared to gso_type since we can have skb with shared frag without gso packet. It does not link SHARED_FRAG to GSO, So there is no need to define netdev feature for this. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b2a17029 |
|
12-Feb-2013 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: use module_virtio_driver. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
e68ed8f0 |
|
03-Feb-2013 |
Joe Perches <joe@perches.com> |
drivers:net:misc: Remove unnecessary alloc/OOM messages alloc failures already get standardized OOM messages and a dump_stack. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cef401de |
|
25-Jan-2013 |
Eric Dumazet <edumazet@google.com> |
net: fix possible wrong checksum generation Pravin Shelar mentioned that GSO could potentially generate wrong TX checksum if skb has fragments that are overwritten by the user between the checksum computation and transmit. He suggested to linearize skbs but this extra copy can be avoided for normal tcp skbs cooked by tcp_sendmsg(). This patch introduces a new SKB_GSO_SHARED_FRAG flag, set in skb_shinfo(skb)->gso_type if at least one frag can be modified by the user. Typical sources of such possible overwrites are {vm}splice(), sendfile(), and macvtap/tun/virtio_net drivers. Tested: $ netperf -H 7.7.8.84 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 3959.52 $ netperf -H 7.7.8.84 -t TCP_SENDFILE TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 3216.80 Performance of the SENDFILE is impacted by the extra allocation and copy, and because we use order-0 pages, while the TCP_STREAM uses bigger pages. Reported-by: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8de4b2f3 |
|
24-Jan-2013 |
Wanlong Gao <gaowanlong@cn.fujitsu.com> |
virtio-net: reset virtqueue affinity when doing cpu hotplug Add a cpu notifier to virtio-net, so that we can reset the virtqueue affinity if a cpu hotplug happens. It improves performance by enabling or disabling the virtqueue affinity after a cpu hotplug. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Eric Dumazet <erdnetdev@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: virtualization@lists.linux-foundation.org Cc: netdev@vger.kernel.org Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8898c21c |
|
24-Jan-2013 |
Wanlong Gao <gaowanlong@cn.fujitsu.com> |
virtio-net: split out clean affinity function Split out the clean affinity function to virtnet_clean_affinity(). Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Eric Dumazet <erdnetdev@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: virtualization@lists.linux-foundation.org Cc: netdev@vger.kernel.org Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
47be2479 |
|
24-Jan-2013 |
Wanlong Gao <gaowanlong@cn.fujitsu.com> |
virtio-net: fix the set affinity bug when CPU IDs are not consecutive As Michael mentioned, set affinity and select queue will not work very well when CPU IDs are not consecutive; this can happen with hot unplug. Fix this bug by traversing the online CPUs, and create a per cpu variable to find the mapping from CPU to the preferable virtual-queue. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Eric Dumazet <erdnetdev@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: virtualization@lists.linux-foundation.org Cc: netdev@vger.kernel.org Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
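The mapping described in "virtio-net: fix the set affinity bug when CPU IDs are not consecutive" can be sketched in userspace as follows. This is an illustration under assumptions, not the driver's code: `TOY_NR_CPUS` and `toy_map_cpus_to_queues()` are hypothetical names, and a plain array stands in for the per-cpu variable. The point is that queue indices are handed out in order of online CPUs rather than being taken from the CPU ID itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS 8 /* hypothetical upper bound on CPU IDs */

/* Walk the CPUs in ID order and give each online CPU the next queue index.
 * Offline CPUs get -1. Returns the number of online CPUs that were mapped. */
int toy_map_cpus_to_queues(const bool online[TOY_NR_CPUS],
                           int vq_index[TOY_NR_CPUS])
{
    int next = 0;

    assert(online != NULL && vq_index != NULL);
    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        vq_index[cpu] = online[cpu] ? next++ : -1;
    return next;
}
```

With CPUs 0, 2 and 3 online, CPU 2 is mapped to queue 1 even though its ID is 2, which is the case a naive "queue = CPU ID" scheme gets wrong.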
#
7e58d5ae |
|
20-Jan-2013 |
Amos Kong <akong@redhat.com> |
virtio-net: introduce a new control to set macaddr Currently we write the MAC address to pci config space byte by byte, which means that there is an intermediate step where the mac is wrong. This patch introduces a new control command that sets the MAC address atomically. VIRTIO_NET_F_CTRL_MAC_ADDR is a new feature bit for compatibility. Signed-off-by: Amos Kong <akong@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
40cbfc37 |
|
20-Jan-2013 |
Amos Kong <akong@redhat.com> |
move virtnet_send_command() above virtnet_set_mac_address() We want to send a vq command to set the mac address in virtnet_set_mac_address(), so move this function. Also fix a small coding style issue. Signed-off-by: Amos Kong <akong@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0e3daa64 |
|
16-Oct-2012 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: net: make it clear that virtqueue_add_buf() no longer returns > 0 We simplified virtqueue_add_buf(), make it clear in the callers. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
9ed4cb07 |
|
16-Oct-2012 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio_net: don't rely on virtqueue_add_buf() returning capacity. Now we can easily use vq->num_free to determine if there are descriptors left in the queue, we're about to change virtqueue_add_buf() to return 0 on success. The virtio_net driver is the only one which actually uses the return value, so change that. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Michael S. Tsirkin <mst@redhat.com>
|
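The idea behind "virtio_net: don't rely on virtqueue_add_buf() returning capacity" can be modelled in a few lines of userspace C. This is a sketch under assumptions: `struct toy_txq`, `toy_add_buf()` and `toy_should_stop_queue()` are hypothetical, `TOY_MAX_SKB_FRAGS` is set to 17 purely for illustration, and the `2 + MAX_SKB_FRAGS` headroom is an assumed worst case for one packet rather than something stated in the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_SKB_FRAGS 17u /* assumed value, for illustration only */

/* Toy transmit queue: the only state is the count of free descriptors. */
struct toy_txq {
    unsigned num_free;
};

/* Add a buffer using `descs` descriptors. Returns 0 on success and a
 * negative errno on failure; it never returns the remaining capacity. */
int toy_add_buf(struct toy_txq *q, unsigned descs)
{
    assert(q != NULL);
    if (descs > q->num_free)
        return -ENOSPC;
    q->num_free -= descs;
    return 0;
}

/* The caller reads num_free directly to decide whether a worst-case
 * packet would still fit; if not, it should stop the queue. */
bool toy_should_stop_queue(const struct toy_txq *q)
{
    assert(q != NULL);
    return q->num_free < 2u + TOY_MAX_SKB_FRAGS;
}
```

Separating "did the add succeed" from "how much room is left" is what lets the add function return a plain 0 on success.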
#
7bedc7dc |
|
16-Oct-2012 |
Michael S. Tsirkin <mst@redhat.com> |
virtio-net: remove unused skb_vnet_hdr->num_sg field [Split from "correct capacity math on ring full" -- Rusty] Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Michael S. Tsirkin <mst@redhat.com>
|
#
6ee57bcc |
|
16-Oct-2012 |
Michael S. Tsirkin <mst@redhat.com> |
virtio-net: correct capacity math on ring full Capacity math on ring full is wrong: we are looking at num_sg but that might be optimistic because of indirect buffer use. The implementation also penalizes fast path with extra memory accesses for the benefit of ring full condition handling which is slow path. It's easy to query ring capacity so let's do just that. This change also makes it easier to move vnet header for tx around as follow-up patch does. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Michael S. Tsirkin <mst@redhat.com>
|
#
008d4278 |
|
09-Dec-2012 |
Amerigo Wang <amwang@redhat.com> |
virtio_net: fix a typo in virtnet_alloc_queues() Obviously it should check !vi->rq. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Cc: Jason Wang <jasowang@redhat.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d73bcd2c |
|
07-Dec-2012 |
Jason Wang <jasowang@redhat.com> |
virtio-net: support changing the number of queue pairs through ethtool This patch implements the ethtool_{set|get}_channels method of virtio-net to allow user to change the number of queues when the device is running on demand. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
986a4f4d |
|
07-Dec-2012 |
Jason Wang <jasowang@redhat.com> |
virtio_net: multiqueue support This patch adds the multiqueue (VIRTIO_NET_F_MQ) support to virtio_net driver. VIRTIO_NET_F_MQ capable device could allow the driver to do packet transmission and reception through multiple queue pairs and does the packet steering to get better performance. By default, only one queue pair is used; the user can change the number of queue pairs with ethtool, as implemented in the next patch. When multiple queue pairs are used and the number of queue pairs is equal to the number of vcpus, the driver does the following optimizations to implement per-cpu virt queue pairs: - select the txq based on the smp processor id. - smp affinity hint to the cpu that owns the queue pairs. This could be used with the flow steering support of the device to guarantee the packets of a single flow are handled by the same cpu. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e9d7417b |
|
07-Dec-2012 |
Jason Wang <jasowang@redhat.com> |
virtio-net: separate fields of sending/receiving queue from virtnet_info To support multiqueue transmitq/receiveq, the first step is to separate the queue related structures from virtnet_info. This patch introduces the send_queue and receive_queue structures and uses pointers to them as the parameter in functions handling sending/receiving. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8cc085d6 |
|
03-Dec-2012 |
Bill Pemberton <wfp5p@virginia.edu> |
virtio_net: remove __dev* attributes CONFIG_HOTPLUG is going away as an option. As result the __dev* markings will be going away. Remove use of __devinit, __devexit_p, __devinitdata, __devinitconst, and __devexit. Signed-off-by: Bill Pemberton <wfp5p@virginia.edu> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: virtualization@lists.linux-foundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
be443899 |
|
08-Nov-2012 |
Amerigo Wang <amwang@redhat.com> |
virtio_net: use net_*_ratelimited() helpers These can be converted to net_*_ratelimited(). Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3b07e9ca |
|
20-Aug-2012 |
Tejun Heo <tj@kernel.org> |
workqueue: deprecate system_nrt[_freezable]_wq system_nrt[_freezable]_wq are now spurious. Mark them deprecated and convert all users to system[_freezable]_wq. If you're cc'd and wondering what's going on: Now all workqueues are non-reentrant, so there's no reason to use system_nrt[_freezable]_wq. Please use system[_freezable]_wq instead. This patch doesn't make any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-By: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: David Airlie <airlied@linux.ie> Cc: Jiri Kosina <jkosina@suse.cz> Cc: "David S. Miller" <davem@davemloft.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com>
|
#
ee89bab1 |
|
09-Aug-2012 |
Amerigo Wang <amwang@redhat.com> |
net: move and rename netif_notify_peers() I believe net/core/dev.c is a better place for netif_notify_peers(), because other net event notify functions also stay in this file. And rename it to netdev_notify_peers(). Cc: David S. Miller <davem@davemloft.net> Cc: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e3906486 |
|
21-Jul-2012 |
Kevin Groeneveld <kgroeneveld@gmail.com> |
net: fix race condition in several drivers when reading stats Fix race condition in several network drivers when reading stats on 32bit UP architectures. These drivers update their stats in a BH context and therefore should use u64_stats_fetch_begin_bh/u64_stats_fetch_retry_bh instead of u64_stats_fetch_begin/u64_stats_fetch_retry when reading the stats. Signed-off-by: Kevin Groeneveld <kgroeneveld@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f2f2c8b4 |
|
28-Jun-2012 |
Jiri Pirko <jpirko@redhat.com> |
virtio_net: use IFF_LIVE_ADDR_CHANGE priv_flag Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d4fc6918 |
|
26-Jun-2012 |
Jiri Pirko <jpirko@redhat.com> |
virtio_net: allow to change mac when iface is running Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
83a27052 |
|
05-Jun-2012 |
Eric Dumazet <edumazet@google.com> |
virtio-net: fix a race on 32bit arches commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race on 32bit arches. We must use separate syncp for rx and tx path as they can be run at the same time on different cpus. Thus one sequence increment can be lost and readers spin forever. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3bbf372c |
|
30-May-2012 |
Michael S. Tsirkin <mst@redhat.com> |
virtio-net: remove useless disable on freeze disable_cb is just an optimization: it can not guarantee that there are no callbacks. In particular it doesn't have any effect when event index is on. Instead, detach, napi disable and reset on freeze ensure we don't run concurrently with a callback. Remove the useless calls so we get same behaviour with and without event index. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ec13ee80 |
|
16-May-2012 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: invoke softirqs after __napi_schedule __napi_schedule might raise softirq but nothing causes do_softirq to trigger, so it does not in fact run. As a result, the error message "NOHZ: local_softirq_pending 08" sometimes occurs during boot of a KVM guest when the network service is started and we are oom: ... Bringing up loopback interface: [ OK ] Bringing up interface eth0: Determining IP information for eth0...NOHZ: local_softirq_pending 08 done. [ OK ] ... Further, receive queue processing might get delayed indefinitely until some interrupt triggers: virtio_net expected napi to be run immediately. One way to cause do_softirq to be executed is by invoking local_bh_enable(). As __napi_schedule is normally called from bh or irq context, this seems to make sense: disable bh before __napi_schedule and enable afterwards. In fact it's a very complicated way of calling do_softirq(), and works since this function is only used when we are not in interrupt context. It's not hot at all, in any ideal scenario. Reported-by: Ulrich Obergfell <uobergfe@redhat.com> Tested-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
586d17c5 |
|
11-Apr-2012 |
Jason Wang <jasowang@redhat.com> |
virtio-net: send gratuitous packets when needed As the hypervisor does not have knowledge of the guest network configuration, it's better to ask the guest to send gratuitous packets when needed. This patch implements the VIRTIO_NET_F_GUEST_ANNOUNCE feature: the hypervisor notifies the guest when it thinks it's time for the guest to announce its link presence. The guest tests the VIRTIO_NET_S_ANNOUNCE bit during the config change interrupt, sends gratuitous packets through netif_notify_peers() and acks the notification through the ctrl vq. We need to ensure the atomicity of read and ack in the guest, otherwise we may ack more times than we were notified. This is done by handling the whole config change interrupt in a non-reentrant workqueue. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
31304165 |
|
08-Apr-2012 |
Torsten Kaiser <just.for.lkml@googlemail.com> |
net: Fix misplaced parenthesis in virtio_net.c Commit 2e57b79ccef1ff1422fdf45a9b28fe60f8f084f7 misplaced its parenthesis and now tx_fifo_errors will only be incremented if an ENOMEM error is not written to the syslog. Correct the parenthesis and indentation to the original goal of counting all non ENOMEM errors and ratelimiting only the messages. Signed-of-by: Torsten Kaiser <just.for.lkml@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
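The intent stated in "net: Fix misplaced parenthesis in virtio_net.c", counting every non-ENOMEM error while rate limiting only the log messages, can be sketched as below. The names `struct toy_stats` and `toy_xmit_error()` are hypothetical, and the boolean `ratelimit_ok` stands in for the result of the kernel's rate-limit check; the extra `tx_dropped` counter is included only to make the model self-explanatory.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy counters; `messages` counts log lines that would have been emitted. */
struct toy_stats {
    unsigned long tx_fifo_errors;
    unsigned long tx_dropped;
    unsigned long messages;
};

/* Handle a failed transmit. `err` is a negative errno value and
 * `ratelimit_ok` says whether the rate limiter allows a message now. */
void toy_xmit_error(struct toy_stats *st, int err, bool ratelimit_ok)
{
    assert(st != NULL);

    /* The counter is updated unconditionally for non-ENOMEM errors ... */
    if (err != -ENOMEM)
        st->tx_fifo_errors++;

    /* ... and only the message is subject to rate limiting. */
    if (ratelimit_ok)
        st->messages++;

    st->tx_dropped++;
}
```

Keeping the two conditions in separate statements makes it hard to repeat the bug, where the counter update ended up depending on the rate limiter's answer.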
#
2e57b79c |
|
27-Mar-2012 |
Rick Jones <rick.jones2@hp.com> |
virtio_net: do not rate limit counter increments While it is desirable to rate limit certain messages, it is not desirable to rate limit the incrementing of counters associated with those messages. Signed-off-by: Rick Jones <rick.jones2@hp.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f2cedb63 |
|
14-Feb-2012 |
Danny Kukawka <danny.kukawka@bisect.de> |
net: replace random_ether_addr() with eth_hw_addr_random() Replace usage of random_ether_addr() with eth_hw_addr_random() to set addr_assign_type correctly to NET_ADDR_RANDOM. Change the trivial cases. v2: adapt to renamed eth_hw_addr_random() Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
58472a76 |
|
12-Feb-2012 |
Eric Dumazet <eric.dumazet@gmail.com> |
virtio: net: remove sparse errors commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added extra __percpu qualifiers and sparse errors. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0741bcb5 |
|
22-Dec-2011 |
Amit Shah <amit.shah@redhat.com> |
virtio: net: Add freeze, restore handlers to support S4 Remove all the vqs, disable napi and detach from the netdev on hibernation. Re-create vqs after restoring from a hibernated image, re-enable napi and re-attach the netdev. This keeps networking working across hibernation. Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
04486ed0 |
|
22-Dec-2011 |
Amit Shah <amit.shah@redhat.com> |
virtio: net: Move vq and vq buf removal into separate function The remove and PM freeze functions will share this code. Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
3f9c10b0 |
|
22-Dec-2011 |
Amit Shah <amit.shah@redhat.com> |
virtio: net: Move vq initialization into separate function The probe and PM restore functions will share this code. Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
f96fde41 |
|
11-Jan-2012 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: rename virtqueue_add_buf_gfp to virtqueue_add_buf Remove wrapper functions. This makes the allocation type explicit in all callers; I used GFP_KERNEL where it seemed obvious, left it at GFP_ATOMIC otherwise. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
3464645a |
|
03-Jan-2012 |
Mike Waychison <mikew@google.com> |
virtio_net: Pass gfp flags when allocating rx buffers. Currently, the refill path for RX buffers will always allocate the buffers as GFP_ATOMIC, even if we are in process context. This will fail to apply memory pressure as the worker thread will not contribute to the freeing of memory. Fix this by changing add_recvbuf_small to use the gfp variant allocator, __netdev_alloc_skb_ip_align(). Signed-off-by: Mike Waychison <mikew@google.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f1776dad |
|
28-Dec-2011 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio_net: use non-reentrant workqueue. Michael S. Tsirkin also noticed that we could run the refill work on multiple CPUs: if we kick off a refill on one CPU and then on another, they would both manipulate the queue at the same time (they use napi_disable to avoid racing against the receive handler itself). Tejun points out that this is what the WQ_NON_REENTRANT flag is for, and that there is a convenient system kthread we can use. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b2baed69 |
|
28-Dec-2011 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio_net: set/cancel work on ndo_open/ndo_stop Michael S. Tsirkin noticed that we could run the refill work after ndo_close, which can re-enable napi - we don't disable it until virtnet_remove. This is clearly wrong, so move the workqueue control to ndo_open and ndo_stop (aka. virtnet_open and virtnet_close). One subtle point: virtnet_probe() could simply fail if it couldn't allocate a receive buffer, but that's less polite in virtnet_open() so we schedule a refill as we do in the normal receive path if we run out of memory. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
eb939922 |
|
19-Dec-2011 |
Rusty Russell <rusty@rustcorp.com.au> |
module_param: make bool parameters really bool (net & drivers/net) module_param(bool) used to counter-intuitively take an int. In fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy trick. It's time to remove the int/unsigned int option. For this version it'll simply give a warning, but it'll break next kernel version. (Thanks to Joe Perches for suggesting coccinelle for 0/1 -> true/false). Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8e586137 |
|
08-Dec-2011 |
Jiri Pirko <jpirko@redhat.com> |
net: make vlan ndo_vlan_rx_[add/kill]_vid return error value Let the caller know the result of adding/removing a vlan id to/from the vlan filter. In some drivers I make those functions just return 0. But in those where it is possible to see whether hw setup went correctly, the return value is set appropriately. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
021ac8d3 |
|
21-Nov-2011 |
Rick Jones <rick.jones2@hp.com> |
virtio_net: return already tracked tx_fifo_errors via virtnet_getstats() Tx_fifo_errors are tracked in start_xmit_ for virtio_net, but not reported in the tallies returned by virtnet_stats(). Return them as the rx "sub-stats" rx_length_errors and rx_frame_errors are. Signed-off-by: Rick Jones <rick.jones2@hp.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
66846048 |
|
14-Nov-2011 |
Rick Jones <rick.jones2@hp.com> |
enable virtio_net to return bus_info in ethtool -i consistent with emulated NICs Add a new .bus_name to virtio_config_ops then modify virtio_net to call through to it in an ethtool .get_drvinfo routine to report bus_info in ethtool -i output which is consistent with other emulated NICs and the output of lspci. Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
77dd7693 |
|
14-Aug-2011 |
Sasha Levin <levinsasha928@gmail.com> |
virtio-net: Use virtio_config_val() for retrieving config This patch modifies virtio-net to use virtio_config_val() instead of a 'if(virtio_has_feature()) vdev->config->get()' construct to retrieve optional values from the config space. Cc: Amit Shah <amit.shah@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: virtualization@lists.linux-foundation.org Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
8f9f4668 |
|
19-Oct-2011 |
Rick Jones <rick.jones2@hp.com> |
Add ethtool -g support to virtio_net Add support for reporting ring sizes via ethtool -g to the virtio_net driver. Signed-off-by: Rick Jones <rick.jones2@hp.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4b727361 |
|
19-Oct-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
virtio_net: fix truesize underestimation We must account in skb->truesize, the size of the fragments, not the used part of them. Doing this work is important to avoid unexpected OOM situations. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Rusty Russell <rusty@rustcorp.com.au> CC: "Michael S. Tsirkin" <mst@redhat.com> CC: virtualization@lists.linux-foundation.org CC: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
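The accounting change in "virtio_net: fix truesize underestimation" comes down to simple arithmetic, sketched here in userspace C. The names are hypothetical and the sketch assumes each fragment occupies one whole page of `TOY_PAGE_SIZE` bytes (4096 is an assumed value); it does not reproduce the driver's code.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u /* assumed page size, for illustration */

/* Underestimate: charge only the bytes actually used in each fragment. */
unsigned toy_truesize_used_bytes(const unsigned *frag_len, size_t nfrags)
{
    unsigned total = 0;

    assert(frag_len != NULL || nfrags == 0);
    for (size_t i = 0; i < nfrags; i++)
        total += frag_len[i];
    return total;
}

/* Corrected accounting: charge the full page backing each fragment,
 * because that memory stays pinned regardless of how much of it is used. */
unsigned toy_truesize_full_pages(size_t nfrags)
{
    return (unsigned)nfrags * TOY_PAGE_SIZE;
}
```

For three fragments holding 100, 200 and 50 bytes, the first function reports 350 bytes while the memory actually held is 12288 bytes, which is the gap that can lead to unexpected OOM situations.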
#
8a59a7b9 |
|
19-Oct-2011 |
Krishna Kumar <krkumar2@in.ibm.com> |
virtio_net: Clean up set_skb_frag() Remove manual initialization in set_skb_frag, and instead use __skb_fill_page_desc() to do the same. Patch tested on net-next. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9e903e08 |
|
18-Oct-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: add skb frag size accessors To ease skb->truesize sanitization, it's better to be able to localize all references to skb frags size. Define accessors: skb_frag_size() to fetch frag size, and skb_frag_size_{set|add|sub}() to manipulate it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e878d78b |
|
27-Sep-2011 |
Sasha Levin <levinsasha928@gmail.com> |
virtio-net: Verify page list size before fitting into skb This patch verifies that the length of a buffer stored in a linked list of pages is small enough to fit into a skb. If the size is larger than a max size of a skb, it means that we shouldn't go ahead building skbs anyway since we won't be able to send the buffer as the user requested. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: virtualization@lists.linux-foundation.org Cc: netdev@vger.kernel.org Cc: kvm@vger.kernel.org Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
86ee8130 |
|
21-Sep-2011 |
Ian Campbell <Ian.Campbell@citrix.com> |
virtionet: convert to SKB paged frag API. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: virtualization@lists.linux-foundation.org Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
01789349 |
|
16-Aug-2011 |
Jiri Pirko <jpirko@redhat.com> |
net: introduce IFF_UNICAST_FLT private flag Use IFF_UNICAST_FLT to find out if driver handles unicast address filtering. In case it does not, promisc mode is entered. Patch also fixes following drivers: stmmac, niu: support uc filtering and yet it propagated ndo_set_multicast_list bna, benet, pxa168_eth, ks8851, ks8851_mll, ksz884x : has set ndo_set_rx_mode but do not support uc filtering Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2e66f55b |
|
19-Jul-2011 |
Krishna Kumar <krkumar2@in.ibm.com> |
virtio_net: Fix panic in virtnet_remove Fix a panic in virtnet_remove. unregister_netdev has already freed up the netdev (and virtnet_info) due to dev->destructor being set, while virtnet_info is still required. Remove virtnet_free altogether, and move the freeing of the per-cpu statistics from virtnet_free to virtnet_remove. Tested patch below. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3fa2a1df |
|
15-Jun-2011 |
stephen hemminger <shemminger@vyatta.com> |
virtio-net: per cpu 64 bit stats (v2) Use per-cpu variables to maintain 64 bit statistics. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@conan.davemloft.net>
|
#
10a8d94a |
|
09-Jun-2011 |
Jason Wang <jasowang@redhat.com> |
virtio_net: introduce VIRTIO_NET_HDR_F_DATA_VALID There's no need for the guest to validate the checksum if it has been validated by host nics. So this patch introduces a new flag - VIRTIO_NET_HDR_F_DATA_VALID which is used to bypass the checksum examination in guest. The backend (tap/macvtap) may set this flag when it meets skbs with CHECKSUM_UNNECESSARY to save cpu utilization. No feature negotiation is needed as old drivers just ignore this flag. Iperf shows 12%-30% performance improvement for UDP traffic. For TCP, when gro is on no difference as it produces skb with partial checksum. But when gro is disabled, 20% or even higher improvement could be measured by netperf. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
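The receive-side decision implied by the VIRTIO_NET_HDR_F_DATA_VALID commit above might look roughly like the sketch below. The flag values (1 and 2) are assumed to match the virtio-net header definitions, and `enum csum_action` and `rx_csum_action()` are hypothetical names invented for illustration; the real driver sets fields on the skb instead of returning an enum.

```c
/* Assumed to mirror the virtio-net header flag bits. */
#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1
#define VIRTIO_NET_HDR_F_DATA_VALID 2

/* Hypothetical summary of what the guest does with a received packet. */
enum csum_action {
    CSUM_VERIFY_IN_GUEST,  /* no hint: the stack checks the checksum */
    CSUM_COMPLETE_PARTIAL, /* checksum still has to be filled in */
    CSUM_SKIP              /* host already validated it */
};

static enum csum_action rx_csum_action(unsigned char hdr_flags)
{
    if (hdr_flags & VIRTIO_NET_HDR_F_NEEDS_CSUM)
        return CSUM_COMPLETE_PARTIAL;
    if (hdr_flags & VIRTIO_NET_HDR_F_DATA_VALID)
        return CSUM_SKIP;
    /* An old backend never sets DATA_VALID, so nothing changes for it. */
    return CSUM_VERIFY_IN_GUEST;
}
```

Because an unset flag falls through to the existing behaviour, the sketch also shows why the commit needs no feature negotiation.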
#
7a66f784 |
|
19-May-2011 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: delay TX callbacks Ask for delayed callbacks on TX ring full, to give the other side more of a chance to make progress. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
98e778c9 |
|
30-Mar-2011 |
Michał Mirosław <mirq-linux@rere.qmqm.pl> |
virtio_net: convert to hw_features Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3e9d08ec |
|
10-Feb-2011 |
Bruce Rogers <brogers@novell.com> |
virtio_net: Add schedule check to napi_enable call Under harsh testing conditions, including low memory, the guest would stop receiving packets. With this patch applied we no longer see any problems in the driver while performing these tests for extended periods of time. Make sure napi is scheduled subsequent to each napi_enable. Signed-off-by: Bruce Rogers <brogers@novell.com> Signed-off-by: Olaf Kirch <okir@suse.de> Cc: stable@kernel.org Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
55508d60 |
|
14-Dec-2010 |
Michał Mirosław <mirq-linux@rere.qmqm.pl> |
net: Use skb_checksum_start_offset() Replace skb->csum_start - skb_headroom(skb) with skb_checksum_start_offset(). Note for usb/smsc95xx: skb->data - skb->head == skb_headroom(skb). Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
|
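The identity behind the skb_checksum_start_offset() conversion above can be checked with a small user-space model. `struct skb_sketch` is a hypothetical, heavily simplified stand-in for `struct sk_buff`, with `csum_start` measured from `head` as the commit message implies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified buffer: `head` is the start of the allocation,
 * `data` is where the packet begins. */
struct skb_sketch {
    unsigned char *head;
    unsigned char *data;
    size_t csum_start; /* offset of the checksummed region from head */
};

/* Headroom is the gap between the allocation start and the packet. */
static size_t headroom(const struct skb_sketch *skb)
{
    return (size_t)(skb->data - skb->head);
}

/* Same value the open-coded "csum_start - headroom" expression gave:
 * the checksum start relative to the packet data. */
static size_t checksum_start_offset(const struct skb_sketch *skb)
{
    assert(skb->csum_start >= headroom(skb));
    return skb->csum_start - headroom(skb);
}
```

For a buffer with 64 bytes of headroom and a checksum region starting 34 bytes into the packet, `csum_start` is 98 and the helper returns 34.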
#
167c25e4 |
|
10-Nov-2010 |
Jason Wang <jasowang@redhat.com> |
virtio-net: init link state correctly For device that supports VIRTIO_NET_F_STATUS, there's no need to assume the link is up and we need to call netif_carrier_off() before querying device status, otherwise we may get wrong operstate after driver was loaded because the link watch event was not fired as expected. For device that does not support VIRTIO_NET_F_STATUS, we could not get its status through virtnet_update_status() and what we can only do is always assuming the link is up. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
01414802 |
|
17-Aug-2010 |
Ben Hutchings <bhutchings@solarflare.com> |
ethtool: Provide a default implementation of ethtool_ops::get_drvinfo The driver name and bus address for a net_device can normally be found through the driver model now. Instead of requiring drivers to provide this information redundantly through the ethtool_ops::get_drvinfo operation, use the driver model to do so if the driver does not define the operation. Since ETHTOOL_GDRVINFO no longer requires the driver to implement any operations, do not require net_device::ethtool_ops to be set either. Remove implementations of get_drvinfo and ethtool_ops that provide only this information. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a767bde4 |
|
04-Aug-2010 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio_net: implements ethtool_ops.get_drvinfo I often use "ethtool -i" command to check what driver controls the ethernet device. But because current virtio_net driver doesn't support "ethtool -i", it becomes the following: # ethtool -i eth3 Cannot get driver information: Operation not supported This patch simply adds the "ethtool -i" support. The following is the result when using the virtio_net driver with my patch applied to. # ethtool -i eth3 driver: virtio_net version: N/A firmware-version: N/A bus-info: virtio0 Personally, "-i" is one of the most frequently-used option, and most network drivers support "ethtool -i", so I think virtio_net also should do. Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (use ARRAY_SIZE) Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
58eba97d |
|
02-Jul-2010 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio_net: fix oom handling on tx virtio net will never try to overflow the TX ring, so the only reason add_buf may fail is out of memory. Thus, we can not stop the device until some request completes - there's no guarantee anything at all is outstanding. Make the error message clearer as well: error here does not indicate queue full. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (...and avoid TX_BUSY) Cc: stable@kernel.org # .34.x (s/virtqueue_/vi->svq->vq_ops->/) Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1788f495 |
|
02-Jul-2010 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: do not reschedule rx refill forever We currently fill all of RX ring, then add_buf returns ENOSPC, which gets mis-detected as an out of memory condition and causes us to reschedule the work, and so on forever. Fix this by oom = err == -ENOMEM; Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: stable@kernel.org # .34.x Signed-off-by: David S. Miller <davem@davemloft.net>
|
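The `oom = err == -ENOMEM` fix in the rx refill commit above can be modelled in user space. This is a sketch under stated assumptions: `struct ring_sketch`, `add_one_buf()` and `try_fill_needs_retry()` are hypothetical names, and the ring is reduced to two counters.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical ring model: `free_slots` descriptors remain, and
 * `allocs_left` buffer allocations will succeed before memory runs out. */
struct ring_sketch {
    int free_slots;
    int allocs_left;
};

/* Stand-in for posting one buffer: 0 on success, -ENOSPC when the ring
 * is full, -ENOMEM when the allocation fails. */
static int add_one_buf(struct ring_sketch *r)
{
    if (r->free_slots == 0)
        return -ENOSPC;
    if (r->allocs_left == 0)
        return -ENOMEM;
    r->free_slots--;
    r->allocs_left--;
    return 0;
}

/* Fill until the first error; report true only when a later retry
 * makes sense, i.e. when the error really was out-of-memory. */
static bool try_fill_needs_retry(struct ring_sketch *r)
{
    int err;

    do {
        err = add_one_buf(r);
    } while (err == 0);

    assert(err == -ENOSPC || err == -ENOMEM);
    return err == -ENOMEM; /* a full ring (-ENOSPC) is the normal end state */
}
```

Treating every nonzero return as out-of-memory would make a completely full ring look like a failure, which is the endless rescheduling the commit describes.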
#
aa989f5e |
|
30-May-2010 |
Michael S. Tsirkin <mst@redhat.com> |
virtio-net: pass gfp to add_buf virtio-net bounces buffer allocations off to a thread if it can't allocate buffers from the atomic pool. However, if posting buffers still requires atomic buffers, this is unlikely to succeed. Fix by passing in the proper gfp_t parameter. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1915a712 |
|
12-Apr-2010 |
Michael S. Tsirkin <mst@redhat.com> |
virtio_net: use virtqueue_xxx wrappers Switch virtio_net to new virtqueue_xxx wrappers. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
b4bf665c |
|
14-Apr-2010 |
David S. Miller <davem@davemloft.net> |
virtio_net: Fix mis-merge. Pointed out by Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0e413f22 |
|
29-Mar-2010 |
Shirley Ma <mashirle@us.ibm.com> |
virtio_net: missing sg_init_table Add missing sg_init_table for sg_set_buf in virtio_net which induced in defer skb patch. Reported-by: Thomas Müller <thomas@mathtm.de> Tested-by: Thomas Müller <thomas@mathtm.de> Signed-off-by: Shirley Ma <xma@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5e01d2f9 |
|
07-Apr-2010 |
Michael S. Tsirkin <mst@redhat.com> |
virtio-net: move sg off stack Move sg structure off stack and into virtnet_info structure. This helps remove extra sg_init_table calls as well as reduce stack usage. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
22bedad3 |
|
01-Apr-2010 |
Jiri Pirko <jpirko@redhat.com> |
net: convert multicast list to list_head Converts the list and the core manipulating with it to be the same as uc_list. +uses two functions for adding/removing mc address (normal and "global" variant) instead of a function parameter. +removes dev_mcast.c completely. +exposes netdev_hw_addr_list_* macros along with __hw_addr_* functions for manipulation with lists on a sandbox (used in bonding and 80211 drivers) Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2c45cd43 |
|
29-Mar-2010 |
Shirley Ma <mashirle@us.ibm.com> |
virtio_net: missing sg_init_table Add missing sg_init_table for sg_set_buf in virtio_net which induced in defer skb patch. Reported-by: Thomas Müller <thomas@mathtm.de> Tested-by: Thomas Müller <thomas@mathtm.de> Signed-off-by: Shirley Ma <xma@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5a0e3ad6 |
|
24-Mar-2010 |
Tejun Heo <tj@kernel.org> |
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. 
This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
|
#
2507c05f |
|
02-Mar-2010 |
Jiri Pirko <jpirko@redhat.com> |
virtio_net: remove forgotten assignment This is no longer needed. I missed to remove this in 567ec874d15b478c8eda7e9a5d2dcb05f13f1fb5 ("net: convert multiple drivers to use netdev_for_each_mc_addr, part6") Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
567ec874 |
|
23-Feb-2010 |
Jiri Pirko <jpirko@redhat.com> |
net: convert multiple drivers to use netdev_for_each_mc_addr, part6 Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
830a8a97 |
|
08-Feb-2010 |
Shirley Ma <mashirle@us.ibm.com> |
virtio_net: remove send queue Now we have a virtio detach API (in commit f9bfbebf34eab707b065116cdc9699d25ba4252a), we don't need to track xmit skbs in the virtio_net driver, which improves transmission performance. Signed-off-by: Shirley Ma <xma@us.ibm.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4cd24eaf |
|
07-Feb-2010 |
Jiri Pirko <jpirko@redhat.com> |
net: use netdev_mc_count and netdev_mc_empty when appropriate This patch replaces dev->mc_count in all drivers (hopefully I didn't miss anything). Used spatch and did small tweaks and coding style changes when it was suitable. Jirka Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9ab86bbc |
|
28-Jan-2010 |
Shirley Ma <mashirle@us.ibm.com> |
virtio_net: Defer skb allocation in receive path Date: Wed, 13 Jan 2010 12:53:38 -0800 virtio_net receives packets from its pre-allocated vring buffers, then it delivers these packets to upper layer protocols as skb buffs. So it's not necessary to pre-allocate skb for each mergable buffer, then frees extra skbs when buffers are merged into a large packet. This patch has deferred skb allocation in receiving packets for both big packets and mergeable buffers to reduce skb pre-allocations and skb frees. It frees unused buffers by calling detach_unused_buf in vring, so recv skb queue is not needed. Signed-off-by: Shirley Ma <xma@us.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
39d32157 |
|
25-Jan-2010 |
Herbert Xu <herbert@gondor.apana.org.au> |
virtio_net: Make delayed refill more reliable I have seen RX stalls on a machine that experienced a suspected OOM. After the stall, the RX buffer is empty on the guest side and there are exactly 16 entries available on the host side. As the number of entries is less than that required by a maximal skb, the host cannot proceed. The guest did not have a refill job scheduled. My diagnosis is that an OOM had occurred, with the delayed refill job scheduled. The job was able to allocate at least one skb, but not enough to overcome the minimum required by the host to proceed. As the refill job would only reschedule itself if it failed completely to allocate any skbs, this would lead to an RX stall. The following patch removes this stall possibility by always rescheduling the refill job until the ring is totally refilled. Testing has shown that the RX stall no longer occurs whereas previously it would occur within a day. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
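The two rescheduling policies contrasted in the delayed-refill commit above reduce to two small predicates. Both function names and their parameters are hypothetical; the sketch only restates the conditions given in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Old policy as described: retry only when the refill job could not
 * allocate a single skb. */
static bool reschedule_old(int allocated, int slots_still_empty)
{
    assert(allocated >= 0 && slots_still_empty >= 0);
    return allocated == 0;
}

/* New policy: keep retrying until no slot in the ring is left empty. */
static bool reschedule_new(int allocated, int slots_still_empty)
{
    assert(allocated >= 0 && slots_still_empty >= 0);
    return slots_still_empty > 0;
}
```

If a job manages one allocation but leaves the ring mostly empty, the old predicate stops retrying and the new one does not, which matches the stall described in the report.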
#
32e7bfc4 |
|
25-Jan-2010 |
Jiri Pirko <jpirko@redhat.com> |
net: use helpers to access uc list V2 This patch introduces three macros to work with uc list from net drivers. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8e95a202 |
|
03-Dec-2009 |
Joe Perches <joe@perches.com> |
drivers/net: Move && and || to end of previous line Only files where David Miller is the primary git-signer. wireless, wimax, ixgbe, etc are not modified. Compile tested x86 allyesconfig only Not all files compiled (not x86 compatible) Added a few > 80 column lines, which I ignored. Existing checkpatch complaints ignored. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
22402529 |
|
05-Nov-2009 |
Uwe Kleine-König <u.kleine-koenig@pengutronix.de> |
virtio_net: rename driver struct to please modpost Commit 3d1285b (move virtnet_remove to .devexit.text) introduced the first reference to __devexit in struct virtio_driver virtio_net which upset modpost ("Section mismatch in reference from the variable virtio_net to the function .devexit.text:virtnet_remove()"). Fix this by renaming virtio_net to virtio_net_driver. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reported-by: Michael S. Tsirkin <mst@redhat.com> Blame-taken-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
03f191ba |
|
28-Oct-2009 |
Michael S. Tsirkin <mst@redhat.com> |
virtio-net: fix data corruption with OOM virtio net used to unlink skbs from send queues on error, but ever since 48925e372f04f5e35fec6269127c62b2c71ab794 we do not do this. This causes guest data corruption and crashes with vhost since net core can requeue the skb or free it without it being taken off the list. This patch fixes this by queueing the skb after successful transmit. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e95646c3 |
|
30-Sep-2009 |
Christian Borntraeger <borntraeger@de.ibm.com> |
virtio: let header files include virtio_ids.h Rusty, commit 3ca4f5ca73057a617f9444a91022d7127041970a virtio: add virtio IDs file moved all device IDs into a single file. While the change itself is a very good one, it can break userspace applications. For example if a userspace tool wanted to get the ID of virtio_net it used to include virtio_net.h. This does no longer work, since virtio_net.h does not include virtio_ids.h. This patch moves all "#include <linux/virtio_ids.h>" from the C files into the header files, making the header files compatible with the old ones. In addition, this patch exports virtio_ids.h to userspace. CC: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
ed79bab8 |
|
14-Oct-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
virtio_net: use dev_kfree_skb_any() in free_old_xmit_skbs() Because netpoll can call netdevice start_xmit() method with irqs disabled, drivers should not call kfree_skb() from their start_xmit(), but use dev_kfree_skb_any() instead. Oct 8 11:16:52 172.30.1.31 [113074.791813] ------------[ cut here ]------------ Oct 8 11:16:52 172.30.1.31 [113074.791813] WARNING: at net/core/skbuff.c:398 \ skb_release_head_state+0x64/0xc8() Oct 8 11:16:52 172.30.1.31 [113074.791813] Hardware name: Oct 8 11:16:52 172.30.1.31 [113074.791813] Modules linked in: netconsole ocfs2 jbd2 quota_tree \ ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs crc32c drbd cn loop \ serio_raw psmouse snd_pcm snd_timer snd soundcore snd_page_alloc virtio_net pcspkr parport_pc parport \ i2c_piix4 i2c_core button processor evdev ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot \ dm_mod ide_cd_mod cdrom ata_generic ata_piix virtio_blk libata scsi_mod piix ide_pci_generic ide_core \ virtio_pci virtio_ring virtio floppy thermal fan thermal_sys [last unloaded: netconsole] Oct 8 11:16:52 172.30.1.31 [113074.791813] Pid: 11132, comm: php5-cgi Tainted: G W \ 2.6.31.2-vserver #1 Oct 8 11:16:52 172.30.1.31 [113074.791813] Call Trace: Oct 8 11:16:52 172.30.1.31 [113074.791813] <IRQ> [<ffffffff81253cd5>] ? \ skb_release_head_state+0x64/0xc8 Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81253cd5>] ? skb_release_head_state+0x64/0xc8 Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81049ae1>] ? warn_slowpath_common+0x77/0xa3 Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81253cd5>] ? skb_release_head_state+0x64/0xc8 Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81253a1a>] ? __kfree_skb+0x9/0x7d Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffffa01cb139>] ? free_old_xmit_skbs+0x51/0x6e \ [virtio_net] Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffffa01cbc85>] ? 
start_xmit+0x26/0xf2 [virtio_net] Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff8126934f>] ? netpoll_send_skb+0xd2/0x205 Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffffa0429216>] ? write_msg+0x90/0xeb [netconsole] Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81049f06>] ? __call_console_drivers+0x5e/0x6f Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff8102b49d>] ? kvm_clock_read+0x4d/0x52 Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff8104a082>] ? release_console_sem+0x115/0x1ba Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff8104a632>] ? vprintk+0x2f2/0x34b Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff8106b142>] ? vx_update_load+0x18/0x13e Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81308309>] ? printk+0x4e/0x5d Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff8102b49d>] ? kvm_clock_read+0x4d/0x52 Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81070b62>] ? getnstimeofday+0x55/0xaf Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81062683>] ? ktime_get_ts+0x21/0x49 Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff810626b7>] ? ktime_get+0xc/0x41 Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81062788>] ? hrtimer_interrupt+0x9c/0x146 Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81024a4b>] ? smp_apic_timer_interrupt+0x80/0x93 Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81011663>] ? apic_timer_interrupt+0x13/0x20 Oct 8 11:16:52 172.30.1.31 [113074.791813] <EOI> [<ffffffff8130a9eb>] ? _spin_unlock_irq+0xd/0x31 Reported-and-tested-by: Massimo Cetra <mcetra@navynet.it> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Bug-Entry: http://bugzilla.kernel.org/show_bug.cgi?id=14378 Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
89d71a66 |
|
12-Oct-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: Use netdev_alloc_skb_ip_align() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3d1285be |
|
30-Sep-2009 |
Uwe Kleine-König <u.kleine-koenig@pengutronix.de> |
move virtnet_remove to .devexit.text The function virtnet_remove is used only wrapped by __devexit_p so define it using __devexit. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Sam Ravnborg <sam@ravnborg.org> Cc: David S. Miller <davem@davemloft.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Alex Williamson <alex.williamson@hp.com> Cc: Mark McLoughlin <markmc@redhat.com> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0aea51c3 |
|
26-Aug-2009 |
Amit Shah <amit.shah@redhat.com> |
virtio_net: Check for room in the vq before adding buffer Saves us one cycle of alloc-add-free if the queue was full. Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (modified)
|
#
48925e37 |
|
24-Sep-2009 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio_net: avoid (most) NETDEV_TX_BUSY by stopping queue early. Now we can tell the theoretical capacity remaining in the output queue, virtio_net can waste entries by stopping the queue early. It doesn't work in the case of indirect buffers and kmalloc failure, but that's rare (we could drop the packet in that case, but other drivers return TX_BUSY for similar reasons). For the record, I think this patch reflects poorly on the linux network API. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Dinesh Subhraveti <dineshs@us.ibm.com>
|
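The "stop the queue early" idea in the commit above can be sketched as a capacity check made after each transmit. The worst-case figure used here (a header descriptor, a linear-data descriptor, and one per page fragment) and the fragment limit of 18 are assumptions for illustration, as are all the names.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed upper bound on page fragments per packet (illustrative). */
#define SKETCH_MAX_FRAGS 18

/* Assumed worst case: header + linear data + every fragment. */
#define SKETCH_WORST_CASE_DESCS (2 + SKETCH_MAX_FRAGS)

/* Decide, right after queuing a packet, whether the next worst-case
 * packet might not fit. Stopping now wastes some ring entries but
 * avoids having to refuse a packet later with NETDEV_TX_BUSY. */
static bool should_stop_queue(int capacity_left)
{
    assert(capacity_left >= 0);
    return capacity_left < SKETCH_WORST_CASE_DESCS;
}
```

As the commit notes, the remaining capacity is only theoretical when indirect descriptors are in use, so a check of this kind reduces but does not eliminate the busy case.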
#
b3f24698 |
|
24-Sep-2009 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio_net: formalize skb_vnet_hdr We put the virtio_net_hdr into the skb's cb region; turn this into a union to clean up the code slightly and allow future expansion. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Mark McLoughlin <markmc@redhat.com> Cc: Dinesh Subhraveti <dineshs@us.ibm.com>
|
#
b0c39dbd |
|
24-Sep-2009 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio_net: don't free buffers in xmit ring The virtio_net driver is complicated by the two methods of freeing old xmit buffers (in addition to freeing old ones at the start of the xmit path). The original code used a 1/10 second timer attached to xmit_free(), reset on every xmit. Before we orphaned skbs on xmit, the transmitting userspace could block with a full socket until the timer fired, the skb destructor was called, and they were re-woken. So we added the VIRTIO_F_NOTIFY_ON_EMPTY feature: supporting devices send an interrupt (even if normally suppressed) on an empty xmit ring which makes us schedule xmit_tasklet(). This was a benchmark win. Unfortunately, VIRTIO_F_NOTIFY_ON_EMPTY makes quite a lot of work: a host which is faster than the guest will fire the interrupt every xmit packet (slowing the guest down further). Attempting mitigation in the host adds overhead of userspace timers (possibly with the additional pain of signals), and risks increasing latency anyway if you get it wrong. In practice, this effect was masked by benchmarks which take advantage of GSO (with its inherent transmit batching), but it's still there. Now we orphan xmitted skbs, the pressure is off: remove both paths and no longer request VIRTIO_F_NOTIFY_ON_EMPTY. Note that the current QEMU will notify us even if we don't negotiate this feature (legal, but suboptimal); a patch is outstanding to improve that. Move the skb_orphan/nf_reset to after we've done the send and notified the other end, for a slight optimization. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Mark McLoughlin <markmc@redhat.com>
|
#
8958f574 |
|
24-Sep-2009 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio_net: return NETDEV_TX_BUSY instead of queueing an extra skb. This effectively reverts 99ffc696d10b28580fe93441d627cf290ac4484c "virtio: wean net driver off NETDEV_TX_BUSY". The complexity of queuing an skb (setting a tasklet to re-xmit) is questionable, especially once we get rid of the other reason for the tasklet in the next patch. If the skb won't fit in the tx queue, just return NETDEV_TX_BUSY. This is frowned upon, so a followup patch uses a more complex solution. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Herbert Xu <herbert@gondor.apana.org.au>
|
#
2b5bbe3b |
|
24-Sep-2009 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio_net: skb_orphan() and nf_reset() in xmit path. The complex transmit free logic was introduced to avoid hangs on removing the ip_conntrack module and also because drivers aren't generally supposed to keep stale skbs for unbounded times. After some debate, it was decided that while doing skb_orphan() generally is a rat's nest, we can do it in this driver. Following patches take advantage of this. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
3ca4f5ca |
|
31-Jul-2009 |
Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> |
virtio: add virtio IDs file Virtio IDs are spread all over the tree which makes assigning new IDs bothersome. Putting them together should make the process less error-prone. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
3c1b27d5 |
|
23-Sep-2009 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: make add_buf return capacity remaining This API change means that virtio_net can tell how much capacity remains for buffers. It's necessarily fuzzy, since VIRTIO_RING_F_INDIRECT_DESC means we can fit any number of descriptors in one, *if* we can kmalloc. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Dinesh Subhraveti <dineshs@us.ibm.com>
|
#
0fc0b732 |
|
02-Sep-2009 |
Stephen Hemminger <shemminger@vyatta.com> |
netdev: drivers should make ethtool_ops const No need to put ethtool_ops in data, they should be const. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
424efe9c |
|
31-Aug-2009 |
Stephen Hemminger <shemminger@vyatta.com> |
netdev: convert pseudo drivers to netdev_tx_t These are all drivers that don't touch real hardware. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3161e453 |
|
26-Aug-2009 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: net refill on out-of-memory If we run out of memory, use keventd to fill the buffer. There's a report of this happening: "Page allocation failures in guest", Message-ID: <20090713115158.0a4892b0@mjolnir.ossman.eu> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5c516751 |
|
14-Jul-2009 |
Sridhar Samudrala <sri@us.ibm.com> |
virtio-net: Allow UFO feature to be set and advertised. - Allow setting UFO on virtio-net and advertise to host. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
31278e71 |
|
16-Jun-2009 |
Jiri Pirko <jpirko@redhat.com> |
net: group address list and its count This patch is inspired by patch recently posted by Johannes Berg. Basically what my patch does is to group list and a count of addresses into newly introduced structure netdev_hw_addr_list. This brings us two benefits: 1) struct net_device becomes a bit nicer. 2) in the future there will be a possibility to operate with lists independently on netdevices (with exporting right functions). I wanted to introduce this patch before I'll post a multicast lists conversion. Signed-off-by: Jiri Pirko <jpirko@redhat.com> drivers/net/bnx2.c | 4 +- drivers/net/e1000/e1000_main.c | 4 +- drivers/net/ixgbe/ixgbe_main.c | 6 +- drivers/net/mv643xx_eth.c | 2 +- drivers/net/niu.c | 4 +- drivers/net/virtio_net.c | 10 ++-- drivers/s390/net/qeth_l2_main.c | 2 +- include/linux/netdevice.h | 17 +++-- net/core/dev.c | 130 ++++++++++++++++++-------------------- 9 files changed, 89 insertions(+), 90 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d2a7ddda |
|
12-Jun-2009 |
Michael S. Tsirkin <mst@redhat.com> |
virtio: find_vqs/del_vqs virtio operations This replaces find_vq/del_vq with find_vqs/del_vqs virtio operations, and updates all drivers. This is needed for MSI support, because MSI needs to know the total number of vectors upfront. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (+ lguest/9p compile fixes)
|
#
9499f5e7 |
|
12-Jun-2009 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: add names to virtqueue struct, mapping from devices to queues. Add a linked list of all virtqueues for a virtio device: this helps for debugging and is also needed for upcoming interface change. Also, add a "name" field for clearer debug messages. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
8981f010 |
|
11-Jun-2009 |
Herbert Xu <herbert@gondor.apana.org.au> |
virtio_net: Fix IP alignment on non-mergeable RX path We need to enforce the IP alignment on the non-mergeable RX path just like the other RX path. Not doing so results in misaligned IP headers. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b82f08ea |
|
03-Jun-2009 |
Herbert Xu <herbert@gondor.apana.org.au> |
virtio_net: Set correct gso->hdr_len Through a bug in the tun driver, I noticed that virtio_net is producing bogus hdr_len values. In particular, it only includes the IP header in the linear area, and excludes the entire TCP header. This causes the TCP header to be copied twice for each packet. (The bug omitted the second copy :) This patch corrects this. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ccffad25 |
|
22-May-2009 |
Jiri Pirko <jpirko@redhat.com> |
net: convert unicast addr list This patch converts unicast address list to standard list_head using previously introduced struct netdev_hw_addr. It also relaxes the locking. Original spinlock (still used for multicast addresses) is not needed and is no longer used for a protection of this list. All reading and writing takes place under rtnl (with no changes). I also removed a possibility to specify the length of the address while adding or deleting unicast address. It's always dev->addr_len. The conversion especially touched the e1000 and ixgbe code, where the change is not so trivial. Signed-off-by: Jiri Pirko <jpirko@redhat.com> drivers/net/bnx2.c | 13 +-- drivers/net/e1000/e1000_main.c | 24 +++-- drivers/net/ixgbe/ixgbe_common.c | 14 ++-- drivers/net/ixgbe/ixgbe_common.h | 4 +- drivers/net/ixgbe/ixgbe_main.c | 6 +- drivers/net/ixgbe/ixgbe_type.h | 4 +- drivers/net/macvlan.c | 11 +- drivers/net/mv643xx_eth.c | 11 +- drivers/net/niu.c | 7 +- drivers/net/virtio_net.c | 7 +- drivers/s390/net/qeth_l2_main.c | 6 +- drivers/scsi/fcoe/fcoe.c | 16 ++-- include/linux/netdevice.h | 18 ++-- net/8021q/vlan.c | 4 +- net/8021q/vlan_dev.c | 10 +- net/core/dev.c | 195 +++++++++++++++++++++++++++----------- net/dsa/slave.c | 10 +- net/packet/af_packet.c | 4 +- 18 files changed, 227 insertions(+), 137 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1824a989 |
|
01-May-2009 |
Alex Williamson <alex.williamson@hp.com> |
virtio_net: Fix function name typo Signed-off-by: Alex Williamson <alex.williamson@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
23e258e1 |
|
01-May-2009 |
Alex Williamson <alex.williamson@hp.com> |
virtio_net: Cleanup command queue scatterlist usage We were avoiding calling sg_init* on scatterlists passed into virtnet_send_command to prevent extraneous end markers. This caused build warnings for uninitialized variables. Cleanup the code to create proper scatterlists. Signed-off-by: Alex Williamson <alex.williamson@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0ee904c3 |
|
11-Apr-2009 |
Alexander Beregalov <a.beregalov@gmail.com> |
drivers/net: replace BUG() with BUG_ON() if possible Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
62994b2d |
|
04-Apr-2009 |
Alex Williamson <alex.williamson@hp.com> |
virtio_net: Set the mac config only when VIRTIO_NET_F_MAC VIRTIO_NET_F_MAC indicates the presence of the mac field in config space, not the validity of the value it contains. Allow the mac to be changed at runtime, but only push the change into config space with the VIRTIO_NET_F_MAC feature present. Signed-off-by: Alex Williamson <alex.williamson@hp.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4783256e |
|
18-Mar-2009 |
Pantelis Koukousoulas <pktoss@gmail.com> |
virtio_net: Make virtio_net support carrier detection Impact: Make NetworkManager work with virtio_net For now the semantics are simple: There is always carrier. This allows a seamless experience with e.g., qemu/kvm where NetworkManager just configures and sets up everything automagically. If/when a generally agreed-upon way to control carrier on/off in the emulator/hypervisor level emerges, it will be trivial to extend the driver to support that too, but for now even this 2-liner makes user experience that much better. Signed-off-by: Pantelis Koukousoulas <pktoss@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9c46f6d4 |
|
04-Feb-2009 |
Alex Williamson <alex.williamson@hp.com> |
virtio_net: Allow setting the MAC address of the NIC Many physical NICs let the OS re-program the "hardware" MAC address. Virtual NICs should allow this too. Signed-off-by: Alex Williamson <alex.williamson@hp.com> Acked-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0bde9569 |
|
04-Feb-2009 |
Alex Williamson <alex.williamson@hp.com> |
virtio_net: Add support for VLAN filtering in the hypervisor VLAN filtering allows the hypervisor to drop packets from VLANs that we're not a part of, further reducing the number of extraneous packets received. This makes use of the VLAN virtqueue command class. The CTRL_VLAN feature bit tells us whether the backend supports VLAN filtering. Signed-off-by: Alex Williamson <alex.williamson@hp.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f565a7c2 |
|
04-Feb-2009 |
Alex Williamson <alex.williamson@hp.com> |
virtio_net: Add a MAC filter table Make use of the MAC control virtqueue class to support a MAC filter table. The filter table is managed by the hypervisor. We consider the table to be available if the CTRL_RX feature bit is set. We leave it to the hypervisor to manage the table and enable promiscuous or all-multi mode as necessary depending on the resources available to it. Signed-off-by: Alex Williamson <alex.williamson@hp.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2af7698e |
|
04-Feb-2009 |
Alex Williamson <alex.williamson@hp.com> |
virtio_net: Add a set_rx_mode interface Make use of the RX_MODE control virtqueue class to enable the set_rx_mode netdev interface. This allows us to selectively enable/disable promiscuous and allmulti mode so we don't see packets we don't want. For now, we automatically enable these as needed if additional unicast or multicast addresses are requested. Signed-off-by: Alex Williamson <alex.williamson@hp.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2a41f71d |
|
04-Feb-2009 |
Alex Williamson <alex.williamson@hp.com> |
virtio_net: Add a virtqueue for outbound control commands This will be used for RX mode, MAC filter table, VLAN filtering, etc... The control transaction consists of one or more "out" sg entries and one or more "in" sg entries. The first out entry contains a header defining the class and command. Additional out entries may provide data for the command. The last in entry provides a status response back from the command. Virtqueues typically run asynchronously, running a callback function when there's data in the channel. We can't readily make use of this in the command paths where we need to use this. Instead, we kick the virtqueue and spin. The kick causes an I/O write, triggering an immediate trap into the hypervisor. Signed-off-by: Alex Williamson <alex.williamson@hp.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
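The control transaction layout described in 2a41f71d can be sketched in plain C. This is a minimal userspace model, not the driver's code: `struct toy_ctrl_hdr`, `struct toy_sg_entry`, `toy_build_ctrl_command()` and the `TOY_CTRL_*` values are hypothetical names chosen for illustration. It shows only the ordering the message describes: a header entry first, optional data entries next, and a device-writable status byte last.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status values; the device is expected to overwrite the status byte. */
enum { TOY_CTRL_OK = 0, TOY_CTRL_ERR = 1 };

/* First "out" entry: identifies the command class and the command within it. */
struct toy_ctrl_hdr {
    uint8_t cmd_class;
    uint8_t cmd;
};

/* Stand-in for one scatter-gather entry. */
struct toy_sg_entry {
    const void *addr;
    size_t len;
    int device_writable; /* 0 = "out" (driver to device), 1 = "in" (device to driver) */
};

/*
 * Lay out one control transaction: header, optional data, status.
 * Returns the number of entries written into sg (2 or 3).
 */
static int toy_build_ctrl_command(const struct toy_ctrl_hdr *hdr,
                                  const void *data, size_t data_len,
                                  uint8_t *status,
                                  struct toy_sg_entry sg[3])
{
    int n = 0;

    assert(hdr != NULL && status != NULL && sg != NULL);

    sg[n++] = (struct toy_sg_entry){ hdr, sizeof(*hdr), 0 };
    if (data_len > 0) /* skip the data entry entirely when there is no payload */
        sg[n++] = (struct toy_sg_entry){ data, data_len, 0 };

    *status = TOY_CTRL_ERR; /* pessimistic default until the device answers */
    sg[n++] = (struct toy_sg_entry){ status, sizeof(*status), 1 };
    return n;
}
```

A caller would hand the resulting entries to the queue, kick it, and then poll until the status byte has been written, which is the "kick and spin" behaviour the message mentions.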
#
8527bec5 |
|
26-Jan-2009 |
Ira W. Snyder <iws@ovro.caltech.edu> |
virtio_net: use correct accessors for scatterlists Without this fix, virtio_net makes incorrect usage of scatterlists. It sets the end of the scatterlist chain after the first element, despite the fact that more entries come after it. If you try to run dma_map_sg() on one of the scatterlists given to you by add_buf(), you will get a null pointer oops. Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e918085a |
|
25-Jan-2009 |
Alex Williamson <alex.williamson@hp.com> |
virtio_net: Fix MAX_PACKET_LEN to support 802.1Q VLANs 802.1Q expanded the maximum ethernet frame size by 4 bytes for the VLAN tag. We're not taking this into account in virtio_net, which means the buffers we provide to the backend in the virtqueue RX ring aren't big enough to hold a full MTU VLAN packet. For QEMU/KVM, this results in the backend exiting with a packet truncation error. Signed-off-by: Alex Williamson <alex.williamson@hp.com> Acked-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
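The arithmetic behind e918085a can be written out as a small sketch. The constant names mirror the usual Ethernet definitions, but the `TOY_` prefix and the helper `toy_max_packet_len()` are hypothetical; the actual change in the driver may be expressed differently.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ETH_HLEN     14   /* destination MAC + source MAC + ethertype */
#define TOY_VLAN_HLEN     4   /* 802.1Q tag */
#define TOY_ETH_DATA_LEN 1500 /* standard Ethernet payload (MTU) */

/* Largest frame a receive buffer must hold for a given MTU, VLAN tag included. */
static size_t toy_max_packet_len(size_t mtu)
{
    assert(mtu > 0);
    return TOY_ETH_HLEN + TOY_VLAN_HLEN + mtu;
}
```

With a 1500-byte MTU this gives 1518 bytes, 4 more than the untagged 1514 that a buffer sized without the VLAN tag would allow.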
#
9f4d26d0 |
|
19-Jan-2009 |
Mark McLoughlin <markmc@redhat.com> |
virtio_net: add link status handling Allow the host to inform us that the link is down by adding a VIRTIO_NET_F_STATUS which indicates that device status is available in virtio_net config. This is currently useful for simulating link down conditions (e.g. using proposed qemu 'set_link' monitor command) but would also be needed if we were to support device assignment via virtio. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (added future masking) Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
288379f0 |
|
19-Jan-2009 |
Ben Hutchings <bhutchings@solarflare.com> |
net: Remove redundant NAPI functions Following the removal of the unused struct net_device * parameter from the NAPI functions named *netif_rx_* in commit 908a7a1, they are exactly equivalent to the corresponding *napi_* functions and are therefore redundant. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
76288b4e |
|
06-Jan-2009 |
Stephen Hemminger <shemminger@vyatta.com> |
virtio: convert to net_device_ops Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
908a7a16 |
|
22-Dec-2008 |
Neil Horman <nhorman@tuxdriver.com> |
net: Remove unused netdev arg from some NAPI interfaces. When the napi api was changed to separate its 1:1 binding to the net_device struct, the netif_rx_[prep|schedule|complete] api failed to remove the now vestigial net_device structure parameter. This patch cleans up that api by properly removing it. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
39da5814 |
|
26-Nov-2008 |
Mark McLoughlin <markmc@redhat.com> |
virtio_net: large tx MTU support We don't really have a max tx packet size limit, so allow configuring the device with up to 64k tx MTU. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
3f2c31d9 |
|
16-Nov-2008 |
Mark McLoughlin <markmc@redhat.com> |
virtio_net: VIRTIO_NET_F_MRG_RXBUF (improve rcv buffer allocation) If segmentation offload is enabled by the host, we currently allocate maximum sized packet buffers and pass them to the host. This uses up 20 ring entries, allowing us to supply only 20 packet buffers to the host with a 256 entry ring. This is a huge overhead when receiving small packets, and is most keenly felt when receiving MTU sized packets from off-host. The VIRTIO_NET_F_MRG_RXBUF feature flag is set by hosts which support using receive buffers which are smaller than the maximum packet size. In order to transfer large packets to the guest, the host merges together multiple receive buffers to form a larger logical buffer. The number of merged buffers is returned to the guest via a field in the virtio_net_hdr. Make use of this support by supplying single page receive buffers to the host. On receive, we extract the virtio_net_hdr, copy 128 bytes of the payload to the skb's linear data buffer and adjust the fragment offset to point to the remaining data. This ensures proper alignment and allows us to not use any paged data for small packets. If the payload occupies multiple pages, we simply append those pages as fragments and free the associated skbs. This scheme allows us to be efficient in our use of ring entries while still supporting large packets. Benchmarking using netperf from an external machine to a guest over a 10Gb/s network shows a 100% improvement from ~1Gb/s to ~2Gb/s. With a local host->guest benchmark with GSO disabled on the host side, throughput was seen to increase from 700Mb/s to 1.7Gb/s. Based on a patch from Herbert Xu. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (use netdev_priv) Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0276b497 |
|
16-Nov-2008 |
Mark McLoughlin <markmc@redhat.com> |
virtio_net: hook up the set-tso ethtool op Seems like an oversight that we have set-tx-csum and set-sg hooked up, but not set-tso. Also leads to the strange situation that if you e.g. disable tx-csum, then tso doesn't get disabled. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0a888fd1 |
|
16-Nov-2008 |
Mark McLoughlin <markmc@redhat.com> |
virtio_net: Recycle some more rx buffer pages Each time we re-fill the recv queue with buffers, we allocate one too many skbs and free it again when adding fails. We should recycle the pages allocated in this case. A previous version of this patch made trim_pages() trim trailing unused pages from skbs with some paged data, but this actually caused a barely measurable slowdown. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (use netdev_priv) Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8f15ea42 |
|
13-Nov-2008 |
Wang Chen <wangchen@cn.fujitsu.com> |
netdevice: safe convert to netdev_priv() #part-3 We have some reasons to kill netdev->priv: 1. netdev->priv is equal to netdev_priv(). 2. netdev_priv() wraps the calculation of netdev->priv's offset, obviously netdev_priv() is more flexible than netdev->priv. But we can't kill netdev->priv, because so many drivers reference it directly. This patch is a safe convert for netdev->priv to netdev_priv(netdev). Since all of the netdev->priv is only for read. But it is too big to be sent in one mail. I split it to 4 parts and make every part smaller than 100,000 bytes, which is max size allowed by vger. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e174961c |
|
27-Oct-2008 |
Johannes Berg <johannes@sipsolutions.net> |
net: convert print_mac to %pM This converts pretty much everything from print_mac to %pM. There were a few things that had conflicts which I have just dropped for now, no harm done. I've built an allyesconfig with this and looked at the files that weren't built very carefully, but it's a huge patch. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fb6813f4 |
|
24-Jul-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: Recycle unused recv buffer pages for large skbs in net driver If we hack the virtio_net driver to always allocate full-sized (64k+) skbuffs, the driver slows down (lguest numbers): Time to receive 1GB (small buffers): 10.85 seconds Time to receive 1GB (64k+ buffers): 24.75 seconds Of course, large buffers use up more space in the ring, so we increase that from 128 to 2048: Time to receive 1GB (64k+ buffers, 2k ring): 16.61 seconds If we recycle pages rather than using alloc_page/free_page: Time to receive 1GB (64k+ buffers, 2k ring, recycle pages): 10.81 seconds This demonstrates that with efficient allocation, we don't need to have a separate "small buffer" queue. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
97402b96 |
|
17-Apr-2008 |
Herbert Xu <herbert@gondor.apana.org.au> |
virtio net: Allow receiving SG packets Finally this patch lets virtio_net receive GSO packets in addition to sending them. This can definitely be optimised for the non-GSO case. For comparison the Xen approach stores one page in each skb and uses subsequent skb's pages to construct an SG skb instead of preallocating the maximum amount of pages per skb. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (added feature bits)
|
#
a9ea3fc6 |
|
17-Apr-2008 |
Herbert Xu <herbert@gondor.apana.org.au> |
virtio net: Add ethtool ops for SG/GSO This patch adds some basic ethtool operations to virtio_net so I could test SG without GSO (which was really useful because TSO turned out to be buggy :) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (remove MTU setting)
|
#
9953ca6c |
|
26-May-2008 |
Mark McLoughlin <markmc@redhat.com> |
virtio: fix virtio_net xmit of freed skb bug On Mon, 2008-05-26 at 17:42 +1000, Rusty Russell wrote: > If we fail to transmit a packet, we assume the queue is full and put > the skb into last_xmit_skb. However, if more space frees up before we > xmit it, we loop, and the result can be transmitting the same skb twice. > > Fix is simple: set skb to NULL if we've used it in some way, and check > before sending. ... > diff -r 564237b31993 drivers/net/virtio_net.c > --- a/drivers/net/virtio_net.c Mon May 19 12:22:00 2008 +1000 > +++ b/drivers/net/virtio_net.c Mon May 19 12:24:58 2008 +1000 > @@ -287,21 +287,25 @@ again: > free_old_xmit_skbs(vi); > > /* If we has a buffer left over from last time, send it now. */ > - if (vi->last_xmit_skb) { > + if (unlikely(vi->last_xmit_skb)) { > if (xmit_skb(vi, vi->last_xmit_skb) != 0) { > /* Drop this skb: we only queue one. */ > vi->dev->stats.tx_dropped++; > kfree_skb(skb); > + skb = NULL; > goto stop_queue; > } > vi->last_xmit_skb = NULL; With this, may drop an skb and then later in the function discover that we could have sent it after all. Poor wee skb :) How about the incremental patch below? Cheers, Mark. Subject: [PATCH] virtio_net: Delay dropping tx skbs Currently we drop the skb in start_xmit() if we have a queued buffer and fail to transmit it. However, if we delay dropping it until we've stopped the queue and enabled the tx notification callback, then there is a chance space might become available for it. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
5e4fe5c4 |
|
08-Jul-2008 |
Mark McLoughlin <markmc@redhat.com> |
virtio_net: Set VIRTIO_NET_F_GUEST_CSUM feature We can handle receiving partial csums, so set the appropriate feature bit. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
363f1514 |
|
08-Jun-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: use callback on empty in virtio_net virtio_net uses a timer to free old transmitted packets, rather than leaving callbacks enabled all the time. If the host promises to always notify us when the transmit ring is empty, we can free packets at that point and avoid the timer. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
14c998f0 |
|
08-Jun-2008 |
Mark McLoughlin <markmc@redhat.com> |
virtio: virtio_net free transmit skbs in a timer virtio_net currently only frees old transmit skbs just before queueing new ones. If the queue is full, it then enables interrupts and waits for notification that more work has been performed. However, a side-effect of this scheme is that there are always xmit skbs left dangling when no new packets are sent, against the Documentation/networking/driver.txt guideline: "... it is not allowed for your TX mitigation scheme to let TX packets "hang out" in the TX ring unreclaimed forever if no new TX packets are sent." Add a timer to ensure that any time we queue new TX skbs, we will shortly free them again. This fixes an easily reproduced hang at shutdown where iptables attempts to unload nf_conntrack and nf_conntrack waits for an skb it is tracking to be freed, but virtio_net never frees it. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
23cde76d |
|
08-Jun-2008 |
Mark McLoughlin <markmc@redhat.com> |
virtio_net: Fix skb->csum_start computation hdr->csum_start is the offset from the start of the ethernet header to the transport layer checksum field. skb->csum_start is the offset from skb->head. skb_partial_csum_set() assumes that skb->data points to the ethernet header - i.e. it computes skb->csum_start by adding the headroom to hdr->csum_start. Since eth_type_trans() skb_pull()s the ethernet header, skb_partial_csum_set() should be called before eth_type_trans(). (Without this patch, GSO packets from a guest to the world outside the host are corrupted). Signed-off-by: Mark McLoughlin <markmc@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
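The ordering problem fixed by 23cde76d can be illustrated with a toy model. The `toy_skb` structure and the `toy_*` helpers below are hypothetical stand-ins that track only offsets; they are not the kernel's `sk_buff` API. The assumption carried over from the message is that the checksum start is computed as the current headroom plus the offset taken from the virtio header, so it is only correct while `data` still points at the Ethernet header.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ETH_HLEN 14

/* Offsets into one packet buffer; only what the example needs. */
struct toy_skb {
    size_t head;       /* start of the buffer */
    size_t data;       /* current start of packet data */
    size_t csum_start; /* checksum start, measured from head */
};

static size_t toy_headroom(const struct toy_skb *skb)
{
    assert(skb->data >= skb->head);
    return skb->data - skb->head;
}

/* Models the partial-checksum setup: headroom + offset from the virtio header. */
static void toy_partial_csum_set(struct toy_skb *skb, size_t hdr_csum_start)
{
    skb->csum_start = toy_headroom(skb) + hdr_csum_start;
}

/* Models the pull of the Ethernet header: data moves forward by 14 bytes. */
static void toy_eth_type_trans(struct toy_skb *skb)
{
    skb->data += TOY_ETH_HLEN;
}
```

With 32 bytes of headroom and a header offset of 34, setting the checksum first yields 66. Pulling the Ethernet header first yields 80, which is off by exactly the 14 bytes that were pulled.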
#
11a3a154 |
|
26-May-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: fix delayed xmit of packet and freeing of old packets. Because we cache the last failed-to-xmit packet, if there are no packets queued behind that one we may never send it (reproduced here as TCP stalls, "cured" by an outgoing ping). Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
7eb2e251 |
|
26-May-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: fix virtio_net xmit of freed skb bug If we fail to transmit a packet, we assume the queue is full and put the skb into last_xmit_skb. However, if more space frees up before we xmit it, we loop, and the result can be transmitting the same skb twice. Fix is simple: set skb to NULL if we've used it in some way, and check before sending. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
288369cc |
|
22-May-2008 |
Wang Chen <wangchen@cn.fujitsu.com> |
VIRTIO: Use __skb_queue_purge() Use standard routine for queue purging. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
c45a6816 |
|
02-May-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: explicit advertisement of driver features A recent proposed feature addition to the virtio block driver revealed some flaws in the API: in particular, we assume that feature negotiation is complete once a driver's probe function returns. There is nothing in the API to require this, however, and even I didn't notice when it was violated. So instead, we require the driver to specify what features it supports in a table, we can then move the feature negotiation into the virtio core. The intersection of device and driver features are presented in a new 'features' bitmap in the struct virtio_device. Note that this highlights the difference between Linux unsigned-long bitmaps where each unsigned long is in native endian, and a straight-forward little-endian array of bytes. Drivers can still remove feature bits in their probe routine if they really have to. API changes: - dev->config->feature() no longer gets and acks a feature. - drivers should advertise their features in the 'feature_table' field - use virtio_has_feature() for extra sanity when checking feature bits Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
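The negotiation model introduced by c45a6816 amounts to intersecting two sets of feature bits. The sketch below is a simplified illustration using a single 32-bit word; `toy_negotiate()` and `toy_has_feature()` are hypothetical helpers, and the real core works on a bitmap inside struct virtio_device rather than a plain integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Build the driver's feature mask from its table of supported bit numbers,
 * then keep only the bits the device also offers.
 */
static uint32_t toy_negotiate(uint32_t device_features,
                              const unsigned *feature_table, size_t n)
{
    uint32_t driver_features = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        assert(feature_table[i] < 32); /* this sketch handles one 32-bit word only */
        driver_features |= UINT32_C(1) << feature_table[i];
    }
    return device_features & driver_features;
}

/* Check a single negotiated bit. */
static int toy_has_feature(uint32_t features, unsigned bit)
{
    assert(bit < 32);
    return (int)((features >> bit) & 1u);
}
```

A driver whose table lists bits 0 and 5, talking to a device that offers bits 0 and 1, ends up with only bit 0 negotiated.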
#
5539ae96 |
|
02-May-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: finer-grained features for virtio_net So, we previously had a 'VIRTIO_NET_F_GSO' bit which meant that 'the host can handle csum offload, and any TSO (v4&v6 incl ECN) or UFO packets you might want to send'. I thought this was good enough for Linux, but it actually isn't, since we don't do UFO in software. So, add separate feature bits for what the host can handle. Add equivalent ones for the guest to say what it can handle, because LRO is coming too (thanks Herbert!). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
99ffc696 |
|
02-May-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: wean net driver off NETDEV_TX_BUSY Herbert tells me that returning NETDEV_TX_BUSY from hard_start_xmit is seen as a poor thing to do; we should cache the packet and stop the queue. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
|
#
05271685 |
|
02-May-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: fix scatterlist sizing in net driver. Herbert Xu points out (within another patch) that my scatterlists are too short: one entry for the gso header, one for the skb->data, and MAX_SKB_FRAGS for all the fragments. Fix both xmit and recv sides (recv currently unused, coming in later patch). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
655aa31f |
|
02-May-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: fix tx_ stats in virtio_net get_buf() gives the length written by the other side, which will be zero. We want to add the skb length. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
21f644f3 |
|
08-Apr-2008 |
David S. Miller <davem@davemloft.net> |
[NET]: Undo code bloat in hot paths due to print_mac(). If print_mac() is used inside of a pr_debug() the compiler can't see that the call is redundant so still performs it even if pr_debug() ends up being a nop. So don't use print_mac() in such cases in hot code paths, use MAC_FMT et al. instead. As noted by Joe Perches, pr_debug() could be modified to handle this better, but that is a change to an interface used by the entire kernel and thus needs to be validated carefully. This here is thus the less risky fix for 2.6.25 Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6ea0a467 |
|
07-Apr-2008 |
Anthony Liguori <aliguori@us.ibm.com> |
virtio_net: remove overzealous printk The 'disable_cb' is really just a hint and as such, it's possible for more work to get queued up while callbacks are disabled. Under stress with an SMP guest, this printk triggers very frequently. There is no race here, this is how things are designed to work so let's just remove the printk. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
4265f161 |
|
14-Mar-2008 |
Christian Borntraeger <borntraeger@de.ibm.com> |
virtio: fix race in enable_cb There is a race in virtio_net, dealing with disabling/enabling the callback. I saw the following oops: kernel BUG at /space/kvm/drivers/virtio/virtio_ring.c:218! illegal operation: 0001 [#1] SMP Modules linked in: sunrpc dm_mod CPU: 2 Not tainted 2.6.25-rc1zlive-host-10623-gd358142-dirty #99 Process swapper (pid: 0, task: 000000000f85a610, ksp: 000000000f873c60) Krnl PSW : 0404300180000000 00000000002b81a6 (vring_disable_cb+0x16/0x20) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 EA:3 Krnl GPRS: 0000000000000001 0000000000000001 0000000010005800 0000000000000001 000000000f3a0900 000000000f85a610 0000000000000000 0000000000000000 0000000000000000 000000000f870000 0000000000000000 0000000000001237 000000000f3a0920 000000000010ff74 00000000002846f6 000000000fa0bcd8 Krnl Code: 00000000002b819a: a7110001 tmll %r1,1 00000000002b819e: a7840004 brc 8,2b81a6 00000000002b81a2: a7f40001 brc 15,2b81a4 >00000000002b81a6: a51b0001 oill %r1,1 00000000002b81aa: 40102000 sth %r1,0(%r2) 00000000002b81ae: 07fe bcr 15,%r14 00000000002b81b0: eb7ff0380024 stmg %r7,%r15,56(%r15) 00000000002b81b6: a7f13e00 tmll %r15,15872 Call Trace: ([<000000000fa0bcd0>] 0xfa0bcd0) [<00000000002b8350>] vring_interrupt+0x5c/0x6c [<000000000010ab08>] do_extint+0xb8/0xf0 [<0000000000110716>] ext_no_vtime+0x16/0x1a [<0000000000107e72>] cpu_idle+0x1c2/0x1e0 The problem can be triggered with a high amount of host->guest traffic. I think its the following race: poll says netif_rx_complete poll calls enable_cb enable_cb opens the interrupt mask a new packet comes, an interrupt is triggered----\ enable_cb sees that there is more work | enable_cb disables the interrupt | . V . interrupt is delivered . skb_recv_done does atomic napi test, ok some waiting disable_cb is called->check fails->bang! . poll would do napi check poll would do disable_cb The fix is to let enable_cb not disable the interrupt again, but expect the caller to do the cleanup if it returns false. 
In that case, the interrupt is only disabled, if the napi test_set_bit was successful. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (cleaned up doco)
|
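The rule adopted in 4265f161 (enable_cb no longer disables the callback itself; the caller disables it only if it wins the NAPI scheduling bit) can be sketched as a single-threaded model. Everything here is hypothetical: `toy_vq`, `toy_enable_cb()`, `toy_disable_cb()`, `toy_schedule_prep()` and `toy_poll_complete()` only mimic the shape of the driver's logic, with a C11 `atomic_flag` standing in for the NAPI scheduled bit and an `assert` standing in for the BUG check in the ring code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_vq {
    bool cb_enabled;  /* are callbacks (interrupts) currently enabled? */
    unsigned pending; /* buffers already waiting in the ring */
};

/* Enable callbacks; return false if more work is pending. Never disables again. */
static bool toy_enable_cb(struct toy_vq *vq)
{
    vq->cb_enabled = true;
    return vq->pending == 0;
}

/* Disabling twice is the bug the oops reported, so model it as an assertion. */
static void toy_disable_cb(struct toy_vq *vq)
{
    assert(vq->cb_enabled);
    vq->cb_enabled = false;
}

/* True only for the one caller that atomically takes the scheduled bit. */
static bool toy_schedule_prep(atomic_flag *sched)
{
    return !atomic_flag_test_and_set(sched);
}

/* End of a poll pass; returns true if polling must run again. */
static bool toy_poll_complete(struct toy_vq *vq, atomic_flag *sched)
{
    atomic_flag_clear(sched); /* polling is finished for now */
    if (!toy_enable_cb(vq) && toy_schedule_prep(sched)) {
        toy_disable_cb(vq);   /* only the winner of the scheduled bit disables */
        return true;
    }
    return false;
}
```

Because the interrupt handler would go through the same test-and-set before calling the disable helper, at most one of the two paths can reach it for a given enable, which is the property the fix relies on.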
#
da74e89d |
|
29-Feb-2008 |
Amit Shah <amitshah@gmx.net> |
virtio: Enable netpoll interface for netconsole logging Add a new poll_controller handler that the netpoll interface needs. This enables netconsole logging from a kvm guest over the virtio net interface. Signed-off-by: Amit Shah <amitshah@gmx.net> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
d9d5dcc8 |
|
18-Feb-2008 |
Christian Borntraeger <borntraeger@de.ibm.com> |
virtio_net: Fix oops on early interrupts - introduced by virtio reset code Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
370076d9 |
|
06-Feb-2008 |
Christian Borntraeger <borntraeger@de.ibm.com> |
virtio net: fix oops on interface-up I got the following oops during interface ifup. Unfortunately its not easily reproducable so I cant say for sure that my fix fixes this problem, but I am confident and I think its correct anyway: <2>kernel BUG at /space/kvm/drivers/virtio/virtio_ring.c:234! <4>illegal operation: 0001 [#1] PREEMPT SMP <4>Modules linked in: <4>CPU: 0 Not tainted 2.6.24zlive-guest-07293-gf1ca151-dirty #91 <4>Process swapper (pid: 0, task: 0000000000800938, ksp: 000000000084ddb8) <4>Krnl PSW : 0404300180000000 0000000000466374 (vring_disable_cb+0x30/0x34) <4> R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 EA:3 <4>Krnl GPRS: 0000000000000001 0000000000000001 0000000010003800 0000000000466344 <4> 000000000e980900 00000000008848b0 000000000084e748 0000000000000000 <4> 000000000087b300 0000000000001237 0000000000001237 000000000f85bdd8 <4> 000000000e980920 00000000001137c0 0000000000464754 000000000f85bdd8 <4>Krnl Code: 0000000000466368: e3b0b0700004 lg %r11,112(%r11) <4> 000000000046636e: 07fe bcr 15,%r14 <4> 0000000000466370: a7f40001 brc 15,466372 <4> >0000000000466374: a7f4fff6 brc 15,466360 <4> 0000000000466378: eb7ff0500024 stmg %r7,%r15,80(%r15) <4> 000000000046637e: a7f13e00 tmll %r15,15872 <4> 0000000000466382: b90400ef lgr %r14,%r15 <4> 0000000000466386: a7840001 brc 8,466388 <4>Call Trace: <4>([<000201500f85c000>] 0x201500f85c000) <4> [<0000000000466556>] vring_interrupt+0x72/0x88 <4> [<00000000004801a0>] kvm_extint_handler+0x34/0x44 <4> [<000000000010d22c>] do_extint+0xbc/0xf8 <4> [<0000000000113f98>] ext_no_vtime+0x16/0x1a <4> [<000000000010a182>] cpu_idle+0x216/0x238 <4>([<000000000010a162>] cpu_idle+0x1f6/0x238) <4> [<0000000000568656>] rest_init+0xaa/0xb8 <4> [<000000000084ee2c>] start_kernel+0x3fc/0x490 <4> [<0000000000100020>] _stext+0x20/0x80 <4> <4> <0>Kernel panic - not syncing: Fatal exception in interrupt <4> After looking at the code and the dump I think the following scenario happened: Ifup was running on cpu2 and the 
interrupt arrived on cpu0. Now virtnet_open on cpu 2 managed to execute napi_enable and disable_cb but did not execute rx_schedule. Meanwhile on cpu 0 skb_recv_done was called by vring_interrupt, executed netif_rx_schedule_prep, which succeeded and therefore called disable_cb. This triggered the BUG_ON, as interrupts were already disabled by cpu 2. I think the proper solution is to make the call to disable_cb depend on the atomic update of NAPI_STATE_SCHED by using netif_rx_schedule_prep in the same way as skb_recv_done. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
6c0cd7c0 |
|
16-Dec-2007 |
Dor Laor <dor.laor@qumranet.com> |
virtio_net: parametrize the napi_weight for virtio receive queue. It is done in order to improve performance. Signed-off-by: Dor Laor <dor.laor@qumranet.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
2cb9c6ba |
|
04-Feb-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: free transmit skbs when notified, not on next xmit. This fixes a potential dangling xmit problem. We also suppress refill interrupts until we need them. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
a48bd8f6 |
|
04-Feb-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: flush buffers on open Fix bug found by Christian Borntraeger: if the other side fills all the registered network buffers before we enable NAPI, we will never get an interrupt. The simplest fix is to process the input queue once on open. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
e70f2f1b |
|
06-Dec-2007 |
Christian Borntraeger <borntraeger@de.ibm.com> |
virtnet: remove double ether_setup Hello Rusty, virtnet_probe already calls alloc_etherdev, which calls ether_setup. There is no need to do that again. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
6e5aa7ef |
|
04-Feb-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: reset function A reset function solves three problems: 1) It allows us to renegotiate features, eg. if we want to upgrade a guest driver without rebooting the guest. 2) It gives us a clean way of shutting down virtqueues: after a reset, we know that the buffers won't be used by the host, and 3) It helps the guest recover from messed-up drivers. So we remove the ->shutdown hook, and the only way we now remove feature bits is via reset. We leave it to the driver to do the reset before it deletes queues: the balloon driver, for example, needs to chat to the host in its remove function. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
b3369c1f |
|
04-Feb-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: populate network rings in the probe routine, not open Since we want to reset the device to remove them, this is simpler (device is reset for us on driver remove). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
34a48579 |
|
04-Feb-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: Tweak virtio_net defines 1) Turn GSO on virtio net into an all-or-nothing (keep checksumming separate). Having multiple bits is a pain: if you can't support something you should handle it in software, which is still a performance win. 2) Make VIRTIO_NET_HDR_GSO_ECN a flag in the header, so it can apply to IPv6 or v4. 3) Rename VIRTIO_NET_F_NO_CSUM to VIRTIO_NET_F_CSUM (ie. means we do checksumming). 4) Add csum and gso params to virtio_net to allow more testing. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
50c8ea80 |
|
04-Feb-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: Net header needs hdr_len It's far easier to deal with packets if we don't have to parse the packet to figure out the header length to know how much to pull into the skb data. Add the field to the virtio_net_hdr struct (and fix the spaces that somehow crept in there). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
18445c4d |
|
04-Feb-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: explicit enable_cb/disable_cb rather than callback return. It seems that virtio_net wants to disable callbacks (interrupts) before calling netif_rx_schedule(), so we can't use the return value to do so. Rename "restart" to "cb_enable" and introduce "cb_disable" hook: callback now returns void, rather than a boolean. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
a586d4f6 |
|
04-Feb-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: simplify config mechanism. Previously we used a type/len pair within the config space, but this seems overkill. We now simply define a structure which represents the layout in the config space: the config space can now only be extended at the end. The main driver-visible changes: 1) We indicate what fields are present with an explicit feature bit. 2) Virtqueues are explicitly numbered, and not in the config space. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
f35d9d8a |
|
04-Feb-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: Implement skb_partial_csum_set, for setting partial csums on untrusted packets. Use it in virtio_net (replacing buggy version there), it's also going to be used by TAP for partial csum support. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: David S. Miller <davem@davemloft.net>
|
#
8329d98e |
|
19-Nov-2007 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: fix net driver loop case where we fail to restart skb is only NULL the first time around: it's more correct to test for being under-budget. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
74b2553f |
|
19-Nov-2007 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: fix module/device unloading The virtio code never hooked through the ->remove callback. Although no one supports device removal at the moment, this code is already needed for module unloading. This of course also revealed bugs in virtio_blk, virtio_net and lguest unloading paths. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
4d125de3 |
|
06-Nov-2007 |
Rusty Russell <rusty@rustcorp.com.au> |
virtio: more fallout from scatterlist changes. This fixes OOPS in network driver when CONFIG_DEBUG_SG=y. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
296f96fc |
|
21-Oct-2007 |
Rusty Russell <rusty@rustcorp.com.au> |
Net driver using virtio The network driver uses two virtqueues: one for input packets and one for output packets. This has nice locking properties (ie. we don't do any for recv vs send). TODO: 1) Big packets. 2) Multi-client devices (maybe separate driver?). 3) Resolve freeing of old xmit skbs (Christian Borntraeger) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: netdev@vger.kernel.org
|