History log of /linux-master/drivers/vhost/vhost.c
Revision Date Author Comments
# 76f40853 11-Mar-2024 Xianting Tian <xianting.tian@linux.alibaba.com>

vhost: correct misleading printing information

It is the guest that moves the avail idx, not the used idx, so the log
message printed when '(vq->avail_idx - last_avail_idx) > vq->num' should
refer to the avail idx; fix the misleading text.

Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Message-Id: <20240311082109.46773-1-xianting.tian@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# df9ace76 27-Mar-2024 Gavin Shan <gshan@redhat.com>

vhost: Add smp_rmb() in vhost_enable_notify()

An smp_rmb() is missing in vhost_enable_notify(), as spotted by
Will. Without it, it is not guaranteed that the available ring entries
pushed by the guest are observed by vhost in time, which leads to stale
available ring entries being fetched by vhost in vhost_get_vq_desc(), as
reported by Yihuang Yu on NVidia's grace-hopper (ARM64) platform.

/home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
-accel kvm -machine virt,gic-version=host -cpu host \
-smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
-m 4096M,slots=16,maxmem=64G \
-object memory-backend-ram,id=mem0,size=4096M \
: \
-netdev tap,id=vnet0,vhost=true \
-device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
:
guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
virtio_net virtio0: output.0:id 100 is not a head!

Add the missing smp_rmb() in vhost_enable_notify(). When the function
returns true, it means there are still pending tx buffers. Since the
caller might already have read the indices, it can bypass the smp_rmb()
in vhost_get_vq_desc(). Note that the path was safe before vq->avail_idx
started being cached by commit d3bb267bbdcb ("vhost: cache avail index
in vhost_enable_notify()").

Fixes: d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()")
Cc: <stable@kernel.org> # v5.18+
Reported-by: Yihuang Yu <yihyu@redhat.com>
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20240328002149.1141302-3-gshan@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>


# 22e1992c 27-Mar-2024 Gavin Shan <gshan@redhat.com>

vhost: Add smp_rmb() in vhost_vq_avail_empty()

An smp_rmb() is missing in vhost_vq_avail_empty(), as spotted by
Will. Without it, it is not guaranteed that the available ring entries
pushed by the guest are observed by vhost in time, which leads to stale
available ring entries being fetched by vhost in vhost_get_vq_desc(), as
reported by Yihuang Yu on NVidia's grace-hopper (ARM64) platform.

/home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
-accel kvm -machine virt,gic-version=host -cpu host \
-smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \
-m 4096M,slots=16,maxmem=64G \
-object memory-backend-ram,id=mem0,size=4096M \
: \
-netdev tap,id=vnet0,vhost=true \
-device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0
:
guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM
virtio_net virtio0: output.0:id 100 is not a head!

Add the missing smp_rmb() in vhost_vq_avail_empty(). When tx_can_batch()
returns true, it means there are still pending tx buffers. Since it
might have read the indices already, it can bypass the smp_rmb() in
vhost_get_vq_desc(). Note that the path was safe before vq->avail_idx
started being cached by commit 275bf960ac69 ("vhost: better detection
of available buffers").
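
A sketch of the same barrier shape in vhost_vq_avail_empty(), under
the assumption that the function caches avail_idx as described:

  r = vhost_get_avail_idx(vq, &avail_idx);
  if (unlikely(r))
          return false;
  vq->avail_idx = vhost16_to_cpu(vq, avail_idx);

  if (vq->avail_idx != vq->last_avail_idx) {
          /* Order the index read against the later reads of the ring
           * entries, pairing with the guest's smp_wmb(). */
          smp_rmb();
          return false;           /* ring is not empty */
  }

  return true;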

Fixes: 275bf960ac69 ("vhost: better detection of available buffers")
Cc: <stable@kernel.org> # v4.11+
Reported-by: Yihuang Yu <yihyu@redhat.com>
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20240328002149.1141302-2-gshan@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>


# 3652117f 22-Nov-2023 Christian Brauner <brauner@kernel.org>

eventfd: simplify eventfd_signal()

Ever since the eventfd type was introduced back in 2007 in commit
e1ad7468c77d ("signal/timer/event: eventfd core") the eventfd_signal()
function only ever passed 1 as a value for @n. There's no point in
keeping that additional argument.
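
The effect on callers, sketched with a plausible vhost call site:

  eventfd_signal(vq->call_ctx.ctx, 1);    /* before: @n was always 1 */
  eventfd_signal(vq->call_ctx.ctx);       /* after: the argument is gone */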

Link: https://lore.kernel.org/r/20231122-vfs-eventfd-signal-v2-2-bd549b14ce0c@kernel.org
Acked-by: Xu Yilun <yilun.xu@intel.com>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com> # ocxl
Acked-by: Eric Farman <farman@linux.ibm.com> # s390
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>


# ca50ec37 27-Sep-2023 Eric Auger <eric.auger@redhat.com>

vhost: Allow null msg.size on VHOST_IOTLB_INVALIDATE

Commit e2ae38cf3d91 ("vhost: fix hung thread due to erroneous iotlb
entries") forbade vhost iotlb msgs with null size, to prevent entries
with size = start = 0 and last = ULONG_MAX from ending up in the iotlb.

Then commit 95932ab2ea07 ("vhost: allow batching hint without size")
applied the check only to the VHOST_IOTLB_UPDATE and VHOST_IOTLB_INVALIDATE
message types, to fix a regression observed with the batching hint.

Still, the introduction of that check caused a regression for
some users attempting to invalidate the whole ULONG_MAX range by
setting the size to 0. This is the case with the qemu/smmuv3/vhost
integration, which does not work anymore. It looks safe to partially
revert the original commit and allow VHOST_IOTLB_INVALIDATE messages
with null size. vhost_iotlb_del_range() will compute a correct end
iova. Same for vhost_vdpa_iotlb_unmap().
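
A hedged sketch of the resulting check in the message-handling path
(simplified; only VHOST_IOTLB_UPDATE keeps the non-zero size requirement):

  if (msg.type == VHOST_IOTLB_UPDATE && msg.size == 0) {
          ret = -EINVAL;
          goto done;
  }
  /* A VHOST_IOTLB_INVALIDATE message with size == 0 now reaches
   * vhost_iotlb_del_range(), which computes a correct end iova. */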

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Fixes: e2ae38cf3d91 ("vhost: fix hung thread due to erroneous iotlb entries")
Cc: stable@vger.kernel.org # v5.17+
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230927140544.205088-1-eric.auger@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 228a27cf 26-Jun-2023 Mike Christie <michael.christie@oracle.com>

vhost: Allow worker switching while work is queueing

This patch drops the requirement that we can only switch workers if work
has not been queued, by using RCU for the vq based queueing paths and a
mutex for the device wide flush.

We can also use this to support SIGKILL properly in the future where we
should exit almost immediately after getting that signal. With this
patch, when get_signal returns true, we can set the vq->worker to NULL
and do a synchronize_rcu to prevent new work from being queued to the
vhost_task that has been killed.
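
A sketch of the RCU-protected vq queueing path this enables, close to
but not necessarily identical to the resulting code:

  bool vhost_vq_work_queue(struct vhost_virtqueue *vq, struct vhost_work *work)
  {
          struct vhost_worker *worker;
          bool queued = false;

          rcu_read_lock();
          worker = rcu_dereference(vq->worker);
          if (worker) {
                  queued = true;
                  vhost_worker_queue(worker, work);
          }
          rcu_read_unlock();

          return queued;
  }

  /* on a worker switch (or after SIGKILL): */
  rcu_assign_pointer(vq->worker, NULL);
  synchronize_rcu();      /* no new work can reach the old worker */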

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230626232307.97930-18-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# c1ecd8e9 26-Jun-2023 Mike Christie <michael.christie@oracle.com>

vhost: allow userspace to create workers

For vhost-scsi with 3 vqs or more and a workload that tries to use
them in parallel like:

fio --filename=/dev/sdb --direct=1 --rw=randrw --bs=4k \
--ioengine=libaio --iodepth=128 --numjobs=3

the single vhost worker thread will become a bottleneck and we are stuck
at around 500K IOPs no matter how many jobs, virtqueues, and CPUs are
used.

To better utilize virtqueues and available CPUs, this patch allows
userspace to create workers and bind them to vqs. You can have N workers
per dev and also share N workers with M vqs on that dev.
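
A userspace usage sketch, assuming the uapi added by this series
(VHOST_NEW_WORKER and VHOST_ATTACH_VRING_WORKER):

  struct vhost_worker_state state = {};
  struct vhost_vring_worker w = {};

  ioctl(dev_fd, VHOST_NEW_WORKER, &state);        /* fills state.worker_id */
  w.index = vq_index;                             /* vq to bind */
  w.worker_id = state.worker_id;
  ioctl(dev_fd, VHOST_ATTACH_VRING_WORKER, &w);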

This patch adds the interface related code and the next patch will hook
vhost-scsi into it. The patches do not try to hook net and vsock into
the interface because:

1. multiple workers don't seem to help vsock. The problem is that with
only 2 virtqueues we never fully use the existing worker when doing
bidirectional tests. This seems to match vhost-scsi where we don't see
the worker as a bottleneck until 3 virtqueues are used.

2. net already has a way to use multiple workers.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230626232307.97930-16-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 1cdaafa1 26-Jun-2023 Mike Christie <michael.christie@oracle.com>

vhost: replace single worker pointer with xarray

The next patch allows userspace to create multiple workers per device,
so this patch replaces the vhost_worker pointer with an xarray so we
can store multiple workers and look them up.
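
A minimal sketch of the xarray store/lookup, assuming id-allocated
workers:

  /* store: allocate an id for the new worker */
  ret = xa_alloc(&dev->worker_xa, &id, worker, xa_limit_32b, GFP_KERNEL);

  /* lookup by id, e.g. when an ioctl names a worker */
  worker = xa_load(&dev->worker_xa, id);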

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230626232307.97930-15-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# cef25866 26-Jun-2023 Mike Christie <michael.christie@oracle.com>

vhost: add helper to parse userspace vring state/file

The next patches add new vhost worker ioctls which will need to get a
vhost_virtqueue from a userspace struct which specifies the vq's index.
This moves the vhost_vring_ioctl code to do this to a helper so it can
be shared.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230626232307.97930-14-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 27eca189 26-Jun-2023 Mike Christie <michael.christie@oracle.com>

vhost: remove vhost_work_queue

vhost_work_queue is no longer used. Each driver is using the poll or vq
based queueing, so remove vhost_work_queue.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230626232307.97930-13-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 493b94bf 26-Jun-2023 Mike Christie <michael.christie@oracle.com>

vhost: convert poll work to be vq based

This has the drivers pass in their poll-to-vq mapping and then converts
the core poll code to use the vq based helpers. In the next patches we
will allow vqs to be handled by different workers, so to allow drivers
to execute operations like queue, stop, flush, etc on specific polls/vqs
we need to know the mappings.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230626232307.97930-8-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# a6fc0473 26-Jun-2023 Mike Christie <michael.christie@oracle.com>

vhost: take worker or vq for flushing

This patch has the core work flush function take a worker. When we
support multiple workers we can then flush each worker during device
removal, stoppage, etc. It also adds a helper to flush specific
virtqueues, so vhost-scsi can flush IO vqs from its ctl vq.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230626232307.97930-7-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 0921dddc 26-Jun-2023 Mike Christie <michael.christie@oracle.com>

vhost: take worker or vq instead of dev for queueing

This patch has the core work queueing function take a worker for when we
support multiple workers. It also adds a helper that takes a vq during
queueing so modules can control which vq/worker to queue work on.

This temporarily leaves vhost_work_queue in place. It will be removed when the drivers
are converted in the next patches.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230626232307.97930-6-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 9784df15 26-Jun-2023 Mike Christie <michael.christie@oracle.com>

vhost, vhost_net: add helper to check if vq has work

In the next patches each vq might have different workers so one could
have work but others do not. For net, we only want to check specific vqs,
so this adds a helper to check if a vq has work pending and converts
vhost-net to use it.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230626232307.97930-5-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 737bdb64 26-Jun-2023 Mike Christie <michael.christie@oracle.com>

vhost: add vhost_worker pointer to vhost_virtqueue

This patchset allows userspace to map vqs to different workers. This
patch adds a worker pointer to the vq so in later patches in this set
we can queue/flush specific vqs and their workers.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230626232307.97930-4-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# c011bb66 26-Jun-2023 Mike Christie <michael.christie@oracle.com>

vhost: dynamically allocate vhost_worker

This patchset allows us to allocate multiple workers, so this has us
move from the vhost_worker that's embedded in the vhost_dev to
dynamically allocating it.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230626232307.97930-3-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 3e11c6eb 26-Jun-2023 Mike Christie <michael.christie@oracle.com>

vhost: create worker at end of vhost_dev_set_owner

vsock can start queueing work after VHOST_VSOCK_SET_GUEST_CID, so
after we have called vhost_worker_create it can call
vhost_work_queue and try to access the vhost worker/task. If
vhost_dev_alloc_iovecs fails, then vhost_worker_free could free
the worker/task from under vsock.

This moves vhost_worker_create to the end of vhost_dev_set_owner
where we know we can no longer fail in that path. If it fails
after the VHOST_SET_OWNER and userspace closes the device, then
the normal vsock release handling will do the right thing.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230626232307.97930-2-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 55d8122f 24-Apr-2023 Shannon Nelson <shannon.nelson@amd.com>

vhost: support PACKED when setting-getting vring_base

Use the right structs for PACKED or split vqs when setting and
getting the vring base.

Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Message-Id: <20230424225031.18947-3-shannon.nelson@amd.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>


# 4b13cbef 07-Jun-2023 Mike Christie <michael.christie@oracle.com>

vhost: Fix worker hangs due to missed wake up calls

We can race where work has been added to the work_list, but
vhost_task_fn has already passed that check and has not yet set itself
to TASK_INTERRUPTIBLE. wake_up_process() will then see the task in
TASK_RUNNING and just return.

This bug was introduced in commit f9010dbdce91 ("fork, vhost: Use
CLONE_THREAD to fix freezer/ps regression") when I moved the setting
of TASK_INTERRUPTIBLE to simplify the code and avoid get_signal from
logging warnings about being in the wrong state. This moves the setting
of TASK_INTERRUPTIBLE back to before we test if we need to stop the
task to avoid a possible race there as well. We then have vhost_worker
set TASK_RUNNING if it finds work similar to before.
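
A sketch of the resulting loop ordering (names approximate, not the
verbatim patch):

  for (;;) {
          /* Set the sleep state before checking for stop/work so a
           * concurrent wake_up_process() cannot be lost. */
          set_current_state(TASK_INTERRUPTIBLE);

          if (test_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags)) {
                  __set_current_state(TASK_RUNNING);
                  break;
          }

          /* fn() sets TASK_RUNNING itself when it finds work */
          if (!vtsk->fn(vtsk->data))
                  schedule();
  }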

Fixes: f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230607192338.6041-3-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# a284f09e 07-Jun-2023 Mike Christie <michael.christie@oracle.com>

vhost: Fix crash during early vhost_transport_send_pkt calls

If userspace does VHOST_VSOCK_SET_GUEST_CID before VHOST_SET_OWNER we
can race where:
1. thread0 calls vhost_transport_send_pkt -> vhost_work_queue
2. thread1 does VHOST_SET_OWNER which calls vhost_worker_create.
3. vhost_worker_create will set the dev->worker pointer before setting
the worker->vtsk pointer.
4. thread0's vhost_work_queue will see that the dev->worker pointer is
set and try to call vhost_task_wake using the not-yet-set worker->vtsk
pointer.
5. We then crash since vtsk is NULL.

Before commit 6e890c5d5021 ("vhost: use vhost_tasks for worker
threads"), we only had the worker pointer so we could just check it to
see if VHOST_SET_OWNER has been done. After that commit we have the
vhost_worker and vhost_task pointer, so we can now hit the bug above.

This patch embeds the vhost_worker in the vhost_dev and moves the work
list initialization back to vhost_dev_init, so we can just check the
worker.vtsk pointer to check if VHOST_SET_OWNER has been done like
before.

Fixes: 6e890c5d5021 ("vhost: use vhost_tasks for worker threads")
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230607192338.6041-2-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reported-by: syzbot+d0d442c22fa8db45ff0e@syzkaller.appspotmail.com
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>


# 4d8df0f5 22-May-2023 Prathu Baronia <prathubaronia2011@gmail.com>

vhost: use kzalloc() instead of kmalloc() followed by memset()

Use kzalloc() to allocate new zeroed out msg node instead of
memsetting a node allocated with kmalloc().
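
The shape of the change, sketched (the original zeroed the node after
allocating it):

  node = kmalloc(size, GFP_KERNEL);       /* before: allocate ... */
  if (node)
          memset(node, 0, size);          /* ... then zero by hand */

  node = kzalloc(size, GFP_KERNEL);       /* after: zeroed allocation */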

Signed-off-by: Prathu Baronia <prathubaronia2011@gmail.com>
Message-Id: <20230522085019.42914-1-prathubaronia2011@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>


# f9010dbd 01-Jun-2023 Mike Christie <michael.christie@oracle.com>

fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

When switching from kthreads to vhost_tasks two bugs were added:
1. The vhost worker tasks now show up as processes, so scripts doing
ps or ps a incorrectly detect the vhost task as another process.
2. kthreads disabled freezing by setting PF_NOFREEZE, but vhost
tasks didn't disable freezing or add support for it.

To fix both bugs, this switches the vhost task to be a thread in the
process that does the VHOST_SET_OWNER ioctl, and has vhost_worker call
get_signal to support SIGKILL/SIGSTOP and freeze signals. Note that
SIGKILL/STOP support is required because CLONE_THREAD requires
CLONE_SIGHAND which requires those 2 signals to be supported.

This is a modified version of the patch written by Mike Christie
<michael.christie@oracle.com> which was a modified version of patch
originally written by Linus.

Much of what depended upon PF_IO_WORKER now depends on PF_USER_WORKER,
including ignoring signals, setting up the register state, and having
get_signal return instead of calling do_group_exit.

Tidied up the vhost_task abstraction so that the definition of
vhost_task only needs to be visible inside of vhost_task.c, making
it easier to review the code and tell what needs to be done where.
As part of this the main loop has been moved from vhost_worker into
vhost_task_fn. vhost_worker now returns true if work was done.

The main loop has been updated to call get_signal which handles
SIGSTOP, freezing, and collects the message that tells the thread to
exit as part of process exit. This collection clears
__fatal_signal_pending. This collection is not guaranteed to
clear signal_pending() so clear that explicitly so the schedule()
sleeps.

For now the vhost thread continues to exist and run work until the
last file descriptor is closed and the release function is called as
part of freeing struct file. To avoid hangs in the coredump
rendezvous and when killing threads in a multi-threaded exec, the
coredump code and de_thread have been modified to ignore vhost threads.

Removing the special case for exec appears to require teaching
vhost_dev_flush how to directly complete transactions in case
the vhost thread is no longer running.

Removing the special case for coredump rendezvous requires either the
above fix needed for exec or moving the coredump rendezvous into
get_signal.

Fixes: 6e890c5d5021 ("vhost: use vhost_tasks for worker threads")
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Co-developed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e4be66e5 27-Feb-2023 Jacob Keller <jacob.e.keller@intel.com>

vhost: use struct_size and size_add to compute flex array sizes

The vhost_get_avail_size and vhost_get_used_size functions compute the size
of structures with flexible array members with an additional 2 bytes if the
VIRTIO_RING_F_EVENT_IDX feature flag is set. Convert these functions to use
struct_size() and size_add() instead of coding the calculation by hand.

This ensures that the calculations will saturate at SIZE_MAX rather than
overflowing.
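
A sketch of the converted helper, assuming this shape of
vhost_get_avail_size():

  static size_t vhost_get_avail_size(struct vhost_virtqueue *vq,
                                     unsigned int num)
  {
          size_t event = vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;

          /* saturates at SIZE_MAX instead of wrapping */
          return size_add(struct_size(vq->avail, ring, num), event);
  }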

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Message-Id: <20230227214127.3678392-1-jacob.e.keller@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 05bfb338 24-Feb-2023 Josh Poimboeuf <jpoimboe@kernel.org>

vhost: Fix livepatch timeouts in vhost_worker()

Livepatch timeouts were reported due to busy vhost_worker() kthreads.
Now that cond_resched() can do livepatch task switching, use
cond_resched() in vhost_worker(). That's the better way to
conditionally call schedule() anyway.
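
The shape of the change in the work loop, sketched:

  if (need_resched())     /* before: open-coded */
          schedule();

  cond_resched();         /* after: also allows livepatch task switching */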

Reported-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
Link: https://lore.kernel.org/r/509f6ea6fe6505f0a75a66026ba531c765ef922f.1677257135.git.jpoimboe@kernel.org


# ff61f079 14-Mar-2023 Jonathan Corbet <corbet@lwn.net>

docs: move x86 documentation into Documentation/arch/

Move the x86 documentation under Documentation/arch/ as a way of cleaning
up the top-level directory and making the structure of our docs more
closely match the structure of the source directories it describes.

All in-kernel references to the old paths have been updated.

Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-arch@vger.kernel.org
Cc: x86@kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20230315211523.108836-1-corbet@lwn.net/
Signed-off-by: Jonathan Corbet <corbet@lwn.net>


# 6e890c5d 10-Mar-2023 Mike Christie <michael.christie@oracle.com>

vhost: use vhost_tasks for worker threads

For vhost workers we use the kthread API, which inherits its values from
and checks against the kthreadd thread. This results in the wrong RLIMITs
being checked, so while tools like libvirt try to control the number of
threads based on the nproc rlimit setting we can end up creating more
threads than the user wanted.

This patch has us use the vhost_task helpers which will inherit its
values/checks from the thread that owns the device similar to if we did
a clone in userspace. The vhost threads will now be counted in the nproc
rlimits. And we get features like cgroups and mm sharing automatically,
so we can remove those calls.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>


# 1a5f8090 10-Mar-2023 Mike Christie <michael.christie@oracle.com>

vhost: move worker thread fields to new struct

This is just a prep patch. It moves the worker related fields to a new
vhost_worker struct and moves the code around to create some helpers that
will be used in the next patch.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>


# 759aba1e 09-Jan-2023 Liming Wu <liming.wu@jaguarmicro.com>

vhost: remove unused parameter

"enabled" is defined in vhost_init_device_iotlb,
but it is never used. Let's remove it.

Signed-off-by: Liming Wu <liming.wu@jaguarmicro.com>
Message-Id: <20230110024445.303-1-liming.wu@jaguarmicro.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>


# 9526f9a2 17-Jan-2023 Eric Auger <eric.auger@redhat.com>

vhost/net: Clear the pending messages when the backend is removed

When the vhost iotlb is used along with a guest virtual iommu
and the guest gets rebooted, some MISS messages may have been
recorded just before the reboot and spuriously executed by
the virtual iommu after the reboot.

As vhost does not have any explicit reset user API,
VHOST_NET_SET_BACKEND looks like a reasonable point at which to clear
the pending messages when the backend is removed.

Export vhost_clear_msg() and call it in vhost_net_set_backend()
when fd == -1.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Fixes: 6b1e6cc7855b0 ("vhost: new device IOTLB API")
Message-Id: <20230117151518.44725-3-eric.auger@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 98047313 09-Nov-2022 Stefano Garzarella <sgarzare@redhat.com>

vhost: fix range used in translate_desc()

vhost_iotlb_itree_first() requires `start` and `last` parameters
to search for a mapping that overlaps the range.

In translate_desc() we cyclically call vhost_iotlb_itree_first(),
incrementing `addr` by the amount already translated, so it is right
that we advance the `start` parameter passed to vhost_iotlb_itree_first(),
but we should hold the `last` parameter constant.

Let's fix it by saving the `last` parameter value before incrementing
`addr` in the loop.
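
A sketch of the fix, assuming the loop shape described above:

  u64 s = 0, last = addr + len - 1;       /* compute the range end once */

  while ((u64)len > s) {
          /* `addr` advances as we translate; `last` stays constant */
          map = vhost_iotlb_itree_first(umem, addr, last);
          /* ... fill the iov from `map`, then: */
          s += size;
          addr += size;
  }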

Fixes: a9709d6874d5 ("vhost: convert pre sorted vhost memory array to interval tree")
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20221109102503.18816-3-sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# de4eda9d 15-Sep-2022 Al Viro <viro@zeniv.linux.org.uk>

use less confusing names for iov_iter direction initializers

READ/WRITE proved to be actively confusing - the meanings are
"data destination, as used with read(2)" and "data source, as
used with write(2)", but people keep interpreting those as
"we read data from it" and "we write data to it", i.e. exactly
the wrong way.

Call them ITER_DEST and ITER_SOURCE - at least that is harder
to misinterpret...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# e3bf3df8 15-Sep-2022 Al Viro <viro@zeniv.linux.org.uk>

[vhost] fix 'direction' argument of iov_iter_{init,bvec}()

READ means "data destination", WRITE - "data source".

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# b2ffa407 17-May-2022 Mike Christie <michael.christie@oracle.com>

vhost: rename vhost_work_dev_flush

This patch renames vhost_work_dev_flush to just vhost_dev_flush to
reflect that it flushes everything on the device and that drivers
don't know/care that polls are based on vhost_works. Drivers just
flush the entire device; polls, works for vhost-scsi management
TMFs, IO net virtqueues, etc. are all flushed.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20220517180850.198915-9-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 6ca84326 17-May-2022 Mike Christie <michael.christie@oracle.com>

vhost: flush dev once during vhost_dev_stop

When vhost_work_dev_flush returns all work queued at that time will have
completed. There is then no need to flush after every vhost_poll_stop
call, and we can move the flush call to after the loop that stops the
pollers.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20220517180850.198915-3-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 6fcf224c 17-May-2022 Andrey Ryabinin <arbn@yandex-team.com>

vhost: get rid of vhost_poll_flush() wrapper

vhost_poll_flush() is a simple wrapper around vhost_work_dev_flush().
It gives wrong impression that we are doing some work over vhost_poll,
while in fact it flushes vhost_poll->dev.
It only complicates understanding of the code and leads to mistakes
like flushing the same vhost_dev several times in a row.

Just remove vhost_poll_flush() and call vhost_work_dev_flush() directly.

Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
[merge vhost_poll_flush removal from Stefano Garzarella]
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20220517180850.198915-2-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# aaca8373 30-Mar-2022 Gautam Dawar <gautam.dawar@xilinx.com>

vhost-vdpa: support ASID based IOTLB API

This patch extends vhost-vdpa to support the ASID based IOTLB API. The
vhost-vdpa device will allocate multiple IOTLBs for vDPA devices that
support multiple address spaces. The IOTLBs and the vDPA device memory
mappings are determined and maintained through the ASID.

Note that we still don't support vDPA devices with more than one
address space that depend on the platform IOMMU. This work will be done
by moving the IOMMU logic from vhost-vDPA to the vDPA device driver.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Gautam Dawar <gdawar@xilinx.com>
Message-Id: <20220330180436.24644-16-gdawar@xilinx.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Includes fixup:

vhost-vdpa: Fix some error handling path in vhost_vdpa_process_iotlb_msg()

In the error paths introduced by the original patch, a mutex may be left locked.
Add the correct goto instead of a direct return.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Message-Id: <89ef0ae4c26ac3cfa440c71e97e392dcb328ac1b.1653227924.git.christophe.jaillet@wanadoo.fr>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 91233ad7 30-Mar-2022 Gautam Dawar <gautam.dawar@xilinx.com>

vhost: support ASID in IOTLB API

This patch allows userspace to send ASID based IOTLB messages to
vhost. The idea is to use the reserved u32 field in the existing V2
IOTLB message. A vhost device should advertise this capability via the
VHOST_BACKEND_F_IOTLB_ASID backend feature.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Gautam Dawar <gdawar@xilinx.com>
Message-Id: <20220330180436.24644-10-gdawar@xilinx.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# d3bb267b 21-Jan-2022 Stefano Garzarella <sgarzare@redhat.com>

vhost: cache avail index in vhost_enable_notify()

In vhost_enable_notify() we enable the notifications and we read
the avail index to check if new buffers have become available in
the meantime.

We are not caching the avail index, so when the device later calls
vhost_get_vq_desc(), it will find the old value in the cache and
read the avail index again.

It would be better to refresh the cache every time we read the avail
index, so let's change vhost_enable_notify() to cache the value in
`avail_idx` and compare it with `last_avail_idx` to check if there
are new buffers available.
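
A sketch of the change in vhost_enable_notify(), close to the
resulting code:

  r = vhost_get_avail_idx(vq, &avail_idx);
  if (r)
          return false;

  vq->avail_idx = vhost16_to_cpu(vq, avail_idx);  /* refresh the cache */
  return vq->avail_idx != vq->last_avail_idx;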

We don't expect a significant performance boost because
the above path is not very common, indeed vhost_enable_notify()
is often called with unlikely(), expecting that avail index has
not been updated.

We ran virtio-test/vhost-test and noticed minimal improvement as
expected. To stress the patch more, we modified vhost_test.ko to
call vhost_enable_notify()/vhost_disable_notify() on every cycle
when calling vhost_get_vq_desc(); in this case we observed a more
evident improvement, with a reduction of the test execution time
of about 3.7%.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20220121153108.187291-1-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


# 95932ab2 10-Mar-2022 Jason Wang <jasowang@redhat.com>

vhost: allow batching hint without size

Commit e2ae38cf3d91 ("vhost: fix hung thread due to erroneous iotlb
entries") tries to reject the IOTLB message whose size is zero. But
the size is not necessarily meaningful, one example is the batching
hint, so the commit breaks that.

Fix this by rejecting zero-size messages only if the message is used to
update/invalidate the IOTLB.

Fixes: e2ae38cf3d91 ("vhost: fix hung thread due to erroneous iotlb entries")
Reported-by: Eli Cohen <elic@nvidia.com>
Cc: Anirudh Rayabharam <mail@anirudhrb.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20220310075211.4801-1-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Eli Cohen <elic@nvidia.com>


# 4c809363 13-Jan-2022 Stefano Garzarella <sgarzare@redhat.com>

vhost: remove avail_event arg from vhost_update_avail_event()

In vhost_update_avail_event() we never used the `avail_event` argument,
since its introduction in commit 2723feaa8ec6 ("vhost: set log when
updating used flags or avail event").

Let's remove it to clean up the code.
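
The shape of the cleanup, sketched:

  static inline int vhost_update_avail_event(struct vhost_virtqueue *vq,
                                             u16 avail_event);   /* before */

  static inline int vhost_update_avail_event(struct vhost_virtqueue *vq);   /* after */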

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20220113141134.186773-1-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# e2ae38cf 05-Mar-2022 Anirudh Rayabharam <mail@anirudhrb.com>

vhost: fix hung thread due to erroneous iotlb entries

In vhost_iotlb_add_range_ctx(), range size can overflow to 0 when
start is 0 and last is ULONG_MAX. One instance where it can happen
is when userspace sends an IOTLB message with iova=size=uaddr=0
(vhost_process_iotlb_msg). So, an entry with size = 0, start = 0,
last = ULONG_MAX ends up in the iotlb. Next time a packet is sent,
iotlb_access_ok() loops indefinitely due to that erroneous entry.

Call Trace:
<TASK>
iotlb_access_ok+0x21b/0x3e0 drivers/vhost/vhost.c:1340
vq_meta_prefetch+0xbc/0x280 drivers/vhost/vhost.c:1366
vhost_transport_do_send_pkt+0xe0/0xfd0 drivers/vhost/vsock.c:104
vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372
kthread+0x2e9/0x3a0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
</TASK>

Reported by syzbot at:
https://syzkaller.appspot.com/bug?extid=0abd373e2e50d704db87

To fix this, do two things:

1. Return -EINVAL in vhost_chr_write_iter() when userspace asks to map
a range with size 0.
2. Fix vhost_iotlb_add_range_ctx() to handle the range [0, ULONG_MAX]
by splitting it into two entries, as sketched below.
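
A sketch of the range-splitting part of the fix (point 2), assuming
vhost_iotlb_add_range_ctx() handles the lower half first:

  if (start == 0 && last == ULONG_MAX) {
          u64 mid = last / 2;
          int err = vhost_iotlb_add_range_ctx(iotlb, start, mid,
                                              addr, perm, opaque);

          if (err)
                  return err;

          addr += mid + 1;
          start = mid + 1;        /* the upper half is added as usual */
  }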

Fixes: 0bbe30668d89e ("vhost: factor out IOTLB")
Reported-by: syzbot+0abd373e2e50d704db87@syzkaller.appspotmail.com
Tested-by: syzbot+0abd373e2e50d704db87@syzkaller.appspotmail.com
Signed-off-by: Anirudh Rayabharam <mail@anirudhrb.com>
Link: https://lore.kernel.org/r/20220305095525.5145-1-mail@anirudhrb.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# f7ad318e 28-Jul-2021 Xie Yongji <xieyongji@bytedance.com>

vhost: Fix the calculation in vhost_overflow()

This fixes the incorrect calculation for integer overflow
when the last address of the iova range is 0xffffffff.
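
A sketch of the corrected check, assuming this shape of
vhost_overflow():

  static bool vhost_overflow(u64 uaddr, u64 size)
  {
          if (!size)
                  return true;

          /* accepts ranges whose last address is exactly ULONG_MAX */
          return uaddr > ULONG_MAX - size + 1;
  }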

Fixes: ec33d031a14b ("vhost: detect 32 bit integer wrap around")
Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210728130756.97-2-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 1465cb61 24-May-2021 Mike Christie <michael.christie@oracle.com>

vhost: remove work arg from vhost_work_flush

vhost_work_flush doesn't do anything with the work arg. This patch drops
it and then renames vhost_work_flush to vhost_work_dev_flush to reflect
that the function flushes all the works in the dev and not just a
specific queue or work item.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210525174733.6212-2-michael.christie@oracle.com
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# beb691e6 12-Mar-2021 Laurent Vivier <lvivier@redhat.com>

vhost: Fix vhost_vq_reset()

vhost_reset_is_le() is vhost_init_is_le(), and in the case of
cross-endian legacy, vhost_init_is_le() depends on vq->user_be.

vq->user_be is set by vhost_disable_cross_endian().

But in vhost_vq_reset(), we have:

vhost_reset_is_le(vq);
vhost_disable_cross_endian(vq);

And so user_be is used before being set.

To fix that, reverse the lines order as there is no other dependency
between them.
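
The fixed ordering, for contrast:

  vhost_disable_cross_endian(vq);   /* sets vq->user_be first */
  vhost_reset_is_le(vq);            /* now reads a valid user_be */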

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Link: https://lore.kernel.org/r/20210312140913.788592-1-lvivier@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 6bcf3422 09-Nov-2020 Mike Christie <michael.christie@oracle.com>

vhost: add helper to check if a vq has been setup

This adds a helper to check if a vq has been set up. The next patches
will use this when we move the vhost-scsi cmd preallocation from per
session to per vq. In the per vq case, we only want to allocate cmds
for vqs that have actually been set up and not for all the possible
vqs.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/1604986403-4931-2-git-send-email-michael.christie@oracle.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>


# 86e182fe 09-Sep-2020 Zhu Lingshan <lingshan.zhu@intel.com>

vhost_vdpa: remove unnecessary spin_lock in vhost_vring_call

This commit removes unnecessary spin_locks in vhost_vring_call
and related operations, because we manipulate irq offloading
contents in the vhost_vdpa ioctl code path, which is already
protected by the dev mutex and vq mutex.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Link: https://lore.kernel.org/r/20200909065234.3313-1-lingshan.zhu@intel.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>


# 5e5e8736 14-Sep-2020 Li Wang <li.wang@windriver.com>

vhost: reduce stack usage in log_used

Fix the warning: [-Werror=-Wframe-larger-than=]

drivers/vhost/vhost.c: In function log_used:
drivers/vhost/vhost.c:1906:1:
warning: the frame size of 1040 bytes is larger than 1024 bytes

Signed-off-by: Li Wang <li.wang@windriver.com>
Link: https://lore.kernel.org/r/1600106889-25013-1-git-send-email-li.wang@windriver.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>


# ab512251 02-Oct-2020 Greg Kurz <groug@kaod.org>

vhost: Don't call log_access_ok() when using IOTLB

When the IOTLB device is enabled, the log_guest_addr that is passed by
userspace to the VHOST_SET_VRING_ADDR ioctl, and which is then written
to vq->log_addr, is a GIOVA. All writes to this address are translated
by log_user() to writes to an HVA, and then ultimately logged through
the corresponding GPAs in log_write_hva(). No logging will ever occur
with vq->log_addr in this case. It is thus wrong to pass vq->log_addr
and log_guest_addr to log_access_vq() which assumes they are actual
GPAs.

Introduce a new vq_log_used_access_ok() helper that only checks accesses
to the log for the used structure when there isn't an IOTLB device around.

Signed-off-by: Greg Kurz <groug@kaod.org>
Link: https://lore.kernel.org/r/160171933385.284610.10189082586063280867.stgit@bahia.lan
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 71878fa4 02-Oct-2020 Greg Kurz <groug@kaod.org>

vhost: Use vhost_get_used_size() in vhost_vring_set_addr()

The open-coded computation of the used size doesn't take the event
into account when the VIRTIO_RING_F_EVENT_IDX feature is present.
Fix that by using vhost_get_used_size().
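
The shape of the change, sketched (open-coded sum vs. the helper):

  sizeof(*vq->used) + num * sizeof(*vq->used->ring)   /* before: no event */
  vhost_get_used_size(vq, num)    /* after: adds 2 bytes when EVENT_IDX is set */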

Fixes: 8ea8cf89e19a ("vhost: support event index")
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kurz <groug@kaod.org>
Link: https://lore.kernel.org/r/160171932300.284610.11846106312938909461.stgit@bahia.lan
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 0210a8db 02-Oct-2020 Greg Kurz <groug@kaod.org>

vhost: Don't call access_ok() when using IOTLB

When the IOTLB device is enabled, the vring addresses we get
from userspace are GIOVAs. It is thus wrong to pass them down
to access_ok() which only takes HVAs.

Access validation is done at prefetch time with IOTLB. Teach
vq_access_ok() about that by moving the (vq->iotlb) check
from vhost_vq_access_ok() to vq_access_ok(). This prevents
vhost_vring_set_addr() from failing when verifying the accesses.
No behavior change for vhost_vq_access_ok().
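
A sketch of the moved check, assuming the simplified shape of
vq_access_ok():

  static bool vq_access_ok(struct vhost_virtqueue *vq, unsigned int num,
                           vring_desc_t __user *desc,
                           vring_avail_t __user *avail,
                           vring_used_t __user *used)
  {
          /* With IOTLB the vring addresses are GIOVAs and are
           * validated at prefetch time, not via access_ok(). */
          if (vq->iotlb)
                  return true;

          /* legacy path: validate desc/avail/used as HVAs */
          return access_ok(desc, vhost_get_desc_size(vq, num)) &&
                 access_ok(avail, vhost_get_avail_size(vq, num)) &&
                 access_ok(used, vhost_get_used_size(vq, num));
  }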

BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1883084
Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")
Cc: jasowang@redhat.com
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Greg Kurz <groug@kaod.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/160171931213.284610.2052489816407219136.stgit@bahia.lan
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# ae6961de 31-Aug-2020 Yunsheng Lin <linyunsheng@huawei.com>

vhost: fix typo in error message

"enable" should be "disable" when the function name is
vhost_disable_notify(), which does the disabling work.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 460f7ce1 04-Aug-2020 Jason Wang <jasowang@redhat.com>

vhost: generalize backend features setting/getting

Move the backend features setting/getting from net.c to vhost.c to be
reused by vhost-vdpa.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20200804162048.22587-3-eli@mellanox.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 265a0ad8 31-Jul-2020 Zhu Lingshan <lingshan.zhu@intel.com>

vhost: introduce vhost_vring_call

This commit introduces struct vhost_vring_call, which replaces the
raw struct eventfd_ctx *call_ctx in struct vhost_virtqueue.
Besides eventfd_ctx, it contains a spin lock and an
irq_bypass_producer in its structure.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20200731065533.4144-2-lingshan.zhu@intel.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# bf11d71a 31-Jul-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

vhost: Use flex_array_size() helper in copy_from_user()

Make use of the flex_array_size() helper to calculate the size of a
flexible array member within an enclosing structure.

This helper offers defense-in-depth against potential integer
overflows, while at the same time makes it explicitly clear that
we are dealing with a flexible array member.
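
A generic illustration of the helper (not necessarily the exact call
site touched here):

  copy_from_user(dst, src, num * sizeof(*s->ring));          /* before */
  copy_from_user(dst, src, flex_array_size(s, ring, num));   /* after */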

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20200731130956.GA30525@embeddedor
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 71c0b9a6 30-Oct-2019 Will Deacon <will@kernel.org>

vhost: Remove redundant use of read_barrier_depends() barrier

Since commit 76ebbe78f739 ("locking/barriers: Add implicit
smp_read_barrier_depends() to READ_ONCE()"), there is no need to use
smp_read_barrier_depends() outside of the Alpha architecture code.

Unfortunately, there is precisely _one_ user in the vhost code, and
there isn't an obvious READ_ONCE() access making the barrier
redundant. However, on closer inspection (thanks, Jason), it appears
that vring synchronisation between the producer and consumer occurs via
the 'avail_idx' field, which is followed up by an rmb() in
vhost_get_vq_desc(), making the read_barrier_depends() redundant on
Alpha.

Jason says:

| I'm also confused about the barrier here, basically in driver side
| we did:
|
| 1) allocate pages
| 2) store pages in indirect->addr
| 3) smp_wmb()
| 4) increase the avail idx (somehow a tail pointer of vring)
|
| in vhost we did:
|
| 1) read avail idx
| 2) smp_rmb()
| 3) read indirect->addr
| 4) read from indirect->addr
|
| It looks to me even the data dependency barrier is not necessary
| since we have rmb() which is sufficient for us to the correct
| indirect->addr and driver are not expected to do any writing to
| indirect->addr after avail idx is increased

Remove the redundant barrier invocation.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Suggested-by: Jason Wang <jasowang@redhat.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>


# 37c54f9b 10-Jun-2020 Christoph Hellwig <hch@lst.de>

kernel: set USER_DS in kthread_use_mm

Some architectures like arm64 and s390 require USER_DS to be set for
kernel threads to access user address space, which is the whole purpose of
kthread_use_mm, but others like x86 don't. That has led to a huge mess
where some callers are fixed up once they are tested on said
architectures, while others linger around, and yet others like io_uring try
to do "clever" optimizations for what usually is just a trivial assignment
to a member in the thread_struct for most architectures.

Make kthread_use_mm set USER_DS, and kthread_unuse_mm restore to the
previous value instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: http://lkml.kernel.org/r/20200404094101.672954-7-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f5678e7f 10-Jun-2020 Christoph Hellwig <hch@lst.de>

kernel: better document the use_mm/unuse_mm API contract

Switch the function documentation to kerneldoc comments, and add
WARN_ON_ONCE asserts that the calling thread is a kernel thread and does
not have ->mm set (or has ->mm set in the case of unuse_mm).

Also give the functions a kthread_ prefix to better document the use case.

[hch@lst.de: fix a comment typo, cover the newly merged use_mm/unuse_mm caller in vfio]
Link: http://lkml.kernel.org/r/20200416053158.586887-3-hch@lst.de
[sfr@canb.auug.org.au: powerpc/vas: fix up for {un}use_mm() rename]
Link: http://lkml.kernel.org/r/20200422163935.5aa93ba5@canb.auug.org.au

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [usb]
Acked-by: Haren Myneni <haren@linux.ibm.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Link: http://lkml.kernel.org/r/20200404094101.672954-6-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9bf5b9eb 10-Jun-2020 Christoph Hellwig <hch@lst.de>

kernel: move use_mm/unuse_mm to kthread.c

Patch series "improve use_mm / unuse_mm", v2.

This series improves the use_mm / unuse_mm interface by better documenting
the assumptions, and by taking the set_fs manipulations spread over the
callers into the core API.

This patch (of 3):

Use the proper API instead.

Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de

These helpers are only for use with kernel threads, and I will tie them
more into the kthread infrastructure going forward. Also move the
prototypes to kthread.h - mmu_context.h was a little weird to start with
as it otherwise contains very low-level MM bits.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de
Link: http://lkml.kernel.org/r/20200416053158.586887-1-hch@lst.de
Link: http://lkml.kernel.org/r/20200404094101.672954-5-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 690623e1 07-Jun-2020 John Hubbard <jhubbard@nvidia.com>

vhost: convert get_user_pages() --> pin_user_pages()

This code was using get_user_pages*(), in approximately a "Case 5"
scenario (accessing the data within a page), using the categorization
from [1]. That means that it's time to convert the get_user_pages*() +
put_page() calls to pin_user_pages*() + unpin_user_pages() calls.

There is some helpful background in [2]: basically, this is a small part
of fixing a long-standing disconnect between pinning pages, and file
systems' use of those pages.

[1] Documentation/core-api/pin_user_pages.rst

[2] "Explicit pinning of user-space pages":
https://lwn.net/Articles/807108/

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: http://lkml.kernel.org/r/20200529234309.484480-3-jhubbard@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e0136c16 05-Jun-2020 Zhu Lingshan <lingshan.zhu@intel.com>

vhost: replace -1 with VHOST_FILE_UNBIND in ioctls

This commit replaces -1 with VHOST_FILE_UNBIND in ioctls since
we have added such a macro in the uapi header for vdpa_host.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/1591352835-22441-5-git-send-email-lingshan.zhu@intel.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 002ef18e 27-May-2020 Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>

vhost: (cosmetic) remove a superfluous variable initialisation

Even the compiler is able to figure out that in this case the
initialisation is superfluous.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Link: https://lore.kernel.org/r/20200527180541.5570-3-guennadi.liakhovetski@linux.intel.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 5ce995f3 29-May-2020 Jason Wang <jasowang@redhat.com>

vhost: use mmgrab() instead of mmget() for non worker device

For a device that doesn't use a vhost worker and use_mm(), mmget() is
too heavyweight, and it may bring trouble when implementing mmap()
support for a vDPA device.

This is because a reference to the address space was held via
mmget() in vhost_dev_set_owner() and a reference to the file was
held in mmap(). This means that when the process exits, the mm cannot
be released, thus we cannot release the file.

This patch tries to use mmgrab() instead of mmget(), which allows the
address space to be destroyed on process exit without releasing the mm
structure itself. This is sufficient for vDPA devices, which pin user
pages and do not depend on the address space to work.
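
A sketch of the resulting owner attach, assuming a use_worker flag as
introduced alongside this change:

  if (dev->use_worker) {
          dev->mm = get_task_mm(current);   /* mmget(): pins the address space */
  } else {
          dev->mm = current->mm;
          mmgrab(dev->mm);                  /* pins only the mm_struct itself */
  }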

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20200529080303.15449-3-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 01fcb1cb 29-May-2020 Jason Wang <jasowang@redhat.com>

vhost: allow device that does not depend on vhost worker

vDPA devices currently relay the eventfd via the vhost worker. This is
inefficient due to the latency of wakeup and scheduling, so this patch
tries to introduce a use_worker attribute for the vhost device. When
use_worker is not set with vhost_dev_init(), vhost won't try to
allocate a worker thread and the vhost_poll will be processed directly
in the wakeup function.

This helps vDPA since it reduces the latency caused by the vhost worker.

In my testing, it saves 0.2 ms in pings between VMs on the same host.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20200529080303.15449-2-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# a865e420 06-Apr-2020 Michael S. Tsirkin <mst@redhat.com>

virtio: force spec specified alignment on types

The ring element addresses are passed between components with different
alignment assumptions. Thus, if guest/userspace selects a pointer and
the host then gets and dereferences it, we might need to decrease the
compiler-selected alignment to prevent the compiler on the host from
assuming the pointer is aligned.

This actually triggers on ARM with -mabi=apcs-gnu - which is a
deprecated configuration, but it seems safer to handle this
generally.

Note that userspace that allocates the memory is actually OK and does
not need to be fixed, but userspace that gets it from the guest or another
process does need to be fixed. The latter doesn't generally talk to the
kernel, so while it might be buggy it's not talking to the kernel in the
buggy way - it's just using the header in the buggy way - so fixing the
header and asking userspace to recompile is the best we can do.

I verified that the produced kernel binary on x86 is exactly identical
before and after the change.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>


# 1b0be99f 15-May-2020 Michael S. Tsirkin <mst@redhat.com>

vhost: missing __user tags

sparse warns about converting void * to void __user *. This is not new
but only got noticed now that vhost is built on more systems.
This is just a question of __user tags missing in a couple of places,
so fix it up.

Fixes: f88949138058 ("vhost: introduce O(1) vq metadata cache")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 0bbe3066 26-Mar-2020 Jason Wang <jasowang@redhat.com>

vhost: factor out IOTLB

This patch factors out IOTLB into a dedicated module in order to be
reused by other modules like vringh. Users may choose to enable
automatic retiring by specifying the VHOST_IOTLB_FLAG_RETIRE flag, to
fit the case of the vhost device IOTLB implementation.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20200326140125.19794-4-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 792a4f2e 26-Mar-2020 Jason Wang <jasowang@redhat.com>

vhost: allow per device message handler

This patch allows a device to register its own message handler during
vhost_dev_init(). The vDPA device will use it to implement its own DMA
mapping logic.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20200326140125.19794-3-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 8f6a7f96 04-Dec-2019 Andrey Konovalov <andreyknvl@google.com>

vhost, kcov: collect coverage from vhost_worker

Add kcov_remote_start()/kcov_remote_stop() annotations to the
vhost_worker() function, which is responsible for processing vhost
works.

Since vhost_worker() threads are spawned per vhost device instance, the
common kcov handle is used for kcov_remote_start()/stop() annotations
(see Documentation/dev-tools/kcov.rst for details). As a result, kcov
can now be used to collect coverage from vhost worker threads.

Link: http://lkml.kernel.org/r/e49d5d154e5da6c9ada521d2b7ce10a49ce9f98b.1572366574.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Alexander Potapenko <glider@google.com>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Windsor <dwindsor@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Marco Elver <elver@google.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0d4a3f2a 14-Sep-2019 Michael S. Tsirkin <mst@redhat.com>

Revert "vhost: block speculation of translated descriptors"

This reverts commit a89db445fbd7f1f8457b03759aa7343fa530ef6b.

I was hasty to include this patch, and it breaks the build on 32 bit.
Defence in depth is good but let's do it properly.

Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 060423bf 11-Sep-2019 yongduan <yongduan@tencent.com>

vhost: make sure log_num < in_num

The code assumes log_num < in_num everywhere, and that is true as long as
in_num is incremented by the descriptor iov count, and log_num by 1. However
this breaks if there's a zero-sized descriptor.

As a result, if a malicious guest creates a vring desc with desc.len = 0,
it may cause the host kernel to crash by overflowing the log array. This
bug can be triggered during the VM migration.

There's no need to log when desc.len = 0, so just don't increment log_num
in this case.
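
The shape of the fix in vhost_get_vq_desc() is roughly the following
(a sketch reconstructed from the description, where ret holds the iov
count returned by translate_desc(); the exact hunk may differ):

	/* Only log descriptors that actually produced iovs; a
	 * zero-length descriptor contributes nothing, so log_num
	 * can no longer outgrow in_num. */
	if (unlikely(log && ret)) {
		log[*log_num].addr = vhost64_to_cpu(vq, desc.addr);
		log[*log_num].len = vhost32_to_cpu(vq, desc.len);
		++*log_num;
	}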

Fixes: 3a4d5c94e959 ("vhost_net: a kernel-level virtio server")
Cc: stable@vger.kernel.org
Reviewed-by: Lidong Chen <lidongchen@tencent.com>
Signed-off-by: ruippan <ruippan@tencent.com>
Signed-off-by: yongduan <yongduan@tencent.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# a89db445 08-Sep-2019 Michael S. Tsirkin <mst@redhat.com>

vhost: block speculation of translated descriptors

iovec addresses coming from vhost are assumed to be
pre-validated, but in fact can be speculated to a value
out of range.

Userspace address are later validated with array_index_nospec so we can
be sure kernel info does not leak through these addresses, but vhost
must also not leak userspace info outside the allowed memory table to
guests.

Following the defence-in-depth principle, make sure
the address cannot be speculated out of the node range.
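
A sketch of the idea in translate_desc(), assuming the usual memory
node fields (note this patch was later reverted because it broke the
32-bit build, see the revert above):

	/* Clamp the offset into the node so a mispredicted bounds
	 * check cannot speculatively read past the memory table entry. */
	u64 offset = addr - node->start;

	_iov->iov_base = (void __user *)(unsigned long)
		(node->userspace_addr +
		 array_index_nospec(offset, node->size));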

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: stable@vger.kernel.org
Acked-by: Jason Wang <jasowang@redhat.com>
Tested-by: Jason Wang <jasowang@redhat.com>


# 3d2c7d37 10-Aug-2019 Michael S. Tsirkin <mst@redhat.com>

Revert "vhost: access vq metadata through kernel virtual address"

This reverts commit 7f466032dc ("vhost: access vq metadata through
kernel virtual address"). The commit caused a bunch of issues, and
while commit 73f628ec9e ("vhost: disable metadata prefetch
optimization") disabled the optimization it's not nice to keep lots of
dead code around.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 896fc242 20-Aug-2019 Yunsheng Lin <linyunsheng@huawei.com>

vhost: Remove unnecessary variable

It is unnecessary to use the ret variable to return the error
code; just return the error code directly.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 7a338472 04-Jun-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 482

Based on 1 normalized pattern(s):

this work is licensed under the terms of the gnu gpl version 2

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 48 file(s).
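
For vhost.c this means the whole license comment collapses to a single
tag at the top of the file:

	// SPDX-License-Identifier: GPL-2.0-only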

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Enrico Weigelt <info@metux.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081204.624030236@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# cb1aaebe 07-Jun-2019 Mauro Carvalho Chehab <mchehab+samsung@kernel.org>

docs: fix broken documentation links

Mostly due to x86 and acpi conversion, several documentation
links are still pointing to the old file. Fix them.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Reviewed-by: Wolfram Sang <wsa@the-dreams.de>
Reviewed-by: Sven Van Asbroeck <TheSven73@gmail.com>
Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>


# 7f466032 24-May-2019 Jason Wang <jasowang@redhat.com>

vhost: access vq metadata through kernel virtual address

It was noticed that the copy_to/from_user() friends that were used to
access virtqueue metadata tend to be very expensive for dataplane
implementations like vhost, since they involve lots of software checks,
speculation barriers, and hardware feature toggling (e.g SMAP). The
extra cost is more obvious when transferring small packets, since the
time spent on metadata access becomes more significant.

This patch tries to eliminate those overheads by accessing the metadata
through a direct mapping of those pages. Invalidation callbacks are
implemented for co-operation with general VM management (swap, KSM,
THP or NUMA balancing). We will try to get the direct mapping of vq
metadata before each round of packet processing if it doesn't
exist. If we fail, we simply fall back to the copy_to/from_user()
friends.

The invalidation and direct mapping access are synchronized through a
spinlock and RCU. All metadata accesses through the direct map are
protected by RCU, and the setup or invalidation are done under the
spinlock.

This method does not work for highmem pages, which require a temporary
mapping, so we just fall back to the normal copy_to/from_user() there;
nor for arches with virtually tagged caches, since extra cache flushing
would be needed to eliminate the alias, resulting in complex logic and
bad performance. For those arches, this patch simply goes with the
copy_to/from_user() friends. This is done by ruling out the kernel
mapping code through ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE.

Note that this is only done when device IOTLB is not enabled. We
could use similar method to optimize IOTLB in the future.

Tests shows at most about 23% improvement on TX PPS when using
virtio-user + vhost_net + xdp1 + TAP on 2.6GHz Broadwell:

        SMAP on | SMAP off
Before: 5.2Mpps | 7.1Mpps
After:  6.4Mpps | 8.2Mpps

Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David Miller <davem@davemloft.net>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-parisc@vger.kernel.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# feebcaea 24-May-2019 Jason Wang <jasowang@redhat.com>

vhost: factor out setting vring addr and num

Factor out vring address and num setting, which need special care for
accelerating vq metadata access.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 4942e825 24-May-2019 Jason Wang <jasowang@redhat.com>

vhost: introduce helpers to get the size of metadata area

To avoid code duplication, since the sizes will also be used by the
kernel VA prefetching.
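
A sketch of one such helper, derived from the avail ring layout (the
merged helpers may differ in detail):

	static size_t vhost_get_avail_size(struct vhost_virtqueue *vq,
					   unsigned int num)
	{
		size_t event __maybe_unused =
			vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;

		/* Header, ring entries, plus the optional used_event tail. */
		return sizeof(*vq->avail) +
		       sizeof(*vq->avail->ring) * num + event;
	}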

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 9b5e830b 24-May-2019 Jason Wang <jasowang@redhat.com>

vhost: rename vq_iotlb_prefetch() to vq_meta_prefetch()

Rename the function to be more accurate, since it actually tries to
prefetch vq metadata addresses in the IOTLB. This will be used by a
following patch to prefetch metadata virtual addresses.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 7b5d753e 24-May-2019 Jason Wang <jasowang@redhat.com>

vhost: fine grain userspace memory accessors

This is used to hide the metadata address from virtqueue helpers. This
will allow implementing vmap-based fast access to metadata.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 1ab5d138 24-May-2019 Jason Wang <jasowang@redhat.com>

vhost: generalize adding used elem

Use one generic vhost_copy_to_user() instead of two dedicated
accessors. This will simplify the conversion to fine-grain
accessors. About 2% improvement in PPS was seen during a virtio-user
txonly test.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# e82b9b07 16-May-2019 Jason Wang <jasowang@redhat.com>

vhost: introduce vhost_exceeds_weight()

We used to have vhost_exceeds_weight() for vhost-net to:

- prevent vhost kthread from hogging the cpu
- balance the time spent between TX and RX

This function could be useful for vsock and scsi as well, so move it
to vhost.c. A device must specify a weight which counts the number of
requests, or it can also specify a byte_weight which counts the
number of bytes that have been processed.
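
The moved helper looks roughly like this (a sketch close to the merged
code; dev->weight and dev->byte_weight are the per-device limits):

	bool vhost_exceeds_weight(struct vhost_virtqueue *vq,
				  int pkts, int total_len)
	{
		struct vhost_dev *dev = vq->dev;

		if (unlikely(total_len >= dev->byte_weight) ||
		    unlikely(pkts >= dev->weight)) {
			/* Requeue the work instead of hogging the cpu. */
			vhost_poll_queue(&vq->poll);
			return true;
		}

		return false;
	}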

Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 73b0140b 13-May-2019 Ira Weiny <ira.weiny@intel.com>

mm/gup: change GUP fast to use flags rather than a write 'bool'

To facilitate additional options to get_user_pages_fast() change the
singular write parameter to be gup_flags.

This patch does not change any functionality. New functionality will
follow in subsequent patches.

Some of the get_user_pages_fast() call sites were unchanged because they
already passed FOLL_WRITE or 0 for the write parameter.
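
For a call site that wrote to the pages, the mechanical conversion is
simply (illustrative):

	/* before */
	r = get_user_pages_fast(log, 1, 1, &page);
	/* after */
	r = get_user_pages_fast(log, 1, FOLL_WRITE, &page);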

NOTE: It was suggested to change the ordering of the get_user_pages_fast()
arguments to ensure that callers were converted. This breaks the current
GUP call site convention of having the returned pages be the final
parameter. So the suggestion was rejected.

Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marshall <hubcap@omnibond.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 813dbeb6 08-Apr-2019 Jason Wang <jasowang@redhat.com>

vhost: reject zero size iova range

We used to accept a zero-size iova range, which leads to an infinite loop
in translate_desc(). Fix this by failing the request in this case.
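
A sketch of the check added to the IOTLB message handling (the exact
placement is an assumption):

	/* A zero-size range can never make progress in translate_desc(). */
	if (!msg->size) {
		ret = -EINVAL;
		break;
	}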

Reported-by: syzbot+d21e6e297322a900c128@syzkaller.appspotmail.com
Fixes: 6b1e6cc7 ("vhost: new device IOTLB API")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 81bf7bbe 05-Mar-2019 Arnd Bergmann <arnd@arndb.de>

vhost: silence an unused-variable warning

On some architectures, the MMU can be disabled, leading to access_ok()
becoming an empty macro that does not evaluate its size argument,
which in turn produces an unused-variable warning:

drivers/vhost/vhost.c:1191:9: error: unused variable 's' [-Werror,-Wunused-variable]
size_t s = vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;

Mark the variable as __maybe_unused to shut up that warning.
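
So the declaration becomes:

	size_t s __maybe_unused =
		vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;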

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cfdbb4ed 05-Mar-2019 Arnd Bergmann <arnd@arndb.de>

vhost: silence an unused-variable warning

On some architectures, the MMU can be disabled, leading to access_ok()
becoming an empty macro that does not evaluate its size argument,
which in turn produces an unused-variable warning:

drivers/vhost/vhost.c:1191:9: error: unused variable 's' [-Werror,-Wunused-variable]
size_t s = vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;

Mark the variable as __maybe_unused to shut up that warning.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 816db766 18-Feb-2019 Jason Wang <jasowang@redhat.com>

vhost: correctly check the return value of translate_desc() in log_used()

On failure, translate_desc() returns a negative value; otherwise it
returns the number of iovs. So we should fail when the return value is
negative, instead of blindly checking against zero.
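
The shape of the fix in log_used() (sketch; the argument list is
reconstructed from the description):

	ret = translate_desc(vq, (uintptr_t)vq->used + used_offset,
			     len, iov, 64, VHOST_ACCESS_WO);
	/* was "if (ret)": a positive iov count also bailed out,
	 * leaving the logging below dead */
	if (ret < 0)
		return ret;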

Detected by CoverityScan, CID# 1442593: Control flow issues (DEADCODE)

Fixes: cc5e71075947 ("vhost: log dirty page correctly")
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b46a0bf7 28-Jan-2019 Jason Wang <jasowang@redhat.com>

vhost: fix OOB in get_rx_bufs()

After batched used ring updating was introduced in commit e2b3b35eb989
("vhost_net: batch used ring update in rx"), we tend to batch heads in
vq->heads for more than one packet. But the quota passed to
get_rx_bufs() was not correctly limited, which can result in an OOB write
in vq->heads.

	headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
				vhost_len, &in, vq_log, &log,
				likely(mergeable) ? UIO_MAXIOV : 1);

UIO_MAXIOV was still used, which is wrong since we could have batched
heads in vq->heads; this causes an OOB write if the next buffer needs more
than 960 (1024 (UIO_MAXIOV) - 64 (VHOST_NET_BATCH)) heads after we've
batched 64 (VHOST_NET_BATCH) heads:
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

=============================================================================
BUG kmalloc-8k (Tainted: G B ): Redzone overwritten
-----------------------------------------------------------------------------

INFO: 0x00000000fd93b7a2-0x00000000f0713384. First byte 0xa9 instead of 0xcc
INFO: Allocated in alloc_pd+0x22/0x60 age=3933677 cpu=2 pid=2674
kmem_cache_alloc_trace+0xbb/0x140
alloc_pd+0x22/0x60
gen8_ppgtt_create+0x11d/0x5f0
i915_ppgtt_create+0x16/0x80
i915_gem_create_context+0x248/0x390
i915_gem_context_create_ioctl+0x4b/0xe0
drm_ioctl_kernel+0xa5/0xf0
drm_ioctl+0x2ed/0x3a0
do_vfs_ioctl+0x9f/0x620
ksys_ioctl+0x6b/0x80
__x64_sys_ioctl+0x11/0x20
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
INFO: Slab 0x00000000d13e87af objects=3 used=3 fp=0x (null) flags=0x200000000010201
INFO: Object 0x0000000003278802 @offset=17064 fp=0x00000000e2e6652b

Fix this by allocating UIO_MAXIOV + VHOST_NET_BATCH iovs for
vhost-net. This is done by passing the limit through vhost_dev_init(),
so that set_owner can allocate the number of iovs in a per-device
manner.
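
In vhost_net the init call then becomes something like (sketch; the
final parameter is the new iov limit):

	vhost_dev_init(&n->dev, vqs, VHOST_NET_VQ_MAX,
		       UIO_MAXIOV + VHOST_NET_BATCH);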

This fixes CVE-2018-16880.

Fixes: e2b3b35eb989 ("vhost_net: batch used ring update in rx")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cc5e7107 16-Jan-2019 Jason Wang <jasowang@redhat.com>

vhost: log dirty page correctly

Vhost dirty page logging API is designed to sync through GPA. But we
try to log GIOVA when device IOTLB is enabled. This is wrong and may
lead to missing data after migration.

To solve this issue, when logging with device IOTLB enabled, we will:

1) reuse the device IOTLB translation result of GIOVA->HVA mapping to
get HVA, for writable descriptor, get HVA through iovec. For used
ring update, translate its GIOVA to HVA
2) traverse the GPA->HVA mapping to get the possible GPA and log
through GPA. Pay attention this reverse mapping is not guaranteed
to be unique, so we should log each possible GPA in this case.

This fixes the failure of scp to the guest during migration. In -next, we
will probably support passing GIOVA->GPA instead of GIOVA->HVA.

Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")
Reported-by: Jintack Lim <jintack@cs.columbia.edu>
Cc: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 74ad7419 13-Dec-2018 Pavel Tikhomirov <ptikhomirov@virtuozzo.com>

vhost: return EINVAL if iovecs size does not match the message size

We've failed to copy and process a vhost_iotlb_msg, so let userspace at
least know about it. For instance, before this patch the code below runs
without any error (includes added to make the reproducer compile):

#include <fcntl.h>
#include <stdio.h>
#include <sys/uio.h>
#include <linux/vhost.h>

int main()
{
	struct vhost_msg msg;
	struct iovec iov;
	int fd;

	fd = open("/dev/vhost-net", O_RDWR);
	if (fd == -1) {
		perror("open");
		return 1;
	}

	/* Deliberately short: 4 bytes fewer than a full vhost_msg. */
	iov.iov_base = &msg;
	iov.iov_len = sizeof(msg) - 4;

	if (writev(fd, &iov, 1) == -1) {
		perror("writev");
		return 1;
	}

	return 0;
}

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 96d4f267 03-Jan-2019 Linus Torvalds <torvalds@linux-foundation.org>

Remove 'type' argument from access_ok() function

Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.

It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.

A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.

This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
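
So a typical conversion looks like (illustrative):

	/* before */
	if (!access_ok(VERIFY_WRITE, base, size))
		return -EFAULT;
	/* after */
	if (!access_ok(base, size))
		return -EFAULT;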

There were a couple of notable cases:

- csky still had the old "verify_area()" name as an alias.

- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)

- microblaze used the type argument for a debug printout

but other than those oddities this should be a total no-op patch.

I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 86a07da3 12-Dec-2018 Jason Wang <jasowang@redhat.com>

Revert "net: vhost: lock the vqs one by one"

This reverts commit 78139c94dc8c96a478e67dab3bee84dc6eccb5fd. We don't
protect the device IOTLB with the vq mutex, which can lead to e.g. a
use-after-free of device IOTLB entries. And since we've switched to
mutex_trylock() in the previous patch, it's safe to revert it without
introducing a deadlock.

Fixes: commit 78139c94dc8c ("net: vhost: lock the vqs one by one")
Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 841df922 12-Dec-2018 Jason Wang <jasowang@redhat.com>

vhost: make sure used idx is seen before log in vhost_add_used_n()

We miss a write barrier that guarantees the used idx is updated and seen
before the log. This lets userspace sync and copy the used ring before
the used idx is updated. Fix this by adding a barrier before log_write().
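
The shape of the fix in vhost_add_used_n() (sketch; offsets
reconstructed from the vring layout):

	if (unlikely(vq->log_used)) {
		/* Make sure used idx is seen before log. */
		smp_wmb();
		/* Log used index update. */
		log_write(vq->log_base,
			  offsetof(struct vring_used, idx),
			  sizeof(vq->used->idx));
	}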

Fixes: 8dd014adfea6f ("vhost-net: mergeable buffers support")
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e3f78718 30-Nov-2018 Jean-Philippe Brucker <jean-philippe@linaro.org>

vhost: fix IOTLB locking

Commit 78139c94dc8c ("net: vhost: lock the vqs one by one") moved the vq
lock to improve scalability, but introduced a possible deadlock in
vhost-iotlb. vhost_iotlb_notify_vq() now takes vq->mutex while holding
the device's IOTLB spinlock. And on the vhost_iotlb_miss() path, the
spinlock is taken while holding vq->mutex.

Since calling vhost_poll_queue() doesn't require any lock, avoid the
deadlock by not taking vq->mutex.

Fixes: 78139c94dc8c ("net: vhost: lock the vqs one by one")
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ff002269 30-Oct-2018 Jason Wang <jasowang@redhat.com>

vhost: Fix Spectre V1 vulnerability

The idx in vhost_vring_ioctl() was controlled by userspace, hence a
potential exploitation of the Spectre variant 1 vulnerability.

Fix this by sanitizing idx before using it to index d->vqs.
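
The canonical pattern for this class of fix (close to the merged hunk):

	if (idx >= d->nvqs)
		return -ENOBUFS;
	/* Prevent idx from being speculated past the bounds check. */
	idx = array_index_nospec(idx, d->nvqs);
	vq = d->vqs[idx];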

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 78139c94 25-Sep-2018 Tonghao Zhang <xiangxia.m.yue@gmail.com>

net: vhost: lock the vqs one by one

This patch changes the locking from taking all vq mutexes
at the same time to taking them one by one. It will
be used by the next patch to avoid a deadlock.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2d66f997 24-Aug-2018 Jason Wang <jasowang@redhat.com>

vhost: correctly check the iova range when waking virtqueue

We don't wake up the virtqueue if the first byte of the pending iova
range is the last byte of the range we just updated. This leads the
virtqueue to wait for an IOTLB update forever. Fix this by correcting
the check and waking up the virtqueue in this case.
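
Plausibly a one-character fix in the pending-request overlap check
(reconstructed from the description):

	/* before: the boundary byte is missed */
	msg->iova + msg->size - 1 > vq_msg->iova
	/* after: include it */
	msg->iova + msg->size - 1 >= vq_msg->iova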

Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")
Reported-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Peter Xu <peterx@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b13f9c63 07-Aug-2018 Jason Wang <jasowang@redhat.com>

vhost: reset metadata cache when initializing new IOTLB

We need to reset the metadata cache during new IOTLB initialization;
otherwise stale pointers to the previous IOTLB may still be accessed,
which leads to a use-after-free.

Reported-by: syzbot+c51e6736a1bf614b3272@syzkaller.appspotmail.com
Fixes: f88949138058 ("vhost: introduce O(1) vq metadata cache")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 429711ae 05-Aug-2018 Jason Wang <jasowang@redhat.com>

vhost: switch to use new message format

We used to have a message format like:

struct vhost_msg {
	int type;
	union {
		struct vhost_iotlb_msg iotlb;
		__u8 padding[64];
	};
};

Unfortunately, there is a 32-bit hole on 64-bit machines because of
alignment. This leads to different formats between the 32-bit API and
the 64-bit API. What's more, it breaks 32-bit programs running on a
64-bit machine.

So fix this by introducing a new message type with an explicit
32-bit reserved field after the type, like:

struct vhost_msg_v2 {
	__u32 type;
	__u32 reserved;
	union {
		struct vhost_iotlb_msg iotlb;
		__u8 padding[64];
	};
};

We will have a consistent ABI after switching to this. To enable
this capability, introduce a new ioctl (VHOST_SET_BACKEND_FEATURE) for
userspace to enable this feature (VHOST_BACKEND_F_IOTLB_V2).

Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6da2ec56 12-Jun-2018 Kees Cook <keescook@chromium.org>

treewide: kmalloc() -> kmalloc_array()

The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
patch replaces cases of:

kmalloc(a * b, gfp)

with:
kmalloc_array(a, b, gfp)

as well as handling cases of:

kmalloc(a * b * c, gfp)

with:

kmalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

kmalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

kmalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The tools/ directory was manually excluded, since it has its own
implementation of kmalloc().
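
In vhost.c this turns allocations like the iovec arrays into the
checked form, e.g. (illustrative):

	/* before: open-coded multiplication can overflow */
	vq->indirect = kmalloc(sizeof(*vq->indirect) * UIO_MAXIOV,
			       GFP_KERNEL);
	/* after: kmalloc_array() checks the multiplication */
	vq->indirect = kmalloc_array(UIO_MAXIOV, sizeof(*vq->indirect),
				     GFP_KERNEL);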

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
kmalloc(
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
kmalloc(
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
kmalloc(
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
kmalloc(
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
kmalloc(
- sizeof(char) * COUNT
+ COUNT
, ...)
|
kmalloc(
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * (COUNT_ID)
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * COUNT_ID
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * (COUNT_CONST)
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * COUNT_CONST
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * (COUNT_ID)
+ COUNT_ID, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * COUNT_ID
+ COUNT_ID, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * (COUNT_CONST)
+ COUNT_CONST, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * COUNT_CONST
+ COUNT_CONST, sizeof(THING)
, ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kmalloc
+ kmalloc_array
(
- SIZE * COUNT
+ COUNT, SIZE
, ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
kmalloc(
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kmalloc(
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kmalloc(
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kmalloc(
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
kmalloc(
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kmalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kmalloc(
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kmalloc(
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kmalloc(
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
kmalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
kmalloc(
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
kmalloc(C1 * C2 * C3, ...)
|
kmalloc(
- (E1) * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
|
kmalloc(
- (E1) * (E2) * E3
+ array3_size(E1, E2, E3)
, ...)
|
kmalloc(
- (E1) * (E2) * (E3)
+ array3_size(E1, E2, E3)
, ...)
|
kmalloc(
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
kmalloc(sizeof(THING) * C2, ...)
|
kmalloc(sizeof(TYPE) * C2, ...)
|
kmalloc(C1 * C2 * C3, ...)
|
kmalloc(C1 * C2, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * (E2)
+ E2, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * E2
+ E2, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * (E2)
+ E2, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * E2
+ E2, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- (E1) * E2
+ E1, E2
, ...)
|
- kmalloc
+ kmalloc_array
(
- (E1) * (E2)
+ E1, E2
, ...)
|
- kmalloc
+ kmalloc_array
(
- E1 * E2
+ E1, E2
, ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>


# b2303d7b 07-Jun-2018 Matthew Wilcox <willy@infradead.org>

Convert vhost to struct_size

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Kees Cook <keescook@chromium.org>


# 670ae9ca 11-May-2018 Michael S. Tsirkin <mst@redhat.com>

vhost: fix info leak due to uninitialized memory

struct vhost_msg within struct vhost_msg_node is copied to userspace.
Unfortunately, it turns out that on 64-bit systems vhost_msg has padding
after the type field which gcc doesn't initialize, leaking 4 uninitialized
bytes to userspace.

This padding also unfortunately means 32-bit users of this interface are
broken on a 64-bit kernel, which will need to be fixed separately.
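
The fix zeroes the message before filling it in, in vhost_new_msg()
(close to the merged hunk):

	struct vhost_msg_node *node = kmalloc(sizeof(*node), GFP_KERNEL);

	if (!node)
		return NULL;

	/* Make sure all padding within the structure is initialized. */
	memset(&node->msg, 0, sizeof(node->msg));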

Fixes: CVE-2018-1118
Cc: stable@vger.kernel.org
Reported-by: Kevin Easton <kevin@guarana.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reported-by: syzbot+87cfa083e727a224754b@syzkaller.appspotmail.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 9965ed17 05-Mar-2018 Christoph Hellwig <hch@lst.de>

fs: add new vfs_poll and file_can_poll helpers

These abstract out calls to the poll method in preparation for changes
in how we poll.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>


# 1b15ad68 22-May-2018 Jason Wang <jasowang@redhat.com>

vhost: synchronize IOTLB message with dev cleanup

DaeRyong Jeong reports a race between vhost_dev_cleanup() and
vhost_process_iotlb_msg():

Thread interleaving:

CPU0 (vhost_process_iotlb_msg)          CPU1 (vhost_dev_cleanup)
(in the case of both VHOST_IOTLB_UPDATE
and VHOST_IOTLB_INVALIDATE)
=====                                   =====
                                        vhost_umem_clean(dev->iotlb);
if (!dev->iotlb) {
        ret = -EFAULT;
        break;
}
                                        dev->iotlb = NULL;

The reason is that we don't synchronize between them; fix this by
protecting vhost_process_iotlb_msg() with the dev mutex.
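
A skeleton of the fix (the handler body is elided; only the locking is
the point here):

	static int vhost_process_iotlb_msg(struct vhost_dev *dev,
					   struct vhost_iotlb_msg *msg)
	{
		int ret = 0;

		mutex_lock(&dev->mutex);
		/* handle VHOST_IOTLB_UPDATE/INVALIDATE/MISS under the lock */
		mutex_unlock(&dev->mutex);
		return ret;
	}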

Reported-by: DaeRyong Jeong <threeearcat@gmail.com>
Fixes: 6b1e6cc7855b0 ("vhost: new device IOTLB API")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ddd3d408 10-Apr-2018 Stefan Hajnoczi <stefanha@redhat.com>

vhost: return bool from *_access_ok() functions

Currently vhost *_access_ok() functions return int. This is error-prone
because there are two popular conventions:

1. 0 means failure, 1 means success
2. -errno means failure, 0 means success

Although vhost mostly uses #1, it does not do so consistently.
umem_access_ok() uses #2.

This patch changes the return type from int to bool so that false means
failure and true means success. This eliminates a potential source of
errors.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d14d2b78 10-Apr-2018 Stefan Hajnoczi <stefanha@redhat.com>

vhost: fix vhost_vq_access_ok() log check

Commit d65026c6c62e7d9616c8ceb5a53b68bcdc050525 ("vhost: validate log
when IOTLB is enabled") introduced a regression. The logic was
originally:

	if (vq->iotlb)
		return 1;
	return A && B;

After the patch the short-circuit logic for A was inverted:

	if (A || vq->iotlb)
		return A;
	return B;

This patch fixes the regression by rewriting the checks in the obvious
way, no longer returning A when vq->iotlb is non-NULL (which is hard to
understand).

Reported-by: syzbot+65a84dde0214b0387ccd@syzkaller.appspotmail.com
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7ced6c98 11-Apr-2018 Eric Auger <eric.auger@redhat.com>

vhost: Fix vhost_copy_to_user()

vhost_copy_to_user is used to copy vring used elements to userspace.
We should use VHOST_ADDR_USED instead of VHOST_ADDR_DESC.

Fixes: f88949138058 ("vhost: introduce O(1) vq metadata cache")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d65026c6 29-Mar-2018 Jason Wang <jasowang@redhat.com>

vhost: validate log when IOTLB is enabled

The vq log_base is the userspace address of a bitmap, which has nothing
to do with the IOTLB. So it needs to be validated unconditionally;
otherwise we may use 0 as the log_base, which may pin pages and lead to
unexpected results (e.g. triggering the BUG_ON() in set_bit_to_user()).

Fixes: 6b1e6cc7855b0 ("vhost: new device IOTLB API")
Reported-by: syzbot+6304bf97ef436580fede@syzkaller.appspotmail.com
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# dc6455a7 27-Mar-2018 Jason Wang <jasowang@redhat.com>

vhost: correctly remove wait queue during poll failure

We tried to remove the vq poll from the wait queue, but did not check
whether it was in a list first. This can lead to a double free. Fix
this by switching to vhost_poll_stop(), which zeroes poll->wqh after
removing the poll from the waitqueue, making sure it won't be freed twice.
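
For reference, vhost_poll_stop() is safe to call repeatedly precisely
because of the wqh check (close to the in-tree helper):

	void vhost_poll_stop(struct vhost_poll *poll)
	{
		if (poll->wqh) {
			remove_wait_queue(poll->wqh, &poll->wait);
			poll->wqh = NULL;
		}
	}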

Cc: Darren Kenny <darren.kenny@oracle.com>
Reported-by: syzbot+c0272972b01b872e604a@syzkaller.appspotmail.com
Fixes: 2b8b328b61c79 ("vhost_net: handle polling errors when setting backend")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 26b36604 14-Mar-2018 Sonny Rao <sonnyrao@chromium.org>

vhost: fix vhost ioctl signature to build with clang

Clang is particularly anal about signed vs unsigned comparisons and
doesn't like the fact that some ioctl numbers set the MSB, so we get
this error when trying to build vhost on aarch64:

drivers/vhost/vhost.c:1400:7: error: overflow converting case value to
switch condition type (3221794578 to 18446744072636378898)
[-Werror, -Wswitch]
case VHOST_GET_VRING_BASE:

3221794578 is 0xC008AF12 in hex
18446744072636378898 is 0xFFFFFFFFC008AF12 in hex

Fix this by using unsigned ints in the function signature for
vhost_vring_ioctl().
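
I.e. the prototype becomes:

	long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl,
			       void __user *argp);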

Signed-off-by: Sonny Rao <sonnyrao@chromium.org>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# a9a08845 11-Feb-2018 Linus Torvalds <torvalds@linux-foundation.org>

vfs: do bulk POLL* -> EPOLL* replacement

This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:

for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
done

with de-mangling cleanups yet to come.

NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do. But the keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.

The next patch from Al will sort out the final differences, and we
should be all done.

Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d25cc43c 06-Jan-2018 Eric Biggers <ebiggers@google.com>

vhost: don't hold onto file pointer for VHOST_SET_LOG_FD

We already hold a reference to the eventfd_ctx, which is sufficient;
there's no need to hold a reference to the struct file as well. So get
rid of vhost_dev->log_file.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>


# 09f332a5 06-Jan-2018 Eric Biggers <ebiggers@google.com>

vhost: don't hold onto file pointer for VHOST_SET_VRING_ERR

We already hold a reference to the eventfd_ctx, which is sufficient;
there's no need to hold a reference to the struct file as well. So get
rid of vhost_virtqueue->error.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>


# e050c7d9 06-Jan-2018 Eric Biggers <ebiggers@google.com>

vhost: don't hold onto file pointer for VHOST_SET_VRING_CALL

We already hold a reference to the eventfd_ctx, which is sufficient;
there's no need to hold a reference to the struct file as well. So get
rid of vhost_virtqueue->call.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>


# f6f93f75 24-Dec-2017 夷则(Caspar) <jinli.zjl@alibaba-inc.com>

vhost: remove unused lock check flag in vhost_dev_cleanup()

In commit ea5d404655ba ("vhost: fix release path lockdep checks"),
Michael added a flag to check whether we should hold a lock in
vhost_dev_cleanup(). However, in commit 47283bef7ed3 ("vhost: move
memory pointer to VQs"), the RCU operations were replaced by a
mutex, so we can remove the no-longer-used `locked' parameter now.

Signed-off-by: Caspar Zhang <jinli.zjl@alibaba-inc.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# ac964d7a 09-Jan-2018 Tonghao Zhang <xiangxia.m.yue@gmail.com>

vhost: Remove the unused variable.

The patch (7235acdb1) changed the way work is
flushed, so the queued seq, done seq, and the
flushing counter are not used anymore. Remove them now.

Fixes: 7235acdb1 ("vhost: simplify work flushing")
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 6f3180af 23-Jan-2018 Jason Wang <jasowang@redhat.com>

vhost: do not try to access device IOTLB when not initialized

The code tries to access dev->iotlb when processing
VHOST_IOTLB_INVALIDATE even if it was not initialized, which may lead
to a NULL pointer dereference. Fix this by checking dev->iotlb first.
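
A sketch of the guarded case (the helper name used by the IOTLB code of
that era is an assumption):

	case VHOST_IOTLB_INVALIDATE:
		if (!dev->iotlb) {
			ret = -EFAULT;
			break;
		}
		vhost_del_umem_range(dev->iotlb, msg->iova,
				     msg->iova + msg->size - 1);
		break;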

Fixes: 6b1e6cc7855b0 ("vhost: new device IOTLB API")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e9cb4239 23-Jan-2018 Jason Wang <jasowang@redhat.com>

vhost: use mutex_lock_nested() in vhost_dev_lock_vqs()

We used to call mutex_lock() in vhost_dev_lock_vqs(), which tries to
hold the mutexes of all virtqueues. This may confuse lockdep into
reporting a possible deadlock because it sees locks belonging to the
same class being taken. Switch to mutex_lock_nested() to avoid the
false positive.
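
The locking loop then tells lockdep that each vq mutex is a distinct
nesting level (close to the merged hunk):

	static void vhost_dev_lock_vqs(struct vhost_dev *d)
	{
		int i;

		for (i = 0; i < d->nvqs; ++i)
			mutex_lock_nested(&d->vqs[i]->mutex, i);
	}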

Fixes: 6b1e6cc7855b0 ("vhost: new device IOTLB API")
Reported-by: syzbot+dbb7c1161485e61b0241@syzkaller.appspotmail.com
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3a5db0b1 27-Nov-2017 Paul E. McKenney <paulmck@kernel.org>

drivers/vhost: Remove now-redundant read_barrier_depends()

Because READ_ONCE() now implies read_barrier_depends(), the
read_barrier_depends() in next_desc() is now redundant. This commit
therefore removes it and the related comments.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: <kvm@vger.kernel.org>
Cc: <virtualization@lists.linux-foundation.org>
Cc: <netdev@vger.kernel.org>


# afc9a42b 03-Jul-2017 Al Viro <viro@zeniv.linux.org.uk>

the rest of drivers/*: annotate ->poll() instances

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 58e3b602 03-Jul-2017 Al Viro <viro@zeniv.linux.org.uk>

vhost: annotate vhost_poll

its ->mask is POLL... bitmap

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 3ad6f93e 03-Jul-2017 Al Viro <viro@zeniv.linux.org.uk>

annotate poll-related wait keys

__poll_t is also used as wait key in some waitqueues.
Verify that wait_..._poll() gets __poll_t as key and
provide a helper for wakeup functions to get back to
that __poll_t value.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# e6c8adca 03-Jul-2017 Al Viro <viro@zeniv.linux.org.uk>

annotate the places where ->poll() return values go

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# ca2c5b33 21-Aug-2017 Michael S. Tsirkin <mst@redhat.com>

vhost: fix end of range for access_ok

During access_ok checks, addr increases as we iterate over the data
structure, thus addr + len - 1 will point beyond the end of region we
are translating. Harmless since we then verify that the region covers
addr, but let's not waste cpu cycles.

Reported-by: Koichiro Den <den@klaipeden.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Koichiro Den <den@klaipeden.com>


# f808c13f 08-Sep-2017 Davidlohr Bueso <dave@stgolabs.net>

lib/interval_tree: fast overlap detection

Allow interval trees to quickly check for overlaps to avoid unnecessary
tree lookups in interval_tree_iter_first().

As of this patch, all interval tree flavors will require using a
'rb_root_cached' such that we can have the leftmost node easily
available. While most users will make use of this feature, those with
special functions (in addition to the generic insert, delete, search
calls) will avoid using the cached option as they can do funky things
with insertions -- for example, vma_interval_tree_insert_after().

[jglisse@redhat.com: fix deadlock from typo vm_lock_anon_vma()]
Link: http://lkml.kernel.org/r/20170808225719.20723-1-jglisse@redhat.com
Link: http://lkml.kernel.org/r/20170719014603.19029-12-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Doug Ledford <dledford@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 8d65843c 26-Jul-2017 Jason Wang <jasowang@redhat.com>

Revert "vhost: cache used event for better performance"

This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92, since it
was reported to break vhost_net. We wanted to cache the used event and
use it to check for notification. The assumption was that the guest
won't move the event idx back, but this can in fact happen when the
16-bit index wraps around after 64K entries.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ac6424b9 19-Jun-2017 Ingo Molnar <mingo@kernel.org>

sched/wait: Rename wait_queue_t => wait_queue_entry_t

Rename:

wait_queue_t => wait_queue_entry_t

'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
which had to carry the name.

Start sorting this out by renaming it to 'wait_queue_entry_t'.

This also allows the real structure name 'struct __wait_queue' to
lose its double underscore and become 'struct wait_queue_entry',
which is the more canonical nomenclature for such data types.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 6c5ab651 08-May-2017 Michal Hocko <mhocko@suse.com>

mm: support __GFP_REPEAT in kvmalloc_node for >32kB

vhost code uses __GFP_REPEAT when allocating vhost_virtqueue resp.
vhost_vsock because it would really like to prefer kmalloc to the
vmalloc fallback - see 23cc5a991c7a ("vhost-net: extend device
allocation to vmalloc") for more context. Michael Tsirkin has also
noted:

"__GFP_REPEAT overhead is during allocation time. Using vmalloc means
all accesses are slowed down. Allocation is not on data path, accesses
are."

The same applies to other vhost_kvzalloc users.

Let's teach kvmalloc_node to handle __GFP_REPEAT properly. There are
two things to be careful about. First we should prevent from the OOM
killer and so have to involve __GFP_NORETRY by default and secondly
override __GFP_REPEAT for !costly order requests as the __GFP_REPEAT is
ignored for !costly orders.

Supporting __GFP_REPEAT-like semantics for !costly requests is possible
but would require changes in the page allocator. This is out of the
scope of this patch.

This patch shouldn't introduce any functional change.

Link: http://lkml.kernel.org/r/20170306103032.2540-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 174cd4b1 02-Feb-2017 Ingo Molnar <mingo@kernel.org>

sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>

Fix up affected files that include this signal functionality via sched.h.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 6e84f315 08-Feb-2017 Ingo Molnar <mingo@kernel.org>

sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h>

We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/mm.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

The APIs that are going to be moved first are:

mm_alloc()
__mmdrop()
mmdrop()
mmdrop_async_fn()
mmdrop_async()
mmget_not_zero()
mmput()
mmput_async()
get_task_mm()
mm_access()
mm_release()

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# f8894913 28-Feb-2017 Jason Wang <jasowang@redhat.com>

vhost: introduce O(1) vq metadata cache

When device IOTLB is enabled, all address translations are stored in an
interval tree. The O(lgN) search time could be slow for virtqueue
metadata (avail, used and descriptors), since they are accessed much
more often than other addresses. So this patch introduces an O(1) array
which points to the interval tree nodes that store the translations of
the vq metadata. The array is updated during vq IOTLB prefetching and
is reset on each invalidation and TLB update. Each time we want
to access vq metadata, this small array is queried before the interval
tree. This is sufficient for static mappings but not dynamic
mappings; we could do optimizations on top.

Tests were done with l2fwd in guest (2M hugepage):

        noiommu  | before        | after
tx      1.32Mpps | 1.06Mpps(82%) | 1.30Mpps(98%)
rx      2.33Mpps | 1.46Mpps(63%) | 2.29Mpps(98%)

We can almost reach the same performance as noiommu mode.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# e3b56cdd 07-Feb-2017 Jason Wang <jasowang@redhat.com>

vhost: try avoiding avail index access when getting descriptor

If the last avail idx is not equal to the cached avail idx, we're sure
there are still available buffers in the virtqueue, so there's no need
to re-read the avail idx. So let's skip this to avoid unnecessary
userspace memory access and memory barrier. A pktgen test shows about
3% improvement on rx pps.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# cda8bba0 30-Jan-2017 Halil Pasic <pasic@linux.vnet.ibm.com>

vhost: fix initialization for vq->is_le

Currently, under certain circumstances vhost_init_is_le does just a part
of the initialization job, and depends on vhost_reset_is_le being called
too. For this reason vhost_vq_init_access used to call vhost_reset_is_le
when vq->private_data is NULL. This is not only counter-intuitive, but
also a real problem because it breaks vhost_net. The bug was introduced to
vhost_net with commit 2751c9882b94 ("vhost: cross-endian support for
legacy devices"). The symptom is corruption of the vq's used.idx field
(virtio) after VHOST_NET_SET_BACKEND was issued as a part of the vhost
shutdown on a vq with pending descriptors.

Let us make sure the outcome of vhost_init_is_le never depends on the
state it is actually supposed to initialize, and fix virtio_net by removing the
reset from vhost_vq_init_access.

With the above, there is no reason for vhost_reset_is_le to do just half
of the job. Let us make vhost_reset_is_le reinitialize is_le.

Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reported-by: Michael A. Tebolt <miket@us.ibm.com>
Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Fixes: commit 2751c9882b94 ("vhost: cross-endian support for legacy devices")
Cc: <stable@vger.kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Michael A. Tebolt <miket@us.ibm.com>


# 275bf960 18-Jan-2017 Jason Wang <jasowang@redhat.com>

vhost: better detection of available buffers

This patch does several tweaks on vhost_vq_avail_empty() for
better performance:

- check the cached avail index first, which can avoid a userspace memory
access.
- use unlikely() for the failure of userspace access
- check vq->last_avail_idx instead of the cached avail index as the last
step.

This patch is needed for batching support, which needs to peek whether
or not there are still available buffers in the ring.
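
The resulting helper looks roughly like this (a sketch; the userspace
accessor name in that kernel version is an assumption):

	bool vhost_vq_avail_empty(struct vhost_dev *dev,
				  struct vhost_virtqueue *vq)
	{
		__virtio16 avail_idx;
		int r;

		/* Cached index first: no userspace access needed. */
		if (vq->avail_idx != vq->last_avail_idx)
			return false;

		r = __get_user(avail_idx, &vq->avail->idx);
		if (unlikely(r))
			return false;
		vq->avail_idx = vhost16_to_cpu(vq, avail_idx);

		return vq->avail_idx == vq->last_avail_idx;
	}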

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 809ecb9b 11-Dec-2016 Jason Wang <jasowang@redhat.com>

vhost: cache used event for better performance

When the event index feature is enabled, we need to fetch the used
event from userspace memory each time. This userspace fetch (with its
memory barrier) can sometimes be saved: by 1) caching the used event
and 2) observing that if the used event is ahead of both old and new,
and the old-to-new update does not cross it, we're sure there's no
need to notify the guest.

This will be useful for heavy tx load e.g guest pktgen test with Linux
driver shows ~3.5% improvement.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 72952cc0 05-Dec-2016 Michael S. Tsirkin <mst@redhat.com>

vhost: add missing __user annotations

Several vhost functions were missing __user annotations
on pointers, causing sparse warnings. Fix this up.

sparse also warns about vhost_process_iotlb_msg which
is local and should be static. Fix that up as well.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 2f952c01 05-Dec-2016 Michael S. Tsirkin <mst@redhat.com>

vhost: make interval tree static inline

vhost_umem_interval_tree is only used locally within vhost.c, so mark it
static. As some of the generated functions go unused, this triggers
warnings unless we also mark it inline.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 635abf01 07-Dec-2016 Peng Tao <bergwolf@gmail.com>

vhost: remove unnecessary smp_mb from vhost_work_queue

test_and_set_bit() already implies a memory barrier.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cbbd26b8 01-Nov-2016 Al Viro <viro@zeniv.linux.org.uk>

[iov_iter] new primitives - copy_from_iter_full() and friends

copy_from_iter_full(), copy_from_iter_full_nocache() and
csum_and_copy_from_iter_full() - counterparts of copy_from_iter()
et.al., advancing iterator only in case of successful full copy
and returning whether it had been successful or not.

Convert some obvious users. *NOTE* - do not blindly assume that
something is a good candidate for those unless you are sure that
not advancing iov_iter in failure case is the right thing in
this case. Anything that does short read/short write kind of
stuff (or is in a loop, etc.) is unlikely to be a good one.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# ec33d031 01-Aug-2016 Michael S. Tsirkin <mst@redhat.com>

vhost: detect 32 bit integer wrap around

Detect and fail early if long wrap around is triggered.
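
A sketch of the detection for the ring address ioctl: on 32-bit,
casting a u64 to unsigned long truncates, so a round-trip comparison
catches addresses that would wrap (field names from the uapi struct):

	if ((u64)(unsigned long)a.desc_user_addr != a.desc_user_addr ||
	    (u64)(unsigned long)a.used_user_addr != a.used_user_addr ||
	    (u64)(unsigned long)a.avail_user_addr != a.avail_user_addr) {
		r = -EFAULT;
		break;
	}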

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 6b1e6cc7 23-Jun-2016 Jason Wang <jasowang@redhat.com>

vhost: new device IOTLB API

This patch tries to implement a device IOTLB for vhost. This could be
used with a userspace (qemu) implementation of DMA remapping
to emulate an IOMMU for the guest.

The idea is simple, cache the translation in a software device IOTLB
(which is implemented as an interval tree) in vhost and use vhost_net
file descriptor for reporting IOTLB miss and IOTLB
update/invalidation. When vhost meets an IOTLB miss, the fault
address, size and access can be read from the file. After userspace
finishes the translation, it writes the translated address to the
vhost_net file to update the device IOTLB.

When the device IOTLB is enabled by setting VIRTIO_F_IOMMU_PLATFORM, all vq
addresses set by ioctl are treated as iova instead of virtual addresses, and
accesses can only be done through the IOTLB instead of direct userspace
memory access. Before each round of vq processing, all vq metadata is
prefetched into the device IOTLB to make sure no translation fault happens
during vq processing.

In most cases, virtqueues are contiguous even in virtual address space.
The IOTLB translation for virtqueue itself may make it a little
slower. We might add fast path cache on top of this patch.

Signed-off-by: Jason Wang <jasowang@redhat.com>
[mst: use virtio feature bit: VHOST_F_DEVICE_IOTLB -> VIRTIO_F_IOMMU_PLATFORM ]
[mst: fix build warnings ]
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
[ weiyj.lk: missing unlock on error ]
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>


# a9709d68 23-Jun-2016 Jason Wang <jasowang@redhat.com>

vhost: convert pre sorted vhost memory array to interval tree

The current pre-sorted memory region array has some limitations for the
future device IOTLB conversion:

1) it needs extra work for adding and removing a single region, and it's
expected to be slow because of sorting or memory re-allocation.
2) it needs extra work to remove a large range which may intersect
several regions of different sizes.
3) it needs tricks for a replacement policy like LRU

To overcome the above shortcomings, this patch converts it to an interval
tree, which can easily address the above issues with almost no extra
work.

The patch could be used for:

- Extend the current API and only let the userspace to send diffs of
memory table.
- Simplify Device IOTLB implementation.
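
A hypothetical lookup through the generated iterator then becomes:

  struct vhost_umem_node *node;

  node = vhost_umem_interval_tree_iter_first(&umem->umem_tree,
                                             addr, addr + len - 1);
  if (!node)
          return -EFAULT; /* no region overlaps [addr, addr + len - 1] */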

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# bfe2bc51 23-Jun-2016 Jason Wang <jasowang@redhat.com>

vhost: introduce vhost memory accessors

This patch introduces vhost memory accessors, which for now are just
wrappers around userspace address access helpers. This is a requirement
for the vhost device IOTLB implementation, which will add IOTLB
translations in those accessors.
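
A minimal sketch of the idea, assuming the accessors start out as thin
macros (the real patch may differ in detail):

  /* device IOTLB translation will hook in here later */
  #define vhost_put_user(vq, x, ptr)  __put_user(x, ptr)
  #define vhost_get_user(vq, x, ptr)  __get_user(x, ptr)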

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 04b96e55 25-Apr-2016 Jason Wang <jasowang@redhat.com>

vhost: lockless enqueuing

We currently use a spinlock to synchronize the work list, which may
cause unnecessary contention. This patch switches to llist to remove
that contention. Pktgen tests show about a 5% improvement:

Before:
~1300000 pps
After:
~1370000 pps
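
A sketch of the lockless pattern (field names assumed):

  /* producer: vhost_work_queue() */
  llist_add(&work->node, &dev->work_list);
  wake_up_process(dev->worker);

  /* consumer: the worker grabs everything, then restores FIFO order */
  struct llist_node *node = llist_del_all(&dev->work_list);
  node = llist_reverse_order(node);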

Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 7235acdb 25-Apr-2016 Jason Wang <jasowang@redhat.com>

vhost: simplify work flushing

We used to implement work flushing by tracking a queued seq, a done
seq, and the number of in-flight flushes. This patch simplifies that
by implementing work flushing as just another kind of vhost work
paired with a completion. This will be used by the lockless enqueuing
patch.
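
A condensed sketch of the mechanism, assuming the names used by the
patch:

  struct vhost_flush_struct {
          struct vhost_work work;
          struct completion wait_event;
  };

  static void vhost_flush_work(struct vhost_work *work)
  {
          struct vhost_flush_struct *s;

          s = container_of(work, struct vhost_flush_struct, work);
          complete(&s->wait_event);
  }

  /* flushing = queue vhost_flush_work, then wait_for_completion() */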

Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 03088137 04-Mar-2016 Jason Wang <jasowang@redhat.com>

vhost_net: basic polling support

This patch polls for newly added tx buffers or the socket receive
queue for a while at the end of tx/rx processing. The maximum time
spent on polling is specified through a new kind of vring ioctl.
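
Hypothetical userspace usage (the timeout value is illustrative):

  struct vhost_vring_state state = {
          .index = 0,  /* vq index */
          .num   = 50, /* max busy-poll time, in microseconds */
  };

  ioctl(vhost_fd, VHOST_SET_VRING_BUSYLOOP_TIMEOUT, &state);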

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# d4a60603 04-Mar-2016 Jason Wang <jasowang@redhat.com>

vhost: introduce vhost_vq_avail_empty()

This patch introduces a helper which returns true if we're sure that
the available ring is empty for a specific vq. When we're not sure,
e.g. on vq access failure, it returns false instead. This can be used
by busy polling code to exit the busy loop.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 526d3e7f 04-Mar-2016 Jason Wang <jasowang@redhat.com>

vhost: introduce vhost_has_work()

This patch introduces a helper which gives a hint as to whether or not
there is work queued in the work list. This can be used by busy
polling code to exit the busy loop.
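
Roughly how vhost_net's busy loop ends up using this helper together
with vhost_vq_avail_empty() from the previous entry (simplified sketch;
the time, signal and vhost_has_work() checks are folded into
vhost_can_busy_poll()):

  while (vhost_can_busy_poll(&net->dev, endtime) &&
         vhost_vq_avail_empty(&net->dev, vq))
          cpu_relax();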

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 80f7d030 16-Feb-2016 Greg Kurz <groug@kaod.org>

vhost: rename vhost_init_used()

Looking at how callers use this, maybe we should just rename init_used
to vhost_vq_init_access. The _used suffix was a hint that we
access the vq used ring. But maybe what callers care about is
that it must be called after access_ok.

Also, this function manipulates the vq->is_le field which isn't related
to the vq used ring.

This patch simply renames vhost_init_used() to vhost_vq_init_access() as
suggested by Michael.

No behaviour change.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# c5072037 16-Feb-2016 Greg Kurz <groug@kaod.org>

vhost: rename cross-endian helpers

The default use case for vhost is when the host and the vring have the
same endianness (default native endianness). But there are cases where
they differ and vhost should byteswap when accessing the vring.

The first case is when the host is big endian and the vring belongs to
a virtio 1.0 device, which is always little endian.

This is covered by the vq->is_le field. This field is initialized when
userspace calls the VHOST_SET_FEATURES ioctl. It is reset when the device
stops.

We already have a vhost_init_is_le() helper, but the reset operation is
opencoded as follows:

vq->is_le = virtio_legacy_is_little_endian();

It isn't clear that we are resetting vq->is_le here.

This patch moves the code to a helper with a more explicit name.
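
The new helper, as a sketch:

  static void vhost_reset_is_le(struct vhost_virtqueue *vq)
  {
          vq->is_le = virtio_legacy_is_little_endian();
  }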

The other case where we may have to byteswap is when the architecture can
switch endianness at runtime (bi-endian). If endianness differs in the host
and in the guest, then legacy devices need to be used in cross-endian mode.

This mode is available with CONFIG_VHOST_CROSS_ENDIAN_LEGACY=y, which
introduces a vq->user_be field. Userspace may enable cross-endian mode
by calling the SET_VRING_ENDIAN ioctl before the device is started. The
cross-endian mode is disabled when the device is stopped.

The current names of the helpers that manipulate vq->user_be are unclear.

This patch renames those helpers to clearly show that this is cross-endian
stuff and with explicit enable/disable semantics.

No behaviour change.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# e1f33be9 16-Feb-2016 Greg Kurz <groug@kaod.org>

vhost: fix error path in vhost_init_used()

We don't want side effects. If something fails, we roll back vq->is_le
to its previous value.
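
A condensed sketch of the fix (the real function does more between the
save and the rollback):

  static int vhost_init_used(struct vhost_virtqueue *vq)
  {
          bool is_le = vq->is_le; /* remember the old value */
          int r;

          vhost_init_is_le(vq);
          r = vhost_update_used_flags(vq);
          if (r)
                  vq->is_le = is_le; /* roll back: no side effects */
          return r;
  }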

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 5fba13b5 29-Nov-2015 Michael S. Tsirkin <mst@redhat.com>

vhost: replace % with & on data path

We know vring num is a power of 2, so use &
to mask the high bits.
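
For example (illustrative):

  /* before */ head = vq->last_avail_idx % vq->num;
  /* after  */ head = vq->last_avail_idx & (vq->num - 1);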

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# d5424838 16-Nov-2015 Michael S. Tsirkin <mst@redhat.com>

vhost: relax log address alignment

commit 5d9a07b0de512b77bf28d2401e5fe3351f00a240 ("vhost: relax used
address alignment") fixed the alignment for the used virtual address,
but not for the physical address used for logging.

That's a mistake: alignment should clearly be the same for virtual and
physical addresses.

Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 1e099473 15-Jul-2015 Igor Mammedov <imammedo@redhat.com>

vhost: fix error handling for memory region alloc

Callers of vhost_kvzalloc() expect the same behaviour on allocation
error as from kmalloc/vmalloc, i.e. a NULL return value. So just
return the value returned by vzalloc() instead of returning
ERR_PTR(-ENOMEM).
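
The fixed helper then looks roughly like this (gfp flags as of that
era):

  static void *vhost_kvzalloc(unsigned long size)
  {
          void *n = kzalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);

          if (!n)
                  n = vzalloc(size);
          return n; /* NULL on failure, just like kmalloc/vmalloc */
  }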

Fixes: 4de7255f7d2be5 ("vhost: extend memory regions allocation to vmalloc")

Spotted-by: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 7932c0bd 17-Jul-2015 Marc-André Lureau <marcandre.lureau@redhat.com>

vhost: actually track log eventfd file

While reviewing vhost log code, I found out that log_file is never
set. Note: I haven't tested the change (QEMU doesn't use LOG_FD yet).

Cc: stable@vger.kernel.org
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# c9ce42f7 02-Jul-2015 Igor Mammedov <imammedo@redhat.com>

vhost: add max_mem_regions module parameter

It became possible to use a larger number of memory slots, which is
used by memory hotplug for registering hotplugged memory. However,
QEMU crashes if it's used with more than ~60 pc-dimm devices and
vhost-net enabled, since the host kernel vhost-net module refuses to
accept more than 64 memory regions.

Allow tweaking the limit via a max_mem_regions module parameter, with
the default value set to 64 slots.
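
A sketch of the knob (the description string is illustrative):

  static ushort max_mem_regions __read_mostly = 64;
  module_param(max_mem_regions, ushort, 0444);
  MODULE_PARM_DESC(max_mem_regions,
          "Maximum number of memory regions in memory map. (default: 64)");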

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 4de7255f 01-Jul-2015 Igor Mammedov <imammedo@redhat.com>

vhost: extend memory regions allocation to vmalloc

With a large number of memory regions we could end up with high order
allocations, and kmalloc could fail if the host is under memory
pressure. Considering that the memory regions array is used on the hot
path, try harder to allocate using kmalloc, and if that fails resort
to vmalloc. That is still better than just failing vhost_set_memory()
and crashing the guest when new memory is hotplugged into it.

I'll still look at a QEMU-side solution to reduce the number of memory
regions it feeds to vhost to make things even better, but it doesn't
hurt for the kernel to behave smarter and not crash older QEMUs which
could use a large number of memory regions.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# bcfeacab 16-Jun-2015 Igor Mammedov <imammedo@redhat.com>

vhost: use binary search instead of linear in find_region()

For default region layouts performance stays the same as with linear
search, i.e. it takes around 210ns on average for translate_desc(),
which inlines find_region().

But binary search scales better with a larger number of regions: 235ns
for binary search vs 300ns for linear search with 55 memory regions,
and the values will stay about the same when the allowed number of
slots is increased to 509, as has been done in kvm.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 2751c988 24-Apr-2015 Greg Kurz <groug@kaod.org>

vhost: cross-endian support for legacy devices

This patch brings cross-endian support to vhost when used to implement
legacy virtio devices. Since it is a relatively rare situation, the
feature availability is controlled by a kernel config option (not set
by default).

The vq->is_le boolean field is added to cache the endianness to be
used for ring accesses. It defaults to native endian, as expected
by legacy virtio devices. When the ring gets active, we force little
endian if the device is modern. When the ring is deactivated, we
revert to the native endian default.

If cross-endian was compiled in, a vq->user_be boolean field is added
so that userspace may request a specific endianness. This field is
used to override the default when activating the ring of a legacy
device. It has no effect on modern devices.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>


# aad9a1ce 10-Dec-2014 Al Viro <viro@zeniv.linux.org.uk>

vhost: switch vhost get_indirect() to iov_iter, kill memcpy_fromiovec()

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: kvm@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 5d9a07b0 20-Dec-2014 Michael S. Tsirkin <mst@redhat.com>

vhost: relax used address alignment

virtio 1.0 only requires used address to be 4 byte aligned,
vhost required 8 bytes (size of vring_used_elem).
Fix up vhost to match that.

Additionally, while vhost correctly requires 8 byte
alignment for log, it's unconnected to used ring:
it's a consequence that log has u64 entries.
Tweak code to make that clearer.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 3b1bbe89 24-Oct-2014 Michael S. Tsirkin <mst@redhat.com>

vhost: virtio 1.0 endian-ness support

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 64f7f051 01-Dec-2014 Michael S. Tsirkin <mst@redhat.com>

vhost: switch to __get/__put_user exclusively

Most places in vhost can use __get/__put_user rather than
get/put_user since addresses are pre-validated.
This should be good for performance, but it will also
help make the code sparse-clean: the get/put_user macros
don't play well with __virtioXX bitwise tags.
Switch from get/put_user to the __ variants everywhere in vhost.
There's one exception - for consistency switch that
as well, and add an explicit access_ok check.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 47283bef 05-Jun-2014 Michael S. Tsirkin <mst@redhat.com>

vhost: move memory pointer to VQs

commit 2ae76693b8bcabf370b981cd00c36cd41d33fabc
vhost: replace rcu with mutex
replaced the rcu sync for memory accesses with VQ mutex lock/unlock.
This is correct since all accesses are under the VQ mutex, but
incomplete: we still do useless rcu lock/unlock operations, and
someone might copy this code into some other context where this
won't be right. This use of RCU is also non-standard and hard to
understand. Let's copy the pointer to each VQ structure; this way
the access rules become straightforward, and there's no need for
RCU anymore.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# ea16c514 05-Jun-2014 Michael S. Tsirkin <mst@redhat.com>

vhost: move acked_features to VQs

Refactor code to make sure features are only accessed
under VQ mutex. This makes everything simpler, no need
for RCU here anymore.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 98f9ca0a 28-May-2014 Michael S. Tsirkin <mst@redhat.com>

vhost: replace rcu with mutex

All memory accesses are done under some VQ mutex.
So locking and unlocking all VQs is a faster equivalent of
synchronize_rcu() for memory access changes.
Some guests cause a lot of these changes, so it's helpful
to make them faster.
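
A sketch of the synchronize_rcu() replacement (assuming vqs is an
array of pointers, as in that era):

  static void vhost_dev_lock_vqs(struct vhost_dev *d)
  {
          int i;

          for (i = 0; i < d->nvqs; ++i)
                  mutex_lock(&d->vqs[i]->mutex);
  }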

Reported-by: "Gonglei (Arei)" <arei.gonglei@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 59566b6e 06-Dec-2013 Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>

vhost: remove the dead branch

Since vhost_dev_init() always returns 0, some branches are never
taken and therefore need to be removed.

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ac9fde24 07-Jun-2013 Qin Chuanyu <qinchuanyu@huawei.com>

vhost: wake up worker outside spin_lock

The wake_up_process call is enclosed by spin_lock/unlock in
vhost_work_queue, but it can be done outside the spin_lock.
I have tested this with kernel 3.0.27 and a suse11-sp2 guest using
iperf; the numbers are below.
                 original            |  modified
thread_num   tp(Gbps)  vhost(%)  |  tp(Gbps)  vhost(%)
1            9.59      28.82     |  9.59      27.49
8            9.61      32.92     |  9.62      26.77
64           9.58      46.48     |  9.55      38.99
256          9.6       63.7      |  9.6       52.59
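
A sketch of the reorder (field names assumed from that era's work
list):

  spin_lock_irqsave(&dev->work_lock, flags);
  list_add_tail(&work->node, &dev->work_list);
  work->queue_seq++;
  spin_unlock_irqrestore(&dev->work_lock, flags);

  wake_up_process(dev->worker); /* now outside the lock */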

Signed-off-by: Chuanyu Qin <qinchuanyu@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# c49e4e57 02-Sep-2013 Jason Wang <jasowang@redhat.com>

vhost: switch to use vhost_add_used_n()

Let vhost_add_used() use vhost_add_used_n() to reduce code
duplication. To avoid the overhead brought by __copy_to_user(), we use
put_user() when a single used element needs to be added.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 35596b27 18-Aug-2013 Asias He <asias@redhat.com>

vhost: Include linux/uio.h instead of linux/socket.h

memcpy_fromiovec was moved from net/core/iovec.c to lib/iovec.c.
linux/uio.h provides the declaration for memcpy_fromiovec.

Include linux/uio.h instead of linux/socket.h for it.

Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6ac1afbf 06-May-2013 Asias He <asias@redhat.com>

vhost: Make vhost a separate module

Currently, vhost-net and vhost-scsi share the vhost core code.
However, vhost-scsi shares the code by including the vhost.c file
directly.

Making vhost a separate module makes it easier to share code with
other vhost devices.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 6d5e6aa8 06-May-2013 Asias He <asias@redhat.com>

vhost: Simplify dev->vqs[i] access

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 05c05351 06-Jun-2013 Michael S. Tsirkin <mst@redhat.com>

vhost: check owner before we overwrite ubuf_info

If device has an owner, we shouldn't touch ubuf_info
since it might be in use.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7542a6b0 06-May-2013 Michael S. Tsirkin <mst@redhat.com>

vhost: drop virtio_net.h dependency

There's no net specific code in vhost.c anymore,
don't include the virtio_net.h header.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 54db63c2 05-May-2013 Asias He <asias@redhat.com>

vhost: Export vhost_dev_set_owner

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 150b9e51 28-Apr-2013 Michael S. Tsirkin <mst@redhat.com>

vhost: fix error handling in RESET_OWNER ioctl

RESET_OWNER ioctl would leave the fd in a bad state if
memory allocation failed: device is stopped
but owner is not reset. Make state changes
after allocating memory, such that a failed
ioctl has no effect.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 81f95a55 28-Apr-2013 Michael S. Tsirkin <mst@redhat.com>

vhost: move per-vq net specific fields out to net

This will remove the need for vhost scsi to pull
in virtio-net.h.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 2839400f 27-Apr-2013 Asias He <asias@redhat.com>

vhost: move vhost-net zerocopy fields to net.c

On top of 'vhost: Allow device specific fields per vq', we can move
device specific fields from the vhost virt queue to the device virt
queue.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 3ab2e420 26-Apr-2013 Asias He <asias@redhat.com>

vhost: Allow device specific fields per vq

This is useful for any device that wants device specific fields per
vq. For example, tcm_vhost wants a per-vq field to track requests
which are in flight on the vq. Also, on top of this we can add patches
to move things like ubufs from vhost.h out to net.c.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 70181d51 10-Apr-2013 Jason Wang <jasowang@redhat.com>

vhost_net: remove tx polling state

After commit 2b8b328b61c799957a456a5a8dab8cc7dea68575 (vhost_net: handle polling
errors when setting backend), we in fact track the polling state through
poll->wqh, so there's no need to duplicate the work with an extra
vhost_net_polling_state. This patch removes it and makes the code simpler.

This patch also removes all the tx starting/stopping code in the tx
path, following Michael's suggestion.

Netperf tests show almost the same results in the stream test, but
improvements in TCP_RR tests (both zerocopy and copy), especially
under low load.

Tested between multiqueue kvm guest and external host with two direct
connected 82599s.

zerocopy disabled:

sessions | transaction rate (before/after/+improvement) | normalized (before/after/+improvement)
       1 | 9510.24/11727.29/+23.3%                      | 693.54/887.68/+28.0%
      25 | 192931.50/241729.87/+25.3%                   | 2376.80/2771.70/+16.6%
      50 | 277634.64/291905.76/+5%                      | 3118.36/3230.11/+3.6%

zerocopy enabled:

sessions | transaction rate (before/after/+improvement) | normalized (before/after/+improvement)
       1 | 7318.33/11929.76/+63.0%                      | 521.86/843.30/+61.6%
      25 | 167264.88/242422.15/+44.9%                   | 2181.60/2788.16/+27.8%
      50 | 272181.02/294347.04/+8.1%                    | 3071.56/3257.85/+6.1%

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2b8b328b 27-Jan-2013 Jason Wang <jasowang@redhat.com>

vhost_net: handle polling errors when setting backend

Currently, polling errors are ignored, which can lead to the following
issues:

- vhost removes itself unconditionally from the waitqueue when stopping
the poll; this may crash the kernel, since the previous attempt at
starting may have failed to add itself to the waitqueue
- userspace may think the backend was successfully set even when the
polling failed.

Solve this by:

- checking poll->wqh before trying to remove from the waitqueue
- reporting polling errors in vhost_poll_start() and tx_poll_start();
the return value will be checked and returned when userspace wants to
set the backend
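
A sketch of the first point:

  void vhost_poll_stop(struct vhost_poll *poll)
  {
          if (poll->wqh) {
                  remove_wait_queue(poll->wqh, &poll->wait);
                  poll->wqh = NULL;
          }
  }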

After this fix there could still be a polling failure after the
backend is set; that will be addressed by the next patch.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 935cdee7 06-Dec-2012 Michael S. Tsirkin <mst@redhat.com>

vhost: avoid backend flush on vring ops

vring changes already do a flush internally where appropriate, so we do
not need a second flush.

It's currently not very expensive but a follow-up patch makes flush more
heavy-weight, so remove the extra flush here to avoid regressing
performance if call or kick fds are changed on data path.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# bd97120f 25-Nov-2012 Michael S. Tsirkin <mst@redhat.com>

vhost: fix length for cross region descriptor

If a single descriptor crosses a region, the second chunk length
should be decremented by the size translated so far; instead it
currently includes the full descriptor length.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b211616d 01-Nov-2012 Michael S. Tsirkin <mst@redhat.com>

vhost: move -net specific code out

Zerocopy handling code is vhost-net specific.
Move it from vhost.c/vhost.h out to net.c

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c4fcb586 01-Nov-2012 Michael S. Tsirkin <mst@redhat.com>

vhost: track zero copy failures using DMA length

This will be used to disable zerocopy when error rate
is high.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 70e4cb9a 01-Nov-2012 Michael S. Tsirkin <mst@redhat.com>

vhost-net: cleanup macros for DMA status tracking

Better document macros for DMA tracking. Add an
explicit one for DMA in progress instead of
relying on user supplying len != 1.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e19d6763 01-Nov-2012 Michael S. Tsirkin <mst@redhat.com>

skb: report completion status for zero copy skbs

Even if skb is marked for zero copy, net core might still decide
to copy it later which is somewhat slower than a copy in user context:
besides copying the data we need to pin/unpin the pages.

Add a parameter reporting such cases through zero copy callback:
if this happens a lot, device can take this into account
and switch to copying in user context.

This patch updates all users but ignores the passed value for now:
it will be used by follow-up patches.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cecb46f1 27-Aug-2012 Al Viro <viro@zeniv.linux.org.uk>

vhost_set_vring(): turn pollstart/pollstop into bool

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 163049ae 21-Jul-2012 Stefan Hajnoczi <stefanha@gmail.com>

vhost: make vhost work queue visible

The vhost work queue allows processing to be done in vhost worker thread
context, which uses the owner process mm. Access to the vring and guest
memory is typically only possible from vhost worker context so it is
useful to allow work to be queued directly by users.

Currently vhost_net only uses the poll wrappers which do not expose the
work queue functions. However, for tcm_vhost (vhost_scsi) it will be
necessary to queue custom work.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Zhi Yong Wu <wuzhy@cn.ibm.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# d7ffde35 25-Jun-2012 Jens Freimann <jfrei@linux.vnet.ibm.com>

vhost: use USER_DS in vhost_worker thread

On some architectures address spaces are set up in a way that this is
not necessary for things to work properly, but on some others (like
s390) it is. Make sure we operate on the user address space to allow
copy_xxx_user() from the vhost_worker() thread, by setting it
explicitly before calling use_mm() and reverting it after unuse_mm().
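
A sketch using the kernel-thread address-space APIs of that era:

  mm_segment_t oldfs = get_fs();

  set_fs(USER_DS);   /* operate on the user address space */
  use_mm(dev->mm);
  /* ... worker loop: copy_to/from_user() now works on s390 too ... */
  unuse_mm(dev->mm);
  set_fs(oldfs);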

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c70aa540 01-May-2012 Jason Wang <jasowang@redhat.com>

vhost: zerocopy: poll vq in zerocopy callback

We add used entries and signal the guest in the worker thread, but did
not poll the virtqueue during the zero copy callback. This may lead to
missed used-ring updates and guest signalling during zerocopy. Solve
this by polling the virtqueue from the callback and letting it wake up
the worker.
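
A sketch of the callback side (vq_from_ubuf() is a hypothetical helper
standing in for the real bookkeeping):

  static void vhost_zerocopy_callback(struct ubuf_info *ubuf)
  {
          struct vhost_virtqueue *vq = vq_from_ubuf(ubuf); /* hypothetical */

          /* kick the worker so it adds used entries and signals the guest */
          vhost_poll_queue(&vq->poll);
  }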

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# ca8f4fb2 08-Apr-2012 Michael S. Tsirkin <mst@redhat.com>

skbuff: struct ubuf_info callback type safety

The skb struct ubuf_info callback gets passed struct ubuf_info
itself, not the arg value as the field name and the function signature
seem to imply. Rename the arg field to ctx to match usage,
add documentation and change the callback argument type
to make usage clear and to have compiler check correctness.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c6daa7ff 25-Nov-2011 Cong Wang <amwang@redhat.com>

vhost: remove the second argument of k[un]map_atomic()

Signed-off-by: Cong Wang <amwang@redhat.com>


# ea5d4046 27-Nov-2011 Michael S. Tsirkin <mst@redhat.com>

vhost: fix release path lockdep checks

We shouldn't hold any locks on release path. Pass a flag to
vhost_dev_cleanup to use the lockdep info correctly.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>


# d550dda1 27-Feb-2012 Nadav Har'El <nyh@math.technion.ac.il>

vhost: don't forget to schedule()

This is a tiny, but important, patch to vhost.

Vhost's worker thread only called schedule() when it had no work to do, and
it wanted to go to sleep. But if there's always work to do, e.g., the guest
is running a network-intensive program like netperf with small message sizes,
schedule() was *never* called. This had several negative implications (on
non-preemptive kernels):

1. Passing time was not properly accounted to the "vhost" process (ps and
top would wrongly show it using zero CPU time).

2. Sometimes error messages about RCU timeouts would be printed, if the
core running the vhost thread didn't schedule() for a very long time.

3. Worst of all, a vhost thread would "hog" the core. If several vhost
threads need to share the same core, typically one would get most of the
CPU time (and its associated guest most of the performance), while the
others hardly get any work done.

The trivial solution is to add

  if (need_resched())
          schedule();

after doing every piece of work. This will not do the heavy schedule()
all the time, just when the timer interrupt decides a reschedule is
warranted (so need_resched returns true).

Thanks to Abel Gordon for this patch.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# b834226b 19-Jul-2011 Michael S. Tsirkin <mst@redhat.com>

vhost: optimize interrupt enable/disable

As we now only update used ring after enabling
the backend, we can write flags with __put_user:
as that's done on data path, it matters.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 75fd9edc 19-Jul-2011 Michael S. Tsirkin <mst@redhat.com>

vhost: fix zcopy reference counting

Fix get/put refcount imbalance with zero copy,
which caused qemu to hang forever on guest driver unload.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 2723feaa 21-Jun-2011 Jason Wang <jasowang@redhat.com>

vhost: set log when updating used flags or avail event

We need to log writes when updating used flags and avail event
fields. Otherwise the guest may see a stale value after migration and
miss notifying the host.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# f59281da 21-Jun-2011 Jason Wang <jasowang@redhat.com>

vhost: init used ring after backend was set

Move the used ring initialization to after the backend is set. This
makes it possible to disable the backend, tweak the used ring, and
then restart. This will also make it possible to log used ring writes
correctly.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# bab632d6 17-Jul-2011 Michael S. Tsirkin <mst@redhat.com>

vhost: vhost TX zero-copy support

From: Shirley Ma <mashirle@us.ibm.com>

This adds experimental zero copy support in vhost-net,
disabled by default. To enable, set
experimental_zcopytx module option to 1.

This patch maintains the outstanding userspace buffers in the sequence
they are delivered to vhost. The outstanding userspace buffers will be
marked as done once the lower device has finished the DMA on them.
This is monitored through the last-reference kfree_skb callback. Two
buffer indices are used for this purpose.

The vhost-net device passes the userspace buffer info to the lower
device skb through message control. DMA done status checking and guest
notification are handled by handle_tx: in the worst case all buffers
in the vq are in pending/done status, so we need to notify the guest
to release DMA done buffers first before we can get any new buffers
from the vq.

One known problem is that if the guest stops submitting
buffers, buffers might never get used until some
further action, e.g. device reset. This does not
seem to affect linux guests.

Signed-off-by: Shirley <xma@us.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8ea8cf89 19-May-2011 Michael S. Tsirkin <mst@redhat.com>

vhost: support event index

Support the new event index feature. When acked,
utilize it to reduce the # of interrupts sent to the guest.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 61516587 06-May-2011 Rob Landley <rob@landley.net>

Correct occurrences of
- Documentation/kvm/ to Documentation/virtual/kvm
- Documentation/uml/ to Documentation/virtual/uml
- Documentation/lguest/ to Documentation/virtual/lguest
throughout the kernel source tree.

Signed-off-by: Rob Landley <rob@landley.net>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>


# fcc042a2 06-Mar-2011 Michael S. Tsirkin <mst@redhat.com>

vhost: copy_from_user -> __copy_from_user

copy_from_user is pretty high in the perf top profile;
replacing it with __copy_from_user helps.
It's also safe because we do access_ok checks during setup.
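
For example, on the descriptor fetch path (illustrative):

  if (unlikely(__copy_from_user(&desc, vq->desc + i, sizeof desc))) {
          vq_err(vq, "Failed to get descriptor: idx %d addr %p\n",
                 i, vq->desc + i);
          return -EFAULT;
  }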

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# d47effe1 01-Mar-2011 Krishna Kumar <krkumar2@in.ibm.com>

vhost: Cleanup vhost.c and net.c

Minor cleanup of vhost.c and net.c to match coding style.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 0174b0c3 10-Jan-2011 Michael S. Tsirkin <mst@redhat.com>

vhost: fix signed/unsigned comparison

To detect that a sequence number is done, we do math on unsigned
integers, so the result is unsigned too - not what was intended for
the <= comparison. The result is a user stuck forever in the flush
call. Convert to int to fix this.

Further, get rid of ({}) to make the code clearer.
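
A sketch of the fixed comparison:

  static bool vhost_work_seq_done(struct vhost_dev *dev,
                                  struct vhost_work *work, unsigned seq)
  {
          int left;

          spin_lock_irq(&dev->work_lock);
          left = seq - work->done_seq; /* signed: may go negative */
          spin_unlock_irq(&dev->work_lock);
          return left <= 0;
  }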

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 28831ee6 29-Nov-2010 Michael S. Tsirkin <mst@redhat.com>

vhost: better variable name in logging

We really store a page offset in write_address,
so rename it write_page to avoid confusion.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 3bf9be40 29-Nov-2010 Michael S. Tsirkin <mst@redhat.com>

vhost: correctly set bits of dirty pages

Fix two bugs in dirty page logging:
When counting pages we should increase address by 1 instead of
VHOST_PAGE_SIZE. Make log_write() correctly process requests
that cross pages with write_address not starting at page boundary.

Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# bf5e0bd2 14-Nov-2010 Michael S. Tsirkin <mst@redhat.com>

vhost: remove unused include

vhost.c does not need to know about sockets,
don't include sock.h

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# e4dde731 29-Nov-2010 Michael S. Tsirkin <mst@redhat.com>

vhost: correctly set bits of dirty pages

Fix two bugs in dirty page logging:
When counting pages we should increase address by 1 instead of
VHOST_PAGE_SIZE. Make log_write() correctly process requests
that cross pages with write_address not starting at page boundary.

Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 8b7347aa 19-Sep-2010 Michael S. Tsirkin <mst@redhat.com>

vhost: get/put_user -> __get/__put_user

We do access_ok checks on all ring values on an ioctl,
so we don't need to redo them on each access.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# dfe5ac5b 21-Sep-2010 Michael S. Tsirkin <mst@redhat.com>

vhost: copy_to_user -> __copy_to_user

We do access_ok checks at setup time, so we don't need to
redo them on each access.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 64e1c807 06-Oct-2010 Michael S. Tsirkin <mst@redhat.com>

vhost-net: batch use/unuse mm

Move use/unuse mm to vhost.c which makes it possible to batch these
operations.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 533a19b4 06-Oct-2010 Michael S. Tsirkin <mst@redhat.com>

vhost: put mm after thread stop

makes it possible to batch use/unuse mm

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 3fcedec7 25-Oct-2010 Julia Lawall <julia@diku.dk>

drivers/vhost/vhost.c: delete double assignment

Delete successive assignments to the same location.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression i;
@@

*i = ...;
i = ...;
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 6d97e55f 11-Oct-2010 Dan Carpenter <error27@gmail.com>

vhost: fix return code for log_access_ok()

access_ok() returns 1 if it's OK otherwise it should return 0.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# e0e9b406 14-Sep-2010 Jason Wang <jasowang@redhat.com>

vhost: max s/g to match qemu

Qemu supports up to UIO_MAXIOV s/g so we have to match that because guest
drivers may rely on this.

Allocate the indirect and log arrays dynamically to avoid using too
much contiguous memory, and make the length of the hdr array match the
header length since each iovec entry has at least one byte.

Tested by copying large files w/ and w/o migration in both Linux and
Windows guests.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 5786aee8 21-Sep-2010 Michael S. Tsirkin <mst@redhat.com>

vhost: fix log ctx signalling

The log eventfd signalling got put in dead code.
We didn't notice because qemu currently does polling
instead of eventfd select.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 615cc221 02-Sep-2010 Michael S. Tsirkin <mst@redhat.com>

vhost: error handling fix

vhost should set worker to NULL on cgroups attach failure,
so that we won't try to destroy the worker again on close.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 87d6a412 02-Sep-2010 Michael S. Tsirkin <mst@redhat.com>

vhost: fix attach to cgroups regression

Since 2.6.36-rc1, non-root users of vhost-net fail to attach
if they are in any cgroups.

The reason is that when qemu uses vhost, vhost wants to attach
its thread to all cgroups that qemu has. But we got the API backwards,
so a non-privileged process (Qemu) tried to control
the privileged one (vhost), which fails.

Fix this by switching to the new cgroup_attach_task_all,
and running it from the vhost thread.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 78b620ce 30-Aug-2010 Eric Dumazet <eric.dumazet@gmail.com>

vhost: stop worker only if created

It's currently illegal to call kthread_stop(NULL).

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 28457ee6 09-Mar-2010 Arnd Bergmann <arnd@relay.de.ibm.com>

vhost: add __rcu annotations

Also add rcu_dereference_protected() for code paths where locks are held.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>


# 8dd014ad 27-Jul-2010 David Stevens <dlstevens@us.ibm.com>

vhost-net: mergeable buffers support

This adds support for mergeable buffers in vhost-net: this is needed
for older guests without indirect buffer support, as well
as for zero copy with some devices.

Includes changes by Michael S. Tsirkin to make the
patch as low risk as possible (i.e., close to no changes
when feature is disabled).

Signed-off-by: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 9e3d1957 27-Jul-2010 Michael S. Tsirkin <mst@redhat.com>

vhost: apply cgroup to vhost workers

Apply the cgroup of the owner task to the created vhost worker.

Based on patches from Sridhar Samudrala's and Tejun Heo.
Later we'll need to also apply cpumask and probably priority
of the owner process.

Discussion on the best way to do this is still ongoing.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Sridhar Samudrala <samudrala.sridhar@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>


# c23f3445 02-Jun-2010 Tejun Heo <tj@kernel.org>

vhost: replace vhost_workqueue with per-vhost kthread

Replace vhost_workqueue with per-vhost kthread. Other than callback
argument change from struct work_struct * to struct vhost_work *,
there's no visible change to vhost_poll_*() interface.

This conversion is to make each vhost use a dedicated kthread so that
resource control via cgroup can be applied.

Partially based on Sridhar Samudrala's patch.

* Updated to use sub structure vhost_work instead of directly using
vhost_poll at Michael's suggestion.

* Added flusher wake_up() optimization at Michael's suggestion.

Changes by MST:
* Converted atomics/barrier use to a spinlock.
* Create thread on SET_OWNER
* Fix flushing

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Sridhar Samudrala <samudrala.sridhar@gmail.com>


# 7b3384fc 01-Jul-2010 Michael S. Tsirkin <mst@redhat.com>

vhost: add unlikely annotations to error path

patch 'break out of polling loop on error' caused
a minor performance regression on my machine: recover
that performance by adding a bunch of unlikely annotations
in the error handling.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# d5675bd2 24-Jun-2010 Michael S. Tsirkin <mst@redhat.com>

vhost: break out of polling loop on error

When ring parsing fails, we currently handle this as a ring empty
condition. This means that we enable kicks and recheck whether the
ring is empty: if it is not empty, we re-start polling, which of
course will fail again.

Instead, let's return a negative error code and stop polling.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# a02c3789 27-May-2010 Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>

vhost: fix the memory leak which will happen when memory_access_ok fails

We need to free newmem when vhost_set_memory() fails to complete.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 7ad9c9d2 27-May-2010 Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>

vhost: fix to check the return value of copy_to/from_user() correctly

copy_to/from_user() returns the number of bytes that could not be
copied.

So we need to check whether it is non-zero, and in that case we should
return -EFAULT rather than directly returning the return value of
copy_to/from_user().
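
For example (illustrative):

  if (copy_from_user(&s, argp, sizeof s))
          return -EFAULT; /* not: return copy_from_user(...); */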

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# f8322fbe 26-May-2010 Michael S. Tsirkin <mst@redhat.com>

vhost: whitespace fix

Fix up whitespace in vq_memory_access_ok.

Reported-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 0f3d9a17 24-May-2010 Krishna Kumar <krkumar2@in.ibm.com>

vhost: Fix host panic if ioctl called with wrong index

Missed a boundary value check in vhost_set_vring. The host panics if
idx == nvqs is used in ioctl commands in vhost_virtqueue_init.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 4be929be 24-May-2010 Alexey Dobriyan <adobriyan@gmail.com>

kernel-wide: replace USHORT_MAX, SHORT_MAX and SHORT_MIN with USHRT_MAX, SHRT_MAX and SHRT_MIN

- C99 knows about USHRT_MAX/SHRT_MAX/SHRT_MIN, not
USHORT_MAX/SHORT_MAX/SHORT_MIN.

- Make SHRT_MIN of type s16, not int, for consistency.

[akpm@linux-foundation.org: fix drivers/dma/timb_dma.c]
[akpm@linux-foundation.org: fix security/keys/keyring.c]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0d499356 11-May-2010 Michael S. Tsirkin <mst@redhat.com>

vhost: fix barrier pairing

According to memory-barriers.txt, an smp memory barrier in the guest
should always be paired with an smp memory barrier in the host, and I
quote: "a lack of appropriate pairing is almost certainly an error".
In the case of vhost, failure to flush out the used index update
before looking at the interrupt disable flag could result in missed
interrupts, leading to a networking hang under stress.

This can happen when the flags read bypasses the used index write. We
then see interrupts disabled and do not interrupt, while at the same
time the guest writes the flags value to enable interrupts, reads an
old used index value, thinks the used ring is empty and waits for an
interrupt.

Note: the barrier we pair with here is in
drivers/virtio/virtio_ring.c, function
vring_enable_cb.
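
The host side of the pairing, as a sketch:

  /* flush out the used index update before reading the flags */
  smp_mb();
  if (__get_user(flags, &vq->avail->flags)) {
          vq_err(vq, "Failed to get flags");
          return true;
  }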

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Juan Quintela <quintela@redhat.com>


# a8d3782f 13-Apr-2010 Christoph Hellwig <hch@infradead.org>

vhost: fix sparse warnings

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 179b284e 07-Apr-2010 Jeff Dike <jdike@addtoit.com>

vhost-net: fix vq_memory_access_ok error checking

vq_memory_access_ok needs to check whether mem == NULL

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 5a0e3ad6 24-Mar-2010 Tejun Heo <tj@kernel.org>

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include
those headers directly instead of assuming availability. As this
conversion needs to touch a large number of source files, the
following script is used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers, which should be easily discoverable on most builds of the
specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>


# 535297a6 17-Mar-2010 Michael S. Tsirkin <mst@redhat.com>

vhost: fix error handling in vring ioctls

Stanse found a locking problem in vhost_set_vring:
several returns from VHOST_SET_VRING_KICK, VHOST_SET_VRING_CALL,
VHOST_SET_VRING_ERR with the vq->mutex held.
Fix these up.

Reported-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: Laurent Chavey <chavey@google.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# d6db3f5c 23-Feb-2010 Michael S. Tsirkin <mst@redhat.com>

vhost: fix get_user_pages_fast error handling

get_user_pages_fast returns the number of pages on success and a
negative value on failure, but never 0. Fix the vhost code to match
this logic.
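
A sketch of the corrected handling:

  r = get_user_pages_fast(log, 1, 1, &page);
  if (r < 0)
          return r;
  BUG_ON(r != 1); /* get_user_pages_fast never returns 0 here */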

Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 73a99f08 23-Feb-2010 Michael S. Tsirkin <mst@redhat.com>

vhost: initialize log eventfd context pointer

The vq log eventfd context pointer needs to be initialized; otherwise
operations may fail or oops if logging is enabled but the log eventfd
was not set by userspace. When the log_ctx for the device is created,
it is copied to each vq. This reset was missing.

Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 86e9424d 17-Feb-2010 Michael S. Tsirkin <mst@redhat.com>

vhost: logging thinko fix

vhost was doing some complex math to get the offset to log at, and got
it wrong by a couple of bytes, while in fact it's simple: take the
address where we write, subtract the start of the buffer, and add the
log base.

Do it this way.

Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 5659338c 01-Feb-2010 Michael S. Tsirkin <mst@redhat.com>

vhost-net: switch to smp barriers

vhost-net only uses memory barriers to control SMP effects
(communication with userspace potentially running on a different CPU),
so it should use SMP barriers and not mandatory barriers for memory
access ordering, as suggested by Documentation/memory-barriers.txt

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3a4d5c94 13-Jan-2010 Michael S. Tsirkin <mst@redhat.com>

vhost_net: a kernel-level virtio server

What it is: vhost net is a character device that can be used to reduce
the number of system calls involved in virtio networking.
Existing virtio net code is used in the guest without modification.

There's similarity with vringfd, with some differences and reduced scope
- uses eventfd for signalling
- structures can be moved around in memory at any time (good for
migration, bug work-arounds in userspace)
- write logging is supported (good for migration)
- support memory table and not just an offset (needed for kvm)

common virtio related code has been put in a separate file vhost.c and
can be made into a separate module if/when more backends appear. I used
Rusty's lguest.c as the source for developing this part: it supplied
me with witty comments I wouldn't be able to write myself.

What it is not: vhost net is not a bus, and not a generic new system
call. No assumptions are made on how guest performs hypercalls.
Userspace hypervisors are supported as well as kvm.

How it works: Basically, we connect virtio frontend (configured by
userspace) to a backend. The backend could be a network device, or a tap
device. Backend is also configured by userspace, including vlan/mac
etc.

Status: This works for me, and I haven't seen any crashes.
Compared to userspace, people reported improved latency (as I save up to
4 system calls per packet), as well as better bandwidth and CPU
utilization.

Features that I plan to look at in the future:
- mergeable buffers
- zero copy
- scalability tuning: figure out the best threading model to use

Note on RCU usage (this is also documented in vhost.h, near
private_pointer which is the value protected by this variant of RCU):
what is happening is that the rcu_dereference() is being used in a
workqueue item. The role of rcu_read_lock() is taken on by the start of
execution of the workqueue item, of rcu_read_unlock() by the end of
execution of the workqueue item, and of synchronize_rcu() by
flush_workqueue()/flush_work(). In the future we might need to apply
some gcc attribute or sparse annotation to the function passed to
INIT_WORK(). Paul's ack below is for this RCU usage.

(Includes fixes by Alan Cox <alan@linux.intel.com>,
David L Stevens <dlstevens@us.ibm.com>,
Chris Wright <chrisw@redhat.com>)

Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>