History log of /linux-master/virt/kvm/vfio.c
Revision Date Author Comments
# 73e2f19d 14-Jul-2023 Dmitry Torokhov <dmitry.torokhov@gmail.com>

kvm/vfio: avoid bouncing the mutex when adding and deleting groups

Stop taking kv->lock mutex in kvm_vfio_update_coherency() and instead
call it with this mutex held: the callers of the function usually
already have it taken (and released) before calling
kvm_vfio_update_coherency(). This avoid bouncing the lock up and down.

The exception is kvm_vfio_release() where we do not take the lock, but
it is being executed when the very last reference to kvm_device is being
dropped, so there are no concerns about concurrency.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230714224538.404793-2-dmitry.torokhov@gmail.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>


# 9e0f4f29 14-Jul-2023 Dmitry Torokhov <dmitry.torokhov@gmail.com>

kvm/vfio: ensure kvg instance stays around in kvm_vfio_group_add()

kvm_vfio_group_add() creates kvg instance, links it to kv->group_list,
and calls kvm_vfio_file_set_kvm() with kvg->file as an argument after
dropping kv->lock. If we race group addition and deletion calls, kvg
instance may get freed by the time we get around to calling
kvm_vfio_file_set_kvm().

Previous iterations of the code did not reference kvg->file outside of
the critical section, but used a temporary variable. Still, they had
similar problem of the file reference being owned by kvg structure and
potential for kvm_vfio_group_del() dropping it before
kvm_vfio_group_add() had a chance to complete.

Fix this by moving call to kvm_vfio_file_set_kvm() under the protection
of kv->lock. We already call it while holding the same lock when vfio
group is being deleted, so it should be safe here as well.

Fixes: 2fc1bec15883 ("kvm: set/clear kvm to/from vfio_group when group add/delete")
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20230714224538.404793-1-dmitry.torokhov@gmail.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>


# dcc31ea6 18-Jul-2023 Yi Liu <yi.l.liu@intel.com>

kvm/vfio: Accept vfio device file from userspace

This defines KVM_DEV_VFIO_FILE* and make alias with KVM_DEV_VFIO_GROUP*.
Old userspace uses KVM_DEV_VFIO_GROUP* works as well.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-6-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>


# 2f99073a 18-Jul-2023 Yi Liu <yi.l.liu@intel.com>

kvm/vfio: Prepare for accepting vfio device fd

This renames kvm_vfio_group related helpers to prepare for accepting
vfio device fd. No functional change is intended.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-5-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>


# b1a59be8 18-Jul-2023 Yi Liu <yi.l.liu@intel.com>

vfio: Refine vfio file kAPIs for KVM

This prepares for making the below kAPIs to accept both group file
and device file instead of only vfio group file.

bool vfio_file_enforced_coherent(struct file *file);
void vfio_file_set_kvm(struct file *file, struct kvm *kvm);

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20230718135551.6592-3-yi.l.liu@intel.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>


# 51cdc8bc 20-Jan-2023 Yi Liu <yi.l.liu@intel.com>

kvm/vfio: Fix potential deadlock on vfio group_lock

Currently it is possible that the final put of a KVM reference comes from
vfio during its device close operation. This occurs while the vfio group
lock is held; however, if the vfio device is still in the kvm device list,
then the following call chain could result in a deadlock:

VFIO holds group->group_lock/group_rwsem
-> kvm_put_kvm
-> kvm_destroy_vm
-> kvm_destroy_devices
-> kvm_vfio_destroy
-> kvm_vfio_file_set_kvm
-> vfio_file_set_kvm
-> try to hold group->group_lock/group_rwsem

The key function is the kvm_destroy_devices() which triggers destroy cb
of kvm_device_ops. It calls back to vfio and try to hold group_lock. So
if this path doesn't call back to vfio, this dead lock would be fixed.
Actually, there is a way for it. KVM provides another point to free the
kvm-vfio device which is the point when the device file descriptor is
closed. This can be achieved by providing the release cb instead of the
destroy cb. Also rename kvm_vfio_destroy() to be kvm_vfio_release().

/*
* Destroy is responsible for freeing dev.
*
* Destroy may be called before or after destructors are called
* on emulated I/O regions, depending on whether a reference is
* held by a vcpu or other kvm component that gets destroyed
* after the emulated I/O.
*/
void (*destroy)(struct kvm_device *dev);

/*
* Release is an alternative method to free the device. It is
* called when the device file descriptor is closed. Once
* release is called, the destroy method will not be called
* anymore as the device is removed from the device list of
* the VM. kvm->lock is held.
*/
void (*release)(struct kvm_device *dev);

Fixes: 421cfe6596f6 ("vfio: remove VFIO_GROUP_NOTIFY_SET_KVM")
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/20230114000351.115444-1-mjrosato@linux.ibm.com
Link: https://lore.kernel.org/r/20230120150528.471752-1-yi.l.liu@intel.com
[aw: update comment as well, s/destroy/release/]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>


# 819da99a 07-Oct-2022 Jason Gunthorpe <jgg@ziepe.ca>

vfio: Hold a reference to the iommu_group in kvm for SPAPR

SPAPR exists completely outside the normal iommu driver framework, the
groups it creates are fake and are only created to enable VFIO's uAPI.

Thus, it does not need to follow the iommu core rule that the iommu_group
will only be touched while a driver is attached.

Carry a group reference into KVM and have KVM directly manage the lifetime
of this object independently of VFIO. This means KVM no longer relies on
the vfio group file being valid to maintain the group reference.

Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v2-15417f29324e+1c-vfio_group_disassociate_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>


# 4b22ef04 07-Oct-2022 Jason Gunthorpe <jgg@ziepe.ca>

vfio: Add vfio_file_is_group()

This replaces uses of vfio_file_iommu_group() which were only detecting if
the file is a VFIO file with no interest in the actual group.

The only remaning user of vfio_file_iommu_group() is in KVM for the SPAPR
stuff. It passes the iommu_group into the arch code through kvm for some
reason.

Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v2-15417f29324e+1c-vfio_group_disassociate_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>


# 6b17ca8e 16-May-2022 Wan Jiabing <wanjiabing@vivo.com>

kvm/vfio: Fix potential deadlock problem in vfio

Fix following coccicheck warning:
./virt/kvm/vfio.c:258:1-7: preceding lock on line 236

If kvm_vfio_file_iommu_group() failed, code would goto err_fdput with
mutex_lock acquired and then return ret. It might cause potential
deadlock. Move mutex_unlock bellow err_fdput tag to fix it.

Fixes: d55d9e7a45721 ("kvm/vfio: Store the struct file in the kvm_vfio_group")
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220517023441.4258-1-wanjiabing@vivo.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>


# 3e5449d5 04-May-2022 Jason Gunthorpe <jgg@ziepe.ca>

kvm/vfio: Remove vfio_group from kvm

None of the VFIO APIs take in the vfio_group anymore, so we can remove it
completely.

This has a subtle side effect on the enforced coherency tracking. The
vfio_group_get_external_user() was holding on to the container_users which
would prevent the iommu_domain and thus the enforced coherency value from
changing while the group is registered with kvm.

It changes the security proof slightly into 'user must hold a group FD
that has a device that cannot enforce DMA coherence'. As opening the group
FD, not attaching the container, is the privileged operation this doesn't
change the security properties much.

On the flip side it paves the way to changing the iommu_domain/container
attached to a group at runtime which is something that will be required to
support nested translation.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>i
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/7-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>


# ba70a89f 04-May-2022 Jason Gunthorpe <jgg@ziepe.ca>

vfio: Change vfio_group_set_kvm() to vfio_file_set_kvm()

Just change the argument from struct vfio_group to struct file *.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/6-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>


# a905ad04 04-May-2022 Jason Gunthorpe <jgg@ziepe.ca>

vfio: Change vfio_external_check_extension() to vfio_file_enforced_coherent()

Instead of a general extension check change the function into a limited
test if the iommu_domain has enforced coherency, which is the only thing
kvm needs to query.

Make the new op self contained by properly refcounting the container
before touching it.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/5-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>


# c38ff5b0 04-May-2022 Jason Gunthorpe <jgg@ziepe.ca>

vfio: Remove vfio_external_group_match_file()

vfio_group_fops_open() ensures there is only ever one struct file open for
any struct vfio_group at any time:

/* Do we need multiple instances of the group open? Seems not. */
opened = atomic_cmpxchg(&group->opened, 0, 1);
if (opened) {
vfio_group_put(group);
return -EBUSY;

Therefor the struct file * can be used directly to search the list of VFIO
groups that KVM keeps instead of using the
vfio_external_group_match_file() callback to try to figure out if the
passed in FD matches the list or not.

Delete vfio_external_group_match_file().

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/4-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>


# 50d63b5b 04-May-2022 Jason Gunthorpe <jgg@ziepe.ca>

vfio: Change vfio_external_user_iommu_id() to vfio_file_iommu_group()

The only caller wants to get a pointer to the struct iommu_group
associated with the VFIO group file. Instead of returning the group ID
then searching sysfs for that string to get the struct iommu_group just
directly return the iommu_group pointer already held by the vfio_group
struct.

It already has a safe lifetime due to the struct file kref, the vfio_group
and thus the iommu_group cannot be destroyed while the group file is open.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>


# d55d9e7a 04-May-2022 Jason Gunthorpe <jgg@ziepe.ca>

kvm/vfio: Store the struct file in the kvm_vfio_group

Following patches will change the APIs to use the struct file as the handle
instead of the vfio_group, so hang on to a reference to it with the same
duration of as the vfio_group.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>


# 73b0565f 04-May-2022 Jason Gunthorpe <jgg@ziepe.ca>

kvm/vfio: Move KVM_DEV_VFIO_GROUP_* ioctls into functions

To make it easier to read and change in following patches.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>


# d2912cb1 04-Jun-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500

Based on 2 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation #

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 4122 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# b12ce36a 11-Feb-2019 Ben Gardon <bgardon@google.com>

kvm: Add memcg accounting to KVM allocations

There are many KVM kernel memory allocations which are tied to the life of
the VM process and should be charged to the VM process's cgroup. If the
allocations aren't tied to the process, the OOM killer will not know
that killing the process will free the associated kernel memory.
Add __GFP_ACCOUNT flags to many of the allocations which are not yet being
charged to the VM process's cgroup.

Tested:
Ran all kvm-unit-tests on a 64 bit Haswell machine, the patch
introduced no new failures.
Ran a kernel memory accounting test which creates a VM to touch
memory and then checks that the kernel memory allocated for the
process is within certain bounds.
With this patch we account for much more of the vmalloc and slab memory
allocated for the VM.

There remain a few allocations which should be charged to the VM's
cgroup but are not. In they include:
vcpu->run
kvm->coalesced_mmio_ring
There allocations are unaccounted in this patch because they are mapped
to userspace, and accounting them to a cgroup causes problems. This
should be addressed in a future patch.

Signed-off-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 5d6dee80 28-Jun-2017 Alex Williamson <alex.williamson@redhat.com>

vfio: New external user group/file match

At the point where the kvm-vfio pseudo device wants to release its
vfio group reference, we can't always acquire a new reference to make
that happen. The group can be in a state where we wouldn't allow a
new reference to be added. This new helper function allows a caller
to match a file to a group to facilitate this. Given a file and
group, report if they match. Thus the caller needs to already have a
group reference to match to the file. This allows the deletion of a
group without acquiring a new reference.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Cc: stable@vger.kernel.org


# e323369b 28-Jun-2017 Alex Williamson <alex.williamson@redhat.com>

kvm-vfio: Decouple only when we match a group

Unset-KVM and decrement-assignment only when we find the group in our
list. Otherwise we can get out of sync if the user triggers this for
groups that aren't currently on our list.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org


# 121f80ba 21-Mar-2017 Alexey Kardashevskiy <aik@ozlabs.ru>

KVM: PPC: VFIO: Add in-kernel acceleration for VFIO

This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
and H_STUFF_TCE requests targeted an IOMMU TCE table used for VFIO
without passing them to user space which saves time on switching
to user space and back.

This adds H_PUT_TCE/H_PUT_TCE_INDIRECT/H_STUFF_TCE handlers to KVM.
KVM tries to handle a TCE request in the real mode, if failed
it passes the request to the virtual mode to complete the operation.
If it a virtual mode handler fails, the request is passed to
the user space; this is not expected to happen though.

To avoid dealing with page use counters (which is tricky in real mode),
this only accelerates SPAPR TCE IOMMU v2 clients which are required
to pre-register the userspace memory. The very first TCE request will
be handled in the VFIO SPAPR TCE driver anyway as the userspace view
of the TCE table (iommu_table::it_userspace) is not allocated till
the very first mapping happens and we cannot call vmalloc in real mode.

If we fail to update a hardware IOMMU table unexpected reason, we just
clear it and move on as there is nothing really we can do about it -
for example, if we hot plug a VFIO device to a guest, existing TCE tables
will be mirrored automatically to the hardware and there is no interface
to report to the guest about possible failures.

This adds new attribute - KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE - to
the VFIO KVM device. It takes a VFIO group fd and SPAPR TCE table fd
and associates a physical IOMMU table with the SPAPR TCE table (which
is a guest view of the hardware IOMMU table). The iommu_table object
is cached and referenced so we do not have to look up for it in real mode.

This does not implement the UNSET counterpart as there is no use for it -
once the acceleration is enabled, the existing userspace won't
disable it unless a VFIO container is destroyed; this adds necessary
cleanup to the KVM_DEV_VFIO_GROUP_DEL handler.

This advertises the new KVM_CAP_SPAPR_TCE_VFIO capability to the user
space.

This adds real mode version of WARN_ON_ONCE() as the generic version
causes problems with rcu_sched. Since we testing what vmalloc_to_phys()
returns in the code, this also adds a check for already existing
vmalloc_to_phys() call in kvmppc_rm_h_put_tce_indirect().

This finally makes use of vfio_external_user_iommu_id() which was
introduced quite some time ago and was considered for removal.

Tests show that this patch increases transmission speed from 220MB/s
to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card).

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>


# 2fc1bec1 30-Nov-2016 Jike Song <jike.song@intel.com>

kvm: set/clear kvm to/from vfio_group when group add/delete

Sometimes users need to be aware when a vfio_group attaches to a
KVM or detaches from it. KVM already calls get/put method from vfio to
manipulate the vfio_group reference, it can notify vfio_group in
a similar way.

Cc: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Jike Song <jike.song@intel.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>


# 5544eb9b 07-Jul-2015 Paolo Bonzini <pbonzini@redhat.com>

KVM: count number of assigned devices

If there are no assigned devices, the guest PAT are not providing
any useful information and can be overridden to writeback; VMX
always does this because it has the "IPAT" bit in its extended
page table entries, but SVM does not have anything similar.
Hook into VFIO and legacy device assignment so that they
provide this information to KVM.

Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 571ee1b6 09-Oct-2014 Wanpeng Li <wanpeng.li@linux.intel.com>

kvm: vfio: fix unregister kvm_device_ops of vfio

After commit 80ce163 (KVM: VFIO: register kvm_device_ops dynamically),
kvm_device_ops of vfio can be registered dynamically. Commit 3c3c29fd
(kvm-vfio: do not use module_init) move the dynamic register invoked by
kvm_init in order to fix broke unloading of the kvm module. However,
kvm_device_ops of vfio is unregistered after rmmod kvm-intel module
which lead to device type collision detection warning after kvm-intel
module reinsmod.

WARNING: CPU: 1 PID: 10358 at /root/cathy/kvm/arch/x86/kvm/../../../virt/kvm/kvm_main.c:3289 kvm_init+0x234/0x282 [kvm]()
Modules linked in: kvm_intel(O+) kvm(O) nfsv3 nfs_acl auth_rpcgss oid_registry nfsv4 dns_resolver nfs fscache lockd sunrpc pci_stub bridge stp llc autofs4 8021q cpufreq_ondemand ipv6 joydev microcode pcspkr igb i2c_algo_bit ehci_pci ehci_hcd e1000e i2c_i801 ixgbe ptp pps_core hwmon mdio tpm_tis tpm ipmi_si ipmi_msghandler acpi_cpufreq isci libsas scsi_transport_sas button dm_mirror dm_region_hash dm_log dm_mod [last unloaded: kvm_intel]
CPU: 1 PID: 10358 Comm: insmod Tainted: G W O 3.17.0-rc1 #2
Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
0000000000000cd9 ffff880ff08cfd18 ffffffff814a61d9 0000000000000cd9
0000000000000000 ffff880ff08cfd58 ffffffff810417b7 ffff880ff08cfd48
ffffffffa045bcac ffffffffa049c420 0000000000000040 00000000000000ff
Call Trace:
[<ffffffff814a61d9>] dump_stack+0x49/0x60
[<ffffffff810417b7>] warn_slowpath_common+0x7c/0x96
[<ffffffffa045bcac>] ? kvm_init+0x234/0x282 [kvm]
[<ffffffff810417e6>] warn_slowpath_null+0x15/0x17
[<ffffffffa045bcac>] kvm_init+0x234/0x282 [kvm]
[<ffffffffa016e995>] vmx_init+0x1bf/0x42a [kvm_intel]
[<ffffffffa016e7d6>] ? vmx_check_processor_compat+0x64/0x64 [kvm_intel]
[<ffffffff810002ab>] do_one_initcall+0xe3/0x170
[<ffffffff811168a9>] ? __vunmap+0xad/0xb8
[<ffffffff8109c58f>] do_init_module+0x2b/0x174
[<ffffffff8109d414>] load_module+0x43e/0x569
[<ffffffff8109c6d8>] ? do_init_module+0x174/0x174
[<ffffffff8109c75a>] ? copy_module_from_user+0x39/0x82
[<ffffffff8109b7dd>] ? module_sect_show+0x20/0x20
[<ffffffff8109d65f>] SyS_init_module+0x54/0x81
[<ffffffff814a9a12>] system_call_fastpath+0x16/0x1b
---[ end trace 0626f4a3ddea56f3 ]---

The bug can be reproduced by:

rmmod kvm_intel.ko
insmod kvm_intel.ko

without rmmod/insmod kvm.ko
This patch fixes the bug by unregistering kvm_device_ops of vfio when the
kvm-intel module is removed.

Reported-by: Liu Rongrong <rongrongx.liu@intel.com>
Fixes: 3c3c29fd0d7cddc32862c350d0700ce69953e3bd
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 3c3c29fd 24-Sep-2014 Paolo Bonzini <pbonzini@redhat.com>

kvm-vfio: do not use module_init

/me got confused between the kernel and QEMU. In the kernel, you can
only have one module_init function, and it will prevent unloading the
module unless you also have the corresponding module_exit function.

So, commit 80ce1639727e (KVM: VFIO: register kvm_device_ops dynamically,
2014-09-02) broke unloading of the kvm module, by adding a module_init
function and no module_exit.

Repair it by making kvm_vfio_ops_init weak, and checking it in
kvm_init.

Cc: Will Deacon <will.deacon@arm.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Alex Williamson <Alex.Williamson@redhat.com>
Fixes: 80ce1639727e9d38729c34f162378508c307ca25
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 80ce1639 02-Sep-2014 Will Deacon <will@kernel.org>

KVM: VFIO: register kvm_device_ops dynamically

Now that we have a dynamic means to register kvm_device_ops, use that
for the VFIO kvm device, instead of relying on the static table.

This is achieved by a module_init call to register the ops with KVM.

Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Alex Williamson <Alex.Williamson@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 9d830d47 26-Feb-2014 Alex Williamson <alex.williamson@redhat.com>

kvm/vfio: Support for DMA coherent IOMMUs

VFIO now has support for using the IOMMU_CACHE flag and a mechanism
for an external user to test the current operating mode of the IOMMU.
Add support for this to the kvm-vfio pseudo device so that we only
register noncoherent DMA when necessary.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>


# e81d1ad3 09-Jan-2014 Paul Bolle <pebolle@tiscali.nl>

kvm: vfio: silence GCC warning

Building vfio.o triggers a GCC warning (when building for 32 bits x86):
arch/x86/kvm/../../../virt/kvm/vfio.c: In function 'kvm_vfio_set_group':
arch/x86/kvm/../../../virt/kvm/vfio.c:104:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
void __user *argp = (void __user *)arg;
^

Silence this warning by casting arg to unsigned long.

argp's current type, "void __user *", is always casted to "int32_t
__user *". So its type might as well be changed to "int32_t __user *".

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# e0f0bbc5 30-Oct-2013 Alex Williamson <alex.williamson@redhat.com>

kvm: Create non-coherent DMA registeration

We currently use some ad-hoc arch variables tied to legacy KVM device
assignment to manage emulation of instructions that depend on whether
non-coherent DMA is present. Create an interface for this, adapting
legacy KVM device assignment and adding VFIO via the KVM-VFIO device.
For now we assume that non-coherent DMA is possible any time we have a
VFIO group. Eventually an interface can be developed as part of the
VFIO external user interface to query the coherency of a group.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# ec53500f 30-Oct-2013 Alex Williamson <alex.williamson@redhat.com>

kvm: Add VFIO device

So far we've succeeded at making KVM and VFIO mostly unaware of each
other, but areas are cropping up where a connection beyond eventfds
and irqfds needs to be made. This patch introduces a KVM-VFIO device
that is meant to be a gateway for such interaction. The user creates
the device and can add and remove VFIO groups to it via file
descriptors. When a group is added, KVM verifies the group is valid
and gets a reference to it via the VFIO external user interface.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>