#
1c71222e |
|
26-Jan-2023 |
Suren Baghdasaryan <surenb@google.com> |
mm: replace vma->vm_flags direct modifications with modifier calls Replace direct modifications to vma->vm_flags with calls to modifier functions to be able to track flag changes and to keep vma locking correctness. [akpm@linux-foundation.org: fix drivers/misc/open-dice.c, per Hyeonggon Yoo] Link: https://lkml.kernel.org/r/20230126193752.297968-5-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Oskolkov <posk@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
5c13a4a0 |
|
02-Oct-2022 |
M. Vefa Bicakci <m.v.b@runbox.com> |
xen/gntdev: Accommodate VMA splitting Prior to this commit, the gntdev driver code did not handle the following scenario correctly with paravirtualized (PV) Xen domains: * User process sets up a gntdev mapping composed of two grant mappings (i.e., two pages shared by another Xen domain). * User process munmap()s one of the pages. * User process munmap()s the remaining page. * User process exits. In the scenario above, the user process would cause the kernel to log the following messages in dmesg for the first munmap(), and the second munmap() call would result in similar log messages: BUG: Bad page map in process doublemap.test pte:... pmd:... page:0000000057c97bff refcount:1 mapcount:-1 \ mapping:0000000000000000 index:0x0 pfn:... ... page dumped because: bad pte ... file:gntdev fault:0x0 mmap:gntdev_mmap [xen_gntdev] readpage:0x0 ... Call Trace: <TASK> dump_stack_lvl+0x46/0x5e print_bad_pte.cold+0x66/0xb6 unmap_page_range+0x7e5/0xdc0 unmap_vmas+0x78/0xf0 unmap_region+0xa8/0x110 __do_munmap+0x1ea/0x4e0 __vm_munmap+0x75/0x120 __x64_sys_munmap+0x28/0x40 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x61/0xcb ... For each munmap() call, the Xen hypervisor (if built with CONFIG_DEBUG) would print out the following and trigger a general protection fault in the affected Xen PV domain: (XEN) d0v... Attempt to implicitly unmap d0's grant PTE ... (XEN) d0v... Attempt to implicitly unmap d0's grant PTE ... As of this writing, gntdev_grant_map structure's vma field (referred to as map->vma below) is mainly used for checking the start and end addresses of mappings. However, with split VMAs, these may change, and there could be more than one VMA associated with a gntdev mapping. Hence, remove the use of map->vma and rely on map->pages_vm_start for the original start address and on (map->count << PAGE_SHIFT) for the original mapping size. Let the invalidate() and find_special_page() hooks use these. Also, given that there can be multiple VMAs associated with a gntdev mapping, move the "mmu_interval_notifier_remove(&map->notifier)" call to the end of gntdev_put_map, so that the MMU notifier is only removed after the closing of the last remaining VMA. Finally, use an atomic to prevent inadvertent gntdev mapping re-use, instead of using the map->live_grants atomic counter and/or the map->vma pointer (the latter of which is now removed). This prevents the userspace from mmap()'ing (with MAP_FIXED) a gntdev mapping over the same address range as a previously set up gntdev mapping. This scenario can be summarized with the following call-trace, which was valid prior to this commit: mmap gntdev_mmap mmap (repeat mmap with MAP_FIXED over the same address range) gntdev_invalidate unmap_grant_pages (sets 'being_removed' entries to true) gnttab_unmap_refs_async unmap_single_vma gntdev_mmap (maps the shared pages again) munmap gntdev_invalidate unmap_grant_pages (no-op because 'being_removed' entries are true) unmap_single_vma (For PV domains, Xen reports that a granted page is being unmapped and triggers a general protection fault in the affected domain, if Xen was built with CONFIG_DEBUG) The fix for this last scenario could be worth its own commit, but we opted for a single commit, because removing the gntdev_grant_map structure's vma field requires guarding the entry to gntdev_mmap(), and the live_grants atomic counter is not sufficient on its own to prevent the mmap() over a pre-existing mapping. Link: https://github.com/QubesOS/qubes-issues/issues/7631 Fixes: ab31523c2fca ("xen/gntdev: allow usermode to map granted pages") Cc: stable@vger.kernel.org Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20221002222006.2077-3-m.v.b@runbox.com Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
0991028c |
|
02-Oct-2022 |
M. Vefa Bicakci <m.v.b@runbox.com> |
xen/gntdev: Prevent leaking grants Prior to this commit, if a grant mapping operation failed partially, some of the entries in the map_ops array would be invalid, whereas all of the entries in the kmap_ops array would be valid. This in turn would cause the following logic in gntdev_map_grant_pages to become invalid: for (i = 0; i < map->count; i++) { if (map->map_ops[i].status == GNTST_okay) { map->unmap_ops[i].handle = map->map_ops[i].handle; if (!use_ptemod) alloced++; } if (use_ptemod) { if (map->kmap_ops[i].status == GNTST_okay) { if (map->map_ops[i].status == GNTST_okay) alloced++; map->kunmap_ops[i].handle = map->kmap_ops[i].handle; } } } ... atomic_add(alloced, &map->live_grants); Assume that use_ptemod is true (i.e., the domain mapping the granted pages is a paravirtualized domain). In the code excerpt above, note that the "alloced" variable is only incremented when both kmap_ops[i].status and map_ops[i].status are set to GNTST_okay (i.e., both mapping operations are successful). However, as also noted above, there are cases where a grant mapping operation fails partially, breaking the assumption of the code excerpt above. The aforementioned causes map->live_grants to be incorrectly set. In some cases, all of the map_ops mappings fail, but all of the kmap_ops mappings succeed, meaning that live_grants may remain zero. This in turn makes it impossible to unmap the successfully grant-mapped pages pointed to by kmap_ops, because unmap_grant_pages has the following snippet of code at its beginning: if (atomic_read(&map->live_grants) == 0) return; /* Nothing to do */ In other cases where only some of the map_ops mappings fail but all kmap_ops mappings succeed, live_grants is made positive, but when the user requests unmapping the grant-mapped pages, __unmap_grant_pages_done will then make map->live_grants negative, because the latter function does not check if all of the pages that were requested to be unmapped were actually unmapped, and the same function unconditionally subtracts "data->count" (i.e., a value that can be greater than map->live_grants) from map->live_grants. The side effects of a negative live_grants value have not been studied. The net effect of all of this is that grant references are leaked in one of the above conditions. In Qubes OS v4.1 (which uses Xen's grant mechanism extensively for X11 GUI isolation), this issue manifests itself with warning messages like the following to be printed out by the Linux kernel in the VM that had granted pages (that contain X11 GUI window data) to dom0: "g.e. 0x1234 still pending", especially after the user rapidly resizes GUI VM windows (causing some grant-mapping operations to partially or completely fail, due to the fact that the VM unshares some of the pages as part of the window resizing, making the pages impossible to grant-map from dom0). The fix for this issue involves counting all successful map_ops and kmap_ops mappings separately, and then adding the sum to live_grants. During unmapping, only the number of successfully unmapped grants is subtracted from live_grants. The code is also modified to check for negative live_grants values after the subtraction and warn the user. Link: https://github.com/QubesOS/qubes-issues/issues/7631 Fixes: dbe97cff7dd9 ("xen/gntdev: Avoid blocking in unmap_grant_pages()") Cc: stable@vger.kernel.org Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com> Acked-by: Demi Marie Obenour <demi@invisiblethingslab.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20221002222006.2077-2-m.v.b@runbox.com Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
166d3863 |
|
10-Jul-2022 |
Demi Marie Obenour <demi@invisiblethingslab.com> |
xen/gntdev: Ignore failure to unmap INVALID_GRANT_HANDLE The error paths of gntdev_mmap() can call unmap_grant_pages() even though not all of the pages have been successfully mapped. This will trigger the WARN_ON()s in __unmap_grant_pages_done(). The number of warnings can be very large; I have observed thousands of lines of warnings in the systemd journal. Avoid this problem by only warning on unmapping failure if the handle being unmapped is not INVALID_GRANT_HANDLE. The handle field of any page that was not successfully mapped will be INVALID_GRANT_HANDLE, so this catches all cases where unmapping can legitimately fail. Fixes: dbe97cff7dd9 ("xen/gntdev: Avoid blocking in unmap_grant_pages()") Cc: stable@vger.kernel.org Suggested-by: Juergen Gross <jgross@suse.com> Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20220710230522.1563-1-demi@invisiblethingslab.com Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
dbe97cff |
|
21-Jun-2022 |
Demi Marie Obenour <demi@invisiblethingslab.com> |
xen/gntdev: Avoid blocking in unmap_grant_pages() unmap_grant_pages() currently waits for the pages to no longer be used. In https://github.com/QubesOS/qubes-issues/issues/7481, this lead to a deadlock against i915: i915 was waiting for gntdev's MMU notifier to finish, while gntdev was waiting for i915 to free its pages. I also believe this is responsible for various deadlocks I have experienced in the past. Avoid these problems by making unmap_grant_pages async. This requires making it return void, as any errors will not be available when the function returns. Fortunately, the only use of the return value is a WARN_ON(), which can be replaced by a WARN_ON when the error is detected. Additionally, a failed call will not prevent further calls from being made, but this is harmless. Because unmap_grant_pages is now async, the grant handle will be sent to INVALID_GRANT_HANDLE too late to prevent multiple unmaps of the same handle. Instead, a separate bool array is allocated for this purpose. This wastes memory, but stuffing this information in padding bytes is too fragile. Furthermore, it is necessary to grab a reference to the map before making the asynchronous call, and release the reference when the call returns. It is also necessary to guard against reentrancy in gntdev_map_put(), and to handle the case where userspace tries to map a mapping whose contents have not all been freed yet. Fixes: 745282256c75 ("xen/gntdev: safely unmap grants in case they are still in use") Cc: stable@vger.kernel.org Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20220622022726.2538-1-demi@invisiblethingslab.com Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
ce2f46f3 |
|
10-Dec-2021 |
Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> |
xen/gntdev: fix unmap notification order While working with Xen's libxenvchan library I have faced an issue with unmap notifications sent in wrong order if both UNMAP_NOTIFY_SEND_EVENT and UNMAP_NOTIFY_CLEAR_BYTE were requested: first we send an event channel notification and then clear the notification byte which renders in the below inconsistency (cli_live is the byte which was requested to be cleared on unmap): [ 444.514243] gntdev_put_map UNMAP_NOTIFY_SEND_EVENT map->notify.event 6 libxenvchan_is_open cli_live 1 [ 444.515239] __unmap_grant_pages UNMAP_NOTIFY_CLEAR_BYTE at 14 Thus it is not possible to reliably implement the checks like - wait for the notification (UNMAP_NOTIFY_SEND_EVENT) - check the variable (UNMAP_NOTIFY_CLEAR_BYTE) because it is possible that the variable gets checked before it is cleared by the kernel. To fix that we need to re-order the notifications, so the variable is first gets cleared and then the event channel notification is sent. With this fix I can see the correct order of execution: [ 54.522611] __unmap_grant_pages UNMAP_NOTIFY_CLEAR_BYTE at 14 [ 54.537966] gntdev_put_map UNMAP_NOTIFY_SEND_EVENT map->notify.event 6 libxenvchan_is_open cli_live 0 Cc: stable@vger.kernel.org Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20211210092817.580718-1-andr2000@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
f28347cc |
|
17-Sep-2021 |
Jan Beulich <jbeulich@suse.com> |
Xen/gntdev: don't ignore kernel unmapping error While working on XSA-361 and its follow-ups, I failed to spot another place where the kernel mapping part of an operation was not treated the same as the user space part. Detect and propagate errors and add a 2nd pr_debug(). Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/c2513395-74dc-aea3-9192-fd265aa44e35@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
30dcc56b |
|
30-Jul-2021 |
Juergen Gross <jgross@suse.com> |
xen: assume XENFEAT_gnttab_map_avail_bits being set for pv guests XENFEAT_gnttab_map_avail_bits is always set in Xen 4.0 and newer. Remove coding assuming it might be zero. Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20210730071804.4302-4-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
970655aa |
|
22-Apr-2021 |
Juergen Gross <jgross@suse.com> |
xen/gntdev: fix gntdev_mmap() error exit path Commit d3eeb1d77c5d0af ("xen/gntdev: use mmu_interval_notifier_insert") introduced an error in gntdev_mmap(): in case the call of mmu_interval_notifier_insert_locked() fails the exit path should not call mmu_interval_notifier_remove(), as this might result in NULL dereferences. One reason for failure is e.g. a signal pending for the running process. Fixes: d3eeb1d77c5d0af ("xen/gntdev: use mmu_interval_notifier_insert") Cc: stable@vger.kernel.org Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Luca Fancellu <luca.fancellu@arm.com> Link: https://lore.kernel.org/r/20210423054038.26696-1-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
f1d20d86 |
|
10-Mar-2021 |
Jan Beulich <jbeulich@suse.com> |
Xen/gntdev: don't needlessly use kvcalloc() Requesting zeroed memory when all of it will be overwritten subsequently by all ones is a waste of processing bandwidth. In fact, rather than recording zeroed ->grants[], fill that array too with more appropriate "invalid" indicators. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/9a726be2-4893-8ffe-0ef1-b70dd1c229b1@suse.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
#
bce21a2b |
|
10-Mar-2021 |
Jan Beulich <jbeulich@suse.com> |
Xen/gnttab: introduce common INVALID_GRANT_{HANDLE,REF} It's not helpful if every driver has to cook its own. Generalize xenbus'es INVALID_GRANT_HANDLE and pcifront's INVALID_GRANT_REF (which shouldn't have expanded to zero to begin with). Use the constants in p2m.c and gntdev.c right away, and update field types where necessary so they would match with the constants' types (albeit without touching struct ioctl_gntdev_grant_ref's ref field, as that's part of the public interface of the kernel and would require introducing a dependency on Xen's grant_table.h public header). Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/db7c38a5-0d75-d5d1-19de-e5fe9f0b9c48@suse.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
#
36caa3fe |
|
10-Mar-2021 |
Jan Beulich <jbeulich@suse.com> |
Xen/gntdev: don't needlessly allocate k{,un}map_ops[] They're needed only in the not-auto-translate (i.e. PV) case; there's no point in allocating memory that's never going to get accessed. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/180d50cb-5531-8952-4bf0-d65c554638ed@suse.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
#
ebee0eab |
|
15-Feb-2021 |
Jan Beulich <jbeulich@suse.com> |
Xen/gntdev: correct error checking in gntdev_map_grant_pages() Failure of the kernel part of the mapping operation should also be indicated as an error to the caller, or else it may assume the respective kernel VA is okay to access. Furthermore gnttab_map_refs() failing still requires recording successfully mapped handles, so they can be unmapped subsequently. This in turn requires there to be a way to tell full hypercall failure from partial success - preset map_op status fields such that they won't "happen" to look as if the operation succeeded. Also again use GNTST_okay instead of implying its value (zero). This is part of XSA-361. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: stable@vger.kernel.org Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
dbe52836 |
|
15-Feb-2021 |
Jan Beulich <jbeulich@suse.com> |
Xen/gntdev: correct dev_bus_addr handling in gntdev_map_grant_pages() We may not skip setting the field in the unmap structure when GNTMAP_device_map is in use - such an unmap would fail to release the respective resources (a page ref in the hypervisor). Otoh the field doesn't need setting at all when GNTMAP_device_map is not in use. To record the value for unmapping, we also better don't use our local p2m: In particular after a subsequent change it may not have got updated for all the batch elements. Instead it can simply be taken from the respective map's results. We can additionally avoid playing this game altogether for the kernel part of the mappings in (x86) PV mode. This is part of XSA-361. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: stable@vger.kernel.org Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
d6bbc2ff |
|
05-Sep-2020 |
Souptick Joarder <jrdr.linux@gmail.com> |
xen/gntdev.c: Convert get_user_pages*() to pin_user_pages*() In 2019, we introduced pin_user_pages*() and now we are converting get_user_pages*() to the new API as appropriate. [1] & [2] could be referred for more information. This is case 5 as per document [1]. [1] Documentation/core-api/pin_user_pages.rst [2] "Explicit pinning of user-space pages": https://lwn.net/Articles/807108/ Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: David Vrabel <david.vrabel@citrix.com> Link: https://lore.kernel.org/r/1599375114-32360-2-git-send-email-jrdr.linux@gmail.com Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
#
77905584 |
|
05-Sep-2020 |
Souptick Joarder <jrdr.linux@gmail.com> |
xen/gntdev.c: Mark pages as dirty There seems to be a bug in the original code when gntdev_get_page() is called with writeable=true then the page needs to be marked dirty before being put. To address this, a bool writeable is added in gnt_dev_copy_batch, set it in gntdev_grant_copy_seg() (and drop `writeable` argument to gntdev_get_page()) and then, based on batch->writeable, use set_page_dirty_lock(). Fixes: a4cdb556cae0 (xen/gntdev: add ioctl for grant copy) Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1599375114-32360-1-git-send-email-jrdr.linux@gmail.com Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
#
c1e8d7c6 |
|
08-Jun-2020 |
Michel Lespinasse <walken@google.com> |
mmap locking API: convert mmap_sem comments Convert comments that reference mmap_sem to reference mmap_lock instead. [akpm@linux-foundation.org: fix up linux-next leftovers] [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil] [akpm@linux-foundation.org: more linux-next fixups, per Michel] Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d8ed45c5 |
|
08-Jun-2020 |
Michel Lespinasse <walken@google.com> |
mmap locking API: use coccinelle to convert mmap_sem rwsem call sites This change converts the existing mmap_sem rwsem calls to use the new mmap locking API instead. The change is generated using coccinelle with the following rule: // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir . @@ expression mm; @@ ( -init_rwsem +mmap_init_lock | -down_write +mmap_write_lock | -down_write_killable +mmap_write_lock_killable | -down_write_trylock +mmap_write_trylock | -up_write +mmap_write_unlock | -downgrade_write +mmap_write_downgrade | -down_read +mmap_read_lock | -down_read_killable +mmap_read_lock_killable | -down_read_trylock +mmap_read_trylock | -up_read +mmap_read_unlock ) -(&mm->mmap_sem) +(mm) Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0102e4ef |
|
23-Mar-2020 |
Yan Yankovskyi <yyankovskyi@gmail.com> |
xen: Use evtchn_type_t as a type for event channels Make event channel functions pass event channel port using evtchn_port_t type. It eliminates signed <-> unsigned conversion. Signed-off-by: Yan Yankovskyi <yyankovskyi@gmail.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20200323152343.GA28422@kbp1-lhp-F74019 Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
92937241 |
|
28-Jan-2020 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
xen/gntdev: Do not use mm notifiers with autotranslating guests Commit d3eeb1d77c5d ("xen/gntdev: use mmu_interval_notifier_insert") missed a test for use_ptemod when calling mmu_interval_read_begin(). Fix that. Fixes: d3eeb1d77c5d ("xen/gntdev: use mmu_interval_notifier_insert") CC: stable@vger.kernel.org # 5.5 Reported-by: Ilpo Järvinen <ilpo.jarvinen@cs.helsinki.fi> Tested-by: Ilpo Järvinen <ilpo.jarvinen@cs.helsinki.fi> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Juergen Gross <jgross@suse.com>
|
#
b3f7931f |
|
06-Nov-2019 |
Juergen Gross <jgross@suse.com> |
xen/gntdev: switch from kcalloc() to kvcalloc() With sufficient many pages to map gntdev can reach order 9 allocation sizes. As there is no need to have physically contiguous buffers switch to kvcalloc() in order to avoid failing allocations. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
3b06ac67 |
|
06-Nov-2019 |
Juergen Gross <jgross@suse.com> |
xen/gntdev: replace global limit of mapped pages by limit per call Today there is a global limit of pages mapped via /dev/xen/gntdev set to 1 million pages per default. There is no reason why that limit is existing, as total number of grant mappings is limited by the hypervisor anyway and preferring kernel mappings over userspace ones doesn't make sense. It should be noted that the gntdev device is usable by root only. Additionally checking of that limit is fragile, as the number of pages to map via one call is specified in a 32-bit unsigned variable which isn't tested to stay within reasonable limits (the only test is the value to be <= zero, which basically excludes only calls without any mapping requested). So trying to map e.g. 0xffff0000 pages while already nearly 1000000 pages are mapped will effectively lower the global number of mapped pages such that a parallel call mapping a reasonable amount of pages can succeed in spite of the global limit being violated. So drop the global limit and introduce per call limit instead. This per call limit (default: 65536 grant mappings) protects against allocating insane large arrays in the kernel for doing a hypercall which will fail anyway in case a user is e.g. trying to map billions of pages. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
d41b26d8 |
|
10-Nov-2019 |
Colin Ian King <colin.king@canonical.com> |
xen/gntdev: remove redundant non-zero check on ret The non-zero check on ret is always going to be false because ret was initialized as zero and the only place it is set to non-zero contains a return path before the non-zero check. Hence the check is redundant and can be removed. [ jgross@suse.com: limit scope of ret ] Addresses-Coverity: ("Logically dead code") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
d3eeb1d7 |
|
12-Nov-2019 |
Jason Gunthorpe <jgg@ziepe.ca> |
xen/gntdev: use mmu_interval_notifier_insert gntdev simply wants to monitor a specific VMA for any notifier events, this can be done straightforwardly using mmu_interval_notifier_insert() over the VMA's VA range. The notifier should be attached until the original VMA is destroyed. It is unclear if any of this is even sane, but at least a lot of duplicate code is removed. Link: https://lore.kernel.org/r/20191112202231.3856-15-jgg@ziepe.ca Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
ee7f5225 |
|
08-Oct-2019 |
Rob Herring <robh@kernel.org> |
xen: Stop abusing DT of_dma_configure API As the removed comments say, these aren't DT based devices. of_dma_configure() is going to stop allowing a NULL DT node and calling it will no longer work. The comment is also now out of date as of commit 9ab91e7c5c51 ("arm64: default to the direct mapping in get_arch_dma_ops"). Direct mapping is now the default rather than dma_dummy_ops. According to Stefano and Oleksandr, the only other part needed is setting the DMA masks and there's no reason to restrict the masks to 32-bits. So set the masks to 64 bits. Cc: Robin Murphy <robin.murphy@arm.com> Cc: Julien Grall <julien.grall@arm.com> Cc: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Cc: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: xen-devel@lists.xenproject.org Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
#
8d1502f6 |
|
30-Jul-2019 |
Souptick Joarder <jrdr.linux@gmail.com> |
xen/gntdev.c: Replace vm_map_pages() with vm_map_pages_zero() 'commit df9bde015a72 ("xen/gntdev.c: convert to use vm_map_pages()")' breaks gntdev driver. If vma->vm_pgoff > 0, vm_map_pages() will: - use map->pages starting at vma->vm_pgoff instead of 0 - verify map->count against vma_pages()+vma->vm_pgoff instead of just vma_pages(). In practice, this breaks using a single gntdev FD for mapping multiple grants. relevant strace output: [pid 857] ioctl(7, IOCTL_GNTDEV_MAP_GRANT_REF, 0x7ffd3407b6d0) = 0 [pid 857] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 7, 0) = 0x777f1211b000 [pid 857] ioctl(7, IOCTL_GNTDEV_SET_UNMAP_NOTIFY, 0x7ffd3407b710) = 0 [pid 857] ioctl(7, IOCTL_GNTDEV_MAP_GRANT_REF, 0x7ffd3407b6d0) = 0 [pid 857] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 7, 0x1000) = -1 ENXIO (No such device or address) details here: https://github.com/QubesOS/qubes-issues/issues/5199 The reason is -> ( copying Marek's word from discussion) vma->vm_pgoff is used as index passed to gntdev_find_map_index. It's basically using this parameter for "which grant reference to map". map struct returned by gntdev_find_map_index() describes just the pages to be mapped. Specifically map->pages[0] should be mapped at vma->vm_start, not vma->vm_start+vma->vm_pgoff*PAGE_SIZE. When trying to map grant with index (aka vma->vm_pgoff) > 1, __vm_map_pages() will refuse to map it because it will expect map->count to be at least vma_pages(vma)+vma->vm_pgoff, while it is exactly vma_pages(vma). Converting vm_map_pages() to use vm_map_pages_zero() will fix the problem. Marek has tested and confirmed the same. Cc: stable@vger.kernel.org # v5.2+ Fixes: df9bde015a72 ("xen/gntdev.c: convert to use vm_map_pages()") Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
8b1e0f81 |
|
11-Jul-2019 |
Anshuman Khandual <anshuman.khandual@arm.com> |
mm/pgtable: drop pgtable_t variable from pte_fn_t functions Drop the pgtable_t variable from all implementation for pte_fn_t as none of them use it. apply_to_pte_range() should stop computing it as well. Should help us save some cycles. Link: http://lkml.kernel.org/r/1556803126-26596-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Matthew Wilcox <willy@infradead.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <jglisse@redhat.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
df9bde01 |
|
13-May-2019 |
Souptick Joarder <jrdr.linux@gmail.com> |
xen/gntdev.c: convert to use vm_map_pages() Convert to use vm_map_pages() to map range of kernel memory to user vma. map->count is passed to vm_map_pages() and internal API verify map->count against count ( count = vma_pages(vma)) for page array boundary overrun condition. Link: http://lkml.kernel.org/r/88e56e82d2db98705c2d842e9c9806c00b366d67.1552921225.git.jrdr.linux@gmail.com Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Airlie <airlied@linux.ie> Cc: Heiko Stuebner <heiko@sntech.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Cc: Pawel Osciak <pawel@osciak.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Sandy Huang <hjc@rock-chips.com> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Thierry Reding <treding@nvidia.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
dfcd6660 |
|
13-May-2019 |
Jérôme Glisse <jglisse@redhat.com> |
mm/mmu_notifier: convert user range->blockable to helper function Use the mmu_notifier_range_blockable() helper function instead of directly dereferencing the range->blockable field. This is done to make it easier to change the mmu_notifier range field. This patch is the outcome of the following coccinelle patch: %<------------------------------------------------------------------- @@ identifier I1, FN; @@ FN(..., struct mmu_notifier_range *I1, ...) { <... -I1->blockable +mmu_notifier_range_blockable(I1) ...> } ------------------------------------------------------------------->% spatch --in-place --sp-file blockable.spatch --dir . Link: http://lkml.kernel.org/r/20190326164747.24405-3-jglisse@redhat.com Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Cc: Christian König <christian.koenig@amd.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Christian Koenig <christian.koenig@amd.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
73b0140b |
|
13-May-2019 |
Ira Weiny <ira.weiny@intel.com> |
mm/gup: change GUP fast to use flags rather than a write 'bool' To facilitate additional options to get_user_pages_fast() change the singular write parameter to be gup_flags. This patch does not change any functionality. New functionality will follow in subsequent patches. Some of the get_user_pages_fast() call sites were unchanged because they already passed FOLL_WRITE or 0 for the write parameter. NOTE: It was suggested to change the ordering of the get_user_pages_fast() arguments to ensure that callers were converted. This breaks the current GUP call site convention of having the returned pages be the final parameter. So the suggestion was rejected. Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Mike Marshall <hubcap@omnibond.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Hogan <jhogan@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
fa13e665 |
|
14-Feb-2019 |
Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> |
xen/gntdev: Do not destroy context while dma-bufs are in use If there are exported DMA buffers which are still in use and grant device is closed by either normal user-space close or by a signal this leads to the grant device context to be destroyed, thus making it not possible to correctly destroy those exported buffers when they are returned back to gntdev and makes the module crash: [ 339.617540] [<ffff00000854c0d8>] dmabuf_exp_ops_release+0x40/0xa8 [ 339.617560] [<ffff00000867a6e8>] dma_buf_release+0x60/0x190 [ 339.617577] [<ffff0000082211f0>] __fput+0x88/0x1d0 [ 339.617589] [<ffff000008221394>] ____fput+0xc/0x18 [ 339.617607] [<ffff0000080ed4e4>] task_work_run+0x9c/0xc0 [ 339.617622] [<ffff000008089714>] do_notify_resume+0xfc/0x108 Fix this by referencing gntdev on each DMA buffer export and unreferencing on buffer release. Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Reviewed-by: Boris Ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
5d6527a7 |
|
28-Dec-2018 |
Jérôme Glisse <jglisse@redhat.com> |
mm/mmu_notifier: use structure for invalidate_range_start/end callback Patch series "mmu notifier contextual informations", v2. This patchset adds contextual information, why an invalidation is happening, to mmu notifier callback. This is necessary for user of mmu notifier that wish to maintains their own data structure without having to add new fields to struct vm_area_struct (vma). For instance device can have they own page table that mirror the process address space. When a vma is unmap (munmap() syscall) the device driver can free the device page table for the range. Today we do not have any information on why a mmu notifier call back is happening and thus device driver have to assume that it is always an munmap(). This is inefficient at it means that it needs to re-allocate device page table on next page fault and rebuild the whole device driver data structure for the range. Other use case beside munmap() also exist, for instance it is pointless for device driver to invalidate the device page table when the invalidation is for the soft dirtyness tracking. Or device driver can optimize away mprotect() that change the page table permission access for the range. This patchset enables all this optimizations for device drivers. I do not include any of those in this series but another patchset I am posting will leverage this. The patchset is pretty simple from a code point of view. The first two patches consolidate all mmu notifier arguments into a struct so that it is easier to add/change arguments. The last patch adds the contextual information (munmap, protection, soft dirty, clear, ...). This patch (of 3): To avoid having to change many callback definition everytime we want to add a parameter use a structure to group all parameters for the mmu_notifier invalidate_range_start/end callback. No functional changes with this patch. [akpm@linux-foundation.org: fix drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c kerneldoc] Link: http://lkml.kernel.org/r/20181205053628.3210-2-jglisse@redhat.com Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Jason Gunthorpe <jgg@mellanox.com> [infiniband] Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
58a57569 |
|
04-Sep-2018 |
Michal Hocko <mhocko@suse.com> |
xen/gntdev: fix up blockable calls to mn_invl_range_start Patch series "mmu_notifiers follow ups". Tetsuo has noticed some fallouts from 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers"). One of them has been fixed and picked up by AMD/DRM maintainer [1]. XEN issue is fixed by patch 1. I have also clarified expectations about blockable semantic of invalidate_range_end. Finally the last patch removes MMU_INVALIDATE_DOES_NOT_BLOCK which is no longer used nor needed. [1] http://lkml.kernel.org/r/20180824135257.GU29735@dhcp22.suse.cz This patch (of 3): 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers") has introduced blockable parameter to all mmu_notifiers and the notifier has to back off when called in !blockable case and it could block down the road. The above commit implemented that for mn_invl_range_start but both in_range checks are done unconditionally regardless of the blockable mode and as such they would fail all the time for regular calls. Fix this by checking blockable parameter as well. Once we are there we can remove the stale TODO. The lock has to be sleepable because we wait for completion down in gnttab_unmap_refs_sync. Link: http://lkml.kernel.org/r/20180827112623.8992-2-mhocko@kernel.org Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers") Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: David Rientjes <rientjes@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
#
93065ac7 |
|
21-Aug-2018 |
Michal Hocko <mhocko@suse.com> |
mm, oom: distinguish blockable mode for mmu notifiers There are several blockable mmu notifiers which might sleep in mmu_notifier_invalidate_range_start and that is a problem for the oom_reaper because it needs to guarantee a forward progress so it cannot depend on any sleepable locks. Currently we simply back off and mark an oom victim with blockable mmu notifiers as done after a short sleep. That can result in selecting a new oom victim prematurely because the previous one still hasn't torn its memory down yet. We can do much better though. Even if mmu notifiers use sleepable locks there is no reason to automatically assume those locks are held. Moreover majority of notifiers only care about a portion of the address space and there is absolutely zero reason to fail when we are unmapping an unrelated range. Many notifiers do really block and wait for HW which is harder to handle and we have to bail out though. This patch handles the low hanging fruit. __mmu_notifier_invalidate_range_start gets a blockable flag and callbacks are not allowed to sleep if the flag is set to false. This is achieved by using trylock instead of the sleepable lock for most callbacks and continue as long as we do not block down the call chain. I think we can improve that even further because there is a common pattern to do a range lookup first and then do something about that. The first part can be done without a sleeping lock in most cases AFAICS. The oom_reaper end then simply retries if there is at least one notifier which couldn't make any progress in !blockable mode. A retry loop is already implemented to wait for the mmap_sem and this is basically the same thing. The simplest way for driver developers to test this code path is to wrap userspace code which uses these notifiers into a memcg and set the hard limit to hit the oom. This can be done e.g. after the test faults in all the mmu notifier managed memory and set the hard limit to something really small. Then we are looking for a proper process tear down. [akpm@linux-foundation.org: coding style fixes] [akpm@linux-foundation.org: minor code simplification] Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp Reported-by: David Rientjes <rientjes@google.com> Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Sudeep Dutt <sudeep.dutt@intel.com> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
932d6562 |
|
19-Jul-2018 |
Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> |
xen/gntdev: Add initial support for dma-buf UAPI Add UAPI and IOCTLs for dma-buf grant device driver extension: the extension allows userspace processes and kernel modules to use Xen backed dma-buf implementation. With this extension grant references to the pages of an imported dma-buf can be exported for other domain use and grant references coming from a foreign domain can be converted into a local dma-buf for local export. Implement basic initialization and stubs for Xen DMA buffers' support. Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
#
1d314567 |
|
19-Jul-2018 |
Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> |
xen/gntdev: Make private routines/structures accessible This is in preparation for adding support of DMA buffer functionality: make map/unmap related code and structures, used privately by gntdev, ready for dma-buf extension, which will re-use these. Rename corresponding structures as those become non-private to gntdev now. Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
#
975ef7ff |
|
19-Jul-2018 |
Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> |
xen/gntdev: Allow mappings for DMA buffers Allow mappings for DMA backed buffers if grant table module supports such: this extends grant device to not only map buffers made of balloon pages, but also from buffers allocated with dma_alloc_xxx. Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
#
cf2acf66 |
|
08-Jan-2018 |
Ross Lagerwall <ross.lagerwall@citrix.com> |
xen/gntdev: Fix partial gntdev_mmap() cleanup When cleaning up after a partially successful gntdev_mmap(), unmap the successfully mapped grant pages otherwise Xen will kill the domain if in debug mode (Attempt to implicitly unmap a granted PTE) or Linux will kill the process and emit "BUG: Bad page map in process" if Xen is in release mode. This is only needed when use_ptemod is true because gntdev_put_map() will unmap grant pages itself when use_ptemod is false. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
#
951a0102 |
|
08-Jan-2018 |
Ross Lagerwall <ross.lagerwall@citrix.com> |
xen/gntdev: Fix off-by-one error when unmapping with holes If the requested range has a hole, the calculation of the number of pages to unmap is off by one. Fix it. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
#
298d275d |
|
25-Oct-2017 |
Juergen Gross <jgross@suse.com> |
xen/gntdev: avoid out of bounds access in case of partial gntdev_mmap() In case gntdev_mmap() succeeds only partially in mapping grant pages it will leave some vital information uninitialized needed later for cleanup. This will lead to an out of bounds array access when unmapping the already mapped pages. So just initialize the data needed for unmapping the pages a little bit earlier. Cc: <stable@vger.kernel.org> Reported-by: Arthur Borsboom <arthurborsboom@gmail.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
#
a81461b0 |
|
31-Aug-2017 |
Jérôme Glisse <jglisse@redhat.com> |
xen/gntdev: update to new mmu_notifier semantic Calls to mmu_notifier_invalidate_page() were replaced by calls to mmu_notifier_invalidate_range() and are now bracketed by calls to mmu_notifier_invalidate_range_start()/end() Remove now useless invalidate_page callback. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Roger Pau Monné <roger.pau@citrix.com> Cc: xen-devel@lists.xenproject.org (moderated for non-subscribers) Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c5f7c5a9 |
|
06-Mar-2017 |
Elena Reshetova <elena.reshetova@intel.com> |
drivers, xen: convert grant_map.users from atomic_t to refcount_t refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
#
6e84f315 |
|
08-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h> We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/mm.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. The APIs that are going to be moved first are: mm_alloc() __mmdrop() mmdrop() mmdrop_async_fn() mmdrop_async() mmget_not_zero() mmput() mmput_async() get_task_mm() mm_access() mm_release() Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
30faaafd |
|
21-Nov-2016 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
xen/gntdev: Use VM_MIXEDMAP instead of VM_IO to avoid NUMA balancing Commit 9c17d96500f7 ("xen/gntdev: Grant maps should not be subject to NUMA balancing") set VM_IO flag to prevent grant maps from being subjected to NUMA balancing. It was discovered recently that this flag causes get_user_pages() to always fail with -EFAULT. check_vma_flags __get_user_pages __get_user_pages_locked __get_user_pages_unlocked get_user_pages_fast iov_iter_get_pages dio_refill_pages do_direct_IO do_blockdev_direct_IO do_blockdev_direct_IO ext4_direct_IO_read generic_file_read_iter aio_run_iocb (which can happen if guest's vdisk has direct-io-safe option). To avoid this let's use VM_MIXEDMAP flag instead --- it prevents NUMA balancing just as VM_IO does and has no effect on check_vma_flags(). Cc: stable@vger.kernel.org Reported-by: Olaf Hering <olaf@aepfle.de> Suggested-by: Hugh Dickins <hughd@google.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Hugh Dickins <hughd@google.com> Tested-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
c7ebf9d9 |
|
23-May-2016 |
Muhammad Falak R Wani <falakreyaz@gmail.com> |
xen: use vma_pages(). Replace explicit computation of vma page count by a call to vma_pages(). Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
36ae220a |
|
09-May-2016 |
David Vrabel <david.vrabel@citrix.com> |
xen/gntdev: reduce copy batch size to 16 IOCTL_GNTDEV_GRANT_COPY batches copy operations to reduce the number of hypercalls. The stack is used to avoid a memory allocation in a hot path. However, a batch size of 24 requires more than 1024 bytes of stack which in some configurations causes a compiler warning. xen/gntdev.c: In function ‘gntdev_ioctl_grant_copy’: xen/gntdev.c:949:1: warning: the frame size of 1248 bytes is larger than 1024 bytes [-Wframe-larger-than=] This is a harmless warning as there is still plenty of stack spare, but people keep trying to "fix" it. Reduce the batch size to 16 to reduce stack usage to less than 1024 bytes. This should have minimal impact on performance. Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
a4cdb556 |
|
02-Dec-2014 |
David Vrabel <david.vrabel@citrix.com> |
xen/gntdev: add ioctl for grant copy Add IOCTL_GNTDEV_GRANT_COPY to allow applications to copy between user space buffers and grant references. This interface is similar to the GNTTABOP_copy hypercall ABI except the local buffers are provided using a virtual address (instead of a GFN and offset). To avoid userspace from having to page align its buffers the driver will use two or more ops if required. If the ioctl returns 0, the application must check the status of each segment with the segments status field. If the ioctl returns a -ve error code (EINVAL or EFAULT), the status of individual ops is undefined. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
#
b9c0a92a |
|
29-Nov-2015 |
Julia Lawall <Julia.Lawall@lip6.fr> |
xen/gntdev: constify mmu_notifier_ops structures This mmu_notifier_ops structure is never modified, so declare it as const, like the other mmu_notifier_ops structures. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
9c17d965 |
|
10-Nov-2015 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
xen/gntdev: Grant maps should not be subject to NUMA balancing Doing so will cause the grant to be unmapped and then, during fault handling, the fault to be mistakenly treated as NUMA hint fault. In addition, even if those maps could partcipate in NUMA balancing, it wouldn't provide any benefit since we are unable to determine physical page's node (even if/when VNUMA is implemented). Marking grant maps' VMAs as VM_IO will exclude them from being part of NUMA balancing. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
7cbea8dc |
|
09-Sep-2015 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
mm: mark most vm_operations_struct const With two exceptions (drm/qxl and drm/radeon) all vm_operations_struct structs should be constant. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Minchan Kim <minchan@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
30b03d05 |
|
25-Jun-2015 |
Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> |
xen/gntdevt: Fix race condition in gntdev_release() While gntdev_release() is called the MMU notifier is still registered and can traverse priv->maps list even if no pages are mapped (which is the case -- gntdev_release() is called after all). But gntdev_release() will clear that list, so make sure that only one of those things happens at the same time. Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Cc: <stable@vger.kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
a9fd60e2 |
|
17-Jun-2015 |
Julien Grall <julien.grall@citrix.com> |
xen: Include xen/page.h rather than asm/xen/page.h Using xen/page.h will be necessary later for using common xen page helpers. As xen/page.h already include asm/xen/page.h, always use the later. Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: netdev@vger.kernel.org Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
b44166cd |
|
03-Apr-2015 |
Bob Liu <bob.liu@oracle.com> |
xen/grant: introduce func gnttab_unmap_refs_sync() There are several place using gnttab async unmap and wait for completion, so move the common code to a function gnttab_unmap_refs_sync(). Signed-off-by: Bob Liu <bob.liu@oracle.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
dab069c6 |
|
18-Dec-2014 |
David Vrabel <david.vrabel@citrix.com> |
xen/gntdev: provide find_special_page VMA operation For a PV guest, use the find_special_page op to find the right page. To handle VMAs being split, remember the start of the original VMA so the correct index in the pages array can be calculated. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
|
#
923b2919 |
|
18-Dec-2014 |
David Vrabel <david.vrabel@citrix.com> |
xen/gntdev: mark userspace PTEs as special on x86 PV guests In an x86 PV guest, get_user_pages_fast() on a userspace address range containing foreign mappings does not work correctly because the M2P lookup of the MFN from a userspace PTE may return the wrong page. Force get_user_pages_fast() to fail on such addresses by marking the PTEs as special. If Xen has XENFEAT_gnttab_map_avail_bits (available since at least 4.0), we can do so efficiently in the grant map hypercall. Otherwise, it needs to be done afterwards. This is both inefficient and racy (the mapping is visible to the task before we fixup the PTEs), but will be fine for well-behaved applications that do not use the mapping until after the mmap() system call returns. Guests with XENFEAT_auto_translated_physmap (ARM and x86 HVM or PVH) do not need this since get_user_pages() has always worked correctly for them. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
|
#
74528225 |
|
05-Jan-2015 |
Jennifer Herbert <jennifer.herbert@citrix.com> |
xen/gntdev: safely unmap grants in case they are still in use Use gnttab_unmap_refs_async() to wait until the mapped pages are no longer in use before unmapping them. This allows userspace programs to safely use Direct I/O and AIO to a network filesystem which may retain refs to pages in queued skbs after the filesystem I/O has completed. Signed-off-by: Jennifer Herbert <jennifer.herbert@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
1401c00e |
|
09-Jan-2015 |
David Vrabel <david.vrabel@citrix.com> |
xen/gntdev: convert priv->lock to a mutex Unmapping may require sleeping and we unmap while holding priv->lock, so convert it to a mutex. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
|
#
ff4b156f |
|
08-Jan-2015 |
David Vrabel <david.vrabel@citrix.com> |
xen/grant-table: add helpers for allocating pages Add gnttab_alloc_pages() and gnttab_free_pages() to allocate/free pages suitable to for granted maps. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
|
#
853d0289 |
|
05-Jan-2015 |
David Vrabel <david.vrabel@citrix.com> |
xen/grant-table: pre-populate kernel unmap ops for xen_gnttab_unmap_refs() When unmapping grants, instead of converting the kernel map ops to unmap ops on the fly, pre-populate the set of unmap ops. This allows the grant unmap for the kernel mappings to be trivially batched in the future. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
|
#
e85fc980 |
|
03-Feb-2014 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
Revert "xen/grant-table: Avoid m2p_override during mapping" This reverts commit 08ece5bb2312b4510b161a6ef6682f37f4eac8a1. As it breaks ARM builds and needs more attention on the ARM side. Acked-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
08ece5bb |
|
23-Jan-2014 |
Zoltan Kiss <zoltan.kiss@citrix.com> |
xen/grant-table: Avoid m2p_override during mapping The grant mapping API does m2p_override unnecessarily: only gntdev needs it, for blkback and future netback patches it just cause a lock contention, as those pages never go to userspace. Therefore this series does the following: - the original functions were renamed to __gnttab_[un]map_refs, with a new parameter m2p_override - based on m2p_override either they follow the original behaviour, or just set the private flag and call set_phys_to_machine - gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with m2p_override false - a new function gnttab_[un]map_refs_userspace provides the old behaviour It also removes a stray space from page.h and change ret to 0 if XENFEAT_auto_translated_physmap, as that is the only possible return value there. v2: - move the storing of the old mfn in page->index to gnttab_map_refs - move the function header update to a separate patch v3: - a new approach to retain old behaviour where it needed - squash the patches into one v4: - move out the common bits from m2p* functions, and pass pfn/mfn as parameter - clear page->private before doing anything with the page, so m2p_find_override won't race with this v5: - change return value handling in __gnttab_[un]map_refs - remove a stray space in page.h - add detail why ret = 0 now at some places v6: - don't pass pfn to m2p* functions, just get it locally Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Suggested-by: David Vrabel <david.vrabel@citrix.com> Acked-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
6926f6d6 |
|
03-Jan-2014 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/pvh: Piggyback on PVHVM for grant driver (v4) In PVH the shared grant frame is the PFN and not MFN, hence its mapped via the same code path as HVM. The allocation of the grant frame is done differently - we do not use the early platform-pci driver and have an ioremap area - instead we use balloon memory and stitch all of the non-contingous pages in a virtualized area. That means when we call the hypervisor to replace the GMFN with a XENMAPSPACE_grant_table type, we need to lookup the old PFN for every iteration instead of assuming a flat contingous PFN allocation. Lastly, we only use v1 for grants. This is because PVHVM is not able to use v2 due to no XENMEM_add_to_physmap calls on the error status page (see commit 69e8f430e243d657c2053f097efebc2e2cd559f0 xen/granttable: Disable grant v2 for HVM domains.) Until that is implemented this workaround has to be in place. Also per suggestions by Stefano utilize the PVHVM paths as they share common functionality. v2 of this patch moves most of the PVH code out in the arch/x86/xen/grant-table driver and touches only minimally the generic driver. v3, v4: fixes us some of the code due to earlier patches. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
|
#
ee072640 |
|
23-Jul-2013 |
Stefano Stabellini <stefano.stabellini@eu.citrix.com> |
xen/m2p: use GNTTABOP_unmap_and_replace to reinstate the original mapping GNTTABOP_unmap_grant_ref unmaps a grant and replaces it with a 0 mapping instead of reinstating the original mapping. Doing so separately would be racy. To unmap a grant and reinstate the original mapping atomically we use GNTTABOP_unmap_and_replace. GNTTABOP_unmap_and_replace doesn't work with GNTMAP_contains_pte, so don't use it for kmaps. GNTTABOP_unmap_and_replace zeroes the mapping passed in new_addr so we have to reinstate it, however that is a per-cpu mapping only used for balloon scratch pages, so we can be sure that it's not going to be accessed while the mapping is not valid. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> CC: alex@alex.org.uk CC: dcrisan@flexiant.com [v1: Konrad fixed up the conflicts] Conflicts: arch/x86/xen/p2m.c
|
#
283c0972 |
|
28-Jun-2013 |
Joe Perches <joe@perches.com> |
xen: Convert printks to pr_<level> Convert printks to pr_<level> (excludes printk(KERN_DEBUG...) to be more consistent throughout the xen subsystem. Add pr_fmt with KBUILD_MODNAME or "xen:" KBUILD_MODNAME Coalesce formats and add missing word spaces Add missing newlines Align arguments and reflow to 80 columns Remove DRV_NAME from formats as pr_fmt adds the same content This does change some of the prefixes of these messages but it also does make them more consistent. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
1affa98d |
|
02-Jan-2013 |
Daniel De Graaf <dgdegra@tycho.nsa.gov> |
xen/gntdev: remove erronous use of copy_to_user Since there is now a mapping of granted pages in kernel address space in both PV and HVM, use it for UNMAP_NOTIFY_CLEAR_BYTE instead of accessing memory via copy_to_user and triggering sleep-in-atomic warnings. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
16a1d022 |
|
02-Jan-2013 |
Daniel De Graaf <dgdegra@tycho.nsa.gov> |
xen/gntdev: correctly unmap unlinked maps in mmu notifier If gntdev_ioctl_unmap_grant_ref is called on a range before unmapping it, the entry is removed from priv->maps and the later call to mn_invl_range_start won't find it to do the unmapping. Fix this by creating another list of freeable maps that the mmu notifier can search and use to unmap grants. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
2512f298 |
|
02-Jan-2013 |
Daniel De Graaf <dgdegra@tycho.nsa.gov> |
xen/gntdev: fix unsafe vma access In gntdev_ioctl_get_offset_for_vaddr, we need to hold mmap_sem while calling find_vma() to avoid potentially having the result freed out from under us. Similarly, the MMU notifier functions need to synchronize with gntdev_vma_close to avoid map->vma being freed during their iteration. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
a67baeb7 |
|
23-Oct-2012 |
David Vrabel <david.vrabel@citrix.com> |
xen/gntdev: don't leak memory from IOCTL_GNTDEV_MAP_GRANT_REF map->kmap_ops allocated in gntdev_alloc_map() wasn't freed by gntdev_put_map(). Add a gntdev_free_map() helper function to free everything allocated by gntdev_alloc_map(). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
314e51b9 |
|
08-Oct-2012 |
Konstantin Khlebnikov <khlebnikov@openvz.org> |
mm: kill vma flag VM_RESERVED and mm->reserved_vm counter A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA, currently it lost original meaning but still has some effects: | effect | alternative flags -+------------------------+--------------------------------------------- 1| account as reserved_vm | VM_IO 2| skip in core dump | VM_IO, VM_DONTDUMP 3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP 4| do not mlock | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP This patch removes reserved_vm counter from mm_struct. Seems like nobody cares about it, it does not exported into userspace directly, it only reduces total_vm showed in proc. Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP. remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP. remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP. [akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup] Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2fc136ee |
|
11-Sep-2012 |
Stefano Stabellini <stefano.stabellini@eu.citrix.com> |
xen/m2p: do not reuse kmap_op->dev_bus_addr If the caller passes a valid kmap_op to m2p_add_override, we use kmap_op->dev_bus_addr to store the original mfn, but dev_bus_addr is part of the interface with Xen and if we are batching the hypercalls it might not have been written by the hypervisor yet. That means that later on Xen will write to it and we'll think that the original mfn is actually what Xen has written to it. Rather than "stealing" struct members from kmap_op, keep using page->index to store the original mfn and add another parameter to m2p_remove_override to get the corresponding kmap_op instead. It is now responsibility of the caller to keep track of which kmap_op corresponds to a particular page in the m2p_override (gntdev, the only user of this interface that passes a valid kmap_op, is already doing that). CC: stable@kernel.org Reported-and-Tested-By: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
b8b0f559 |
|
21-Aug-2012 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/apic/xenbus/swiotlb/pcifront/grant/tmem: Make functions or variables static. There is no need for those functions/variables to be visible. Make them static and also fix the compile warnings of this sort: drivers/xen/<some file>.c: warning: symbol '<blah>' was not declared. Should it be static? Some of them just require including the header file that declares the functions. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
e8e937be |
|
03-Apr-2012 |
Stefano Stabellini <stefano.stabellini@eu.citrix.com> |
xen/gntdev: do not set VM_PFNMAP Since we are using the m2p_override we do have struct pages corresponding to the user vma mmap'ed by gntdev. Removing the VM_PFNMAP flag makes get_user_pages work on that vma. An example test case would be using a Xen userspace block backend (QDISK) on a file on NFS using O_DIRECT. CC: stable@kernel.org Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
7d17e84b |
|
14-Dec-2011 |
Daniel De Graaf <dgdegra@tycho.nsa.gov> |
xen/grant-table: Support mappings required by blkback Add support for mappings without GNTMAP_contains_pte. This was not supported because the unmap operation assumed that this flag was being used; adding a parameter to the unmap operation to allow the PTE clearing to be disabled is sufficient to make unmap capable of supporting either mapping type. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> [v1: Fix cleanpatch warnings] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
0cc678f8 |
|
27-Oct-2011 |
Daniel De Graaf <dgdegra@tycho.nsa.gov> |
xen/gnt{dev,alloc}: reserve event channels for notify When using the unmap notify ioctl, the event channel used for notification needs to be reserved to avoid it being deallocated prior to sending the notification. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
fc6e0c3b |
|
04-Nov-2011 |
Dan Carpenter <dan.carpenter@oracle.com> |
xen-gntdev: integer overflow in gntdev_alloc_map() The multiplications here can overflow resulting in smaller buffer sizes than expected. "count" comes from a copy_from_user(). Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
1f1503ba |
|
11-Oct-2011 |
Daniel De Graaf <dgdegra@tycho.nsa.gov> |
xen/gntdev: Fix sleep-inside-spinlock BUG: sleeping function called from invalid context at /local/scratch/dariof/linux/kernel/mutex.c:271 in_atomic(): 1, irqs_disabled(): 0, pid: 3256, name: qemu-dm 1 lock held by qemu-dm/3256: #0: (&(&priv->lock)->rlock){......}, at: [<ffffffff813223da>] gntdev_ioctl+0x2bd/0x4d5 Pid: 3256, comm: qemu-dm Tainted: G W 3.1.0-rc8+ #5 Call Trace: [<ffffffff81054594>] __might_sleep+0x131/0x135 [<ffffffff816bd64f>] mutex_lock_nested+0x25/0x45 [<ffffffff8131c7c8>] free_xenballooned_pages+0x20/0xb1 [<ffffffff8132194d>] gntdev_put_map+0xa8/0xdb [<ffffffff816be546>] ? _raw_spin_lock+0x71/0x7a [<ffffffff813223da>] ? gntdev_ioctl+0x2bd/0x4d5 [<ffffffff8132243c>] gntdev_ioctl+0x31f/0x4d5 [<ffffffff81007d62>] ? check_events+0x12/0x20 [<ffffffff811433bc>] do_vfs_ioctl+0x488/0x4d7 [<ffffffff81007d4f>] ? xen_restore_fl_direct_reloc+0x4/0x4 [<ffffffff8109168b>] ? lock_release+0x21c/0x229 [<ffffffff81135cdd>] ? rcu_read_unlock+0x21/0x32 [<ffffffff81143452>] sys_ioctl+0x47/0x6a [<ffffffff816bfd82>] system_call_fastpath+0x16/0x1b gntdev_put_map tries to acquire a mutex when freeing pages back to the xenballoon pool, so it cannot be called with a spinlock held. In gntdev_release, the spinlock is not needed as we are freeing the structure later; in the ioctl, only the list manipulation needs to be under the lock. Reported-and-Tested-By: Dario Faggioli <dario.faggioli@citrix.com> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
0930bba6 |
|
29-Sep-2011 |
Stefano Stabellini <stefano.stabellini@eu.citrix.com> |
xen: modify kernel mappings corresponding to granted pages If we want to use granted pages for AIO, changing the mappings of a user vma and the corresponding p2m is not enough, we also need to update the kernel mappings accordingly. Currently this is only needed for pages that are created for user usages through /dev/xen/gntdev. As in, pages that have been in use by the kernel and use the P2M will not need this special mapping. However there are no guarantees that in the future the kernel won't start accessing pages through the 1:1 even for internal usage. In order to avoid the complexity of dealing with highmem, we allocated the pages lowmem. We issue a HYPERVISOR_grant_table_op right away in m2p_add_override and we remove the mappings using another HYPERVISOR_grant_table_op in m2p_remove_override. Considering that m2p_add_override and m2p_remove_override are called once per page we use multicalls and hypercall batching. Use the kmap_op pointer directly as argument to do the mapping as it is guaranteed to be present up until the unmapping is done. Before issuing any unmapping multicalls, we need to make sure that the mapping has already being done, because we need the kmap->handle to be set correctly. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> [v1: Removed GRANT_FRAME_BIT usage] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
693394b8 |
|
29-Sep-2011 |
Stefano Stabellini <stefano.stabellini@eu.citrix.com> |
xen: add an "highmem" parameter to alloc_xenballooned_pages Add an highmem parameter to alloc_xenballooned_pages, to allow callers to request lowmem or highmem pages. Fix the code style of free_xenballooned_pages' prototype. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
7b0ac956 |
|
26-Jul-2011 |
Ruslan Pisarev <ruslan@rpisarev.org.ua> |
Xen: fix braces coding style issue in gntdev.c and grant-table.c This is a patch to the gntdev.c and grant-table.c files that fixed up braces errors found by the checkpatch.pl tools. Signed-off-by: Ruslan Pisarev <ruslan@rpisarev.org.ua> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
a93e20a8 |
|
18-Mar-2011 |
Dan Carpenter <error27@gmail.com> |
xen-gntdev: unlock on error path in gntdev_mmap() We should unlock here and also decrement the number of &map->users. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
12f0258d |
|
18-Mar-2011 |
Dan Carpenter <error27@gmail.com> |
xen-gntdev: return -EFAULT on copy_to_user failure copy_to_user() returns the amount of data remaining to be copied. We want to return a negative error code here. The upper layers just call WARN_ON() if we return non-zero so this doesn't change the behavior. But returning -EFAULT is still cleaner. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
ca47ceaa |
|
09-Mar-2011 |
Daniel De Graaf <dgdegra@tycho.nsa.gov> |
xen-gntdev: Use ballooned pages for grant mappings Grant mappings cause the PFN<->MFN mapping to be lost on the pages used for the mapping. Instead of leaking memory, use pages that have already been ballooned out and so have no valid mapping. This removes the need for the bad-page leak workaround as pages are repopulated by the balloon driver. Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
d79647ae |
|
07-Mar-2011 |
Daniel De Graaf <dgdegra@tycho.nsa.gov> |
xen/gntdev,gntalloc: Remove unneeded VM flags The only time when granted pages need to be treated specially is when using Xen's PTE modification for grant mappings owned by another domain (that is, only gntdev on PV guests). Otherwise, the area does not require VM_DONTCOPY and VM_PFNMAP, since it can be accessed just like any other page of RAM. Since the vm_operations_struct close operations decrement reference counts, a corresponding open function that increments them is required now that it is possible to have multiple references to a single area. We are careful in the gntdev to check if we can remove those flags. The reason that we need to be careful in gntdev on PV guests is because we are not changing the PFN/MFN mapping on PV; instead, we change the application's page tables to point to the other domain's memory. This means that the vma cannot be copied without using another grant mapping hypercall; it also requires special handling on unmap, which is the reason for gntdev's dependency on the MMU notifier. For gntalloc, this is not a concern - the pages are owned by the domain using the gntalloc device, and can be mapped and unmapped in the same manner as any other page of memory. Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v2: Added in git commit "We are.." from email correspondence]
|
#
38eaeb0f |
|
08-Mar-2011 |
Ian Campbell <ian.campbell@citrix.com> |
xen: gntdev: fix build warning addr is actually a virtual address so use an unsigned long. Fixes: CC drivers/xen/gntdev.o drivers/xen/gntdev.c: In function 'map_grant_pages': drivers/xen/gntdev.c:268: warning: cast from pointer to integer of different size Reduce the scope of the variable at the same time. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
f4ee4af4 |
|
23-Feb-2011 |
Daniel De Graaf <dgdegra@tycho.nsa.gov> |
xen-gntdev: Add cast to pointer Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
77c35acb |
|
23-Feb-2011 |
Daniel De Graaf <dgdegra@tycho.nsa.gov> |
xen-gntdev: Fix incorrect use of zero handle The handle with numeric value 0 is a valid map handle, so it cannot be used to indicate that a page has not been mapped. Use -1 instead. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
9960be97 |
|
09-Feb-2011 |
Daniel De Graaf <dgdegra@tycho.nsa.gov> |
xen-gntdev: prevent using UNMAP_NOTIFY_CLEAR_BYTE on read-only mappings Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
12996fc3 |
|
09-Feb-2011 |
Daniel De Graaf <dgdegra@tycho.nsa.gov> |
xen-gntdev: Avoid double-mapping memory If an already-mapped area of the device was mapped into userspace a second time, a hypercall was incorrectly made to remap the memory again. Avoid the hypercall on later mmap calls, and fail the mmap call if a writable mapping is attempted on a read-only range. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
b57c1869 |
|
09-Feb-2011 |
Daniel De Graaf <dgdegra@tycho.nsa.gov> |
xen-gntdev: Avoid unmapping ranges twice In paravirtualized domains, mn_invl_page or mn_invl_range_start can unmap a segment of a mapped region without unmapping all pages. When the region is later released, the pages will be unmapped twice, leading to an incorrect -EINVAL return. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
84e4075d |
|
09-Feb-2011 |
Daniel De Graaf <dgdegra@tycho.nsa.gov> |
xen-gntdev: Use map->vma for checking map validity The is_mapped flag used to be set at the completion of the map operation, but was not checked in all error paths. Use map->vma instead, which will now be cleared if the initial grant mapping fails. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
0ea22f07 |
|
08-Feb-2011 |
Daniel De Graaf <dgdegra@tycho.nsa.gov> |
xen-gntdev: Fix unmap notify on PV domains In paravirtualized guests, the struct page* for mappings is only a placeholder, and cannot be used to access the granted memory. Use the userspace mapping that we have set up in order to implement UNMAP_NOTIFY_CLEAR_BYTE. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
90b6f305 |
|
03-Feb-2011 |
Daniel De Graaf <dgdegra@tycho.nsa.gov> |
xen-gntdev: Fix memory leak when mmap fails The error path did not decrement the reference count of the grant structure. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
bdc612dc |
|
02-Feb-2011 |
Daniel De Graaf <dgdegra@tycho.nsa.gov> |
xen/gntalloc,gntdev: Add unmap notify ioctl This ioctl allows the users of a shared page to be notified when the other end exits abnormally. [v2: updated description in structs] Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
aab8f11a |
|
02-Feb-2011 |
Daniel De Graaf <dgdegra@tycho.nsa.gov> |
xen-gntdev: Support mapping in HVM domains HVM does not allow direct PTE modification, so instead we request that Xen change its internal p2m mappings on the allocated pages and map the memory into userspace normally. Note: The HVM path for map and unmap is slightly different: HVM keeps the pages mapped until the area is deleted, while the PV case (use_ptemod being true) must unmap them when userspace unmaps the range. In the normal use case, this makes no difference to users since unmap time is deletion time. [v2: Expanded commit descr.] Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
68b025c8 |
|
02-Feb-2011 |
Daniel De Graaf <dgdegra@tycho.nsa.gov> |
xen-gntdev: Add reference counting to maps This allows userspace to perform mmap() on the gntdev device and then immediately close the filehandle or remove the mapping using the remove ioctl, with the mapped area remaining valid until unmapped. This also fixes an infinite loop when a gntdev device is closed without first unmapping all areas. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
a879211b |
|
02-Feb-2011 |
Daniel De Graaf <dgdegra@tycho.nsa.gov> |
xen-gntdev: Use find_vma rather than iterating our vma list manually This should be faster if many mappings exist, and also removes the only user of map->vma not related to PTE modification. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
ef91082e |
|
02-Feb-2011 |
Daniel De Graaf <dgdegra@tycho.nsa.gov> |
xen-gntdev: Change page limit to be global instead of per-open Because there is no limitation on how many times a user can open a given device file, an per-file-description limit on the number of pages granted offers little to no benefit. Change to a global limit and remove the ioctl() as the parameter can now be changed via sysfs. Xen tools changeset 22768:f8d801e5573e is needed to eliminate the error this change produces in xc_gnttab_set_max_grants. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
a12b4eb3 |
|
10-Dec-2010 |
Stefano Stabellini <stefano.stabellini@eu.citrix.com> |
xen gntdev: use gnttab_map_refs and gnttab_unmap_refs Use gnttab_map_refs and gnttab_unmap_refs to map and unmap the grant ref, so that we can have a corresponding struct page. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
f0a70c88 |
|
07-Jan-2011 |
Daniel De Graaf <dgdegra@tycho.nsa.gov> |
xen/gntdev: Fix circular locking dependency apply_to_page_range will acquire PTE lock while priv->lock is held, and mn_invl_range_start tries to acquire priv->lock with PTE already held. Fix by not holding priv->lock during the entire map operation. This is safe because map->vma is set nonzero while the lock is held, which will cause subsequent maps to fail and will cause the unmap ioctl (and other users of gntdev_del_map) to return -EBUSY until the area is unmapped. It is similarly impossible for gntdev_vma_close to be called while the vma is still being created. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
ba5d1012 |
|
08-Dec-2010 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen/gntdev: stop using "token" argument It's the struct page of the L1 pte page. But we can get its mfn by simply doing an arbitrary_virt_to_machine() on it anyway (which is the safe conservative choice; since we no longer allow HIGHPTE pages, we would never expect to be operating on a mapped pte page). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
9329e760 |
|
08-Dec-2010 |
Ian Campbell <ian.campbell@citrix.com> |
xen: gntdev: move use of GNTMAP_contains_pte next to the map_op This flag controls the meaning of gnttab_map_grant_ref.host_addr and specifies that the field contains a reference to the pte entry to be used to perform the mapping. Therefore move the use of this flag to the point at which we actually use a reference to the pte instead of something else, splitting up the usage of the flag in this way is confusing and potentially error prone. The other flags are all properties of the mapping itself as opposed to properties of the hypercall arguments and therefore it make sense to continue to pass them round in map->flags. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com> Cc: Derek G. Murray <Derek.Murray@cl.cam.ac.uk> Cc: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
8d3eaea2 |
|
11-Nov-2010 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen/gntdev: add VM_PFNMAP to vma These pages are from other domains, so don't have any local PFN. VM_PFNMAP is the closest concept Linux has to this. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
ab31523c |
|
14-Dec-2010 |
Gerd Hoffmann <kraxel@redhat.com> |
xen/gntdev: allow usermode to map granted pages The gntdev driver allows usermode to map granted pages from other domains. This is typically used to implement a Xen backend driver in user mode. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|