#
38620fc4 |
|
20-Feb-2024 |
Roger Pau Monne <roger.pau@citrix.com> |
x86/xen: attempt to inflate the memory balloon on PVH When running as PVH or HVM Linux will use holes in the memory map as scratch space to map grants, foreign domain pages and possibly miscellaneous other stuff. However the usage of such memory map holes for Xen purposes can be problematic. The request of holesby Xen happen quite early in the kernel boot process (grant table setup already uses scratch map space), and it's possible that by then not all devices have reclaimed their MMIO space. It's not unlikely for chunks of Xen scratch map space to end up using PCI bridge MMIO window memory, which (as expected) causes quite a lot of issues in the system. At least for PVH dom0 we have the possibility of using regions marked as UNUSABLE in the e820 memory map. Either if the region is UNUSABLE in the native memory map, or it has been converted into UNUSABLE in order to hide RAM regions from dom0, the second stage translation page-tables can populate those areas without issues. PV already has this kind of logic, where the balloon driver is inflated at boot. Re-use the current logic in order to also inflate it when running as PVH. onvert UNUSABLE regions up to the ratio specified in EXTRA_MEM_RATIO to RAM, while reserving them using xen_add_extra_mem() (which is also moved so it's no longer tied to CONFIG_PV). [jgross: fixed build for CONFIG_PVH without CONFIG_XEN_PVH] Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20240220174341.56131-1-roger.pau@citrix.com Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
db283230 |
|
24-Nov-2023 |
Juergen Gross <jgross@suse.com> |
x86/xen: fix percpu vcpu_info allocation Today the percpu struct vcpu_info is allocated via DEFINE_PER_CPU(), meaning that it could cross a page boundary. In this case registering it with the hypervisor will fail, resulting in a panic(). This can easily be fixed by using DEFINE_PER_CPU_ALIGNED() instead, as struct vcpu_info is guaranteed to have a size of 64 bytes, matching the cache line size of x86 64-bit processors (Xen doesn't support 32-bit processors). Fixes: 5ead97c84fa7 ("xen: Core Xen implementation") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.con> Link: https://lore.kernel.org/r/20231124074852.25161-1-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
37510dd5 |
|
24-Aug-2023 |
Juergen Gross <jgross@suse.com> |
xen: simplify evtchn_do_upcall() call maze There are several functions involved for performing the functionality of evtchn_do_upcall(): - __xen_evtchn_do_upcall() doing the real work - xen_hvm_evtchn_do_upcall() just being a wrapper for __xen_evtchn_do_upcall(), exposed for external callers - xen_evtchn_do_upcall() calling __xen_evtchn_do_upcall(), too, but without any user Simplify this maze by: - removing the unused xen_evtchn_do_upcall() - removing xen_hvm_evtchn_do_upcall() as the only left caller of __xen_evtchn_do_upcall(), while renaming __xen_evtchn_do_upcall() to xen_evtchn_do_upcall() Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
b1c3497e |
|
29-Jul-2022 |
Jane Malalane <jane.malalane@citrix.com> |
x86/xen: Add support for HVMOP_set_evtchn_upcall_vector Implement support for the HVMOP_set_evtchn_upcall_vector hypercall in order to set the per-vCPU event channel vector callback on Linux and use it in preference of HVM_PARAM_CALLBACK_IRQ. If the per-VCPU vector setup is successful on BSP, use this method for the APs. If not, fallback to the global vector-type callback. Also register callback_irq at per-vCPU event channel setup to trick toolstack to think the domain is enlightened. Suggested-by: "Roger Pau Monné" <roger.pau@citrix.com> Signed-off-by: Jane Malalane <jane.malalane@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20220729070416.23306-1-jane.malalane@citrix.com Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
e453f872 |
|
28-Oct-2021 |
Juergen Gross <jgross@suse.com> |
x86/xen: switch initial pvops IRQ functions to dummy ones The initial pvops functions handling irq flags will only ever be called before interrupts are being enabled. So switch them to be dummy functions: - xen_save_fl() can always return 0 - xen_irq_disable() is a nop - xen_irq_enable() can BUG() Add some generic paravirt functions for that purpose. Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20211028072748.29862-3-jgross@suse.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
#
12ad6cfc |
|
28-Oct-2021 |
Juergen Gross <jgross@suse.com> |
x86/xen: remove xen_have_vcpu_info_placement flag The flag xen_have_vcpu_info_placement was needed to support Xen hypervisors older than version 3.4, which didn't support the VCPUOP_register_vcpu_info hypercall. Today the Linux kernel requires at least Xen 4.0 to be able to run, so xen_have_vcpu_info_placement can be dropped (in theory the flag was used to ensure a working kernel even in case of the VCPUOP_register_vcpu_info hypercall failing for other reasons than the hypercall not being supported, but the only cases covered by the flag would be parameter errors, which ought not to be made anyway). This allows to let some functions return void now, as they can never fail. Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20211028072748.29862-2-jgross@suse.com Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
#
9c11112c |
|
30-Sep-2021 |
Jan Beulich <jbeulich@suse.com> |
xen/x86: adjust data placement Both xen_pvh and xen_start_flags get written just once early during init. Using the respective annotation then allows the open-coded placing in .data to go away. Additionally the former, like the latter, wants exporting, or else xen_pvh_domain() can't be used from modules. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/8155ed26-5a1d-c06f-42d8-596d26e75849@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
079c4baa |
|
30-Sep-2021 |
Jan Beulich <jbeulich@suse.com> |
xen/x86: hook up xen_banner() also for PVH This was effectively lost while dropping PVHv1 code. Move the function and arrange for it to be called the same way as done in PV mode. Clearly this then needs re-introducing the XENFEAT_mmu_pt_update_preserve_ad check that was recently removed, as that's a PV-only feature. Since the string pointed at by pv_info.name describes the mode, drop "paravirtualized" from the log message while moving the code. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/de03054d-a20d-2114-bb86-eec28e17b3b8@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
4d1ab432 |
|
30-Sep-2021 |
Jan Beulich <jbeulich@suse.com> |
xen/x86: generalize preferred console model from PV to PVH Dom0 Without announcing hvc0 as preferred it won't get used as long as tty0 gets registered earlier. This is particularly problematic with there not being any screen output for PVH Dom0 when the screen is in graphics mode, as the necessary information doesn't get conveyed yet from the hypervisor. Follow PV's model, but be conservative and do this for Dom0 only for now. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/582328b6-c86c-37f3-d802-5539b7a86736@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
cae7d81a |
|
30-Sep-2021 |
Jan Beulich <jbeulich@suse.com> |
xen/x86: allow PVH Dom0 without XEN_PV=y Decouple XEN_DOM0 from XEN_PV, converting some existing uses of XEN_DOM0 to a new XEN_PV_DOM0. (I'm not convinced all are really / should really be PV-specific, but for starters I've tried to be conservative.) For PVH Dom0 the hypervisor populates MADT with only x2APIC entries, so without x2APIC support enabled in the kernel things aren't going to work very well. (As opposed, DomU-s would only ever see LAPIC entries in MADT as of now.) Note that this then requires PVH Dom0 to be 64-bit, as X86_X2APIC depends on X86_64. In the course of this xen_running_on_version_or_later() needs to be available more broadly. Move it from a PV-specific to a generic file, considering that what it does isn't really PV-specific at all anyway. Note that xen/interface/version.h cannot be included on its own; in enlighten.c, which uses SCHEDOP_* anyway, include xen/interface/sched.h first to resolve the apparently sole missing type (xen_ulong_t). Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/983bb72f-53df-b6af-14bd-5e088bd06a08@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
9172b5c4 |
|
30-Sep-2021 |
Jan Beulich <jbeulich@suse.com> |
xen/x86: prevent PVH type from getting clobbered Like xen_start_flags, xen_domain_type gets set before .bss gets cleared. Hence this variable also needs to be prevented from getting put in .bss, which is possible because XEN_NATIVE is an enumerator evaluating to zero. Any use prior to init_hvm_pv_info() setting the variable again would lead to wrong decisions; one such case is xenboot_console_setup() when called as a result of "earlyprintk=xen". Use __ro_after_init as more applicable than either __section(".data") or __read_mostly. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/d301677b-6f22-5ae6-bd36-458e1f323d0b@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
f39650de |
|
30-Jun-2021 |
Andy Shevchenko <andriy.shevchenko@linux.intel.com> |
kernel.h: split out panic and oops helpers kernel.h is being used as a dump for all kinds of stuff for a long time. Here is the attempt to start cleaning it up by splitting out panic and oops helpers. There are several purposes of doing this: - dropping dependency in bug.h - dropping a loop by moving out panic_notifier.h - unload kernel.h from something which has its own domain At the same time convert users tree-wide to use new headers, although for the time being include new header back to kernel.h to avoid twisted indirected includes for existing users. [akpm@linux-foundation.org: thread_info.h needs limits.h] [andriy.shevchenko@linux.intel.com: ia64 fix] Link: https://lkml.kernel.org/r/20210520130557.55277-1-andriy.shevchenko@linux.intel.com Link: https://lkml.kernel.org/r/20210511074137.33666-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Co-developed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Corey Minyard <cminyard@mvista.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Sebastian Reichel <sre@kernel.org> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Stephen Boyd <sboyd@kernel.org> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Acked-by: Helge Deller <deller@gmx.de> # parisc Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
33def849 |
|
21-Oct-2020 |
Joe Perches <joe@perches.com> |
treewide: Convert macro and uses of __section(foo) to __section("foo") Use a more generic form for __section that requires quotes to avoid complications with clang and gcc differences. Remove the quote operator # from compiler_attributes.h __section macro. Convert all unquoted __section(foo) uses to quoted __section("foo"). Also convert __attribute__((section("foo"))) uses to __section("foo") even if the __attribute__ has multiple list entry forms. Conversion done using the script at: https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Nick Desaulniers <ndesaulniers@gooogle.com> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c6875f3a |
|
30-Sep-2019 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
x86/xen: Return from panic notifier Currently execution of panic() continues until Xen's panic notifier (xen_panic_event()) is called at which point we make a hypercall that never returns. This means that any notifier that is supposed to be called later as well as significant part of panic() code (such as pstore writes from kmsg_dump()) is never executed. There is no reason for xen_panic_event() to be this last point in execution since panic()'s emergency_restart() will call into xen_emergency_restart() from where we can perform our hypercall. Nevertheless, we will provide xen_legacy_crash boot option that will preserve original behavior during crash. This option could be used, for example, if running kernel dumper (which happens after panic notifiers) is undesirable. Reported-by: James Dingwall <james@dingwall.me.uk> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com>
|
#
12366410 |
|
27-Nov-2018 |
Igor Druzhinin <igor.druzhinin@citrix.com> |
Revert "xen/balloon: Mark unallocated host memory as UNUSABLE" This reverts commit b3cf8528bb21febb650a7ecbf080d0647be40b9f. That commit unintentionally broke Xen balloon memory hotplug with "hotplug_unpopulated" set to 1. As long as "System RAM" resource got assigned under a new "Unusable memory" resource in IO/Mem tree any attempt to online this memory would fail due to general kernel restrictions on having "System RAM" resources as 1st level only. The original issue that commit has tried to workaround fa564ad96366 ("x86/PCI: Enable a 64bit BAR on AMD Family 15h (Models 00-1f, 30-3f, 60-7f)") also got amended by the following 03a551734 ("x86/PCI: Move and shrink AMD 64-bit window to avoid conflict") which made the original fix to Xen ballooning unnecessary. Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
57c8a661 |
|
30-Oct-2018 |
Mike Rapoport <rppt@linux.vnet.ibm.com> |
mm: remove include/linux/bootmem.h Move remaining definitions and declarations from include/linux/bootmem.h into include/linux/memblock.h and remove the redundant header. The includes were replaced with the semantic patch below and then semi-automated removal of duplicated '#include <linux/memblock.h> @@ @@ - #include <linux/bootmem.h> + #include <linux/memblock.h> [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h] Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h] Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal] Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ley Foon Tan <lftan@altera.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Serge Semin <fancer.lancer@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
3cfa210b |
|
25-Sep-2018 |
Christoph Hellwig <hch@lst.de> |
xen: don't include <xen/xen.h> from <asm/io.h> and <asm/dma-mapping.h> Nothing Xen specific in these headers, which get included from a lot of code in the kernel. So prune the includes and move them to the Xen-specific files that actually use them instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
901d209a |
|
28-Aug-2018 |
Juergen Gross <jgross@suse.com> |
x86/xen: Add SPDX identifier in arch/x86/xen files Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: xen-devel@lists.xenproject.org Cc: virtualization@lists.linux-foundation.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180828074026.820-5-jgross@suse.com
|
#
447ae316 |
|
28-Jul-2018 |
Nicolai Stange <nstange@suse.de> |
x86: Don't include linux/irq.h from asm/hardirq.h The next patch in this series will have to make the definition of irq_cpustat_t available to entering_irq(). Inclusion of asm/hardirq.h into asm/apic.h would cause circular header dependencies like asm/smp.h asm/apic.h asm/hardirq.h linux/irq.h linux/topology.h linux/smp.h asm/smp.h or linux/gfp.h linux/mmzone.h asm/mmzone.h asm/mmzone_64.h asm/smp.h asm/apic.h asm/hardirq.h linux/irq.h linux/irqdesc.h linux/kobject.h linux/sysfs.h linux/kernfs.h linux/idr.h linux/gfp.h and others. This causes compilation errors because of the header guards becoming effective in the second inclusion: symbols/macros that had been defined before wouldn't be available to intermediate headers in the #include chain anymore. A possible workaround would be to move the definition of irq_cpustat_t into its own header and include that from both, asm/hardirq.h and asm/apic.h. However, this wouldn't solve the real problem, namely asm/harirq.h unnecessarily pulling in all the linux/irq.h cruft: nothing in asm/hardirq.h itself requires it. Also, note that there are some other archs, like e.g. arm64, which don't have that #include in their asm/hardirq.h. Remove the linux/irq.h #include from x86' asm/hardirq.h. Fix resulting compilation errors by adding appropriate #includes to *.c files as needed. Note that some of these *.c files could be cleaned up a bit wrt. to their set of #includes, but that should better be done from separate patches, if at all. Signed-off-by: Nicolai Stange <nstange@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
1fe83888 |
|
08-Jun-2018 |
Roger Pau Monne <roger.pau@citrix.com> |
xen: share start flags between PV and PVH Use a global variable to store the start flags for both PV and PVH. This allows the xen_initial_domain macro to work properly on PVH. Note that ARM is also switched to use the new variable. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
b3cf8528 |
|
12-Dec-2017 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
xen/balloon: Mark unallocated host memory as UNUSABLE Commit f5775e0b6116 ("x86/xen: discard RAM regions above the maximum reservation") left host memory not assigned to dom0 as available for memory hotplug. Unfortunately this also meant that those regions could be used by others. Specifically, commit fa564ad96366 ("x86/PCI: Enable a 64bit BAR on AMD Family 15h (Models 00-1f, 30-3f, 60-7f)") may try to map those addresses as MMIO. To prevent this mark unallocated host memory as E820_TYPE_UNUSABLE (thus effectively reverting f5775e0b6116) and keep track of that region as a hostmem resource that can be used for the hotplug. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com>
|
#
eac779aa |
|
08-Oct-2017 |
Zhenzhong Duan <zhenzhong.duan@oracle.com> |
xen/vcpu: Use a unified name about cpu hotplug state for pv and pvhvm As xen_cpuhp_setup is called by PV and PVHVM, the name of "x86/xen/hvm_guest" is confusing. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
#
c9b5d98b |
|
02-Jun-2017 |
Ankur Arora <ankur.a.arora@oracle.com> |
xen/vcpu: Handle xen_vcpu_setup() failure in hotplug The hypercall VCPUOP_register_vcpu_info can fail. This failure is handled by making per_cpu(xen_vcpu, cpu) point to its shared_info slot and those without one (cpu >= MAX_VIRT_CPUS) be NULL. For PVH/PVHVM, this is not enough, because we also need to pull these VCPUs out of circulation. Fix for PVH/PVHVM: on registration failure in the cpuhp prepare callback (xen_cpu_up_prepare_hvm()), return an error to the cpuhp state-machine so it can fail the CPU init. Fix for PV: the registration happens before smp_init(), so, in the failure case we clamp setup_max_cpus and limit the number of VCPUs that smp_init() will bring-up to MAX_VIRT_CPUS. This is functionally correct but it makes the code a bit simpler if we get rid of this explicit clamping: for VCPUs that don't have valid xen_vcpu, fail the CPU init in the cpuhp prepare callback (xen_cpu_up_prepare_pv()). Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
0b64ffb8 |
|
02-Jun-2017 |
Ankur Arora <ankur.a.arora@oracle.com> |
xen/pvh*: Support > 32 VCPUs at domain restore When Xen restores a PVHVM or PVH guest, its shared_info only holds up to 32 CPUs. The hypercall VCPUOP_register_vcpu_info allows us to setup per-page areas for VCPUs. This means we can boot PVH* guests with more than 32 VCPUs. During restore the per-cpu structure is allocated freshly by the hypervisor (vcpu_info_mfn is set to INVALID_MFN) so that the newly restored guest can make a VCPUOP_register_vcpu_info hypercall. However, we end up triggering this condition in Xen: /* Run this command on yourself or on other offline VCPUS. */ if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) ) which means we are unable to setup the per-cpu VCPU structures for running VCPUS. The Linux PV code paths makes this work by iterating over cpu_possible in xen_vcpu_restore() with: 1) is target CPU up (VCPUOP_is_up hypercall?) 2) if yes, then VCPUOP_down to pause it 3) VCPUOP_register_vcpu_info 4) if it was down, then VCPUOP_up to bring it back up With Xen commit 192df6f9122d ("xen/x86: allow HVM guests to use hypercalls to bring up vCPUs") this is available for non-PV guests. As such first check if VCPUOP_is_up is actually possible before trying this dance. As most of this dance code is done already in xen_vcpu_restore() let's make it callable on PV, PVH and PVHVM. Based-on-patch-by: Konrad Wilk <konrad.wilk@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
ad73fd59 |
|
02-Jun-2017 |
Ankur Arora <ankur.a.arora@oracle.com> |
xen/vcpu: Simplify xen_vcpu related code Largely mechanical changes to aid unification of xen_vcpu_restore() logic for PV, PVH and PVHVM. xen_vcpu_setup(): the only change in logic is that clamp_max_cpus() is now handled inside the "if (!xen_have_vcpu_info_placement)" block. xen_vcpu_restore(): code movement from enlighten_pv.c to enlighten.c. xen_vcpu_info_reset(): pulls together all the code where xen_vcpu is set to default. Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
3dbd8204 |
|
02-May-2017 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
xen: Move xen_have_vector_callback definition to enlighten.c Commit 84d582d236dc ("xen: Revert commits da72ff5bfcb0 and 72a9b186292d") defined xen_have_vector_callback in enlighten_hvm.c. Since guest-type-neutral code refers to this variable this causes build failures when CONFIG_XEN_PVHVM is not defined. Moving xen_have_vector_callback definition to enlighten.c resolves this issue. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
e1dab14c |
|
14-Mar-2017 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
x86/xen: split off enlighten_pv.c Basically, enlighten.c is renamed to enlighten_pv.c and some code moved out to common enlighten.c. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
98f2a47a |
|
14-Mar-2017 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
x86/xen: split off enlighten_hvm.c Move PVHVM related code to enlighten_hvm.c. Three functions: xen_cpuhp_setup(), xen_reboot(), xen_emergency_restart() are shared, drop static qualifier from them. These functions will go to common code once it is split from enlighten.c. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
481d6632 |
|
14-Mar-2017 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
x86/xen: split off enlighten_pvh.c Create enlighten_pvh.c by splitting off PVH related code from enlighten.c, put it under CONFIG_XEN_PVH. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
52519f2a |
|
14-Mar-2017 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
x86/xen: globalize have_vcpu_info_placement have_vcpu_info_placement applies to both PV and HVM and as we're going to split the code we need to make it global. Rename to xen_have_vcpu_info_placement. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
0991d22d |
|
14-Mar-2017 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
x86/xen: separate PV and HVM hypervisors As a preparation to splitting the code we need to untangle it: x86_hyper_xen -> x86_hyper_xen_hvm and x86_hyper_xen_pv xen_platform() -> xen_platform_hvm() and xen_platform_pv() xen_cpu_up_prepare() -> xen_cpu_up_prepare_pv() and xen_cpu_up_prepare_hvm() xen_cpu_dead() -> xen_cpu_dead_pv() and xen_cpu_dead_pv_hvm() Add two parameters to xen_cpuhp_setup() to pass proper cpu_up_prepare and cpu_dead hooks. xen_set_cpu_features() is now PV-only so the redundant xen_pv_domain() check can be dropped. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
b23adb7d |
|
22-Mar-2017 |
Andy Lutomirski <luto@kernel.org> |
x86/xen/gdt: Use X86_FEATURE_XENPV instead of globals for the GDT fixup Xen imposes special requirements on the GDT. Rather than using a global variable for the pgprot, just use an explicit special case for Xen -- this makes it clearer what's going on. It also debloats 64-bit kernels very slightly. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/e9ea96abbfd6a8c87753849171bb5987ecfeb523.1490218061.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
69218e47 |
|
14-Mar-2017 |
Thomas Garnier <thgarnie@google.com> |
x86: Remap GDT tables in the fixmap section Each processor holds a GDT in its per-cpu structure. The sgdt instruction gives the base address of the current GDT. This address can be used to bypass KASLR memory randomization. With another bug, an attacker could target other per-cpu structures or deduce the base of the main memory section (PAGE_OFFSET). This patch relocates the GDT table for each processor inside the fixmap section. The space is reserved based on number of supported processors. For consistency, the remapping is done by default on 32 and 64-bit. Each processor switches to its remapped GDT at the end of initialization. For hibernation, the main processor returns with the original GDT and switches back to the remapping at completion. This patch was tested on both architectures. Hibernation and KVM were both tested specially for their usage of the GDT. Thanks to Boris Ostrovsky <boris.ostrovsky@oracle.com> for testing and recommending changes for Xen support. Signed-off-by: Thomas Garnier <thgarnie@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@suse.de> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Luis R . Rodriguez <mcgrof@kernel.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michal Hocko <mhocko@suse.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: kasan-dev@googlegroups.com Cc: kernel-hardening@lists.openwall.com Cc: kvm@vger.kernel.org Cc: lguest@lists.ozlabs.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-pm@vger.kernel.org Cc: xen-devel@lists.xenproject.org Cc: zijun_hu <zijun_hu@htc.com> Link: http://lkml.kernel.org/r/20170314170508.100882-2-thgarnie@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
6415813b |
|
12-Feb-2017 |
Mathias Krause <minipli@googlemail.com> |
x86/cpu: Drop wp_works_ok member of struct cpuinfo_x86 Remove the wp_works_ok member of struct cpuinfo_x86. It's an optimization back from Linux v0.99 times where we had no fixup support yet and did the CR0.WP test via special code in the page fault handler. The < 0 test was an optimization to not do the special casing for each NULL ptr access violation but just for the first one doing the WP test. Today it serves no real purpose as the test no longer needs special code in the page fault handler and the only call side -- mem_init() -- calls it just once, anyway. However, Xen pre-initializes it to 1, to skip the test. Doing the test again for Xen should be no issue at all, as even the commit introducing skipping the test (commit d560bc61575e ("x86, xen: Suppress WP test on Xen")) mentioned it being ban aid only. And, in fact, testing the patch on Xen showed nothing breaks. The pre-fixup times are long gone and with the removal of the fallback handling code in commit a5c2a893dbd4 ("x86, 386 removal: Remove CONFIG_X86_WP_WORKS_OK") the kernel requires a working CR0.WP anyway. So just get rid of the "optimization" and do the test unconditionally. Signed-off-by: Mathias Krause <minipli@googlemail.com> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Arnd Hannemann <hannemann@nets.rwth-aachen.de> Cc: Mikael Starvik <starvik@axis.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "David S. Miller" <davem@davemloft.net> Link: http://lkml.kernel.org/r/1486933932-585-3-git-send-email-minipli@googlemail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
687d77a5 |
|
01-Mar-2017 |
Ingo Molnar <mingo@kernel.org> |
x86/xen: Update e820 table handling to the new core x86 E820 code Note that I restructured the Xen E820 logic a bit: instead of trying to sort the boot parameters, only the kernel's E820 table is sorted. This is how the x86 code does it and it reduces coupling between the in-kernel E820 code and the (unchanged) boot parameters. Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: <stefano.stabellini@eu.citrix.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
7a1c44eb |
|
06-Feb-2017 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
xen/pvh: Use Xen's emergency_restart op for PVH guests Using native_machine_emergency_restart (called during reboot) will lead PVH guests to machine_real_restart() where we try to use real_mode_header which is not initialized. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com>
|
#
5adad168 |
|
05-Feb-2017 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
xen/pvh: Make sure we don't use ACPI_IRQ_MODEL_PIC for SCI Since we are not using PIC and (at least currently) don't have IOAPIC we want to make sure that acpi_irq_model doesn't stay set to ACPI_IRQ_MODEL_PIC (which is the default value). If we allowed it to stay then acpi_os_install_interrupt_handler() would try (and fail) to request_irq() for PIC. Instead we set the model to ACPI_IRQ_MODEL_PLATFORM which will prevent this from happening. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com>
|
#
7243b933 |
|
05-Feb-2017 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
xen/pvh: Bootstrap PVH guest Start PVH guest at XEN_ELFNOTE_PHYS32_ENTRY address. Setup hypercall page, initialize boot_params, enable early page tables. Since this stub is executed before kernel entry point we cannot use variables in .bss which is cleared by kernel. We explicitly place variables that are initialized here into .data. While adjusting xen_hvm_init_shared_info() make it use cpuid_e?x() instead of cpuid() (wherever possible). Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com>
|
#
063334f3 |
|
03-Feb-2017 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
xen/x86: Remove PVH support We are replacing existing PVH guests with new implementation. We are keeping xen_pvh_domain() macro (for now set to zero) because when we introduce new PVH implementation later in this series we will reuse current PVH-specific code (xen_pvh_gnttab_setup()), and that code is conditioned by 'if (xen_pvh_domain())'. (We will also need a noop xen_pvh_domain() for !CONFIG_XEN_PVH). Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
73c1b41e |
|
21-Dec-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Cleanup state names When the state names got added a script was used to add the extra argument to the calls. The script basically converted the state constant to a string, but the cleanup to convert these strings into meaningful ones did not happen. Replace all the useless strings with 'subsys/xxx/yyy:state' strings which are used in all the other places already. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/20161221192112.085444152@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
af25ed59 |
|
31-Oct-2016 |
Andy Lutomirski <luto@kernel.org> |
x86/fpu: Remove clts() The kernel doesn't use clts() any more. Remove it and all of its paravirt infrastructure. A careful reader may notice that xen_clts() appears to have been buggy -- it didn't update xen_cr0_value. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm list <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/3d3c8ca62f17579b9849a013d71e59a4d5d1b079.1477951965.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
cb5f7e7c |
|
12-Oct-2016 |
Arnd Bergmann <arnd@arndb.de> |
x86: xen: move cpu_up functions out of ifdef Three newly introduced functions are not defined when CONFIG_XEN_PVHVM is disabled, but are still being used: arch/x86/xen/enlighten.c:141:12: warning: ‘xen_cpu_up_prepare’ used but never defined arch/x86/xen/enlighten.c:142:12: warning: ‘xen_cpu_up_online’ used but never defined arch/x86/xen/enlighten.c:143:12: warning: ‘xen_cpu_dead’ used but never defined Fixes: 4d737042d6c4 ("xen/x86: Convert to hotplug state machine") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
565fdc6a |
|
02-Oct-2016 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
xen/x86: Initialize per_cpu(xen_vcpu, 0) a little earlier xen_cpuhp_setup() calls mutex_lock() which, when CONFIG_DEBUG_MUTEXES is defined, ends up calling xen_save_fl(). That routine expects per_cpu(xen_vcpu, 0) to be already initialized. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
72a9b186 |
|
26-Aug-2016 |
KarimAllah Ahmed <karahmed@amazon.de> |
xen: Remove event channel notification through Xen PCI platform device Ever since commit 254d1a3f02eb ("xen/pv-on-hvm kexec: shutdown watches from old kernel") using the INTx interrupt from Xen PCI platform device for event channel notification would just lockup the guest during bootup. postcore_initcall now calls xs_reset_watches which will eventually try to read a value from XenStore and will get stuck on read_reply at XenBus forever since the platform driver is not probed yet and its INTx interrupt handler is not registered yet. That means that the guest can not be notified at this moment of any pending event channels and none of the per-event handlers will ever be invoked (including the XenStore one) and the reply will never be picked up by the kernel. The exact stack where things get stuck during xenbus_init: -xenbus_init -xs_init -xs_reset_watches -xenbus_scanf -xenbus_read -xs_single -xs_single -xs_talkv Vector callbacks have always been the favourite event notification mechanism since their introduction in commit 38e20b07efd5 ("x86/xen: event channels delivery on HVM.") and the vector callback feature has always been advertised for quite some time by Xen that's why INTx was broken for several years now without impacting anyone. Luckily this also means that event channel notification through INTx is basically dead-code which can be safely removed without impacting anybody since it has been effectively disabled for more than 4 years with nobody complaining about it (at least as far as I'm aware of). This commit removes event channel notification through Xen PCI platform device. Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Juergen Gross <jgross@suse.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Julien Grall <julien.grall@citrix.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Ross Lagerwall <ross.lagerwall@citrix.com> Cc: xen-devel@lists.xenproject.org Cc: linux-kernel@vger.kernel.org Cc: linux-pci@vger.kernel.org Cc: Anthony Liguori <aliguori@amazon.com> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
1ef55be1 |
|
29-Sep-2016 |
Andy Lutomirski <luto@kernel.org> |
x86/asm: Get rid of __read_cr4_safe() We use __read_cr4() vs __read_cr4_safe() inconsistently. On CR4-less CPUs, all CR4 bits are effectively clear, so we can make the code simpler and more robust by making __read_cr4() always fix up faults on 32-bit kernels. This may fix some bugs on old 486-like CPUs, but I don't have any easy way to test that. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: david@saggiorato.net Link: http://lkml.kernel.org/r/ea647033d357d9ce2ad2bbde5a631045f5052fb6.1475178370.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
4d737042 |
|
07-Sep-2016 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
xen/x86: Convert to hotplug state machine Switch to new CPU hotplug infrastructure. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
99bc6753 |
|
29-Aug-2016 |
Juergen Gross <jgross@suse.com> |
xen: Add xen_pin_vcpu() to support calling functions on a dedicated pCPU Some hardware models (e.g. Dell Studio 1555 laptops) require calls to the firmware to be issued on CPU 0 only. As Dom0 might have to use these calls, add xen_pin_vcpu() to achieve this functionality. In case either the domain doesn't have the privilege to make the related hypercall or the hypervisor isn't supporting it, issue a warning once and disable further pinning attempts. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: David Vrabel <david.vrabel@citrix.com> Cc: Douglas_Warzecha@dell.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akataria@vmware.com Cc: boris.ostrovsky@oracle.com Cc: chrisw@sous-sol.org Cc: hpa@zytor.com Cc: jdelvare@suse.com Cc: jeremy@goop.org Cc: linux@roeck-us.net Cc: pali.rohar@gmail.com Cc: rusty@rustcorp.com.au Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1472453327-19050-5-git-send-email-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
5fc509bc |
|
03-Aug-2016 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
xen/x86: Move irq allocation from Xen smp_op.cpu_up() Commit ce0d3c0a6fb1 ("genirq: Revert sparse irq locking around __cpu_up() and move it to x86 for now") reverted irq locking introduced by commit a89941816726 ("hotplug: Prevent alloc/free of irq descriptors during cpu up/down") because of Xen allocating irqs in both of its cpu_up ops. We can move those allocations into CPU notifiers so that original patch can be reinstated. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
55467dea |
|
29-Jul-2016 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
xen: change the type of xen_vcpu_id to uint32_t We pass xen_vcpu_id mapping information to hypercalls which require uint32_t type so it would be cleaner to have it as uint32_t. The initializer to -1 can be dropped as we always do the mapping before using it and we never check the 'not set' value anyway. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
c0253115 |
|
02-Aug-2016 |
Petr Tesarik <ptesarik@suse.com> |
kexec: allow kdump with crash_kexec_post_notifiers If a crash kernel is loaded, do not crash the running domain. This is needed if the kernel is loaded with crash_kexec_post_notifiers, because panic notifiers are run before __crash_kexec() in that case, and this Xen hook prevents its being called later. [akpm@linux-foundation.org: build fix: unconditionally include kexec.h] Link: http://lkml.kernel.org/r/20160713122000.14969.99963.stgit@hananiah.suse.cz Signed-off-by: Petr Tesarik <ptesarik@suse.com> Cc: Juergen Gross <jgross@suse.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Eric Biederman <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ee42d665 |
|
30-Jun-2016 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
xen/pvhvm: run xen_vcpu_setup() for the boot CPU Historically we didn't call VCPUOP_register_vcpu_info for CPU0 for PVHVM guests (while we had it for PV and ARM guests). This is usually fine as we can use vcpu info in the shared_info page but when we try booting on a vCPU with Xen's vCPU id > 31 (e.g. when we try to kdump after crashing on this CPU) we're not able to boot. Switch to always doing VCPUOP_register_vcpu_info for the boot CPU. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
e15a8621 |
|
30-Jun-2016 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
x86/xen: use xen_vcpu_id mapping when pointing vcpu_info to shared_info shared_info page has space for 32 vcpu info slots for first 32 vCPUs but these are the first 32 vCPUs from Xen's perspective and we should map them accordingly with the newly introduced xen_vcpu_id mapping. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
ad5475f9 |
|
30-Jun-2016 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
x86/xen: use xen_vcpu_id mapping for HYPERVISOR_vcpu_op HYPERVISOR_vcpu_op() passes Linux's idea of vCPU id as a parameter while Xen's idea is expected. In some cases these ideas diverge so we need to do remapping. Convert all callers of HYPERVISOR_vcpu_op() to use xen_vcpu_nr(). Leave xen_fill_possible_map() and xen_filter_cpu_maps() intact as they're only being called by PV guests before perpu areas are initialized. While the issue could be solved by switching to early_percpu for xen_vcpu_id I think it's not worth it: PV guests will probably never get to the point where their idea of vCPU id diverges from Xen's. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
88e957d6 |
|
30-Jun-2016 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
xen: introduce xen_vcpu_id mapping It may happen that Xen's and Linux's ideas of vCPU id diverge. In particular, when we crash on a secondary vCPU we may want to do kdump and unlike plain kexec where we do migrate_to_reboot_cpu() we try booting on the vCPU which crashed. This doesn't work very well for PVHVM guests as we have a number of hypercalls where we pass vCPU id as a parameter. These hypercalls either fail or do something unexpected. To solve the issue introduce percpu xen_vcpu_id mapping. ARM and PV guests get direct mapping for now. Boot CPU for PVHVM guest gets its id from CPUID. With secondary CPUs it is a bit more trickier. Currently, we initialize IPI vectors before these CPUs boot so we can't use CPUID. Use ACPI ids from MADT instead. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
7a2463dc |
|
13-Jul-2016 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
x86/xen: Audit and remove any unnecessary uses of module.h Historically a lot of these existed because we did not have a distinction between what was modular code and what was providing support to modules via EXPORT_SYMBOL and friends. That changed when we forked out support for the latter into the export.h file. This means we should be able to reduce the usage of module.h in code that is obj-y Makefile or bool Kconfig. The advantage in doing so is that module.h itself sources about 15 other headers; adding significantly to what we feed cpp, and it can obscure what headers we are effectively using. Since module.h was the source for init.h (for __init) and for export.h (for EXPORT_SYMBOL) we consider each obj-y/bool instance for the presence of either and replace as needed. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20160714001901.31603-7-paul.gortmaker@windriver.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
585423c8 |
|
29-Jun-2016 |
Amitoj Kaur Chawla <amitoj1606@gmail.com> |
x86/xen: Use DIV_ROUND_UP The kernel.h macro DIV_ROUND_UP performs the computation (((n) + (d) - 1) /(d)) but is perhaps more readable. The Coccinelle script used to make this change is as follows: @haskernel@ @@ #include <linux/kernel.h> @depends on haskernel@ expression n,d; @@ ( - (n + d - 1) / d + DIV_ROUND_UP(n,d) | - (n + (d - 1)) / d + DIV_ROUND_UP(n,d) ) Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
99158f10 |
|
24-May-2016 |
Andy Lutomirski <luto@kernel.org> |
x86/xen: Simplify set_aliased_prot() A year ago, via the following commit: aa1acff356bb ("x86/xen: Probe target addresses in set_aliased_prot() before the hypercall") I added an explicit probe to work around a hypercall issue. The code can be simplified by using probe_kernel_read(). No change in functionality. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: David Vrabel <david.vrabel@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Vrabel <dvrabel@cantab.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel <xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/0706f1a2538e481194514197298cca6b5e3f2638.1464129798.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
867fe800 |
|
13-Apr-2016 |
Luis R. Rodriguez <mcgrof@kernel.org> |
x86/paravirt: Remove paravirt_enabled() Now that all previous paravirt_enabled() uses were replaced with proper x86 semantics by the previous patches we can remove the unused paravirt_enabled() mechanism. Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Acked-by: Juergen Gross <jgross@suse.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: andrew.cooper3@citrix.com Cc: andriy.shevchenko@linux.intel.com Cc: bigeasy@linutronix.de Cc: boris.ostrovsky@oracle.com Cc: david.vrabel@citrix.com Cc: ffainelli@freebox.fr Cc: george.dunlap@citrix.com Cc: glin@suse.com Cc: jlee@suse.com Cc: josh@joshtriplett.org Cc: julien.grall@linaro.org Cc: konrad.wilk@oracle.com Cc: kozerkov@parallels.com Cc: lenb@kernel.org Cc: lguest@lists.ozlabs.org Cc: linux-acpi@vger.kernel.org Cc: lv.zheng@intel.com Cc: matt@codeblueprint.co.uk Cc: mbizon@freebox.fr Cc: rjw@rjwysocki.net Cc: robert.moore@intel.com Cc: rusty@rustcorp.com.au Cc: tiwai@suse.de Cc: toshi.kani@hp.com Cc: xen-devel@lists.xensource.com Link: http://lkml.kernel.org/r/1460592286-300-15-git-send-email-mcgrof@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
8d152e7a |
|
13-Apr-2016 |
Luis R. Rodriguez <mcgrof@kernel.org> |
x86/rtc: Replace paravirt rtc check with platform legacy quirk We have 4 types of x86 platforms that disable RTC: * Intel MID * Lguest - uses paravirt * Xen dom-U - uses paravirt * x86 on legacy systems annotated with an ACPI legacy flag We can consolidate all of these into a platform specific legacy quirk set early in boot through i386_start_kernel() and through x86_64_start_reservations(). This deals with the RTC quirks which we can rely on through the hardware subarch, the ACPI check can be dealt with separately. For Xen things are bit more complex given that the @X86_SUBARCH_XEN x86_hardware_subarch is shared on for Xen which uses the PV path for both domU and dom0. Since the semantics for differentiating between the two are Xen specific we provide a platform helper to help override default legacy features -- x86_platform.set_legacy_features(). Use of this helper is highly discouraged, its only purpose should be to account for the lack of semantics available within your given x86_hardware_subarch. As per 0-day, this bumps the vmlinux size using i386-tinyconfig as follows: TOTAL TEXT init.text x86_early_init_platform_quirks() +70 +62 +62 +43 Only 8 bytes overhead total, as the main increase in size is all removed via __init. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: andrew.cooper3@citrix.com Cc: andriy.shevchenko@linux.intel.com Cc: bigeasy@linutronix.de Cc: boris.ostrovsky@oracle.com Cc: david.vrabel@citrix.com Cc: ffainelli@freebox.fr Cc: george.dunlap@citrix.com Cc: glin@suse.com Cc: jlee@suse.com Cc: josh@joshtriplett.org Cc: julien.grall@linaro.org Cc: konrad.wilk@oracle.com Cc: kozerkov@parallels.com Cc: lenb@kernel.org Cc: lguest@lists.ozlabs.org Cc: linux-acpi@vger.kernel.org Cc: lv.zheng@intel.com Cc: matt@codeblueprint.co.uk Cc: mbizon@freebox.fr Cc: rjw@rjwysocki.net Cc: robert.moore@intel.com Cc: rusty@rustcorp.com.au Cc: tiwai@suse.de Cc: toshi.kani@hp.com Cc: xen-devel@lists.xensource.com Link: http://lkml.kernel.org/r/1460592286-300-5-git-send-email-mcgrof@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
ea179481 |
|
13-Apr-2016 |
Luis R. Rodriguez <mcgrof@kernel.org> |
x86/xen: Use X86_SUBARCH_XEN for PV guest boots The use of subarch should have no current effect on Xen PV guests, as such this should have no current functional effects. Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: andrew.cooper3@citrix.com Cc: andriy.shevchenko@linux.intel.com Cc: bigeasy@linutronix.de Cc: boris.ostrovsky@oracle.com Cc: ffainelli@freebox.fr Cc: george.dunlap@citrix.com Cc: glin@suse.com Cc: jgross@suse.com Cc: jlee@suse.com Cc: josh@joshtriplett.org Cc: julien.grall@linaro.org Cc: konrad.wilk@oracle.com Cc: kozerkov@parallels.com Cc: lenb@kernel.org Cc: lguest@lists.ozlabs.org Cc: linux-acpi@vger.kernel.org Cc: lv.zheng@intel.com Cc: matt@codeblueprint.co.uk Cc: mbizon@freebox.fr Cc: rjw@rjwysocki.net Cc: robert.moore@intel.com Cc: rusty@rustcorp.com.au Cc: tiwai@suse.de Cc: toshi.kani@hp.com Cc: xen-devel@lists.xensource.com Link: http://lkml.kernel.org/r/1460592286-300-3-git-send-email-mcgrof@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
dd2f4a00 |
|
02-Apr-2016 |
Andy Lutomirski <luto@kernel.org> |
x86/paravirt: Add paravirt_{read,write}_msr() This adds paravirt callbacks for unsafe MSR access. On native, they call native_{read,write}_msr(). On Xen, they use xen_{read,write}_msr_safe(). Nothing uses them yet for ease of bisection. The next patch will use them in rdmsrl(), wrmsrl(), etc. I intentionally didn't make them warn on #GP on Xen. I think that should be done separately by the Xen maintainers. Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: KVM list <kvm@vger.kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel <Xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/880eebc5dcd2ad9f310d41345f82061ea500e9fa.1459605520.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
c2ee03b2 |
|
02-Apr-2016 |
Andy Lutomirski <luto@kernel.org> |
x86/paravirt: Add _safe to the read_ms()r and write_msr() PV callbacks These callbacks match the _safe variants, so name them accordingly. This will make room for unsafe PV callbacks. Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: KVM list <kvm@vger.kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel <Xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/9ee3fb6a196a514c93325bdfa15594beecf04876.1459605520.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
16bf9226 |
|
29-Mar-2016 |
Borislav Petkov <bp@suse.de> |
x86/cpufeature: Remove cpu_has_pse Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1459266123-21878-11-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
c109bf95 |
|
29-Mar-2016 |
Borislav Petkov <bp@suse.de> |
x86/cpufeature: Remove cpu_has_pge Use static_cpu_has() in __flush_tlb_all() due to the time-sensitivity of this one. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1459266123-21878-10-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
88ba2811 |
|
23-Mar-2016 |
Toshi Kani <toshi.kani@hpe.com> |
x86/xen, pat: Remove PAT table init code from Xen Xen supports PAT without MTRRs for its guests. In order to enable WC attribute, it was necessary for xen_start_kernel() to call pat_init_cache_modes() to update PAT table before starting guest kernel. Now that the kernel initializes PAT table to the BIOS handoff state when MTRR is disabled, this Xen-specific PAT init code is no longer necessary. Delete it from xen_start_kernel(). Also change __init_cache_modes() to a static function since PAT table should not be tweaked by other modules. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Juergen Gross <jgross@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Toshi Kani <toshi.kani@hp.com> Cc: elliott@hpe.com Cc: paul.gortmaker@windriver.com Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1458769323-24491-7-git-send-email-toshi.kani@hpe.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
02f037d6 |
|
23-Mar-2016 |
Toshi Kani <toshi.kani@hpe.com> |
x86/mm/pat: Add support of non-default PAT MSR setting In preparation for fixing a regression caused by: 9cd25aac1f44 ("x86/mm/pat: Emulate PAT when it is disabled")' ... PAT needs to support a case that PAT MSR is initialized with a non-default value. When pat_init() is called and PAT is disabled, it initializes the PAT table with the BIOS default value. Xen, however, sets PAT MSR with a non-default value to enable WC. This causes inconsistency between the PAT table and PAT MSR when PAT is set to disable on Xen. Change pat_init() to handle the PAT disable cases properly. Add init_cache_modes() to handle two cases when PAT is set to disable. 1. CPU supports PAT: Set PAT table to be consistent with PAT MSR. 2. CPU does not support PAT: Set PAT table to be consistent with PWT and PCD bits in a PTE. Note, __init_cache_modes(), renamed from pat_init_cache_modes(), will be changed to a static function in a later patch. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Toshi Kani <toshi.kani@hp.com> Cc: elliott@hpe.com Cc: konrad.wilk@oracle.com Cc: paul.gortmaker@windriver.com Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1458769323-24491-2-git-send-email-toshi.kani@hpe.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
b7a58459 |
|
16-Mar-2016 |
Andy Lutomirski <luto@kernel.org> |
x86/iopl/64: Properly context-switch IOPL on Xen PV On Xen PV, regs->flags doesn't reliably reflect IOPL and the exit-to-userspace code doesn't change IOPL. We need to context switch it manually. I'm doing this without going through paravirt because this is specific to Xen PV. After the dust settles, we can merge this with the 32-bit code, tidy up the iopl syscall implementation, and remove the set_iopl pvop entirely. Fixes XSA-171. Reviewewd-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/693c3bd7aeb4d3c27c92c622b7d0f554a458173c.1458162709.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
983bb6d2 |
|
28-Feb-2016 |
Josh Poimboeuf <jpoimboe@redhat.com> |
x86/xen: Mark xen_cpuid() stack frame as non-standard objtool reports the following false positive warning: arch/x86/xen/enlighten.o: warning: objtool: xen_cpuid()+0x41: can't find jump dest instruction at .text+0x108 The warning is due to xen_cpuid()'s use of XEN_EMULATE_PREFIX to insert some fake instructions which objtool doesn't know how to decode. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris J Arges <chris.j.arges@canonical.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Pedro Alves <palves@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Link: http://lkml.kernel.org/r/bb88399840406629e3417831dc371ecd2842e2a6.1456719558.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
16aaa537 |
|
25-Jan-2016 |
Huaitong Han <huaitong.han@intel.com> |
x86/cpufeature: Use enum cpuid_leafs instead of magic numbers Most of the magic numbers in x86_capability[] have been converted to 'enum cpuid_leafs', and this patch updates the remaining part. Signed-off-by: Huaitong Han <huaitong.han@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: lguest@lists.ozlabs.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1453750913-4781-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
cfafae94 |
|
23-Nov-2015 |
Stefano Stabellini <stefano.stabellini@eu.citrix.com> |
xen: rename dom0_op to platform_op The dom0_op hypercall has been renamed to platform_op since Xen 3.2, which is ancient, and modern upstream Linux kernels cannot run as dom0 and it anymore anyway. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
#
d8c98a1d |
|
11-Dec-2015 |
David Vrabel <david.vrabel@citrix.com> |
x86/paravirt: Prevent rtc_cmos platform device init on PV guests Adding the rtc platform device in non-privileged Xen PV guests causes an IRQ conflict because these guests do not have legacy PIC and may allocate irqs in the legacy range. In a single VCPU Xen PV guest we should have: /proc/interrupts: CPU0 0: 4934 xen-percpu-virq timer0 1: 0 xen-percpu-ipi spinlock0 2: 0 xen-percpu-ipi resched0 3: 0 xen-percpu-ipi callfunc0 4: 0 xen-percpu-virq debug0 5: 0 xen-percpu-ipi callfuncsingle0 6: 0 xen-percpu-ipi irqwork0 7: 321 xen-dyn-event xenbus 8: 90 xen-dyn-event hvc_console ... But hvc_console cannot get its interrupt because it is already in use by rtc0 and the console does not work. genirq: Flags mismatch irq 8. 00000000 (hvc_console) vs. 00000000 (rtc0) We can avoid this problem by realizing that unprivileged PV guests (both Xen and lguests) are not supposed to have rtc_cmos device and so adding it is not necessary. Privileged guests (i.e. Xen's dom0) do use it but they should not have irq conflicts since they allocate irqs above legacy range (above gsi_top, in fact). Instead of explicitly testing whether the guest is privileged we can extend pv_info structure to include information about guest's RTC support. Reported-and-tested-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: vkuznets@redhat.com Cc: xen-devel@lists.xenproject.org Cc: konrad.wilk@oracle.com Cc: stable@vger.kernel.org # 4.2+ Link: http://lkml.kernel.org/r/1449842873-2613-1-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
91e2eea9 |
|
19-Nov-2015 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
x86/xen: Avoid fast syscall path for Xen PV guests After 32-bit syscall rewrite, and specifically after commit: 5f310f739b4c ("x86/entry/32: Re-implement SYSENTER using the new C path") ... the stack frame that is passed to xen_sysexit is no longer a "standard" one (i.e. it's not pt_regs). Since we end up calling xen_iret from xen_sysexit we don't need to fix up the stack and instead follow entry_SYSENTER_32's IRET path directly to xen_iret. We can do the same thing for compat mode even though stack does not need to be fixed. This will allow us to drop usergs_sysret32 paravirt op (in the subsequent patch) Suggested-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: david.vrabel@citrix.com Cc: konrad.wilk@oracle.com Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1447970147-1733-2-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
88c15ec9 |
|
19-Nov-2015 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
x86/paravirt: Remove the unused irq_enable_sysexit pv op As result of commit "x86/xen: Avoid fast syscall path for Xen PV guests", the irq_enable_sysexit pv op is not called by Xen PV guests anymore and since they were the only ones who used it we can safely remove it. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: david.vrabel@citrix.com Cc: konrad.wilk@oracle.com Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1447970147-1733-3-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
5fdf5d37 |
|
19-Nov-2015 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
x86/xen: Avoid fast syscall path for Xen PV guests After 32-bit syscall rewrite, and specifically after commit: 5f310f739b4c ("x86/entry/32: Re-implement SYSENTER using the new C path") ... the stack frame that is passed to xen_sysexit is no longer a "standard" one (i.e. it's not pt_regs). Since we end up calling xen_iret from xen_sysexit we don't need to fix up the stack and instead follow entry_SYSENTER_32's IRET path directly to xen_iret. We can do the same thing for compat mode even though stack does not need to be fixed. This will allow us to drop usergs_sysret32 paravirt op (in the subsequent patch) Suggested-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: david.vrabel@citrix.com Cc: konrad.wilk@oracle.com Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1447970147-1733-2-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
46095865 |
|
17-Nov-2015 |
Juergen Gross <jgross@suse.com> |
x86/paravirt: Remove unused pv_apic_ops structure The only member of that structure is startup_ipi_hook which is always set to paravirt_nop. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Cc: jeremy@goop.org Cc: chrisw@sous-sol.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xen.org Cc: konrad.wilk@oracle.com Cc: boris.ostrovsky@oracle.com Link: http://lkml.kernel.org/r/1447767872-16730-1-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
a314e3eb |
|
22-Oct-2015 |
Stefano Stabellini <stefano.stabellini@eu.citrix.com> |
xen/arm: Enable cpu_hotplug.c Build cpu_hotplug for ARM and ARM64 guests. Rename arch_(un)register_cpu to xen_(un)register_cpu and provide an empty implementation on ARM and ARM64. On x86 just call arch_(un)register_cpu as we are already doing. Initialize cpu_hotplug on ARM. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
#
0b34a166 |
|
25-Sep-2015 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
x86/xen: Support kexec/kdump in HVM guests by doing a soft reset Currently there is a number of issues preventing PVHVM Xen guests from doing successful kexec/kdump: - Bound event channels. - Registered vcpu_info. - PIRQ/emuirq mappings. - shared_info frame after XENMAPSPACE_shared_info operation. - Active grant mappings. Basically, newly booted kernel stumbles upon already set up Xen interfaces and there is no way to reestablish them. In Xen-4.7 a new feature called 'soft reset' is coming. A guest performing kexec/kdump operation is supposed to call SCHEDOP_shutdown hypercall with SHUTDOWN_soft_reset reason before jumping to new kernel. Hypervisor (with some help from toolstack) will do full domain cleanup (but keeping its memory and vCPU contexts intact) returning the guest to the state it had when it was first booted and thus allowing it to start over. Doing SHUTDOWN_soft_reset on Xen hypervisors which don't support it is probably OK as by default all unknown shutdown reasons cause domain destroy with a message in toolstack log: 'Unknown shutdown reason code 5. Destroying domain.' which gives a clue to what the problem is and eliminates false expectations. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
2ecf91b6 |
|
21-Sep-2015 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
xen/x86: Don't try to write syscall-related MSRs for PV guests For PV guests these registers are set up by hypervisor and thus should not be written by the guest. The comment in xen_write_msr_safe() says so but we still write the MSRs, causing the hypervisor to print a warning. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
3375d828 |
|
10-Aug-2015 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
xen/x86: Don't try to set PCE bit in CR4 Since VPMU code emulates RDPMC instruction with RDMSR and because hypervisor does not emulate it there is no reason to try setting CR4's PCE bit (and the hypervisor will warn on seeing it set). Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
6b08cd63 |
|
10-Aug-2015 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
xen/PMU: Intercept PMU-related MSR and APIC accesses Provide interfaces for recognizing accesses to PMU-related MSRs and LVTPC APIC and process these accesses in Xen PMU code. (The interrupt handler performs XENPMU_flush right away in the beginning since no PMU emulation is available. It will be added with a later patch). Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
65d0cf0b |
|
10-Aug-2015 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
xen/PMU: Initialization code for Xen PMU Map shared data structure that will hold CPU registers, VPMU context, V/PCPU IDs of the CPU interrupted by PMU interrupt. Hypervisor fills this information in its handler and passes it to the guest for further processing. Set up PMU VIRQ. Now that perf infrastructure will assume that PMU is available on a PV guest we need to be careful and make sure that accesses via RDPMC instruction don't cause fatal traps by the hypervisor. Provide a nop RDPMC handler. For the same reason avoid issuing a warning on a write to APIC's LVTPC. Both of these will be made functional in later patches. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
6c2681c8 |
|
16-Jul-2015 |
Juergen Gross <jgross@suse.com> |
xen: add explicit memblock_reserve() calls for special pages Some special pages containing interfaces to xen are being reserved implicitly only today. The memblock_reserve() call to reserve them is meant to reserve the p2m list supplied by xen. It is just reserving not only the p2m list itself, but some more pages up to the start of the xen built page tables. To be able to move the p2m list to another pfn range, which is needed for support of huge RAM, this memblock_reserve() must be split up to cover all affected reserved pages explicitly. The affected pages are: - start_info page - xenstore ring (might be missing, mfn is 0 in this case) - console ring (not for initial domain) Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
aa1acff3 |
|
30-Jul-2015 |
Andy Lutomirski <luto@kernel.org> |
x86/xen: Probe target addresses in set_aliased_prot() before the hypercall The update_va_mapping hypercall can fail if the VA isn't present in the guest's page tables. Under certain loads, this can result in an OOPS when the target address is in unpopulated vmap space. While we're at it, add comments to help explain what's going on. This isn't a great long-term fix. This code should probably be changed to use something like set_memory_ro. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Vrabel <dvrabel@cantab.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: security@kernel.org <security@kernel.org> Cc: <stable@vger.kernel.org> Cc: xen-devel <xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
9261e050 |
|
25-Jun-2015 |
Andy Lutomirski <luto@kernel.org> |
x86/asm/tsc, x86/paravirt: Remove read_tsc() and read_tscp() paravirt hooks We've had ->read_tsc() and ->read_tscp() paravirt hooks since the very beginning of paravirt, i.e., d3561b7fa0fb ("[PATCH] paravirt: header and stubs for paravirtualisation"). AFAICT, the only paravirt guest implementation that ever replaced these calls was vmware, and it's gone. Arguably even vmware shouldn't have hooked RDTSC -- we fully support systems that don't have a TSC at all, so there's no point for a paravirt implementation to pretend that we have a TSC but to replace it. I also doubt that these hooks actually worked. Calls to rdtscl() and rdtscll(), which respected the hooks, were used seemingly interchangeably with native_read_tsc(), which did not. Just remove them. If anyone ever needs them again, they can try to make a case for why they need them. Before, on a paravirt config: text data bss dec hex filename 12618257 1816384 1093632 15528273 ecf151 vmlinux After: text data bss dec hex filename 12617207 1816384 1093632 15527223 eced37 vmlinux Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm ML <kvm@vger.kernel.org> Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/d08a2600fb298af163681e5efd8e599d889a5b97.1434501121.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
9cd25aac |
|
04-Jun-2015 |
Borislav Petkov <bp@suse.de> |
x86/mm/pat: Emulate PAT when it is disabled In the case when PAT is disabled on the command line with "nopat" or when virtualization doesn't support PAT (correctly) - see 9d34cfdf4796 ("x86: Don't rely on VMWare emulating PAT MSR correctly"). we emulate it using the PWT and PCD cache attribute bits. Get rid of boot_pat_state while at it. Based on a conglomerate patch from Toshi Kani. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
21c4cd10 |
|
26-Apr-2015 |
Ingo Molnar <mingo@kernel.org> |
x86/fpu: Simplify fpu__cpu_init() After the latest round of cleanups, fpu__cpu_init() has become a simple call to fpu__init_cpu(). Rename fpu__init_cpu() to fpu__cpu_init() and remove the extra layer. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
3a9c4b0d |
|
03-Apr-2015 |
Ingo Molnar <mingo@kernel.org> |
x86/fpu: Rename fpu_init() to fpu__cpu_init() fpu_init() is a bit of a misnomer in that it (falsely) creates the impression that it's related to the (old) fpu_finit() function, which initializes FPU ctx state. Rename it to fpu__cpu_init() to make its boot time initialization clear, and to move it to the fpu__*() namespace. Also fix and extend its comment block to point out that it's called not only on the boot CPU, but on secondary CPUs as well. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
a71dbdaa |
|
04-May-2015 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
hypervisor/x86/xen: Unset X86_BUG_SYSRET_SS_ATTRS on Xen PV guests Commit 61f01dd941ba ("x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue") makes AMD processors set SS to __KERNEL_DS in __switch_to() to deal with cases when SS is NULL. This breaks Xen PV guests who do not want to load SS with__KERNEL_DS. Since the problem that the commit is trying to address would have to be fixed in the hypervisor (if it in fact exists under Xen) there is no reason to set X86_BUG_SYSRET_SS_ATTRS flag for PV VPCUs here. This can be easily achieved by adding x86_hyper_xen_hvm.set_cpu_features op which will clear this flag. (And since this structure is no longer HVM-specific we should do some renaming). Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
aac82d31 |
|
03-Apr-2015 |
Andy Lutomirski <luto@kernel.org> |
x86, paravirt, xen: Remove the 64-bit ->irq_enable_sysexit() pvop We don't use irq_enable_sysexit on 64-bit kernels any more. Remove all the paravirt and Xen machinery to support it on 64-bit kernels. Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/8a03355698fe5b94194e9e7360f19f91c1b2cf1f.1428100853.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
feb44f1f |
|
01-Mar-2015 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
x86/xen: Provide a "Xen PV" APIC driver to support >255 VCPUs Instead of mangling the default APIC driver, provide a Xen PV guest specific one that explicitly provides appropriate methods. This allows use to report that all APIC IDs are valid, allowing dom0 to boot with more than 255 VCPUs. Since the probe order of APIC drivers is link dependent, we add in an late probe function to change to the Xen PV if it hadn't been done during bootup. Suggested-by: David Vrabel <david.vrabel@citrix.com> Reported-by: Cathy Avery <cathy.avery@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
8ef46a67 |
|
05-Mar-2015 |
Andy Lutomirski <luto@amacapital.net> |
x86/asm/entry: Add this_cpu_sp0() to read sp0 for the current cpu We currently store references to the top of the kernel stack in multiple places: kernel_stack (with an offset) and init_tss.x86_tss.sp0 (no offset). The latter is defined by hardware and is a clean canonical way to find the top of the stack. Add an accessor so we can start using it. This needs minor paravirt tweaks. On native, sp0 defines the top of the kernel stack and is therefore always correct. On Xen and lguest, the hypervisor tracks the top of the stack, but we want to start reading sp0 in the kernel. Fixing this is simple: just update our local copy of sp0 as well as the hypervisor's copy on task switches. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/8d675581859712bee09a055ed8f785d80dac1eca.1425611534.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
5054daa2 |
|
23-Feb-2015 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
x86/xen: Initialize cr4 shadow for 64-bit PV(H) guests Commit 1e02ce4cccdc ("x86: Store a per-cpu shadow copy of CR4") introduced CR4 shadows. These shadows are initialized in early boot code. The commit missed initialization for 64-bit PV(H) guests that this patch adds. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
31795b47 |
|
11-Feb-2015 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
x86/xen: Make sure X2APIC_ENABLE bit of MSR_IA32_APICBASE is not set Commit d524165cb8db ("x86/apic: Check x2apic early") tests X2APIC_ENABLE bit of MSR_IA32_APICBASE when CONFIG_X86_X2APIC is off and panics the kernel when this bit is set. Xen's PV guests will pass this MSR read to the hypervisor which will return its version of the MSR, where this bit might be set. Make sure we clear it before returning MSR value to the caller. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
375074cc |
|
24-Oct-2014 |
Andy Lutomirski <luto@amacapital.net> |
x86: Clean up cr4 manipulation CR4 manipulation was split, seemingly at random, between direct (write_cr4) and using a helper (set/clear_in_cr4). Unfortunately, the set_in_cr4 and clear_in_cr4 helpers also poke at the boot code, which only a small subset of users actually wanted. This patch replaces all cr4 access in functions that don't leave cr4 exactly the way they found it with new helpers cr4_set_bits, cr4_clear_bits, and cr4_set_bits_and_update_boot. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/495a10bdc9e67016b8fd3945700d46cfd5c12c2f.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
f221b04f |
|
13-Jan-2015 |
Jan Beulich <JBeulich@suse.com> |
x86/xen: properly retrieve NMI reason Using the native code here can't work properly, as the hypervisor would normally have cleared the two reason bits by the time Dom0 gets to see the NMI (if passed to it at all). There's a shared info field for this, and there's an existing hook to use - just fit the two together. This is particularly relevant so that NMIs intended to be handled by APEI / GHES actually make it to the respective handler. Note that the hook can (and should) be used irrespective of whether being in Dom0, as accessing port 0x61 in a DomU would be even worse, while the shared info field would just hold zero all the time. Note further that hardware NMI handling for PVH doesn't currently work anyway due to missing code in the hypervisor (but it is expected to work the native rather than the PV way). Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
47591df5 |
|
03-Nov-2014 |
Juergen Gross <jgross@suse.com> |
xen: Support Xen pv-domains using PAT With the dynamical mapping between cache modes and pgprot values it is now possible to use all cache modes via the Xen hypervisor PAT settings in a pv domain. All to be done is to read the PAT configuration MSR and set up the translation tables accordingly. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stefan.bader@canonical.com Cc: xen-devel@lists.xensource.com Cc: ville.syrjala@linux.intel.com Cc: jbeulich@suse.com Cc: toshi.kani@hp.com Cc: plagnioj@jcrosoft.com Cc: tomi.valkeinen@ti.com Cc: bhelgaas@google.com Link: http://lkml.kernel.org/r/1415019724-4317-19-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
2c185687 |
|
14-Oct-2014 |
Juergen Gross <jgross@suse.com> |
x86/xen: delay construction of mfn_list_list The 3 level p2m tree for the Xen tools is constructed very early at boot by calling xen_build_mfn_list_list(). Memory needed for this tree is allocated via extend_brk(). As this tree (other than the kernel internal p2m tree) is only needed for domain save/restore, live migration and crash dump analysis it doesn't matter whether it is constructed very early or just some milliseconds later when memory allocation is possible by other means. This patch moves the call of xen_build_mfn_list_list() just after calling xen_pagetable_p2m_copy() simplifying this function, too, as it doesn't have to bother with two parallel trees now. The same applies for some other internal functions. While simplifying code, make early_can_reuse_p2m_middle() static and drop the unused second parameter. p2m_mid_identity_mfn can be removed as well, it isn't used either. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
a2ef5dc2 |
|
10-Sep-2014 |
Mukesh Rathor <mukesh.rathor@oracle.com> |
x86/xen: Set EFER.NX and EFER.SCE in PVH guests This fixes two bugs in PVH guests: - Not setting EFER.NX means the NX bit in page table entries is ignored on Intel processors and causes reserved bit page faults on AMD processors. - After the Xen commit 7645640d6ff1 ("x86/PVH: don't set EFER_SCE for pvh guest") PVH guests are required to set EFER.SCE to enable the SYSCALL instruction. Secondary VCPUs are started with pagetables with the NX bit set so EFER.NX must be set before using any stack or data segment. xen_pvh_cpu_early_init() is the new secondary VCPU entry point that sets EFER before jumping to cpu_bringup_and_idle(). Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
d1e9abd6 |
|
16-Sep-2014 |
Juergen Gross <jgross@suse.com> |
xen: eliminate scalability issues from initrd handling Size restrictions native kernels wouldn't have resulted from the initrd getting mapped into the initial mapping. The kernel doesn't really need the initrd to be mapped, so use infrastructure available in Xen to avoid the mapping and hence the restriction. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
f955371c |
|
07-Jan-2014 |
David Vrabel <david.vrabel@citrix.com> |
x86: remove the Xen-specific _PAGE_IOMAP PTE flag The _PAGE_IO_MAP PTE flag was only used by Xen PV guests to mark PTEs that were used to map I/O regions that are 1:1 in the p2m. This allowed Xen to obtain the correct PFN when converting the MFNs read from a PTE back to their PFN. Xen guests no longer use _PAGE_IOMAP for this. Instead mfn_to_pfn() returns the correct PFN by using a combination of the m2p and p2m to determine if an MFN corresponds to a 1:1 mapping in the the p2m. Remove _PAGE_IOMAP, replacing it with _PAGE_UNUSED2 to allow for future uses of the PTE flag. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: "H. Peter Anvin" <hpa@zytor.com>
|
#
89cbc767 |
|
16-Aug-2014 |
Christoph Lameter <cl@linux.com> |
x86: Replace __get_cpu_var uses __get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processors percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment. __get_cpu_var() is defined as : #define __get_cpu_var(var) (*this_cpu_ptr(&(var))) __get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables. This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and less registers are used when code is generated. Transformations done to __get_cpu_var() 1. Determine the address of the percpu instance of the current processor. DEFINE_PER_CPU(int, y); int *x = &__get_cpu_var(y); Converts to int *x = this_cpu_ptr(&y); 2. Same as #1 but this time an array structure is involved. DEFINE_PER_CPU(int, y[20]); int *x = __get_cpu_var(y); Converts to int *x = this_cpu_ptr(y); 3. Retrieve the content of the current processors instance of a per cpu variable. DEFINE_PER_CPU(int, y); int x = __get_cpu_var(y) Converts to int x = __this_cpu_read(y); 4. Retrieve the content of a percpu struct DEFINE_PER_CPU(struct mystruct, y); struct mystruct x = __get_cpu_var(y); Converts to memcpy(&x, this_cpu_ptr(&y), sizeof(x)); 5. Assignment to a per cpu variable DEFINE_PER_CPU(int, y) __get_cpu_var(y) = x; Converts to __this_cpu_write(y, x); 6. Increment/Decrement etc of a per cpu variable DEFINE_PER_CPU(int, y); __get_cpu_var(y)++ Converts to __this_cpu_inc(y) Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Acked-by: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
c7341d6a |
|
12-Jul-2014 |
Daniel Kiper <daniel.kiper@oracle.com> |
arch/x86/xen: Silence compiler warnings Compiler complains in the following way when x86 32-bit kernel with Xen support is build: CC arch/x86/xen/enlighten.o arch/x86/xen/enlighten.c: In function ‘xen_start_kernel’: arch/x86/xen/enlighten.c:1726:3: warning: right shift count >= width of type [enabled by default] Such line contains following EFI initialization code: boot_params.efi_info.efi_systab_hi = (__u32)(__pa(efi_systab_xen) >> 32); There is no issue if x86 64-bit kernel is build. However, 32-bit case generate warning (even if that code will not be executed because Xen does not work on 32-bit EFI platforms) due to __pa() returning unsigned long type which has 32-bits width. So move whole EFI initialization stuff to separate function and build it conditionally to avoid above mentioned warning on x86 32-bit architecture. Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
#
be81c8a1 |
|
30-Jun-2014 |
Daniel Kiper <daniel.kiper@oracle.com> |
xen: Put EFI machinery in place This patch enables EFI usage under Xen dom0. Standard EFI Linux Kernel infrastructure cannot be used because it requires direct access to EFI data and code. However, in dom0 case it is not possible because above mentioned EFI stuff is fully owned and controlled by Xen hypervisor. In this case all calls from dom0 to EFI must be requested via special hypercall which in turn executes relevant EFI code in behalf of dom0. When dom0 kernel boots it checks for EFI availability on a machine. If it is detected then artificial EFI system table is filled. Native EFI callas are replaced by functions which mimics them by calling relevant hypercall. Later pointer to EFI system table is passed to standard EFI machinery and it continues EFI subsystem initialization taking into account that there is no direct access to EFI boot services, runtime, tables, structures, etc. After that system runs as usual. This patch is based on Jan Beulich and Tang Liang work. Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Tang Liang <liang.tang@oracle.com> Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
#
8d693b91 |
|
11-Jul-2014 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen: Introduce 'xen_nopv' to disable PV extensions for HVM guests. By default when CONFIG_XEN and CONFIG_XEN_PVHVM kernels are run, they will enable the PV extensions (drivers, interrupts, timers, etc) - which is the best option for the majority of use cases. However, in some cases (kexec not fully working, benchmarking) we want to disable Xen PV extensions. As such introduce the 'xen_nopv' parameter that will do it. This parameter is intended only for HVM guests as the Xen PV guests MUST boot with PV extensions. However, even if you use 'xen_nopv' on Xen PV guests it will be ignored. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> --- [v2: s/off/xen_nopv/ per Boris Ostrovsky recommendation.] [v3: Add Reviewed-by] [v4: Clarify that this is only for HVM guests]
|
#
abacaadc |
|
02-Jun-2014 |
David Vrabel <david.vrabel@citrix.com> |
x86/xen: fix memory setup for PVH dom0 Since af06d66ee32b (x86: fix setup of PVH Dom0 memory map) in Xen, PVH dom0 need only use the memory memory provided by Xen which has already setup all the correct holes. xen_memory_setup() then ends up being trivial for a PVH guest so introduce a new function (xen_auto_xlated_memory_setup()). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Tested-by: Roger Pau Monné <roger.pau@citrix.com>
|
#
bc5eb201 |
|
13-May-2014 |
Radim Krčmář <rkrcmar@redhat.com> |
xen/x86: set panic notifier priority to minimum Execution is not going to continue after telling Xen about the crash. Let other panic notifiers run by postponing the final hypercall as much as possible. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
2605fc21 |
|
01-May-2014 |
Andi Kleen <ak@linux.intel.com> |
asmlinkage, x86: Add explicit __visible to arch/x86/* As requested by Linus add explicit __visible to the asmlinkage users. This marks all functions visible to assembler. Tree sweep for arch/x86/* Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1398984278-29319-3-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
afca5013 |
|
29-Jan-2014 |
Mukesh Rathor <mukesh.rathor@oracle.com> |
xen/pvh: set CR4 flags for APs During bootup in the 'probe_page_size_mask' these CR4 flags are set in there. But for AP processors they are not set as we do not use 'secondary_startup_64' which the baremetal kernels uses. Instead do it in this function which we use in Xen PVH during our startup for AP processors. As such fix it up to make sure we have that flag set. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
c9f6e997 |
|
20-Jan-2014 |
Roger Pau Monne <roger.pau@citrix.com> |
xen/pvh: Set X86_CR0_WP and others in CR0 (v2) otherwise we will get for some user-space applications that use 'clone' with CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID end up hitting an assert in glibc manifested by: general protection ip:7f80720d364c sp:7fff98fd8a80 error:0 in libc-2.13.so[7f807209e000+180000] This is due to the nature of said operations which sets and clears the PID. "In the successful one I can see that the page table of the parent process has been updated successfully to use a different physical page, so the write of the tid on that page only affects the child... On the other hand, in the failed case, the write seems to happen before the copy of the original page is done, so both the parent and the child end up with the same value (because the parent copies the page after the write of the child tid has already happened)." (Roger's analysis). The nature of this is due to the Xen's commit of 51e2cac257ec8b4080d89f0855c498cbbd76a5e5 "x86/pvh: set only minimal cr0 and cr4 flags in order to use paging" the CR0_WP was removed so COW features of the Linux kernel were not operating properly. While doing that also update the rest of the CR0 flags to be inline with what a baremetal Linux kernel would set them to. In 'secondary_startup_64' (baremetal Linux) sets: X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | X86_CR0_PG The hypervisor for HVM type guests (which PVH is a bit) sets: X86_CR0_PE | X86_CR0_ET | X86_CR0_TS For PVH it specifically sets: X86_CR0_PG Which means we need to set the rest: X86_CR0_MP | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM to have full parity. Signed-off-by: Roger Pau Monne <roger.pau@citrix.com> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v1: Took out the cr4 writes to be a seperate patch] [v2: 0-DAY kernel found xen_setup_gdt to be missing a static]
|
#
5602aba8 |
|
07-Jan-2014 |
Wei Yongjun <yongjun_wei@trendmicro.com.cn> |
xen/pvh: remove duplicated include from enlighten.c Remove duplicated include. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
2771374d |
|
11-Dec-2013 |
Mukesh Rathor <mukesh.rathor@oracle.com> |
xen/pvh: Piggyback on PVHVM for event channels (v2) PVH is a PV guest with a twist - there are certain things that work in it like HVM and some like PV. There is a similar mode - PVHVM where we run in HVM mode with PV code enabled - and this patch explores that. The most notable PV interfaces are the XenBus and event channels. We will piggyback on how the event channel mechanism is used in PVHVM - that is we want the normal native IRQ mechanism and we will install a vector (hvm callback) for which we will call the event channel mechanism. This means that from a pvops perspective, we can use native_irq_ops instead of the Xen PV specific. Albeit in the future we could support pirq_eoi_map. But that is a feature request that can be shared with PVHVM. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
|
#
5840c84b |
|
13-Dec-2013 |
Mukesh Rathor <mukesh.rathor@oracle.com> |
xen/pvh: Secondary VCPU bringup (non-bootup CPUs) The VCPU bringup protocol follows the PV with certain twists. From xen/include/public/arch-x86/xen.h: Also note that when calling DOMCTL_setvcpucontext and VCPU_initialise for HVM and PVH guests, not all information in this structure is updated: - For HVM guests, the structures read include: fpu_ctxt (if VGCT_I387_VALID is set), flags, user_regs, debugreg[*] - PVH guests are the same as HVM guests, but additionally use ctrlreg[3] to set cr3. All other fields not used should be set to 0. This is what we do. We piggyback on the 'xen_setup_gdt' - but modify a bit - we need to call 'load_percpu_segment' so that 'switch_to_new_gdt' can load per-cpu data-structures. It has no effect on the VCPU0. We also piggyback on the %rdi register to pass in the CPU number - so that when we bootup a new CPU, the cpu_bringup_and_idle will have passed as the first parameter the CPU number (via %rdi for 64-bit). Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
8d656bbe |
|
13-Dec-2013 |
Mukesh Rathor <mukesh.rathor@oracle.com> |
xen/pvh: Load GDT/GS in early PV bootup code for BSP. During early bootup we start life using the Xen provided GDT, which means that we are running with %cs segment set to FLAT_KERNEL_CS (FLAT_RING3_CS64 0xe033, GDT index 261). But for PVH we want to be use HVM type mechanism for segment operations. As such we need to switch to the HVM one and also reload ourselves with the __KERNEL_CS:eip to run in the proper GDT and segment. For HVM this is usually done in 'secondary_startup_64' in (head_64.S) but since we are not taking that bootup path (we start in PV - xen_start_kernel) we need to do that in the early PV bootup paths. For good measure we also zero out the %fs, %ds, and %es (not strictly needed as Xen has already cleared them for us). The %gs is loaded by 'switch_to_new_gdt'. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com>
|
#
696fd7c5 |
|
14-Dec-2013 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/pvh: Don't setup P2M tree. P2M is not available for PVH. Fortunatly for us the P2M code already has mostly the support for auto-xlat guest thanks to commit 3d24bbd7dddbea54358a9795abaf051b0f18973c "grant-table: call set_phys_to_machine after mapping grant refs" which: " introduces set_phys_to_machine calls for auto_translated guests (even on x86) in gnttab_map_refs and gnttab_unmap_refs. translated by swiotlb-xen... " so we don't need to muck much. with above mentioned "commit you'll get set_phys_to_machine calls from gnttab_map_refs and gnttab_unmap_refs but PVH guests won't do anything with them " (Stefano Stabellini) which is OK - we want them to be NOPs. This is because we assume that an "IOMMU is always present on the plaform and Xen is going to make the appropriate IOMMU pagetable changes in the hypercall implementation of GNTTABOP_map_grant_ref and GNTTABOP_unmap_grant_ref, then eveything should be transparent from PVH priviligied point of view and DMA transfers involving foreign pages keep working with no issues[sp] Otherwise we would need a P2M (and an M2P) for PVH priviligied to track these foreign pages .. (see arch/arm/xen/p2m.c)." (Stefano Stabellini). We still have to inhibit the building of the P2M tree. That had been done in the past by not calling xen_build_dynamic_phys_to_machine (which setups the P2M tree and gives us virtual address to access them). But we are missing a check for xen_build_mfn_list_list - which was continuing to setup the P2M tree and would blow up at trying to get the virtual address of p2m_missing (which would have been setup by xen_build_dynamic_phys_to_machine). Hence a check is needed to not call xen_build_mfn_list_list when running in auto-xlat mode. Instead of replicating the check for auto-xlat in enlighten.c do it in the p2m.c code. The reason is that the xen_build_mfn_list_list is called also in xen_arch_post_suspend without any checks for auto-xlat. So for PVH or PV with auto-xlat - we would needlessly allocate space for an P2M tree. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
|
#
d285d683 |
|
12-Dec-2013 |
Mukesh Rathor <mukesh.rathor@oracle.com> |
xen/pvh: Early bootup changes in PV code (v4). We don't use the filtering that 'xen_cpuid' is doing because the hypervisor treats 'XEN_EMULATE_PREFIX' as an invalid instruction. This means that all of the filtering will have to be done in the hypervisor/toolstack. Without the filtering we expose to the guest the: - cpu topology (sockets, cores, etc); - the APERF (which the generic scheduler likes to use), see 5e626254206a709c6e937f3dda69bf26c7344f6f "xen/setup: filter APERFMPERF cpuid feature out" - and the inability to figure out whether MWAIT_LEAF should be exposed or not. See df88b2d96e36d9a9e325bfcd12eb45671cbbc937 "xen/enlighten: Disable MWAIT_LEAF so that acpi-pad won't be loaded." - x2apic, see 4ea9b9aca90cfc71e6872ed3522356755162932c "xen: mask x2APIC feature in PV" We also check for vector callback early on, as it is a required feature. PVH also runs at default kernel IOPL. Finally, pure PV settings are moved to a separate function that are only called for pure PV, ie, pv with pvmmu. They are also #ifdef with CONFIG_XEN_PVMMU. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
|
#
1fb3a8b2 |
|
13-Aug-2013 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/spinlock: Fix locking path engaging too soon under PVHVM. The xen_lock_spinning has a check for the kicker interrupts and if it is not initialized it will spin normally (not enter the slowpath). But for PVHVM case we would initialize the kicker interrupt before the CPU came online. This meant that if the booting CPU used a spinlock and went in the slowpath - it would enter the slowpath and block forever. The forever part because during bootup: the spinlock would be taken _before_ the CPU sets itself to be online (more on this further), and we enter to poll on the event channel forever. The bootup CPU (see commit fc78d343fa74514f6fd117b5ef4cd27e4ac30236 "xen/smp: initialize IPI vectors before marking CPU online" for details) and the CPU that started the bootup consult the cpu_online_mask to determine whether the booting CPU should get an IPI. The booting CPU has to set itself in this mask via: set_cpu_online(smp_processor_id(), true); However, if the spinlock is taken before this (and it is) and it polls on an event channel - it will never be woken up as the kernel will never send an IPI to an offline CPU. Note that the PVHVM logic in sending IPIs is using the HVM path which has numerous checks using the cpu_online_mask and cpu_active_mask. See above mention git commit for details. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com>
|
#
669b0ae9 |
|
16-Aug-2013 |
Vaughan Cao <vaughan.cao@oracle.com> |
xen/pvhvm: Initialize xen panic handler for PVHVM guests kernel use callback linked in panic_notifier_list to notice others when panic happens. NORET_TYPE void panic(const char * fmt, ...){ ... atomic_notifier_call_chain(&panic_notifier_list, 0, buf); } When Xen becomes aware of this, it will call xen_reboot(SHUTDOWN_crash) to send out an event with reason code - SHUTDOWN_crash. xen_panic_handler_init() is defined to register on panic_notifier_list but we only call it in xen_arch_setup which only be called by PV, this patch is necessary for PVHVM. Without this patch, setting 'on_crash=coredump-restart' in PVHVM guest config file won't lead a vmcore to be generate when the guest panics. It can be reproduced with 'echo c > /proc/sysrq-trigger'. Signed-off-by: Vaughan Cao <vaughan.cao@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Joe Jin <joe.jin@oracle.com>
|
#
6efa20e4 |
|
19-Jul-2013 |
Konrad Rzeszutek Wilk <konrad@kernel.org> |
xen: Support 64-bit PV guest receiving NMIs This is based on a patch that Zhenzhong Duan had sent - which was missing some of the remaining pieces. The kernel has the logic to handle Xen-type-exceptions using the paravirt interface in the assembler code (see PARAVIRT_ADJUST_EXCEPTION_FRAME - pv_irq_ops.adjust_exception_frame and and INTERRUPT_RETURN - pv_cpu_ops.iret). That means the nmi handler (and other exception handlers) use the hypervisor iret. The other changes that would be neccessary for this would be to translate the NMI_VECTOR to one of the entries on the ipi_vector and make xen_send_IPI_mask_allbutself use different events. Fortunately for us commit 1db01b4903639fcfaec213701a494fe3fb2c490b (xen: Clean up apic ipi interface) implemented this and we piggyback on the cleanup such that the apic IPI interface will pass the right vector value for NMI. With this patch we can trigger NMIs within a PV guest (only tested x86_64). For this to work with normal PV guests (not initial domain) we need the domain to be able to use the APIC ops - they are already implemented to use the Xen event channels. For that to be turned on in a PV domU we need to remove the masking of X86_FEATURE_APIC. Incidentally that means kgdb will also now work within a PV guest without using the 'nokgdbroundup' workaround. Note that the 32-bit version is different and this patch does not enable that. CC: Lisa Nguyen <lisa@xenapiadmin.com> CC: Ben Guthro <benjamin.guthro@citrix.com> CC: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v1: Fixed up per David Vrabel comments] Reviewed-by: Ben Guthro <benjamin.guthro@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com>
|
#
9df56f19 |
|
25-Jul-2013 |
Jason Wang <jasowang@redhat.com> |
x86: Correctly detect hypervisor We try to handle the hypervisor compatibility mode by detecting hypervisor through a specific order. This is not robust, since hypervisors may implement each others features. This patch tries to handle this situation by always choosing the last one in the CPUID leaves. This is done by letting .detect() return a priority instead of true/false and just re-using the CPUID leaf where the signature were found as the priority (or 1 if it was found by DMI). Then we can just pick hypervisor who has the highest priority. Other sophisticated detection method could also be implemented on top. Suggested by H. Peter Anvin and Paolo Bonzini. Acked-by: K. Y. Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Doug Covelli <dcovelli@vmware.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dan Hecht <dhecht@vmware.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: http://lkml.kernel.org/r/1374742475-2485-4-git-send-email-jasowang@redhat.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
148f9bb8 |
|
18-Jun-2013 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
x86: delete __cpuinit usage from all x86 files The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. Note that some harmless section mismatch warnings may result, since notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c) are flagged as __cpuinit -- so if we remove the __cpuinit from arch specific callers, we will also get section mismatch warnings. As an intermediate step, we intend to turn the linux/init.h cpuinit content into no-ops as early as possible, since that will get rid of these warnings. In any case, they are temporary and harmless. This removes all the arch/x86 uses of the __cpuinit macros from all C files. x86 only had the one __CPUINIT used in assembly files, and it wasn't paired off with a .previous or a __FINIT, so we can delete it directly w/o any corresponding additional change there. [1] https://lkml.org/lkml/2013/5/20/589 Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
60e019eb |
|
29-Apr-2013 |
H. Peter Anvin <hpa@zytor.com> |
x86: Get rid of ->hard_math and all the FPU asm fu Reimplement FPU detection code in C and drop old, not-so-recommended detection method in asm. Move all the relevant stuff into i387.c where it conceptually belongs. Finally drop cpuinfo_x86.hard_math. [ hpa: huge thanks to Borislav for taking my original concept patch and productizing it ] [ Boris, note to self: do not use static_cpu_has before alternatives! ] Signed-off-by: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/1367244262-29511-2-git-send-email-bp@alien8.de Link: http://lkml.kernel.org/r/1365436666-9837-2-git-send-email-bp@alien8.de Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
4ea9b9ac |
|
28-Mar-2013 |
Zhenzhong Duan <zhenzhong.duan@oracle.com> |
xen: mask x2APIC feature in PV On x2apic enabled pvm, doing sysrq+l, got NULL pointer dereference as below. SysRq : Show backtrace of all active CPUs BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8125e3cb>] memcpy+0xb/0x120 Call Trace: [<ffffffff81039633>] ? __x2apic_send_IPI_mask+0x73/0x160 [<ffffffff8103973e>] x2apic_send_IPI_all+0x1e/0x20 [<ffffffff8103498c>] arch_trigger_all_cpu_backtrace+0x6c/0xb0 [<ffffffff81501be4>] ? _raw_spin_lock_irqsave+0x34/0x50 [<ffffffff8131654e>] sysrq_handle_showallcpus+0xe/0x10 [<ffffffff8131616d>] __handle_sysrq+0x7d/0x140 [<ffffffff81316230>] ? __handle_sysrq+0x140/0x140 [<ffffffff81316287>] write_sysrq_trigger+0x57/0x60 [<ffffffff811ca996>] proc_reg_write+0x86/0xc0 [<ffffffff8116dd8e>] vfs_write+0xce/0x190 [<ffffffff8116e3e5>] sys_write+0x55/0x90 [<ffffffff8150a242>] system_call_fastpath+0x16/0x1b That's because apic points to apic_x2apic_cluster or apic_x2apic_phys but the basic element like cpumask isn't initialized. Mask x2APIC feature in pvm to avoid overwrite of apic pointer, update commit message per Konrad's suggestion. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Tested-by: Tamon Shiose <tamon.shiose@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
d5b17dbf |
|
05-May-2013 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/smp/pvhvm: Don't point per_cpu(xen_vpcu, 33 and larger) to shared_info As it will point to some data, but not event channel data (the shared_info has an array limited to 32). This means that for PVHVM guests with more than 32 VCPUs without the usage of VCPUOP_register_info any interrupts to VCPUs larger than 32 would have gone unnoticed during early bootup. That is OK, as during early bootup, in smp_init we end up calling the hotplug mechanism (xen_hvm_cpu_notify) which makes the VCPUOP_register_vcpu_info call for all VCPUs and we can receive interrupts on VCPUs 33 and further. This is just a cleanup. Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
a520996a |
|
05-May-2013 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/vcpu: Document the xen_vcpu_info and xen_vcpu They are important structures and it is not clear at first look what they are for. The xen_vcpu is a pointer. By default it points to the shared_info structure (at the CPU offset location). However if the VCPUOP_register_vcpu_info hypercall is implemented we can make the xen_vcpu pointer point to a per-CPU location. Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> [v1: Added comments from Ian Campbell] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
7f1fc268 |
|
05-May-2013 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/vcpu/pvhvm: Fix vcpu hotplugging hanging. If a user did: echo 0 > /sys/devices/system/cpu/cpu1/online echo 1 > /sys/devices/system/cpu/cpu1/online we would (this a build with DEBUG enabled) get to: smpboot: ++++++++++++++++++++=_---CPU UP 1 .. snip.. smpboot: Stack at about ffff880074c0ff44 smpboot: CPU1: has booted. and hang. The RCU mechanism would kick in an try to IPI the CPU1 but the IPIs (and all other interrupts) would never arrive at the CPU1. At first glance at least. A bit digging in the hypervisor trace shows that (using xenanalyze): [vla] d4v1 vec 243 injecting 0.043163027 --|x d4v1 intr_window vec 243 src 5(vector) intr f3 ] 0.043163639 --|x d4v1 vmentry cycles 1468 ] 0.043164913 --|x d4v1 vmexit exit_reason PENDING_INTERRUPT eip ffffffff81673254 0.043164913 --|x d4v1 inj_virq vec 243 real [vla] d4v1 vec 243 injecting 0.043164913 --|x d4v1 intr_window vec 243 src 5(vector) intr f3 ] 0.043165526 --|x d4v1 vmentry cycles 1472 ] 0.043166800 --|x d4v1 vmexit exit_reason PENDING_INTERRUPT eip ffffffff81673254 0.043166800 --|x d4v1 inj_virq vec 243 real [vla] d4v1 vec 243 injecting there is a pending event (subsequent debugging shows it is the IPI from the VCPU0 when smpboot.c on VCPU1 has done "set_cpu_online(smp_processor_id(), true)") and the guest VCPU1 is interrupted with the callback IPI (0xf3 aka 243) which ends up calling __xen_evtchn_do_upcall. The __xen_evtchn_do_upcall seems to do *something* but not acknowledge the pending events. And the moment the guest does a 'cli' (that is the ffffffff81673254 in the log above) the hypervisor is invoked again to inject the IPI (0xf3) to tell the guest it has pending interrupts. This repeats itself forever. The culprit was the per_cpu(xen_vcpu, cpu) pointer. At the bootup we set each per_cpu(xen_vcpu, cpu) to point to the shared_info->vcpu_info[vcpu] but later on use the VCPUOP_register_vcpu_info to register per-CPU structures (xen_vcpu_setup). This is used to allow events for more than 32 VCPUs and for performance optimizations reasons. When the user performs the VCPU hotplug we end up calling the the xen_vcpu_setup once more. We make the hypercall which returns -EINVAL as it does not allow multiple registration calls (and already has re-assigned where the events are being set). We pick the fallback case and set per_cpu(xen_vcpu, cpu) to point to the shared_info->vcpu_info[vcpu] (which is a good fallback during bootup). However the hypervisor is still setting events in the register per-cpu structure (per_cpu(xen_vcpu_info, cpu)). As such when the events are set by the hypervisor (such as timer one), and when we iterate in __xen_evtchn_do_upcall we end up reading stale events from the shared_info->vcpu_info[vcpu] instead of the per_cpu(xen_vcpu_info, cpu) structures. Hence we never acknowledge the events that the hypervisor has set and the hypervisor keeps on reminding us to ack the events which we never do. The fix is simple. Don't on the second time when xen_vcpu_setup is called over-write the per_cpu(xen_vcpu, cpu) if it points to per_cpu(xen_vcpu_info). Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> CC: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
7918c92a |
|
16-Apr-2013 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/time: Fix kasprintf splat when allocating timer%d IRQ line. When we online the CPU, we get this splat: smpboot: Booting Node 0 Processor 1 APIC 0x2 installing Xen timer for CPU 1 BUG: sleeping function called from invalid context at /home/konrad/ssd/konrad/linux/mm/slab.c:3179 in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/1 Pid: 0, comm: swapper/1 Not tainted 3.9.0-rc6upstream-00001-g3884fad #1 Call Trace: [<ffffffff810c1fea>] __might_sleep+0xda/0x100 [<ffffffff81194617>] __kmalloc_track_caller+0x1e7/0x2c0 [<ffffffff81303758>] ? kasprintf+0x38/0x40 [<ffffffff813036eb>] kvasprintf+0x5b/0x90 [<ffffffff81303758>] kasprintf+0x38/0x40 [<ffffffff81044510>] xen_setup_timer+0x30/0xb0 [<ffffffff810445af>] xen_hvm_setup_cpu_clockevents+0x1f/0x30 [<ffffffff81666d0a>] start_secondary+0x19c/0x1a8 The solution to that is use kasprintf in the CPU hotplug path that 'online's the CPU. That is, do it in in xen_hvm_cpu_notify, and remove the call to in xen_hvm_setup_cpu_clockevents. Unfortunatly the later is not a good idea as the bootup path does not use xen_hvm_cpu_notify so we would end up never allocating timer%d interrupt lines when booting. As such add the check for atomic() to continue. CC: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
96f28bc6 |
|
03-Apr-2013 |
David Vrabel <david.vrabel@citrix.com> |
x86/xen: populate boot_params with EDD data During early setup of a dom0 kernel, populate boot_params with the Enhanced Disk Drive (EDD) and MBR signature data. This makes information on the BIOS boot device available in /sys/firmware/edd/. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
357d1226 |
|
05-Apr-2013 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
x86, xen, gdt: Remove the pvops variant of store_gdt. The two use-cases where we needed to store the GDT were during ACPI S3 suspend and resume. As the patches: x86/gdt/i386: store/load GDT for ACPI S3 or hibernation/resume path is not needed x86/gdt/64-bit: store/load GDT for ACPI S3 or hibernate/resume path is not needed. have demonstrated - there are other mechanism by which the GDT is saved and reloaded during early resume path. Hence we do not need to worry about the pvops call-chain for saving the GDT and can and can eliminate it. The other areas where the store_gdt is used are never going to be hit when running under the pvops platforms. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1365194544-14648-4-git-send-email-konrad.wilk@oracle.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
c79c4982 |
|
25-Feb-2013 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/pat: Disable PAT using pat_enabled value. The git commit 8eaffa67b43e99ae581622c5133e20b0f48bcef1 (xen/pat: Disable PAT support for now) explains in details why we want to disable PAT for right now. However that change was not enough and we should have also disabled the pat_enabled value. Otherwise we end up with: mmap-example:3481 map pfn expected mapping type write-back for [mem 0x00010000-0x00010fff], got uncached-minus ------------[ cut here ]------------ WARNING: at /build/buildd/linux-3.8.0/arch/x86/mm/pat.c:774 untrack_pfn+0xb8/0xd0() mem 0x00010000-0x00010fff], got uncached-minus ------------[ cut here ]------------ WARNING: at /build/buildd/linux-3.8.0/arch/x86/mm/pat.c:774 untrack_pfn+0xb8/0xd0() ... Pid: 3481, comm: mmap-example Tainted: GF 3.8.0-6-generic #13-Ubuntu Call Trace: [<ffffffff8105879f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff810587fa>] warn_slowpath_null+0x1a/0x20 [<ffffffff8104bcc8>] untrack_pfn+0xb8/0xd0 [<ffffffff81156c1c>] unmap_single_vma+0xac/0x100 [<ffffffff81157459>] unmap_vmas+0x49/0x90 [<ffffffff8115f808>] exit_mmap+0x98/0x170 [<ffffffff810559a4>] mmput+0x64/0x100 [<ffffffff810560f5>] dup_mm+0x445/0x660 [<ffffffff81056d9f>] copy_process.part.22+0xa5f/0x1510 [<ffffffff81057931>] do_fork+0x91/0x350 [<ffffffff81057c76>] sys_clone+0x16/0x20 [<ffffffff816ccbf9>] stub_clone+0x69/0x90 [<ffffffff816cc89d>] ? system_call_fastpath+0x1a/0x1f ---[ end trace 4918cdd0a4c9fea4 ]--- (a similar message shows up if you end up launching 'mcelog') The call chain is (as analyzed by Liu, Jinsong): do_fork --> copy_process --> dup_mm --> dup_mmap --> copy_page_range --> track_pfn_copy --> reserve_pfn_range --> line 624: flags != want_flags It comes from different memory types of page table (_PAGE_CACHE_WB) and MTRR (_PAGE_CACHE_UC_MINUS). Stefan Bader dug in this deep and found out that: "That makes it clearer as this will do reserve_memtype(...) --> pat_x_mtrr_type --> mtrr_type_lookup --> __mtrr_type_lookup And that can return -1/0xff in case of MTRR not being enabled/initialized. Which is not the case (given there are no messages for it in dmesg). This is not equal to MTRR_TYPE_WRBACK and thus becomes _PAGE_CACHE_UC_MINUS. It looks like the problem starts early in reserve_memtype: if (!pat_enabled) { /* This is identical to page table setting without PAT */ if (new_type) { if (req_type == _PAGE_CACHE_WC) *new_type = _PAGE_CACHE_UC_MINUS; else *new_type = req_type & _PAGE_CACHE_MASK; } return 0; } This would be what we want, that is clearing the PWT and PCD flags from the supported flags - if pat_enabled is disabled." This patch does that - disabling PAT. CC: stable@vger.kernel.org # 3.3 and further Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Reported-and-Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reported-and-Tested-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
e9daff24 |
|
14-Feb-2013 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
Revert "xen PVonHVM: use E820_Reserved area for shared_info" This reverts commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f. We are doing this b/c on 32-bit PVonHVM with older hypervisors (Xen 4.1) it ends up bothing up the start_info. This is bad b/c we use it for the time keeping, and the timekeeping code loops forever - as the version field never changes. Olaf says to revert it, so lets do that. Acked-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
5eb65be2 |
|
14-Feb-2013 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
Revert "xen/PVonHVM: fix compile warning in init_hvm_pv_info" This reverts commit a7be94ac8d69c037d08f0fd94b45a593f1d45176. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
4cca6ea0 |
|
17-Jan-2013 |
Alok N Kataria <akataria@vmware.com> |
x86/apic: Allow x2apic without IR on VMware platform This patch updates x2apic initializaition code to allow x2apic on VMware platform even without interrupt remapping support. The hypervisor_x2apic_available hook was added in x2apic initialization code and used by KVM and XEN, before this. I have also cleaned up that code to export this hook through the hypervisor_x86 structure. Compile tested for KVM and XEN configs, this patch doesn't have any functional effect on those two platforms. On VMware platform, verified that x2apic is used in physical mode on products that support this. Signed-off-by: Alok N Kataria <akataria@vmware.com> Reviewed-by: Doug Covelli <dcovelli@vmware.com> Reviewed-by: Dan Hecht <dhecht@vmware.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Avi Kivity <avi@redhat.com> Link: http://lkml.kernel.org/r/1358466282.423.60.camel@akataria-dtop.eng.vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
9d328a94 |
|
13-Dec-2012 |
Wei Liu <wei.liu2@citrix.com> |
xen/vcpu: Fix vcpu restore path. The runstate of vcpu should be restored for all possible cpus, as well as the vcpu info placement. Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
a7be94ac |
|
29-Nov-2012 |
Olaf Hering <olaf@aepfle.de> |
xen/PVonHVM: fix compile warning in init_hvm_pv_info After merging the xen-two tree, today's linux-next build (x86_64 allmodconfig) produced this warning: arch/x86/xen/enlighten.c: In function 'init_hvm_pv_info': arch/x86/xen/enlighten.c:1617:16: warning: unused variable 'ebx' [-Wunused-variable] arch/x86/xen/enlighten.c:1617:11: warning: unused variable 'eax' [-Wunused-variable] Introduced by commit 9d02b43dee0d ("xen PVonHVM: use E820_Reserved area for shared_info"). Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
394b40f6 |
|
27-Nov-2012 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/acpi: Move the xen_running_on_version_or_later function. As on ia64 builds we get: include/xen/interface/version.h: In function 'xen_running_on_version_or_later': include/xen/interface/version.h:76: error: implicit declaration of function 'HYPERVISOR_xen_version' We can later on make this function exportable if there are modules using part of it. For right now the only two users are built-in. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
e3aa4e61 |
|
07-Nov-2012 |
Liu, Jinsong <jinsong.liu@intel.com> |
xen/acpi: revert pad config check in xen_check_mwait With Xen acpi pad logic added into kernel, we can now revert xen mwait related patch df88b2d96e36d9a9e325bfcd12eb45671cbbc937 ("xen/enlighten: Disable MWAIT_LEAF so that acpi-pad won't be loaded. "). The reason is, when running under newer Xen platform, Xen pad driver would be early loaded, so native pad driver would fail to be loaded, and hence no mwait/monitor #UD risk again. Another point is, only Xen4.2 or later support Xen acpi pad, so we won't expose mwait cpuid capability when running under older Xen platform. Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
9d02b43d |
|
01-Nov-2012 |
Olaf Hering <olaf@aepfle.de> |
xen PVonHVM: use E820_Reserved area for shared_info This is a respin of 00e37bdb0113a98408de42db85be002f21dbffd3 ("xen PVonHVM: move shared_info to MMIO before kexec"). Currently kexec in a PVonHVM guest fails with a triple fault because the new kernel overwrites the shared info page. The exact failure depends on the size of the kernel image. This patch moves the pfn from RAM into an E820 reserved memory area. The pfn containing the shared_info is located somewhere in RAM. This will cause trouble if the current kernel is doing a kexec boot into a new kernel. The new kernel (and its startup code) can not know where the pfn is, so it can not reserve the page. The hypervisor will continue to update the pfn, and as a result memory corruption occours in the new kernel. The toolstack marks the memory area FC000000-FFFFFFFF as reserved in the E820 map. Within that range newer toolstacks (4.3+) will keep 1MB starting from FE700000 as reserved for guest use. Older Xen4 toolstacks will usually not allocate areas up to FE700000, so FE700000 is expected to work also with older toolstacks. In Xen3 there is no reserved area at a fixed location. If the guest is started on such old hosts the shared_info page will be placed in RAM. As a result kexec can not be used. Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
c2103b7e |
|
07-Oct-2012 |
Wei Yongjun <yongjun_wei@trendmicro.com.cn> |
xen/x86: remove duplicated include from enlighten.c Remove duplicated include. dpatch engine is used to auto generate this patch. (https://github.com/weiyj/dpatch) CC: stable@vger.kernel.org Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
1a7bbda5 |
|
10-Oct-2012 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/bootup: allow {read|write}_cr8 pvops call. We actually do not do anything about it. Just return a default value of zero and if the kernel tries to write anything but 0 we BUG_ON. This fixes the case when an user tries to suspend the machine and it blows up in save_processor_state b/c 'read_cr8' is set to NULL and we get: kernel BUG at /home/konrad/ssd/linux/arch/x86/include/asm/paravirt.h:100! invalid opcode: 0000 [#1] SMP Pid: 2687, comm: init.late Tainted: G O 3.6.0upstream-00002-gac264ac-dirty #4 Bochs Bochs RIP: e030:[<ffffffff814d5f42>] [<ffffffff814d5f42>] save_processor_state+0x212/0x270 .. snip.. Call Trace: [<ffffffff810733bf>] do_suspend_lowlevel+0xf/0xac [<ffffffff8107330c>] ? x86_acpi_suspend_lowlevel+0x10c/0x150 [<ffffffff81342ee2>] acpi_suspend_enter+0x57/0xd5 CC: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
cd0608e7 |
|
10-Oct-2012 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/bootup: allow read_tscp call for Xen PV guests. The hypervisor will trap it. However without this patch, we would crash as the .read_tscp is set to NULL. This patch fixes it and sets it to the native_read_tscp call. CC: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
ffb8b233 |
|
20-Sep-2012 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/x86: retrieve keyboard shift status flags from hypervisor. The xen c/s 25873 allows the hypervisor to retrieve the NUMLOCK flag. With this patch, the Linux kernel can get the state according to the data in the BIOS. Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
bd49940a |
|
19-Sep-2012 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/boot: Disable BIOS SMP MP table search. As the initial domain we are able to search/map certain regions of memory to harvest configuration data. For all low-level we use ACPI tables - for interrupts we use exclusively ACPI _PRT (so DSDT) and MADT for INT_SRC_OVR. The SMP MP table is not used at all. As a matter of fact we do not even support machines that only have SMP MP but no ACPI tables. Lets follow how Moorestown does it and just disable searching for BIOS SMP tables. This also fixes an issue on HP Proliant BL680c G5 and DL380 G6: 9f->100 for 1:1 PTE Freeing 9f-100 pfn range: 97 pages freed 1-1 mapping on 9f->100 .. snip.. e820: BIOS-provided physical RAM map: Xen: [mem 0x0000000000000000-0x000000000009efff] usable Xen: [mem 0x000000000009f400-0x00000000000fffff] reserved Xen: [mem 0x0000000000100000-0x00000000cfd1dfff] usable .. snip.. Scan for SMP in [mem 0x00000000-0x000003ff] Scan for SMP in [mem 0x0009fc00-0x0009ffff] Scan for SMP in [mem 0x000f0000-0x000fffff] found SMP MP-table at [mem 0x000f4fa0-0x000f4faf] mapped at [ffff8800000f4fa0] (XEN) mm.c:908:d0 Error getting mfn 100 (pfn 5555555555555555) from L1 entry 0000000000100461 for l1e_owner=0, pg_owner=0 (XEN) mm.c:4995:d0 ptwr_emulate: could not get_page_from_l1e() BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81ac07e2>] xen_set_pte_init+0x66/0x71 . snip.. Pid: 0, comm: swapper Not tainted 3.6.0-rc6upstream-00188-gb6fb969-dirty #2 HP ProLiant BL680c G5 .. snip.. Call Trace: [<ffffffff81ad31c6>] __early_ioremap+0x18a/0x248 [<ffffffff81624731>] ? printk+0x48/0x4a [<ffffffff81ad32ac>] early_ioremap+0x13/0x15 [<ffffffff81acc140>] get_mpc_size+0x2f/0x67 [<ffffffff81acc284>] smp_scan_config+0x10c/0x136 [<ffffffff81acc2e4>] default_find_smp_config+0x36/0x5a [<ffffffff81ac3085>] setup_arch+0x5b3/0xb5b [<ffffffff81624731>] ? printk+0x48/0x4a [<ffffffff81abca7f>] start_kernel+0x90/0x390 [<ffffffff81abc356>] x86_64_start_reservations+0x131/0x136 [<ffffffff81abfa83>] xen_start_kernel+0x65f/0x661 (XEN) Domain 0 crashed: 'noreboot' set - not rebooting. which is that ioremap would end up mapping 0xff using _PAGE_IOMAP (which is what early_ioremap sticks as a flag) - which meant we would get MFN 0xFF (pte ff461, which is OK), and then it would also map 0x100 (b/c ioremap tries to get page aligned request, and it was trying to map 0xf4fa0 + PAGE_SIZE - so it mapped the next page) as _PAGE_IOMAP. Since 0x100 is actually a RAM page, and the _PAGE_IOMAP bypasses the P2M lookup we would happily set the PTE to 1000461. Xen would deny the request since we do not have access to the Machine Frame Number (MFN) of 0x100. The P2M[0x100] is for example 0x80140. CC: stable@vger.kernel.org Fixes-Oracle-Bugzilla: https://bugzilla.oracle.com/bugzilla/show_bug.cgi?id=13665 Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
3699aad0 |
|
28-Jun-2012 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/mmu: The xen_setup_kernel_pagetable doesn't need to return anything. We don't need to return the new PGD - as we do not use it. Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
51faaf2b |
|
22-Aug-2012 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
Revert "xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain." and "xen/x86: Use memblock_reserve for sensitive areas." This reverts commit 806c312e50f122c47913145cf884f53dd09d9199 and commit 59b294403e9814e7c1154043567f0d71bac7a511. And also documents setup.c and why we want to do it that way, which is that we tried to make the the memblock_reserve more selective so that it would be clear what region is reserved. Sadly we ran in the problem wherein on a 64-bit hypervisor with a 32-bit initial domain, the pt_base has the cr3 value which is not neccessarily where the pagetable starts! As Jan put it: " Actually, the adjustment turns out to be correct: The page tables for a 32-on-64 dom0 get allocated in the order "first L1", "first L2", "first L3", so the offset to the page table base is indeed 2. When reading xen/include/public/xen.h's comment very strictly, this is not a violation (since there nothing is said that the first thing in the page table space is pointed to by pt_base; I admit that this seems to be implied though, namely do I think that it is implied that the page table space is the range [pt_base, pt_base + nt_pt_frames), whereas that range here indeed is [pt_base - 2, pt_base - 2 + nt_pt_frames), which - without a priori knowledge - the kernel would have difficulty to figure out)." - so lets just fall back to the easy way and reserve the whole region. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
b8b0f559 |
|
21-Aug-2012 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/apic/xenbus/swiotlb/pcifront/grant/tmem: Make functions or variables static. There is no need for those functions/variables to be visible. Make them static and also fix the compile warnings of this sort: drivers/xen/<some file>.c: warning: symbol '<blah>' was not declared. Should it be static? Some of them just require including the header file that declares the functions. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
806c312e |
|
21-Aug-2012 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain. If a 64-bit hypervisor is booted with a 32-bit initial domain, the hypervisor deals with the initial domain as "compat" and does some extra adjustments (like pagetables are 4 bytes instead of 8). It also adjusts the xen_start_info->pt_base incorrectly. When booted with a 32-bit hypervisor (32-bit initial domain): .. (XEN) Start info: cf831000->cf83147c (XEN) Page tables: cf832000->cf8b5000 .. [ 0.000000] PT: cf832000 (f832000) [ 0.000000] Reserving PT: f832000->f8b5000 And with a 64-bit hypervisor: (XEN) Start info: 00000000cf831000->00000000cf8314b4 (XEN) Page tables: 00000000cf832000->00000000cf8b6000 [ 0.000000] PT: cf834000 (f834000) [ 0.000000] Reserving PT: f834000->f8b8000 To deal with this, we keep keep track of the highest physical address we have reserved via memblock_reserve. If that address does not overlap with pt_base, we have a gap which we reserve. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
59b29440 |
|
19-Jul-2012 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/x86: Use memblock_reserve for sensitive areas. instead of a big memblock_reserve. This way we can be more selective in freeing regions (and it also makes it easier to understand where is what). [v1: Move the auto_translate_physmap to proper line] [v2: Per Stefano suggestion add more comments] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
ca08649e |
|
16-Aug-2012 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
Revert "xen PVonHVM: move shared_info to MMIO before kexec" This reverts commit 00e37bdb0113a98408de42db85be002f21dbffd3. During shutdown of PVHVM guests with more than 2VCPUs on certain machines we can hit the race where the replaced shared_info is not replaced fast enough and the PV time clock retries reading the same area over and over without any any success and is stuck in an infinite loop. Acked-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
0ec53ecf |
|
14-Sep-2012 |
Stefano Stabellini <stefano.stabellini@eu.citrix.com> |
xen/arm: receive Xen events on ARM Compile events.c on ARM. Parse, map and enable the IRQ to get event notifications from the device tree (node "/xen"). Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
00e37bdb |
|
17-Jul-2012 |
Olaf Hering <olaf@aepfle.de> |
xen PVonHVM: move shared_info to MMIO before kexec Currently kexec in a PVonHVM guest fails with a triple fault because the new kernel overwrites the shared info page. The exact failure depends on the size of the kernel image. This patch moves the pfn from RAM into MMIO space before the kexec boot. The pfn containing the shared_info is located somewhere in RAM. This will cause trouble if the current kernel is doing a kexec boot into a new kernel. The new kernel (and its startup code) can not know where the pfn is, so it can not reserve the page. The hypervisor will continue to update the pfn, and as a result memory corruption occours in the new kernel. One way to work around this issue is to allocate a page in the xen-platform pci device's BAR memory range. But pci init is done very late and the shared_info page is already in use very early to read the pvclock. So moving the pfn from RAM to MMIO is racy because some code paths on other vcpus could access the pfn during the small window when the old pfn is moved to the new pfn. There is even a small window were the old pfn is not backed by a mfn, and during that time all reads return -1. Because it is not known upfront where the MMIO region is located it can not be used right from the start in xen_hvm_init_shared_info. To minimise trouble the move of the pfn is done shortly before kexec. This does not eliminate the race because all vcpus are still online when the syscore_ops will be called. But hopefully there is no work pending at this point in time. Also the syscore_op is run last which reduces the risk further. Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
4ff2d062 |
|
17-Jul-2012 |
Olaf Hering <olaf@aepfle.de> |
xen: simplify init_hvm_pv_info init_hvm_pv_info is called only in PVonHVM context, move it into ifdef. init_hvm_pv_info does not fail, make it a void function. remove arguments from init_hvm_pv_info because they are not used by the caller. Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
4648da7c |
|
17-Jul-2012 |
Olaf Hering <olaf@aepfle.de> |
xen: remove cast from HYPERVISOR_shared_info assignment Both have type struct shared_info so no cast is needed. Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
1c32cdc6 |
|
09-Jul-2012 |
David Vrabel <david.vrabel@citrix.com> |
xen/x86: avoid updating TLS descriptors if they haven't changed When switching tasks in a Xen PV guest, avoid updating the TLS descriptors if they haven't changed. This improves the speed of context switches by almost 10% as much of the time the descriptors are the same or only one is different. The descriptors written into the GDT by Xen are modified from the values passed in the update_descriptor hypercall so we keep shadow copies of the three TLS descriptors to compare against. lmbench3 test Before After Improvement -------------------------------------------- lat_ctx -s 32 24 7.19 6.52 9% lat_pipe 12.56 11.66 7% Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
59290362 |
|
09-Jul-2012 |
David Vrabel <david.vrabel@citrix.com> |
xen/x86: add desc_equal() to compare GDT descriptors Signed-off-by: David Vrabel <david.vrabel@citrix.com> [v1: Moving it to the Xen file] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
05e36006a |
|
07-Jun-2012 |
Liu, Jinsong <jinsong.liu@intel.com> |
xen/mce: Register native mce handler as vMCE bounce back point When Xen hypervisor inject vMCE to guest, use native mce handler to handle it Signed-off-by: Ke, Liping <liping.ke@intel.com> Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
cef12ee5 |
|
07-Jun-2012 |
Liu, Jinsong <jinsong.liu@intel.com> |
xen/mce: Add mcelog support for Xen platform When MCA error occurs, it would be handled by Xen hypervisor first, and then the error information would be sent to initial domain for logging. This patch gets error information from Xen hypervisor and convert Xen format error into Linux format mcelog. This logic is basically self-contained, not touching other kernel components. By using tools like mcelog tool users could read specific error information, like what they did under native Linux. To test follow directions outlined in Documentation/acpi/apei/einj.txt Acked-and-tested-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Ke, Liping <liping.ke@intel.com> Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
1f975f78 |
|
01-Jun-2012 |
Andre Przywara <andre.przywara@amd.com> |
x86, pvops: Remove hooks for {rd,wr}msr_safe_regs There were paravirt_ops hooks for the full register set variant of {rd,wr}msr_safe which are actually not used by anyone anymore. Remove them to make the code cleaner and avoid silent breakages when the pvops members were uninitialized. This has been boot-tested natively and under Xen with PVOPS enabled and disabled on one machine. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Link: http://lkml.kernel.org/r/1338562358-28182-2-git-send-email-bp@amd64.org Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
#
5e626254 |
|
29-May-2012 |
Andre Przywara <andre.przywara@amd.com> |
xen/setup: filter APERFMPERF cpuid feature out Xen PV kernels allow access to the APERF/MPERF registers to read the effective frequency. Access to the MSRs is however redirected to the currently scheduled physical CPU, making consecutive read and compares unreliable. In addition each rdmsr traps into the hypervisor. So to avoid bogus readouts and expensive traps, disable the kernel internal feature flag for APERF/MPERF if running under Xen. This will a) remove the aperfmperf flag from /proc/cpuinfo b) not mislead the power scheduler (arch/x86/kernel/cpu/sched.c) to use the feature to improve scheduling (by default disabled) c) not mislead the cpufreq driver to use the MSRs This does not cover userland programs which access the MSRs via the device file interface, but this will be addressed separately. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Cc: stable@vger.kernel.org # v3.0+ Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
1ab46fd3 |
|
30-May-2012 |
Konrad Rzeszutek Wilk <konrad@darnok.org> |
x86, amd, xen: Avoid NULL pointer paravirt references Stub out MSR methods that aren't actually needed. This fixes a crash as Xen Dom0 on AMD Trinity systems. A bigger patch should be added to remove the paravirt machinery completely for the methods which apparently have no users! Reported-by: Andre Przywara <andre.przywara@amd.com> Link: http://lkml.kernel.org/r/20120530222356.GA28417@andromeda.dapyr.net Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: <stable@vger.kernel.org>
|
#
211063dc |
|
08-Dec-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/acpi/sleep: Enable ACPI sleep via the __acpi_os_prepare_sleep Provide the registration callback to call in the Xen's ACPI sleep functionality. This means that during S3/S5 we make a hypercall XENPF_enter_acpi_sleep with the proper PM1A/PM1B registers. Based of Ke Yu's <ke.yu@intel.com> initial idea. [ From http://xenbits.xensource.com/linux-2.6.18-xen.hg change c68699484a65 ] [v1: Added Copyright and license] [v2: Added check if PM1A/B the 16-bits MSB contain something. The spec only uses 16-bits but might have more in future] Signed-off-by: Liang Tang <liang.tang@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
f447d56d |
|
20-Apr-2012 |
Ben Guthro <ben@guthro.net> |
xen: implement apic ipi interface Map native ipi vector to xen vector. Implement apic ipi interface with xen_send_IPI_one. Tested-by: Steven Noonan <steven@uplinklabs.net> Signed-off-by: Ben Guthro <ben@guthro.net> Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
83d51ab4 |
|
03-May-2012 |
David Vrabel <dvrabel@cantab.net> |
xen/setup: update VA mapping when releasing memory during setup In xen_memory_setup(), if a page that is being released has a VA mapping this must also be updated. Otherwise, the page will be not released completely -- it will still be referenced in Xen and won't be freed util the mapping is removed and this prevents it from being reallocated at a different PFN. This was already being done for the ISA memory region in xen_ident_map_ISA() but on many systems this was omitting a few pages as many systems marked a few pages below the ISA memory region as reserved in the e820 map. This fixes errors such as: (XEN) page_alloc.c:1148:d0 Over-allocation for domain 0: 2097153 > 2097152 (XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (0 of 17) Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
76a8df7b |
|
04-May-2012 |
David Vrabel <david.vrabel@citrix.com> |
xen/pci: don't use PCI BIOS service for configuration space accesses The accessing PCI configuration space with the PCI BIOS32 service does not work in PV guests. On systems without MMCONFIG or where the BIOS hasn't marked the MMCONFIG region as reserved in the e820 map, the BIOS service is probed (even though direct access is preferred) and this hangs. CC: stable@kernel.org Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> [v1: Fixed compile error when CONFIG_PCI is not set] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
558daa28 |
|
02-May-2012 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/apic: Return the APIC ID (and version) for CPU 0. On x86_64 on AMD machines where the first APIC_ID is not zero, we get: ACPI: LAPIC (acpi_id[0x01] lapic_id[0x10] enabled) BIOS bug: APIC version is 0 for CPU 1/0x10, fixing up to 0x10 BIOS bug: APIC version mismatch, boot CPU: 0, CPU 1: version 10 which means that when the ACPI processor driver loads and tries to parse the _Pxx states it fails to do as, as it ends up calling acpi_get_cpuid which does this: for_each_possible_cpu(i) { if (cpu_physical_id(i) == apic_id) return i; } And the bootup CPU, has not been found so it fails and returns -1 for the first CPU - which then subsequently in the loop that "acpi_processor_get_info" does results in returning an error, which means that "acpi_processor_add" failing and per_cpu(processor) is never set (and is NULL). That means that when xen-acpi-processor tries to load (much much later on) and parse the P-states it gets -ENODEV from acpi_processor_register_performance() (which tries to read the per_cpu(processor)) and fails to parse the data. Reported-by-and-Tested-by: Stefan Bader <stefan.bader@canonical.com> Suggested-by: Boris Ostrovsky <boris.ostrovsky@amd.com> [v2: Bit-shift APIC ID by 24 bits] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
31b3c9d7 |
|
20-Mar-2012 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/x86: Implement x86_apic_ops Or rather just implement one different function as opposed to the native one : the read function. We synthesize the values. Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> [v1: Rebased on top of tip/x86/urgent] [v2: Return 0xfd instead of 0xff in the default case] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
df88b2d9 |
|
26-Apr-2012 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/enlighten: Disable MWAIT_LEAF so that acpi-pad won't be loaded. There are exactly four users of __monitor and __mwait: - cstate.c (which allows acpi_processor_ffh_cstate_enter to be called when the cpuidle API drivers are used. However patch "cpuidle: replace xen access to x86 pm_idle and default_idle" provides a mechanism to disable the cpuidle and use safe_halt. - smpboot (which allows mwait_play_dead to be called). However safe_halt is always used so we skip that. - intel_idle (same deal as above). - acpi_pad.c. This the one that we do not want to run as we will hit the below crash. Why do we want to expose MWAIT_LEAF in the first place? We want it for the xen-acpi-processor driver - which uploads C-states to the hypervisor. If MWAIT_LEAF is set, the cstate.c sets the proper address in the C-states so that the hypervisor can benefit from using the MWAIT functionality. And that is the sole reason for using it. Without this patch, if a module performs mwait or monitor we get this: invalid opcode: 0000 [#1] SMP CPU 2 .. snip.. Pid: 5036, comm: insmod Tainted: G O 3.4.0-rc2upstream-dirty #2 Intel Corporation S2600CP/S2600CP RIP: e030:[<ffffffffa000a017>] [<ffffffffa000a017>] mwait_check_init+0x17/0x1000 [mwait_check] RSP: e02b:ffff8801c298bf18 EFLAGS: 00010282 RAX: ffff8801c298a010 RBX: ffffffffa03b2000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff8801c29800d8 RDI: ffff8801ff097200 RBP: ffff8801c298bf18 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: ffffffffa000a000 R14: 0000005148db7294 R15: 0000000000000003 FS: 00007fbb364f2700(0000) GS:ffff8801ff08c000(0000) knlGS:0000000000000000 CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 000000000179f038 CR3: 00000001c9469000 CR4: 0000000000002660 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process insmod (pid: 5036, threadinfo ffff8801c298a000, task ffff8801c29cd7e0) Stack: ffff8801c298bf48 ffffffff81002124 ffffffffa03b2000 00000000000081fd 000000000178f010 000000000178f030 ffff8801c298bf78 ffffffff810c41e6 00007fff3fb30db9 00007fff3fb30db9 00000000000081fd 0000000000010000 Call Trace: [<ffffffff81002124>] do_one_initcall+0x124/0x170 [<ffffffff810c41e6>] sys_init_module+0xc6/0x220 [<ffffffff815b15b9>] system_call_fastpath+0x16/0x1b Code: <0f> 01 c8 31 c0 0f 01 c9 c9 c3 00 00 00 00 00 00 00 00 00 00 00 00 RIP [<ffffffffa000a017>] mwait_check_init+0x17/0x1000 [mwait_check] RSP <ffff8801c298bf18> ---[ end trace 16582fc8a3d1e29a ]--- Kernel panic - not syncing: Fatal exception With this module (which is what acpi_pad.c would hit): MODULE_AUTHOR("Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>"); MODULE_DESCRIPTION("mwait_check_and_back"); MODULE_LICENSE("GPL"); MODULE_VERSION(); static int __init mwait_check_init(void) { __monitor((void *)¤t_thread_info()->flags, 0, 0); __mwait(0, 0); return 0; } static void __exit mwait_check_exit(void) { } module_init(mwait_check_init); module_exit(mwait_check_exit); Reported-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
5f054e31 |
|
28-Mar-2012 |
Rusty Russell <rusty@rustcorp.com.au> |
documentation: remove references to cpu_*_map. This has been obsolescent for a while, fix documentation and misc comments. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
73c154c6 |
|
13-Feb-2012 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/enlighten: Expose MWAIT and MWAIT_LEAF if hypervisor OKs it. For the hypervisor to take advantage of the MWAIT support it needs to extract from the ACPI _CST the register address. But the hypervisor does not have the support to parse DSDT so it relies on the initial domain (dom0) to parse the ACPI Power Management information and push it up to the hypervisor. The pushing of the data is done by the processor_harveset_xen module which parses the information that the ACPI parser has graciously exposed in 'struct acpi_processor'. For the ACPI parser to also expose the Cx states for MWAIT, we need to expose the MWAIT capability (leaf 1). Furthermore we also need to expose the MWAIT_LEAF capability (leaf 5) for cstate.c to properly function. The hypervisor could expose these flags when it traps the XEN_EMULATE_PREFIX operations, but it can't do it since it needs to be backwards compatible. Instead we choose to use the native CPUID to figure out if the MWAIT capability exists and use the XEN_SET_PDC query hypercall to figure out if the hypervisor wants us to expose the MWAIT_LEAF capability or not. Note: The XEN_SET_PDC query was implemented in c/s 23783: "ACPI: add _PDC input override mechanism". With this in place, instead of C3 ACPI IOPORT 415 we get now C3:ACPI FFH INTEL MWAIT 0x20 Note: The cpu_idle which would be calling the mwait variants for idling never gets set b/c we set the default pm_idle to be the hypercall variant. Acked-by: Jan Beulich <JBeulich@suse.com> [v2: Fix missing header file include and #ifdef] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
8eaffa67 |
|
10-Feb-2012 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/pat: Disable PAT support for now. [Pls also look at https://lkml.org/lkml/2012/2/10/228] Using of PAT to change pages from WB to WC works quite nicely. Changing it back to WB - not so much. The crux of the matter is that the code that does this (__page_change_att_set_clr) has only limited information so when it tries to the change it gets the "raw" unfiltered information instead of the properly filtered one - and the "raw" one tell it that PSE bit is on (while infact it is not). As a result when the PTE is set to be WB from WC, we get tons of: :WARNING: at arch/x86/xen/mmu.c:475 xen_make_pte+0x67/0xa0() :Hardware name: HP xw4400 Workstation .. snip.. :Pid: 27, comm: kswapd0 Tainted: G W 3.2.2-1.fc16.x86_64 #1 :Call Trace: : [<ffffffff8106dd1f>] warn_slowpath_common+0x7f/0xc0 : [<ffffffff8106dd7a>] warn_slowpath_null+0x1a/0x20 : [<ffffffff81005a17>] xen_make_pte+0x67/0xa0 : [<ffffffff810051bd>] __raw_callee_save_xen_make_pte+0x11/0x1e : [<ffffffff81040e15>] ? __change_page_attr_set_clr+0x9d5/0xc00 : [<ffffffff8114c2e8>] ? __purge_vmap_area_lazy+0x158/0x1d0 : [<ffffffff8114cca5>] ? vm_unmap_aliases+0x175/0x190 : [<ffffffff81041168>] change_page_attr_set_clr+0x128/0x4c0 : [<ffffffff81041542>] set_pages_array_wb+0x42/0xa0 : [<ffffffff8100a9b2>] ? check_events+0x12/0x20 : [<ffffffffa0074d4c>] ttm_pages_put+0x1c/0x70 [ttm] : [<ffffffffa0074e98>] ttm_page_pool_free+0xf8/0x180 [ttm] : [<ffffffffa0074f78>] ttm_pool_mm_shrink+0x58/0x90 [ttm] : [<ffffffff8112ba04>] shrink_slab+0x154/0x310 : [<ffffffff8112f17a>] balance_pgdat+0x4fa/0x6c0 : [<ffffffff8112f4b8>] kswapd+0x178/0x3d0 : [<ffffffff815df134>] ? __schedule+0x3d4/0x8c0 : [<ffffffff81090410>] ? remove_wait_queue+0x50/0x50 : [<ffffffff8112f340>] ? balance_pgdat+0x6c0/0x6c0 : [<ffffffff8108fb6c>] kthread+0x8c/0xa0 for every page. The proper fix for this is has been posted and is https://lkml.org/lkml/2012/2/10/228 "x86/cpa: Use pte_attrs instead of pte_flags on CPA/set_p.._wb/wc operations." along with a detailed description of the problem and solution. But since that posting has gone nowhere I am proposing this band-aid solution so that at least users don't get the page corruption (the pages that are WC don't get changed to WB and end up being recycled for filesystem or other things causing mysterious crashes). The negative impact of this patch is that users of WC flag (which are InfiniBand, radeon, nouveau drivers) won't be able to set that flag - so they are going to see performance degradation. But stability is more important here. Fixes RH BZ# 742032, 787403, and 745574 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
416d7214 |
|
10-Feb-2012 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/setup: Remove redundant filtering of PTE masks. commit 7347b4082e55ac4a673f06a0a0ce25c37273c9ec "xen: Allow unprivileged Xen domains to create iomap pages" added a redundant line in the early bootup code to filter out the PTE. That filtering is already done a bit earlier so this extra processing is not required. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
2113f469 |
|
13-Jan-2012 |
Alex Shi <alex.shi@linux.alibaba.com> |
xen: use this_cpu_xxx replace percpu_xxx funcs percpu_xxx funcs are duplicated with this_cpu_xxx funcs, so replace them for further code clean up. I don't know much of xen code. But, since the code is in x86 architecture, the percpu_xxx is exactly same as this_cpu_xxx serials functions. So, the change is safe. Signed-off-by: Alex Shi <alex.shi@intel.com> Acked-by: Christoph Lameter <cl@gentwo.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
fe091c20 |
|
08-Dec-2011 |
Tejun Heo <tj@kernel.org> |
memblock: Kill memblock_init() memblock_init() initializes arrays for regions and memblock itself; however, all these can be done with struct initializers and memblock_init() can be removed. This patch kills memblock_init() and initializes memblock with struct initializer. The only difference is that the first dummy entries don't have .nid set to MAX_NUMNODES initially. This doesn't cause any behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: "H. Peter Anvin" <hpa@zytor.com>
|
#
90d4f553 |
|
27-Oct-2011 |
Zhenzhong Duan <zhenzhong.duan@oracle.com> |
xen:pvhvm: enable PVHVM VCPU placement when using more than 32 CPUs. PVHVM running with more than 32 vcpus and pv_irq/pv_time enabled need VCPU placement to work, or else it will softlockup. CC: stable@kernel.org Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
5e287830 |
|
29-Sep-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/enlighten: Fix compile warnings and set cx to known value. We get: linux/arch/x86/xen/enlighten.c: In function ‘xen_start_kernel’: linux/arch/x86/xen/enlighten.c:226: warning: ‘cx’ may be used uninitialized in this function linux/arch/x86/xen/enlighten.c:240: note: ‘cx’ was declared here and the cx is really not set but passed in the xen_cpuid instruction which masks the value with returned masked_ecx from cpuid. This can potentially lead to invalid data being stored in cx. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
ccbcdf7c |
|
16-Aug-2011 |
Jan Beulich <JBeulich@novell.com> |
xen/x86: replace order-based range checking of M2P table by linear one The order-based approach is not only less efficient (requiring a shift and a compare, typical generated code looking like this mov eax, [machine_to_phys_order] mov ecx, eax shr ebx, cl test ebx, ebx jnz ... whereas a direct check requires just a compare, like in cmp ebx, [machine_to_phys_nr] jae ... ), but also slightly dangerous in the 32-on-64 case - the element address calculation can wrap if the next power of two boundary is sufficiently far away from the actual upper limit of the table, and hence can result in user space addresses being accessed (with it being unknown what may actually be mapped there). Additionally, the elimination of the mistaken use of fls() here (should have been __fls()) fixes a latent issue on x86-64 that would trigger if the code was run on a system with memory extending beyond the 44-bit boundary. CC: stable@kernel.org Signed-off-by: Jan Beulich <jbeulich@novell.com> [v1: Based on Jeremy's feedback] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
318f5a2a |
|
03-Aug-2011 |
Andy Lutomirski <luto@MIT.EDU> |
x86-64: Add user_64bit_mode paravirt op Three places in the kernel assume that the only long mode CPL 3 selector is __USER_CS. This is not true on Xen -- Xen's sysretq changes cs to the magic value 0xe033. Two of the places are corner cases, but as of "x86-64: Improve vsyscall emulation CS and RIP handling" (c9712944b2a12373cb6ff8059afcfb7e826a6c54), vsyscalls will segfault if called with Xen's extra CS selector. This causes a panic when older init builds die. It seems impossible to make Xen use __USER_CS reliably without taking a performance hit on every system call, so this fixes the tests instead with a new paravirt op. It's a little ugly because ptrace.h can't include paravirt.h. Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/f4fcb3947340d9e96ce1054a432f183f9da9db83.1312378163.git.luto@mit.edu Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
ab78f7ad |
|
17-Dec-2010 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen/trace: add segment desc tracing Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
b2abe506 |
|
16-May-2011 |
Tom Goetz <tom.goetz@virtualcomputer.com> |
xen: When calling power_off, don't call the halt function. .. As it won't actually power off the machine. Reported-by: Sven Köhler <sven.koehler@gmail.com> Tested-by: Sven Köhler <sven.koehler@gmail.com> Signed-off-by: Tom Goetz <tom.goetz@virtualcomputer.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
c2419b4a |
|
31-May-2011 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen: allow enable use of VGA console on dom0 Get the information about the VGA console hardware from Xen, and put it into the form the bootloader normally generates, so that the rest of the kernel can deal with VGA as usual. [ Impact: make VGA console work in dom0 ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> [v1: Rebased on 2.6.39] [v2: Removed incorrect comments and fixed compile warnings] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
ad3062a0 |
|
04-May-2011 |
Daniel Kiper <dkiper@net-space.pl> |
arch/x86/xen/enlighten: Cleanup code/data sections definitions Cleanup code/data sections definitions accordingly to include/linux/init.h. Signed-off-by: Daniel Kiper <dkiper@net-space.pl> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
947ccf9c |
|
09-Nov-2010 |
Shan Haitao <haitao.shan@intel.com> |
xen: Allow PV-OPS kernel to detect whether XSAVE is supported Xen fails to mask XSAVE from the cpuid feature, despite not historically supporting guest use of XSAVE. However, now that XSAVE support has been added to Xen, we need to reliably detect its presence. The most reliable way to do this is to look at the OSXSAVE feature in cpuid which is set iff the OS (Xen, in this case), has set CR4.OSXSAVE. [ Cleaned up conditional a bit. - Jeremy ] Signed-off-by: Shan Haitao <haitao.shan@intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
61f4237d |
|
18-Sep-2010 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen: just completely disable XSAVE Some (old) versions of Xen just kill the domain if it tries to set any unknown bits in CR4, so we can't reliably probe for OSXSAVE in CR4. Since Xen doesn't support XSAVE for guests at the moment, and no such support is being worked on, there's no downside in just unconditionally masking XSAVE support. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
99bbb3a8 |
|
02-Dec-2010 |
Stefano Stabellini <stefano.stabellini@eu.citrix.com> |
xen: PV on HVM: support PV spinlocks and IPIs Initialize PV spinlocks on boot CPU right after native_smp_prepare_cpus (that switch to APIC mode and initialize APIC routing); on secondary CPUs on CPU_UP_PREPARE. Enable the usage of event channels to send and receive IPIs when running as a PV on HVM guest. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
|
#
cff520b9 |
|
02-Dec-2010 |
Stefano Stabellini <stefano.stabellini@eu.citrix.com> |
xen: do not use xen_info on HVM, set pv_info name to "Xen HVM" Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
|
#
44b46c3e |
|
11-Feb-2011 |
Ian Campbell <Ian.Campbell@eu.citrix.com> |
xen: annotate functions which only call into __init at start of day Both xen_hvm_init_shared_info and xen_build_mfn_list_list can be called at resume time as well as at start of day but only reference __init functions (extend_brk) at start of day. Hence annotate with __ref. WARNING: arch/x86/built-in.o(.text+0x4f1): Section mismatch in reference from the function xen_hvm_init_shared_info() to the function .init.text:extend_brk() The function xen_hvm_init_shared_info() references the function __init extend_brk(). This is often because xen_hvm_init_shared_info lacks a __init annotation or the annotation of extend_brk is wrong. xen_hvm_init_shared_info calls extend_brk() iff !shared_info_page and initialises shared_info_page with the result. This happens at start of day only. WARNING: arch/x86/built-in.o(.text+0x599b): Section mismatch in reference from the function xen_build_mfn_list_list() to the function .init.text:extend_brk() The function xen_build_mfn_list_list() references the function __init extend_brk(). This is often because xen_build_mfn_list_list lacks a __init annotation or the annotation of extend_brk is wrong. (this warning occurs multiple times) xen_build_mfn_list_list only calls extend_brk() at boot time, while building the initial mfn list list Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
2ce802f6 |
|
19-Jan-2011 |
Tejun Heo <tj@kernel.org> |
lockdep: Move early boot local IRQ enable/disable status to init/main.c During early boot, local IRQ is disabled until IRQ subsystem is properly initialized. During this time, no one should enable local IRQ and some operations which usually are not allowed with IRQ disabled, e.g. operations which might sleep or require communications with other processors, are allowed. lockdep tracked this with early_boot_irqs_off/on() callbacks. As other subsystems need this information too, move it to init/main.c and make it generally available. While at it, toggle the boolean to early_boot_irqs_disabled instead of enabled so that it can be initialized with %false and %true indicates the exceptional condition. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <20110120110635.GB6036@htj.dyndns.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
c1f5db1a |
|
03-Dec-2010 |
Ian Campbell <ian.campbell@citrix.com> |
xen: disable ACPI NUMA for PV guests Xen does not currently expose PV-NUMA information to PV guests. Therefore disable NUMA for the time being to prevent the kernel picking up on an host-level NUMA information which it might come across in the firmware. [ Added comment - Jeremy ] Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
d9b8ca84 |
|
20-Dec-2010 |
Sheng Yang <sheng@linux.intel.com> |
xen: HVM X2APIC support This patch is similiar to Gleb Natapov's patch for KVM, which enable the hypervisor to emulate x2apic feature for the guest. By this way, the emulation of lapic would be simpler with x2apic interface(MSR), and faster. [v2: Re-organized 'xen_hvm_need_lapic' per Ian Campbell suggestion] Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
780f36d8 |
|
06-Dec-2010 |
Christoph Lameter <cl@linux.com> |
xen: Use this_cpu_ops Use this_cpu_ops to reduce code size and simplify things in various places. V3->V4: Move instance of this_cpu_inc_return to a later patchset so that this patch can be applied without infrastructure changes. Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
805e3f49 |
|
03-Nov-2010 |
Ian Campbell <ian.campbell@citrix.com> |
xen: x86/32: perform initial startup on initial_page_table Only make swapper_pg_dir readonly and pinned when generic x86 architecture code (which also starts on initial_page_table) switches to it. This helps ensure that the generic setup paths work on Xen unmodified. In particular clone_pgd_range writes directly to the destination pgd entries and is used to initialise swapper_pg_dir so we need to ensure that it remains writeable until the last possible moment during bring up. This is complicated slightly by the need to avoid sharing kernel PMD entries when running under Xen, therefore the Xen implementation must make a copy of the kernel PMD (which is otherwise referred to by both intial_page_table and swapper_pg_dir) before switching to swapper_pg_dir. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
31e323cc |
|
29-Nov-2010 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen: don't bother to stop other cpus on shutdown/reboot Xen will shoot all the VCPUs when we do a shutdown hypercall, so there's no need to do it manually. In any case it will fail because all the IPI irqs have been pulled down by this point, so the cross-CPU calls will simply hang forever. Until change 76fac077db6b34e2c6383a7b4f3f4f7b7d06d8ce the function calls were not synchronously waited for, so this wasn't apparent. However after that change the calls became synchronous leading to a hang on shutdown on multi-VCPU guests. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org> Cc: Alok Kataria <akataria@vmware.com>
|
#
5b5c1af1 |
|
23-Nov-2010 |
Ian Campbell <ian.campbell@citrix.com> |
xen: x86/32: perform initial startup on initial_page_table Only make swapper_pg_dir readonly and pinned when generic x86 architecture code (which also starts on initial_page_table) switches to it. This helps ensure that the generic setup paths work on Xen unmodified. In particular clone_pgd_range writes directly to the destination pgd entries and is used to initialise swapper_pg_dir so we need to ensure that it remains writeable until the last possible moment during bring up. This is complicated slightly by the need to avoid sharing kernel PMD entries when running under Xen, therefore the Xen implementation must make a copy of the kernel PMD (which is otherwise referred to by both intial_page_table and swapper_pg_dir) before switching to swapper_pg_dir. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
ec35a69c |
|
15-Nov-2010 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen: set IO permission early (before early_cpu_init()) This patch is based off "xen dom0: Set up basic IO permissions for dom0." by Juan Quintela <quintela@redhat.com>. On AMD machines when we boot the kernel as Domain 0 we get this nasty: mapping kernel into physical memory Xen: setup ISA identity maps about to get started... (XEN) traps.c:475:d0 Unhandled general protection fault fault/trap [#13] on VCPU 0 [ec=0000] (XEN) domain_crash_sync called from entry.S (XEN) Domain 0 (vcpu#0) crashed on cpu#0: (XEN) ----[ Xen-4.1-101116 x86_64 debug=y Not tainted ]---- (XEN) CPU: 0 (XEN) RIP: e033:[<ffffffff8130271b>] (XEN) RFLAGS: 0000000000000282 EM: 1 CONTEXT: pv guest (XEN) rax: 000000008000c068 rbx: ffffffff8186c680 rcx: 0000000000000068 (XEN) rdx: 0000000000000cf8 rsi: 000000000000c000 rdi: 0000000000000000 (XEN) rbp: ffffffff81801e98 rsp: ffffffff81801e50 r8: ffffffff81801eac (XEN) r9: ffffffff81801ea8 r10: ffffffff81801eb4 r11: 00000000ffffffff (XEN) r12: ffffffff8186c694 r13: ffffffff81801f90 r14: ffffffffffffffff (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000006f0 (XEN) cr3: 0000000221803000 cr2: 0000000000000000 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033 (XEN) Guest stack trace from rsp=ffffffff81801e50: RIP points to read_pci_config() function. The issue is that we don't set IO permissions for the Linux kernel early enough. The call sequence used to be: xen_start_kernel() x86_init.oem.arch_setup = xen_setup_arch; setup_arch: - early_cpu_init - early_init_amd - read_pci_config - x86_init.oem.arch_setup [ xen_arch_setup ] - set IO permissions. We need to set the IO permissions earlier on, which this patch does. Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
7e77506a |
|
29-Sep-2010 |
Ian Campbell <ian.campbell@citrix.com> |
xen: implement XENMEM_machphys_mapping This hypercall allows Xen to specify a non-default location for the machine to physical mapping. This capability is used when running a 32 bit domain 0 on a 64 bit hypervisor to shrink the hypervisor hole to exactly the size required. [ Impact: add Xen hypercall definitions ] Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
|
#
61d8e11e |
|
27-Oct-2010 |
Zimny Lech <napohybelskurwysynom2010@gmail.com> |
Remove duplicate includes from many files Signed-off-by: Zimny Lech <napohybelskurwysynom2010@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ff12849a |
|
28-Sep-2010 |
Stefano Stabellini <stefano.stabellini@eu.citrix.com> |
xen: mask the MTRR feature from the cpuid We don't want Linux to think that the cpu supports MTRRs when running under Xen because MTRR operations could only be performed through hypercalls. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
4ec5387c |
|
02-Sep-2010 |
Juan Quintela <quintela@redhat.com> |
xen: add the direct mapping area for ISA bus access add the direct mapping area for ISA bus access when running as initial domain Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
41f2e477 |
|
30-Mar-2010 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen: add support for PAT Convert Linux PAT entries into Xen ones when constructing ptes. Linux doesn't use _PAGE_PAT for ptes, so the only difference in the first 4 entries is that Linux uses _PAGE_PWT for WC, whereas Xen (and default) use it for WT. xen_pte_val does the inverse conversion. We hard-code assumptions about Linux's current PAT layout, but a warning on the wrmsr to MSR_IA32_CR_PAT should point out any problems. If necessary we could go to a more general table-based conversion between Linux and Xen PAT entries. hugetlbfs poses a problem at the moment, the x86 architecture uses the same flag for _PAGE_PAT and _PAGE_PSE, which changes meaning depending on which pagetable level we're using. At the moment this should be OK so long as nobody tries to do a pte_val on a hugetlbfs pte. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
33a84750 |
|
27-Aug-2010 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen: defer building p2m mfn structures until kernel is mapped When building mfn parts of p2m structure, we rely on being able to use mfn_to_virt, which in turn requires kernel to be mapped into the linear area (which is distinct from the kernel image mapping on 64-bit). Defer calling xen_build_mfn_list_list() until after xen_setup_kernel_pagetable(); Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
1e17fc7e |
|
03-Sep-2010 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen: remove noise about registering vcpu info Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
76fac077 |
|
11-Oct-2010 |
Alok Kataria <akataria@vmware.com> |
x86, kexec: Make sure to stop all CPUs before exiting the kernel x86 smp_ops now has a new op, stop_other_cpus which takes a parameter "wait" this allows the caller to specify if it wants to stop until all the cpus have processed the stop IPI. This is required specifically for the kexec case where we should wait for all the cpus to be stopped before starting the new kernel. We now wait for the cpus to stop in all cases except for panic/kdump where we expect things to be broken and we are doing our best to make things work anyway. This patch fixes a legitimate regression, which was introduced during 2.6.30, by commit id 4ef702c10b5df18ab04921fc252c26421d4d6c75. Signed-off-by: Alok N Kataria <akataria@vmware.com> LKML-Reference: <1286833028.1372.20.camel@ank32.eng.vmware.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: <stable@kernel.org> v2.6.30-36 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
b5401a96 |
|
18-Mar-2010 |
Alex Nixon <alex.nixon@citrix.com> |
xen/x86/PCI: Add support for the Xen PCI subsystem The frontend stub lives in arch/x86/pci/xen.c, alongside other sub-arch PCI init code (e.g. olpc.c). It provides a mechanism for Xen PCI frontend to setup/destroy legacy interrupts, MSI/MSI-X, and PCI configuration operations. [ Impact: add core of Xen PCI support ] [ v2: Removed the IOMMU code and only focusing on PCI.] [ v3: removed usage of pci_scan_all_fns as that does not exist] [ v4: introduced pci_xen value to fix compile warnings] [ v5: squished fixes+features in one patch, changed Reviewed-by to Ccs] [ v7: added Acked-by] Signed-off-by: Alex Nixon <alex.nixon@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Qing He <qing.he@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org
|
#
236260b9 |
|
06-Oct-2010 |
Jeremy Fitzhardinge <jeremy@goop.org> |
memblock: Allow memblock_init to be called early The Xen setup code needs to call memblock_x86_reserve_range() very early, so allow it to initialize the memblock subsystem before doing so. The second memblock_init() is ignored. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> LKML-Reference: <4CACFDAD.3090900@goop.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
086748e5 |
|
03-Aug-2010 |
Ian Campbell <ian.campbell@citrix.com> |
xen/panic: use xen_reboot and fix smp_send_stop Offline vcpu when using stop_self. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
f09f6d19 |
|
15-Jul-2010 |
Donald Dutile <ddutile@redhat.com> |
Xen: register panic notifier to take crashes of xen guests on panic Register a panic notifier so that when the guest crashes it can shut down the domain and indicate it was a crash to the host. Signed-off-by: Donald Dutile <ddutile@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
c06ee78d |
|
19-Jul-2010 |
Mukesh Rathor <mukesh.rathor@oracle.com> |
xen: support large numbers of CPUs with vcpu info placement When vcpu info placement is supported, we're not limited to MAX_VIRT_CPUS vcpus. However, if it isn't supported, then ignore any excess vcpus. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
8a22b999 |
|
12-Jul-2010 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen: drop xen_sched_clock in favour of using plain wallclock time xen_sched_clock only counts unstolen time. In principle this should be useful to the Linux scheduler so that it knows how much time a process actually consumed. But in practice this doesn't work very well as the scheduler expects the sched_clock time to be synchronized between cpus. It also uses sched_clock to measure the time a task spends sleeping, in which case "unstolen time" isn't meaningful. So just use plain xen_clocksource_read to return wallclock nanoseconds for sched_clock. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
ca65f9fc |
|
29-Jul-2010 |
Stefano Stabellini <stefano.stabellini@eu.citrix.com> |
Introduce CONFIG_XEN_PVHVM compile option This patch introduce a CONFIG_XEN_PVHVM compile time option to enable/disable Xen PV on HVM support. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
|
#
59151001 |
|
17-Jun-2010 |
Stefano Stabellini <stefano.stabellini@eu.citrix.com> |
x86: Call HVMOP_pagetable_dying on exit_mmap. When a pagetable is about to be destroyed, we notify Xen so that the hypervisor can clear the related shadow pagetable. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
c1c5413a |
|
13-May-2010 |
Stefano Stabellini <stefano.stabellini@eu.citrix.com> |
x86: Unplug emulated disks and nics. Add a xen_emul_unplug command line option to the kernel to unplug xen emulated disks and nics. Set the default value of xen_emul_unplug depending on whether or not the Xen PV frontends and the Xen platform PCI driver have been compiled for this kernel (modules or built-in are both OK). The user can specify xen_emul_unplug=ignore to enable PV drivers on HVM even if the host platform doesn't support unplug. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
409771d2 |
|
13-May-2010 |
Stefano Stabellini <stefano.stabellini@eu.citrix.com> |
x86: Use xen_vcpuop_clockevent, xen_clocksource and xen wallclock. Use xen_vcpuop_clockevent instead of hpet and APIC timers as main clockevent device on all vcpus, use the xen wallclock time as wallclock instead of rtc and use xen_clocksource as clocksource. The pv clock algorithm needs to work correctly for the xen_clocksource and xen wallclock to be usable, only modern Xen versions offer a reliable pv clock in HVM guests (XENFEAT_hvm_safe_pvclock). Using the hpet as clocksource means a VMEXIT every time we read/write to the hpet mmio addresses, pvclock give us a better rating without VMEXITs. Same goes for the xen wallclock and xen_vcpuop_clockevent Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Don Dutile <ddutile@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
016b6f5f |
|
13-May-2010 |
Stefano Stabellini <stefano.stabellini@eu.citrix.com> |
xen: Add suspend/resume support for PV on HVM guests. Suspend/resume requires few different things on HVM: the suspend hypercall is different; we don't need to save/restore memory related settings; except the shared info page and the callback mechanism. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
38e20b07 |
|
13-May-2010 |
Sheng Yang <sheng@linux.intel.com> |
x86/xen: event channels delivery on HVM. Set the callback to receive evtchns from Xen, using the callback vector delivery mechanism. The traditional way for receiving event channel notifications from Xen is via the interrupts from the platform PCI device. The callback vector is a newer alternative that allow us to receive notifications on any vcpu and doesn't need any PCI support: we allocate a vector exclusively to receive events, in the vector handler we don't need to interact with the vlapic, therefore we avoid a VMEXIT. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
bee6ab53 |
|
13-May-2010 |
Sheng Yang <sheng@linux.intel.com> |
x86: early PV on HVM features initialization. Initialize basic pv on hvm features adding a new Xen HVM specific hypervisor_x86 structure. Don't try to initialize xen-kbdfront and xen-fbfront when running on HVM because the backends are not available. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
093d7b46 |
|
16-Sep-2009 |
Miroslav Rezanina <mrezanin@redhat.com> |
xen: release unused free memory Scan an e820 table and release any memory which lies between e820 entries, as it won't be used and would just be wasted. At present this is just to release any memory beyond the end of the e820 map, but it will also deal with holes being punched in the map. Derived from patch by Miroslav Rezanina <mrezanin@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
7347b408 |
|
19-Feb-2010 |
Alex Nixon <alex.nixon@citrix.com> |
xen: Allow unprivileged Xen domains to create iomap pages PV DomU domains are allowed to map hardware MFNs for PCI passthrough, but are not generally allowed to map raw machine pages. In particular, various pieces of code try to map DMI and ACPI tables in the ISA ROM range. We disallow _PAGE_IOMAP for those mappings, so that they are redirected to a set of local zeroed pages we reserve for that purpose. [ Impact: prevent passthrough of ISA space, as we only allow PCI ] Signed-off-by: Alex Nixon <alex.nixon@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
5a0e3ad6 |
|
24-Mar-2010 |
Tejun Heo <tj@kernel.org> |
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
|
#
817a824b |
|
26-Feb-2010 |
Ian Campbell <ian.campbell@citrix.com> |
x86, xen: Disable highmem PTE allocation even when CONFIG_HIGHPTE=y There's a path in the pagefault code where the kernel deliberately breaks its own locking rules by kmapping a high pte page without holding the pagetable lock (in at least page_check_address). This breaks Xen's ability to track the pinned/unpinned state of the page. There does not appear to be a viable workaround for this behaviour so simply disable HIGHPTE for all Xen guests. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> LKML-Reference: <1267204562-11844-1-git-send-email-ian.campbell@citrix.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Pasi Kärkkäinen <pasik@iki.fi> Cc: <stable@kernel.org> # .32.x: 14315592: Allow highmem user page tables to be disabled at boot time Cc: <stable@kernel.org> # .32.x Cc: <xen-devel@lists.xensource.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
#
e68266b7 |
|
13-Jan-2010 |
Ian Campbell <ian.campbell@citrix.com> |
x86: xen: 64-bit kernel RPL should be 0 Under Xen 64 bit guests actually run their kernel in ring 3, however the hypervisor takes care of squashing descriptor the RPLs transparently (in order to allow them to continue to differentiate between user and kernel space CS using the RPL). Therefore the Xen paravirt backend should use RPL==0 instead of 1 (or 3). Using RPL==1 causes generic arch code to take incorrect code paths because it uses "testl $3, <CS>, je foo" type tests for a userspace CS and this considers 1==userspace. This issue was previously masked because get_kernel_rpl() was omitted when setting CS in kernel_thread(). This was fixed when kernel_thread() was unified with 32 bit in f443ff4201dd25cd4dec183f9919ecba90c8edc2. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Brian Gerst <brgerst@gmail.com> LKML-Reference: <1263377768-19600-2-git-send-email-ian.campbell@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
5d990b62 |
|
04-Dec-2009 |
Chris Wright <chrisw@sous-sol.org> |
PCI: add pci_request_acs Commit ae21ee65e8bc228416bbcc8a1da01c56a847a60c "PCI: acs p2p upsteram forwarding enabling" doesn't actually enable ACS. Add a function to pci core to allow an IOMMU to request that ACS be enabled. The existing mechanism of using iommu_found() in the pci core to know when ACS should be enabled doesn't actually work due to initialization order; iommu has only been detected not initialized. Have Intel and AMD IOMMUs request ACS, and Xen does as well during early init of dom0. Cc: Allen Kay <allen.m.kay@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
|
#
499d19b8 |
|
24-Nov-2009 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen: register runstate info for boot CPU early printk timestamping uses sched_clock, which in turn relies on runstate info under Xen. So make sure we set it up before any printks can be called. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org>
|
#
3905bb2a |
|
20-Nov-2009 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen: restore runstate_info even if !have_vcpu_info_placement Even if have_vcpu_info_placement is not set, we still need to set up the runstate area on each resumed vcpu. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org>
|
#
be012920 |
|
20-Nov-2009 |
Ian Campbell <Ian.Campbell@citrix.com> |
xen: re-register runstate area earlier on resume. This is necessary to ensure the runstate area is available to xen_sched_clock before any calls to printk which will require it in order to provide a timestamp. I chose to pull the xen_setup_runstate_info out of xen_time_init into the caller in order to maintain parity with calling xen_setup_runstate_info separately from calling xen_time_resume. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org>
|
#
4763ed4d |
|
13-Nov-2009 |
H. Peter Anvin <hpa@zytor.com> |
x86, mm: Clean up and simplify NX enablement The 32- and 64-bit code used very different mechanisms for enabling NX, but even the 32-bit code was enabling NX in head_32.S if it is available. Furthermore, we had a bewildering collection of tests for the available of NX. This patch: a) merges the 32-bit set_nx() and the 64-bit check_efer() function into a single x86_configure_nx() function. EFER control is left to the head code. b) eliminates the nx_enabled variable entirely. Things that need to test for NX enablement can verify __supported_pte_mask directly, and cpu_has_nx gives the supported status of NX. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Tejun Heo <tj@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Vegard Nossum <vegardno@ifi.uio.no> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Chris Wright <chrisw@sous-sol.org> LKML-Reference: <1258154897-6770-5-git-send-email-hpa@zytor.com> Acked-by: Kees Cook <kees.cook@canonical.com>
|
#
1ccbf534 |
|
06-Oct-2009 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: move Xen-testing predicates to common header Move xen_domain and related tests out of asm-x86 to xen/xen.h so they can be included whenever they are necessary. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
|
#
82d64699 |
|
22-Oct-2009 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen: mask extended topology info in cpuid A Xen guest never needs to know about extended topology, and knowing would just confuse it. This patch just zeros ebx in leaf 0xb which indicates no topology info, preventing a crash under Xen on cpus which support this leaf. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org>
|
#
973df35e |
|
27-Oct-2009 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen: set up mmu_ops before trying to set any ptes xen_setup_stackprotector() ends up trying to set page protections, so we need to have vm_mmu_ops set up before trying to do so. Failing to do so causes an early boot crash. [ Impact: Fix early crash under Xen. ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
b75fe4e5 |
|
21-Sep-2009 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen: check EFER for NX before setting up GDT mapping x86-64 assumes NX is available by default, so we need to explicitly check for it before using NX. Some first-generation Intel x86-64 processors didn't support NX, and even recent systems allow it to be disabled in BIOS. [ Impact: prevent Xen crash on NX-less 64-bit machines ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org>
|
#
7bd867df |
|
09-Sep-2009 |
Feng Tang <feng.tang@intel.com> |
x86: Move get/set_wallclock to x86_platform_ops get/set_wallclock() have already a set of platform dependent implementations (default, EFI, paravirt). MRST will add another variant. Moving them to platform ops simplifies the existing code and minimizes the effort to integrate new variants. Signed-off-by: Feng Tang <feng.tang@intel.com> LKML-Reference: <new-submission> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
577eebea |
|
27-Aug-2009 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen: make -fstack-protector work under Xen -fstack-protector uses a special per-cpu "stack canary" value. gcc generates special code in each function to test the canary to make sure that the function's stack hasn't been overrun. On x86-64, this is simply an offset of %gs, which is the usual per-cpu base segment register, so setting it up simply requires loading %gs's base as normal. On i386, the stack protector segment is %gs (rather than the usual kernel percpu %fs segment register). This requires setting up the full kernel GDT and then loading %gs accordingly. We also need to make sure %gs is initialized when bringing up secondary cpus too. To keep things consistent, we do the full GDT/segment register setup on both architectures. Because we need to avoid -fstack-protected code before setting up the GDT and because there's no way to disable it on a per-function basis, several files need to have stack-protector inhibited. [ Impact: allow Xen booting with stack-protector enabled ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
0cc0213e |
|
31-Aug-2009 |
H. Peter Anvin <hpa@zytor.com> |
x86, msr: Have the _safe MSR functions return -EIO, not -EFAULT For some reason, the _safe MSR functions returned -EFAULT, not -EIO. However, the only user which cares about the return code as anything other than a boolean is the MSR driver, which wants -EIO. Change it to -EIO across the board. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Alok Kataria <akataria@vmware.com> Cc: Rusty Russell <rusty@rustcorp.com.au>
|
#
2d826404 |
|
20-Aug-2009 |
Thomas Gleixner <tglx@linutronix.de> |
x86: Move tsc_calibration to x86_init_ops TSC calibration is modified by the vmware hypervisor and paravirt by separate means. Moorestown wants to add its own calibration routine as well. So make calibrate_tsc a proper x86_init_ops function and override it by paravirt or by the early setup of the vmware hypervisor. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
845b3944 |
|
19-Aug-2009 |
Thomas Gleixner <tglx@linutronix.de> |
x86: Add timer_init to x86_init_ops The timer init code is convoluted with several quirks and the paravirt timer chooser. Figuring out which code path is actually taken is not for the faint hearted. Move the numaq TSC quirk to tsc_pre_init x86_init_ops function and replace the paravirt time chooser and the remaining x86 quirk with a simple x86_init_ops function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
736decac |
|
18-Aug-2009 |
Thomas Gleixner <tglx@linutronix.de> |
x86: Move percpu clockevents setup to x86_init_ops paravirt overrides the setup of the default apic timers as per cpu timers. Moorestown needs to override that as well. Move it to x86_init_ops setup and create a separate x86_cpuinit struct which holds the function for the secondary evtl. hotplugabble CPUs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
f1d7062a |
|
20-Aug-2009 |
Thomas Gleixner <tglx@linutronix.de> |
x86: Move xen_post_allocator_init into xen_pagetable_setup_done We really do not need two paravirt/x86_init_ops functions which are called in two consecutive source lines. Move the only user of post_allocator_init into the already existing pagetable_setup_done function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
030cb6c0 |
|
20-Aug-2009 |
Thomas Gleixner <tglx@linutronix.de> |
x86: Move paravirt pagetable_setup to x86_init_ops Replace more paravirt hackery by proper x86_init_ops. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
6f30c1ac |
|
20-Aug-2009 |
Thomas Gleixner <tglx@linutronix.de> |
x86: Move paravirt banner printout to x86_init_ops Replace another obscure paravirt magic and move it to x86_init_ops. Such a hook is also useful for embedded and special hardware. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
42bbdb43 |
|
20-Aug-2009 |
Thomas Gleixner <tglx@linutronix.de> |
x86: Replace ARCH_SETUP by a proper x86_init_ops ARCH_SETUP is a horrible leftover from the old arch/i386 mach support code. It still has a lonely user in xen. Move it to x86_init_ops. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
6b18ae3e |
|
20-Aug-2009 |
Thomas Gleixner <tglx@linutronix.de> |
x86: Move memory_setup to x86_init_ops memory_setup is overridden by x86_quirks and by paravirts with weak functions and quirks. Unify the whole mess and make it an unconditional x86_init_ops function which defaults to the standard function and can be overridden by the early platform code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
7adb4df4 |
|
25-Aug-2009 |
H. Peter Anvin <hpa@zytor.com> |
x86, xen: Initialize cx to suppress warning Initialize cx before calling xen_cpuid(), in order to suppress the "may be used uninitialized in this function" warning. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org>
|
#
d560bc61 |
|
25-Aug-2009 |
Jeremy Fitzhardinge <jeremy@goop.org> |
x86, xen: Suppress WP test on Xen Xen always runs on CPUs which properly support WP enforcement in privileged mode, so there's no need to test for it. This also works around a crash reported by Arnd Hannemann, though I think its just a band-aid for that case. Reported-by: Arnd Hannemann <hannemann@nets.rwth-aachen.de> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
#
ce2eef33 |
|
17-Aug-2009 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen: rearrange things to fix stackprotector Make sure the stack-protector segment registers are properly set up before calling any functions which may have stack-protection compiled into them. [ Impact: prevent Xen early-boot crash when stack-protector is enabled ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
a789ed5f |
|
24-Apr-2009 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen: cache cr0 value to avoid trap'n'emulate for read_cr0 stts() is implemented in terms of read_cr0/write_cr0 to update the state of the TS bit. This happens during context switch, and so is fairly performance critical. Rather than falling back to a trap-and-emulate native read_cr0, implement our own by caching the last-written value from write_cr0 (the TS bit is the only one we really care about). Impact: optimise Xen context switches Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
b80119bb |
|
24-Apr-2009 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen/x86-64: clean up warnings about IST-using traps Ignore known IST-using traps. Aside from the debugger traps, they're low-level faults which Xen will handle for us, so the kernel needn't worry about them. Keep warning in case unknown trap starts using IST. Impact: suppress spurious warnings Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
6cac5a92 |
|
29-Mar-2009 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen/x86-64: fix breakpoints and hardware watchpoints Native x86-64 uses the IST mechanism to run int3 and debug traps on an alternative stack. Xen does not do this, and so the frames were being misinterpreted by the ptrace code. This change special-cases these two exceptions by using Xen variants which run on the normal kernel stack properly. Impact: avoid crash or bad data when IST trap is invoked under Xen Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
2b2a7334 |
|
29-Mar-2009 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen: clean up gate trap/interrupt constants Use GATE_INTERRUPT/TRAP rather than 0xe/f. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
bc6081ff |
|
27-Mar-2009 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen: set _PAGE_NX in __supported_pte_mask before pagetable construction Some 64-bit machines don't support the NX flag in ptes. Check for NX before constructing the kernel pagetables. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
191216b9 |
|
07-Mar-2009 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen: mask XSAVE from cpuid Xen leaves XSAVE set in cpuid, but doesn't allow cr4.OSXSAVE to be set. This confuses the kernel and it ends up crashing on an xsetbv instruction. At boot time, try to set cr4.OSXSAVE, and mask XSAVE out of cpuid it we can't. This will produce a spurious error from Xen, but allows us to support XSAVE if/when Xen does. This also factors out the cpuid mask decisions to boot time. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
c667d5d6 |
|
04-Mar-2009 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen: remove xen_load_gdt debug Don't need the noise. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
a957fac5 |
|
04-Mar-2009 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen: make xen_load_gdt simpler Remove use of multicall machinery which is unused (gdt loading is never performance critical). This removes the implicit use of percpu variables, which simplifies understanding how the percpu code's use of load_gdt interacts with this code. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
c7da8c82 |
|
04-Mar-2009 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen: clean up xen_load_gdt Makes the logic a bit clearer. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
6d02c426 |
|
29-Mar-2009 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen: clean up gate trap/interrupt constants Use GATE_INTERRUPT/TRAP rather than 0xe/f. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
707ebbc8 |
|
27-Mar-2009 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen: set _PAGE_NX in __supported_pte_mask before pagetable construction Some 64-bit machines don't support the NX flag in ptes. Check for NX before constructing the kernel pagetables. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
e826fe1b |
|
07-Mar-2009 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen: mask XSAVE from cpuid Xen leaves XSAVE set in cpuid, but doesn't allow cr4.OSXSAVE to be set. This confuses the kernel and it ends up crashing on an xsetbv instruction. At boot time, try to set cr4.OSXSAVE, and mask XSAVE out of cpuid it we can't. This will produce a spurious error from Xen, but allows us to support XSAVE if/when Xen does. This also factors out the cpuid mask decisions to boot time. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
b4b7e585 |
|
04-Mar-2009 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen: remove xen_load_gdt debug Don't need the noise. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
3ce5fa7e |
|
04-Mar-2009 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen: make xen_load_gdt simpler Remove use of multicall machinery which is unused (gdt loading is never performance critical). This removes the implicit use of percpu variables, which simplifies understanding how the percpu code's use of load_gdt interacts with this code. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
6ed6bf42 |
|
04-Mar-2009 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
xen: clean up xen_load_gdt Makes the logic a bit clearer. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
#
224101ed |
|
18-Feb-2009 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
x86/paravirt: finish change from lazy cpu to context switch start/end Impact: fix lazy context switch API Pass the previous and next tasks into the context switch start end calls, so that the called functions can properly access the task state (esp in end_context_switch, in which the next task is not yet completely current). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
|
#
b407fc57 |
|
18-Feb-2009 |
Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> |
x86/paravirt: flush pending mmu updates on context switch Impact: allow preemption during lazy mmu updates If we're in lazy mmu mode when context switching, leave lazy mmu mode, but remember the task's state in TIF_LAZY_MMU_UPDATES. When we resume the task, check this flag and re-enter lazy mmu mode if its set. This sets things up for allowing lazy mmu mode while preemptible, though that won't actually be active until the next change. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
|
#
9976b39b |
|
27-Feb-2009 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: deal with virtually mapped percpu data The virtually mapped percpu space causes us two problems: - for hypercalls which take an mfn, we need to do a full pagetable walk to convert the percpu va into an mfn, and - when a hypercall requires a page to be mapped RO via all its aliases, we need to make sure its RO in both the percpu mapping and in the linear mapping This primarily affects the gdt and the vcpu info structure. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Xen-devel <xen-devel@lists.xensource.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <htejun@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
55d80856 |
|
25-Feb-2009 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: disable interrupts early, as start_kernel expects This avoids a lockdep warning from: if (DEBUG_LOCKS_WARN_ON(unlikely(!early_boot_irqs_enabled))) return; in trace_hardirqs_on_caller(); Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Mark McLoughlin <markmc@redhat.com> Cc: Xen-devel <xen-devel@lists.xensource.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
7b6aa335 |
|
17-Feb-2009 |
Ingo Molnar <mingo@elte.hu> |
x86, apic: remove genapic.h Impact: cleanup Remove genapic.h and remove all references to it. Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
c1eeb2de |
|
17-Feb-2009 |
Yinghai Lu <yinghai@kernel.org> |
x86: fold apic_ops into genapic Impact: cleanup make it simpler, don't need have one extra struct. v2: fix the sgi_uv build Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
ccbeed3a |
|
09-Feb-2009 |
Tejun Heo <tj@kernel.org> |
x86: make lazy %gs optional on x86_32 Impact: pt_regs changed, lazy gs handling made optional, add slight overhead to SAVE_ALL, simplifies error_code path a bit On x86_32, %gs hasn't been used by kernel and handled lazily. pt_regs doesn't have place for it and gs is saved/loaded only when necessary. In preparation for stack protector support, this patch makes lazy %gs handling optional by doing the followings. * Add CONFIG_X86_32_LAZY_GS and place for gs in pt_regs. * Save and restore %gs along with other registers in entry_32.S unless LAZY_GS. Note that this unfortunately adds "pushl $0" on SAVE_ALL even when LAZY_GS. However, it adds no overhead to common exit path and simplifies entry path with error code. * Define different user_gs accessors depending on LAZY_GS and add lazy_save_gs() and lazy_load_gs() which are noop if !LAZY_GS. The lazy_*_gs() ops are used to save, load and clear %gs lazily. * Define ELF_CORE_COPY_KERNEL_REGS() which always read %gs directly. xen and lguest changes need to be verified. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
e4d04071 |
|
02-Feb-2009 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: use direct ops on 64-bit Enable the use of the direct vcpu-access operations on 64-bit. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
38341432 |
|
02-Feb-2009 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: setup percpu data pointers We need to access percpu data fairly early, so set up the percpu registers as soon as possible. We only need to load the appropriate segment register. We already have a GDT, but its hard to change it early because we need to manipulate the pagetable to do so, and that hasn't been set up yet. Also, set the kernel stack when bringing up secondary CPUs. If we don't they all end up sharing the same stack... Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
795f99b6 |
|
30-Jan-2009 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: setup percpu data pointers Impact: fix xen booting We need to access percpu data fairly early, so set up the percpu registers as soon as possible. We only need to load the appropriate segment register. We already have a GDT, but its hard to change it early because we need to manipulate the pagetable to do so, and that hasn't been set up yet. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
ecb93d1c |
|
28-Jan-2009 |
Jeremy Fitzhardinge <jeremy@goop.org> |
x86/paravirt: add register-saving thunks to reduce caller register pressure Impact: Optimization One of the problems with inserting a pile of C calls where previously there were none is that the register pressure is greatly increased. The C calling convention says that the caller must expect a certain set of registers may be trashed by the callee, and that the callee can use those registers without restriction. This includes the function argument registers, and several others. This patch seeks to alleviate this pressure by introducing wrapper thunks that will do the register saving/restoring, so that the callsite doesn't need to worry about it, but the callee function can be conventional compiler-generated code. In many cases (particularly performance-sensitive cases) the callee will be in assembler anyway, and need not use the compiler's calling convention. Standard calling convention is: arguments return scratch x86-32 eax edx ecx eax ? x86-64 rdi rsi rdx rcx rax r8 r9 r10 r11 The thunk preserves all argument and scratch registers. The return register is not preserved, and is available as a scratch register for unwrapped callee code (and of course the return value). Wrapped function pointers are themselves wrapped in a struct paravirt_callee_save structure, in order to get some warning from the compiler when functions with mismatched calling conventions are used. The most common paravirt ops, both statically and dynamically, are interrupt enable/disable/save/restore, so handle them first. This is particularly easy since their calls are handled specially anyway. XXX Deal with VMI. What's their calling convention? Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
#
319f3ba5 |
|
28-Jan-2009 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: move remaining mmu-related stuff into mmu.c Impact: Cleanup Move remaining mmu-related stuff into mmu.c. A general cleanup, and lay the groundwork for later patches. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
#
ab897d20 |
|
22-Jan-2009 |
Jeremy Fitzhardinge <jeremy@goop.org> |
x86/pvops: remove pte_flags pvop pte_flags() was introduced as a new pvop in order to extract just the flags portion of a pte, which is a potentially cheaper operation than extracting the page number as well. It turns out this operation is not needed, because simply using a mask to extract the flags from a pte is sufficient for all current users. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
8ce03197 |
|
18-Jan-2009 |
Brian Gerst <brgerst@gmail.com> |
x86: remove pda_init() Impact: cleanup Copy the code to cpu_init() to satisfy the requirement that the cpu be reinitialized. Remove all other calls, since the segments are already initialized in head_64.S. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
6dbde353 |
|
15-Jan-2009 |
Ingo Molnar <mingo@elte.hu> |
percpu: add optimized generic percpu accessors It is an optimization and a cleanup, and adds the following new generic percpu methods: percpu_read() percpu_write() percpu_add() percpu_sub() percpu_and() percpu_or() percpu_xor() and implements support for them on x86. (other architectures will fall back to a default implementation) The advantage is that for example to read a local percpu variable, instead of this sequence: return __get_cpu_var(var); ffffffff8102ca2b: 48 8b 14 fd 80 09 74 mov -0x7e8bf680(,%rdi,8),%rdx ffffffff8102ca32: 81 ffffffff8102ca33: 48 c7 c0 d8 59 00 00 mov $0x59d8,%rax ffffffff8102ca3a: 48 8b 04 10 mov (%rax,%rdx,1),%rax We can get a single instruction by using the optimized variants: return percpu_read(var); ffffffff8102ca3f: 65 48 8b 05 91 8f fd mov %gs:0x7efd8f91(%rip),%rax I also cleaned up the x86-specific APIs and made the x86 code use these new generic percpu primitives. tj: * fixed generic percpu_sub() definition as Roel Kluin pointed out * added percpu_and() for completeness's sake * made generic percpu ops atomic against preemption Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
004aa322 |
|
13-Jan-2009 |
Tejun Heo <tj@kernel.org> |
x86: misc clean up after the percpu update Do the following cleanups: * kill x86_64_init_pda() which now is equivalent to pda_init() * use per_cpu_offset() instead of cpu_pda() when initializing initial_gs Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
4595f962 |
|
10-Jan-2009 |
Rusty Russell <rusty@rustcorp.com.au> |
x86: change flush_tlb_others to take a const struct cpumask Impact: reduce stack usage, use new cpumask API. This is made a little more tricky by uv_flush_tlb_others which actually alters its argument, for an IPI to be sent to the remaining cpus in the mask. I solve this by allocating a cpumask_var_t for this case and falling back to IPI should this fail. To eliminate temporaries in the caller, all flush_tlb_others implementations now do the this-cpu-elimination step themselves. Note also the curious "cpus_or(f->flush_cpumask, cpumask, f->flush_cpumask)" which has been there since pre-git and yet f->flush_cpumask is always zero at this point. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com>
|
#
ecbf29cd |
|
16-Dec-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: clean up asm/xen/hypervisor.h Impact: cleanup hypervisor.h had accumulated a lot of crud, including lots of spurious #includes. Clean it all up, and go around fixing up everything else accordingly. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
f63c2f24 |
|
16-Dec-2008 |
Tej <bewith.tej@gmail.com> |
xen: whitespace/checkpatch cleanup Impact: cleanup Signed-off-by: Tej <bewith.tej@gmail.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
d05fdf31 |
|
28-Oct-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: make sure stray alias mappings are gone before pinning Xen requires that all mappings of pagetable pages are read-only, so that they can't be updated illegally. As a result, if a page is being turned into a pagetable page, we need to make sure all its mappings are RO. If the page had been used for ioremap or vmalloc, it may still have left over mappings as a result of not having been lazily unmapped. This change makes sure we explicitly mop them all up before pinning the page. Unlike aliases created by kmap, the there can be vmalloc aliases even for non-high pages, so we must do the flush unconditionally. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Linux Memory Management List <linux-mm@kvack.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
db64fe02 |
|
18-Oct-2008 |
Nick Piggin <npiggin@suse.de> |
mm: rewrite vmap layer Rewrite the vmap allocator to use rbtrees and lazy tlb flushing, and provide a fast, scalable percpu frontend for small vmaps (requires a slightly different API, though). The biggest problem with vmap is actually vunmap. Presently this requires a global kernel TLB flush, which on most architectures is a broadcast IPI to all CPUs to flush the cache. This is all done under a global lock. As the number of CPUs increases, so will the number of vunmaps a scaled workload will want to perform, and so will the cost of a global TLB flush. This gives terrible quadratic scalability characteristics. Another problem is that the entire vmap subsystem works under a single lock. It is a rwlock, but it is actually taken for write in all the fast paths, and the read locking would likely never be run concurrently anyway, so it's just pointless. This is a rewrite of vmap subsystem to solve those problems. The existing vmalloc API is implemented on top of the rewritten subsystem. The TLB flushing problem is solved by using lazy TLB unmapping. vmap addresses do not have to be flushed immediately when they are vunmapped, because the kernel will not reuse them again (would be a use-after-free) until they are reallocated. So the addresses aren't allocated again until a subsequent TLB flush. A single TLB flush then can flush multiple vunmaps from each CPU. XEN and PAT and such do not like deferred TLB flushing because they can't always handle multiple aliasing virtual addresses to a physical address. They now call vm_unmap_aliases() in order to flush any deferred mappings. That call is very expensive (well, actually not a lot more expensive than a single vunmap under the old scheme), however it should be OK if not called too often. The virtual memory extent information is stored in an rbtree rather than a linked list to improve the algorithmic scalability. There is a per-CPU allocator for small vmaps, which amortizes or avoids global locking. To use the per-CPU interface, the vm_map_ram / vm_unmap_ram interfaces must be used in place of vmap and vunmap. Vmalloc does not use these interfaces at the moment, so it will not be quite so scalable (although it will use lazy TLB flushing). As a quick test of performance, I ran a test that loops in the kernel, linearly mapping then touching then unmapping 4 pages. Different numbers of tests were run in parallel on an 4 core, 2 socket opteron. Results are in nanoseconds per map+touch+unmap. threads vanilla vmap rewrite 1 14700 2900 2 33600 3000 4 49500 2800 8 70631 2900 So with a 8 cores, the rewritten version is already 25x faster. In a slightly more realistic test (although with an older and less scalable version of the patch), I ripped the not-very-good vunmap batching code out of XFS, and implemented the large buffer mapping with vm_map_ram and vm_unmap_ram... along with a couple of other tricks, I was able to speed up a large directory workload by 20x on a 64 CPU system. I believe vmap/vunmap is actually sped up a lot more than 20x on such a system, but I'm running into other locks now. vmap is pretty well blown off the profiles. Before: 1352059 total 0.1401 798784 _write_lock 8320.6667 <- vmlist_lock 529313 default_idle 1181.5022 15242 smp_call_function 15.8771 <- vmap tlb flushing 2472 __get_vm_area_node 1.9312 <- vmap 1762 remove_vm_area 4.5885 <- vunmap 316 map_vm_area 0.2297 <- vmap 312 kfree 0.1950 300 _spin_lock 3.1250 252 sn_send_IPI_phys 0.4375 <- tlb flushing 238 vmap 0.8264 <- vmap 216 find_lock_page 0.5192 196 find_next_bit 0.3603 136 sn2_send_IPI 0.2024 130 pio_phys_write_mmr 2.0312 118 unmap_kernel_range 0.1229 After: 78406 total 0.0081 40053 default_idle 89.4040 33576 ia64_spinlock_contention 349.7500 1650 _spin_lock 17.1875 319 __reg_op 0.5538 281 _atomic_dec_and_lock 1.0977 153 mutex_unlock 1.5938 123 iget_locked 0.1671 117 xfs_dir_lookup 0.1662 117 dput 0.1406 114 xfs_iget_core 0.0268 92 xfs_da_hashname 0.1917 75 d_alloc 0.0670 68 vmap_page_range 0.0462 <- vmap 58 kmem_cache_alloc 0.0604 57 memset 0.0540 52 rb_next 0.1625 50 __copy_user 0.0208 49 bitmap_find_free_region 0.2188 <- vmap 46 ia64_sn_udelay 0.1106 45 find_inode_fast 0.1406 42 memcmp 0.2188 42 finish_task_switch 0.1094 42 __d_lookup 0.0410 40 radix_tree_lookup_slot 0.1250 37 _spin_unlock_irqrestore 0.3854 36 xfs_bmapi 0.0050 36 kmem_cache_free 0.0256 35 xfs_vn_getattr 0.0322 34 radix_tree_lookup 0.1062 33 __link_path_walk 0.0035 31 xfs_da_do_buf 0.0091 30 _xfs_buf_find 0.0204 28 find_get_page 0.0875 27 xfs_iread 0.0241 27 __strncpy_from_user 0.2812 26 _xfs_buf_initialize 0.0406 24 _xfs_buf_lookup_pages 0.0179 24 vunmap_page_range 0.0250 <- vunmap 23 find_lock_page 0.0799 22 vm_map_ram 0.0087 <- vmap 20 kfree 0.0125 19 put_page 0.0330 18 __kmalloc 0.0176 17 xfs_da_node_lookup_int 0.0086 17 _read_lock 0.0885 17 page_waitqueue 0.0664 vmap has gone from being the top 5 on the profiles and flushing the crap out of all TLBs, to using less than 1% of kernel time. [akpm@linux-foundation.org: cleanups, section fix] [akpm@linux-foundation.org: fix build on alpha] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Krzysztof Helt <krzysztof.h1@poczta.fm> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5dc64a34 |
|
10-Oct-2008 |
Ian Campbell <Ian.Campbell@citrix.com> |
xen: do not reserve 2 pages of padding between hypervisor and fixmap. When reserving space for the hypervisor the Xen paravirt backend adds an extra two pages (this was carried forward from the 2.6.18-xen tree which had them "for safety"). Depending on various CONFIG options this can cause the boot time fixmaps to span multiple PMDs which is not supported and triggers a WARN in early_ioremap_init(). This was exposed by 2216d199b1430d1c0affb1498a9ebdbd9c0de439 which moved the dmi table parsing earlier. x86: fix CONFIG_X86_RESERVE_LOW_64K=y The bad_bios_dmi_table() quirk never triggered because we do DMI setup too late. Move it a bit earlier. There is no real reason to reserve these two extra pages and the fixmap already incorporates FIX_HOLE which serves the same purpose. None of the other callers of reserve_top_address do this. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
db053b86 |
|
02-Oct-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: clean up x86-64 warnings There are a couple of Xen features which rely on directly accessing per-cpu data via a segment register, which is not yet available on x86-64. In the meantime, just disable direct access to the vcpu info structure; this leaves some of the code as dead, but it will come to life in time, and the warnings are suppressed. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
6a9e9184 |
|
09-Sep-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: fix pinning when not using split pte locks We only pin PTE pages when using split PTE locks, so don't do the pin/unpin when attaching/detaching pte pages to a pinned pagetable. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
e4a6be4d |
|
23-Jul-2008 |
Eduardo Habkost <ehabkost@redhat.com> |
x86, xen: Use native_pte_flags instead of native_pte_val for .pte_flags Using native_pte_val triggers the BUG_ON() in the paravirt_ops version of pte_flags(). Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
f8639939 |
|
30-Jul-2008 |
Eduardo Habkost <ehabkost@redhat.com> |
x86, paravirt_ops: use unsigned long instead of u32 for alloc_p*() pfn args This patch changes the pfn args from 'u32' to 'unsigned long' on alloc_p*() functions on paravirt_ops, and the corresponding implementations for Xen and VMI. The prototypes for CONFIG_PARAVIRT=n are already using unsigned long, so paravirt.h now matches the prototypes on asm-x86/pgalloc.h. It shouldn't result in any changes on generated code on 32-bit, with or without CONFIG_PARAVIRT. On both cases, 'codiff -f' didn't show any change after applying this patch. On 64-bit, there are (expected) binary changes only when CONFIG_PARAVIRT is enabled, as the patch is really supposed to change the size of the pfn args. [ v2: KVM_GUEST: use the right parameter type on kvm_release_pt() ] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org> Acked-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
6e833587 |
|
19-Aug-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: clean up domain mode predicates There are four operating modes Xen code may find itself running in: - native - hvm domain - pv dom0 - pv domU Clean up predicates for testing for these states to make them more consistent. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Xen-devel <xen-devel@lists.xensource.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
169ad16b |
|
28-Jul-2008 |
Eduardo Habkost <ehabkost@redhat.com> |
xen_alloc_ptpage: cast PFN_PHYS() argument to unsigned long Currently paravirt_ops alloc_p*() uses u32 for the pfn args. We should change that later, but while the pfn parameter is still u32, we need to cast the PFN_PHYS() argument at xen_alloc_ptpage() to unsigned long, otherwise it will lose bits on the shift. I think PFN_PHYS() should behave better when fed with smaller integers, but a cast to unsigned long won't be enough for all cases on 32-bit PAE, and a cast to u64 would be overkill for most users of PFN_PHYS(). We could have two different flavors of PFN_PHYS: one for low pages only (unsigned long) and another that works for any page (u64)), but while we don't have it, we will need the cast to unsigned long on xen_alloc_ptpage(). Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
cef43bf6 |
|
28-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: fix allocation and use of large ldts, cleanup Add a proper comment for set_aliased_prot() and fix an unsigned long/void * warning. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
0d1edf46 |
|
28-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: compile irq functions without -pg for ftrace For some reason I managed to miss a bunch of irq-related functions which also need to be compiled without -pg when using ftrace. This patch moves them into their own file, and starts a cleanup process I've been meaning to do anyway. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: "Alex Nixon (Intern)" <Alex.Nixon@eu.citrix.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
d89961e2 |
|
24-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: suppress known wrmsrs In general, Xen doesn't support wrmsr from an unprivileged domain; it just ends up ignoring the instruction and printing a message on the console. Given that there are sets of MSRs we know the kernel will try to write to, but we don't care, just eat them in xen_write_msr to cut down on console noise. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
a05d2eba |
|
27-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: fix allocation and use of large ldts When the ldt gets to more than 1 page in size, the kernel uses vmalloc to allocate it. This means that: - when making the ldt RO, we must update the pages in both the vmalloc mapping and the linear mapping to make sure there are no RW aliases. - we need to use arbitrary_virt_to_machine to compute the machine addr for each update Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
b56afe1d |
|
23-Jul-2008 |
Eduardo Habkost <ehabkost@redhat.com> |
x86, xen: Use native_pte_flags instead of native_pte_val for .pte_flags Using native_pte_val triggers the BUG_ON() in the paravirt_ops version of pte_flags(). Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
38ffbe66 |
|
23-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
x86/paravirt/xen: properly fill out the ldt ops LTP testing showed that Xen does not properly implement sys_modify_ldt(). This patch does the final little bits needed to make the ldt work properly. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
59438c9f |
|
21-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
x86: rename PTE_MASK to PTE_PFN_MASK Rusty, in his peevish way, complained that macros defining constants should have a name which somewhat accurately reflects the actual purpose of the constant. Aside from the fact that PTE_MASK gives no clue as to what's actually being masked, and is misleadingly similar to the functionally entirely different PMD_MASK, PUD_MASK and PGD_MASK, I don't really see what the problem is. But if this patch silences the incessent noise, then it will have achieved its goal (TODO: write test-case). Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
caf43bf7 |
|
20-Jul-2008 |
Ingo Molnar <mingo@elte.hu> |
x86, xen: fix apic_ops build on UP fix: arch/x86/xen/enlighten.c:615: error: variable ‘xen_basic_apic_ops’ has initializer but incomplete type arch/x86/xen/enlighten.c:616: error: unknown field ‘read’ specified in initializer [...] Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
95c7c23b |
|
15-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: report hypervisor version Various versions of the hypervisor have differences in what ABIs and features they support. Print some details into the boot log to help with remote debugging. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
593f4a78 |
|
16-Jul-2008 |
Maciej W. Rozycki <macro@linux-mips.org> |
x86: APIC: remove apic_write_around(); use alternatives Use alternatives to select the workaround for the 11AP Pentium erratum for the affected steppings on the fly rather than build time. Remove the X86_GOOD_APIC configuration option and replace all the calls to apic_write_around() with plain apic_write(), protecting accesses to the ESR as appropriate due to the 3AP Pentium erratum. Remove apic_read_around() and all its invocations altogether as not needed. Remove apic_write_atomic() and all its implementing backends. The use of ASM_OUTPUT2() is not strictly needed for input constraints, but I have used it for readability's sake. I had the feeling no one else was brave enough to do it, so I went ahead and here it is. Verified by checking the generated assembly and tested with both a 32-bit and a 64-bit configuration, also with the 11AP "feature" forced on and verified with gdb on /proc/kcore to work as expected (as an 11AP machines are quite hard to get hands on these days). Some script complained about the use of "volatile", but apic_write() needs it for the same reason and is effectively a replacement for writel(), so I have disregarded it. I am not sure what the policy wrt defconfig files is, they are generated and there is risk of a conflict resulting from an unrelated change, so I have left changes to them out. The option will get removed from them at the next run. Some testing with machines other than mine will be needed to avoid some stupid mistake, but despite its volume, the change is not really that intrusive, so I am fairly confident that because it works for me, it will everywhere. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
b3fe1243 |
|
09-Jul-2008 |
Ingo Molnar <mingo@elte.hu> |
xen64: fix build error on 32-bit + !HIGHMEM fix: arch/x86/xen/enlighten.c: In function 'xen_set_fixmap': arch/x86/xen/enlighten.c:1127: error: 'FIX_KMAP_BEGIN' undeclared (first use in this function) arch/x86/xen/enlighten.c:1127: error: (Each undeclared identifier is reported only once arch/x86/xen/enlighten.c:1127: error: for each function it appears in.) arch/x86/xen/enlighten.c:1127: error: 'FIX_KMAP_END' undeclared (first use in this function) make[1]: *** [arch/x86/xen/enlighten.o] Error 1 make: *** [arch/x86/xen/enlighten.o] Error 2 FIX_KMAP_BEGIN is only available on HIGHMEM. Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
1153968a |
|
08-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: implement Xen write_msr operation 64-bit uses MSRs for important things like the base for fs and gs-prefixed addresses. It's more efficient to use a hypercall to update these, rather than go via the trap and emulate path. Other MSR writes are just passed through; in an unprivileged domain they do nothing, but it might be useful later. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
bf18bf94 |
|
08-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen64: set up userspace syscall patch 64-bit userspace expects the vdso to be mapped at a specific fixed address, which happens to be in the middle of the kernel address space. Because we have split user and kernel pagetables, we need to make special arrangements for the vsyscall mapping to appear in the kernel part of the user pagetable. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
6fcac6d3 |
|
08-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen64: set up syscall and sysenter entrypoints for 64-bit We set up entrypoints for syscall and sysenter. sysenter is only used for 32-bit compat processes, whereas syscall can be used in by both 32 and 64-bit processes. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
d6182fbf |
|
08-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen64: allocate and manage user pagetables Because the x86_64 architecture does not enforce segment limits, Xen cannot protect itself with them as it does in 32-bit mode. Therefore, to protect itself, it runs the guest kernel in ring 3. Since it also runs the guest userspace in ring3, the guest kernel must maintain a second pagetable for its userspace, which does not map kernel space. Naturally, the guest kernel pagetables map both kernel and userspace. The userspace pagetable is attached to the corresponding kernel pagetable via the pgd's page->private field. It is allocated and freed at the same time as the kernel pgd via the paravirt_pgd_alloc/free hooks. Fortunately, the user pagetable is almost entirely shared with the kernel pagetable; the only difference is the pgd page itself. set_pgd will populate all entries in the kernel pagetable, and also set the corresponding user pgd entry if the address is less than STACK_TOP_MAX. The user pagetable must be pinned and unpinned with the kernel one, but because the pagetables are aliased, pgd_walk() only needs to be called on the kernel pagetable. The user pgd page is then pinned/unpinned along with the kernel pgd page. xen_write_cr3 must write both the kernel and user cr3s. The init_mm.pgd pagetable never has a user pagetable allocated for it, because it can never be used while running usermode. One awkward area is that early in boot the page structures are not available. No user pagetable can exist at that point, but it complicates the logic to avoid looking at the page structure. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
8a95408e |
|
08-Jul-2008 |
Eduardo Habkost <ehabkost@redhat.com> |
xen64: Clear %fs on xen_load_tls() We need to do this, otherwise we can get a GPF on hypercall return after TLS descriptor is cleared but %fs is still pointing to it. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
b7c3c5c1 |
|
08-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: make sure the kernel command line is right Point the boot params cmd_line_ptr to the domain-builder-provided command line. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
a8fc1089 |
|
08-Jul-2008 |
Eduardo Habkost <ehabkost@redhat.com> |
xen64: implement xen_load_gs_index() xen-64: implement xen_load_gs_index() Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
0725cbb9 |
|
08-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen64: add identity irq->vector map The x86_64 interrupt subsystem is oriented towards vectors, as opposed to a flat irq space as it is in x86-32. This patch adds a simple identity irq->vector mapping so that we can continue to feed irqs into do_IRQ() and get a good result. Ideally x86_32 will unify with the 64-bit code and use vectors too. At that point we can move to mapping event channels to vectors, which will allow us to economise on irqs (so per-cpu event channels can share irqs, rather than having to allocte one per cpu, for example). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
952d1d70 |
|
08-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen64: add pvop for swapgs swapgs is a no-op under Xen, because the hypervisor makes sure the right version of %gs is current when switching between user and kernel modes. This means that the swapgs "implementation" can be inlined and used when the stack is unsafe (usermode). Unfortunately, it means that disabling patching will result in a non-booting kernel... Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
997409d3 |
|
08-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen64: deal with extra words Xen pushes onto exception frames Xen pushes two extra words containing the values of rcx and r11. This pvop hook copies the words back into their appropriate registers, and cleans them off the stack. This leaves the stack in native form, so the normal handler can run unchanged. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
e176d367 |
|
08-Jul-2008 |
Eduardo Habkost <ehabkost@Rawhide-64.localdomain> |
xen64: xen_write_idt_entry() and cvt_gate_to_trap() Changed to use the (to-be-)unified descriptor structs. Signed-off-by: Eduardo Habkost <ehabkost@Rawhide-64.localdomain> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
8745f8b0 |
|
08-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen64: defer setting pagetable alloc/release ops We need to wait until the page structure is available to use the proper pagetable page alloc/release operations, since they use struct page to determine if a pagetable is pinned. This happened to work in 32bit because nobody allocated new pagetable pages in the interim between xen_pagetable_setup_done and xen_post_allocator_init, but the 64-bit kenrel needs to allocate more pagetable levels. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
39dbc5bd |
|
08-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen32: create initial mappings like 64-bit Rearrange the pagetable initialization to share code with the 64-bit kernel. Rather than deferring anything to pagetable_setup_start, just set up an initial pagetable in swapper_pg_dir early at startup, and create an additional 8MB of physical memory mappings. This matches the native head_32.S mappings to a large degree, and allows the rest of the pagetable setup to continue without much Xen vs. native difference. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
d114e198 |
|
08-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen64: map an initial chunk of physical memory Early in boot, map a chunk of extra physical memory for use later on. We need a pool of mapped pages to allocate further pages to construct pagetables mapping all physical memory. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
22911b3f |
|
08-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen64: 64-bit starts using set_pte from very early It also doesn't need the 32-bit hack version of set_pte for initial pagetable construction, so just make it use the real thing. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
084a2a4e |
|
08-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen64: early mapping setup Set up the initial pagetables to map the kernel mapping into the physical mapping space. This makes __va() usable, since it requires physical mappings. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
7d087b68 |
|
08-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: cpu_detect is 32-bit only Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
15664f96 |
|
08-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen64: use set_fixmap for shared_info structure Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
5b09b287 |
|
08-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
x86_64: add workaround for no %gs-based percpu As a stopgap until Mike Travis's x86-64 gs-based percpu patches are ready, provide workaround functions for x86_read/write_percpu for Xen's use. Specifically, this means that we can't really make use of vcpu placement, because we can't use a single gs-based memory access to get to vcpu fields. So disable all that for now. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
a9e7062d |
|
08-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: move smp setup into smp.c Move all the smp_ops setup into smp.c, allowing a lot of things to become static. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
f5d36de0 |
|
08-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen64: random ifdefs to mask out 32-bit only code Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
f6e58732 |
|
08-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen64: add extra pv_mmu_ops We need extra pv_mmu_ops for 64-bit, to deal with the extra level of pagetable. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
851fa3c4 |
|
08-Jul-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: define set_pte from the outset We need set_pte to work from a relatively early point, so enable it from the start. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
a312b37b |
|
08-Jul-2008 |
Eduardo Habkost <ehabkost@redhat.com> |
x86/paravirt: call paravirt_pagetable_setup_{start, done} Call paravirt_pagetable_setup_{start,done} These paravirt_ops functions were not being called on x86_64. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
94a8c3c2 |
|
13-Jul-2008 |
Yinghai Lu <yhlu.kernel@gmail.com> |
x86: let 32bit use apic_ops too - fix fix for pv - clean up the namespace there too. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
ad66dd34 |
|
11-Jul-2008 |
Suresh Siddha <suresh.b.siddha@intel.com> |
x2apic: xen64 paravirt basic apic ops Define the Xen specific basic apic ops, in additon to paravirt apic ops, with some misc warning fixes. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: akpm@linux-foundation.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
e93ef949 |
|
01-Jul-2008 |
Alok Kataria <akataria@vmware.com> |
x86: rename paravirtualized TSC functions Rename the paravirtualized calculate_cpu_khz to calibrate_tsc. In all cases, we actually calibrate_tsc and use that as the cpu_khz value. Signed-off-by: Alok N Kataria <akataria@vmware.com> Signed-off-by: Dan Hecht <dhecht@vmware.com> Cc: Dan Hecht <dhecht@vmware.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
fab58420 |
|
24-Jun-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
x86/paravirt, 64-bit: add adjust_exception_frame 64-bit Xen pushes a couple of extra words onto an exception frame. Add a hook to deal with them. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: xen-devel <xen-devel@lists.xensource.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
d75cd22f |
|
24-Jun-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
x86/paravirt: split sysret and sysexit Don't conflate sysret and sysexit; they're different instructions with different semantics, and may be in use at the same time (at least within the same kernel, depending on whether its an Intel or AMD system). sysexit - just return to userspace, does no register restoration of any kind; must explicitly atomically enable interrupts. sysret - reloads flags from r11, so no need to explicitly enable interrupts on 64-bit, responsible for restoring usermode %gs Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citirx.com> Cc: xen-devel <xen-devel@lists.xensource.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
eba0045f |
|
24-Jun-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
x86/paravirt: add a pgd_alloc/free hooks Add hooks which are called at pgd_alloc/free time. The pgd_alloc hook may return an error code, which if non-zero, causes the pgd allocation to be failed. The hooks may be used to allocate/free auxillary per-pgd information. also fix: > * Ingo Molnar <mingo@elte.hu> wrote: > > include/asm/pgalloc.h: In function ‘paravirt_pgd_free': > include/asm/pgalloc.h:14: error: parameter name omitted > arch/x86/kernel/entry_64.S: In file included from > arch/x86/kernel/traps_64.c:51:include/asm/pgalloc.h: In function ‘paravirt_pgd_free': > include/asm/pgalloc.h:14: error: parameter name omitted Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: xen-devel <xen-devel@lists.xensource.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
88a6846c |
|
16-Jun-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: set max_pfn_mapped Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: the arch/x86 maintainers <x86@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
3b16cf87 |
|
26-Jun-2008 |
Jens Axboe <jens.axboe@oracle.com> |
x86: convert to generic helpers for IPI function calls This converts x86, x86-64, and xen to use the new helpers for smp_call_function() and friends, and adds support for smp_call_function_single(). Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
#
e57778a1 |
|
16-Jun-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: implement ptep_modify_prot_start/commit Xen has a pte update function which will update a pte while preserving its accessed and dirty bits. This means that ptep_modify_prot_start() can be implemented as a simple read of the pte value. The hardware may update the pte in the meantime, but ptep_modify_prot_commit() updates it while preserving any changes that may have happened in the meantime. The updates in ptep_modify_prot_commit() are batched if we're currently in lazy mmu mode. The mmu_update hypercall can take a batch of updates to perform, but this code doesn't make particular use of that feature, in favour of using generic multicall batching to get them all into the hypervisor. The net effect of this is that each mprotect pte update turns from two expensive trap-and-emulate faults into they hypervisor into a single hypercall whose cost is amortized in a batched multicall. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
08b882c6 |
|
16-Jun-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
paravirt: add hooks for ptep_modify_prot_start/commit This patch adds paravirt-ops hooks in pv_mmu_ops for ptep_modify_prot_start and ptep_modify_prot_commit. This allows the hypervisor-specific backends to implement these in some more efficient way. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
28499143 |
|
08-May-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: remove support for non-PAE 32-bit Non-PAE operation has been deprecated in Xen for a while, and is rarely tested or used. xen-unstable has now officially dropped non-PAE support. Since Xen/pvops' non-PAE support has also been broken for a while, we may as well completely drop it altogether. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
aeaaa59c |
|
17-Jun-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
x86/paravirt/xen: add set_fixmap pv_mmu_ops Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
05345b0f |
|
16-Jun-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: mask unwanted pte bits in __supported_pte_mask [ Stable: this isn't a bugfix in itself, but it's a pre-requiste for "xen: don't drop NX bit" ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org> Cc: the arch/x86 maintainers <x86@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
eb179e44 |
|
16-Jun-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: mask unwanted pte bits in __supported_pte_mask [ Stable: this isn't a bugfix in itself, but it's a pre-requiste for "xen: don't drop NX bit" ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org> Cc: the arch/x86 maintainers <x86@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
9c7a7942 |
|
30-May-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: restore vcpu_info mapping If we're using vcpu_info mapping, then make sure its restored on all processors before relasing them from stop_machine. The only complication is that if this fails, we can't continue because we've already made assumptions that the mapping is available (baked in calls to the _direct versions of the functions, for example). Fortunately this can only happen with a 32-bit hypervisor, which may possibly run out of mapping space. On a 64-bit hypervisor, this is a non-issue. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
e2426cf8 |
|
30-May-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: avoid hypercalls when updating unpinned pud/pmd When operating on an unpinned pagetable (ie, one under construction or destruction), it isn't necessary to use a hypercall to update a pud/pmd entry. Jan Beulich observed that a similar optimisation avoided many thousands of hypercalls while doing a kernel build. One tricky part is that early in the kernel boot there's no page structure, so we can't check to see if the page is pinned. In that case, we just always use the hypercall. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Jan Beulich <jbeulich@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
f0d43100 |
|
29-May-2008 |
Yinghai Lu <yhlu.kernel@gmail.com> |
x86: extend e820 early_res support 32bit -fix #3 introduce init_pg_table_start, so xen PV could specify the value. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
0e91398f |
|
26-May-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: implement save/restore This patch implements Xen save/restore and migration. Saving is triggered via xenbus, which is polled in drivers/xen/manage.c. When a suspend request comes in, the kernel prepares itself for saving by: 1 - Freeze all processes. This is primarily to prevent any partially-completed pagetable updates from confusing the suspend process. If CONFIG_PREEMPT isn't defined, then this isn't necessary. 2 - Suspend xenbus and other devices 3 - Stop_machine, to make sure all the other vcpus are quiescent. The Xen tools require the domain to run its save off vcpu0. 4 - Within the stop_machine state, it pins any unpinned pgds (under construction or destruction), performs canonicalizes various other pieces of state (mostly converting mfns to pfns), and finally 5 - Suspend the domain Restore reverses the steps used to save the domain, ending when all the frozen processes are thawed. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
d5edbc1f |
|
26-May-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: add p2m mfn_list_list When saving a domain, the Xen tools need to remap all our mfns to portable pfns. In order to remap our p2m table, it needs to know where all its pages are, so maintain the references to the p2m table for it to use. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
a0d695c8 |
|
26-May-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: make dummy_shared_info non-static Rename dummy_shared_info to xen_dummy_shared_info and make it non-static, in anticipation of users outside of enlighten.c Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
d451bb7a |
|
26-May-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: make phys_to_machine structure dynamic We now support the use of memory hotplug, so the physical to machine page mapping structure must be dynamic. This is implemented as a two-level radix tree structure, which allows us to efficiently incrementally allocate memory for the p2m table as new pages are added. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
83abc70a |
|
26-May-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: make earlyprintk=xen work again For some perverse reason, if you call add_preferred_console() it prevents setup_early_printk() from successfully enabling the boot console - unless you make it a preferred console too... Also, make xenboot console output distinct from normal console output, since it gets repeated when the console handover happens, and the duplicated output is confusing without disambiguation. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Markus Armbruster <armbru@redhat.com> Cc: Gerd Hoffmann <kraxel@redhat.com>
|
#
9e124fe1 |
|
26-May-2008 |
Markus Armbruster <armbru@redhat.com> |
xen: Enable console tty by default in domU if it's not a dummy Without console= arguments on the kernel command line, the first console to register becomes enabled and the preferred console (the one behind /dev/console). This is normally tty (assuming CONFIG_VT_CONSOLE is enabled, which it commonly is). This is okay as long tty is a useful console. But unless we have the PV framebuffer, and it is enabled for this domain, tty0 in domU is merely a dummy. In that case, we want the preferred console to be the Xen console hvc0, and we want it without having to fiddle with the kernel command line. Commit b8c2d3dfbc117dff26058fbac316b8acfc2cb5f7 did that for us. Since we now have the PV framebuffer, we want to enable and prefer tty again, but only when PVFB is enabled. But even then we still want to enable the Xen console as well. Problem: when tty registers, we can't yet know whether the PVFB is enabled. By the time we can know (xenstore is up), the console setup game is over. Solution: enable console tty by default, but keep hvc as the preferred console. Change the preferred console to tty when PVFB probes successfully, unless we've been given console kernel parameters. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
a15af1c9 |
|
26-May-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
x86/paravirt: add pte_flags to just get pte flags Add pte_flags() to extract the flags from a pte. This is a special case of pte_val() which is only guaranteed to return the pte's flags correctly; the page number may be corrupted or missing. The intent is to allow paravirt implementations to return pte flags without having to do any translation of the page number (most notably, Xen). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
239d1fc0 |
|
26-May-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: don't worry about preempt during xen_irq_enable() When enabling interrupts, we don't need to worry about preemption, because we either enter with interrupts disabled - so no preemption - or the caller is confused and is re-enabling interrupts on some indeterminate processor. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
2956a351 |
|
26-May-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: allow some cr4 updates The guest can legitimately change things like cr4.OSFXSR and OSXMMEXCPT, so let it. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
349c709f |
|
26-May-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: use new sched_op Use the new sched_op hypercall, mainly because xenner doesn't support the old one. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
7b1333aa |
|
26-May-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: use hypercall rather than clts Xen will trap and emulate clts, but its better to use a hypercall. Also, xenner doesn't handle clts. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
3843fc25 |
|
08-May-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: remove support for non-PAE 32-bit Non-PAE operation has been deprecated in Xen for a while, and is rarely tested or used. xen-unstable has now officially dropped non-PAE support. Since Xen/pvops' non-PAE support has also been broken for a while, we may as well completely drop it altogether. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
af7ae3b9 |
|
02-Apr-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: allow compilation with non-flat memory There's no real reason we can't support sparsemem/discontigmem, so do so. This is mostly useful to support hotplug memory. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
41e332b2 |
|
02-Apr-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: disable preemption during tlb flush Various places in the kernel flush the tlb even though preemption doens't guarantee the tlb flush is happening on any particular CPU. In many cases this doesn't seem to matter, so don't make a fuss about it. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
dbe9e994 |
|
17-Mar-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: no need for domU to worry about MCE/MCA Mask MCE/MCA out of cpu caps. Its harmless to leave them there, but it does prevent the kernel from starting an unnecessary thread. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
e2a81baf |
|
17-Mar-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: support sysenter/sysexit if hypervisor does 64-bit Xen supports sysenter for 32-bit guests, so support its use. (sysenter is faster than int $0x80 in 32-on-64.) sysexit is still not supported, so we fake it up using iret. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
6944a9c8 |
|
17-Mar-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
x86: rename paravirt_alloc_pt etc after the pagetable structure Rename (alloc|release)_(pt|pd) to pte/pmd to explicitly match the name of the appropriate pagetable level structure. [ x86.git merge work by Mark McLoughlin <markmc@redhat.com> ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
81e103f1 |
|
17-Apr-2008 |
Jeremy Fitzhardinge <jeremy@xensource.com> |
xen: use iret instruction all the time Change iret implementation to not be dependent on direct-access vcpu structure. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
b8c2d3df |
|
27-Feb-2008 |
Markus Armbruster <armbru@redhat.com> |
xen: make hvc0 the preferred console in domU This makes the Xen console just work. Before, you had to ask for it on the kernel command line with console=hvc0 Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
c946c7de |
|
02-Apr-2008 |
Mark McLoughlin <markmc@redhat.com> |
xen: Clear PG_pinned in release_{pt,pd}() Signed-off-by: Mark McLoughlin <markmc@redhat.com> Cc: xen-devel@lists.xensource.com Cc: Mark McLoughlin <markmc@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
a684d69d |
|
02-Apr-2008 |
Mark McLoughlin <markmc@redhat.com> |
xen: Do not pin/unpin PMD pages i.e. with this simple test case: int fd = open("/dev/zero", O_RDONLY); munmap(mmap((void *)0x40000000, 0x1000_LEN, PROT_READ, MAP_PRIVATE, fd, 0), 0x1000); close(fd); we currently get: kernel BUG at arch/x86/xen/enlighten.c:678! ... EIP is at xen_release_pt+0x79/0xa9 ... Call Trace: [<c041da25>] ? __pmd_free_tlb+0x1a/0x75 [<c047a192>] ? free_pgd_range+0x1d2/0x2b5 [<c047a2f3>] ? free_pgtables+0x7e/0x93 [<c047b272>] ? unmap_region+0xb9/0xf5 [<c047c1bd>] ? do_munmap+0x193/0x1f5 [<c047c24f>] ? sys_munmap+0x30/0x3f [<c0408cce>] ? syscall_call+0x7/0xb ======================= and xen complains: (XEN) mm.c:2241:d4 Mfn 1cc37 not pinned Further details at: https://bugzilla.redhat.com/436453 Signed-off-by: Mark McLoughlin <markmc@redhat.com> Cc: xen-devel@lists.xensource.com Cc: Mark McLoughlin <markmc@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
f6433706 |
|
02-Apr-2008 |
Mark McLoughlin <markmc@redhat.com> |
xen: refactor xen_{alloc,release}_{pt,pd}() Signed-off-by: Mark McLoughlin <markmc@redhat.com> Cc: xen-devel@lists.xensource.com Cc: Mark McLoughlin <markmc@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
2e8fe719 |
|
17-Mar-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: fix UP setup of shared_info We need to set up the shared_info pointer once we've mapped the real shared_info into its fixmap slot. That needs to happen once the general pagetable setup has been done. Previously, the UP shared_info was set up one in xen_start_kernel, but that was left pointing to the dummy shared info. Unfortunately there's no really good place to do a later setup of the shared_info in UP, so just do it once the pagetable setup has been done. [ Stable: needed in 2.6.24.x ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
04c44a08 |
|
17-Mar-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: fix RMW when unmasking events xen_irq_enable_direct and xen_sysexit were using "andw $0x00ff, XEN_vcpu_info_pending(vcpu)" to unmask events and test for pending ones in one instuction. Unfortunately, the pending flag must be modified with a locked operation since it can be set by another CPU, and the unlocked form of this operation was causing the pending flag to get lost, allowing the processor to return to usermode with pending events and ultimately deadlock. The simple fix would be to make it a locked operation, but that's rather costly and unnecessary. The fix here is to split the mask-clearing and pending-testing into two instructions; the interrupt window between them is of no concern because either way pending or new events will be processed. This should fix lingering bugs in using direct vcpu structure access too. [ Stable: needed in 2.6.24.x ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
d40e7059 |
|
29-Feb-2008 |
Jeremy Fitzhardinge <jeremy@xensource.com> |
xen: mask out SEP from CPUID Fix 32-on-64 pvops kernel: we don't want userspace using syscall/sysenter, even if the hypervisor supports it, so mask it out from CPUID. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
2b540781 |
|
13-Feb-2008 |
Jeremy Fitzhardinge <jeremy@xensource.com> |
xen: unpin initial Xen pagetable once we're finished with it Unpin the Xen-provided pagetable once we've finished with it, so it doesn't cause stray references which cause later swapper_pg_dir pagetable updates to fail. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Tested-by: Jody Belka <knew-linux@pimb.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
1c70e9bd |
|
30-Jan-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: deal with pmd being allocated/freed Deal properly with pmd-level pages being allocated and freed dynamically. We can handle them more or less the same as pte pages. Also, deal with early_ioremap pagetable manipulations. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
4dbf7af6 |
|
30-Jan-2008 |
Jan Beulich <jbeulich@novell.com> |
x86: adjust/fix LDT handling for Xen Based on patch from Jan Beulich <jbeulich@novell.com>. Don't rely on kmalloc(PAGE_SIZE) returning PAGE_SIZE aligned memory (Xen requires GDT *and* LDT to be page-aligned). Using the page allocator interface also removes the (albeit small) slab allocator overhead. The same change being done for 64-bits for consistency. Further, the Xen hypercall interface expects the LDT address to be virtual, not machine. [ Adjusted to unified ldt.c - Jeremy ] Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
75b8bb3e |
|
30-Jan-2008 |
Glauber de Oliveira Costa <gcosta@redhat.com> |
x86: change write_ldt_entry signature this patch changes the signature of write_ldt_entry. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> CC: Zachary Amsden <zach@vmware.com> CC: Jeremy Fitzhardinge <Jeremy.Fitzhardinge.citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
014b15be |
|
30-Jan-2008 |
Glauber de Oliveira Costa <gcosta@redhat.com> |
x86: change write_gdt_entry signature. This patch changes the write_gdt_entry function signature. Instead of the old "a" and "b" parameters, it now receives a pointer to a desc_struct, and the size of the entry being handled. This is because x86_64 can have some 16-byte entries as well as 8-byte ones. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> CC: Zachary Amsden <zach@vmware.com> CC: Jeremy Fitzhardinge <Jeremy.Fitzhardinge.citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
8d947344 |
|
30-Jan-2008 |
Glauber de Oliveira Costa <gcosta@redhat.com> |
x86: change write_idt_entry signature this patch changes write_idt_entry signature. It now takes a gate_desc instead of the a and b parameters. It will allow it to be later unified between i386 and x86_64. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> CC: Zachary Amsden <zach@vmware.com> CC: Jeremy Fitzhardinge <Jeremy.Fitzhardinge.citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
6b68f01b |
|
30-Jan-2008 |
Glauber de Oliveira Costa <gcosta@redhat.com> |
x86: unify struct desc_ptr This patch unifies struct desc_ptr between i386 and x86_64. They can be expressed in the exact same way in C code, only having to change the name of one of them. As Xgt_desc_struct is ugly and big, this is the one that goes away. There's also a padding field in i386, but it is not really needed in the C structure definition. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
faca6227 |
|
30-Jan-2008 |
H. Peter Anvin <hpa@zytor.com> |
x86: use generic register name in the thread and tss structures This changes size-specific register names (eip/rip, esp/rsp, etc.) to generic names in the thread and tss structures. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
65ea5b03 |
|
30-Jan-2008 |
H. Peter Anvin <hpa@zytor.com> |
x86: rename the struct pt_regs members for 32/64-bit consistency We have a lot of code which differs only by the naming of specific members of structures that contain registers. In order to enable additional unifications, this patch drops the e- or r- size prefix from the register names in struct pt_regs, and drops the x- prefixes for segment registers on the 32-bit side. This patch also performs the equivalent renames in some additional places that might be candidates for unification in the future. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
6abcd98f |
|
30-Jan-2008 |
Glauber de Oliveira Costa <gcosta@redhat.com> |
x86: irqflags consolidation This patch consolidates the irqflags include files containing common paravirt definitions. The native definition for interrupt handling, halt, and such, are the same for 32 and 64 bit, and they are kept in irqflags.h. the differences are split in the arch-specific files. The syscall function, irq_enable_sysexit, has a very specific i386 naming, and its name is then changed to a more general one. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
42e0a9aa |
|
30-Jan-2008 |
Thomas Gleixner <tglx@linutronix.de> |
x86: use u32 for some lapic functions Use u32 so 32 and 64bit have the same interface. Andrew Morton: xen, lguest build fixes Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
f9c4cfe9 |
|
23-Jan-2008 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: disable vcpu_info placement for now There have been several reports of Xen guest domains locking up when using vcpu_info structure placement. Disable it for now. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7999f4b4 |
|
10-Dec-2007 |
Jeremy Fitzhardinge <jeremy@goop.org> |
xen: relax signature check Some versions of Xen 3.x set their magic number to "xen-3.[12]", so relax the test to match them. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
fb893e99 |
|
17-Oct-2007 |
Jesper Juhl <jesper.juhl@gmail.com> |
i386: Clean up duplicate includes in arch/i386/xen/ This patch cleans up duplicate includes in arch/i386/xen/ [ tglx: arch/x86 adaptation ] Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
30c82645 |
|
15-Oct-2007 |
H. Peter Anvin <hpa@zytor.com> |
[x86] remove uses of magic macros for boot_params access Instead of using magic macros for boot_params access, simply use the boot_params structure. Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
#
e3d26976 |
|
16-Oct-2007 |
Jeremy Fitzhardinge <jeremy@xensource.com> |
xen: fix incorrect vcpu_register_vcpu_info hypercall argument The kernel's copy of struct vcpu_register_vcpu_info was out of date, at best causing the hypercall to fail and the guest kernel to fall back to the old mechanism, or worse, causing random memory corruption. [ Stable folks: applies to 2.6.23 ] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Stable Kernel <stable@kernel.org> Cc: Morten =?utf-8?q?B=C3=B8geskov?= <xen-users@morten.bogeskov.dk> Cc: Mark Williamson <mark.williamson@cl.cam.ac.uk>
|
#
fb1d8404 |
|
16-Oct-2007 |
Jeremy Fitzhardinge <jeremy@xensource.com> |
xen: ask the hypervisor how much space it needs reserved Ask the hypervisor how much space it needs reserved, since 32-on-64 doesn't need any space, and it may change in future. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
|
#
74260714 |
|
16-Oct-2007 |
Jeremy Fitzhardinge <jeremy@xensource.com> |
xen: lock pte pages while pinning/unpinning When a pagetable is created, it is made globally visible in the rmap prio tree before it is pinned via arch_dup_mmap(), and remains in the rmap tree while it is unpinned with arch_exit_mmap(). This means that other CPUs may race with the pinning/unpinning process, and see a pte between when it gets marked RO and actually pinned, causing any pte updates to fail with write-protect faults. As a result, all pte pages must be properly locked, and only unlocked once the pinning/unpinning process has finished. In order to avoid taking spinlocks for the whole pagetable - which may overflow the PREEMPT_BITS portion of preempt counter - it locks and pins each pte page individually, and then finally pins the whole pagetable. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Rik van Riel <riel@redhat.com> Cc: Hugh Dickens <hugh@veritas.com> Cc: David Rientjes <rientjes@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andi Kleen <ak@suse.de> Cc: Keir Fraser <keir@xensource.com> Cc: Jan Beulich <jbeulich@novell.com>
|
#
9f79991d |
|
16-Oct-2007 |
Jeremy Fitzhardinge <jeremy@xensource.com> |
xen: deal with stale cr3 values when unpinning pagetables When a pagetable is no longer in use, it must be unpinned so that its pages can be freed. However, this is only possible if there are no stray uses of the pagetable. The code currently deals with all the usual cases, but there's a rare case where a vcpu is changing cr3, but is doing so lazily, and the change hasn't actually happened by the time the pagetable is unpinned, even though it appears to have been completed. This change adds a second per-cpu cr3 variable - xen_current_cr3 - which tracks the actual state of the vcpu cr3. It is only updated once the actual hypercall to set cr3 has been completed. Other processors wishing to unpin a pagetable can check other vcpu's xen_current_cr3 values to see if any cross-cpu IPIs are needed to clean things up. [ Stable folks: 2.6.23 bugfix ] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Stable Kernel <stable@kernel.org>
|
#
d626a1f1 |
|
16-Oct-2007 |
Jesper Juhl <jesper.juhl@gmail.com> |
Clean up duplicate includes in arch/i386/xen/ This patch cleans up duplicate includes in arch/i386/xen/ Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
|
#
8965c1c0 |
|
16-Oct-2007 |
Jeremy Fitzhardinge <jeremy@xensource.com> |
paravirt: clean up lazy mode handling Currently, the set_lazy_mode pv_op is overloaded with 5 functions: 1. enter lazy cpu mode 2. leave lazy cpu mode 3. enter lazy mmu mode 4. leave lazy mmu mode 5. flush pending batched operations This complicates each paravirt backend, since it needs to deal with all the possible state transitions, handling flushing, etc. In particular, flushing is quite distinct from the other 4 functions, and seems to just cause complication. This patch removes the set_lazy_mode operation, and adds "enter" and "leave" lazy mode operations on mmu_ops and cpu_ops. All the logic associated with enter and leaving lazy states is now in common code (basically BUG_ONs to make sure that no mode is current when entering a lazy mode, and make sure that the mode is current when leaving). Also, flush is handled in a common way, by simply leaving and re-entering the lazy mode. The result is that the Xen, lguest and VMI lazy mode implementations are much simpler. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Andi Kleen <ak@suse.de> Cc: Zach Amsden <zach@vmware.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Avi Kivity <avi@qumranet.com> Cc: Anthony Liguory <aliguori@us.ibm.com> Cc: "Glauber de Oliveira Costa" <glommer@gmail.com> Cc: Jun Nakajima <jun.nakajima@intel.com>
|
#
93b1eab3 |
|
16-Oct-2007 |
Jeremy Fitzhardinge <jeremy@xensource.com> |
paravirt: refactor struct paravirt_ops into smaller pv_*_ops This patch refactors the paravirt_ops structure into groups of functionally related ops: pv_info - random info, rather than function entrypoints pv_init_ops - functions used at boot time (some for module_init too) pv_misc_ops - lazy mode, which didn't fit well anywhere else pv_time_ops - time-related functions pv_cpu_ops - various privileged instruction ops pv_irq_ops - operations for managing interrupt state pv_apic_ops - APIC operations pv_mmu_ops - operations for managing pagetables There are several motivations for this: 1. Some of these ops will be general to all x86, and some will be i386/x86-64 specific. This makes it easier to share common stuff while allowing separate implementations where needed. 2. At the moment we must export all of paravirt_ops, but modules only need selected parts of it. This allows us to export on a case by case basis (and also choose which export license we want to apply). 3. Functional groupings make things a bit more readable. Struct paravirt_ops is now only used as a template to generate patch-site identifiers, and to extract function pointers for inserting into jmp/calls when patching. It is only instantiated when needed. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Cc: Zach Amsden <zach@vmware.com> Cc: Avi Kivity <avi@qumranet.com> Cc: Anthony Liguory <aliguori@us.ibm.com> Cc: "Glauber de Oliveira Costa" <glommer@gmail.com> Cc: Jun Nakajima <jun.nakajima@intel.com>
|
#
9702785a |
|
11-Oct-2007 |
Thomas Gleixner <tglx@linutronix.de> |
i386: move xen Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|