#
27f6a9c8 |
|
31-Jan-2024 |
Peter Hilber <peter.hilber@opensynergy.com> |
kvmclock: Unexport kvmclock clocksource The KVM PTP driver now refers to the clocksource ID CSID_X86_KVM_CLK, not to the clocksource itself any more. There are no remaining users of the clocksource export. Therefore, make the clocksource static again. Signed-off-by: Peter Hilber <peter.hilber@opensynergy.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240201010453.2212371-9-peter.hilber@opensynergy.com
|
#
576bd496 |
|
31-Jan-2024 |
Peter Hilber <peter.hilber@opensynergy.com> |
x86/kvm, ptp/kvm: Add clocksource ID, set system_counterval_t.cs_id Add a clocksource ID for the x86 kvmclock. Also, for ptp_kvm, set the recently added struct system_counterval_t member cs_id to the clocksource ID (x86 kvmclock or ARM Generic Timer). In the future, get_device_system_crosststamp() will compare the clocksource ID in struct system_counterval_t, rather than the clocksource. For now, to avoid touching too many subsystems at once, extract the clocksource ID from the clocksource. The clocksource dereference will be removed once everything is converted over.. Signed-off-by: Peter Hilber <peter.hilber@opensynergy.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240201010453.2212371-5-peter.hilber@opensynergy.com
|
#
1c6d984f |
|
04-Dec-2023 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/kvm: Do not try to disable kvmclock if it was not enabled kvm_guest_cpu_offline() tries to disable kvmclock regardless if it is present in the VM. It leads to write to a MSR that doesn't exist on some configurations, namely in TDX guest: unchecked MSR access error: WRMSR to 0x12 (tried to write 0x0000000000000000) at rIP: 0xffffffff8110687c (kvmclock_disable+0x1c/0x30) kvmclock enabling is gated by CLOCKSOURCE and CLOCKSOURCE2 KVM paravirt features. Do not disable kvmclock if it was not enabled. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Fixes: c02027b5742b ("x86/kvm: Disable kvmclock on all CPUs on shutdown") Reviewed-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: stable@vger.kernel.org Message-Id: <20231205004510.27164-6-kirill.shutemov@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
54aa699e |
|
02-Jan-2024 |
Bjorn Helgaas <bhelgaas@google.com> |
arch/x86: Fix typos Fix typos, most reported by "codespell arch/x86". Only touches comments, no code changes. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20240103004011.1758650-1-helgaas@kernel.org
|
#
5c5e9a2b |
|
18-May-2023 |
Peter Zijlstra <peterz@infradead.org> |
x86/tsc: Provide sched_clock_noinstr() With the intent to provide local_clock_noinstr(), a variant of local_clock() that's safe to be called from noinstr code (with the assumption that any such code will already be non-preemptible), prepare for things by providing a noinstr sched_clock_noinstr() function. Specifically, preempt_enable_*() calls out to schedule(), which upsets noinstr validation efforts. vmlinux.o: warning: objtool: native_sched_clock+0x96: call to preempt_schedule_notrace_thunk() leaves .noinstr.text section vmlinux.o: warning: objtool: kvm_clock_read+0x22: call to preempt_schedule_notrace_thunk() leaves .noinstr.text section Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> # Hyper-V Link: https://lore.kernel.org/r/20230519102715.910937674@infradead.org
|
#
8739c681 |
|
26-Jan-2023 |
Peter Zijlstra <peterz@infradead.org> |
sched/clock/x86: Mark sched_clock() noinstr In order to use sched_clock() from noinstr code, mark it and all it's implenentations noinstr. The whole pvclock thing (used by KVM/Xen) is a bit of a pain, since it calls out to watchdogs, create a pvclock_clocksource_read_nowd() variant doesn't do that and can be noinstr. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230126151323.702003578@infradead.org
|
#
77d72792 |
|
08-Mar-2022 |
Wanpeng Li <wanpengli@tencent.com> |
x86/kvm: Don't waste kvmclock memory if there is nopv parameter When the "nopv" command line parameter is used, it should not waste memory for kvmclock. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1646727529-11774-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
92e68cc5 |
|
25-Feb-2022 |
Dexuan Cui <decui@microsoft.com> |
x86/kvmclock: Fix Hyper-V Isolated VM's boot issue when vCPUs > 64 When Linux runs as an Isolated VM on Hyper-V, it supports AMD SEV-SNP but it's partially enlightened, i.e. cc_platform_has( CC_ATTR_GUEST_MEM_ENCRYPT) is true but sev_active() is false. Commit 4d96f9109109 per se is good, but with it now kvm_setup_vsyscall_timeinfo() -> kvmclock_init_mem() calls set_memory_decrypted(), and later gets stuck when trying to zere out the pages pointed by 'hvclock_mem', if Linux runs as an Isolated VM on Hyper-V. The cause is that here now the Linux VM should no longer access the original guest physical addrss (GPA); instead the VM should do memremap() and access the original GPA + ms_hyperv.shared_gpa_boundary: see the example code in drivers/hv/connection.c: vmbus_connect() or drivers/hv/ring_buffer.c: hv_ringbuffer_init(). If the VM tries to access the original GPA, it keepts getting injected a fault by Hyper-V and gets stuck there. Here the issue happens only when the VM has >=65 vCPUs, because the global static array hv_clock_boot[] can hold 64 "struct pvclock_vsyscall_time_info" (the sizeof of the struct is 64 bytes), so kvmclock_init_mem() only allocates memory in the case of vCPUs > 64. Since the 'hvclock_mem' pages are only useful when the kvm clock is supported by the underlying hypervisor, fix the issue by returning early when Linux VM runs on Hyper-V, which doesn't support kvm clock. Fixes: 4d96f9109109 ("x86/sev: Replace occurrences of sev_active() with cc_platform_has()") Tested-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Message-Id: <20220225084600.17817-1-decui@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
3c51d0a6 |
|
22-Feb-2022 |
Wanpeng Li <wanpengli@tencent.com> |
x86/kvm: Don't waste memory if kvmclock is disabled Even if "no-kvmclock" is passed in cmdline parameter, the guest kernel still allocates hvclock_mem which is scaled by the number of vCPUs, let's check kvmclock enable in advance to avoid this memory waste. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1645520523-30814-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
f3f26dae |
|
09-Dec-2021 |
David Woodhouse <dwmw@amazon.co.uk> |
x86/kvm: Silence per-cpu pr_info noise about KVM clocks and steal time I made the actual CPU bringup go nice and fast... and then Linux spends half a minute printing stupid nonsense about clocks and steal time for each of 256 vCPUs. Don't do that. Nobody cares. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20211209150938.3518-12-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
4d96f910 |
|
08-Sep-2021 |
Tom Lendacky <thomas.lendacky@amd.com> |
x86/sev: Replace occurrences of sev_active() with cc_platform_has() Replace uses of sev_active() with the more generic cc_platform_has() using CC_ATTR_GUEST_MEM_ENCRYPT. If future support is added for other memory encryption technologies, the use of CC_ATTR_GUEST_MEM_ENCRYPT can be updated, as required. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210928191009.32551-7-bp@alien8.de
|
#
ad9af930 |
|
28-Sep-2021 |
Zelin Deng <zelin.deng@linux.alibaba.com> |
x86/kvmclock: Move this_cpu_pvti into kvmclock.h There're other modules might use hv_clock_per_cpu variable like ptp_kvm, so move it into kvmclock.h and export the symbol to make it visiable to other modules. Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com> Cc: <stable@vger.kernel.org> Message-Id: <1632892429-101194-2-git-send-email-zelin.deng@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
3d6b8413 |
|
14-Apr-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
x86/kvm: Disable all PV features on crash Crash shutdown handler only disables kvmclock and steal time, other PV features remain active so we risk corrupting memory or getting some side-effects in kdump kernel. Move crash handler to kvm.c and unify with CPU offline. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210414123544.1060604-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
c02027b5 |
|
14-Apr-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
x86/kvm: Disable kvmclock on all CPUs on shutdown Currenly, we disable kvmclock from machine_shutdown() hook and this only happens for boot CPU. We need to disable it for all CPUs to guard against memory corruption e.g. on restore from hibernate. Note, writing '0' to kvmclock MSR doesn't clear memory location, it just prevents hypervisor from updating the location so for the short while after write and while CPU is still alive, the clock remains usable and correct so we don't need to switch to some other clocksource. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210414123544.1060604-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
a0e2bf7c |
|
11-Mar-2021 |
Juergen Gross <jgross@suse.com> |
x86/paravirt: Switch time pvops functions to use static_call() The time pvops functions are the only ones left which might be used in 32-bit mode and which return a 64-bit value. Switch them to use the static_call() mechanism instead of pvops, as this allows quite some simplification of the pvops implementation. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210311142319.4723-5-jgross@suse.com
|
#
d7eb79c6 |
|
23-Feb-2021 |
Wanpeng Li <wanpengli@tencent.com> |
KVM: kvmclock: Fix vCPUs > 64 can't be online/hotpluged # lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 88 On-line CPU(s) list: 0-63 Off-line CPU(s) list: 64-87 # cat /proc/cmdline BOOT_IMAGE=/vmlinuz-5.10.0-rc3-tlinux2-0050+ root=/dev/mapper/cl-root ro rd.lvm.lv=cl/root rhgb quiet console=ttyS0 LANG=en_US .UTF-8 no-kvmclock-vsyscall # echo 1 > /sys/devices/system/cpu/cpu76/online -bash: echo: write error: Cannot allocate memory The per-cpu vsyscall pvclock data pointer assigns either an element of the static array hv_clock_boot (#vCPU <= 64) or dynamically allocated memory hvclock_mem (vCPU > 64), the dynamically memory will not be allocated if kvmclock vsyscall is disabled, this can result in cpu hotpluged fails in kvmclock_setup_percpu() which returns -ENOMEM. It's broken for no-vsyscall and sometimes you end up with vsyscall disabled if the host does something strange. This patch fixes it by allocating this dynamically memory unconditionally even if vsyscall is disabled. Fixes: 6a1cac56f4 ("x86/kvm: Use __bss_decrypted attribute in shared variables") Reported-by: Zelin Deng <zelin.deng@linux.alibaba.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: stable@vger.kernel.org#v4.19-rc5+ Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1614130683-24137-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
789f52c0 |
|
06-Nov-2020 |
Alex Shi <alex.shi@linux.alibaba.com> |
x86/kvm: remove unused macro HV_CLOCK_SIZE This macro is useless, and could cause gcc warning: arch/x86/kernel/kvmclock.c:47:0: warning: macro "HV_CLOCK_SIZE" is not used [-Wunused-macros] Let's remove it. Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Message-Id: <1604651963-10067-1-git-send-email-alex.shi@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b95a8a27 |
|
07-Feb-2020 |
Thomas Gleixner <tglx@linutronix.de> |
x86/vdso: Use generic VDSO clock mode storage Switch to the generic VDSO clock mode storage. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> (VDSO parts) Acked-by: Juergen Gross <jgross@suse.com> (Xen parts) Acked-by: Paolo Bonzini <pbonzini@redhat.com> (KVM parts) Link: https://lkml.kernel.org/r/20200207124403.152039903@linutronix.de
|
#
eec399dd |
|
07-Feb-2020 |
Thomas Gleixner <tglx@linutronix.de> |
x86/vdso: Move VDSO clocksource state tracking to callback All architectures which use the generic VDSO code have their own storage for the VDSO clock mode. That's pointless and just requires duplicate code. X86 abuses the function which retrieves the architecture specific clock mode storage to mark the clocksource as used in the VDSO. That's silly because this is invoked on every tick when the VDSO data is updated. Move this functionality to the clocksource::enable() callback so it gets invoked once when the clocksource is installed. This allows to make the clock mode storage generic. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> (Hyper-V parts) Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> (VDSO parts) Acked-by: Juergen Gross <jgross@suse.com> (Xen parts) Link: https://lkml.kernel.org/r/20200207124402.934519777@linutronix.de
|
#
7539b174 |
|
04-Jan-2019 |
Marcelo Tosatti <mtosatti@redhat.com> |
x86: kvmguest: use TSC clocksource if invariant TSC is exposed The invariant TSC bit has the following meaning: "The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC. Processor's support for invariant TSC is indicated by CPUID.80000007H:EDX[8]. The invariant TSC will run at a constant rate in all ACPI P-, C-. and T-states. This is the architectural behavior moving forward. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or HPET timers). TSC reads are much more efficient and do not incur the overhead associated with a ring transition or access to a platform resource." IOW, TSC does not change frequency. In such case, and with TSC scaling hardware available to handle migration, it is possible to use the TSC clocksource directly, whose system calls are faster. Reduce the rating of kvmclock clocksource to allow TSC clocksource to be the default if invariant TSC is exposed. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> v2: Use feature bits and tsc_unstable() check (Sean Christopherson) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b5179ec4 |
|
25-Jan-2019 |
Pavel Tatashin <pasha.tatashin@soleen.com> |
x86/kvmclock: set offset for kvm unstable clock VMs may show incorrect uptime and dmesg printk offsets on hypervisors with unstable clock. The problem is produced when VM is rebooted without exiting from qemu. The fix is to calculate clock offset not only for stable clock but for unstable clock as well, and use kvm_sched_clock_read() which substracts the offset for both clocks. This is safe, because pvclock_clocksource_read() does the right thing and makes sure that clock always goes forward, so once offset is calculated with unstable clock, we won't get new reads that are smaller than offset, and thus won't get negative results. Thank you Jon DeVree for helping to reproduce this issue. Fixes: 857baa87b642 ("sched/clock: Enable sched clock early") Cc: stable@vger.kernel.org Reported-by: Dominique Martinet <asmadeus@codewreck.org> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
649472a1 |
|
02-Nov-2018 |
Peng Hao <peng.hao2@zte.com.cn> |
x86/kvmclock: convert to SPDX identifiers Update the verbose license text with the matching SPDX license identifier. Signed-off-by: Peng Hao <peng.hao2@zte.com.cn> [Changed deprecated GPL-2.0+ to GPL-2.0-or-later. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
#
6a1cac56 |
|
14-Sep-2018 |
Brijesh Singh <brijesh.singh@amd.com> |
x86/kvm: Use __bss_decrypted attribute in shared variables The recent removal of the memblock dependency from kvmclock caused a SEV guest regression because the wall_clock and hv_clock_boot variables are no longer mapped decrypted when SEV is active. Use the __bss_decrypted attribute to put the static wall_clock and hv_clock_boot in the .bss..decrypted section so that they are mapped decrypted during boot. In the preparatory stage of CPU hotplug, the per-cpu pvclock data pointer assigns either an element of the static array or dynamically allocated memory for the pvclock data pointer. The static array are now mapped decrypted but the dynamically allocated memory is not mapped decrypted. However, when SEV is active this memory range must be mapped decrypted. Add a function which is called after the page allocator is up, and allocate memory for the pvclock data pointers for the all possible cpus. Map this memory range as decrypted when SEV is active. Fixes: 368a540e0232 ("x86/kvmclock: Remove memblock dependency") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Link: https://lkml.kernel.org/r/1536932759-12905-3-git-send-email-brijesh.singh@amd.com
|
#
5c83511b |
|
28-Aug-2018 |
Juergen Gross <jgross@suse.com> |
x86/paravirt: Use a single ops structure Instead of using six globally visible paravirt ops structures combine them in a single structure, keeping the original structures as sub-structures. This avoids the need to assemble struct paravirt_patch_template at runtime on the stack each time apply_paravirt() is being called (i.e. when loading a module). [ tglx: Made the struct and the initializer tabular for readability sake ] Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: virtualization@lists.linux-foundation.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: boris.ostrovsky@oracle.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180828074026.820-9-jgross@suse.com
|
#
1088c6ee |
|
30-Jul-2018 |
Dou Liyang <douly.fnst@cn.fujitsu.com> |
x86/kvmclock: Mark kvm_get_preset_lpj() as __init kvm_get_preset_lpj() is only called from kvmclock_init(), so mark it __init as well. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář<rkrcmar@redhat.com> Cc: <kvm@vger.kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Link: https://lkml.kernel.org/r/20180730075421.22830-3-douly.fnst@cn.fujitsu.com
|
#
95a3d445 |
|
19-Jul-2018 |
Thomas Gleixner <tglx@linutronix.de> |
x86/kvmclock: Switch kvmclock data to a PER_CPU variable The previous removal of the memblock dependency from kvmclock introduced a static data array sized 64bytes * CONFIG_NR_CPUS. That's wasteful on large systems when kvmclock is not used. Replace it with: - A static page sized array of pvclock data. It's page sized because the pvclock data of the boot cpu is mapped into the VDSO so otherwise random other data would be exposed to the vDSO - A PER_CPU variable of pvclock data pointers. This is used to access the pcvlock data storage on each CPU. The setup is done in two stages: - Early boot stores the pointer to the static page for the boot CPU in the per cpu data. - In the preparatory stage of CPU hotplug assign either an element of the static array (when the CPU number is in that range) or allocate memory and initialize the per cpu pointer. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Link: https://lkml.kernel.org/r/20180719205545.16512-8-pasha.tatashin@oracle.com
|
#
e499a9b6 |
|
19-Jul-2018 |
Thomas Gleixner <tglx@linutronix.de> |
x86/kvmclock: Move kvmclock vsyscall param and init to kvmclock There is no point to have this in the kvm code itself and call it from there. This can be called from an initcall and the parameter is cleared when the hypervisor is not KVM. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Link: https://lkml.kernel.org/r/20180719205545.16512-7-pasha.tatashin@oracle.com
|
#
42f8df93 |
|
19-Jul-2018 |
Thomas Gleixner <tglx@linutronix.de> |
x86/kvmclock: Mark variables __initdata and __ro_after_init The kvmclock parameter is init data and the other variables are not modified after init. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Link: https://lkml.kernel.org/r/20180719205545.16512-6-pasha.tatashin@oracle.com
|
#
146c394d |
|
19-Jul-2018 |
Thomas Gleixner <tglx@linutronix.de> |
x86/kvmclock: Cleanup the code - Cleanup the mrs write for wall clock. The type casts to (int) are sloppy because the wrmsr parameters are u32 and aside of that wrmsrl() already provides the high/low split for free. - Remove the pointless get_cpu()/put_cpu() dance from various functions. Either they are called during early init where CPU is guaranteed to be 0 or they are already called from non preemptible context where smp_processor_id() can be used safely - Simplify the convoluted check for kvmclock in the init function. - Mark the parameter parsing function __init. No point in keeping it around. - Convert to pr_info() Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Link: https://lkml.kernel.org/r/20180719205545.16512-5-pasha.tatashin@oracle.com
|
#
7a5ddc8f |
|
19-Jul-2018 |
Thomas Gleixner <tglx@linutronix.de> |
x86/kvmclock: Decrapify kvm_register_clock() The return value is pointless because the wrmsr cannot fail if KVM_FEATURE_CLOCKSOURCE or KVM_FEATURE_CLOCKSOURCE2 are set. kvm_register_clock() is only called locally so wants to be static. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Link: https://lkml.kernel.org/r/20180719205545.16512-4-pasha.tatashin@oracle.com
|
#
7ef363a3 |
|
19-Jul-2018 |
Thomas Gleixner <tglx@linutronix.de> |
x86/kvmclock: Remove page size requirement from wall_clock There is no requirement for wall_clock data to be page aligned or page sized. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Link: https://lkml.kernel.org/r/20180719205545.16512-3-pasha.tatashin@oracle.com
|
#
368a540e |
|
19-Jul-2018 |
Pavel Tatashin <pasha.tatashin@oracle.com> |
x86/kvmclock: Remove memblock dependency KVM clock is initialized later compared to other hypervisor clocks because it has a dependency on the memblock allocator. Bring it in line with other hypervisors by using memory from the BSS instead of allocating it. The benefits: - Remove ifdef from common code - Earlier availability of the clock - Remove dependency on memblock, and reduce code The downside: - Static allocation of the per cpu data structures sized NR_CPUS * 64byte Will be addressed in follow up patches. [ tglx: Split out from larger series ] Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: steven.sistare@oracle.com Cc: daniel.m.jordan@oracle.com Cc: linux@armlinux.org.uk Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com Cc: john.stultz@linaro.org Cc: sboyd@codeaurora.org Cc: hpa@zytor.com Cc: douly.fnst@cn.fujitsu.com Cc: peterz@infradead.org Cc: prarit@redhat.com Cc: feng.tang@intel.com Cc: pmladek@suse.com Cc: gnomes@lxorguk.ukuu.org.uk Cc: linux-s390@vger.kernel.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Link: https://lkml.kernel.org/r/20180719205545.16512-2-pasha.tatashin@oracle.com
|
#
e10f7805 |
|
14-Jul-2018 |
Peng Hao <peng.hao2@zte.com.cn> |
kvmclock: fix TSC calibration for nested guests Inside a nested guest, access to hardware can be slow enough that tsc_read_refs always return ULLONG_MAX, causing tsc_refine_calibration_work to be called periodically and the nested guest to spend a lot of time reading the ACPI timer. However, if the TSC frequency is available from the pvclock page, we can just set X86_FEATURE_TSC_KNOWN_FREQ and avoid the recalibration. 'refine' operation. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Peng Hao <peng.hao2@zte.com.cn> [Commit message rewritten. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
94ffba48 |
|
15-Jul-2018 |
Radim Krčmář <rkrcmar@redhat.com> |
x86/kvmclock: set pvti_cpu0_va after enabling kvmclock pvti_cpu0_va is the address of shared kvmclock data structure. pvti_cpu0_va is currently kept unset (1) on 32 bit systems, (2) when kvmclock vsyscall is disabled, and (3) if kvmclock is not stable. This poses a problem, because kvm_ptp needs pvti_cpu0_va, but (1) can work on 32 bit, (2) has little relation to the vsyscall, and (3) does not need stable kvmclock (although kvmclock won't be used for system clock if it's not stable, so kvm_ptp is pointless in that case). Expose pvti_cpu0_va whenever kvmclock is enabled to allow all users to work with it. This fixes a regression found on Gentoo: https://bugs.gentoo.org/658544. Fixes: 9f08890ab906 ("x86/pvclock: add setter for pvclock_pvti_cpu0_va") Cc: stable@vger.kernel.org Reported-by: Andreas Steinmetz <ast@domdv.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
e27c4929 |
|
27-Apr-2018 |
Arnd Bergmann <arnd@arndb.de> |
x86: Convert x86_platform_ops to timespec64 The x86 platform operations are fairly isolated, so it's easy to change them from using timespec to timespec64. It has been checked that all the users and callers are safe, and there is only one critical function that is broken beyond 2106: pvclock_read_wallclock() uses a 32-bit number of seconds since the epoch to communicate the boot time between host and guest in a virtual environment. This will work until 2106, but fixing this is outside the scope of this change, Add a comment at least. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Radim Krčmář <rkrcmar@redhat.com> Acked-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Juergen Gross <jgross@suse.com> Cc: jailhouse-dev@googlegroups.com Cc: Borislav Petkov <bp@suse.de> Cc: kvm@vger.kernel.org Cc: y2038@lists.linaro.org Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: xen-devel@lists.xenproject.org Cc: John Stultz <john.stultz@linaro.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Joao Martins <joao.m.martins@oracle.com> Link: https://lkml.kernel.org/r/20180427201435.3194219-1-arnd@arndb.de
|
#
9f08890a |
|
08-Nov-2017 |
Joao Martins <joao.m.martins@oracle.com> |
x86/pvclock: add setter for pvclock_pvti_cpu0_va Right now there is only a pvclock_pvti_cpu0_va() which is defined on kvmclock since: commit dac16fba6fc5 ("x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap") The only user of this interface so far is kvm. This commit adds a setter function for the pvti page and moves pvclock_pvti_cpu0_va to pvclock, which is a more generic place to have it; and would allow other PV clocksources to use it, such as Xen. While moving pvclock_pvti_cpu0_va into pvclock, rename also this function to pvclock_get_pvti_cpu0_va (including its call sites) to be symmetric with the setter (pvclock_set_pvti_cpu0_va). Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Acked-by: Andy Lutomirski <luto@kernel.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
#
819aeee0 |
|
20-Oct-2017 |
Brijesh Singh <brijesh.singh@amd.com> |
X86/KVM: Clear encryption attribute when SEV is active The guest physical memory area holding the struct pvclock_wall_clock and struct pvclock_vcpu_time_info are shared with the hypervisor. It periodically updates the contents of the memory. When SEV is active, the encryption attributes from the shared memory pages must be cleared so that both hypervisor and guest can access the data. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Tested-by: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lkml.kernel.org/r/20171020143059.3291-18-brijesh.singh@amd.com
|
#
00875520 |
|
31-Oct-2017 |
Jason Gunthorpe <jgg@ziepe.ca> |
kvm: Return -ENODEV from update_persistent_clock kvm does not support setting the RTC, so the correct result is -ENODEV. Returning -1 will cause sync_cmos_clock to keep trying to set the RTC every second. Signed-off-by: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
e6017571 |
|
01-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h> We are going to split <linux/sched/clock.h> out of <linux/sched.h>, which will have to be picked up from other headers and .c files. Create a trivial placeholder <linux/sched/clock.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
f4066c2b |
|
24-Jan-2017 |
Marcelo Tosatti <mtosatti@redhat.com> |
kvmclock: export kvmclock clocksource and data pointers To be used by KVM PTP driver. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
acb04058 |
|
19-Jan-2017 |
Peter Zijlstra <peterz@infradead.org> |
sched/clock: Fix hotplug crash Mike reported that he could trigger the WARN_ON_ONCE() in set_sched_clock_stable() using hotplug. This exposed a fundamental problem with the interface, we should never mark the TSC stable if we ever find it to be unstable. Therefore set_sched_clock_stable() is a broken interface. The reason it existed is that not having it is a pain, it means all relevant architecture code needs to call clear_sched_clock_stable() where appropriate. Of the three architectures that select HAVE_UNSTABLE_SCHED_CLOCK ia64 and parisc are trivial in that they never called set_sched_clock_stable(), so add an unconditional call to clear_sched_clock_stable() to them. For x86 the story is a lot more involved, and what this patch tries to do is ensure we preserve the status quo. So even is Cyrix or Transmeta have usable TSC they never called set_sched_clock_stable() so they now get an explicit mark unstable. Reported-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 9881b024b7d7 ("sched/clock: Delay switching sched_clock to stable") Link: http://lkml.kernel.org/r/20170119133633.GB6536@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
a5a1d1c2 |
|
21-Dec-2016 |
Thomas Gleixner <tglx@linutronix.de> |
clocksource: Use a plain u64 instead of cycle_t There is no point in having an extra type for extra confusion. u64 is unambiguous. Conversion was done with the following coccinelle script: @rem@ @@ -typedef u64 cycle_t; @fix@ typedef cycle_t; @@ -cycle_t +u64 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org>
|
#
a4497a86 |
|
08-Sep-2016 |
Prarit Bhargava <prarit@redhat.com> |
x86, clock: Fix kvm guest tsc initialization When booting a kvm guest on AMD with the latest kernel the following messages are displayed in the boot log: tsc: Unable to calibrate against PIT tsc: HPET/PMTIMER calibration failed aa297292d708 ("x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID") introduced a change to account for a difference in cpu and tsc frequencies for Intel SKL processors. Before this change the native tsc set x86_platform.calibrate_tsc to native_calibrate_tsc() which is a hardware calibration of the tsc, and in tsc_init() executed tsc_khz = x86_platform.calibrate_tsc(); cpu_khz = tsc_khz; The kvm code changed x86_platform.calibrate_tsc to kvm_get_tsc_khz() and executed the same tsc_init() function. This meant that KVM guests did not execute the native hardware calibration function. After aa297292d708, there are separate native calibrations for cpu_khz and tsc_khz. The code sets x86_platform.calibrate_tsc to native_calibrate_tsc() which is now an Intel specific calibration function, and x86_platform.calibrate_cpu to native_calibrate_cpu() which is the "old" native_calibrate_tsc() function (ie, the native hardware calibration function). tsc_init() now does cpu_khz = x86_platform.calibrate_cpu(); tsc_khz = x86_platform.calibrate_tsc(); if (tsc_khz == 0) tsc_khz = cpu_khz; else if (abs(cpu_khz - tsc_khz) * 10 > tsc_khz) cpu_khz = tsc_khz; The kvm code should not call the hardware initialization in native_calibrate_cpu(), as it isn't applicable for kvm and it didn't do that prior to aa297292d708. This patch resolves this issue by setting x86_platform.calibrate_cpu to kvm_get_tsc_khz(). v2: I had originally set x86_platform.calibrate_cpu to cpu_khz_from_cpuid(), however, pbonzini pointed out that the CPUID leaf in that function is not available in KVM. I have changed the function pointer to kvm_get_tsc_khz(). Fixes: aa297292d708 ("x86/tsc: Enumerate SKL cpu_khz and tsc_khz via CPUID") Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Len Brown <len.brown@intel.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Borislav Petkov <bp@suse.de> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: "Christopher S. Hall" <christopher.s.hall@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
404f6aac |
|
08-Aug-2016 |
Kees Cook <keescook@chromium.org> |
x86: Apply more __ro_after_init and const Guided by grsecurity's analogous __read_only markings in arch/x86, this applies several uses of __ro_after_init to structures that are only updated during __init, and const for some structures that are never updated. Additionally extends __init markings to some functions that are only used during __init, and cleans up some missing C99 style static initializers. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brad Spengler <spender@grsecurity.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Brown <david.brown@linaro.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Emese Revfy <re.emese@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathias Krause <minipli@googlemail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: PaX Team <pageexec@freemail.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-hardening@lists.openwall.com Link: http://lkml.kernel.org/r/20160808232906.GA29731@www.outflux.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
6a6256f9 |
|
23-Feb-2016 |
Adam Buchbinder <adam.buchbinder@gmail.com> |
x86: Fix misspellings in comments Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: trivial@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
cc1e24fd |
|
10-Dec-2015 |
Andy Lutomirski <luto@kernel.org> |
x86/vdso: Remove pvclock fixmap machinery Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/4933029991103ae44672c82b97a20035f5c1fe4f.1449702533.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
dac16fba |
|
10-Dec-2015 |
Andy Lutomirski <luto@kernel.org> |
x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/9d37826fdc7e2d2809efe31d5345f97186859284.1449702533.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
72c930dc |
|
18-Sep-2015 |
Radim Krčmář <rkrcmar@redhat.com> |
x86: kvmclock: abolish PVCLOCK_COUNTS_FROM_ZERO Newer KVM won't be exposing PVCLOCK_COUNTS_FROM_ZERO anymore. The purpose of that flags was to start counting system time from 0 when the KVM clock has been initialized. We can achieve the same by selecting one read as the initial point. A simple subtraction will work unless the KVM clock count overflows earlier (has smaller width) than scheduler's cycle count. We should be safe till x86_128. Because PVCLOCK_COUNTS_FROM_ZERO was enabled only on new hypervisors, setting sched clock as stable based on PVCLOCK_TSC_STABLE_BIT might regress on older ones. I presume we don't need to change kvm_clock_read instead of introducing kvm_sched_clock_read. A problem could arise in case sched_clock is expected to return the same value as get_cycles, but we should have merged those clocks in that case. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
2965faa5 |
|
09-Sep-2015 |
Dave Young <dyoung@redhat.com> |
kexec: split kexec_load syscall from kexec core code There are two kexec load syscalls, kexec_load another and kexec_file_load. kexec_file_load has been splited as kernel/kexec_file.c. In this patch I split kexec_load syscall code to kernel/kexec.c. And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and use kexec_file_load only, or vice verse. The original requirement is from Ted Ts'o, he want kexec kernel signature being checked with CONFIG_KEXEC_VERIFY_SIG enabled. But kexec-tools use kexec_load syscall can bypass the checking. Vivek Goyal proposed to create a common kconfig option so user can compile in only one syscall for loading kexec kernel. KEXEC/KEXEC_FILE selects KEXEC_CORE so that old config files still work. Because there's general code need CONFIG_KEXEC_CORE, so I updated all the architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects KEXEC_CORE in arch Kconfig. Also updated general kernel code with to kexec_load syscall. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Dave Young <dyoung@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Josh Boyer <jwboyer@fedoraproject.org> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0ad83caa |
|
28-May-2015 |
Luiz Capitulino <lcapitulino@redhat.com> |
x86: kvmclock: set scheduler clock stable If you try to enable NOHZ_FULL on a guest today, you'll get the following error when the guest tries to deactivate the scheduler tick: WARNING: CPU: 3 PID: 2182 at kernel/time/tick-sched.c:192 can_stop_full_tick+0xb9/0x290() NO_HZ FULL will not work with unstable sched clock CPU: 3 PID: 2182 Comm: kworker/3:1 Not tainted 4.0.0-10545-gb9bb6fb #204 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Workqueue: events flush_to_ldisc ffffffff8162a0c7 ffff88011f583e88 ffffffff814e6ba0 0000000000000002 ffff88011f583ed8 ffff88011f583ec8 ffffffff8104d095 ffff88011f583eb8 0000000000000000 0000000000000003 0000000000000001 0000000000000001 Call Trace: <IRQ> [<ffffffff814e6ba0>] dump_stack+0x4f/0x7b [<ffffffff8104d095>] warn_slowpath_common+0x85/0xc0 [<ffffffff8104d146>] warn_slowpath_fmt+0x46/0x50 [<ffffffff810bd2a9>] can_stop_full_tick+0xb9/0x290 [<ffffffff810bd9ed>] tick_nohz_irq_exit+0x8d/0xb0 [<ffffffff810511c5>] irq_exit+0xc5/0x130 [<ffffffff814f180a>] smp_apic_timer_interrupt+0x4a/0x60 [<ffffffff814eff5e>] apic_timer_interrupt+0x6e/0x80 <EOI> [<ffffffff814ee5d1>] ? _raw_spin_unlock_irqrestore+0x31/0x60 [<ffffffff8108bbc8>] __wake_up+0x48/0x60 [<ffffffff8134836c>] n_tty_receive_buf_common+0x49c/0xba0 [<ffffffff8134a6bf>] ? tty_ldisc_ref+0x1f/0x70 [<ffffffff81348a84>] n_tty_receive_buf2+0x14/0x20 [<ffffffff8134b390>] flush_to_ldisc+0xe0/0x120 [<ffffffff81064d05>] process_one_work+0x1d5/0x540 [<ffffffff81064c81>] ? process_one_work+0x151/0x540 [<ffffffff81065191>] worker_thread+0x121/0x470 [<ffffffff81065070>] ? process_one_work+0x540/0x540 [<ffffffff8106b4df>] kthread+0xef/0x110 [<ffffffff8106b3f0>] ? __kthread_parkme+0xa0/0xa0 [<ffffffff814ef4f2>] ret_from_fork+0x42/0x70 [<ffffffff8106b3f0>] ? __kthread_parkme+0xa0/0xa0 ---[ end trace 06e3507544a38866 ]--- However, it turns out that kvmclock does provide a stable sched_clock callback. So, let the scheduler know this which in turn makes NOHZ_FULL work in the guest. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
c35ebbea |
|
13-May-2015 |
Paolo Bonzini <pbonzini@redhat.com> |
Revert "kvmclock: set scheduler clock stable" This reverts commit ff7bbb9c6ab6e6620429daeff39424bbde1a94b4. Sasha Levin is seeing odd jump in time values during boot of a KVM guest: [...] [ 0.000000] tsc: Detected 2260.998 MHz processor [3376355.247558] Calibrating delay loop (skipped) preset value.. [...] and bisected them to this commit. Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
ff7bbb9c |
|
23-Apr-2015 |
Luiz Capitulino <lcapitulino@redhat.com> |
kvmclock: set scheduler clock stable If you try to enable NOHZ_FULL on a guest today, you'll get the following error when the guest tries to deactivate the scheduler tick: WARNING: CPU: 3 PID: 2182 at kernel/time/tick-sched.c:192 can_stop_full_tick+0xb9/0x290() NO_HZ FULL will not work with unstable sched clock CPU: 3 PID: 2182 Comm: kworker/3:1 Not tainted 4.0.0-10545-gb9bb6fb #204 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Workqueue: events flush_to_ldisc ffffffff8162a0c7 ffff88011f583e88 ffffffff814e6ba0 0000000000000002 ffff88011f583ed8 ffff88011f583ec8 ffffffff8104d095 ffff88011f583eb8 0000000000000000 0000000000000003 0000000000000001 0000000000000001 Call Trace: <IRQ> [<ffffffff814e6ba0>] dump_stack+0x4f/0x7b [<ffffffff8104d095>] warn_slowpath_common+0x85/0xc0 [<ffffffff8104d146>] warn_slowpath_fmt+0x46/0x50 [<ffffffff810bd2a9>] can_stop_full_tick+0xb9/0x290 [<ffffffff810bd9ed>] tick_nohz_irq_exit+0x8d/0xb0 [<ffffffff810511c5>] irq_exit+0xc5/0x130 [<ffffffff814f180a>] smp_apic_timer_interrupt+0x4a/0x60 [<ffffffff814eff5e>] apic_timer_interrupt+0x6e/0x80 <EOI> [<ffffffff814ee5d1>] ? _raw_spin_unlock_irqrestore+0x31/0x60 [<ffffffff8108bbc8>] __wake_up+0x48/0x60 [<ffffffff8134836c>] n_tty_receive_buf_common+0x49c/0xba0 [<ffffffff8134a6bf>] ? tty_ldisc_ref+0x1f/0x70 [<ffffffff81348a84>] n_tty_receive_buf2+0x14/0x20 [<ffffffff8134b390>] flush_to_ldisc+0xe0/0x120 [<ffffffff81064d05>] process_one_work+0x1d5/0x540 [<ffffffff81064c81>] ? process_one_work+0x151/0x540 [<ffffffff81065191>] worker_thread+0x121/0x470 [<ffffffff81065070>] ? process_one_work+0x540/0x540 [<ffffffff8106b4df>] kthread+0xef/0x110 [<ffffffff8106b3f0>] ? __kthread_parkme+0xa0/0xa0 [<ffffffff814ef4f2>] ret_from_fork+0x42/0x70 [<ffffffff8106b3f0>] ? __kthread_parkme+0xa0/0xa0 ---[ end trace 06e3507544a38866 ]--- However, it turns out that kvmclock does provide a stable sched_clock callback. So, let the scheduler know this which in turn makes NOHZ_FULL work in the guest. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
29fa6825 |
|
05-Dec-2014 |
Andy Lutomirski <luto@amacapital.net> |
x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit paravirt_enabled has the following effects: - Disables the F00F bug workaround warning. There is no F00F bug workaround any more because Linux's standard IDT handling already works around the F00F bug, but the warning still exists. This is only cosmetic, and, in any event, there is no such thing as KVM on a CPU with the F00F bug. - Disables 32-bit APM BIOS detection. On a KVM paravirt system, there should be no APM BIOS anyway. - Disables tboot. I think that the tboot code should check the CPUID hypervisor bit directly if it matters. - paravirt_enabled disables espfix32. espfix32 should *not* be disabled under KVM paravirt. The last point is the purpose of this patch. It fixes a leak of the high 16 bits of the kernel stack address on 32-bit KVM paravirt guests. Fixes CVE-2014-8134. Cc: stable@vger.kernel.org Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
c6338ce4 |
|
26-Sep-2014 |
Tiejun Chen <tiejun.chen@intel.com> |
kvm: kvmclock: use get_cpu() and put_cpu() We can use get_cpu() and put_cpu() to replace preempt_disable()/cpu = smp_processor_id() and preempt_enable() for slightly better code. Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
0d75de4a |
|
18-Feb-2014 |
Fernando Luis Vázquez Cao <fernando_b1@lab.ntt.co.jp> |
kvm: remove redundant registration of BSP's hv_clock area These days hv_clock allocation is memblock based (i.e. the percpu allocator is not involved), which means that the physical address of each of the per-cpu hv_clock areas is guaranteed to remain unchanged through all its lifetime and we do not need to update its location after CPU bring-up. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
d63285e9 |
|
11-Oct-2013 |
Marcelo Tosatti <mtosatti@redhat.com> |
pvclock: detect watchdog reset at pvclock read Implement reset of kernel watchdogs at pvclock read time. This avoids adding special code to every watchdog. This is possible for watchdogs which measure time based on sched_clock() or ktime_get() variants. Suggested by Don Zickus. Acked-by: Don Zickus <dzickus@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
#
148f9bb8 |
|
18-Jun-2013 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
x86: delete __cpuinit usage from all x86 files The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. Note that some harmless section mismatch warnings may result, since notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c) are flagged as __cpuinit -- so if we remove the __cpuinit from arch specific callers, we will also get section mismatch warnings. As an intermediate step, we intend to turn the linux/init.h cpuinit content into no-ops as early as possible, since that will get rid of these warnings. In any case, they are temporary and harmless. This removes all the arch/x86 uses of the __cpuinit macros from all C files. x86 only had the one __CPUINIT used in assembly files, and it wasn't paired off with a .previous or a __FINIT, so we can delete it directly w/o any corresponding additional change there. [1] https://lkml.org/lkml/2013/5/20/589 Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
07868fc6 |
|
10-Jun-2013 |
Igor Mammedov <imammedo@redhat.com> |
x86: kvmclock: zero initialize pvclock shared memory area kernel might hung in pvclock_clocksource_read() due to uninitialized memory might contain odd version value in following cycle: do { version = __pvclock_read_cycles(src, &ret, &flags); } while ((src->version & 1) || version != src->version); if secondary kvmclock is accessed before it's registered with kvm. Clear garbage in pvclock shared memory area right after it's allocated to avoid this issue. Ref: https://bugzilla.kernel.org/show_bug.cgi?id=59521 Signed-off-by: Igor Mammedov <imammedo@redhat.com> [See BZ for analysis. We may want a different fix for 3.11, but this is the safest for now - Paolo] Cc: <stable@vger.kernel.org> # 3.8 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
3565184e |
|
13-May-2013 |
David Vrabel <david.vrabel@citrix.com> |
x86: Increase precision of x86_platform.get/set_wallclock() All the virtualized platforms (KVM, lguest and Xen) have persistent wallclocks that have more than one second of precision. read_persistent_wallclock() and update_persistent_wallclock() allow for nanosecond precision but their implementation on x86 with x86_platform.get/set_wallclock() only allows for one second precision. This means guests may see a wallclock time that is off by up to 1 second. Make set_wallclock() and get_wallclock() take a struct timespec parameter (which allows for nanosecond precision) so KVM and Xen guests may start with a more accurate wallclock time and a Xen dom0 can maintain a more accurate wallclock for guests. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
fe1140cc |
|
23-Feb-2013 |
Jan Kiszka <jan.kiszka@siemens.com> |
x86: kvmclock: Do not setup kvmclock vsyscall in the absence of that clock This fixes boot lockups with "no-kvmclock", when the host is not exposing this particular feature (QEMU: -cpu ...,-kvmclock) or when the kvmclock initialization failed for whatever reason. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
#
ed55705d |
|
18-Feb-2013 |
Marcelo Tosatti <mtosatti@redhat.com> |
x86: pvclock kvm: align allocation size to page size To match whats mapped via vsyscalls to userspace. Reported-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
#
5dfd486c |
|
22-Jan-2013 |
Dave Hansen <dave@linux.vnet.ibm.com> |
x86, kvm: Fix kvm's use of __pa() on percpu areas In short, it is illegal to call __pa() on an address holding a percpu variable. This replaces those __pa() calls with slow_virt_to_phys(). All of the cases in this patch are in boot time (or CPU hotplug time at worst) code, so the slow pagetable walking in slow_virt_to_phys() is not expected to have a performance impact. The times when this actually matters are pretty obscure (certain 32-bit NUMA systems), but it _does_ happen. It is important to keep KVM guests working on these systems because the real hardware is getting harder and harder to find. This bug manifested first by me seeing a plain hang at boot after this message: CPU 0 irqstacks, hard=f3018000 soft=f301a000 or, sometimes, it would actually make it out to the console: [ 0.000000] BUG: unable to handle kernel paging request at ffffffff I eventually traced it down to the KVM async pagefault code. This can be worked around by disabling that code either at compile-time, or on the kernel command-line. The kvm async pagefault code was injecting page faults in to the guest which the guest misinterpreted because its "reason" was not being properly sent from the host. The guest passes a physical address of an per-cpu async page fault structure via an MSR to the host. Since __pa() is broken on percpu data, the physical address it sent was bascially bogus and the host went scribbling on random data. The guest never saw the real reason for the page fault (it was injected by the host), assumed that the kernel had taken a _real_ page fault, and panic()'d. The behavior varied, though, depending on what got corrupted by the bad write. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20130122212435.4905663F@kernel.stglabs.ibm.com Acked-by: Rik van Riel <riel@redhat.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
3dc4f7cf |
|
27-Nov-2012 |
Marcelo Tosatti <mtosatti@redhat.com> |
x86: kvm guest: pvclock vsyscall support Hook into generic pvclock vsyscall code, with the aim to allow userspace to have visibility into pvclock data. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
#
7069ed67 |
|
27-Nov-2012 |
Marcelo Tosatti <mtosatti@redhat.com> |
x86: kvmclock: allocate pvclock shared memory area We want to expose the pvclock shared memory areas, which the hypervisor periodically updates, to userspace. For a linear mapping from userspace, it is necessary that entire page sized regions are used for array of pvclock structures. There is no such guarantee with per cpu areas, therefore move to memblock_alloc based allocation. Acked-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
#
e32025a5 |
|
11-Jun-2012 |
Marcelo Tosatti <mtosatti@redhat.com> |
x86: kvmclock: remove check_and_clear_guest_paused warning CPU offline path calls the hrtimer interrupt handler with interrupts disabled, without touching preempt_count, triggering this warning. Remove the warning since it is supposed to be used from hrtimer interrupt context only. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
#
24899709 |
|
15-Mar-2012 |
Eric B Munson <emunson@mgebm.net> |
kvmclock: remove unneeded EXPORT macro check_and_clear_guest_paused does not need to be exported as it isn't used by any modules, remove the export. Signed-off-by: Eric B Munson <emunson@mgebm.net> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
3b5d56b9 |
|
10-Mar-2012 |
Eric B Munson <emunson@mgebm.net> |
kvmclock: Add functions to check if the host has stopped the vm When a host stops or suspends a VM it will set a flag to show this. The watchdog will use these functions to determine if a softlockup is real, or the result of a suspended VM. Signed-off-by: Eric B Munson <emunson@mgebm.net> asm-generic changes Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
b74f05d6 |
|
13-Feb-2012 |
Marcelo Tosatti <mtosatti@redhat.com> |
x86: kvmclock: abstract save/restore sched_clock_state Upon resume from hibernation, CPU 0's hvclock area contains the old values for system_time and tsc_timestamp. It is necessary for the hypervisor to update these values with uptodate ones before the CPU uses them. Abstract TSC's save/restore sched_clock_state functions and use restore_state to write to KVM_SYSTEM_TIME MSR, forcing an update. Also move restore_sched_clock_state before __restore_processor_state, since the later calls CONFIG_LOCK_STAT's lockstat_clock (also for TSC). Thanks to Igor Mammedov for tracking it down. Fixes suspend-to-disk with kvmclock. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
df156f90 |
|
07-Feb-2012 |
Igor Mammedov <imammedo@redhat.com> |
x86: Introduce x86_cpuinit.early_percpu_clock_init hook When kvm guest uses kvmclock, it may hang on vcpu hot-plug. This is caused by an overflow in pvclock_get_nsec_offset, u64 delta = tsc - shadow->tsc_timestamp; which in turn is caused by an undefined values from percpu hv_clock that hasn't been initialized yet. Uninitialized clock on being booted cpu is accessed from start_secondary -> smp_callin -> smp_store_cpu_info -> identify_secondary_cpu -> mtrr_ap_init -> mtrr_restore -> stop_machine_from_inactive_cpu -> queue_stop_cpus_work ... -> sched_clock -> kvm_clock_read which is well before x86_cpuinit.setup_percpu_clockev call in start_secondary, where percpu clock is initialized. This patch introduces a hook that allows to setup/initialize per_cpu clock early and avoid overflow due to reading - undefined values - old values if cpu was offlined and then onlined again Another possible early user of this clock source is ftrace that accesses it to get timestamps for ring buffer entries. So if mtrr_ap_init is moved from identify_secondary_cpu to past x86_cpuinit.setup_percpu_clockev in start_secondary, ftrace may cause the same overflow/hang on cpu hot-plug anyway. More complete description of the problem: https://lkml.org/lkml/2012/2/2/101 Credits to Marcelo Tosatti <mtosatti@redhat.com> for hook idea. Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
95ef1e52 |
|
15-Nov-2011 |
Avi Kivity <avi@redhat.com> |
KVM guest: prevent tracing recursion with kvmclock Prevent tracing of preempt_disable() in get_cpu_var() in kvm_clock_read(). When CONFIG_DEBUG_PREEMPT is enabled, preempt_disable/enable() are traced and this causes the function_graph tracer to go into an infinite recursion. By open coding the preempt_disable() around the get_cpu_var(), we can use the notrace version which prevents preempt_disable/enable() from being traced and prevents the recursion. Based on a similar patch for Xen from Jeremy Fitzhardinge. Tested-by: Gleb Natapov <gleb@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
d910f5c1 |
|
11-Jul-2011 |
Glauber Costa <glommer@redhat.com> |
KVM guest: KVM Steal time registration This patch implements the kvm bits of the steal time infrastructure. The most important part of it, is the steal time clock. It is an continuous clock that shows the accumulated amount of steal time since vcpu creation. It is supposed to survive cpu offlining/onlining. [marcelo: fix build with CONFIG_KVM_GUEST=n] Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Eric B Munson <emunson@mgebm.net> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Avi Kivity <avi@redhat.com> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
#
b01cc1b0 |
|
26-Apr-2010 |
John Stultz <johnstul@us.ibm.com> |
x86: Convert remaining x86 clocksources to clocksource_register_hz/khz This converts the remaining x86 clocksources to use clocksource_register_hz/khz. CC: jacob.jun.pan@intel.com CC: Glauber Costa <glommer@redhat.com> CC: Dimitri Sivanich <sivanich@sgi.com> CC: Rusty Russell <rusty@rustcorp.com.au> CC: Jeremy Fitzhardinge <jeremy@xensource.com> CC: Chris McDermott <lcm@us.ibm.com> CC: Thomas Gleixner <tglx@linutronix.de> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen] Signed-off-by: John Stultz <johnstul@us.ibm.com>
|
#
ca3f1017 |
|
14-Oct-2010 |
Gleb Natapov <gleb@redhat.com> |
KVM paravirt: Move kvm_smp_prepare_boot_cpu() from kvmclock.c to kvm.c. Async PF also needs to hook into smp_prepare_boot_cpu so move the hook into generic code. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
#
19b6a85b |
|
02-Aug-2010 |
Arjan Koers <0h61vkll2ly8@xutrox.com> |
KVM guest: Move a printk that's using the clock before it's ready Fix a hang during SMP kernel boot on KVM that showed up after commit 489fb490dbf8dab0249ad82b56688ae3842a79e8 (2.6.35) and 59aab522154a2f17b25335b63c1cf68a51fb6ae0 (2.6.34.1). The problem only occurs when CONFIG_PRINTK_TIME is set. KVM-Stable-Tag. Signed-off-by: Arjan Koers <0h61vkll2ly8@xutrox.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
3a0d7256 |
|
10-May-2010 |
Glauber Costa <glommer@redhat.com> |
x86, paravirt: don't compute pvclock adjustments if we trust the tsc If the HV told us we can fully trust the TSC, skip any correction Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
#
838815a7 |
|
10-May-2010 |
Glauber Costa <glommer@redhat.com> |
x86: KVM guest: Try using new kvm clock msrs We now added a new set of clock-related msrs in replacement of the old ones. In theory, we could just try to use them and get a return value indicating they do not exist, due to our use of kvm_write_msr_save. However, kvm clock registration happens very early, and if we ever try to write to a non-existant MSR, we raise a lethal #GP, since our idt handlers are not in place yet. So this patch tests for a cpuid feature exported by the host to decide which set of msrs are supported. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
#
7bd867df |
|
09-Sep-2009 |
Feng Tang <feng.tang@intel.com> |
x86: Move get/set_wallclock to x86_platform_ops get/set_wallclock() have already a set of platform dependent implementations (default, EFI, paravirt). MRST will add another variant. Moving them to platform ops simplifies the existing code and minimizes the effort to integrate new variants. Signed-off-by: Feng Tang <feng.tang@intel.com> LKML-Reference: <new-submission> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
a20316d2 |
|
31-Aug-2009 |
Glauber Costa <glommer@redhat.com> |
KVM guest: fix bogus wallclock physical address calculation The use of __pa() to calculate the address of a C-visible symbol is wrong, and can lead to unpredictable results. See arch/x86/include/asm/page.h for details. It should be replaced with __pa_symbol(), that does the correct math here, by taking relocations into account. This ensures the correct wallclock data structure physical address is passed to the hypervisor. Cc: stable@kernel.org Signed-off-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
2d826404 |
|
20-Aug-2009 |
Thomas Gleixner <tglx@linutronix.de> |
x86: Move tsc_calibration to x86_init_ops TSC calibration is modified by the vmware hypervisor and paravirt by separate means. Moorestown wants to add its own calibration routine as well. So make calibrate_tsc a proper x86_init_ops function and override it by paravirt or by the early setup of the vmware hypervisor. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
736decac |
|
18-Aug-2009 |
Thomas Gleixner <tglx@linutronix.de> |
x86: Move percpu clockevents setup to x86_init_ops paravirt overrides the setup of the default apic timers as per cpu timers. Moorestown needs to override that as well. Move it to x86_init_ops setup and create a separate x86_cpuinit struct which holds the function for the secondary evtl. hotplugabble CPUs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
8e19608e |
|
21-Apr-2009 |
Magnus Damm <damm@igel.co.jp> |
clocksource: pass clocksource to read() callback Pass clocksource pointer to the read() callback for clocksources. This allows us to share the callback between multiple instances. [hugh@veritas.com: fix powerpc build of clocksource pass clocksource mods] [akpm@linux-foundation.org: cleanup] Signed-off-by: Magnus Damm <damm@igel.co.jp> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
8e6dafd6 |
|
22-Feb-2009 |
Ingo Molnar <mingo@elte.hu> |
x86: refactor x86_quirks support Impact: cleanup Make x86_quirks support more transparent. The highlevel methods are now named: extern void x86_quirk_pre_intr_init(void); extern void x86_quirk_intr_init(void); extern void x86_quirk_trap_init(void); extern void x86_quirk_pre_time_init(void); extern void x86_quirk_time_init(void); This makes it clear that if some platform extension has to do something here that it is considered ... weird, and is discouraged. Also remove arch_hooks.h and move it into setup.h (and other header files where appropriate). Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
e93353c9 |
|
05-Dec-2008 |
Eduardo Habkost <ehabkost@redhat.com> |
x86: KVM guest: kvm_get_tsc_khz: return khz, not lpj kvm_get_tsc_khz() currently returns the previously-calculated preset_lpj value, but it is in loops-per-jiffy, not kHz. The current code works correctly only when HZ=1000. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
423cd25a |
|
24-Nov-2008 |
Glauber Costa <glommer@redhat.com> |
x86: KVM guest: sign kvmclock as paravirt Currently, we only set the KVM paravirt signature in case of CONFIG_KVM_GUEST. However, it is possible to have it turned off, while CONFIG_KVM_CLOCK is turned on. This is also a paravirt case, and should be shown accordingly. Signed-off-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
23a14b9e |
|
22-Nov-2008 |
Al Viro <viro@ftp.linux.org.uk> |
kvm_setup_secondary_clock() is cpuinit Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a29a2af3 |
|
29-Oct-2008 |
Rakib Mullick <rakib.mullick@gmail.com> |
x86: KVM guest: fix section mismatch warning in kvmclock.c WARNING: arch/x86/kernel/built-in.o(.text+0x1722c): Section mismatch in reference from the function kvm_setup_secondary_clock() to the function .devinit.text:setup_secondary_APIC_clock() The function kvm_setup_secondary_clock() references the function __devinit setup_secondary_APIC_clock(). This is often because kvm_setup_secondary_clock lacks a __devinit annotation or the annotation of setup_secondary_APIC_clock is wrong. Signed-off-by: Md.Rakib H. Mullick <rakib.mullick@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
0293615f |
|
28-Jul-2008 |
Glauber Costa <gcosta@redhat.com> |
x86: KVM guest: use paravirt function to calculate cpu khz We're currently facing timing problems in guests that do calibration under heavy load, and then the load vanishes. This means we'll have a much lower lpj than we actually should, and delays end up taking less time than they should, which is a nasty bug. Solution is to pass on the lpj value from host to guest, and have it preset. Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
|
#
7e37c299 |
|
30-Jun-2008 |
Adrian Bunk <bunk@kernel.org> |
x86: KVM guest: make kvm_smp_prepare_boot_cpu() static This patch makes the needlessly global kvm_smp_prepare_boot_cpu() static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Avi Kivity <avi@qumranet.com>
|
#
f6e16d5a |
|
03-Jun-2008 |
Gerd Hoffmann <kraxel@redhat.com> |
x86: KVM guest: Use the paravirt clocksource structs and functions This patch updates the kvm host code to use the pvclock structs and functions, thereby making it compatible with Xen. The patch also fixes an initialization bug: on SMP systems the per-cpu has two different locations early at boot and after CPU bringup. kvmclock must take that in account when registering the physical address within the host. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
|
#
2ddfd20e |
|
22-May-2008 |
Ingo Molnar <mingo@elte.hu> |
namespacecheck: automated fixes Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
b8ba5f10 |
|
29-Apr-2008 |
Glauber Costa <gcosta@redhat.com> |
x86: KVM geust: make setup_secondary_clock definition dependent on local apic Since the pv_apic_ops are only present if CONFIG_X86_LOCAL_APIC is compiled in, kvmclock failed to build without this option. This patch fixes this. Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
|
#
1e977aa1 |
|
17-Mar-2008 |
Glauber Costa <gcosta@redhat.com> |
x86: KVM guest: disable clock before rebooting. This patch writes 0 (actually, what really matters is that the LSB is cleared) to the system time msr before shutting down the machine for kexec. Without it, we can have a random memory location being written when the guest comes back It overrides the functions shutdown, used in the path of kernel_kexec() (sys.c) and crash_shutdown, used in the path of crash_kexec() (kexec.c) Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
|
#
790c73f6 |
|
15-Feb-2008 |
Glauber de Oliveira Costa <gcosta@redhat.com> |
x86: KVM guest: paravirtualized clocksource This is the guest part of kvm clock implementation It does not do tsc-only timing, as tsc can have deltas between cpus, and it did not seem worthy to me to keep adjusting them. We do use it, however, for fine-grained adjustment. Other than that, time comes from the host. [randy dunlap: add missing include] [randy dunlap: disallow on Voyager or Visual WS] Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
|