History log of /linux-master/arch/x86/kvm/pmu.h
Revision Date Author Comments
# f19063b1 09-Nov-2023 Sean Christopherson <seanjc@google.com>

KVM: x86/pmu: Snapshot event selectors that KVM emulates in software

Snapshot the event selectors for the events that KVM emulates in software,
which is currently instructions retired and branch instructions retired.
The event selectors a tied to the underlying CPU, i.e. are constant for a
given platform even though perf doesn't manage the mappings as such.

Getting the event selectors from perf isn't exactly cheap, especially if
mitigations are enabled, as at least one indirect call is involved.

Snapshot the values in KVM instead of optimizing perf as working with the
raw event selectors will be required if KVM ever wants to emulate events
that aren't part of perf's uABI, i.e. that don't have an "enum perf_hw_id"
entry.

Link: https://lore.kernel.org/r/20231110022857.1273836-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# e5a65d4f 09-Nov-2023 Sean Christopherson <seanjc@google.com>

KVM: x86/pmu: Add macros to iterate over all PMCs given a bitmap

Add and use kvm_for_each_pmc() to dedup a variety of open coded for-loops
that iterate over valid PMCs given a bitmap (and because seeing checkpatch
whine about bad macro style is always amusing).

No functional change intended.

Link: https://lore.kernel.org/r/20231110022857.1273836-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# b31880ca 09-Nov-2023 Sean Christopherson <seanjc@google.com>

KVM: x86/pmu: Move pmc_idx => pmc translation helper to common code

Add a common helper for *internal* PMC lookups, and delete the ops hook
and Intel's implementation. Keep AMD's implementation, but rename it to
amd_pmu_get_pmc() to make it somewhat more obvious that it's suited for
both KVM-internal and guest-initiated lookups.

Because KVM tracks all counters in a single bitmap, getting a counter
when iterating over a bitmap, e.g. of all valid PMCs, requires a small
amount of math, that while simple, isn't super obvious and doesn't use the
same semantics as PMC lookups from RDPMC! Although AMD doesn't support
fixed counters, the common PMU code still behaves as if there a split, the
high half of which just happens to always be empty.

Opportunstically add a comment to explain both what is going on, and why
KVM uses a single bitmap, e.g. the boilerplate for iterating over separate
bitmaps could be done via macros, so it's not (just) about deduplicating
code.

Link: https://lore.kernel.org/r/20231110022857.1273836-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# be6b067d 09-Nov-2023 Sean Christopherson <seanjc@google.com>

KVM: x86/pmu: Add common define to capture fixed counters offset

Add a common define to "officially" solidify KVM's split of counters,
i.e. to commit to using bits 31:0 to track general purpose counters and
bits 63:32 to track fixed counters (which only Intel supports). KVM
already bleeds this behavior all over common PMU code, and adding a KVM-
defined macro allows clarifying that the value is a _base_, as oppposed to
the _flag_ that is used to access fixed PMCs via RDPMC (which perf
confusingly calls INTEL_PMC_FIXED_RDPMC_BASE).

No functional change intended.

Link: https://lore.kernel.org/r/20231110022857.1273836-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# 7bb7fce1 09-Jan-2024 Sean Christopherson <seanjc@google.com>

KVM: x86/pmu: Prioritize VMX interception over #GP on RDPMC due to bad index

Apply the pre-intercepts RDPMC validity check only to AMD, and rename all
relevant functions to make it as clear as possible that the check is not a
standard PMC index check. On Intel, the basic rule is that only invalid
opcodes and privilege/permission/mode checks have priority over VM-Exit,
i.e. RDPMC with an invalid index should VM-Exit, not #GP. While the SDM
doesn't explicitly call out RDPMC, it _does_ explicitly use RDMSR of a
non-existent MSR as an example where VM-Exit has priority over #GP, and
RDPMC is effectively just a variation of RDMSR.

Manually testing on various Intel CPUs confirms this behavior, and the
inverted priority was introduced for SVM compatibility, i.e. was not an
intentional change for Intel PMUs. On AMD, *all* exceptions on RDPMC have
priority over VM-Exit.

Check for a NULL kvm_pmu_ops.check_rdpmc_early instead of using a RET0
static call so as to provide a convenient location to document the
difference between Intel and AMD, and to again try to make it as obvious
as possible that the early check is a one-off thing, not a generic "is
this PMC valid?" helper.

Fixes: 8061252ee0d2 ("KVM: SVM: Add intercept checks for remaining twobyte instructions")
Cc: Jim Mattson <jmattson@google.com>
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20240109230250.424295-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# cbbd1aa8 09-Jan-2024 Sean Christopherson <seanjc@google.com>

KVM: x86/pmu: Allow programming events that match unsupported arch events

Remove KVM's bogus restriction that the guest can't program an event whose
encoding matches an unsupported architectural event. The enumeration of
an architectural event only says that if a CPU supports an architectural
event, then the event can be programmed using the architectural encoding.
The enumeration does NOT say anything about the encoding when the CPU
doesn't report support the architectural event.

Preventing the guest from counting events whose encoding happens to match
an architectural event breaks existing functionality whenever Intel adds
an architectural encoding that was *ever* used for a CPU that doesn't
enumerate support for the architectural event, even if the encoding is for
the exact same event!

E.g. the architectural encoding for Top-Down Slots is 0x01a4. Broadwell
CPUs, which do not support the Top-Down Slots architectural event, 0x01a4
is a valid, model-specific event. Denying guest usage of 0x01a4 if/when
KVM adds support for Top-Down slots would break any Broadwell-based guest.

Reported-by: Kan Liang <kan.liang@linux.intel.com>
Closes: https://lore.kernel.org/all/2004baa6-b494-462c-a11f-8104ea152c6a@linux.intel.com
Fixes: a21864486f7e ("KVM: x86/pmu: Fix available_event_types check for REF_CPU_CYCLES event")
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20240109230250.424295-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# fd89499a 03-Nov-2023 Sean Christopherson <seanjc@google.com>

KVM: x86/pmu: Track emulated counter events instead of previous counter

Explicitly track emulated counter events instead of using the common
counter value that's shared with the hardware counter owned by perf.
Bumping the common counter requires snapshotting the pre-increment value
in order to detect overflow from emulation, and the snapshot approach is
inherently flawed.

Snapshotting the previous counter at every increment assumes that there is
at most one emulated counter event per emulated instruction (or rather,
between checks for KVM_REQ_PMU). That's mostly holds true today because
KVM only emulates (branch) instructions retired, but the approach will
fall apart if KVM ever supports event types that don't have a 1:1
relationship with instructions.

And KVM already has a relevant bug, as handle_invalid_guest_state()
emulates multiple instructions without checking KVM_REQ_PMU, i.e. could
miss an overflow event due to clobbering pmc->prev_counter. Not checking
KVM_REQ_PMU is problematic in both cases, but at least with the emulated
counter approach, the resulting behavior is delayed overflow detection,
as opposed to completely lost detection.

Tracking the emulated count fixes another bug where the snapshot approach
can signal spurious overflow due to incorporating both the emulated count
and perf's count in the check, i.e. if overflow is detected by perf, then
KVM's emulation will also incorrectly signal overflow. Add a comment in
the related code to call out the need to process emulated events *after*
pausing the perf event (big kudos to Mingwei for figuring out that
particular wrinkle).

Cc: Mingwei Zhang <mizhang@google.com>
Cc: Roman Kagan <rkagan@amazon.de>
Cc: Jim Mattson <jmattson@google.com>
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Like Xu <like.xu.linux@gmail.com>
Reviewed-by: Mingwei Zhang <mizhang@google.com>
Link: https://lore.kernel.org/r/20231103230541.352265-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# 89acf123 03-Nov-2023 Sean Christopherson <seanjc@google.com>

KVM: x86/pmu: Update sample period in pmc_write_counter()

Update a PMC's sample period in pmc_write_counter() to deduplicate code
across all callers of pmc_write_counter(). Opportunistically move
pmc_write_counter() into pmc.c now that it's doing more work. WRMSR isn't
such a hot path that an extra CALL+RET pair will be problematic, and the
order of function definitions needs to be changed anyways, i.e. now is a
convenient time to eat the churn.

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20231103230541.352265-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# f2f63f7e 03-Nov-2023 Sean Christopherson <seanjc@google.com>

KVM: x86/pmu: Stop calling kvm_pmu_reset() at RESET (it's redundant)

Drop kvm_vcpu_reset()'s call to kvm_pmu_reset(), the call is performed
only for RESET, which is really just the same thing as vCPU creation,
and kvm_arch_vcpu_create() *just* called kvm_pmu_init(), i.e. there can't
possibly be any work to do.

Unlike Intel, AMD's amd_pmu_refresh() does fill all_valid_pmc_idx even if
guest CPUID is empty, but everything that is at all dynamic is guaranteed
to be '0'/NULL, e.g. it should be impossible for KVM to have already
created a perf event.

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20231103230541.352265-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# cbb359d8 03-Nov-2023 Sean Christopherson <seanjc@google.com>

KVM: x86/pmu: Move PMU reset logic to common x86 code

Move the common (or at least "ignored") aspects of resetting the vPMU to
common x86 code, along with the stop/release helpers that are no used only
by the common pmu.c.

There is no need to manually handle fixed counters as all_valid_pmc_idx
tracks both fixed and general purpose counters, and resetting the vPMU is
far from a hot path, i.e. the extra bit of overhead to the PMC from the
index is a non-issue.

Zero fixed_ctr_ctrl in common code even though it's Intel specific.
Ensuring it's zero doesn't harm AMD/SVM in any way, and stopping the fixed
counters via all_valid_pmc_idx, but not clearing the associated control
bits, would be odd/confusing.

Make the .reset() hook optional as SVM no longer needs vendor specific
handling.

Cc: stable@vger.kernel.org
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20231103230541.352265-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# b29a2acd 04-May-2023 Roman Kagan <rkagan@amazon.de>

KVM: x86/pmu: Truncate counter value to allowed width on write

Performance counters are defined to have width less than 64 bits. The
vPMU code maintains the counters in u64 variables but assumes the value
to fit within the defined width. However, for Intel non-full-width
counters (MSR_IA32_PERFCTRx) the value receieved from the guest is
truncated to 32 bits and then sign-extended to full 64 bits. If a
negative value is set, it's sign-extended to 64 bits, but then in
kvm_pmu_incr_counter() it's incremented, truncated, and compared to the
previous value for overflow detection.

That previous value is not truncated, so it always evaluates bigger than
the truncated new one, and a PMI is injected. If the PMI handler writes
a negative counter value itself, the vCPU never quits the PMI loop.

Turns out that Linux PMI handler actually does write the counter with
the value just read with RDPMC, so when no full-width support is exposed
via MSR_IA32_PERF_CAPABILITIES, and the guest initializes the counter to
a negative value, it locks up.

This has been observed in the field, for example, when the guest configures
atop to use perfevents and runs two instances of it simultaneously.

To address the problem, maintain the invariant that the counter value
always fits in the defined bit width, by truncating the received value
in the respective set_msr methods. For better readability, factor the
out into a helper function, pmc_write_counter(), shared by vmx and svm
parts.

Fixes: 9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions")
Cc: stable@vger.kernel.org
Signed-off-by: Roman Kagan <rkagan@amazon.de>
Link: https://lore.kernel.org/all/20230504120042.785651-1-rkagan@amazon.de
Tested-by: Like Xu <likexu@tencent.com>
[sean: tweak changelog, s/set/write in the helper]
Signed-off-by: Sean Christopherson <seanjc@google.com>


# 6a08083f 02-Jun-2023 Like Xu <likexu@tencent.com>

KVM: x86/pmu: Disable vPMU if the minimum num of counters isn't met

Disable PMU support when running on AMD and perf reports fewer than four
general purpose counters. All AMD PMUs must define at least four counters
due to AMD's legacy architecture hardcoding the number of counters
without providing a way to enumerate the number of counters to software,
e.g. from AMD's APM:

The legacy architecture defines four performance counters (PerfCtrn)
and corresponding event-select registers (PerfEvtSeln).

Virtualizing fewer than four counters can lead to guest instability as
software expects four counters to be available. Rather than bleed AMD
details into the common code, just define a const unsigned int and
provide a convenient location to document why Intel and AMD have different
mins (in particular, AMD's lack of any way to enumerate less than four
counters to the guest).

Keep the minimum number of counters at Intel at one, even though old P6
and Core Solo/Duo processor effectively require a minimum of two counters.
KVM can, and more importantly has up until this point, supported a vPMU so
long as the CPU has at least one counter. Perf's support for P6/Core CPUs
does require two counters, but perf will happily chug along with a single
counter when running on a modern CPU.

Cc: Jim Mattson <jmattson@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
[sean: set Intel min to '1', not '2']
Link: https://lore.kernel.org/r/20230603011058.1038821-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# 13afa29a 02-Jun-2023 Like Xu <likexu@tencent.com>

KVM: x86/pmu: Provide Intel PMU's pmc_is_enabled() as generic x86 code

Move the Intel PMU implementation of pmc_is_enabled() to common x86 code
as pmc_is_globally_enabled(), and drop AMD's implementation. AMD PMU
currently supports only v1, and thus not PERF_GLOBAL_CONTROL, thus the
semantics for AMD are unchanged. And when support for AMD PMU v2 comes
along, the common behavior will also Just Work.

Signed-off-by: Like Xu <likexu@tencent.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20230603011058.1038821-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# c85cdc1c 02-Jun-2023 Like Xu <likexu@tencent.com>

KVM: x86/pmu: Move handling PERF_GLOBAL_CTRL and friends to common x86

Move the handling of GLOBAL_CTRL, GLOBAL_STATUS, and GLOBAL_OVF_CTRL,
a.k.a. GLOBAL_STATUS_RESET, from Intel PMU code to generic x86 PMU code.
AMD PerfMonV2 defines three registers that have the same semantics as
Intel's variants, just with different names and indices. Conveniently,
since KVM virtualizes GLOBAL_CTRL on Intel only for PMU v2 and above, and
AMD's version shows up in v2, KVM can use common code for the existence
check as well.

Signed-off-by: Like Xu <likexu@tencent.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20230603011058.1038821-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# 8de18543 02-Jun-2023 Like Xu <likexu@tencent.com>

KVM: x86/pmu: Move reprogram_counters() to pmu.h

Move reprogram_counters() out of Intel specific PMU code and into pmu.h so
that it can be used to implement AMD PMU v2 support.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
[sean: rewrite changelog]
Link: https://lore.kernel.org/r/20230603011058.1038821-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# 4fa5843d 10-Mar-2023 Like Xu <likexu@tencent.com>

KVM: x86/pmu: Fix a typo in kvm_pmu_request_counter_reprogam()

Fix a "reprogam" => "reprogram" typo in kvm_pmu_request_counter_reprogam().

Fixes: 68fb4757e867 ("KVM: x86/pmu: Defer reprogram_counter() to kvm_pmu_handle_event()")
Signed-off-by: Like Xu <likexu@tencent.com>
Link: https://lore.kernel.org/r/20230310113349.31799-1-likexu@tencent.com
[sean: trim the changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>


# 8911ce66 24-Jan-2023 Sean Christopherson <seanjc@google.com>

KVM: x86/pmu: Cap kvm_pmu_cap.num_counters_gp at KVM's internal max

Limit kvm_pmu_cap.num_counters_gp during kvm_init_pmu_capability() based
on the vendor PMU capabilities so that consuming num_counters_gp naturally
does the right thing. This fixes a mostly theoretical bug where KVM could
over-report its PMU support in KVM_GET_SUPPORTED_CPUID for leaf 0xA, e.g.
if the number of counters reported by perf is greater than KVM's
hardcoded internal limit. Incorporating input from the AMD PMU also
avoids over-reporting MSRs to save when running on AMD.

Link: https://lore.kernel.org/r/20230124234905.3774678-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# 2a3003e9 05-Dec-2022 Like Xu <likexu@tencent.com>

KVM: x86/pmu: Drop event_type and rename "struct kvm_event_hw_type_mapping"

After commit ("02791a5c362b KVM: x86/pmu: Use PERF_TYPE_RAW
to merge reprogram_{gp,fixed}counter()"), vPMU starts to directly
use the hardware event eventsel and unit_mask to reprogram perf_event,
and the event_type field in the "struct kvm_event_hw_type_mapping"
is simply no longer being used.

Convert the struct into an anonymous struct as the current name is
obsolete as the structure no longer has any mapping semantics, and
placing the struct definition directly above its sole user makes its
easier to understand what the array is filling in.

Signed-off-by: Like Xu <likexu@tencent.com>
Link: https://lore.kernel.org/r/20221205122048.16023-1-likexu@tencent.com
[sean: drop new comment, use anonymous struct]
Signed-off-by: Sean Christopherson <seanjc@google.com>


# 6a5cba7b 20-Dec-2022 Aaron Lewis <aaronlewis@google.com>

KVM: x86/pmu: Correct the mask used in a pmu event filter lookup

When checking if a pmu event the guest is attempting to program should
be filtered, only consider the event select + unit mask in that
decision. Use an architecture specific mask to mask out all other bits,
including bits 35:32 on Intel. Those bits are not part of the event
select and should not be considered in that decision.

Fixes: 66bb8a065f5a ("KVM: x86: PMU Event Filter")
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20221220161236.555143-2-aaronlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>


# 4d7404e5 08-Feb-2023 Sean Christopherson <seanjc@google.com>

KVM: x86/pmu: Disable vPMU support on hybrid CPUs (host PMUs)

Disable KVM support for virtualizing PMUs on hosts with hybrid PMUs until
KVM gains a sane way to enumeration the hybrid vPMU to userspace and/or
gains a mechanism to let userspace opt-in to the dangers of exposing a
hybrid vPMU to KVM guests. Virtualizing a hybrid PMU, or at least part of
a hybrid PMU, is possible, but it requires careful, deliberate
configuration from userspace.

E.g. to expose full functionality, vCPUs need to be pinned to pCPUs to
prevent migrating a vCPU between a big core and a little core, userspace
must enumerate a reasonable topology to the guest, and guest CPUID must be
curated per vCPU to enumerate accurate vPMU capabilities.

The last point is especially problematic, as KVM doesn't control which
pCPU it runs on when enumerating KVM's vPMU capabilities to userspace,
i.e. userspace can't rely on KVM_GET_SUPPORTED_CPUID in it's current form.

Alternatively, userspace could enable vPMU support by enumerating the
set of features that are common and coherent across all cores, e.g. by
filtering PMU events and restricting guest capabilities. But again, that
requires userspace to take action far beyond reflecting KVM's supported
feature set into the guest.

For now, simply disable vPMU support on hybrid CPUs to avoid inducing
seemingly random #GPs in guests, and punt support for hybrid CPUs to a
future enabling effort.

Reported-by: Jianfeng Gao <jianfeng.gao@intel.com>
Cc: stable@vger.kernel.org
Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/all/20220818181530.2355034-1-kan.liang@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230208204230.1360502-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 55c590ad 07-Dec-2022 Like Xu <likexu@tencent.com>

KVM: x86/pmu: Prevent zero period event from being repeatedly released

The current vPMU can reuse the same pmc->perf_event for the same
hardware event via pmc_pause/resume_counter(), but this optimization
does not apply to a portion of the TSX events (e.g., "event=0x3c,in_tx=1,
in_tx_cp=1"), where event->attr.sample_period is legally zero at creation,
thus making the perf call to perf_event_period() meaningless (no need to
adjust sample period in this case), and instead causing such reusable
perf_events to be repeatedly released and created.

Avoid releasing zero sample_period events by checking is_sampling_event()
to follow the previously enable/disable optimization.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20221207071506.15733-2-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 68fb4757 22-Sep-2022 Like Xu <likexu@tencent.com>

KVM: x86/pmu: Defer reprogram_counter() to kvm_pmu_handle_event()

Batch reprogramming PMU counters by setting KVM_REQ_PMU and thus
deferring reprogramming kvm_pmu_handle_event() to avoid reprogramming
a counter multiple times during a single VM-Exit.

Deferring programming will also allow KVM to fix a bug where immediately
reprogramming a counter can result in sleeping (taking a mutex) while
interrupts are disabled in the VM-Exit fastpath.

Introduce kvm_pmu_request_counter_reprogam() to make it obvious that
KVM is _requesting_ a reprogram and not actually doing the reprogram.

Opportunistically refine related comments to avoid misunderstandings.

Signed-off-by: Like Xu <likexu@tencent.com>
Link: https://lore.kernel.org/r/20220831085328.45489-5-likexu@tencent.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220923001355.3741194-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 545feb96 10-Jun-2022 Sean Christopherson <seanjc@google.com>

Revert "KVM: x86: always allow host-initiated writes to PMU MSRs"

Revert the hack to allow host-initiated accesses to all "PMU" MSRs,
as intel_is_valid_msr() returns true for _all_ MSRs, regardless of whether
or not it has a snowball's chance in hell of actually being a PMU MSR.

That mostly gets papered over by the actual get/set helpers only handling
MSRs that they knows about, except there's the minor detail that
kvm_pmu_{g,s}et_msr() eat reads and writes when the PMU is disabled.
I.e. KVM will happy allow reads and writes to _any_ MSR if the PMU is
disabled, either via module param or capability.

This reverts commit d1c88a4020567ba4da52f778bcd9619d87e4ea75.

Fixes: d1c88a402056 ("KVM: x86: always allow host-initiated writes to PMU MSRs")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220611005755.753273-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 6ef25aa0 31-May-2022 Like Xu <likexu@tencent.com>

KVM: x86/pmu: Restrict advanced features based on module enable_pmu

Once vPMU is disabled, the KVM would not expose features like:
PEBS (via clear kvm_pmu_cap.pebs_ept), legacy LBR and ARCH_LBR,
CPUID 0xA leaf, PDCM bit and MSR_IA32_PERF_CAPABILITIES, plus
PT_MODE_HOST_GUEST mode.

What this group of features has in common is that their use
relies on the underlying PMU counter and the host perf_event as a
back-end resource requester or sharing part of the irq delivery path.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220601031925.59693-2-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# b9181c8e 31-May-2022 Like Xu <likexu@tencent.com>

KVM: x86/pmu: Avoid exposing Intel BTS feature

The BTS feature (including the ability to set the BTS and BTINT
bits in the DEBUGCTL MSR) is currently unsupported on KVM.

But we may try using the BTS facility on a PEBS enabled guest like this:
perf record -e branches:u -c 1 -d ls
and then we would encounter the following call trace:

[] unchecked MSR access error: WRMSR to 0x1d9 (tried to write 0x00000000000003c0)
at rIP: 0xffffffff810745e4 (native_write_msr+0x4/0x20)
[] Call Trace:
[] intel_pmu_enable_bts+0x5d/0x70
[] bts_event_add+0x54/0x70
[] event_sched_in+0xee/0x290

As it lacks any CPUID indicator or perf_capabilities valid bit
fields to prompt for this information, the platform would hint
the Intel BTS feature unavailable to guest by setting the
BTS_UNAVAIL bit in the IA32_MISC_ENABLE.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220601031925.59693-3-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# d7808f73 18-May-2022 Like Xu <likexu@tencent.com>

KVM: x86/pmu: Update global enable_pmu when PMU is undetected

On some virt platforms (L1 guest w/o PMU), the value of module parameter
'enable_pmu' for nested L2 guests should be updated at initialisation.

Considering that there is no concept of "architecture pmu" in AMD or Hygon
and that the versions (prior to Zen 4) are all 0, but that the theoretical
available counters are at least AMD64_NUM_COUNTERS, the utility
check_hw_exists() is reused in the initialisation call path.

Opportunistically update Intel specific comments.

Fixes: 8eeac7e999e8 ("KVM: x86/pmu: Add kvm_pmu_cap to optimize perf_get_x86_pmu_capability")
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220518170118.66263-3-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 7aadaa98 18-May-2022 Like Xu <likexu@tencent.com>

KVM: x86/pmu: Drop amd_event_mapping[] in the KVM context

All gp or fixed counters have been reprogrammed using PERF_TYPE_RAW,
which means that the table that maps perf_hw_id to event select values is
no longer useful, at least for AMD.

For Intel, the logic to check if the pmu event reported by Intel cpuid is
not available is still required, in which case pmc_perf_hw_id() could be
renamed to hw_event_is_unavail() and a bool value is returned to replace
the semantics of "PERF_COUNT_HW_MAX+1".

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220518132512.37864-12-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# e99fae6e 25-May-2022 Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/pmu: Use only the uniform interface reprogram_counter()

Since reprogram_counter(), reprogram_{gp, fixed}_counter() currently have
the same incoming parameter "struct kvm_pmc *pmc", the callers can simplify
the conetxt by using uniformly exported interface, which makes reprogram_
{gp, fixed}_counter() static and eliminates EXPORT_SYMBOL_GPL.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220518132512.37864-8-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 76d287b2 18-May-2022 Like Xu <likexu@tencent.com>

KVM: x86/pmu: Drop "u8 ctrl, int idx" for reprogram_fixed_counter()

Since afrer reprogram_fixed_counter() is called, it's bound to assign
the requested fixed_ctr_ctrl to pmu->fixed_ctr_ctrl, this assignment step
can be moved forward (the stale value for diff is saved extra early),
thus simplifying the passing of parameters.

No functional change intended.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220518132512.37864-7-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# fb121aaf 18-May-2022 Like Xu <likexu@tencent.com>

KVM: x86/pmu: Drop "u64 eventsel" for reprogram_gp_counter()

Because inside reprogram_gp_counter() it is bound to assign the requested
eventel to pmc->eventsel, this assignment step can be moved forward, thus
simplifying the passing of parameters to "struct kvm_pmc *pmc" only.

No functional change intended.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220518132512.37864-6-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# a40239b4 18-May-2022 Like Xu <likexu@tencent.com>

KVM: x86/pmu: Pass only "struct kvm_pmc *pmc" to reprogram_counter()

Passing the reference "struct kvm_pmc *pmc" when creating
pmc->perf_event is sufficient. This change helps to simplify the
calling convention by replacing reprogram_{gp, fixed}_counter()
with reprogram_counter() seamlessly.

No functional change intended.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220518132512.37864-5-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# d1c88a40 25-May-2022 Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: always allow host-initiated writes to PMU MSRs

Whenever an MSR is part of KVM_GET_MSR_INDEX_LIST, it has to be always
retrievable and settable with KVM_GET_MSR and KVM_SET_MSR. Accept
the PMU MSRs unconditionally in intel_is_valid_msr, if the access was
host-initiated.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# ec4036ed 20-May-2022 Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/pmu: remove useless prototype

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 43d62d10 18-May-2022 Like Xu <likexu@tencent.com>

KVM: x86/pmu: Move the vmx_icl_pebs_cpu[] definition out of the header file

Defining a static const array in a header file would introduce redundant
definitions to the point of confusing semantics, and such a use case would
only bring complaints from the compiler:

arch/x86/kvm/pmu.h:20:32: warning: ‘vmx_icl_pebs_cpu’ defined but not used [-Wunused-const-variable=]
20 | static const struct x86_cpu_id vmx_icl_pebs_cpu[] = {
| ^~~~~~~~~~~~~~~~

Fixes: a095df2c5f48 ("KVM: x86/pmu: Adjust precise_ip to emulate Ice Lake guest PDIR counter")
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220518170118.66263-1-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 968635ab 11-Apr-2022 Like Xu <likexu@tencent.com>

KVM: x86/pmu: Add kvm_pmu_cap to optimize perf_get_x86_pmu_capability

The information obtained from the interface perf_get_x86_pmu_capability()
doesn't change, so an exported "struct x86_pmu_capability" is introduced
for all guests in the KVM, and it's initialized before hardware_setup().

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220411101946.20262-16-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 63f21f32 11-Apr-2022 Like Xu <like.xu@linux.intel.com>

KVM: x86/pmu: Move pmc_speculative_in_use() to arch/x86/kvm/pmu.h

It allows this inline function to be reused by more callers in
more files, such as pmu_intel.c.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-14-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 6ebe4436 11-Apr-2022 Like Xu <like.xu@linux.intel.com>

KVM: x86/pmu: Adjust precise_ip to emulate Ice Lake guest PDIR counter

The PEBS-PDIR facility on Ice Lake server is supported on IA31_FIXED0 only.
If the guest configures counter 32 and PEBS is enabled, the PEBS-PDIR
facility is supposed to be used, in which case KVM adjusts attr.precise_ip
to 3 and request host perf to assign the exactly requested counter or fail.

The CPU model check is also required since some platforms may place the
PEBS-PDIR facility in another counter index.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-10-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 8f969c0c 29-Mar-2022 Like Xu <likexu@tencent.com>

KVM: x86: Copy kvm_pmu_ops by value to eliminate layer of indirection

Replace the kvm_pmu_ops pointer in common x86 with an instance of the
struct to save one pointer dereference when invoking functions. Copy the
struct by value to set the ops during kvm_init().

Signed-off-by: Like Xu <likexu@tencent.com>
[sean: Move pmc_is_enabled(), make kvm_pmu_ops static]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220329235054.3534728-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 75189d1d 08-Apr-2022 Like Xu <likexu@tencent.com>

KVM: x86/pmu: Update AMD PMC sample period to fix guest NMI-watchdog

NMI-watchdog is one of the favorite features of kernel developers,
but it does not work in AMD guest even with vPMU enabled and worse,
the system misrepresents this capability via /proc.

This is a PMC emulation error. KVM does not pass the latest valid
value to perf_event in time when guest NMI-watchdog is running, thus
the perf_event corresponding to the watchdog counter will enter the
old state at some point after the first guest NMI injection, forcing
the hardware register PMC0 to be constantly written to 0x800000000001.

Meanwhile, the running counter should accurately reflect its new value
based on the latest coordinated pmc->counter (from vPMC's point of view)
rather than the value written directly by the guest.

Fixes: 168d918f2643 ("KVM: x86: Adjust counter sample period after a wrmsr")
Reported-by: Dongli Cao <caodongli@kingsoft.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Yanan Wang <wangyanan55@huawei.com>
Tested-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20220409015226.38619-1-likexu@tencent.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 0144ba0c 01-Feb-2022 Wei Wang <wei.w.wang@intel.com>

KVM: x86: use the KVM side max supported fixed counter

KVM vPMU doesn't support to emulate all the fixed counters that the
host PMU driver has supported, e.g. the fixed counter 3 used by
Topdown metrics hasn't been supported by KVM so far.

Rename MAX_FIXED_COUNTERS to KVM_PMC_MAX_FIXED to have a more
straightforward naming convention as INTEL_PMC_MAX_FIXED used by the
host PMU driver, and fix vPMU to use the KVM side KVM_PMC_MAX_FIXED
for the virtual fixed counter emulation, instead of the host side
INTEL_PMC_MAX_FIXED.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1643750603-100733-2-git-send-email-kan.liang@linux.intel.com


# 9cd803d4 30-Nov-2021 Eric Hankland <ehankland@google.com>

KVM: x86: Update vPMCs when retiring instructions

When KVM retires a guest instruction through emulation, increment any
vPMCs that are configured to monitor "instructions retired," and
update the sample period of those counters so that they will overflow
at the right time.

Signed-off-by: Eric Hankland <ehankland@google.com>
[jmattson:
- Split the code to increment "branch instructions retired" into a
separate commit.
- Added 'static' to kvm_pmu_incr_counter() definition.
- Modified kvm_pmu_incr_counter() to check pmc->perf_event->state ==
PERF_EVENT_STATE_ACTIVE.
]
Fixes: f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")
Signed-off-by: Jim Mattson <jmattson@google.com>
[likexu:
- Drop checks for pmc->perf_event or event state or event type
- Increase a counter once its umask bits and the first 8 select bits are matched
- Rewrite kvm_pmu_incr_counter() with a less invasive approach to the host perf;
- Rename kvm_pmu_record_event to kvm_pmu_trigger_event;
- Add counter enable and CPL check for kvm_pmu_trigger_event();
]
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20211130074221.93635-6-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 6ed1298e 30-Nov-2021 Like Xu <likexu@tencent.com>

KVM: x86/pmu: Reuse pmc_perf_hw_id() and drop find_fixed_event()

Since we set the same semantic event value for the fixed counter in
pmc->eventsel, returning the perf_hw_id for the fixed counter via
find_fixed_event() can be painlessly replaced by pmc_perf_hw_id()
with the help of pmc_is_fixed() check.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20211130074221.93635-4-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 7c174f30 30-Nov-2021 Like Xu <likexu@tencent.com>

KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()

The find_arch_event() returns a "unsigned int" value,
which is used by the pmc_reprogram_counter() to
program a PERF_TYPE_HARDWARE type perf_event.

The returned value is actually the kernel defined generic
perf_hw_id, let's rename it to pmc_perf_hw_id() with simpler
incoming parameters for better self-explanation.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20211130074221.93635-3-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# e6cd31f1 05-Nov-2021 Jim Mattson <jmattson@google.com>

kvm: x86: Convert return type of *is_valid_rdpmc_ecx() to bool

These function names sound like predicates, and they have siblings,
*is_valid_msr(), which _are_ predicates. Moreover, there are comments
that essentially warn that these functions behave unexpectedly.

Flip the polarity of the return values, so that they become
predicates, and convert the boolean result to a success/failure code
at the outer call site.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211105202058.1048757-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# e79f49c3 28-Jul-2021 Like Xu <likexu@tencent.com>

KVM: x86/pmu: Introduce pmc->is_paused to reduce the call time of perf interfaces

Based on our observations, after any vm-exit associated with vPMU, there
are at least two or more perf interfaces to be called for guest counter
emulation, such as perf_event_{pause, read_value, period}(), and each one
will {lock, unlock} the same perf_event_ctx. The frequency of calls becomes
more severe when guest use counters in a multiplexed manner.

Holding a lock once and completing the KVM request operations in the perf
context would introduce a set of impractical new interfaces. So we can
further optimize the vPMU implementation by avoiding repeated calls to
these interfaces in the KVM context for at least one pattern:

After we call perf_event_pause() once, the event will be disabled and its
internal count will be reset to 0. So there is no need to pause it again
or read its value. Once the event is paused, event period will not be
updated until the next time it's resumed or reprogrammed. And there is
also no need to call perf_event_period twice for a non-running counter,
considering the perf_event for a running counter is never paused.

Based on this implementation, for the following common usage of
sampling 4 events using perf on a 4u8g guest:

echo 0 > /proc/sys/kernel/watchdog
echo 25 > /proc/sys/kernel/perf_cpu_time_max_percent
echo 10000 > /proc/sys/kernel/perf_event_max_sample_rate
echo 0 > /proc/sys/kernel/perf_cpu_time_max_percent
for i in `seq 1 1 10`
do
taskset -c 0 perf record \
-e cpu-cycles -e instructions -e branch-instructions -e cache-misses \
/root/br_instr a
done

the average latency of the guest NMI handler is reduced from
37646.7 ns to 32929.3 ns (~1.14x speed up) on the Intel ICX server.
Also, in addition to collecting more samples, no loss of sampling
accuracy was observed compared to before the optimization.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20210728120705.6855-1-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>


# d9f6e12f 18-Mar-2021 Ingo Molnar <mingo@kernel.org>

x86: Fix various typos in comments

Fix ~144 single-word typos in arch/x86/ code comments.

Doing this in a single commit should reduce the churn.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-kernel@vger.kernel.org


# 9aa4f622 31-Jan-2021 Like Xu <like.xu@linux.intel.com>

KVM: vmx/pmu: Release guest LBR event via lazy release mechanism

The vPMU uses GUEST_LBR_IN_USE_IDX (bit 58) in 'pmu->pmc_in_use' to
indicate whether a guest LBR event is still needed by the vcpu. If the
vcpu no longer accesses LBR related registers within a scheduling time
slice, and the enable bit of LBR has been unset, vPMU will treat the
guest LBR event as a bland event of a vPMC counter and release it
as usual. Also, the pass-through state of LBR records msrs is cancelled.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20210201051039.255478-10-like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# e6209a3b 31-Jan-2021 Like Xu <like.xu@linux.intel.com>

KVM: vmx/pmu: Emulate legacy freezing LBRs on virtual PMI

The current vPMU only supports Architecture Version 2. According to
Intel SDM "17.4.7 Freezing LBR and Performance Counters on PMI", if
IA32_DEBUGCTL.Freeze_LBR_On_PMI = 1, the LBR is frozen on the virtual
PMI and the KVM would emulate to clear the LBR bit (bit 0) in
IA32_DEBUGCTL. Also, guest needs to re-enable IA32_DEBUGCTL.LBR
to resume recording branches.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Message-Id: <20210201051039.255478-9-like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 2e8cd7a3 23-Jun-2020 Like Xu <like.xu@linux.intel.com>

kvm: x86: limit the maximum number of vPMU fixed counters to 3

Some new Intel platforms (such as TGL) already have the
fourth fixed counter TOPDOWN.SLOTS, but it has not been
fully enabled on KVM and the host.

Therefore, we limit edx.split.num_counters_fixed to 3,
so that it does not break the kvm-unit-tests PMU test
case and bad-handled userspace.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20200624015928.118614-1-like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# cbd71758 29-May-2020 Wei Wang <wei.w.wang@intel.com>

KVM: x86/pmu: Tweak kvm_pmu_get_msr to pass 'struct msr_data' in

Change kvm_pmu_get_msr() to get the msr_data struct, as the host_initiated
field from the struct could be used by get_msr. This also makes this API
consistent with kvm_pmu_set_msr. No functional changes.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Message-Id: <20200529074347.124619-2-like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# afaf0b2f 21-Mar-2020 Sean Christopherson <seanjc@google.com>

KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection

Replace the kvm_x86_ops pointer in common x86 with an instance of the
struct to save one pointer dereference when invoking functions. Copy the
struct by value to set the ops during kvm_init().

Arbitrarily use kvm_x86_ops.hardware_enable to track whether or not the
ops have been initialized, i.e. a vendor KVM module has been loaded.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321202603.19355-7-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 168d918f 21-Feb-2020 Eric Hankland <ehankland@google.com>

KVM: x86: Adjust counter sample period after a wrmsr

The sample_period of a counter tracks when that counter will
overflow and set global status/trigger a PMI. However this currently
only gets set when the initial counter is created or when a counter is
resumed; this updates the sample period after a wrmsr so running
counters will accurately reflect their new value.

Signed-off-by: Eric Hankland <ehankland@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 13c5183a 11-Dec-2019 Marios Pomonis <pomonis@google.com>

KVM: x86: Protect MSR-based index computations in pmu.h from Spectre-v1/L1TF attacks

This fixes a Spectre-v1/L1TF vulnerability in the get_gp_pmc() and
get_fixed_pmc() functions.
They both contain index computations based on the (attacker-controlled)
MSR number.

Fixes: 25462f7f5295 ("KVM: x86/vPMU: Define kvm_pmu_ops to support vPMU function dispatch")

Signed-off-by: Nick Finco <nifi@google.com>
Signed-off-by: Marios Pomonis <pomonis@google.com>
Reviewed-by: Andrew Honig <ahonig@google.com>
Cc: stable@vger.kernel.org
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 9477f444 13-Nov-2019 Oliver Upton <oupton@google.com>

KVM: VMX: Add helper to check reserved bits in IA32_PERF_GLOBAL_CTRL

Create a helper function to check the validity of a proposed value for
IA32_PERF_GLOBAL_CTRL from the existing check in intel_pmu_set_msr().

Per Intel's SDM, the reserved bits in IA32_PERF_GLOBAL_CTRL must be
cleared for the corresponding host/guest state fields.

Suggested-by: Jim Mattson <jmattson@google.com>
Co-developed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# b35e5548 27-Oct-2019 Like Xu <like.xu@linux.intel.com>

KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC

Currently, a host perf_event is created for a vPMC functionality emulation.
It’s unpredictable to determine if a disabled perf_event will be reused.
If they are disabled and are not reused for a considerable period of time,
those obsolete perf_events would increase host context switch overhead that
could have been avoided.

If the guest doesn't WRMSR any of the vPMC's MSRs during an entire vcpu
sched time slice, and its independent enable bit of the vPMC isn't set,
we can predict that the guest has finished the use of this vPMC, and then
do request KVM_REQ_PMU in kvm_arch_sched_in and release those perf_events
in the first call of kvm_pmu_handle_event() after the vcpu is scheduled in.

This lazy mechanism delays the event release time to the beginning of the
next scheduled time slice if vPMC's MSRs aren't changed during this time
slice. If guest comes back to use this vPMC in next time slice, a new perf
event would be re-created via perf_event_create_kernel_counter() as usual.

Suggested-by: Wei Wang <wei.w.wang@intel.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# a6da0d77 27-Oct-2019 Like Xu <like.xu@linux.intel.com>

KVM: x86/vPMU: Reuse perf_event to avoid unnecessary pmc_reprogram_counter

The perf_event_create_kernel_counter() in the pmc_reprogram_counter() is
a heavyweight and high-frequency operation, especially when host disables
the watchdog (maximum 21000000 ns) which leads to an unacceptable latency
of the guest NMI handler. It limits the use of vPMUs in the guest.

When a vPMC is fully enabled, the legacy reprogram_*_counter() would stop
and release its existing perf_event (if any) every time EVEN in most cases
almost the same requested perf_event will be created and configured again.

For each vPMC, if the reuqested config ('u64 eventsel' for gp and 'u8 ctrl'
for fixed) is the same as its current config AND a new sample period based
on pmc->counter is accepted by host perf interface, the current event could
be reused safely as a new created one does. Otherwise, do release the
undesirable perf_event and reprogram a new one as usual.

It's light-weight to call pmc_pause_counter (disable, read and reset event)
and pmc_resume_counter (recalibrate period and re-enable event) as guest
expects instead of release-and-create again on any condition. Compared to
use the filterable event->attr or hw.config, a new 'u64 current_config'
field is added to save the last original programed config for each vPMC.

Based on this implementation, the number of calls to pmc_reprogram_counter
is reduced by ~82.5% for a gp sampling event and ~99.9% for a fixed event.
In the usage of multiplexing perf sampling mode, the average latency of the
guest NMI handler is reduced from 104923 ns to 48393 ns (~2.16x speed up).
If host disables watchdog, the minimum latecy of guest NMI handler could be
speed up at ~3413x (from 20407603 to 5979 ns) and at ~786x in the average.

Suggested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# c900c156 27-Oct-2019 Like Xu <like.xu@linux.intel.com>

KVM: x86/vPMU: Introduce a new kvm_pmu_ops->msr_idx_to_pmc callback

Introduce a new callback msr_idx_to_pmc that returns a struct kvm_pmc*,
and change kvm_pmu_is_valid_msr to return ".msr_idx_to_pmc(vcpu, msr) ||
.is_valid_msr(vcpu, msr)" and AMD just returns false from .is_valid_msr.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 98ff80f5 27-Oct-2019 Like Xu <like.xu@linux.intel.com>

KVM: x86/vPMU: Rename pmu_ops callbacks from msr_idx to rdpmc_ecx

The leagcy pmu_ops->msr_idx_to_pmc is only called in kvm_pmu_rdpmc, so
this function actually receives the contents of ECX before RDPMC, and
translates it to a kvm_pmc. Let's clarify its semantic by renaming the
existing msr_idx_to_pmc to rdpmc_ecx_to_pmc, and is_valid_msr_idx to
is_valid_rdpmc_ecx; likewise for the wrapper kvm_pmu_is_valid_msr_idx.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 66bb8a06 10-Jul-2019 Eric Hankland <ehankland@google.com>

KVM: x86: PMU Event Filter

Some events can provide a guest with information about other guests or the
host (e.g. L3 cache stats); providing the capability to restrict access
to a "safe" set of events would limit the potential for the PMU to be used
in any side channel attacks. This change introduces a new VM ioctl that
sets an event filter. If the guest attempts to program a counter for
any blacklisted or non-whitelisted event, the kernel counter won't be
created, so any RDPMC/RDMSR will show 0 instances of that event.

Signed-off-by: Eric Hankland <ehankland@google.com>
[Lots of changes. All remaining bugs are probably mine. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 0e6f467e 20-May-2019 Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/pmu: mask the result of rdpmc according to the width of the counters

This patch will simplify the changes in the next, by enforcing the
masking of the counters to RDPMC and RDMSR.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 2d7921c4 12-Mar-2018 Arbel Moshe <arbel.moshe@oracle.com>

KVM: x86: Add support for VMware backdoor Pseudo-PMCs

VMware exposes the following Pseudo PMCs:
0x10000: Physical host TSC
0x10001: Elapsed real time in ns
0x10002: Elapsed apparent time in ns

For more info refer to:
https://www.vmware.com/files/pdf/techpaper/Timekeeping-In-VirtualMachines.pdf

VMware allows access to these Pseduo-PMCs even when read via RDPMC
in Ring3 and CR4.PCE=0. Therefore, commit modifies x86 emulator
to allow access to these PMCs in this situation. In addition,
emulation of these PMCs were added to kvm_pmu_rdpmc().

Signed-off-by: Arbel Moshe <arbel.moshe@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# b2441318 01-Nov-2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.

For non */uapi/* files that summary was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139

and resulted in the first patch in this series.

If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930

and resulted in the second patch in this series.

- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:

SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1

and that resulted in the third patch in this series.

- when the two scanners agreed on the detected license(s), that became
the concluded license(s).

- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.

- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).

- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.

- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct

This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 25462f7f 19-Jun-2015 Wei Huang <wehuang@redhat.com>

KVM: x86/vPMU: Define kvm_pmu_ops to support vPMU function dispatch

This patch defines a new function pointer struct (kvm_pmu_ops) to
support vPMU for both Intel and AMD. The functions pointers defined in
this new struct will be linked with Intel and AMD functions later. In the
meanwhile the struct that maps from event_sel bits to PERF_TYPE_HARDWARE
events is renamed and moved from Intel specific code to kvm_host.h as a
common struct.

Reviewed-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Wei Huang <wei@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 474a5bb9 19-Jun-2015 Wei Huang <wehuang@redhat.com>

KVM: x86/vPMU: introduce pmu.h header

This will be used for private function used by AMD- and Intel-specific
PMU implementations.

Signed-off-by: Wei Huang <wei@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>