History log of /freebsd-current/sys/x86/x86/mca.c
Revision Date Author Comments
# 792655ab 07-Aug-2023 Ed Maste <emaste@FreeBSD.org>

x86: make EARLY_AP_STARTUP mandatory

When early AP startup was introduced in 2016 it was put behind a kernel
option EARLY_AP_STARTUP as a transition aid, so that it could be turned
off if necessary. For x86 the non-EARLY_AP_STARTUP case is no longer
functional, so disallow it.

Other archs are still incompatible with EARLY_AP_STARTUP, so the option
cannot yet be removed entirely.

Reported by: wollman
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41351


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# 9755e244 04-Sep-2022 Gordon Bergling <gbe@FreeBSD.org>

x86: Correct a typo in source code comment

- s/occured/occurred/

MFC after: 3 days


# ac64943c 05-Aug-2022 Alexander Motin <mav@FreeBSD.org>

mca: Add sysctl to mute corrected errors.

Setting hw.mca.log_corrected to 0 will mute corrected errors logging
except ones marked as reaching Yellow threshold by hardware.

MFC after: 1 week


# 63346fef 08-Dec-2021 Alexander Motin <mav@FreeBSD.org>

mca: Some error handling logic improvements.

- Enable local MCEs on capable Intel CPUs. It delivers exceptions
only to the affected CPU instead of global broadcast, requiring a lot
of synchronization between CPUs. AMD always deliver MCEs locally.
- Make MCE handler process only uncorrected errors, while CMCI and
polling only corrected. It reduces synchronization problems between
them and is explicitly recommended by the documentation.
- Add minimal support for uncorrected software recoverable errors
on Intel CPUs. It allows to avoid kernel panics in case uncorrected
errors do not affect current operation, like ones found during scrub
or write. Such errors are only logged, postponing the panic until
the corrupted data will actually be needed (that may never happen).
- Reduce polling period from 1 hour to 5 minutes.

MFC after: 2 weeks


# 9a128e16 07-Dec-2021 Alexander Motin <mav@FreeBSD.org>

mca: Switch to using taskqueue_enqueue_timeout_sbt().

Previously it was not allowed on fast taskqueues. It was fixed in
4730a8972b1f. This should make no functional change, just a bit
cleaner and efficient code.

MFC after: 1 week


# 3bdba24c 07-Dec-2021 Alexander Motin <mav@FreeBSD.org>

mca: Decode new Intel status bits.

MFC after: 1 week


# 935dc0de 07-Dec-2021 Alexander Motin <mav@FreeBSD.org>

mca: Remove excessively verbose debug messages.

Expecially in case of AMD there was more than dozen lines per CPU.

MFC after: 1 week


# c2003f26 07-Dec-2021 Alexander Motin <mav@FreeBSD.org>

mca: Make some sysctls also a loader tunables.

MFC after: 1 week


# b5770470 08-Feb-2021 Mark Johnston <markj@FreeBSD.org>

mca: Handle inconsistent CMCI capability reporting

A BIOS bug may apparently cause the BSP to report that it does not
implement CMCI, with some APs reporting that they do. In this scenario,
avoid a NULL pointer dereference that occurs in cmci_monitor() because
cmc_state was not allocated by the BSP.

PR: 253272
Reported by: asomers, mmacy
Reviewed by: kib (previous version)
MFC after: 1 week


# ab6c81a2 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

x86: clean up empty lines in .c and .h files


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# 2ee49fac 21-Jan-2020 Konstantin Belousov <kib@FreeBSD.org>

Add support for Hygon Dhyana Family 18h processor.

As a new x86 CPU vendor, Chengdu Haiguang IC Design Co., Ltd (Hygon)
is a joint venture between AMD and Haiguang Information Technology Co.,
Ltd., aims at providing x86 processors for China server market.

The first generation Hygon processor(Dhyana) shares most architecture
with AMD's family 17h, but with different CPU vendor ID("HygonGenuine")
and PCI vendor ID(0x1d94) and family series number 18h(Hygon negotiated
with AMD to confirm that only Hygon use family 18h).

To enable Hygon Dhyana support in FreeBSD, add new definitions
HYGON_VENDOR_ID("HygonGenuine") and X86_VENDOR_HYGON(0x1d94) to identify
Hygon Dhyana CPU.

Initialize the CPU features(topology, local APIC ext, MSI, TSC, hwpstate,
MCA, DEBUG_CTL, etc) for amd64 and i386 mode by sharing the code path of
AMD family 17h.

The changes have been applied on FreeBSD 13.0-CURRENT and tested
successfully on Hygon Dhyana processor.

References:
[1] Linux kernel patches for Hygon Dhyana, merged in 4.20:

https://git.kernel.org/tip/c9661c1e80b609cd038db7c908e061f0535804ef

[2] MSR and CPUID definition:

https://www.amd.com/system/files/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf

Submitted by: Pu Wen <puwen@hygon.cn>
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D23163


# ca8929d2 08-Jun-2019 Jonathan T. Looney <jtl@FreeBSD.org>

Currently, MCA entries remain on an every-growing linked list. This means
that it becomes increasingly expensive to process a steady stream of
correctable errors. Additionally, the memory used by the MCA entries can
grow without bound.

Change the code to maintain two separate lists: a list of entries which
still need to be logged, and a list of entries which have already been
logged. Additionally, allow a user-configurable limit on the number of
entries which will be saved after they are logged. (The limit defaults
to -1 [unlimited], which is the current behavior.)

Reviewed by: imp, jhb
MFC after: 2 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20482


# 18f9bb6f 22-May-2019 Andrew Gallatin <gallatin@FreeBSD.org>

x86 MCA: introduce MCA hooks for different vendor implementations

This is needed for AMD SMCA processors, as SMCA uses different
MSR address for access MCA banks.

Use IA32 specific msr_ops as defualt, and use SMCA-specific msr_ops
when on an SMCA-enabled processor

Submitted by: chandu from amd dot com
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D18055


# 215e4657 24-Apr-2018 Konstantin Belousov <kib@FreeBSD.org>

Ensure that cmci_monitor() is not executed in parallel, since shared
machine check banks must be only monitored by single CPU.

Noted and reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D15157


# d9d8645c 24-Apr-2018 Konstantin Belousov <kib@FreeBSD.org>

Use IS_BSP() macro.

Noted and reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D15157


# d821d364 21-Jan-2018 Pedro F. Giffuni <pfg@FreeBSD.org>

Unsign some values related to allocation.

When allocating memory through malloc(9), we always expect the amount of
memory requested to be unsigned as a negative value would either stand for
an error or an overflow.
Unsign some values, found when considering the use of mallocarray(9), to
avoid unnecessary casting. Also consider that indexes should be of
at least the same size/type as the upper limit they pretend to index.

MFC after: 3 weeks


# ebf5747b 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/x86: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.


# c50df68a 17-Sep-2017 Conrad Meyer <cem@FreeBSD.org>

MCA: Expand AMD Thresholding support to cover all banks

When it was added in r314636, AMD Thresholding was hardcoded to only
bank 4 (Northbridge) for some reason. However, even on family 10h the
MCAx_MISC register Valid/Present bits determine whether thresholding is
supported on that bank.

Expand thresholding support to monitor all monitorable banks. This
simplifies some of the logic and makes it more consistent with our Intel
CMCI support.

Reviewed by: markj (earlier version)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12321


# d63edb4d 11-Sep-2017 Conrad Meyer <cem@FreeBSD.org>

MCA: Rename AMD MISC bits/masks

They apply to all AMD MCAi_MISC0 registers, not just MCA4 (NB).

No functional change.

Sponsored by: Dell EMC Isilon


# f739be66 11-Sep-2017 Conrad Meyer <cem@FreeBSD.org>

x86 MCA: Extract CMCI support predicate into function

On AMD, the MCG_CAP feature bit is reserved -- not explicitly zero. Do not
use it to determine CMCI support.

Reviewed by: avg, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12320


# 01a20b98 07-Sep-2017 Conrad Meyer <cem@FreeBSD.org>

mca: Fix printf types from r323289 on i386

Reported by: Michael Butler <imb AT protected-networks.net>
Sponsored by: Dell EMC Isilon


# 092c0e86 07-Sep-2017 Conrad Meyer <cem@FreeBSD.org>

x86 MCA: Helpfully, print why ECC thresholding is not enabled on AMD

Sponsored by: Dell EMC Isilon


# d848ecfb 07-Sep-2017 Conrad Meyer <cem@FreeBSD.org>

x86 MCA: Enable AMD thresholding support on 17h

17h supports MCA thresholding in the same way as 16h and earlier.
Supposedly a ScalableMca feature bit in CPUID 8000_0007:EBX must be set, but
that was not true for earlier models, so be careful about relying on it.

While here, document a missing bit in LS MCA MISC0.

Reviewed by: truckman
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12237


# 1c5df7bd 28-Apr-2017 Conrad Meyer <cem@FreeBSD.org>

x86 MCA: Fix a deadlock in MCA exception processing

In exceptional circumstances, an MCA exception will trigger when the
freelist is exhausted. In such a case, no error will be logged on the list
and 'mca_count' will not be incremented.

Prior to this patch, all CPUs that received the exception would spin
forever.

With this change, the CPU that detects the error but finds the freelist
empty will proceed to panic the machine, ending the deadlock.

A follow-up to r260457.

Reported by: Ryan Libby <rlibby at gmail.com>
Reviewed by: jhb@
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D10536


# c1ad4beb 05-Mar-2017 Andriy Gapon <avg@FreeBSD.org>

mca: fix up couple of issues introduced with amd thresholding in r314636

1. There a was a typo in one place where the processor family is
checked (16 vs 0x16). Now the checks are consolidated in a single
function.
2. Instead of an array of struct amd_et_state objects the code allocated
an array of pointers. That was no problem on amd64 where the sizes
are the same, but could be a problem on i386.

Reported by: tuexen and others
Tested by: tuexen (earlier version of the fix)
Pointyhat to: avg
MFC after: 5 days
X-MFC with: r314636


# 7abf4604 03-Mar-2017 Andriy Gapon <avg@FreeBSD.org>

MCA: add AMD Error Thresholding support

Currently the feature is implemented only for a subset of errors
reported via Bank 4. The subset includes only DRAM-related errors.

The new code builds upon and reuses the Intel CMC (Correctable MCE
Counters) support code. However, the AMD feature is quite different
and, unfortunately, much less regular.

For references please see AMD BKDGs for models 10h - 16h.
Specifically, see MSR0000_0413 NB Machine Check Misc (Thresholding)
Register (MC4_MISC0).
http://developer.amd.com/resources/developer-guides-manuals/

Reviewed by: jhb
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D9613


# b929de10 21-Feb-2017 Andriy Gapon <avg@FreeBSD.org>

mca: change type of last_intr to time_t for consinstency

time_uptime is time_t

MFC after: 1 day
X-MFC with: r313752


# 92b87cdb 14-Feb-2017 Andriy Gapon <avg@FreeBSD.org>

mca: use time_uptime instead of ticks for CMCI throttling

This solves several problems.
First of all, cmc_throttle is specified in seconds and there was no
conversion between ticks and seconds when they were mixed together.
Second, we avoid potential problems with ticks wrapping around.

Resolution of time_uptime should be sufficient for the throttling
purposes.

Discussed with: jhb
MFC after: 12 days


# 3be5c621 14-Feb-2017 Andriy Gapon <avg@FreeBSD.org>

mca: fix writes to MSR_MC_CTL2 in cmci_update

Previously, if the threshold was changed, then MC_CTL2_CMCI_EN would get
cleared and the logic would switch to the polling only mode.

Discussed with: jhb
MFC after: 2 weeks


# be04edbb 12-Jan-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

Remove __nonnull() attributes from x86 machine check architecture.

These are of the few cases where we use the GCC non-null attributes in
non-header code. As part of a review [1] of our use of such attributes we
are replacing such uses of the overly aggressive GCC attribute with clang's
_Nonnull attribute.

In this case the attributes serve little purpose as they just don't
enforce run time checks, If anything the attributes would cause NULL pointer
checks to be ignored but there are no such checks so only effect is
cosmetic.

The references appear to be left over from code development and likely
already fulfilled their purpose.

Reference [1]:
https://reviews.freebsd.org/D9004

Reviewed by: jhb
MFC after: 3 weeks


# f85ea63d 14-Dec-2016 Mark Johnston <markj@FreeBSD.org>

Don't run the MCA record refill task during boot.

The MCA taskqueue is not initialized until some time after CMCIs are
enabled on the BSP.

Reviewed by: cem, jhb
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8783


# fdce57a0 14-May-2016 John Baldwin <jhb@FreeBSD.org>

Add an EARLY_AP_STARTUP option to start APs earlier during boot.

Currently, Application Processors (non-boot CPUs) are started by
MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until
SI_SUB_SMP at which point they are released to run kernel threads.
SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter
the scheduler and start running threads until fairly late in the
boot.

This change moves SI_SUB_SMP up to just before software interrupt
threads are created allowing the APs to start executing kernel
threads much sooner (before any devices are probed). This allows
several initialization routines that need to perform initialization
on all CPUs to now perform that initialization in one step rather
than having to defer the AP initialization to a second SYSINIT run
at SI_SUB_SMP. It also permits all CPUs to be available for
handling interrupts before any devices are probed.

This last feature fixes a problem on with interrupt vector exhaustion.
Specifically, in the old model all device interrupts were routed
onto the boot CPU during boot. Later after the APs were released at
SI_SUB_SMP, interrupts were redistributed across all CPUs.

However, several drivers for multiqueue hardware allocate N interrupts
per CPU in the system. In a system with many CPUs, just a few drivers
doing this could exhaust the available pool of interrupt vectors on
the boot CPU as each driver was allocating N * mp_ncpu vectors on the
boot CPU. Now, drivers will allocate interrupts on their desired CPUs
during boot meaning that only N interrupts are allocated from the boot
CPU instead of N * mp_ncpu.

Some other bits of code can also be simplified as smp_started is
now true much earlier and will now always be true for these bits of
code. This removes the need to treat the single-CPU boot environment
as a special case.

As a transition aid, the new behavior is available under a new kernel
option (EARLY_AP_STARTUP). This will allow the option to be turned off
if need be during initial testing. I plan to enable this on x86 by
default in a followup commit in the next few days and to have all
platforms moved over before 11.0. Once the transition is complete,
the option will be removed along with the !EARLY_AP_STARTUP code.

These changes have only been tested on x86. Other platform maintainers
are encouraged to port their architectures over as well. The main
things to check for are any uses of smp_started in MD code that can be
simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in
the EARLY_AP_STARTUP case (e.g. the interrupt shuffling).

PR: kern/199321
Reviewed by: markj, gnn, kib
Sponsored by: Netflix


# d68b7cfa 13-May-2016 Bjoern A. Zeeb <bz@FreeBSD.org>

Remove the extra _RD as _RDTUN already includes it.

Submitted by: emaste
MFC after: 2 weeks


# 2474dccf 13-May-2016 Bjoern A. Zeeb <bz@FreeBSD.org>

We already turn the AMD erratum383 workaround on for certain VM_GUEST_VM
if specific CPU features are not present.
Some simulation environments, e.g. gem5, have been found to require more
TLB management from the kernel in certain setups. It is currently unclear why.
Turning on the workaround_erratum383 seems to help and make problems (panics)
go away.
Given this is a fairly uncommon environment so far, allowing the workaround
to be manually enabled from loader in order to make debugging and comparing
traces easier, but also to allow gem5 run FreeBSD in X86 timing mode, seems
to be the least intrusive option for now until the issue if fully understood.

Sponsored by: DARPA/AFRL
Reviewed by: kib, alc (earlier)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6206


# cbc4d2db 01-Mar-2016 John Baldwin <jhb@FreeBSD.org>

Remove taskqueue_enqueue_fast().

taskqueue_enqueue() was changed to support both fast and non-fast
taskqueues 10 years ago in r154167. It has been a compat shim ever
since. It's time for the compat shim to go.

Submitted by: Howard Su <howard0su@gmail.com>
Reviewed by: sephe
Differential Revision: https://reviews.freebsd.org/D5131


# fd90e2ed 22-May-2015 Jung-uk Kim <jkim@FreeBSD.org>

CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head. However, it is continuously misused as the mpsafe argument
for callout_init(9). Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision: https://reviews.freebsd.org/D2613
Reviewed by: jhb
MFC after: 2 weeks


# 179fa75e 23-Apr-2015 John Baldwin <jhb@FreeBSD.org>

Reassign copyright statements on several files from Advanced
Computing Technologies LLC to Hudson River Trading LLC.

Approved by: Hudson River Trading LLC (who owns ACT LLC)
MFC after: 1 week


# 62def911 19-Apr-2015 Marius Strobl <marius@FreeBSD.org>

Refine the workaround for Intel HSD131 [1] added in r269052:
- Use the full mask described by the erratum as with a sufficiently high
number of these false-positives, the overflow bit (bit 62) additionally
gets set [7].
- HSD131 has been brought into several other Haswell-derived CPUs including
to the next generation, i. e. Intel Broadwell. Thus, also skip reporting of
these benign errors by default on CPU models affected by HSM142, HSW131 and
BDM48 [2 - 5], describing the HSD131 silicon bug for additional models.
Also, Celeron 2955U with a CPU ID of 0x45 have been reported to be covered
by this fault [6], with the specification update concerned with HSM142 [2]
only referring to 0x3c and 0x46.

Submitted by: David Froehlich [7]
MFC after: 3 days

http://www.intel.de/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-desktop-specification-update.pdf [1]
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-mobile-specification-update.pdf [2]
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/5th-gen-core-family-spec-update.pdf [3]
http://www.intel.de/content/dam/www/public/us/en/documents/specification-updates/core-m-processor-family-spec-update.pdf [4]
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e3-1200v3-spec-update.pdf [5]
https://lists.freebsd.org/pipermail/freebsd-hackers/2015-January/046878.html [6]


# dc0ca751 29-Jul-2014 Marius Strobl <marius@FreeBSD.org>

Fix yet another comment typo in r269052.


# fe88aba3 29-Jul-2014 Marius Strobl <marius@FreeBSD.org>

Fix comment typo in r269052.

Submitted by: Daniel O'Connor


# a0d9385f 24-Jul-2014 Marius Strobl <marius@FreeBSD.org>

Intel desktop Haswell CPUs may report benign corrected parity errors (see
HSD131 erratum in [1]) at a considerable rate. So filter these (default),
unless logging is enabled. Unfortunately, there really is no better way to
reasonably implement suppressing these errors than to just skipping them
in mca_log(). Given that they are reported for bank 0, they'd need to be
masked in MSR_MC0_CTL. However, P6 family processors require that register
to be set to either all 0s or all 1s, disabling way more than the one error
in question when using all 0s there. Alternatively, it could be masked for
the corresponding CMCI, but that still wouldn't keep the periodic scanner
from detecting these spurious errors. Apart from that, register contents of
MSR_MC0_CTL{,2} don't seem to be publicly documented, neither in the Intel
Architectures Developer's Manual nor in the Haswell datasheets.

Note that while HSD131 actually is only about C0-stepping as of revision
014 of the Intel desktop 4th generation processor family specification
update, these corrected errors also have been observed with D0-stepping
aka "Haswell Refresh".

1: http://www.intel.de/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-desktop-specification-update.pdf

Reviewed by: jhb
MFC after: 3 days
Sponsored by: Bally Wulff Games & Entertainment GmbH


# af3b2549 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Pull in r267961 and r267973 again. Fix for issues reported will follow.


# 37a107a4 27-Jun-2014 Glen Barber <gjb@FreeBSD.org>

Revert r267961, r267973:

These changes prevent sysctl(8) from returning proper output,
such as:

1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory


# 3da1cf1e 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after: 2 weeks
Sponsored by: Mellanox Technologies


# 053cbfe3 13-Mar-2014 John Baldwin <jhb@FreeBSD.org>

Correct type for malloc().

Submitted by: "Conrad Meyer" <conrad.meyer@isilon.com>


# e07ef9b0 23-Jan-2014 John Baldwin <jhb@FreeBSD.org>

Move <machine/apicvar.h> to <x86/apicvar.h>.


# 6d403615 08-Jan-2014 John Baldwin <jhb@FreeBSD.org>

The changes in r233781 attempted to make logging during a machine check
exception more readable. In practice they prevented all logging during
a machine check exception on at least some systems. Specifically, when
an uncorrected ECC error is detected in a DIMM on a Nehalem/Westmere
class machine, all CPUs receive a machine check exception, but only
CPUs on the same package as the memory controller for the erroring DIMM
log an error. The CPUs on the other package would complete the scan of
their machine check banks and panic before the first set of CPUs could
log an error. The end result was a clearer display during the panic
(no interleaved messages), but a crashdump without any useful info about
the error that occurred.

To handle this case, make all CPUs spin in the machine check handler
once they have completed their scan of their machine check banks until
at least one machine check error is logged. I tried using a DELAY()
instead so that the CPUs would not potentially hang forever, but that
was not reliable in testing.

While here, don't clear MCIP from MSR_MCG_STATUS before invoking panic.
Only clear it if the machine check handler does not panic and returns
to the interrupted thread.


# b867b16d 02-Apr-2012 John Baldwin <jhb@FreeBSD.org>

Further tweak the changes made in r233709. The kernel doesn't permit
sleeping from a swi handler (even though in this case it would be ok), so
switch the refill and scanning SWI handlers to being tasks on a fast
taskqueue. Also, only schedule the refill task for a CMCI as an MC# can
fire at any time, so it should do the minimal amount of work needed and
avoid opportunities to deadlock before it panics (such as scheduling a
task it won't ever need in practice). To handle the case of an MC# only
finding recoverable errors (which should never happen), always try to
refill the event free list when the periodic scan executes.

MFC after: 2 weeks


# f2e3bfc0 02-Apr-2012 John Baldwin <jhb@FreeBSD.org>

Make machine check exception logging more readable. On newer Intel systems,
an uncorrected ECC error tends to fire on all CPUs in a package
simultaneously and the current printf hacks are not sufficient to make
the messages legible. Instead, use the existing mca_lock spinlock to
serialize calls to mca_log() and change the machine check code to panic
directly when an unrecoverable error is encoutered rather than falling
back to a trap_fatal() call in trap() (which adds nearly a screen-full of
logging messages that aren't useful for machine checks).

MFC after: 2 weeks


# 8b9e9831 30-Mar-2012 John Baldwin <jhb@FreeBSD.org>

Attempt to make machine check handling a bit more robust:
- Don't malloc() new MCA records for machine checks logged due to a
CMCI or MC# exception. Instead, use a pre-allocated pool of records.
When a CMCI or MC# exception fires, schedule a swi to refill the pool.
The pool is sized to hold at least one record per available machine
bank, and one record per CPU. This should handle the case of all CPUs
triggering a single bank at once as well as the case a single CPU
triggering all of its banks. The periodic scans still use malloc()
since they are run from a safe context.
- Since we have to create an swi to handle refills, make the periodic scan
a second swi for the same thread instead of having a separate taskqueue
thread for the scans.

Suggested by: mdf (avoiding malloc())
MFC after: 2 weeks


# 6472ac3d 07-Nov-2011 Ed Schouten <ed@FreeBSD.org>

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


# e119a7bc 03-Feb-2011 John Baldwin <jhb@FreeBSD.org>

Use a dedicated taskqueue with a thread that runs at a software-interrupt
priority for the periodic polling of the machine check registers.


# 5ecdb3c4 01-Nov-2010 John Baldwin <jhb@FreeBSD.org>

Move the <machine/mca.h> header to <x86/mca.h>.


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# a955c461 28-Jul-2010 John Baldwin <jhb@FreeBSD.org>

The corrected error count field is dependent on CMCI, not TES.

MFC after: 1 week


# 61d3f0ba 15-Jun-2010 John Baldwin <jhb@FreeBSD.org>

Restore the machine check register banks on resume. For banks being
monitored via CMCI, reset the interrupt threshold to 1 on resume.

Reviewed by: jkim
MFC after: 2 weeks


# 3aa6d94e 11-Jun-2010 John Baldwin <jhb@FreeBSD.org>

Update several places that iterate over CPUs to use CPU_FOREACH().


# 2465e30f 08-Jun-2010 John Baldwin <jhb@FreeBSD.org>

Move the machine check support code to the x86 tree since it is identical
on i386 and amd64.

Requested by: alc