367457 |
07-Nov-2020 |
dim |
MFC r344855 (by jhb):
Drop "All rights reserved" from my copyright statements.
Reviewed by: rgrimes Differential Revision: https://reviews.freebsd.org/D19485 |
358582 |
03-Mar-2020 |
kib |
MFC r358315: Fix IBRS for machines with IBRS_ALL capability. |
347700 |
16-May-2019 |
markj |
MFC r337715, r337751, r337754, r337758, r337813, r338354, r338687, r339124, r341821: Add support for boot-time Intel microcode loading. |
340016 |
01-Nov-2018 |
jhb |
MFC 338360,338415,338624,338630,338631,338725: Dynamic x86 IRQ layout.
338360: Dynamically allocate IRQ ranges on x86.
Previously, x86 used static ranges of IRQ values for different types of I/O interrupts. Interrupt pins on I/O APICs and 8259A PICs used IRQ values from 0 to 254. MSI interrupts used a compile-time-defined range starting at 256, and Xen event channels used a compile-time-defined range after MSI. Some recent systems have more than 255 I/O APIC interrupt pins which resulted in those IRQ values overflowing into the MSI range triggering an assertion failure.
Replace statically assigned ranges with dynamic ranges. Do a single pass computing the sizes of the IRQ ranges (PICs, MSI, Xen) to determine the total number of IRQs required. Allocate the interrupt source and interrupt count arrays dynamically once this pass has completed. To minimize runtime complexity these arrays are only sized once during bootup. The PIC range is determined by the PICs present in the system. The MSI and Xen ranges continue to use a fixed size, though this does make it possible to turn the MSI range size into a tunable in the future.
As a result, various places are updated to use dynamic limits instead of constants. In addition, the vmstat(8) utility has been taught to understand that some kernels may treat 'intrcnt' and 'intrnames' as pointers rather than arrays when extracting interrupt stats from a crashdump. This is determined by the presence (vs absence) of a global 'nintrcnt' symbol.
This change reverts r189404 which worked around a buggy BIOS which enumerated an I/O APIC twice (using the same memory mapped address for both entries but using an IRQ base of 256 for one entry and a valid IRQ base for the second entry). Making the "base" of MSI IRQ values dynamic avoids the panic that r189404 worked around, and there may now be valid I/O APICs with an IRQ base above 256 which this workaround would incorrectly skip.
If in the future the issue reported in PR 130483 reoccurs, we will have to add a pass over the I/O APIC entries in the MADT to detect duplicates using the memory mapped address and use some strategy to choose the "correct" one.
While here, reserve room in intrcnts for the Hyper-V counters.
338415: Fix build of x86 UP kernels after dynamic IRQ changes in r338360.
338624: msi: remove the check that interrupt sources have been added
When running as a specific type of Xen guest the hypervisor won't provide any emulated IO-APICs or legacy PICs at all, thus hitting the following assert in the MSI code:
panic: Assertion num_io_irqs > 0 failed at /usr/src/sys/x86/x86/msi.c:334 cpuid = 0 time = 1 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff826ffa70 vpanic() at vpanic+0x1a3/frame 0xffffffff826ffad0 panic() at panic+0x43/frame 0xffffffff826ffb30 msi_init() at msi_init+0xed/frame 0xffffffff826ffb40 apic_setup_io() at apic_setup_io+0x72/frame 0xffffffff826ffb50 mi_startup() at mi_startup+0x118/frame 0xffffffff826ffb70 start_kernel() at start_kernel+0x10
Fix this by removing the assert in the MSI code, since it's possible to get to the MSI initialization without having registered any other interrupt sources.
338630: lapic: skip setting intrcnt if lapic is not present
Instead of panicking. Legacy PVH mode doesn't provide a lapic, and since native_lapic_intrcnt is called unconditionally this would cause the assert to trigger. Change the assert into a continue in order to take into account the possibility of systems without a lapic.
338631: xen: legacy PVH fixes for the new interrupt count
Register interrupts using the PIC pic_register_sources method instead of doing it in apic_setup_io. This is now required, since the internal interrupt structures are not yet setup when calling apic_setup_io.
338725: Fix a regression in r338360 when booting an x86 machine without APIC.
The atpic_register_sources callback tries to avoid registering interrupt sources that would collide with an I/O APIC. However, the previous implementation was failing to register IRQs 8-15 since the slave PIC saw valid IRQs from the master and assumed an I/O APIC was present. To fix, go back to registering all 8259A interrupt sources in one loop when the master's register_sources method is invoked.
PR: 229429, 130483, 231291 |
335659 |
26-Jun-2018 |
avg |
MFC r334340: add support for console resuming, implement it for uart, use on x86 |
335554 |
22-Jun-2018 |
avg |
MFC r332918, r333222: go deeper for ACPI suspend bounce test
debug.acpi.suspend_bounce sysctl now allows a deeper dive into the sleep abyss. The system will execute the suspend sequence up to the call to AcpiEnterSleepState(). That includes saving processor contexts and parking APs. Then, instead of actually entering the sleep state, the BSP will call resumectx() to emulate the wakeup. The APs should get restarted by the sequence of Init and Startup IPIs that BSP sends to them.
AcpiOsEnterSleep() is used to implement this feature.
Joint work with jkim. |
334152 |
24-May-2018 |
kib |
MFC r334004: Add Intel Spec Store Bypass Disable control.
This also includes the i386/include/pcpu.h part of the r334018.
Security: CVE-2018-3639 Approved by: re (gjb) |
333361 |
08-May-2018 |
kib |
MFC r333125: Turn off IBRS on suspend.
Approved by: re (marius) |
331909 |
03-Apr-2018 |
avg |
MFC r327056: Use resume_cpus() instead of restart_cpus() to resume from ACPI suspension. |
322996 |
29-Aug-2017 |
mav |
MFC r322802: Fix off-by-one error when parsing SRAT table. |
319937 |
14-Jun-2017 |
kib |
MFC r319825: More accurately handle early EFER restoration on resume.
Approved by: re (delphij) |
319133 |
29-May-2017 |
kib |
MFC r318318: Ensure that resume path on amd64 only accesses page tables for normal operation after processor is configured to allow all required features. |
316330 |
31-Mar-2017 |
royger |
MFC r315402:
x86/srat: fix parsing of APIC IDs > MAX_APIC_ID |
316303 |
30-Mar-2017 |
jkim |
MFC: r306686, r308953, r311462, r311529, r312438, r314611
- Merge ACPICA 20170303. - Remove '-vd' option to make iasl(8) reproducible.
Relnotes: yes |
314210 |
24-Feb-2017 |
kib |
MFC r313154: For i386, remove config options CPU_DISABLE_CMPXCHG, CPU_DISABLE_SSE and device npx. |
313128 |
03-Feb-2017 |
markj |
MFC r302793: Allow ACPI wakeup code and page tables to be stored in non-contiguous pages. |
310195 |
18-Dec-2016 |
kib |
MFC r309854: Prefix hex memory addresses with 0x in diagnostic messages from the SRAT parser. |
306628 |
03-Oct-2016 |
kib |
MFC r305978: Detect x2APIC mode on boot and obey it. |
302408 |
08-Jul-2016 |
gjb |
Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle. Prune svn:mergeinfo from the new branch, as nothing has been merged here.
Additional commits post-branch will follow.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation |
302147 |
23-Jun-2016 |
markj |
Use M_NOWAIT when allocating memory for the ACPI wakeup handler.
If the allocation attempt fails, we may otherwise VM_WAIT after a failed attempt to reclaim contiguous memory in the requested range. After r297466, this results in the thread going to sleep, causing a hang during boot.
Reviewed by: jkim, kib Approved by: re (gjb) Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D6945
|
299004 |
03-May-2016 |
vangyzen |
Work around (ignore) broken SRAT tables
Instead of panicking when parsing an invalid ACPI SRAT table, just ignore it, effectively disabling NUMA.
https://lists.freebsd.org/pipermail/freebsd-current/2016-May/060984.html
Reported and tested by: Bill O'Hanlon (bill.ohanlon at gmail.com) Reviewed by: jhb MFC after: 1 week Relnotes: If dmesg shows "SRAT: Duplicate local APIC ID", try updating your BIOS to fix NUMA support. Sponsored by: Dell Inc.
|
298951 |
03-May-2016 |
jhb |
Revert bus_get_cpus() for now.
I really thought I had run this through the tinderbox before committing, but many places need <sys/types.h> -> <sys/param.h> for <sys/bus.h> now.
|
298933 |
02-May-2016 |
jhb |
Add a new bus method to fetch device-specific CPU sets.
bus_get_cpus() returns a specified set of CPUs for a device. It accepts an enum for the second parameter that indicates the type of cpuset to request. Currently two valus are supported:
- LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to the device when DEVICE_NUMA is enabled) - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core)
For systems that do not support NUMA (or if it is not enabled in the kernel config), LOCAL_CPUS fails with EINVAL. INTR_CPUS is mapped to 'all_cpus' by default. The idea is that INTR_CPUS should always return a valid set.
Device drivers which want to use per-CPU interrupts should start using INTR_CPUS instead of simply assigning interrupts to all available CPUs. In the future we may wish to add tunables to control the policy of INTR_CPUS (e.g. should it be local-only or global, should it ignore SMT threads or not).
The x86 nexus driver exposes the internal set of interrupt CPUs from the the x86 interrupt code via INTR_CPUS.
The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled. They also and the global INTR_CPUS set from the nexus driver with the per-domain set from _PXM to generate a local INTR_CPUS set for child devices.
Reviewed by: wblock (manpage) Differential Revision: https://reviews.freebsd.org/D5519
|
298321 |
20-Apr-2016 |
cem |
SRAT: Don't overflow domain_pxm table
If we reached MAXMEMDOM, we would previously try to insert an additional element and only detect overflow after causing (probably trivial) memory overflow. Instead, detect the ndomain > MAXMEMDOM case before we write past the end.
Reported by: Coverity CID: 1354783 Sponsored by: EMC / Isilon Storage Division
|
297954 |
14-Apr-2016 |
imp |
Deprecate using hints.acpi.0.rsdp to communicate the RSDP to the system. This uses the hints mechnanism. This mostly works today because when there's no static hints (the default), this value can be fetched from the hint. When there is a static hints file, the hint passed from the boot loader to the kernel is ignored, but for the BIOS case we're able to find it anyway. However, with UEFI, the fallback doesn't work, so we get a panic instead.
Switch to acpi.rsdp and use TUNABLE_ULONG_FETCH instead. Continue to generate the old values to allow for transitions. In addition, fall back to the old method if the new method isn't present.
Add comments about all this.
Differential Revision: https://reviews.freebsd.org/D5866
|
297748 |
09-Apr-2016 |
jhb |
Add more fine-grained kernel options for NUMA support.
VM_NUMA_ALLOC is used to enable use of domain-aware memory allocation in the virtual memory system. DEVICE_NUMA is used to enable affinity reporting for devices such as bus_get_domain().
MAXMEMDOM must still be set to a value greater than for any NUMA support to be effective. Note that 'cpuset -gd' always works if MAXMEMDOM is enabled and the system supports NUMA.
Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D5782
|
295880 |
22-Feb-2016 |
skra |
As <machine/pmap.h> is included from <vm/pmap.h>, there is no need to include it explicitly when <vm/pmap.h> is already included.
Reviewed by: alc, kib Differential Revision: https://reviews.freebsd.org/D5373
|
291686 |
03-Dec-2015 |
kib |
In the SandyBridge x2APIC workaround detection code, only fetch the environment variable when SandyBridge CPU is detected. Reduce code duplication.
Sponsored by: The FreeBSD Foundation
|
287841 |
16-Sep-2015 |
adrian |
Add ASUS Sandybridge laptops to the similar x2apic disable logic that was recently added for Lenovo laptops.
This is a prime candidate for conversion into a table and also checking other fields like "product".
Tested:
* ASUS UX31E
|
286994 |
21-Aug-2015 |
kib |
Automatically disable x2APIC mode on SandyBridge Lenovo machines. I believe that the bug only affects mobile CPUs, at least I did not see other reports, but it is impossible to detect it in madt_setup_local().
While there, reduce duplication in the information strings printed when x2APIC is auto-disabled, and do not print the line when user manually override the setting.
Tested and reviewed by: royger (previous version) Sponsored by: The FreeBSD Foundation
|
284583 |
18-Jun-2015 |
jkim |
Merge ACPICA 20150619.
|
284175 |
09-Jun-2015 |
jhb |
Handle X2APIC entries in the MADT for APICs with an ID < 255. At least one BIOS has been seen to include such entries even though the relevant specs require that X2APIC entries only be used for CPUs with an APIC ID >= 255.
This was tested on a system with "plain" local APIC entries in the MADT to ensure no regressions, but it has not yet been tested on a system with X2APIC entries in the MADT. Currently such systems do not boot at all, and with this change they might now boot correctly.
Differential Revision: https://reviews.freebsd.org/D2521 Reviewed by: kib MFC after: 2 weeks
|
282991 |
15-May-2015 |
adrian |
Update the comments to match what the code ended up becoming.
-1 is now "no locality information available".
Sponsored by: Norse Corp, Inc.
|
282617 |
08-May-2015 |
adrian |
Add initial memory locality cost awareness to the VM, and include a basic ACPI SLIT table parser.
For now this just exports the map via sysctl; it'll eventually be useful to userland when there's more useful NUMA support in -HEAD.
* Add an optional mem_locality map; * add a mapping function taking from/to domain and returning the relative cost, or -1 if it's not available; * Add a very basic SLIT parser to x86 ACPI.
Differential Revision: https://reviews.freebsd.org/D2460 Reviewed by: rpaulo, stas, jhb Sponsored by: Norse Corp, Inc (hardware, coding); Dell (hardware)
|
281887 |
23-Apr-2015 |
jhb |
Reassign copyright statements on several files from Advanced Computing Technologies LLC to Hudson River Trading LLC.
Approved by: Hudson River Trading LLC (who owns ACT LLC) MFC after: 1 week
|
281495 |
13-Apr-2015 |
kib |
Add config option PAE_TABLES for the i386 kernel. It switches pmap to use PAE format for the page tables, but does not incur other consequences of the full PAE config. In particular, vm_paddr_t and bus_addr_t are left 32bit, and max supported memory is still limited by 4GB.
The option allows to have nx permissions for memory mappings on i386 kernel, while keeping the usual i386 KBI and avoiding the kernel data sizing problems typical for the PAE config.
Intel documented that the PAE format for page tables is available starting with the Pentium Pro, but it is possible that the plain Pentium CPUs have the required support (Appendix H). The goal is to enable the option and non-exec mappings on i386 for the GENERIC kernel. Anybody wanting a useful system on 486, have to reconfigure the modern i386 kernel anyway.
Discussed with: alc, jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
281475 |
12-Apr-2015 |
jkim |
Fix build on i386.
Reported by: bz
|
279286 |
25-Feb-2015 |
kib |
For now, disable x2APIC mode when Xen is detected, even if CPU declares support for it. Newer versions of Xen works fine with x2APIC code, but e.g. Xen 4.2 delivers GPF on the LAPIC MSR write, despite x2APIC mode being known to hypervisor.
Discussed with: royger Sponsored by: The FreeBSD Foundation
|
279079 |
20-Feb-2015 |
tijl |
Fix build on i386 without "device apic"
Reviewed by: kib
|
278954 |
18-Feb-2015 |
kib |
Fix UP build.
Sponsored by: The FreeBSD Foundation MFC after: 2 months
|
278869 |
16-Feb-2015 |
kib |
Initialize x2APIC mode on the resume path before accessing LAPIC.
Remove unneeded disable of LAPIC in the native_lapic_xapic_mode(). We attempt to send wakeup IPI on the resume path right after BSP wakeup, so disabling is wrong.
Reported and tested by: glebius, "Ranjan1018 ." <214748mv@gmail.com> Sponsored by: The FreeBSD Foundation MFC after: 2 months
|
278749 |
14-Feb-2015 |
kib |
Detect whether x2APIC on VMWare is usable without interrupt redirection support. Older versions of the hypervisor mis-interpret the cpuid format in ioapic registers when x2APIC is turned on, but IR is not used by the guest OS.
Based on: Linux commit 4cca6ea04d31c22a7d0436949c072b27bde41f86 Tested by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 months
|
278473 |
09-Feb-2015 |
kib |
Add x2APIC support. Enable it by default if CPU is capable. The hw.x2apic_enable tunable allows disabling it from the loader prompt.
To closely repeat effects of the uncached memory ops when accessing registers in the xAPIC mode, the x2APIC writes to MSRs are preceeded by mfence, except for the EOI notifications. This is probably too strict, only ICR writes to send IPI require serialization to ensure that other CPUs see the previous actions when IPI is delivered. This may be changed later.
In vmm justreturn IPI handler, call doreti_iret instead of doing iretd inline, to handle corner conditions.
Note that the patch only switches LAPICs into x2APIC mode. It does not enables FreeBSD to support > 255 CPUs, which requires parsing x2APIC MADT entries and doing interrupts remapping, but is the required step on the way.
Reviewed by: neel Tested by: pho (real hardware), neel (on bhyve) Discussed with: jhb, grehan Sponsored by: The FreeBSD Foundation MFC after: 2 months
|
276829 |
08-Jan-2015 |
jhb |
Create a cpuset mask for each NUMA domain that is available in the kernel via the global cpuset_domain[] array. To export these to userland, add a CPU_WHICH_DOMAIN level that can be used to fetch the mask for a specific domain. Add a -d flag to cpuset(1) that can be used to fetch the mask for a given domain.
Differential Revision: https://reviews.freebsd.org/D1232 Submitted by: jeff (kernel bits) Reviewed by: adrian, jeff
|
273995 |
02-Nov-2014 |
jhb |
MFamd64: Add support for extended FPU states on i386. This includes support for AVX on i386. - Similar to amd64, move the FPU save area out of the PCB and instead store saved FPU state in a variable-sized buffer after the PCB on the stack. - To support the variable PCB location, alter the locore code to only use the bottom-most page of proc0stack for init386(). init386() returns the correct stack pointer to locore which adjusts the stack for thread0 before calling mi_startup(). - Don't bother setting cr3 in thread0's pcb in locore before calling init386(). It wasn't used (init386() overwrote it at the end) and it doesn't work with the variable-sized FPU save area. - Remove the new-bus attachment from npx. This was only ever useful for external co-processors using IRQ13, but those have not been supported for several years. npxinit() is now called much earlier during boot (init386()) similar to amd64. - Implement PT_{GET,SET}XSTATE and I386_GET_XFPUSTATE. - npxsave() is now only called from context switch contexts so it can use XSAVEOPT.
Differential Revision: https://reviews.freebsd.org/D1058 Reviewed by: kib Tested on: FreeBSD/i386 VM under bhyve on Intel i5-2520
|
272800 |
09-Oct-2014 |
adrian |
Missing from previous commit - keep the VM domain -> PXM mapping array and use it to map PXM -> VM domain when needed.
Differential Revision: D906 Reviewed by: jhb
|
271192 |
06-Sep-2014 |
jhb |
Create a separate structure for per-CPU state saved across suspend and resume that is a superset of a pcb. Move the FPU state out of the pcb and into this new structure. As part of this, move the FPU resume code on amd64 into a C function. This allows resumectx() to still operate only on a pcb and more closely mirrors the i386 code.
Reviewed by: kib (earlier version)
|
270850 |
30-Aug-2014 |
jhb |
Save and restore FPU state across suspend and resume. In earlier revisions of this patch, resumectx() called npxresume() directly, but that doesn't work because resumectx() runs with a non-standard %cs selector. Instead, all of the FPU suspend/resume handling is done in C.
MFC after: 1 week
|
269512 |
04-Aug-2014 |
royger |
x86/madt: make the interrupt override parser a public function
Split a portion of the code in madt_parse_interrupt_override to a separate function, that is public and can be used from other code. This will be needed by the Xen port, since FreeBSD needs to parse the interrupt overrides and notify Xen about them.
This commit should not introduce any functional change.
Sponsored by: Citrix Systems R&D Reviewed by: jhb, gibbs
x86/acpica/madt.c: - Introduce madt_parse_interrupt_values() that parses the intr information from ACPI and returns the triggering and the polarity. This is a subset of the functionality that used to be part of madt_parse_interrupt_override(). - Make madt_found_sci_override a global variable that can be used from other files.
x86/include/acpica_machdep.h: - Prototype of madt_parse_interrupt_values. - Extern declaration of madt_found_sci_override.
|
269511 |
04-Aug-2014 |
royger |
xen: change quality of the MADT ACPI enumerator
Lower the quality of the MADT ACPI enumerator, so on Xen Dom0 we can force the usage of the Xen mptable enumerator even when ACPI is detected.
This is needed because Xen might restrict the number of vCPUs available to Dom0, but the MADT ACPI table parsed in FreeBSD is the native one (which enumerates all the CPUs available in the system).
Sponsored by: Citrix Systems R&D Reviewed by: gibbs
x86/acpica/madt.c: - Lower MADT enumerator quality to -50.
x86/xen/pvcpu_enum.c: - Rise Xen PV enumerator to 0.
|
269184 |
28-Jul-2014 |
akiyama |
Add missing newline to output dmesg properly.
|
263859 |
28-Mar-2014 |
takawata |
Change default logic to CONFORM because this routine is shared with SCI polarity setting.
Reviewed by: jhb
|
263795 |
27-Mar-2014 |
takawata |
Strict value checking will cause problem. Bay trail DN2820FYKH is supported on Linux but does not work on FreeBSD. This behaviour is bug-compatible with Linux-3.13.5.
References: http://d.hatena.ne.jp/syuu1228/20140326 http://lxr.linux.no/linux+v3.13.5/arch/x86/kernel/acpi/boot.c#L1094
Submitted by: syuu
|
263794 |
27-Mar-2014 |
takawata |
To check polarity, check ACPI_MADT_POLARITY_CONFORMS, instead of ACPI_MADT_TRIGGER_CONFORMS.
PR:amd64/188010 Submitted by: syuu
|
262752 |
04-Mar-2014 |
jkim |
Move fpusave() wrapper for suspend hander to sys/amd64/amd64/fpu.c.
Inspired by: jhb
|
261520 |
05-Feb-2014 |
jhb |
Drop the 3rd clause from all 3 clause BSD licenses where I am the sole holder to convert them to 2 clause BSD licenses.
MFC after: 1 week
|
261087 |
23-Jan-2014 |
jhb |
Move <machine/apicvar.h> to <x86/apicvar.h>.
|
259823 |
24-Dec-2013 |
jhb |
Fix i386 build.
Pointy hat to: jhb
|
259782 |
23-Dec-2013 |
jhb |
Add a resume hook for bhyve that runs a function on all CPUs during resume. For Intel CPUs, invoke vmxon for CPUs that were in VMX mode at the time of suspend.
Reviewed by: neel
|
259140 |
09-Dec-2013 |
jhb |
Move constants for indices in the local APIC's local vector table from apicvar.h to apicreg.h.
|
256073 |
05-Oct-2013 |
gibbs |
Formalize the concept of virtual CPU ids by adding a per-cpu vcpu_id field. Perform vcpu enumeration for Xen PV and HVM environments and convert all Xen drivers to use vcpu_id instead of a hard coded assumption of the mapping algorithm (acpi or apic ID) in use.
Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Reviewed by: gibbs Approved by: re (blanket Xen)
amd64/include/pcpu.h: i386/include/pcpu.h: Add vcpu_id to the amd64 and i386 pcpu structures.
dev/xen/timer/timer.c x86/xen/xen_intr.c Use new vcpu_id instead of assuming acpi_id == vcpu_id.
i386/xen/mp_machdep.c: i386/xen/mptable.c x86/xen/hvm.c: Perform Xen HVM and Xen full PV vcpu_id mapping.
x86/xen/hvm.c: x86/acpica/madt.c Change SYSINIT ordering of acpi CPU enumeration so that it is guaranteed to be available at the time of Xen HVM vcpu id mapping.
|
255726 |
20-Sep-2013 |
gibbs |
Add support for suspend/resume/migration operations when running as a Xen PVHVM guest.
Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Reviewed by: gibbs Approved by: re (blanket Xen) MFC after: 2 weeks
sys/amd64/amd64/mp_machdep.c: sys/i386/i386/mp_machdep.c: - Make sure that are no MMU related IPIs pending on migration. - Reset pending IPI_BITMAP on resume. - Init vcpu_info on resume.
sys/amd64/include/intr_machdep.h: sys/i386/include/intr_machdep.h: sys/x86/acpica/acpi_wakeup.c: sys/x86/x86/intr_machdep.c: sys/x86/isa/atpic.c: sys/x86/x86/io_apic.c: sys/x86/x86/local_apic.c: - Add a "suspend_cancelled" parameter to pic_resume(). For the Xen PIC, restoration of interrupt services differs between the aborted suspend and normal resume cases, so we must provide this information.
sys/dev/acpica/acpi_timer.c: sys/dev/xen/timer/timer.c: sys/timetc.h: - Don't swap out "suspend safe" timers across a suspend/resume cycle. This includes the Xen PV and ACPI timers.
sys/dev/xen/control/control.c: - Perform proper suspend/resume process for PVHVM: - Suspend all APs before going into suspension, this allows us to reset the vcpu_info on resume for each AP. - Reset shared info page and callback on resume.
sys/dev/xen/timer/timer.c: - Implement suspend/resume support for the PV timer. Since FreeBSD doesn't perform a per-cpu resume of the timer, we need to call smp_rendezvous in order to correctly resume the timer on each CPU.
sys/dev/xen/xenpci/xenpci.c: - Don't reset the PCI interrupt on each suspend/resume.
sys/kern/subr_smp.c: - When suspending a PVHVM domain make sure there are no MMU IPIs in-flight, or we will get a lockup on resume due to the fact that pending event channels are not carried over on migration. - Implement a generic version of restart_cpus that can be used by suspended and stopped cpus.
sys/x86/xen/hvm.c: - Implement resume support for the hypercall page and shared info. - Clear vcpu_info so it can be reset by APs when resuming from suspension.
sys/dev/xen/xenpci/xenpci.c: sys/x86/xen/hvm.c: sys/x86/xen/xen_intr.c: - Support UP kernel configurations.
sys/x86/xen/xen_intr.c: - Properly rebind per-cpus VIRQs and IPIs on resume.
|
254065 |
07-Aug-2013 |
kib |
Split the pagequeues per NUMA domains, and split pageademon process into threads each processing queue in a single domain. The structure of the pagedaemons and queues is kept intact, most of the changes come from the need for code to find an owning page queue for given page, calculated from the segment containing the page.
The tie between NUMA domain and pagedaemon thread/pagequeue split is rather arbitrary, the multithreaded daemon could be allowed for the single-domain machines, or one domain might be split into several page domains, to further increase concurrency.
Right now, each pagedaemon thread tries to reach the global target, precalculated at the start of the pass. This is not optimal, since it could cause excessive page deactivation and freeing. The code should be changed to re-check the global page deficit state in the loop after some number of iterations.
The pagedaemons reach the quorum before starting the OOM, since one thread inability to meet the target is normal for split queues. Only when all pagedaemons fail to produce enough reusable pages, OOM is started by single selected thread.
Launder is modified to take into account the segments layout with regard to the region for which cleaning is performed.
Based on the preliminary patch by jeff, sponsored by EMC / Isilon Storage Division.
Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation
|
250601 |
13-May-2013 |
attilio |
o Add accessor functions to add and remove pages from a specific freelist. o Split the pool of free pages queues really by domain and not rely on definition of VM_RAW_NFREELIST. o For MAXMEMDOM > 1, wrap the RR allocation logic into a specific function that is called when calculating the allocation domain. The RR counter is kept, currently, per-thread. In the future it is expected that such function evolves in a real policy decision referee, based on specific informations retrieved by per-thread and per-vm_object attributes. o Add the concept of "probed domains" under the form of vm_ndomains. It is responsibility for every architecture willing to support multiple memory domains to correctly probe vm_ndomains along with mem_affinity segments attributes. Those two values are supposed to remain always consistent. Please also note that vm_ndomains and td_dom_rr_idx are both int because segments already store domains as int. Ideally u_int would have much more sense. Probabilly this should be cleaned up in the future. o Apply RR domain selection also to vm_phys_zero_pages_idle().
Sponsored by: EMC / Isilon storage division Partly obtained from: jeff Reviewed by: alc Tested by: jeff
|
250389 |
08-May-2013 |
attilio |
Revert r250339 as apparently it is more clutter than help.
Sponsored by: EMC / Isilon storage division Requested by: jhb
|
250339 |
07-May-2013 |
attilio |
Add functions to do ACPI System Locality Information Table parsing and printing at boot. For reference on table informations and purposes please review ACPI specs.
Sponsored by: EMC / Isilon storage division Obtained from: jeff Reviewed by: jhb (earlier version)
|
250338 |
07-May-2013 |
attilio |
Rename VM_NDOMAIN into MAXMEMDOM and move it into machine/param.h in order to match the MAXCPU concept. The change should also be useful for consolidation and consistency.
Sponsored by: EMC / Isilon storage division Obtained from: jeff Reviewed by: alc
|
246805 |
14-Feb-2013 |
jhb |
Make VM_NDOMAIN a kernel option so that it can be enabled from a kernel config file.
Requested by: phk (ages ago) MFC after: 1 month
|
239340 |
16-Aug-2012 |
jkim |
Merge ACPICA 20120816.
|
237037 |
13-Jun-2012 |
jkim |
- Remove unused code for CR3 and CR4. - Fix few style(9) nits while I am here.
|
236938 |
12-Jun-2012 |
iwasaki |
Share IPI init and startup code of mp_machdep.c with acpi_wakeup.c as ipi_startup().
|
236772 |
09-Jun-2012 |
iwasaki |
Add x86/acpica/acpi_wakeup.c for amd64 and i386. Difference of suspend/resume procedures are minimized among them.
common: - Add global cpuset suspended_cpus to indicate APs are suspended/resumed. - Remove acpi_waketag and acpi_wakemap from acpivar.h (no longer used). - Add some variables in acpi_wakecode.S in order to minimize the difference among amd64 and i386. - Disable load_cr3() because now CR3 is restored in resumectx().
amd64: - Add suspend/resume related members (such as MSR) in PCB. - Modify savectx() for above new PCB members. - Merge acpi_switch.S into cpu_switch.S as resumectx().
i386: - Merge(and remove) suspendctx() into savectx() in order to match with amd64 code.
Reviewed by: attilio@, acpi@
|
233623 |
28-Mar-2012 |
jhb |
Allocate the ioapics[] array dynamically since it is only needed for the duration of madt_setup_io(). This avoids having the array take up permanent space in the BSS.
Inspired by: bde MFC after: 2 weeks
|
233305 |
22-Mar-2012 |
jhb |
Mark the 'lapics' and 'ioapics' arrays here static since they are private to this file. The 'lapics' array was actually shadowing a completely different 'lapics' array that is private to local_apic.c.
Reported by: bde MFC after: 2 weeks
|
229427 |
03-Jan-2012 |
jhb |
Fix a few bugs in the SRAT parsing code: - Actually increment ndomain when building our list of known domains so that we can properly renumber them to be 0-based and dense. - If the number of domains exceeds the configured maximum (VM_NDOMAIN), bail out of processing the SRAT and disable NUMA rather than hitting an obscure panic later. - Don't bother parsing the SRAT at all if VM_NDOMAIN is set to 1 to disable NUMA (the default).
Reported by: phk (2) MFC after: 1 week
|
228283 |
05-Dec-2011 |
ed |
Get rid of kludgy per-descriptor state handling in acpi_apm.
Where i386/bios/apm.c requires no per-descriptor state, the ACPI version of these device do. Instead of using hackish clone lists that leave stale device nodes lying around, use the cdevpriv API.
|
227293 |
07-Nov-2011 |
ed |
Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.
This means that their use is restricted to a single C file.
|
226039 |
05-Oct-2011 |
jhb |
Ignore SRAT memory entries if the memory range does not overlap with an existing phys_avail[] table. If a hw.physmem setting causes a memory domain to not be present in phys_avail[], the SRAT table will now be ignored rather than triggering a panic when a CPU in the missing domain tries to allocate a page.
MFC after: 1 week
|
225177 |
25-Aug-2011 |
attilio |
Fix a deficiency in the selinfo interface: If a selinfo object is recorded (via selrecord()) and then it is quickly destroyed, with the waiters missing the opportunity to awake, at the next iteration they will find the selinfo object destroyed, causing a PF#.
That happens because the selinfo interface has no way to drain the waiters before to destroy the registered selinfo object. Also this race is quite rare to get in practice, because it would require a selrecord(), a poll request by another thread and a quick destruction of the selrecord()'ed selinfo object.
Fix this by adding the seldrain() routine which should be called before to destroy the selinfo objects (in order to avoid such case), and fix the present cases where it might have already been called. Sometimes, the context is safe enough to prevent this type of race, like it happens in device drivers which installs selinfo objects on poll callbacks. There, the destruction of the selinfo object happens at driver detach time, when all the filedescriptors should be already closed, thus there cannot be a race. For this case, mfi(4) device driver can be set as an example, as it implements a full correct logic for preventing this from happening.
Sponsored by: Sandvine Incorporated Reported by: rstone Tested by: pluknet Reviewed by: jhb, kib Approved by: re (bz) MFC after: 3 weeks
|
217265 |
11-Jan-2011 |
jhb |
Remove unneeded includes of <sys/linker_set.h>. Other headers that use it internally contain nested includes.
Reviewed by: bde
|
215097 |
10-Nov-2010 |
jkim |
Make APM emulation look more closer to its origin. Use device_get_softc(9) instead of hardcoding acpi(4) unit number as we have device_t for it.
|
215072 |
10-Nov-2010 |
jkim |
Refactor acpi_machdep.c for amd64 and i386, move APM emulation into a new file acpi_apm.c, and place it on sys/x86/acpica.
|
215024 |
09-Nov-2010 |
jkim |
Now OsdEnvironment.c is identical on amd64 and i386. Move it to a new home.
|
215012 |
08-Nov-2010 |
jhb |
Move the MADT parser for amd64 and i386 to sys/x86/acpica now that it is identical on both platforms.
|
210620 |
29-Jul-2010 |
jhb |
When performing a sanity check on the SRAT table to ensure that each memory domain has an assigned CPU, ignore disabled CPUs. Previously disabled CPUs were counted as being in domain 0.
Reported by: mdf
|
210552 |
27-Jul-2010 |
jhb |
Add a parser for the ACPI SRAT table for amd64 and i386. It sets PCPU(domain) for each CPU and populates a mem_affinity array suitable for the NUMA support in the physical memory allocator.
Reviewed by: alc
|