342587 |
29-Dec-2018 |
jhb |
MFC 340441: Revert r332735 and fix MSI-X to properly fail allocations when full.
The off-by-one errors in 332735 weren't actual errors and were preventing the last MSI interrupt source from being used. Instead, the issue is that when all MSI interrupt sources were allocated, the loop in msix_alloc() would terminate with 'msi' still set to non-null. The only check for 'i' overflowing was in the 'msi' == NULL case, so msix_alloc() would try to reuse the last MSI interrupt source instead of failing.
Fix by moving the check for all sources being in use to just after the loop. |
333126 |
30-Apr-2018 |
jhb |
MFC 332735: Fix two off-by-one errors when allocating MSI and MSI-X interrupts.
x86 enforces an (arbitray) limit on the number of available MSI and MSI-X interrupts to simplify code (in particular, interrupt_source[] is statically sized). This means that an attempt to allocate an MSI vector needs to fail if it would go beyond the limit, but the checks for exceeding the limit had an off-by-one error. In the case of MSI-X which allocates interrupts one at a time this meant that IRQ 768 kept getting handed out multiple times for msix_alloc() instead of failing because all MSI IRQs were in use. |
332743 |
19-Apr-2018 |
jhb |
MFC 331466: Add a workaround to the hypervisor detection for older versions of KVM.
Originally KVM set %eax to 0 in the cpuid leaf 0x4000000 rather than to the highest supported leaf in the hypervisor "branch". Detect this case and fixup the %eax value so that the hypervisor is still detected. |
331910 |
03-Apr-2018 |
avg |
MFC r327056: Use resume_cpus() instead of restart_cpus() to resume from ACPI suspension.
The merge needed fixing because common x86 code changed by the original commit is still split between i386 and amd64 in this branch. |
331305 |
21-Mar-2018 |
avg |
MFC r330793: fix r297857, do not modify CPU extension bits under virtual machines
PR: 213155 |
330959 |
15-Mar-2018 |
marius |
MFC: 327314
With the advent of interrupt remapping, Intel has repurposed bit 11 (now: Interrupt_Index[15]) and assigned the previously reserved bits 55:48 (Interrupt_Index[14:0] goes into 63:49 while Destination Field used 63:56 and bit 48 now is Interrupt_Format) in the IO redirection tables (see the VT-d specification, "5.1.5.1 I/OxAPIC Programming"). Thus, when not using interrupt remapping, ensure that all previously reserved bits in the high part of the RTEs are zero instead of doing a read-modify-write for their Destination Field bits only. Otherwise, on machines based on Apollo Lake and its derivatives such as Denverton, typically some of the previously preserved bits remain set after boot when not employing interrupt remapping. The result is that INTx interrupts are not getting delivered. Note: With an AMD IOMMU, interrupt remapping apparently bypasses the IO APIC altogether.
Submitted by: loos (modulo comment) Reviewed by: jhb (modulo comment) |
326638 |
06-Dec-2017 |
jkim |
MFC: r267961, r309361, r322710, r323286, r326378, r326383, r326407
Sync. hwpstate with head.
r267961 (hselasky, partial):
Remove a redundant TUNABLE statement.
r309361 (danfe):
- Mention mismatching numbers in MSR vs. ACPI _PSS count warning. - Rephrase unsupported AMD CPUs message and wrap as an overly long line. - Improve readability when reporting resulted P-state transition (debug).
r322710, r323286 (cem):
- Add support for family 17h pstate info from MSRs. - Yield CPU awaiting frequency change.
r326378, r326383, r326407:
- Fix some style(9) nits. - Add a tunable "debug.hwpstate_verify" to check P-state after changing it and turn it off by default. |
323733 |
19-Sep-2017 |
avg |
MFC r319212: fix indentation |
322523 |
14-Aug-2017 |
jkim |
MFC: r322323
Split identify_cpu() into two functions for amd64 as we do for i386. This fixes a regression introduced in r322205.
Approved by: re (marius) |
322205 |
07-Aug-2017 |
jkim |
MFC: r322076
Detect hypervisor early so that we set lower hz on it. > Description of fields to fill in above: 76 columns --| > PR: If and which Problem Report is related. > Submitted by: If someone else sent in the change. > Reported by: If someone else reported the issue. > Reviewed by: If someone else reviewed your modification. > Approved by: If you needed approval for this commit. > Obtained from: If the change is from a third party. > MFC after: N [day[s]|week[s]|month[s]]. Request a reminder email. > MFH: Ports tree branch name. Request approval for merge. > Relnotes: Set to 'yes' for mention in release notes. > Security: Vulnerability reference (one per line) or description. > Sponsored by: If the change was sponsored by an organization. > Differential Revision: https://reviews.freebsd.org/D### (*full* phabric URL needed). > Empty fields above will be automatically removed.
_M . M sys/amd64/amd64/machdep.c M sys/amd64/include/md_var.h M sys/i386/i386/machdep.c M sys/i386/include/md_var.h M sys/x86/x86/identcpu.c |
318977 |
27-May-2017 |
hselasky |
MFC r318353: Avoid use of contiguous memory allocations in busdma when possible.
This patch improves the boundary checks in busdma to allow more cases using the regular page based kernel memory allocator. Especially in the case of having a non-zero boundary in the parent DMA tag. For example AMD64 based platforms set the PCI DMA tag boundary to PCI_DMA_BOUNDARY, 4GB, which before this patch caused contiguous memory allocations to be preferred when allocating more than PAGE_SIZE bytes. Even if the required alignment was less than PAGE_SIZE bytes.
This patch also fixes the nsegments check for using kmem_alloc_attr() when the maximum segment size is less than PAGE_SIZE bytes.
Updated some comments describing the code in question.
Differential Revision: https://reviews.freebsd.org/D10645 Reviewed by: kib, jhb, gallatin, scottl Sponsored by: Mellanox Technologies |
315928 |
25-Mar-2017 |
grehan |
MFC r315361 and r315364: Hide MONITORX/MWAITX from guests.
r315361 Add the AMD MONITORX/MWAITX feature definition introduced in Bulldozer/Ryzen CPUs.
r315364 Hide the AMD MONITORX/MWAITX capability. Otherwise, recent Linux guests will use these instructions, resulting in #UD exceptions since bhyve doesn't implement MONITOR/MWAIT exits.
This fixes boot-time hangs in recent Linux guests on Ryzen CPUs (and probably Bulldozer aka AMD FX as well). |
314667 |
04-Mar-2017 |
avg |
MFC r283291: don't use CALLOUT_MPSAFE with callout_init()
The main purpose of this MFC is to reduce conflicts for other merges. Parts of the original change have already "trickled down" via individual MFCs. |
314662 |
04-Mar-2017 |
avg |
MFC r314357: edge-triggered interrupt mode is set by clearing APIC_LVT_TM |
314386 |
28-Feb-2017 |
avg |
MFC r313751: mca: fix writes to MSR_MC_CTL2 in cmci_update |
314350 |
27-Feb-2017 |
avg |
MFC r313752,r314035: mca: use time_uptime instead of ticks for CMCI throttling |
313165 |
03-Feb-2017 |
pfg |
MFC r312001: Remove __nonnull() attributes from x86 machine check architecture.
In this case the attributes serve little purpose as they just don't enforce run time checks, If anything the attributes would cause NULL pointer checks to be ignored but there are no such checks so the only effect is cosmetic.
Reviewed by: jhb, avg |
309443 |
02-Dec-2016 |
jhb |
MFC 308005: Add powerd(8) support for several families of AMD CPUs.
Use the same logic to calculate the nominal CPU frequency from the P-state MSRs on family 0x12, 0x15, and 0x16 CPUs as is used for family 0x10. Family 0x14 was included in the original patch in the PR but I left that out as the BIOS writer's guide for family 0x14 CPUs show a different layout for the relevant MSR and include a different formulate for calculating the frequency.
While here, simplify a few expressions and print out the family of unsupported CPUs in hex rather than decimal.
PR: 212020 |
309168 |
25-Nov-2016 |
jhb |
MFC 307333: Reprogram I/O APIC interrupt pins when registering an I/O APIC.
All I/O APIC pins are masked when an I/O APIC is first probed. The APIC enumerator (MP Table or MADT) then parses its associated tables to configure individual pins to set custom delivery modes or alternate routing (e.g. routing IRQ 0 to intpin 2). Pins for regular interrupt pins are left masked until the first interrupt is assigned. However, pins with unusual settings (e.g. NMI or SMI) are never assigned an interrupt and thus never re-programmed. The I/O APIC code used to reprogram all interrupt pins during registration but this was lost in r151979.
In theory, this is mostly a no-op as the ACPI APIC table does not include a way to enumerate NMI or SMI pins for the I/O APIC, so only systems using an MP Table would be affected. |
307258 |
14-Oct-2016 |
sephe |
MFC 306481
x86/ioapic: Fix destination cpu for Hyper-V
On Hyper-V: - Stick to the first cpu for all I/O APIC pins. - And don't allow destination cpu changes.
Reviewed by: jhb Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D7949 |
307244 |
14-Oct-2016 |
sephe |
MFC 305722
x86: Use sx lock for interrupt sources.
- Certain pic_assign_cpu, e.g. msi_assign_cpu can have quite a long call chain. For msi_assign_cpu, mutex makes complex PCI bridge drivers more tricky, e.g. sleep can note be called, etc, it will be pretty tricky for upcoming Hyper-V PCI bridge driver for PCI pass-through. - It is not used on any hot code path nor non-sleepable context, so sx should have the same effect as mutex.
PIC list is still protected by mutex to keep suspend/resume work.
Discussed with: jhb Reviewed by: jhb Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D7784 |
307213 |
13-Oct-2016 |
royger |
MFC r303491:
Revert r291022: x86/intr: allow mutex recursion in intr_remove_handler
Sponsored by: Citrix Systems R&D |
306536 |
30-Sep-2016 |
jkim |
MFC: r284583, r285797, r285799, r287168, r298714, r298720, r298838, r300879
Merge ACPICA up to 20160527.
Requested by: mav |
306466 |
30-Sep-2016 |
jhb |
MFC 303886: Add additional constants.
- Add constants for the fields in the root-entry table address register, namely the root type type (RTT) and root table address (RTA) mask. - Add macros for the bitmask of the domain ID field in the second word of context table entries as well as a helper macro (DMAR_CTX2_GET_DID) to extract the domain ID from a context table entry.
Sponsored by: Chelsio Communications |
305826 |
15-Sep-2016 |
kib |
MFC r305744: Fix typo in comment. |
305672 |
09-Sep-2016 |
jhb |
MFC 304637: Fix build for !SMP kernels after the Xen MSIX workaround.
Move msix_disable_migration under #ifdef SMP since it doesn't make sense for !SMP kernels.
PR: 212014 |
305615 |
08-Sep-2016 |
pfg |
MFC r303891, r303892: sys: replace comma with semicolon when pertinent.
Uses of commas instead of a semicolons can easily go undetected. The comma can serve as a statement separator but this shouldn't be abused when statements are meant to be standalone. |
303776 |
05-Aug-2016 |
jhb |
MFC 302181,302635: Disable MSI-X migration on older Xen hypervisors.
302181: Add a tunable to disable migration of MSI-X interrupts.
The new 'machdep.disable_msix_migration' tunable can be set to 1 to disable migration of MSI-X interrupts.
Xen versions prior to 4.6.0 do not properly handle updates to MSI-X table entries after the initial write. In particular, the operation to unmask a table entry after updating it during migration is not propagated to the "real" table for passthrough devices causing the interrupt to remain masked. At least some systems in EC2 are affected by this bug when using SRIOV. The tunable can be set in loader.conf as a workaround.
302635: xen: automatically disable MSI-X interrupt migration
If the hypervisor version is smaller than 4.6.0. Xen commits 74fd00 and 70a3cb are required on the hypervisor side for this to be fixed, and those are only included in 4.6.0, so stay on the safe side and disable MSI-X interrupt migration on anything older than 4.6.0.
It should not cause major performance degradation unless a lot of MSI-X interrupts are allocated. |
301776 |
10-Jun-2016 |
kib |
MFC r301278 Reduce number of iterations used for calibrating ICR read loop.
MFC r301279: Record correct commit message for r301278. |
299485 |
11-May-2016 |
vangyzen |
MFC r299004: Work around (ignore) broken SRAT tables |
299062 |
04-May-2016 |
avg |
MFC r297857: re-enable AMD Topology extension on certain models if disabled by BIOS |
298715 |
27-Apr-2016 |
jhb |
MFC 297039,297374,297398,297484: Poll the IPI status while waiting constantly instead of delaying 5 microseconds between checks. This avoids inserting a minimum latency of 5 microseconds on each IPI.
297039: Check IPI status more frequently when waiting.
An IPI cannot be sent via the local APIC if a previous IPI is still being delivered. Attempts to send an IPI will wait for a pending IPI to clear. Prior to r278325 these checks used a spin loop with a hardcoded maximum count which broke AP startup on some systems. However, r278325 also enforced a minimum latency of 5 microseconds if an IPI was still pending which resulted in a measurable performance hit. This change reduces that minimum latency to 1 microsecond.
297374: Calibrate the frequency of the of the native_lapic_ipi_wait() loop, and avoid a delay while waiting for IPI delivery acknowledgement in xAPIC mode. This makes the loop exit immediately after the delivery bit in APIC_ICR register is set, instead of waiting for some microseconds.
We only need to ensure that some amount of time is allowed for the LAPIC to react to the command, and we need that the wait time is finite and reasonable. For that reasons, it is irrelevant if the CPU frequency or throttling decrease the speed and make the loop, calibrated for full CPU speed at boot time, execute somewhat slower.
297398: Fix several bugs in r297374: - fix UP build [1] - do not obliterate initial reading of rdtsc by the loop counter [2] - restore the meaning of the argument -1 to native_lapic_ipi_wait() as wait until LAPIC acknowledge without timeout - correct formula for calculating loop iteration count for 1us, it was inverted, and ensure that even on unlikely slow CPUs at least one check for ack is performed.
297484: Style(9), use tabs for the #define LOOPS line. Print unsigned values with %u. Make code slightly more compact by inlining loop limit. |
298506 |
23-Apr-2016 |
kib |
MFC r298101: Add x86 CPU features definitions published in the Intel SDM rev. 58. |
295789 |
19-Feb-2016 |
sephe |
MFC [Hyper-V]: r293719-r293722, r293869-r293871, r293873-r293875, r293877
r293719 hyperv/hn: Implement LRO r293720 hyperv/hn: Implement SIOC[SG]IFMEDIA support r293721 hyperv/hn: Avoid mbuf cluster allocation, if the packet is small. r293722 hyperv/hn: Removed unused netvsc_init() r293869 hyperv/hn: Unbreak LINT-NOIP r293870 hyperv: use x86 generic code to do the hypervisor detection r293871 hyperv: remove unused vmbus definitions r293873 hyperv: implement an event timer r293874 hyperv: add interrupt counters r293875 hyperv: set receive buffer size according to NVSP protocol version r293877 Unbreak `make depend` with sys/modules/hyperv/vmbus after r293870
Approved by: re (glebius), adrian (mentor) Sponsored by: Microsoft OSTC |
294677 |
24-Jan-2016 |
ian |
MFC r289618, r290316:
Fix printf format to allow for bus_size_t not being u_long on all platforms.
Fix an alignment check that is wrong in half the busdma implementations. This will enable the elimination of a workaround in the USB driver that artifically allocates buffers twice as big as they need to be (which actually saves memory for very small buffers on the buggy platforms).
When deciding how to allocate a dma buffer, armv4, armv6, mips, and x86/iommu all correctly check for the tag alignment <= maxsize as enabling simple uma/malloc based allocation. Powerpc, sparc64, x86/bounce, and arm64/bounce were all checking for alignment < maxsize; on those platforms when alignment was equal to the max size it would fall back to page-based allocators even for very small buffers.
This change makes all platforms use the <= check. It should be noted that on all platforms other than arm[v6] and mips, this check is relying on undocumented behavior in malloc(9) that if you allocate a block of a given size it will be aligned to the next larger power-of-2 boundary. There is nothing in the malloc(9) man page that makes that explicit promise (but the busdma code has been relying on this behavior all along so I guess it works).
Arm and mips code uses the allocator in kern/subr_busdma_buffalloc.c, which does explicitly implement this promise about size and alignment. Other platforms probably should switch to the aligned allocator. |
294274 |
18-Jan-2016 |
emaste |
MFC r293343: Move amd64 metadata.h to x86 and share with i386 |
293195 |
05-Jan-2016 |
kib |
MFC r292890: Add standard extended feature bit 6 from the Intel SDM rev. 57. |
292775 |
27-Dec-2015 |
marius |
MFC: r286785, r291088, r291120 - Reformat x86 bounce buffer synchronization code to reduce indentation. No functional change. - Avoid a NULL pointer dereference in bounce_bus_dmamap_sync() when the map has been created via bounce_bus_dmamem_alloc(). Even for coherent DMA - which bus_dmamem_alloc(9) typically is used for -, calling of bus_dmamap_sync(9) isn't optional. [1] - Avoid a NULL pointer dereference in bounce_bus_dmamap_unload() when the map has been created via bounce_bus_dmamem_alloc(). In that case bus_dmamap_unload(9) typically isn't called during normal operation but still should be during detach, cleanup from failed attach etc. [2]
PR: 188899 (non-original problem) [1] Submitted by: yongari [2] |
291687 |
03-Dec-2015 |
royger |
MFC r291024:
xen: fix dropping bitmap IPIs during resume
Sponsored by: Citrix Systems R&D |
291647 |
02-Dec-2015 |
royger |
Revert MFC of r291023:
Due to the delta between HEAD and stable/10 event channel code, this fix is not needed on stable/10 and was also causing build issues. Revert it. |
291645 |
02-Dec-2015 |
royger |
MFC r291023:
xen/intr: properly dispose event channels on resume
Sponsored by: Citrix Systems R&D |
291644 |
02-Dec-2015 |
royger |
MFC r291022:
x86/intr: allow mutex recursion in intr_remove_handler
Sponsored by: Citrix Systems R&D |
291378 |
27-Nov-2015 |
kib |
MFC r291266: Correct the number of DTLB entries reported for the CPUID Leaf 2 descriptor 0x6c. |
291239 |
24-Nov-2015 |
royger |
MFC r286999:
xen: allow disabling PV disks and nics
Sponsored by: Citrix Systems R&D |
290187 |
30-Oct-2015 |
kib |
MFC r289823: Decode new values for CPUID leaf 2 cache and TLB descriptors, from the Intel SDM revision 56. |
288461 |
01-Oct-2015 |
jhb |
MFC 284175: Handle X2APIC entries in the MADT for APICs with an ID < 255. At least one BIOS has been seen to include such entries even though the relevant specs require that X2APIC entries only be used for CPUs with an APIC ID >= 255.
This was tested on a system with "plain" local APIC entries in the MADT to ensure no regressions, but it has not yet been tested on a system with X2APIC entries in the MADT. Currently such systems do not boot at all, and with this change they might now boot correctly. |
287462 |
04-Sep-2015 |
sbruno |
MFC r276834
Update Features2 to display SDBG capability of processor. This is showing up on Haswell-class CPUs
From the Intel SDM, "Table 3-20. Feature Information Returned in the ECX Register"
11 | SDBG | A value of 1 indicates the processor supports IA32_DEBUG_INTERFACE MSR for silicon debug.
Submitted by: jiashiun@gmail.com |
287139 |
25-Aug-2015 |
jkim |
MFC: r286265, r286293, r286328
Always define __va_list for amd64 and restore pre-r232261 behavior for i386. |
287126 |
25-Aug-2015 |
marcel |
MFC r286667 & r286723
Better support memory mapped console devices, such as VGA and EFI frame buffers and memory mapped UARTs.
PR: 191564, 194952, 202276 |
286854 |
17-Aug-2015 |
kib |
MFC r286777: Comment only change, fix grammar and somewhat clarify the action. |
286852 |
17-Aug-2015 |
kib |
MFC r286228: Clear the IA32_MISC_ENABLE MSR bit on APs. |
286311 |
05-Aug-2015 |
kib |
Implement x86 ptrace(2) requests PT_{GET,SET}{FS,GS}BASE.
MFC r284918: Add helper fill_based_sd(9).
MFC r284919: Add x86 PT_GETFSBASE, PT_GETGSBASE machine-depended ptrace requests to obtain the thread %fs and %gs bases. Add x86 PT_SETFSBASE and PT_SETGSBASE requests to set the bases from debuggers. The set requests, similarly to the sysarch({I386,AMD64}_SET_FSBASE), override the corresponding segment registers.
MFC r284965: Document x86 machine-specific ptrace(2) requests.
MFC r285011: Disallow a debugger on 64bit system to set fs/gs bases of the 32bit process beyond the end of the process address space.
MFC r285104: Grammar and language fixes. |
286275 |
04-Aug-2015 |
kib |
MFC r285932: Add bit names for the IA32_MISC_ENABLE msr. |
285446 |
13-Jul-2015 |
brueffer |
MFC: r284931
Set the initial system time to a sane (as in: not end of 21st century) value when booting on a PC with CMOS clock set to a year before 2000.
This uses 1980 (instead of 1970 as in the initial patch) as pivot year as suggested by imp in the PR followup.
PR: 195703 Submitted by: cs@soi.spb.ru Reviewed by: imp Approved by: re (gjb) |
285174 |
05-Jul-2015 |
marius |
MFC: r281751
Refine the workaround for Intel HSD131 [1] added in r269052 (MFCed to stable/10 in r269592): - Use the full mask described by the erratum as with a sufficiently high number of these false-positives, the overflow bit (bit 62) additionally gets set [7]. - HSD131 has been brought into several other Haswell-derived CPUs including to the next generation, i. e. Intel Broadwell. Thus, also skip reporting of these benign errors by default on CPU models affected by HSM142, HSW131 and BDM48 [2 - 5], describing the HSD131 silicon bug for additional models. Also, Celeron 2955U with a CPU ID of 0x45 have been reported to be covered by this fault [6], with the specification update concerned with HSM142 [2] only referring to 0x3c and 0x46.
Submitted by: David Froehlich [7] Approved by: re (kib)
http://www.intel.de/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-desktop-specification-update.pdf [1] http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-mobile-specification-update.pdf [2] http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/5th-gen-core-family-spec-update.pdf [3] http://www.intel.de/content/dam/www/public/us/en/documents/specification-updates/core-m-processor-family-spec-update.pdf [4] http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e3-1200v3-spec-update.pdf [5] https://lists.freebsd.org/pipermail/freebsd-hackers/2015-January/046878.html [6] |
284900 |
28-Jun-2015 |
neel |
MFC r282209: Emulate the 'bit test' instruction.
MFC r282259: Re-implement RTC current time calculation to eliminate the possibility of losing time.
MFC r282281: Advertise the MTRR feature via CPUID and emulate the minimal set of MTRR MSRs.
MFC r282284: When an instruction cannot be decoded just return to userspace so bhyve(8) can dump the instruction bytes.
MFC r282287: Don't require <sys/cpuset.h> to be always included before <machine/vmm.h>.
MFC r282296: Emulate MSR_SYSCFG which is accessed by Linux on AMD cpus when MTRRs are enabled.
MFC r282301: Relax limits when transitioning a vector from the IRR to the ISR and also when extinguishing it from the ISR in response to an EOI.
MFC r282335: Advertise an additional memory BAR in the "dummy" device emulation.
MFC r282336: Emulate machine check related MSRs to allow guest OSes like Windows to boot.
MFC r282351: Don't advertise the Intel SMX capability to the guest.
MFC r282407: Emulate the 'CMP r/m8, imm8' instruction.
MFC r282519: Add macros for AMD-specific bits in MSR_EFER: LMSLE, FFXSR and TCE.
MFC r282520: Emulate guest writes to EFER_MSR properly.
MFC r282558: Deprecate the 3-way return values from vm_gla2gpa() and vm_copy_setup().
MFC r282571: Check 'td_owepreempt' and yield the vcpu thread if it is set.
MFC r282595: Allow byte reads of AHCI registers.
MFC r282784: Handling indirect descriptors is a capability of the host and not one that needs to be negotiated. Use the host capabilities field and not the negotiated field when verifying that indirect descriptors are supported.
MFC r282788: Allow configuration of the sector size advertised to the guest.
MFC r282865: Set the subvendor field in config space to the vendor ID. This is required by the Windows virtio drivers to correctly match a device.
MFC r282922: Bump the size of the blockif scatter-gather list to 67.
MFC r283075: Fix off-by-one in array index bounds check. bhyveload would allow you to create 33 entries on an array that only has 32 slots
MFC r283168: Temporarily revert r282922 which bumped the max descriptors.
MFC r283255: Emulate the "CMP r/m, reg" instruction (opcode 39H).
MFC r283256: Add an option "--get-vmcs-exit-inst-length" to display the instruction length of the instruction that caused the VM-exit.
MFC r283264: Change the header type of the emulated host-bridge from type 1 to type 0.
MFC r283293: Don't rely on the 'VM-exit instruction length' field in the VMCS to always have an accurate length on an EPT violation.
MFC r283299: Remove bogus verification of instruction length after instruction decode.
MFC r283308: Exceptions don't deliver an error code in real mode.
MFC r283657: Fix non-deterministic delays when accessing a vcpu that was in "running" or "sleeping" state.
MFC r283973: Use tunable 'hw.vmm.svm.features' to disable specific SVM features even though they might be available in hardware. Use tunable 'hw.vmm.svm.num_asids' to limit the number of ASIDs used by the hypervisor.
MFC r284046: Fix regression in 'verify_gla()' with the RIP-relative addressing mode.
MFC r284174: Support guest writes to the TSC by enabling the "use TSC offsetting" execution control. |
284338 |
13-Jun-2015 |
kib |
MFC r284104: Updates from SDM rev. 55. |
284021 |
05-Jun-2015 |
kib |
MFC r283735: Remove several write-only variables. |
284019 |
05-Jun-2015 |
kib |
MFC r283692: Explicitely enable queued invalidation completion interrupt. |
283927 |
02-Jun-2015 |
jhb |
MFC 281887: Reassign copyright statements on several files from Advanced Computing Technologies LLC to Hudson River Trading LLC. |
283910 |
02-Jun-2015 |
jhb |
MFC 281266: Move the 32-bit compatible procfs types from freebsd32.h to <sys/procfs.h> and export them to userland. - Define __HAVE_REG32 on platforms that define a reg32 structure and check for this in <sys/procfs.h> to control when to export prstatus32, etc. - Add prstatus32_t and prpsinfo32_t typedefs for the 32-bit structures. libbfd looks for these types, and having them fixes 'gcore' in gdb of a 32-bit process on a 64-bit platform. - Use the structure definitions from <sys/procfs.h> in gcore's elf32 core dump code instead of duplicating the definitions. |
282506 |
05-May-2015 |
hselasky |
MFC r282120: The add_bounce_page() function can be called when loading physical pages which pass a NULL virtual address. If the BUS_DMA_KEEP_PG_OFFSET flag is set, use the physical address to compute the page offset instead. The physical address should always be valid when adding bounce pages and should contain the same page offset like the virtual address.
Submitted by: Svatopluk Kraus <onwahe@gmail.com> Reviewed by: jhb@ |
282065 |
27-Apr-2015 |
kib |
MFC r281495: Add config option PAE_TABLES for the i386 kernel. It switches pmap to use PAE format for the page tables, but does not incur other consequences of the full PAE config. In particular, vm_paddr_t and bus_addr_t are left 32bit, and max supported memory is still limited by 4GB.
The option allows to have nx permissions for memory mappings on i386 kernel, while keeping the usual i386 KBI and avoiding the kernel data sizing problems typical for the PAE config. |
281687 |
18-Apr-2015 |
jkim |
MFC: r281396, r281475
Merge ACPICA 20150410.
Relnotes: yes |
281560 |
15-Apr-2015 |
jhb |
MFC 278325,280866: Revert the IPI startup sequence to match what is described in the Intel Multiprocessor Specification v1.4. The Intel SDM claims that
278325: Revert the IPI startup sequence to match what is described in the Intel Multiprocessor Specification v1.4. The Intel SDM claims that the INIT IPIs here are invalid, but other systems follow the MP spec instead.
While here, fix the IPI wait routine to accept a timeout in microseconds instead of a raw spin count, and don't spin forever during AP startup. Instead, panic if a STARTUP IPI is not delivered after 20 us.
280866: Wait 100 microseconds for a local APIC to dispatch each startup-related IPI rather than 20. The MP 1.4 specification states in Appendix B.2:
"A period of 20 microseconds should be sufficient for IPI dispatch to complete under normal operating conditions".
(Note that this appears to be separate from the 10 millisecond (INIT) and 200 microsecond (STARTUP) waits after the IPIs are dispatched.) The Intel SDM is silent on this issue as far as I can tell.
At least some hardware requires 60 microseconds as noted in the PR, so bump this to 100 to be on the safe side.
PR: 196542, 197756 |
281545 |
15-Apr-2015 |
kib |
MFC r281254: Account for the offset of the page run when allocating the dmar_map_entry. |
280973 |
02-Apr-2015 |
jhb |
MFC 276724: On some Intel CPUs with a P-state but not C-state invariant TSC the TSC may also halt in C2 and not just C3 (it seems that in some cases the BIOS advertises its C3 state as a C2 state in _CST). Just play it safe and disable both C2 and C3 states if a user forces the use of the TSC as the timecounter on such CPUs.
PR: 192316 |
280970 |
01-Apr-2015 |
jhb |
MFC 261790: Add support for managing PCI bus numbers. As with BARs and PCI-PCI bridge I/O windows, the default is to preserve the firmware-assigned resources. PCI bus numbers are only managed if NEW_PCIB is enabled and the architecture defines a PCI_RES_BUS resource type. - Add a helper API to create top-level PCI bus resource managers for each PCI domain/segment. Host-PCI bridge drivers use this API to allocate bus numbers from their associated domain. - Change the PCI bus and CardBus drivers to allocate a bus resource for their bus number from the parent PCI bridge device. - Change the PCI-PCI and PCI-CardBus bridge drivers to allocate the full range of bus numbers from secbus to subbus from their parent bridge. The drivers also always program their primary bus register. The bridge drivers also support growing their bus range by extending the bus resource and updating subbus to match the larger range. - Add support for managing PCI bus resources to the Host-PCI bridge drivers used for amd64 and i386 (acpi_pcib, mptable_pcib, legacy_pcib, and qpi_pcib). - Define a PCI_RES_BUS resource type for amd64 and i386.
PR: 197076 |
280969 |
01-Apr-2015 |
jhb |
MFC 260973: - Reuse legacy_pcib_(read|write)_config() methods in the QPI pcib driver. - Reuse legacy_pcib_alloc_msi{,x}() methods in the QPI and mptable pcib drivers. |
280874 |
31-Mar-2015 |
kib |
MFC r280435: When mapping an allocated entry, use the entry size, instead of the requested size. If tag restrictions caused split entry, its size is less then requsted. |
280873 |
31-Mar-2015 |
kib |
MFC r280434: Assert that the mapping loop makes progress. |
280684 |
26-Mar-2015 |
kib |
MFC r280254: Provide definitions for all descriptors types in the DMAR invalidation queue. |
280425 |
24-Mar-2015 |
kib |
MFC r280196: Recheck that boundary is not crossed after the move to satisfy boundary restriction. |
280424 |
24-Mar-2015 |
kib |
MFC r280195: When inserting new entry into the address map, ensure that not only next entry does not intersect with the tail of the new entry, but also that previous entry is also before new entry start. |
280343 |
22-Mar-2015 |
kib |
MFC r280253: Fix syntax error. |
279904 |
12-Mar-2015 |
scottl |
MFC r271889, 272799, 272800, 274976 This brings in bus_get_domain() and the related reporting via devinfo, dmesg, and sysctl.
Obtained from: adrian, jhb Sponsored by: Netflix, Inc. |
279486 |
01-Mar-2015 |
kib |
MFC r276949: (only to ease merging of r279117).
MFC r279117: Revert r276949 and redo the fix for PCIe/PCI bridges, which do not follow specification and do not provide PCIe capability. |
279485 |
01-Mar-2015 |
kib |
MFC r276948: Print rid when announcing DMAR context creation. Print sid when fault occurs. |
279484 |
01-Mar-2015 |
kib |
MFC r276867: Fix DMAR context allocations for the devices behind PCIe->PCI bridges after dmar driver was converted to use rids. The bus component to calculate context page must be taken from the requestor rid, which is a bridge, and not from the device bus number. |
279470 |
01-Mar-2015 |
rstone |
MFC r264007,r264008,r264009,r264011,r264012,r264013
MFC support for PCI Alternate RID Interpretation. ARI is an optional PCIe feature that allows PCI devices to present up to 256 functions on a bus. This is effectively a prerequisite for PCI SR-IOV support.
r264007: Add a method to get the PCI RID for a device.
Reviewed by: kib MFC after: 2 months Sponsored by: Sandvine Inc.
r264008: Re-implement the DMAR I/O MMU code in terms of PCI RIDs
Under the hood the VT-d spec is really implemented in terms of PCI RIDs instead of bus/slot/function, even though the spec makes pains to convert back to bus/slot/function in examples. However working with bus/slot/function is not correct when PCI ARI is in use, so convert to using RIDs in most cases. bus/slot/function will only be used when reporting errors to a user.
Reviewed by: kib MFC after: 2 months Sponsored by: Sandvine Inc.
r264009: Re-write bhyve's I/O MMU handling in terms of PCI RID.
Reviewed by: neel MFC after: 2 months Sponsored by: Sandvine Inc.
r264011: Add support for PCIe ARI
PCIe Alternate RID Interpretation (ARI) is an optional feature that allows devices to have up to 256 different functions. It is implemented by always setting the PCI slot number to 0 and re-purposing the 5 bits used to encode the slot number to instead contain the function number. Combined with the original 3 bits allocated for the function number, this allows for 256 functions.
This is enabled by default, but it's expected to be a no-op on currently supported hardware. It's a prerequisite for supporting PCI SR-IOV, and I want the ARI support to go in early to help shake out any bugs in it. ARI can be disabled by setting the tunable hw.pci.enable_ari=0.
Reviewed by: kib MFC after: 2 months Sponsored by: Sandvine Inc.
r264012: Print status of ARI capability in pciconf -c
Teach pciconf how to print out the status (enabled/disabled) of the ARI capability on PCI Root Complexes and Downstream Ports.
MFC after: 2 months Sponsored by: Sandvine Inc.
r264013: Add missing copyright date.
MFC after: 2 months |
279211 |
23-Feb-2015 |
jhb |
MFC 274817,274878,276801,276840,278976: Improve support for XSAVE with debuggers. - Dump an NT_X86_XSTATE note if XSAVE is in use. This note is designed to match what Linux does in that 1) it dumps the entire XSAVE area including the fxsave state, and 2) it stashes a copy of the current xsave mask in the unused padding between the fxsave state and the xstate header at the same location used by Linux. - Teach readelf() to recognize NT_X86_XSTATE notes. - Change PT_GET/SETXSTATE to take the entire XSAVE state instead of only the extra portion. This avoids having to always make two ptrace() calls to get or set the full XSAVE state. - Add a PT_GET_XSTATE_INFO which returns the length of the current XSTATE save area (so the size of the buffer needed for PT_GETXSTATE) and the current XSAVE mask (%xcr0). |
278947 |
18-Feb-2015 |
kib |
MFC r278606: Registers definitions for the new capabilities. |
278946 |
18-Feb-2015 |
kib |
MFC r278605: vm_page_lookup() accepts read-locked object. |
278522 |
10-Feb-2015 |
jhb |
MFC 273800: Rework virtual machine hypervisor detection. - Move the existing code to x86/x86/identcpu.c since it is x86-specific. - If the CPUID2_HV flag is set, assume a hypervisor is present and query the 0x40000000 leaf to determine the hypervisor vendor ID. Export the vendor ID and the highest supported hypervisor CPUID leaf via hv_vendor[] and hv_high variables, respectively. The hv_vendor[] array is also exported via the hw.hv_vendor sysctl. - Merge the VMWare detection code from tsc.c into the new probe in identcpu.c. Add a VM_GUEST_VMWARE to identify vmware and use that in the TSC code to identify VMWare. |
277493 |
21-Jan-2015 |
jhb |
MFC 272666: Fix build for i386 kernels with out 'I686_CPU'.
Reported by: Mike Tancsa <mike@sentex.net> |
277374 |
19-Jan-2015 |
kib |
MFC r277047: For x86, read MAXPHYADDR into variable cpu_maxphyaddr. |
277315 |
18-Jan-2015 |
kib |
MFC r277023: Avoid excessive flushing and do missed neccessary flushing in the IOMMU page table update code. |
276482 |
31-Dec-2014 |
neel |
MFC r273748
Output a summary of optional SVM features in dmesg similar to CPU features. If bootverbose is enabled, a detailed list is provided; otherwise, a single-line summary is displayed.
Requested by: jhb |
276386 |
30-Dec-2014 |
neel |
MFC 261321 Rename the AMD MSR_PERFCTR[0-3] so the Pentium Pro MSR_PERFCTR[0-1] aren't redefined.
MFC r273214 Fix build to not bogusly always rebuild vmm.ko.
MFC r273338 Add support for AMD's nested page tables in pmap.c: - Provide the correct bit mask for various bit fields in a PTE (e.g. valid bit) for a pmap of type PT_RVI. - Add a function 'pmap_type_guest(pmap)' that returns TRUE if the pmap is of type PT_EPT or PT_RVI.
Add CPU_SET_ATOMIC_ACQ(num, cpuset): This is used when activating a vcpu in the nested pmap. Using the 'acquire' variant guarantees that the load of the 'pm_eptgen' will happen only after the vcpu is activated in 'pm_active'.
Add defines for various AMD-specific MSRs.
Discussed with: kib (r261321) |
276349 |
28-Dec-2014 |
neel |
MFC r270326 Fix a recursive lock acquisition in vi_reset_dev().
MFC r270434 Return the spurious interrupt vector (IRQ7 or IRQ15) if the atpic cannot find any unmasked pin with an interrupt asserted.
MFC r270436 Fix a bug in the emulation of CPUID leaf 0x4.
MFC r270437 Add "hw.vmm.topology.threads_per_core" and "hw.vmm.topology.cores_per_package" tunables to modify the default cpu topology advertised by bhyve.
MFC r270855 Set the 'inst_length' to '0' early on before any error conditions are detected in the emulation of the task switch. If any exceptions are triggered then the guest %rip should point to instruction that caused the task switch as opposed to the one after it.
MFC r270857 The "SUB" instruction used in getcc() actually does 'x -= y' so use the proper constraint for 'x'. The "+r" constraint indicates that 'x' is an input and output register operand.
While here generate code for different variants of getcc() using a macro GETCC(sz) where 'sz' indicates the operand size.
Update the status bits in %rflags when emulating AND and OR opcodes.
MFC r271439 Initialize 'bc_rdonly' to the right value.
MFC r271451 Optimize the common case of injecting an interrupt into a vcpu after a HLT by explicitly moving it out of the interrupt shadow.
MFC r271888 Restructure the MSR handling so it is entirely handled by processor-specific code.
MFC r271890 MSR_KGSBASE is no longer saved and restored from the guest MSR save area. This behavior was changed in r271888 so update the comment block to reflect this.
MFC r271891 Add some more KTR events to help debugging.
MFC r272197 mmap(2) requires either MAP_PRIVATE or MAP_SHARED for non-anonymous mappings.
MFC r272395 Get rid of code that dealt with the hardware not being able to save/restore the PAT MSR on guest exit/entry. This workaround was done for a beta release of VMware Fusion 5 but is no longer needed in later versions.
All Intel CPUs since Nehalem have supported saving and restoring MSR_PAT in the VM exit and entry controls.
MFC r272670 Inject #UD into the guest when it executes either 'MONITOR' or 'MWAIT'.
MFC r272710 Implement the FLUSH operation in the virtio-block emulation.
MFC r272838 iasl(8) expects integer fields in data tables to be specified as hexadecimal values. Therefore the bit width of the "PM Timer Block" was actually being interpreted as 50-bits instead of the expected 32-bit.
This eliminates an error message emitted by a Linux 3.17 guest during boot: "Invalid length for FADT/PmTimerBlock: 50, using default 32"
MFC r272839 Support Intel-specific MSRs that are accessed when booting up a linux in bhyve: - MSR_PLATFORM_INFO - MSR_TURBO_RATIO_LIMITx - MSR_RAPL_POWER_UNIT
MFC r273108 Emulate "POP r/m". This is needed to boot OpenBSD/i386 MP kernel in bhyve.
MFC r273212 Support stopping and restarting the AHCI command list via toggling PxCMD.ST from '1' to '0' and back. This allows the driver a chance to recover if for instance a timeout occurred due to activity on the host. |
276134 |
23-Dec-2014 |
kib |
MFC r271208: Add a define for index of IA32_XSS MSR. |
276133 |
23-Dec-2014 |
kib |
MFC r271206: Adjust the definition of struct xstate_hdr according to SDM rev. 50. |
276132 |
23-Dec-2014 |
kib |
MFC r271197: Add more bits for the XSAVE features from CPUID 0xd, sub-function 1 %eax report. Print the XSAVE features 0xd/1 in the boot banner. |
276084 |
22-Dec-2014 |
jhb |
MFC 273988,273989,273995,274057: MFamd64: Add support for extended FPU states on i386. This includes support for AVX on i386. |
276076 |
22-Dec-2014 |
jhb |
MFC 271405,271408,271409,272658: MFamd64: Use initializecpu() to set various model-specific registers on AP startup and AP resume (it was already used for BSP startup and BSP resume). |
276070 |
22-Dec-2014 |
jhb |
MFC 260557,271076,271077,271082,271083,271098: - Remove spaces from boot messages when we print the CPU ID/Family/Stepping - Move prototypes for various functions into out of C files and into <machine/md_var.h>. - Reduce diffs between i386 and amd64 initcpu.c and identcpu.c files. - Move blacklists of broken TSCs out of the printcpuinfo() function and into the TSC probe routine. - Merge the amd64 and i386 identcpu.c into a single x86 implementation. |
273736 |
27-Oct-2014 |
hselasky |
MFC r263710, r273377, r273378, r273423 and r273455:
- De-vnet hash sizes and hash masks. - Fix multiple issues related to arguments passed to SYSCTL macros.
Sponsored by: Mellanox Technologies |
271999 |
22-Sep-2014 |
jhb |
MFC 270850,271053,271192,271717: Save and restore FPU state across suspend and resume on i386. - Create a separate structure for per-CPU state saved across suspend and resume that is a superset of a pcb. - Store the FPU state for suspend and resume in the new structure (for amd64, this moves it out of the PCB) - On both i386 and amd64, all of the FPU suspend/resume handling is now done in C.
Approved by: re (hrs) |
270873 |
31-Aug-2014 |
akiyama |
MFC r263859: Change default logic to CONFORM because this routine is shared with SCI polarity setting.
Reviewed by: jhb
MFC r269184: Add missing newline to output dmesg properly. |
270159 |
19-Aug-2014 |
grehan |
MFC r267921, r267934, r267949, r267959, r267966, r268202, r268276, r268427, r268428, r268521, r268638, r268639, r268701, r268777, r268889, r268922, r269008, r269042, r269043, r269080, r269094, r269108, r269109, r269281, r269317, r269700, r269896, r269962, r269989.
Catch bhyve up to CURRENT.
Lightly tested with FreeBSD i386/amd64, Linux i386/amd64, and OpenBSD/amd64. Still resolving an issue with OpenBSD/i386.
Many thanks to jhb@ for all the hard work on the prior MFCs !
r267921 - support the "mov r/m8, imm8" instruction r267934 - document options r267949 - set DMI vers/date to fixed values r267959 - doc: sort cmd flags r267966 - EPT misconf post-mortem info r268202 - use correct flag for event index r268276 - 64-bit virtio capability api r268427 - invalidate guest TLB when cr3 is updated, needed for TSS r268428 - identify vcpu's operating mode r268521 - use correct offset in guest logical-to-linear translation r268638 - chs value r268639 - chs fake values r268701 - instr emul operand/address size override prefix support r268777 - emulation for legacy x86 task switching r268889 - nested exception support r268922 - fix INVARIANTS build r269008 - emulate instructions found in the OpenBSD/i386 5.5 kernel r269042 - fix fault injection r269043 - Reduce VMEXIT_RESTARTs in task_switch.c r269080 - fix issues in PUSH emulation r269094 - simplify return values from the inout handlers r269108 - don't return -1 from the push emulation handler r269109 - avoid permanent sleep in vm_handle_hlt() r269281 - list VT-x features in base kernel dmesg r269317 - Mark AHCI fatal errors as not completed r269700 - Support PCI extended config space in bhyve r269896 - Minor cleanup r269962 - use max guest memory when creating IOMMU domain r269989 - fix interrupt mode names |
269592 |
05-Aug-2014 |
marius |
MFC: r260457
The changes in r233781 attempted to make logging during a machine check exception more readable. In practice they prevented all logging during a machine check exception on at least some systems. Specifically, when an uncorrected ECC error is detected in a DIMM on a Nehalem/Westmere class machine, all CPUs receive a machine check exception, but only CPUs on the same package as the memory controller for the erroring DIMM log an error. The CPUs on the other package would complete the scan of their machine check banks and panic before the first set of CPUs could log an error. The end result was a clearer display during the panic (no interleaved messages), but a crashdump without any useful info about the error that occurred.
To handle this case, make all CPUs spin in the machine check handler once they have completed their scan of their machine check banks until at least one machine check error is logged. I tried using a DELAY() instead so that the CPUs would not potentially hang forever, but that was not reliable in testing.
While here, don't clear MCIP from MSR_MCG_STATUS before invoking panic. Only clear it if the machine check handler does not panic and returns to the interrupted thread.
MFC: r263113
Correct type for malloc().
Submitted by: "Conrad Meyer" <conrad.meyer@isilon.com>
MFC: r269052, r269239, r269242
Intel desktop Haswell CPUs may report benign corrected parity errors (see HSD131 erratum in [1]) at a considerable rate. So filter these (default), unless logging is enabled. Unfortunately, there really is no better way to reasonably implement suppressing these errors than to just skipping them in mca_log(). Given that they are reported for bank 0, they'd need to be masked in MSR_MC0_CTL. However, P6 family processors require that register to be set to either all 0s or all 1s, disabling way more than the one error in question when using all 0s there. Alternatively, it could be masked for the corresponding CMCI, but that still wouldn't keep the periodic scanner from detecting these spurious errors. Apart from that, register contents of MSR_MC0_CTL{,2} don't seem to be publicly documented, neither in the Intel Architectures Developer's Manual nor in the Haswell datasheets.
Note that while HSD131 actually is only about C0-stepping as of revision 014 of the Intel desktop 4th generation processor family specification update, these corrected errors also have been observed with D0-stepping aka "Haswell Refresh".
1: http://www.intel.de/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-desktop-specification-update.pdf
Reviewed by: jhb Sponsored by: Bally Wulff Games & Entertainment GmbH |
268076 |
01-Jul-2014 |
scottl |
Merge r266746, 266775:
Now that there are separate back-end implementations of busdma, the bounce implementation shouldn't steal flags from the common front-end. Move those flags to the back-end.
Eliminate the fake contig_dmamap and replace it with a new flag, BUS_DMA_KMEM_ALLOC. They serve the same purpose, but using the flag means that the map can be NULL again, which in turn enables significant optimizations for the common case of no bouncing.
Obtained from: Netflix, Inc. |
267809 |
23-Jun-2014 |
rodrigc |
MFC r263795:
Strict value checking will cause problem. Bay trail DN2820FYKH is supported on Linux but does not work on FreeBSD. This behaviour is bug-compatible with Linux-3.13.5.
References: http://d.hatena.ne.jp/syuu1228/20140326 http://lxr.linux.no/linux+v3.13.5/arch/x86/kernel/acpi/boot.c#L1094
Submitted by: syuu PR: 187966 |
267808 |
23-Jun-2014 |
rodrigc |
Undo bad merge. |
267807 |
23-Jun-2014 |
rodrigc |
MFC r263795:
Strict value checking will cause problem. Bay trail DN2820FYKH is supported on Linux but does not work on FreeBSD. This behaviour is bug-compatible with Linux-3.13.5.
References: http://d.hatena.ne.jp/syuu1228/20140326 http://lxr.linux.no/linux+v3.13.5/arch/x86/kernel/acpi/boot.c#L1094
Submitted by: syuu PR: 187966 |
267418 |
12-Jun-2014 |
jhb |
MFC 266263,266551,266552: - Add definitions for more structured extended features as well as XSAVE Extended Features for AVX512 and MPX (Memory Protection Extensions). - Don't permit users to request a subset of the AVX512 or MPX xsave masks. |
267068 |
04-Jun-2014 |
jhb |
MFC 263772: Fix build without SMP.
PR: 187854 |
266084 |
14-May-2014 |
ian |
MFC r257738, r259202, r258410, r260288, r260292, r260294, r260320, r260323, r260326, r260327, r260331, r260333, r260340, r260371, r260372, r260373, r260374, r260375
Add common bus_space tag definition shared for most supported ARMv6/v7 SoCs. Correct license statements to reflect the fact that these files were all derived from sys/arm/mv/bus_space.c.
In pmap_unmapdev(), remember the size, and use that as an argument to kva_free(), or we'd end up always passing it a size of 0
In pmap_mapdev(), first check whether a static mapping exists,
Convert TI static device mapping to use the new arm_devmap_add_entry(),
Use the common armv6 fdt_bus_tag defintion for tegra instead of a local copy.
Eliminate use of fdt_immr_addr(), it's not needed for tegra
Convert lpc from using fdt_immr style to arm_devmap_add_entry() to make static device mappings.
Retire machine/fdt.h as a header used by MI code, as its function is now obsolete. This involves the following pieces: - Remove it entirely on PowerPC, where it is not used by MD code either - Remove all references to machine/fdt.h in non-architecture-specific code (aside from uart_cpu_fdt.c, shared by ARM and MIPS, and so is somewhat non-arch-specific). - Fix code relying on header pollution from machine/fdt.h includes - Legacy fdtbus.c (still used on x86 FDT systems) now passes resource requests to its parent (nexus). This allows x86 FDT devices to allocate both memory and IO requests and removes the last notionally MI use of fdtbus_bs_tag. - On those architectures that retain a machine/fdt.h, unused bits like FDT_MAP_IRQ and FDT_INTR_MAX have been removed.
Add #include <machine/fdt.h> to a few files that used to get it via pollution
Enable the mv cesa security/crypto device by providing the required property in the dts source, and adding the right devices to the kernel config.
Remove dev/fdt/fdt_pci.c, which was code specific to Marvell ARM SoCs, related to setting up static device mappings. Since it was only used by arm/mv/mv_pci.c, it's now just static functions within that file, plus one public function that gets called only from arm/mv/mv_machdep.c.
Switch RPi to using arm_devmap_add_entry() to set up static device mapping.
Allow 'no static device mappings' to potentially work.
Don't try to find a static mapping before calling pmap_mapdev(), that logic is now part of pmap_mapdev() and doesn't need to be duplicated here.
Switch a10 to using arm_devmap_add_entry() to set up static device mapping. |
264496 |
15-Apr-2014 |
tijl |
MFC r263998:
Rename __wchar_t so it no longer conflicts with __wchar_t from clang 3.4 -fms-extensions. |
264118 |
04-Apr-2014 |
royger |
MFC r263001
Move asm IPIs handlers to C code, so both Xen and native IPI handlers share the same code.
Approved by: gibbs Sponsored by: Citrix Systems R&D |
263747 |
25-Mar-2014 |
kib |
MFC r263306: Add some support for the PCI(e)-PCI bridges to the Intel VT-d driver. |
263746 |
25-Mar-2014 |
kib |
MFC r263305: Provide a workaround by identity mapping the 32 pages after the bogus entry start, which seems to be enough for the reported BIOS. |
263687 |
24-Mar-2014 |
emaste |
MFC r263289: Update NetBSD Foundation copyrights to 2-clause BSD
The NetBSD Foundation states "Third parties are encouraged to change the license on any files which have a 4-clause license contributed to the NetBSD Foundation to a 2-clause license."
This change removes clauses 3 and 4 from copyright / license blocks that list The NetBSD Foundation as the only copyright holder.
Sponsored by: The FreeBSD Foundation |
263470 |
21-Mar-2014 |
kib |
MFC r263304: Trim at EOL. |
262981 |
10-Mar-2014 |
jkim |
MFC: r262746, r262748, r262750, r262752
Move fpusave() wrapper for suspend hander to sys/amd64/amd64/fpu.c. |
262192 |
18-Feb-2014 |
jhb |
MFC 261517,261520: Convert the license on files where I am the sole copyright holder to 2 clause BSD licenses. |
262141 |
18-Feb-2014 |
jhb |
MFC 259140: Move constants for indices in the local APIC's local vector table from apicvar.h to apicreg.h. |
262042 |
17-Feb-2014 |
avg |
MFC r257417: Remove references to an unused fasttrap probe hook |
261455 |
04-Feb-2014 |
eadler |
MFC r258779,r258780,r258787,r258822:
Fix undefined behavior: (1 << 31) is not defined as 1 is an int and this shifts into the sign bit. Instead use (1U << 31) which gets the expected result.
Similar to the (1 << 31) case it is not defined to do (2 << 30).
This fix is not ideal as it assumes a 32 bit int, but does fix the issue for most cases.
A similar change was made in OpenBSD. |
261325 |
31-Jan-2014 |
jhb |
MFC 259823: Fix i386 build. |
261275 |
29-Jan-2014 |
jhb |
MFC 259782: Add a resume hook for bhyve that runs a function on all CPUs during resume. For Intel CPUs, invoke vmxon for CPUs that were in VMX mode at the time of suspend. |
260473 |
09-Jan-2014 |
mav |
MFC r259197: Do not DELAY() for P-state transition unless we want to see the result.
Intel manual says: "If a transition is already in progress, transition to a new value will subsequently take effect. Reads of IA32_PERF_CTL determine the last targeted operating point." So seems it should be fine to just trigger wanted transition and go. Linux does the same. |
259837 |
24-Dec-2013 |
jhb |
MFC 259013: Fix the processor table entry structure to use a fixed-width type for 32-bit fields so it is the correct size on amd64. Remove a workaround for the broken structure from bhyve(8). |
259512 |
17-Dec-2013 |
kib |
MFC DMAR busdma implementation.
MFC r257251: Import the driver for VT-d DMAR hardware. Implement the busdma(9) using DMARs.
MFC r257512: Add support for queued invalidation.
MFC miscellaneous follow-ups to r257251.
MFC r257266: Remove redundand assignment to error variable and check for its value.
MFC r257308: Remove redundand declaration.
MFC r257511: Return BUS_PROBE_NOWILDCARD from the DMAR probe method.
MFC r257860,r257896,r257900,r257902,r257903 (by dim): Fixes for gcc compilation. |
259511 |
17-Dec-2013 |
kib |
MFC r257230: Add a virtual table for the busdma methods on x86, to allow different busdma implementations to coexist. |
259510 |
17-Dec-2013 |
kib |
MFC r257228: Add bus_dmamap_load_ma() function to load map with the array of vm_pages. |
259073 |
07-Dec-2013 |
peter |
Hoist all the mergeinfo up to the root in preparation for enforcing merges to the root only. All MFC's were rerecorded to the root.
Going forward, if an MFC includes mergeinfo, it will need to be made to the root and committed from the root. Merges with --ignore-ancestry or diff | patch can go anywhere.
The mergeinfo in HEAD is in a bad state from years of neglect and manual tampering and this was branched into 10.x. This confuses the coalescing code and prevents it from doing its job.
Approved by: re (gjb, implicit) |
258994 |
05-Dec-2013 |
sbruno |
MFC r257769 to stable/10
Fix powerd/states on AMD cpus. Resolves issues with system reporting: hwpstate0: set freq failed, err 6
Tested on FX-8150 and others.
PR: kern/167018 Submitted by: avg@ Approved by: re (gjb) |
258559 |
25-Nov-2013 |
emaste |
MFC r258135: x86: Allow users to change PSL_RF via ptrace(PT_SETREGS...)
Debuggers may need to change PSL_RF. Note that tf_eflags is already stored in the signal context during signal handling and PSL_RF previously could be modified via sigreturn, so this change should not provide any new ability to userspace.
For background see the thread at: http://lists.freebsd.org/pipermail/freebsd-i386/2007-September/005910.html
Reviewed by: jhb, kib
Sponsored by: DARPA, AFRL Approved by: re (gjb) |
258159 |
15-Nov-2013 |
kib |
MFC r257856: Add bits for the AMD features from CPUID function 0x80000001 ECX, described in the rev. 3.0 of the Kabini BKDG, document 48751.pdf.
Approved by: re (gjb) |
257492 |
01-Nov-2013 |
kib |
MFC r257069: Add ddb 'show ioapic' and 'show all ioapics' commands.
Approved by: re (glebius) |
256281 |
10-Oct-2013 |
gjb |
Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
|
256105 |
07-Oct-2013 |
phk |
Add a va_copy() to our fall-back stdarg implementation for use with lint(1)
Approved by: re@ (glebius@)
|
256073 |
05-Oct-2013 |
gibbs |
Formalize the concept of virtual CPU ids by adding a per-cpu vcpu_id field. Perform vcpu enumeration for Xen PV and HVM environments and convert all Xen drivers to use vcpu_id instead of a hard coded assumption of the mapping algorithm (acpi or apic ID) in use.
Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Reviewed by: gibbs Approved by: re (blanket Xen)
amd64/include/pcpu.h: i386/include/pcpu.h: Add vcpu_id to the amd64 and i386 pcpu structures.
dev/xen/timer/timer.c x86/xen/xen_intr.c Use new vcpu_id instead of assuming acpi_id == vcpu_id.
i386/xen/mp_machdep.c: i386/xen/mptable.c x86/xen/hvm.c: Perform Xen HVM and Xen full PV vcpu_id mapping.
x86/xen/hvm.c: x86/acpica/madt.c Change SYSINIT ordering of acpi CPU enumeration so that it is guaranteed to be available at the time of Xen HVM vcpu id mapping.
|
256071 |
05-Oct-2013 |
gibbs |
Correct panic caused by attaching both Xen PV and HyperV virtualization aware drivers on Xen hypervisors that advertise support for some HyperV features.
x86/xen/hvm.c: When running in HVM mode on a Xen hypervisor, set vm_guest to VM_GUEST_XEN so other virtualization aware components in the FreeBSD kernel can detect this mode is active.
dev/hyperv/vmbus/hv_hv.c: Use vm_guest to ignore Xen's HyperV emulation when Xen is detected and Xen PV drivers are active.
Reported by: Shanker Balan Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Reviewed by: gibbs Approved by: re (Xen blanket)
|
255913 |
27-Sep-2013 |
gibbs |
sys/x86/xen/hvm.c: Set cpu_ops correctly for Xen hypervisors lacking the vector callback feature.
Set preliminary Xen cpu_ops settings during early HVM initialization. The old location raced with the startup of APs.
Submitted by: Roger Pau Monné Reviewed by: gibbs Approved by: re (blanket Xen)
|
255744 |
20-Sep-2013 |
gibbs |
Merge Xen PVHVM support into the GENERIC kernel config for both amd64 and i386.
Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Reviewed by: gibbs Approved by: re (blanket Xen) MFC after: 2 weeks
sys/amd64/amd64/mp_machdep.c: sys/amd64/include/cpu.h: sys/i386/i386/mp_machdep.c: sys/i386/include/cpu.h: - Introduce two new CPU hooks for initialization and resume purposes. This allows us to get rid of the XENHVM ifdefs in mp_machdep, and also sets some hooks into common code that can be used by other hypervisor implementations.
sys/amd64/conf/XENHVM: sys/i386/conf/XENHVM: - Remove these configs now that GENERIC has builtin support for Xen HVM.
sys/kern/subr_smp.c: - Make sure there are no pending IPIs when suspending a system.
sys/x86/xen/hvm.c: - Add cpu init and resume vectors that are called from mp_machdep using the new hooks. - Only clear the vcpu_info mapping data on resume. It is already clear for the BSP on a cold boot and is set correctly as APs are started. - Gate xen_hvm_init_cpu only to systems running under Xen.
sys/x86/xen/xen_intr.c: - Gate the setup of event channels only to systems running under Xen.
|
255726 |
20-Sep-2013 |
gibbs |
Add support for suspend/resume/migration operations when running as a Xen PVHVM guest.
Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Reviewed by: gibbs Approved by: re (blanket Xen) MFC after: 2 weeks
sys/amd64/amd64/mp_machdep.c: sys/i386/i386/mp_machdep.c: - Make sure that are no MMU related IPIs pending on migration. - Reset pending IPI_BITMAP on resume. - Init vcpu_info on resume.
sys/amd64/include/intr_machdep.h: sys/i386/include/intr_machdep.h: sys/x86/acpica/acpi_wakeup.c: sys/x86/x86/intr_machdep.c: sys/x86/isa/atpic.c: sys/x86/x86/io_apic.c: sys/x86/x86/local_apic.c: - Add a "suspend_cancelled" parameter to pic_resume(). For the Xen PIC, restoration of interrupt services differs between the aborted suspend and normal resume cases, so we must provide this information.
sys/dev/acpica/acpi_timer.c: sys/dev/xen/timer/timer.c: sys/timetc.h: - Don't swap out "suspend safe" timers across a suspend/resume cycle. This includes the Xen PV and ACPI timers.
sys/dev/xen/control/control.c: - Perform proper suspend/resume process for PVHVM: - Suspend all APs before going into suspension, this allows us to reset the vcpu_info on resume for each AP. - Reset shared info page and callback on resume.
sys/dev/xen/timer/timer.c: - Implement suspend/resume support for the PV timer. Since FreeBSD doesn't perform a per-cpu resume of the timer, we need to call smp_rendezvous in order to correctly resume the timer on each CPU.
sys/dev/xen/xenpci/xenpci.c: - Don't reset the PCI interrupt on each suspend/resume.
sys/kern/subr_smp.c: - When suspending a PVHVM domain make sure there are no MMU IPIs in-flight, or we will get a lockup on resume due to the fact that pending event channels are not carried over on migration. - Implement a generic version of restart_cpus that can be used by suspended and stopped cpus.
sys/x86/xen/hvm.c: - Implement resume support for the hypercall page and shared info. - Clear vcpu_info so it can be reset by APs when resuming from suspension.
sys/dev/xen/xenpci/xenpci.c: sys/x86/xen/hvm.c: sys/x86/xen/xen_intr.c: - Support UP kernel configurations.
sys/x86/xen/xen_intr.c: - Properly rebind per-cpus VIRQs and IPIs on resume.
|
255331 |
06-Sep-2013 |
gibbs |
Implement PV IPIs for PVHVM guests and further converge PV and HVM IPI implmementations.
Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Submitted by: gibbs (misc cleanup, table driven config) Reviewed by: gibbs MFC after: 2 weeks
sys/amd64/include/cpufunc.h: sys/amd64/amd64/pmap.c: Move invltlb_globpcid() into cpufunc.h so that it can be used by the Xen HVM version of tlb shootdown IPI handlers.
sys/x86/xen/xen_intr.c: sys/xen/xen_intr.h: Rename xen_intr_bind_ipi() to xen_intr_alloc_and_bind_ipi(), and remove the ipi vector parameter. This api allocates an event channel port that can be used for ipi services, but knows nothing of the actual ipi for which that port will be used. Removing the unused argument and cleaning up the comments surrounding its declaration helps clarify its actual role.
sys/amd64/amd64/mp_machdep.c: sys/amd64/include/cpu.h: sys/i386/i386/mp_machdep.c: sys/i386/include/cpu.h: Implement a generic framework for amd64 and i386 that allows the implementation of certain CPU management functions to be selected at runtime. Currently this is only used for the ipi send function, which we optimize for Xen when running on a Xen hypervisor, but can easily be expanded to support more operations.
sys/x86/xen/hvm.c: Implement Xen PV IPI handlers and operations, replacing native send IPI.
sys/amd64/include/pcpu.h: sys/i386/include/pcpu.h: sys/i386/include/smp.h: Remove NR_VIRQS and NR_IPIS from FreeBSD headers. NR_VIRQS is defined already for us in the xen interface files. NR_IPIS is only needed in one file per Xen platform and is easily inferred by the IPI vector table that is defined in those files.
sys/i386/xen/mp_machdep.c: Restructure to more closely match the HVM implementation by performing table driven IPI setup.
|
255139 |
01-Sep-2013 |
gibbs |
Conform to style(9). No functional changes.
sys/x86/xen/hvm.c: Do not rely on implicit conversion to boolean in expressions (e.g. use "if (rc != 0)" instead of "if (rc)".
Line continuations for functions are indented an additional 4 spaces.
Insert an empty line if the function has no local variables.
Prefer separate initializtion statements to initialzing local variables in their declaration.
Braces that are not necessary may be left out.
MFC after: 2 weeks
|
255040 |
29-Aug-2013 |
gibbs |
Implement vector callback for PVHVM and unify event channel implementations
Re-structure Xen HVM support so that: - Xen is detected and hypercalls can be performed very early in system startup. - Xen interrupt services are implemented using FreeBSD's native interrupt delivery infrastructure. - the Xen interrupt service implementation is shared between PV and HVM guests. - Xen interrupt handlers can optionally use a filter handler in order to avoid the overhead of dispatch to an interrupt thread. - interrupt load can be distributed among all available CPUs. - the overhead of accessing the emulated local and I/O apics on HVM is removed for event channel port events. - a similar optimization can eventually, and fairly easily, be used to optimize MSI.
Early Xen detection, HVM refactoring, PVHVM interrupt infrastructure, and misc Xen cleanups:
Sponsored by: Spectra Logic Corporation
Unification of PV & HVM interrupt infrastructure, bug fixes, and misc Xen cleanups:
Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D
sys/x86/x86/local_apic.c: sys/amd64/include/apicvar.h: sys/i386/include/apicvar.h: sys/amd64/amd64/apic_vector.S: sys/i386/i386/apic_vector.s: sys/amd64/amd64/machdep.c: sys/i386/i386/machdep.c: sys/i386/xen/exception.s: sys/x86/include/segments.h: Reserve IDT vector 0x93 for the Xen event channel upcall interrupt handler. On Hypervisors that support the direct vector callback feature, we can request that this vector be called directly by an injected HVM interrupt event, instead of a simulated PCI interrupt on the Xen platform PCI device. This avoids all of the overhead of dealing with the emulated I/O APIC and local APIC. It also means that the Hypervisor can inject these events on any CPU, allowing upcalls for different ports to be handled in parallel.
sys/amd64/amd64/mp_machdep.c: sys/i386/i386/mp_machdep.c: Map Xen per-vcpu area during AP startup.
sys/amd64/include/intr_machdep.h: sys/i386/include/intr_machdep.h: Increase the FreeBSD IRQ vector table to include space for event channel interrupt sources.
sys/amd64/include/pcpu.h: sys/i386/include/pcpu.h: Remove Xen HVM per-cpu variable data. These fields are now allocated via the dynamic per-cpu scheme. See xen_intr.c for details.
sys/amd64/include/xen/hypercall.h: sys/dev/xen/blkback/blkback.c: sys/i386/include/xen/xenvar.h: sys/i386/xen/clock.c: sys/i386/xen/xen_machdep.c: sys/xen/gnttab.c: Prefer FreeBSD primatives to Linux ones in Xen support code.
sys/amd64/include/xen/xen-os.h: sys/i386/include/xen/xen-os.h: sys/xen/xen-os.h: sys/dev/xen/balloon/balloon.c: sys/dev/xen/blkback/blkback.c: sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/console/xencons_ring.c: sys/dev/xen/control/control.c: sys/dev/xen/netback/netback.c: sys/dev/xen/netfront/netfront.c: sys/dev/xen/xenpci/xenpci.c: sys/i386/i386/machdep.c: sys/i386/include/pmap.h: sys/i386/include/xen/xenfunc.h: sys/i386/isa/npx.c: sys/i386/xen/clock.c: sys/i386/xen/mp_machdep.c: sys/i386/xen/mptable.c: sys/i386/xen/xen_clock_util.c: sys/i386/xen/xen_machdep.c: sys/i386/xen/xen_rtc.c: sys/xen/evtchn/evtchn_dev.c: sys/xen/features.c: sys/xen/gnttab.c: sys/xen/gnttab.h: sys/xen/hvm.h: sys/xen/xenbus/xenbus.c: sys/xen/xenbus/xenbus_if.m: sys/xen/xenbus/xenbusb_front.c: sys/xen/xenbus/xenbusvar.h: sys/xen/xenstore/xenstore.c: sys/xen/xenstore/xenstore_dev.c: sys/xen/xenstore/xenstorevar.h: Pull common Xen OS support functions/settings into xen/xen-os.h.
sys/amd64/include/xen/xen-os.h: sys/i386/include/xen/xen-os.h: sys/xen/xen-os.h: Remove constants, macros, and functions unused in FreeBSD's Xen support.
sys/xen/xen-os.h: sys/i386/xen/xen_machdep.c: sys/x86/xen/hvm.c: Introduce new functions xen_domain(), xen_pv_domain(), and xen_hvm_domain(). These are used in favor of #ifdefs so that FreeBSD can dynamically detect and adapt to the presence of a hypervisor. The goal is to have an HVM optimized GENERIC, but more is necessary before this is possible.
sys/amd64/amd64/machdep.c: sys/dev/xen/xenpci/xenpcivar.h: sys/dev/xen/xenpci/xenpci.c: sys/x86/xen/hvm.c: sys/sys/kernel.h: Refactor magic ioport, Hypercall table and Hypervisor shared information page setup, and move it to a dedicated HVM support module.
HVM mode initialization is now triggered during the SI_SUB_HYPERVISOR phase of system startup. This currently occurs just after the kernel VM is fully setup which is just enough infrastructure to allow the hypercall table and shared info page to be properly mapped.
sys/xen/hvm.h: sys/x86/xen/hvm.c: Add definitions and a method for configuring Hypervisor event delievery via a direct vector callback.
sys/amd64/include/xen/xen-os.h: sys/x86/xen/hvm.c:
sys/conf/files: sys/conf/files.amd64: sys/conf/files.i386: Adjust kernel build to reflect the refactoring of early Xen startup code and Xen interrupt services.
sys/dev/xen/blkback/blkback.c: sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/blkfront/block.h: sys/dev/xen/control/control.c: sys/dev/xen/evtchn/evtchn_dev.c: sys/dev/xen/netback/netback.c: sys/dev/xen/netfront/netfront.c: sys/xen/xenstore/xenstore.c: sys/xen/evtchn/evtchn_dev.c: sys/dev/xen/console/console.c: sys/dev/xen/console/xencons_ring.c Adjust drivers to use new xen_intr_*() API.
sys/dev/xen/blkback/blkback.c: Since blkback defers all event handling to a taskqueue, convert this task queue to a "fast" taskqueue, and schedule it via an interrupt filter. This avoids an unnecessary ithread context switch.
sys/xen/xenstore/xenstore.c: The xenstore driver is MPSAFE. Indicate as much when registering its interrupt handler.
sys/xen/xenbus/xenbus.c: sys/xen/xenbus/xenbusvar.h: Remove unused event channel APIs.
sys/xen/evtchn.h: Remove all kernel Xen interrupt service API definitions from this file. It is now only used for structure and ioctl definitions related to the event channel userland device driver.
Update the definitions in this file to match those from NetBSD. Implementing this interface will be necessary for Dom0 support.
sys/xen/evtchn/evtchnvar.h: Add a header file for implemenation internal APIs related to managing event channels event delivery. This is used to allow, for example, the event channel userland device driver to access low-level routines that typical kernel consumers of event channel services should never access.
sys/xen/interface/event_channel.h: sys/xen/xen_intr.h: Standardize on the evtchn_port_t type for referring to an event channel port id. In order to prevent low-level event channel APIs from leaking to kernel consumers who should not have access to this data, the type is defined twice: Once in the Xen provided event_channel.h, and again in xen/xen_intr.h. The double declaration is protected by __XEN_EVTCHN_PORT_DEFINED__ to ensure it is never declared twice within a given compilation unit.
sys/xen/xen_intr.h: sys/xen/evtchn/evtchn.c: sys/x86/xen/xen_intr.c: sys/dev/xen/xenpci/evtchn.c: sys/dev/xen/xenpci/xenpcivar.h: New implementation of Xen interrupt services. This is similar in many respects to the i386 PV implementation with the exception that events for bound to event channel ports (i.e. not IPI, virtual IRQ, or physical IRQ) are further optimized to avoid mask/unmask operations that aren't necessary for these edge triggered events.
Stubs exist for supporting physical IRQ binding, but will need additional work before this implementation can be fully shared between PV and HVM.
sys/amd64/amd64/mp_machdep.c: sys/i386/i386/mp_machdep.c: sys/i386/xen/mp_machdep.c sys/x86/xen/hvm.c: Add support for placing vcpu_info into an arbritary memory page instead of using HYPERVISOR_shared_info->vcpu_info. This allows the creation of domains with more than 32 vcpus.
sys/i386/i386/machdep.c: sys/i386/xen/clock.c: sys/i386/xen/xen_machdep.c: sys/i386/xen/exception.s: Add support for new event channle implementation.
|
254373 |
15-Aug-2013 |
brooks |
Call set_i8254_freq with MODE_STOP (0) rather than a magic number of 0.
|
254305 |
13-Aug-2013 |
jkim |
Merge acpica_machdep.h for amd64 and i386 and move to x86. In fact, these two files were functionally identical.
|
254065 |
07-Aug-2013 |
kib |
Split the pagequeues per NUMA domains, and split pageademon process into threads each processing queue in a single domain. The structure of the pagedaemons and queues is kept intact, most of the changes come from the need for code to find an owning page queue for given page, calculated from the segment containing the page.
The tie between NUMA domain and pagedaemon thread/pagequeue split is rather arbitrary, the multithreaded daemon could be allowed for the single-domain machines, or one domain might be split into several page domains, to further increase concurrency.
Right now, each pagedaemon thread tries to reach the global target, precalculated at the start of the pass. This is not optimal, since it could cause excessive page deactivation and freeing. The code should be changed to re-check the global page deficit state in the loop after some number of iterations.
The pagedaemons reach the quorum before starting the OOM, since one thread inability to meet the target is normal for split queues. Only when all pagedaemons fail to produce enough reusable pages, OOM is started by single selected thread.
Launder is modified to take into account the segments layout with regard to the region for which cleaning is performed.
Based on the preliminary patch by jeff, sponsored by EMC / Isilon Storage Division.
Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation
|
254025 |
07-Aug-2013 |
jeff |
Replace kernel virtual address space allocation with vmem. This provides transparent layering and better fragmentation.
- Normalize functions that allocate memory to use kmem_* - Those that allocate address space are named kva_* - Those that operate on maps are named kmap_* - Implement recursive allocation handling for kmem_arena in vmem.
Reviewed by: alc Tested by: pho Sponsored by: EMC / Isilon Storage Division
|
253747 |
28-Jul-2013 |
avg |
x86: detect mwait capabilities and extensions, when present
Reviewed by: kib (earlier amd64-only version) MFC after: 2 weeks
|
251900 |
18-Jun-2013 |
rpaulo |
Fix a KTR_BUSDMA format string.
|
250840 |
21-May-2013 |
marcel |
Add basic support for FDT to i386 & amd64. This change includes: 1. Common headers for fdt.h and ofw_machdep.h under x86/include with indirections under i386/include and amd64/include. 2. New modinfo for loader provided FDT blob. 3. Common x86_init_fdt() called from hammer_time() on amd64 and init386() on i386. 4. Split-off FDT specific low-level console functions from FDT bus methods for the uart(4) driver. The low-level console logic has been moved to uart_cpu_fdt.c and is used for arm, mips & powerpc only. The FDT bus methods are shared across all architectures. 5. Add dev/fdt/fdt_x86.c to hold the fdt_fixup_table[] and the fdt_pic_table[] arrays. Both are empty right now.
FDT addresses are I/O ports on x86. Since the core FDT code does not handle different address spaces, adding support for both I/O ports and memory addresses requires some thought and discussion. It may be better to use a compile-time option that controls this.
Obtained from: Juniper Networks, Inc.
|
250601 |
13-May-2013 |
attilio |
o Add accessor functions to add and remove pages from a specific freelist. o Split the pool of free pages queues really by domain and not rely on definition of VM_RAW_NFREELIST. o For MAXMEMDOM > 1, wrap the RR allocation logic into a specific function that is called when calculating the allocation domain. The RR counter is kept, currently, per-thread. In the future it is expected that such function evolves in a real policy decision referee, based on specific informations retrieved by per-thread and per-vm_object attributes. o Add the concept of "probed domains" under the form of vm_ndomains. It is responsibility for every architecture willing to support multiple memory domains to correctly probe vm_ndomains along with mem_affinity segments attributes. Those two values are supposed to remain always consistent. Please also note that vm_ndomains and td_dom_rr_idx are both int because segments already store domains as int. Ideally u_int would have much more sense. Probabilly this should be cleaned up in the future. o Apply RR domain selection also to vm_phys_zero_pages_idle().
Sponsored by: EMC / Isilon storage division Partly obtained from: jeff Reviewed by: alc Tested by: jeff
|
250576 |
12-May-2013 |
eadler |
Fix several typos
PR: kern/176054 Submitted by: Christoph Mallon <christoph.mallon@gmx.de> MFC after: 3 days
|
250487 |
10-May-2013 |
hiren |
Adding a detach method to p4tcc driver.
PR: 118739 Submitted by: Dan Lukes <dan@obluda.cz> (earlier version) Reviewed by: jhb Approved by: sbruno (mentor) MFC after: 1 week
|
250389 |
08-May-2013 |
attilio |
Revert r250339 as apparently it is more clutter than help.
Sponsored by: EMC / Isilon storage division Requested by: jhb
|
250339 |
07-May-2013 |
attilio |
Add functions to do ACPI System Locality Information Table parsing and printing at boot. For reference on table informations and purposes please review ACPI specs.
Sponsored by: EMC / Isilon storage division Obtained from: jeff Reviewed by: jhb (earlier version)
|
250338 |
07-May-2013 |
attilio |
Rename VM_NDOMAIN into MAXMEMDOM and move it into machine/param.h in order to match the MAXCPU concept. The change should also be useful for consolidation and consistency.
Sponsored by: EMC / Isilon storage division Obtained from: jeff Reviewed by: alc
|
249625 |
18-Apr-2013 |
mav |
Introduce kern.timecounter.smp_tsc_adjust tunable (disabled by default) and respective functionality, allowing to synchronize TSC on APs to match BSP's during boot. It may be unsafe in general case due to theoretical chance of later drift if CPUs are using different clock rate or source, but it allows to use TSC in some cases when difference caused by some initialization bug, while TSCs are known to increment synchronously.
Reviewed by: jimharris, kib MFC after: 1 month
|
249608 |
18-Apr-2013 |
rpaulo |
Move the previously added CPUID7 macros to CPUID_STDEXT.
|
249602 |
18-Apr-2013 |
rpaulo |
Add the most current CPUID7_* definitions.
|
249351 |
11-Apr-2013 |
neel |
Make the code to check if VMX is enabled more readable by using macros instead of magic numbers.
Discussed with: Chris Torek
|
249324 |
10-Apr-2013 |
neel |
Unsynchronized TSCs on the host require special handling in bhyve:
- use clock_gettime(2) as the time base for the emulated ACPI timer instead of directly using rdtsc().
- don't advertise the invariant TSC capability to the guest to discourage it from using the TSC as its time base.
Discussed with: jhb@ (about making 'smp_tsc' a global) Reported by: Dan Mack on freebsd-virtualization@ Obtained from: NetApp
|
248968 |
01-Apr-2013 |
kib |
Record the correct error in the trace.
Sponsored by: The FreeBSD Foundation MFC after: 3 days
|
247463 |
28-Feb-2013 |
mav |
MFcalloutng: Switch eventtimers(9) from using struct bintime to sbintime_t. Even before this not a single driver really supported full dynamic range of struct bintime even in theory, not speaking about practical inexpediency. This change legitimates the status quo and cleans up the code.
|
247101 |
21-Feb-2013 |
imp |
Use critical_enter/critical_exit around the time sensitive part of this code to depessimize the worst case we've lived with silently and uneventfully for the past 12 years. Add a comment about a refinement for those needing more assurance of accuracy.
Fix ddb's show rtc command deadlock potential when debugging rtc code by not taking the lock if we're in the debugger. If you need a thumb to count the number of people that have encountered this, I'd be surprised.
Submitted by: bde
|
247086 |
21-Feb-2013 |
imp |
Correct comment about use of pmtimer, and the real reason it isn't used or desirable for amd64.
|
247068 |
21-Feb-2013 |
imp |
Fix broken usage of splhigh() by removing it.
|
247047 |
20-Feb-2013 |
kib |
Convert machine/elf.h, machine/frame.h, machine/sigframe.h, machine/signal.h and machine/ucontext.h into common x86 includes, copying from amd64 and merging with i386.
Kernel-only compat definitions are kept in the i386/include/sigframe.h and i386/include/signal.h, to reduce amd64 kernel namespace pollution. The amd64 compat uses its own definitions so far.
The _MACHINE_ELF_WANT_32BIT definition is to allow the sys/boot/userboot/userboot/elf32_freebsd.c to use i386 ELF definitions on the amd64 compile host. The same hack could be usefully abused by other code too.
|
247000 |
19-Feb-2013 |
davide |
Fixup r246916 in case gcc is used to build.
Reported by: attilio, simon
|
246916 |
17-Feb-2013 |
mav |
MFcalloutng: Microoptimize i8254 one-shot operation mode (disabled by default to allow timecounter functionality) by not writing to mode and MSB registers when it is not required. This saves several microseconds of CPU time per call, reducing minimal measured interrupts interval to 19.5us.
|
246805 |
14-Feb-2013 |
jhb |
Make VM_NDOMAIN a kernel option so that it can be enabled from a kernel config file.
Requested by: phk (ages ago) MFC after: 1 month
|
246713 |
12-Feb-2013 |
kib |
Reform the busdma API so that new types may be added without modifying every architecture's busdma_machdep.c. It is done by unifying the bus_dmamap_load_buffer() routines so that they may be called from MI code. The MD busdma is then given a chance to do any final processing in the complete() callback.
The cam changes unify the bus_dmamap_load* handling in cam drivers.
The arm and mips implementations are updated to track virtual addresses for sync(). Previously this was done in a type specific way. Now it is done in a generic way by recording the list of virtuals in the map.
Submitted by: jeff (sponsored by EMC/Isilon) Reviewed by: kan (previous version), scottl, mjacob (isp(4), no objections for target mode changes) Discussed with: ian (arm changes) Tested by: marius (sparc64), mips (jmallet), isci(4) on x86 (jharris), amd64 (Fabian Keil <freebsd-listen@fabiankeil.de>)
|
246247 |
02-Feb-2013 |
avg |
x86 suspend/resume: suspend pics and pseudo-pics in reverse order
- change 'pics' from STAILQ to TAILQ - ensure that Local APIC is always first in 'pics'
Reviewed by: jhb Tested by: Sergey V. Dyatko <sergey.dyatko@gmail.com>, KAHO Toshikazu <kaho@elam.kais.kyoto-u.ac.jp> MFC after: 12 days
|
246212 |
01-Feb-2013 |
kib |
The change to reduce default smp_tsc_shift caused tsc shift to become zero on slower machines, which make the fenced get_timecount methods not used despite needed. Remove the (shift > 0) condition when selecting the get_timecount() implementation.
Rename smp_tsc_shift to tsc_shift, and apply it for the UP case too. Allow shift to reach value of 31 instead of 30, as it was previously (should be nop).
Reorganize the tc quality calculation to remove the conditionally compiled block. Rename test_smp_tsc() to test_tsc() and provide separate versions for SMP and UP builds. The check for virtialized hardware is more natural to perform in the smp version of the test_tsc(), since it is only done for smp case.
Noted and reviewed by: bde (previous version) MFC after: 12 days
|
246116 |
30-Jan-2013 |
kib |
Reduce default shift used to calculate the max frequency for the TSC timecounter to 1, and correspondingly increase the precision of the gettimeofday(2) and related functions in the default configuration.
The motivation for the TSC-low timecounter, as described in the r222866, seems to provide a workaround for the non-serializing behaviour of the RDTSC on some Intel hardware. Tests demonstrate that even with the pre-shift of 8, the cross-core non-monotonicity of the RDTSC is still observed reliably, e.g. on the Nehalems. The r238755 and r238973 implemented the proper fix for the issue.
The pre-shift of 1 is applied to keep TSC not overflowing for the frequency of hardclock down to 2 sec/intr. The pre-shift is made a tunable to allow the easy debugging of the issues users could see with the shift being too low.
Reviewed by: bde MFC after: 2 weeks
|
245577 |
17-Jan-2013 |
jhb |
Don't attempt to use clflush on the local APIC register window. Various CPUs exhibit bad behavior if this is done (Intel Errata AAJ3, hangs on Pentium-M, and trashing of the local APIC registers on a VIA C7). The local APIC is implicitly mapped UC already via MTRRs, so the clflush isn't necessary anyway.
MFC after: 2 weeks
|
245055 |
05-Jan-2013 |
neel |
Add macros required to enable VMX operation on Intel processors.
Obtained from: NetApp
|
244193 |
13-Dec-2012 |
jimharris |
Add bus_space_read_8 and bus_space_write_8 for amd64.
Rather than trying to KASSERT for callers that invoke this on IO tags, either do nothing (for write_8) or return ~0 (for read_8). Using KASSERT here just makes bus.h too messy from both polluting bus.h with systm.h (for any number of drivers that include bus.h without first including systm.h) or ports that use bus.h directly (i.e. libpciaccess) as reported by zeising@.
Also don't try to implement all of the other bus_space functions for 8 byte access since realistically only these two are needed for some devices that expose 64-bit memory-mapped registers.
Put the amd64-specific functions here rather than sys/amd64/include/bus.h so that we can keep this header unified for x86, as requested by mdf@ and tijl@.
Submitted by: Carl Delsey <carl.r.delsey@intel.com> MFC after: 3 days
|
244191 |
13-Dec-2012 |
jimharris |
Revert r243960 based on feedback regarding keeping x86 headers unified (mdf@, tijl@) and use of KASSERT/systm.h in bus.h (zeising@, bde@).
Alternate implementation will be made in a separate commit.
|
243960 |
06-Dec-2012 |
jimharris |
Add amd64 implementations for 8-byte bus_space routines.
Submitted by: Carl Delsey <carl.r.delsey@intel.com> Discussed with: jhb, rwatson Reviewed by: jimharris MFC after: 1 week
|
243764 |
01-Dec-2012 |
avg |
ioapic_program_intpin: program high bits before low bits
Programming the low bits has a side-effect if unmasking the pin if it is not disabled. So if an interrupt was pending then it would be delivered with the correct new vector but to the incorrect old LAPIC.
This fix could be made clearer by preserving the mask bit while programming the low bits and then explicitly resetting the mask bit after all the programming is done.
Probability to trip over the fixed bug could be increased by bootverbose because printing of the interrupt information in ioapic_assign_cpu lengthened the time window during which an interrupt could arrive while a pin is masked.
Reported by: Andreas Longwitz <longwitz@incore.de> Tested by: Andreas Longwitz <longwitz@incore.de> MFC after: 12 days
|
242432 |
01-Nov-2012 |
kib |
Provide the reading and display of the Standard Extended Features, introduced with the IvyBridge CPUs. Provide the definitions for new bits in CR3 and CR4 registers.
Tested by: avg, Michael Moll <kvedulv@kvedulv.de> MFC after: 2 weeks
|
241885 |
22-Oct-2012 |
eadler |
This isn't functionally identical. In some cases a hint to disable unit 0 would in fact disable all units.
This reverts r241856
Approved by: cperciva (implicit)
|
241856 |
22-Oct-2012 |
eadler |
Now that device disabling is generic, remove extraneous code from the device drivers that used to provide this feature.
Reviewed by: des Approved by: cperciva MFC after: 1 week
|
241374 |
09-Oct-2012 |
attilio |
Add an unified macro to deny ability from the compiler to reorder instruction loads/stores at its will. The macro __compiler_membar() is currently supported for both gcc and clang, but kernel compilation will fail otherwise.
Reviewed by: bde, kib Discussed with: dim, theraven MFC after: 2 weeks
|
241371 |
09-Oct-2012 |
attilio |
Reverts r234074,234105,234564,234723,234989,235231-235232 and part of r234247. Use, instead, the static intializer introduced in r239923 for x86 and sparc64 intr_cpus, unwinding the code to the initial version.
Reviewed by: marius
|
241073 |
30-Sep-2012 |
kevlo |
Add missing header needed by free(9). Spotted by: David Wolfskill <david at catwhisker dot org>
|
241066 |
30-Sep-2012 |
kevlo |
Free result of device_get_children(9).
|
241027 |
28-Sep-2012 |
jhb |
- Re-shuffle the <machine/pc/bios.h> headers to move all kernel-specific bits under #ifdef _KERNEL but leave definitions for various structures defined by standards ($PIR table, SMAP entries, etc.) available to userland. - Consolidate duplicate SMBIOS table structure definitions in ipmi(4) and smbios(4) in <machine/pc/bios.h> and make them available to userland.
MFC after: 2 weeks
|
239354 |
17-Aug-2012 |
jhb |
Allow static DMA allocations that allow for enough segments to do page-sized segments for the entire allocation to use kmem_alloc_attr() to allocate KVM rather than using kmem_alloc_contig(). This avoids requiring a single physically contiguous chunk in this case.
Submitted by: Peter Jeremy (original version) MFC after: 1 month
|
239340 |
16-Aug-2012 |
jkim |
Merge ACPICA 20120816.
|
239133 |
07-Aug-2012 |
jimharris |
During TSC synchronization test, use rdtsc() rather than rdtsc32(), to protect against 32-bit TSC overflow while the sync test is running.
On dual-socket Xeon E5-2600 (SNB) systems with up to 32 threads, there is non-trivial chance (2-3%) that TSC synchronization test fails due to 32-bit TSC overflow while the synchronization test is running.
Sponsored by: Intel Reviewed by: jkim Discussed with: jkim, kib
|
239020 |
03-Aug-2012 |
jhb |
Correct function name in comment.
Submitted by: alc
|
239013 |
03-Aug-2012 |
mav |
Microoptimize LAPIC timer routines to avoid reading from hardware during programming using earlier cached values. This makes respective routines to disappear from PMC top and reduces total number of active CPU cycles on idle 24-core system by 10%.
|
239008 |
03-Aug-2012 |
jhb |
Improve the handling of static DMA buffers that use non-default memory attributes (currently just BUS_DMA_NOCACHE): - Don't call pmap_change_attr() on the returned address, instead use kmem_alloc_contig() to ask the VM system for memory with the requested attribute. - As a result, always use kmem_alloc_contig() for non-default memory attributes, even for sub-page allocations. This requires adjusting bus_dmamem_free()'s logic for determining which free routine to use. - For x86, add a new dummy bus_dmamap that is used for static DMA buffers allocated via kmem_alloc_contig(). bus_dmamem_free() can then use the map pointer to determine which free routine to use. - For powerpc, add a new flag to the allocated map (bus_dmamem_alloc() always creates a real map on powerpc) to indicate which free routine should be used.
Note that the BUS_DMA_NOCACHE handling in powerpc is currently #ifdef'd out. I have left it disabled but updated it to match x86.
Reviewed by: scottl MFC after: 1 month
|
238975 |
01-Aug-2012 |
kib |
Do a trivial reformatting of the comment, to record the proper commit message for r238973:
Rdtsc instruction is not synchronized, it seems on some Intel cores it can bypass even the locked instructions. As a result, rdtsc executed on different cores may return unordered TSC values even when the rdtsc appearance in the instruction sequences is provably ordered.
Similarly to what has been done in r238755 for TSC synchronization test, add explicit fences right before rdtsc in the timecounters 'get' functions. Intel recommends to use LFENCE, while AMD refers to MFENCE. For VIA follow what Linux does and use LFENCE. With this change, I see no reordered reads of TSC on Nehalem.
Change the rmb() to inlined CPUID in the SMP TSC synchronization test. On i386, locked instruction is used for rmb(), and as noted earlier, it is not enough. Since i386 machine may not support SSE2, do simplest possible synchronization with CPUID.
MFC after: 1 week Discussed with: avg, bde, jkim
|
238973 |
01-Aug-2012 |
kib |
diff --git a/sys/x86/x86/tsc.c b/sys/x86/x86/tsc.c index c253a96..3d8bd30 100644 --- a/sys/x86/x86/tsc.c +++ b/sys/x86/x86/tsc.c @@ -82,7 +82,11 @@ static void tsc_freq_changed(void *arg, const struct cf_level *level, static void tsc_freq_changing(void *arg, const struct cf_level *level, int *status); static unsigned tsc_get_timecount(struct timecounter *tc); -static unsigned tsc_get_timecount_low(struct timecounter *tc); +static inline unsigned tsc_get_timecount_low(struct timecounter *tc); +static unsigned tsc_get_timecount_lfence(struct timecounter *tc); +static unsigned tsc_get_timecount_low_lfence(struct timecounter *tc); +static unsigned tsc_get_timecount_mfence(struct timecounter *tc); +static unsigned tsc_get_timecount_low_mfence(struct timecounter *tc); static void tsc_levels_changed(void *arg, int unit);
static struct timecounter tsc_timecounter = { @@ -262,6 +266,10 @@ probe_tsc_freq(void) (vm_guest == VM_GUEST_NO && CPUID_TO_FAMILY(cpu_id) >= 0x10)) tsc_is_invariant = 1; + if (cpu_feature & CPUID_SSE2) { + tsc_timecounter.tc_get_timecount = + tsc_get_timecount_mfence; + } break; case CPU_VENDOR_INTEL: if ((amd_pminfo & AMDPM_TSC_INVARIANT) != 0 || @@ -271,6 +279,10 @@ probe_tsc_freq(void) (CPUID_TO_FAMILY(cpu_id) == 0xf && CPUID_TO_MODEL(cpu_id) >= 0x3)))) tsc_is_invariant = 1; + if (cpu_feature & CPUID_SSE2) { + tsc_timecounter.tc_get_timecount = + tsc_get_timecount_lfence; + } break; case CPU_VENDOR_CENTAUR: if (vm_guest == VM_GUEST_NO && @@ -278,6 +290,10 @@ probe_tsc_freq(void) CPUID_TO_MODEL(cpu_id) >= 0xf && (rdmsr(0x1203) & 0x100000000ULL) == 0) tsc_is_invariant = 1; + if (cpu_feature & CPUID_SSE2) { + tsc_timecounter.tc_get_timecount = + tsc_get_timecount_lfence; + } break; }
@@ -328,16 +344,31 @@ init_TSC(void)
#ifdef SMP
-/* rmb is required here because rdtsc is not a serializing instruction. */ -#define TSC_READ(x) \ -static void \ -tsc_read_##x(void *arg) \ -{ \ - uint32_t *tsc = arg; \ - u_int cpu = PCPU_GET(cpuid); \ - \ - rmb(); \ - tsc[cpu * 3 + x] = rdtsc32(); \ +/* + * RDTSC is not a serializing instruction, and does not drain + * instruction stream, so we need to drain the stream before executing + * it. It could be fixed by use of RDTSCP, except the instruction is + * not available everywhere. + * + * Use CPUID for draining in the boot-time SMP constistency test. The + * timecounters use MFENCE for AMD CPUs, and LFENCE for others (Intel + * and VIA) when SSE2 is present, and nothing on older machines which + * also do not issue RDTSC prematurely. There, testing for SSE2 and + * vendor is too cumbersome, and we learn about TSC presence from + * CPUID. + * + * Do not use do_cpuid(), since we do not need CPUID results, which + * have to be written into memory with do_cpuid(). + */ +#define TSC_READ(x) \ +static void \ +tsc_read_##x(void *arg) \ +{ \ + uint32_t *tsc = arg; \ + u_int cpu = PCPU_GET(cpuid); \ + \ + __asm __volatile("cpuid" : : : "eax", "ebx", "ecx", "edx"); \ + tsc[cpu * 3 + x] = rdtsc32(); \ } TSC_READ(0) TSC_READ(1) @@ -487,7 +518,16 @@ init: for (shift = 0; shift < 31 && (tsc_freq >> shift) > max_freq; shift++) ; if (shift > 0) { - tsc_timecounter.tc_get_timecount = tsc_get_timecount_low; + if (cpu_feature & CPUID_SSE2) { + if (cpu_vendor_id == CPU_VENDOR_AMD) { + tsc_timecounter.tc_get_timecount = + tsc_get_timecount_low_mfence; + } else { + tsc_timecounter.tc_get_timecount = + tsc_get_timecount_low_lfence; + } + } else + tsc_timecounter.tc_get_timecount = tsc_get_timecount_low; tsc_timecounter.tc_name = "TSC-low"; if (bootverbose) printf("TSC timecounter discards lower %d bit(s)\n", @@ -599,16 +639,48 @@ tsc_get_timecount(struct timecounter *tc __unused) return (rdtsc32()); }
-static u_int +static inline u_int tsc_get_timecount_low(struct timecounter *tc) { uint32_t rv;
__asm __volatile("rdtsc; shrd %%cl, %%edx, %0" - : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx"); + : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx"); return (rv); }
+static u_int +tsc_get_timecount_lfence(struct timecounter *tc __unused) +{ + + lfence(); + return (rdtsc32()); +} + +static u_int +tsc_get_timecount_low_lfence(struct timecounter *tc) +{ + + lfence(); + return (tsc_get_timecount_low(tc)); +} + +static u_int +tsc_get_timecount_mfence(struct timecounter *tc __unused) +{ + + mfence(); + return (rdtsc32()); +} + +static u_int +tsc_get_timecount_low_mfence(struct timecounter *tc) +{ + + mfence(); + return (tsc_get_timecount_low(tc)); +} + uint32_t cpu_fill_vdso_timehands(struct vdso_timehands *vdso_th) {
|
238755 |
24-Jul-2012 |
jimharris |
Add rmb() to tsc_read_##x to enforce serialization of rdtsc captures.
Intel Architecture Manual specifies that rdtsc instruction is not serialized, so without this change, TSC synchronization test would periodically fail, resulting in use of HPET timecounter instead of TSC-low. This caused severe performance degradation (40-50%) when running high IO/s workloads due to HPET MMIO reads and GEOM stat collection.
Tests on Xeon E5-2600 (Sandy Bridge) 8C systems were seeing TSC synchronization fail approximately 20% of the time.
Sponsored by: Intel Reviewed by: kib MFC after: 3 days
|
238450 |
14-Jul-2012 |
kib |
Add support for the XSAVEOPT instruction use. Our XSAVE/XRSTOR usage mostly meets the guidelines set by the Intel SDM: 1. We use XRSTOR and XSAVE from the same CPL using the same linear address for the store area 2. Contrary to the recommendations, we cannot zero the FPU save area for a new thread, since fork semantic requires the copy of the previous state. This advice seemingly contradicts to the advice from the item 6. 3. We do use XSAVEOPT in the context switch code only, and the area for XSAVEOPT already always contains the data saved by XSAVE. 4. We do not modify the save area between XRSTOR, when the area is loaded into FPU context, and XSAVE. We always spit the fpu context into save area and start emulation when directly writing into FPU context. 5. We do not use segmented addressing to access save area, or rather, always address it using %ds basing. 6. XSAVEOPT can be only executed in the area which was previously loaded with XRSTOR, since context switch code checks for FPU use by outgoing thread before saving, and thread which stopped emulation forcibly get context loaded with XRSTOR. 7. The PCB cannot be paged out while FPU emulation is turned off, since stack of the executing thread is never swapped out.
The context switch code is patched to issue XSAVEOPT instead of XSAVE if supported. This approach eliminates one conditional in the context switch code, which would be needed otherwise.
For user-visible machine context to have proper data, fpugetregs() checks for unsaved extension blocks and manually copies pristine FPU state into them, according to the description provided by CPUID leaf 0xd.
MFC after: 1 month
|
237517 |
24-Jun-2012 |
andrew |
Make the wchar_t type machine dependent.
This is required for ARM EABI. Section 7.1.1 of the Procedure Call for the ARM Architecture (AAPCS) defines wchar_t as either an unsigned int or an unsigned short with the former preferred.
Because of this requirement we need to move the definition of __wchar_t to a machine dependent header. It also cleans up the macros defining the limits of wchar_t by defining __WCHAR_MIN and __WCHAR_MAX in the same machine dependent header then using them to define WCHAR_MIN and WCHAR_MAX respectively.
Discussed with: bde
|
237433 |
22-Jun-2012 |
kib |
Implement mechanism to export some kernel timekeeping data to usermode, using shared page. The structures and functions have vdso prefix, to indicate the intended location of the code in some future.
The versioned per-algorithm data is exported in the format of struct vdso_timehands, which mostly repeats the content of in-kernel struct timehands. Usermode reading of the structure can be lockless. Compatibility export for 32bit processes on 64bit host is also provided. Kernel also provides usermode with indication about currently used timecounter, so that libc can fall back to syscall if configured timecounter is unknown to usermode code.
The shared data updates are initiated both from the tc_windup(), where a fast task is queued to do the update, and from sysctl handlers which change timecounter. A manual override switch kern.timecounter.fast_gettime allows to turn off the mechanism.
Only x86 architectures export the real algorithm data, and there, only for tsc timecounter. HPET counters page could be exported as well, but I prefer to not further glue the kernel and libc ABI there until proper vdso-based solution is developed.
Minimal stubs neccessary for non-x86 architectures to still compile are provided.
Discussed with: bde Reviewed by: jhb Tested by: flo MFC after: 1 month
|
237037 |
13-Jun-2012 |
jkim |
- Remove unused code for CR3 and CR4. - Fix few style(9) nits while I am here.
|
236938 |
12-Jun-2012 |
iwasaki |
Share IPI init and startup code of mp_machdep.c with acpi_wakeup.c as ipi_startup().
|
236772 |
09-Jun-2012 |
iwasaki |
Add x86/acpica/acpi_wakeup.c for amd64 and i386. Difference of suspend/resume procedures are minimized among them.
common: - Add global cpuset suspended_cpus to indicate APs are suspended/resumed. - Remove acpi_waketag and acpi_wakemap from acpivar.h (no longer used). - Add some variables in acpi_wakecode.S in order to minimize the difference among amd64 and i386. - Disable load_cr3() because now CR3 is restored in resumectx().
amd64: - Add suspend/resume related members (such as MSR) in PCB. - Modify savectx() for above new PCB members. - Merge acpi_switch.S into cpu_switch.S as resumectx().
i386: - Merge(and remove) suspendctx() into savectx() in order to match with amd64 code.
Reviewed by: attilio@, acpi@
|
236503 |
03-Jun-2012 |
avg |
free wdog_kern_pat calls in post-panic paths from under SW_WATCHDOG
Those calls are useful with hardware watchdog drivers too.
MFC after: 3 weeks
|
235939 |
24-May-2012 |
obrien |
Consitently use "__LP64__". [there are 33 __LP64__'s in the kernel (minus cddl/ and contrib/), and 11 _LP64's]
|
235563 |
17-May-2012 |
jhb |
Don't expose i386-only ptrace constants on amd64. This broke gdb with libthread_db on amd64.
Reported by: avg
|
234989 |
03-May-2012 |
attilio |
Revert part of r234723 by re-enabling the SMP protection for intr_bind() on x86. This has been requested by jhb and I strongly disagree with this, but as long as he is the x86 and interrupt subsystem maintainer I will follow his directives.
The disagreement cames from what we should really consider as a public KPI. IMHO, if we really need a selection between the kernel functions, we may need an explicit protection like _KERNEL_KPI, which defines which subset of the kernel function might really be considered as part of the KPI (for thirdy part modules) and which not. As long as we don't have this mechanism I just consider any possible function as usable by thirdy part code, thus intr_bind() included.
MFC after: 1 week
|
234723 |
26-Apr-2012 |
attilio |
Clean up the intr* MD KPI from the SMP dependency, removing a cause of discrepancy between modules and kernel, but deal with SMP differences within the functions themselves.
As an added bonus this also helps in terms of code readability.
Requested by: gibbs Reviewed by: jhb, marius MFC after: 1 week
|
234364 |
17-Apr-2012 |
grehan |
Add x2apic MSR definitions
Reviewed by: jhb Obtained from: bhyve via Neel via NetApp
|
234153 |
11-Apr-2012 |
jhb |
Trim stray blank line.
|
234059 |
09-Apr-2012 |
jhb |
Recognize the RDRAND instruction feature.
Submitted by: Michael Fuckner michael fuckner net MFC after: 3 days
|
233961 |
06-Apr-2012 |
gibbs |
Fix interrupt load balancing regression, introduced in revision 222813, that left all un-pinned interrupts assigned to CPU 0.
sys/x86/x86/intr_machdep.c: In intr_shuffle_irqs(), remove CPU_SETOF() call that initialized the "intr_cpus" cpuset to only contain CPU0.
This initialization is too late and nullifies the results of calls the intr_add_cpu() that occur much earlier in the boot process. Since "intr_cpus" is statically initialized to the empty set, and all processors, including the BSP, already add themselves to "intr_cpus" no special initialization for the BSP is necessary.
MFC after: 3 days
|
233793 |
02-Apr-2012 |
jhb |
Further tweak the changes made in r233709. The kernel doesn't permit sleeping from a swi handler (even though in this case it would be ok), so switch the refill and scanning SWI handlers to being tasks on a fast taskqueue. Also, only schedule the refill task for a CMCI as an MC# can fire at any time, so it should do the minimal amount of work needed and avoid opportunities to deadlock before it panics (such as scheduling a task it won't ever need in practice). To handle the case of an MC# only finding recoverable errors (which should never happen), always try to refill the event free list when the periodic scan executes.
MFC after: 2 weeks
|
233781 |
02-Apr-2012 |
jhb |
Make machine check exception logging more readable. On newer Intel systems, an uncorrected ECC error tends to fire on all CPUs in a package simultaneously and the current printf hacks are not sufficient to make the messages legible. Instead, use the existing mca_lock spinlock to serialize calls to mca_log() and change the machine check code to panic directly when an unrecoverable error is encoutered rather than falling back to a trap_fatal() call in trap() (which adds nearly a screen-full of logging messages that aren't useful for machine checks).
MFC after: 2 weeks
|
233709 |
30-Mar-2012 |
jhb |
Attempt to make machine check handling a bit more robust: - Don't malloc() new MCA records for machine checks logged due to a CMCI or MC# exception. Instead, use a pre-allocated pool of records. When a CMCI or MC# exception fires, schedule a swi to refill the pool. The pool is sized to hold at least one record per available machine bank, and one record per CPU. This should handle the case of all CPUs triggering a single bank at once as well as the case a single CPU triggering all of its banks. The periodic scans still use malloc() since they are run from a safe context. - Since we have to create an swi to handle refills, make the periodic scan a second swi for the same thread instead of having a separate taskqueue thread for the scans.
Suggested by: mdf (avoiding malloc()) MFC after: 2 weeks
|
233707 |
30-Mar-2012 |
jhb |
Move the legacy(4) driver to x86.
|
233684 |
29-Mar-2012 |
dim |
Fix an issue introduced in sys/x86/include/endian.h with r232721. In that revision, the bswapXX_const() macros were renamed to bswapXX_gen().
Also, bswap64_gen() was implemented as two calls to bswap32(), and similarly, bswap32_gen() as two calls to bswap16(). This mainly helps our base gcc to produce more efficient assembly.
However, the arguments are not properly masked, which results in the wrong value being calculated in some instances. For example, bswap32(0x12345678) returns 0x7c563412, and bswap64(0x123456789abcdef0) returns 0xfcdefc9a7c563412.
Fix this by appropriately masking the arguments to bswap16() in bswap32_gen(), and to bswap32() in bswap64_gen(). This should also silence warnings from clang.
Submitted by: jh
|
233683 |
29-Mar-2012 |
dim |
Revert sys/x86/include/endian.h to what it was before r233419, as that revision has two problems: - It can produce worse code with both clang and gcc. - It doesn't fix the actual issue introduced in r232721, which will be fixed in the next commit.
Submitted by: bde, tijl and jh Pointy hat to: dim
|
233676 |
29-Mar-2012 |
jhb |
Use a more proper fix for enabling HT MSI mapping windows on Host-PCI bridges. Rather than blindly enabling the windows on all of them, only enable the window when an MSI interrupt is enabled for a device behind the bridge, similar to what already happens for HT PCI-PCI bridges.
To implement this, each x86 Host-PCI bridge driver has to be able to locate it's actual backing device on bus 0. For ACPI, use the _ADR method to find the slot and function of the device. For the non-ACPI case, the legacy(4) driver already scans bus 0 looking for Host-PCI bridge devices. Now it saves the slot and function of each bridge that it finds as ivars that the Host-PCI bridge driver can then use in its pcib_map_msi() method.
This fixes machines where non-MSI interrupts were broken by the previous round of HT MSI changes.
Tested by: bapt MFC after: 1 week
|
233675 |
29-Mar-2012 |
jhb |
Restore proper use of bounce buffers for ISA DMA. When locking was added, the call to pmap_kextract() was moved up, and as a result the code never updated the physical address to use for DMA if a bounce buffer was used. Restore the earlier location of pmap_kextract() so it takes bounce buffers into account.
Tested by: kargl MFC after: 1 week
|
233623 |
28-Mar-2012 |
jhb |
Allocate the ioapics[] array dynamically since it is only needed for the duration of madt_setup_io(). This avoids having the array take up permanent space in the BSS.
Inspired by: bde MFC after: 2 weeks
|
233613 |
28-Mar-2012 |
jhb |
Move the DTrace return IDT vector back up from 0x20 to 0x92. The 0x20 vector is currently dedicated to servicing IRQ 0 from the 8259A's, so it shouldn't be overloaded for DTrace.
Tested by: rstone MFC after: 1 week
|
233419 |
24-Mar-2012 |
dim |
Fix the following clang warning in sys/dev/dcons/dcons.c, caused by the recent changes in sys/x86/include/endian.h:
sys/dev/dcons/dcons.c:190:15: error: implicit conversion from '__uint32_t' (aka 'unsigned int') to '__uint16_t' (aka 'unsigned short') changes value from 1684238190 to 28526 [-Werror,-Wconstant-conversion] buf->magic = ntohl(DCONS_MAGIC); ^~~~~~~~~~~~~~~~~~ sys/sys/param.h:306:18: note: expanded from: #define ntohl(x) __ntohl(x) ^ ./x86/endian.h:128:20: note: expanded from: #define __ntohl(x) __bswap32(x) ^ ./x86/endian.h:78:20: note: expanded from: __bswap32_gen((__uint32_t)(x)) : __bswap32_var(x)) ^ ./x86/endian.h:68:26: note: expanded from: (((__uint32_t)__bswap16(x) << 16) | __bswap16((x) >> 16)) ^ ./x86/endian.h:75:53: note: expanded from: __bswap16_gen((__uint16_t)(x)) : __bswap16_var(x))) ~~~~~~~~~~~~~ ^
This is because the __bswapXX_gen() macros (for x86) call the regular __bswapXX() macros. Since the __bswapXX_gen() variants are only called when their arguments are constant, there is no need to do that constancy check recursively. Also, it causes the above error with clang.
Fix it by calling __bswap16_gen() from __bswap32_gen(), and similarly, __bswap32_gen() from __bswap64_gen().
While here, add extra parentheses around the __bswap16_gen() macro expansion, to prevent unexpected side effects.
|
233305 |
22-Mar-2012 |
jhb |
Mark the 'lapics' and 'ioapics' arrays here static since they are private to this file. The 'lapics' array was actually shadowing a completely different 'lapics' array that is private to local_apic.c.
Reported by: bde MFC after: 2 weeks
|
233209 |
19-Mar-2012 |
tijl |
Copy amd64 sysarch.h to x86 and merge with i386 sysarch.h. Replace amd64/i386/pc98 sysarch.h with stubs.
|
233207 |
19-Mar-2012 |
tijl |
Copy i386 specialreg.h to x86 and merge with amd64 specialreg.h. Replace amd64/i386/pc98 specialreg.h with stubs.
|
233204 |
19-Mar-2012 |
tijl |
Copy i386 psl.h to x86 and replace amd64/i386/pc98 psl.h with stubs.
|
233203 |
19-Mar-2012 |
tijl |
Move userland bits (and some common kernel bits) from amd64 and i386 segments.h to a new x86 segments.h.
Add __packed attribute to some structs (just to be sure). Also make it clear that i386 GDT and LDT entries are used in ia64 code.
|
233125 |
18-Mar-2012 |
tijl |
Eliminate ia32_reg.h by moving its contents to x86 and ia64 reg.h.
Reviewed by: kib
|
233124 |
18-Mar-2012 |
tijl |
Copy i386 reg.h to x86 and merge with amd64 reg.h. Replace i386/amd64/pc98 reg.h with stubs.
The tREGISTER macros are only made visible on i386. These macros are deprecated and should not be available on amd64.
The i386 and amd64 versions of struct reg have been renamed to struct __reg32 and struct __reg64. During compilation either __reg32 or __reg64 is defined as reg depending on the machine architecture. On amd64 the i386 struct is also available as struct reg32 which is used in COMPAT_FREEBSD32 code.
Most of compat/ia32/ia32_reg.h is now IA64 only.
Reviewed by: kib (previous version)
|
233044 |
16-Mar-2012 |
tijl |
Move userland bits of i386 npx.h and amd64 fpu.h to x86 fpu.h. Remove FPU types from compat/ia32/ia32_reg.h that are no longer needed. Create machine/npx.h on amd64 to allow compiling i386 code that uses this header.
The original npx.h and fpu.h define struct envxmm differently. Both definitions have been included in the new x86 header as struct __envxmm32 and struct __envxmm64. During compilation either __envxmm32 or __envxmm64 is defined as envxmm depending on machine architecture. On amd64 the i386 struct is also available as struct envxmm32.
Reviewed by: kib
|
233036 |
16-Mar-2012 |
jhb |
Revert the PCIe 4GB boundary issue workaround now that the proper fix is in HEAD.
Ok'd by: scottl
|
233031 |
16-Mar-2012 |
nyan |
- Fix to build a native i386 kernel without the SMP and atpic. - Merge r232744 changes to pc98. (Allow a kernel to be built with 'nodevice atpic'.) - Move ICU related defines from x86/isa/atpic.c to x86/isa/icu.h and use them in x86/x86/intr_machdep.c.
Reviewed by: jhb
|
232747 |
09-Mar-2012 |
jhb |
Move i386's intr_machdep.c to the x86 tree and share it with amd64.
|
232745 |
09-Mar-2012 |
dim |
Add casts to __uint16_t to the __bswap16() macros on all arches which didn't already have them. This is because the ternary expression will return int, due to the Usual Arithmetic Conversions. Such casts are not needed for the 32 and 64 bit variants.
While here, add additional parentheses around the x86 variant, to protect against unintended consequences.
MFC after: 2 weeks
|
232730 |
09-Mar-2012 |
tijl |
Cast the expression in __bswap16(x) to __uint16_t because it is promoted to int.
Reviewed by: dim
|
232721 |
09-Mar-2012 |
tijl |
Clean up x86 endian.h: - Remove extern "C". There are no functions with external linkage here. [1] - Rename bswapNN_const(x) to bswapNN_gen(x) to indicate that these macros are generic implementations that can take non-constant arguments. [1] - Split up __GNUCLIKE_ASM && __GNUCLIKE_BUILTIN_CONSTANT_P and deal with each separately. - Replace _LP64 with __amd64__ because asm instructions are machine dependent, not ABI dependent.
Submitted by: bde [1] Reviewed by: bde
|
232520 |
04-Mar-2012 |
tijl |
Copy amd64 ptrace.h to x86 and merge with i386 ptrace.h. Replace amd64/i386/pc98 ptrace.h with stubs.
For amd64 PT_GETXSTATE and PT_SETXSTATE have been redefined to match the i386 values. The old values are still supported but should no longer be used.
Reviewed by: kib
|
232519 |
04-Mar-2012 |
tijl |
Do not use INT64_C and UINT64_C to define 64 bit integer limits. They aren't defined for C++ code unless __STDC_CONSTANT_MACROS is defined.
Reported by: jhb
|
232492 |
04-Mar-2012 |
tijl |
Copy amd64 trap.h to x86 and replace amd64/i386/pc98 trap.h with stubs.
|
232491 |
04-Mar-2012 |
tijl |
Copy amd64 float.h to x86 and merge with i386 float.h. Replace amd64/i386/pc98 float.h with stubs.
|
232356 |
01-Mar-2012 |
jhb |
- Change contigmalloc() to use the vm_paddr_t type instead of an unsigned long for specifying a boundary constraint. - Change bus_dma tags to use bus_addr_t instead of bus_size_t for boundary constraints.
These allow boundary constraints to be fully expressed for cases where sizeof(bus_addr_t) != sizeof(bus_size_t). Specifically, it allows a driver to properly specify a 4GB boundary in a PAE kernel.
Note that this cannot be safely MFC'd without a lot of compat shims due to KBI changes, so I do not intend to merge it.
Reviewed by: scottl
|
232276 |
28-Feb-2012 |
tijl |
Copy amd64 stdarg.h to x86 and replace amd64/i386/pc98 stdarg.h with stubs.
|
232275 |
28-Feb-2012 |
tijl |
Copy amd64 setjmp.h to x86 and replace amd64/i386/pc98 setjmp.h with stubs.
|
232267 |
28-Feb-2012 |
emaste |
Workaround for PCIe 4GB boundary issue
Enforce a boundary of no more than 4GB - transfers crossing a 4GB boundary can lead to data corruption due to PCIe limitations. This change is a less-intrusive workaround that can be quickly merged back to older branches; a cleaner implementation will arrive in HEAD later but may require KPI changes.
This change is based on a suggestion by jhb@.
Reviewed by: scottl, jhb Sponsored by: Sandvine Incorporated MFC after: 3 days
|
232266 |
28-Feb-2012 |
tijl |
Copy amd64 endian.h to x86 and merge with i386 endian.h. Replace amd64/i386/pc98 endian.h with stubs.
In __bswap64_const(x) the conflict between 0xffUL and 0xffULL has been resolved by reimplementing the macro in terms of __bswap32(x). As a side effect __bswap64_var(x) is now implemented using two bswap instructions on i386 and should be much faster. __bswap32_const(x) has been reimplemented in terms of __bswap16(x) for consistency.
|
232264 |
28-Feb-2012 |
tijl |
Copy amd64 _stdint.h to x86 and merge with i386 _stdint.h. Replace amd64/i386/pc98 _stdint.h with stubs.
|
232262 |
28-Feb-2012 |
tijl |
Copy amd64 _limits.h to x86 and merge with i386 _limits.h. Replace amd64/i386/pc98 _limits.h with stubs.
|
232261 |
28-Feb-2012 |
tijl |
Copy amd64 _types.h to x86 and merge with i386 _types.h. Replace existing amd64/i386/pc98 _types.h with stubs.
|
232232 |
27-Feb-2012 |
jhb |
- Panic up front if a kernel does not include 'device atpic' and an APIC is not found. - Don't panic if lapic_enable_cmc() is called and the APIC is not enabled. This can happen due to booting a kernel with APIC disabled on a CPU that supports CMCI. - Wrap a long line.
|
232199 |
26-Feb-2012 |
kan |
Fix apparent logic reversal in setting the 'auto_mode' flag.
MFC after: 2 weeks
|
229427 |
03-Jan-2012 |
jhb |
Fix a few bugs in the SRAT parsing code: - Actually increment ndomain when building our list of known domains so that we can properly renumber them to be 0-based and dense. - If the number of domains exceeds the configured maximum (VM_NDOMAIN), bail out of processing the SRAT and disable NUMA rather than hitting an obscure panic later. - Don't bother parsing the SRAT at all if VM_NDOMAIN is set to 1 to disable NUMA (the default).
Reported by: phk (2) MFC after: 1 week
|
228283 |
05-Dec-2011 |
ed |
Get rid of kludgy per-descriptor state handling in acpi_apm.
Where i386/bios/apm.c requires no per-descriptor state, the ACPI version of these device do. Instead of using hackish clone lists that leave stale device nodes lying around, use the cdevpriv API.
|
227843 |
22-Nov-2011 |
marius |
- There's no need to overwrite the default device method with the default one. Interestingly, these are actually the default for quite some time (bus_generic_driver_added(9) since r52045 and bus_generic_print_child(9) since r52045) but even recently added device drivers do this unnecessarily. Discussed with: jhb, marcel - While at it, use DEVMETHOD_END. Discussed with: jhb - Also while at it, use __FBSDID.
|
227309 |
07-Nov-2011 |
ed |
Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
|
227293 |
07-Nov-2011 |
ed |
Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.
This means that their use is restricted to a single C file.
|
226039 |
05-Oct-2011 |
jhb |
Ignore SRAT memory entries if the memory range does not overlap with an existing phys_avail[] table. If a hw.physmem setting causes a memory domain to not be present in phys_avail[], the SRAT table will now be ignored rather than triggering a panic when a CPU in the missing domain tries to allocate a page.
MFC after: 1 week
|
225177 |
25-Aug-2011 |
attilio |
Fix a deficiency in the selinfo interface: If a selinfo object is recorded (via selrecord()) and then it is quickly destroyed, with the waiters missing the opportunity to awake, at the next iteration they will find the selinfo object destroyed, causing a PF#.
That happens because the selinfo interface has no way to drain the waiters before to destroy the registered selinfo object. Also this race is quite rare to get in practice, because it would require a selrecord(), a poll request by another thread and a quick destruction of the selrecord()'ed selinfo object.
Fix this by adding the seldrain() routine which should be called before to destroy the selinfo objects (in order to avoid such case), and fix the present cases where it might have already been called. Sometimes, the context is safe enough to prevent this type of race, like it happens in device drivers which installs selinfo objects on poll callbacks. There, the destruction of the selinfo object happens at driver detach time, when all the filedescriptors should be already closed, thus there cannot be a race. For this case, mfi(4) device driver can be set as an example, as it implements a full correct logic for preventing this from happening.
Sponsored by: Sandvine Incorporated Reported by: rstone Tested by: pluknet Reviewed by: jhb, kib Approved by: re (bz) MFC after: 3 weeks
|
225069 |
22-Aug-2011 |
silby |
Disable TSC usage inside SMP VM environments. On my VMware ESXi 4.1 environment with a core i5-2500K, operation in this mode causes timeouts from the mpt driver. Switching to the ACPI-fast timer resolves this issue. Switching the VM back to single CPU mode also works, which is why I have not disabled the TSC in that mode.
I did not test with KVM or other VM environments, but I am being cautious and assuming that the TSC is not reliable in SMP mode there as well.
Reviewed by: kib Approved by: re (kib) MFC after: Not applicable, the timecounter code is new for 9.x
|
224096 |
16-Jul-2011 |
jhb |
Fix build when NEW_PCIB is not defined.
Submitted by: gcooper (partially) Pointy hat to: jhb
|
224069 |
15-Jul-2011 |
jhb |
Respect the BIOS/firmware's notion of acceptable address ranges for PCI resource allocation on x86 platforms: - Add a new helper API that Host-PCI bridge drivers can use to restrict resource allocation requests to a set of address ranges for different resource types. - For the ACPI Host-PCI bridge driver, use Producer address range resources in _CRS to enumerate valid address ranges for a given Host-PCI bridge. This can be disabled by including "hostres" in the debug.acpi.disabled tunable. - For the MPTable Host-PCI bridge driver, use entries in the extended MPTable to determine the valid address ranges for a given Host-PCI bridge. This required adding code to parse extended table entries.
Similar to the new PCI-PCI bridge driver, these changes are only enabled if the NEW_PCIB kernel option is enabled (which is enabled by default on amd64 and i386).
Approved by: re (kib)
|
224042 |
14-Jul-2011 |
jkim |
If TSC stops ticking in C3, disable deep sleep when the user forcefully select TSC as timecounter hardware.
Tested by: Fabian Keil (freebsd-listen at fabiankeil dot de)
|
223440 |
22-Jun-2011 |
jhb |
Move {amd64,i386}/pci/pci_bus.c and {amd64,i386}/include/pci_cfgreg.h to the x86 tree. The $PIR code is still only enabled on i386 and not amd64. While here, make the qpi(4) driver on conditional on 'device pci'.
|
223426 |
22-Jun-2011 |
jkim |
Set negative quality to TSC timecounter when C3 state is enabled for Intel processors unless the invariant TSC bit of CPUID is set. Intel processors may stop incrementing TSC when DPSLP# pin is asserted, according to Intel processor manuals, i. e., TSC timecounter is useless if the processor can enter deep sleep state (C3/C4). This problem was accidentally uncovered by r222869, which increased timecounter quality of P-state invariant TSC, e.g., for Core2 Duo T5870 (Family 6, Model f) and Atom N270 (Family 6, Model 1c).
Reported by: Fabian Keil (freebsd-listen at fabiankeil dot de) Ian FREISLICH (ianf at clue dot co dot za) Tested by: Fabian Keil (freebsd-listen at fabiankeil dot de) - Core2 Duo T5870 (C3 state available/enabled) jkim - Xeon X5150 (C3 state unavailable)
|
223211 |
17-Jun-2011 |
jkim |
Teach the compiler how to shift TSC value efficiently. As noted in r220631, some times compiler inserts redundant instructions to preserve unused upper 32 bits even when it is casted to a 32-bit value. Unfortunately, it seems the problem becomes more serious when it is shifted, especially on amd64.
|
222884 |
08-Jun-2011 |
jkim |
Tidy up r222866.
- Re-add accidentally removed atomic op. for sysctl(9) handler. - Remove a period(`.') at the end of a debugging message. - Consistently spell "low" for "TSC-low" timecounter throughout.
Pointed out by: bde
|
222869 |
08-Jun-2011 |
jkim |
Increase quality of TSC (or TSC-low) timecounter to 1000 if it is P-state invariant. For SMP case (TSC-low), it also has to pass SMP synchronization test and the CPU vendor/model has to be white-listed explicitly. Currently, all Intel CPUs and single-socket AMD Family 15h processors are listed here.
Discussed with: hackers
|
222866 |
08-Jun-2011 |
jkim |
Introduce low-resolution TSC timecounter "TSC-low". It replaces the normal TSC timecounter if TSC frequency is higher than ~4.29 MHz (or 2^32-1 Hz) or multiple CPUs are present. The "TSC-low" frequency is always lower than a preset maximum value and derived from TSC frequency (by being halved until it becomes lower than the maximum). Note the maximum value for SMP case is significantly lower than UP case because we want to reduce (rare but known) "temporal anomalies" caused by non-serialized RDTSC instruction. Normally, it is still higher than "ACPI-fast" timecounter frequency (which was default timecounter hardware for long time until r222222) to be useful.
|
222864 |
08-Jun-2011 |
jkim |
Remove a redundant assignment since r221703.
|
222813 |
07-Jun-2011 |
attilio |
etire the cpumask_t type and replace it with cpuset_t usage.
This is intended to fix the bug where cpu mask objects are capped to 32. MAXCPU, then, can now arbitrarely bumped to whatever value. Anyway, as long as several structures in the kernel are statically allocated and sized as MAXCPU, it is suggested to keep it as low as possible for the time being.
Technical notes on this commit itself: - More functions to handle with cpuset_t objects are introduced. The most notable are cpusetobj_ffs() (which calculates a ffs(3) for a cpuset_t object), cpusetobj_strprint() (which prepares a string representing a cpuset_t object) and cpusetobj_strscan() (which creates a valid cpuset_t starting from a string representation). - pc_cpumask and pc_other_cpus are target to be removed soon. With the moving from cpumask_t to cpuset_t they are now inefficient and not really useful. Anyway, for the time being, please note that access to pcpu datas is protected by sched_pin() in order to avoid migrating the CPU while reading more than one (possible) word - Please note that size of cpuset_t objects may differ between kernel and userland. While this is not directly related to the patch itself, it is good to understand that concept and possibly use the patch as a reference on how to deal with cpuset_t objects in userland, when accessing kernland members. - KTR_CPUMASK is changed and now is represented through a string, to be set as the example reported in NOTES.
Please additively note that no MAXCPU is bumped in this patch, but private testing has been done until to MAXCPU=128 on a real 8x8x2(htt) machine (amd64).
Please note that the FreeBSD version is not yet bumped because of the upcoming pcpu changes. However, note that this patch is not targeted for MFC.
People to thank for the time spent on this patch: - sbruno, pluknet and Nicholas Esborn (nick AT desert DOT net) tested several revision of the patches and really helped in improving stability of this work. - marius fixed several bugs in the sparc64 implementation and reviewed patches related to ktr. - jeff and jhb discussed the basic approach followed. - kib and marcel made targeted review on some specific part of the patch. - marius, art, nwhitehorn and andreast reviewed MD specific part of the patch. - marius, andreast, gonzo, nwhitehorn and jceel tested MD specific implementations of the patch. - Other people have made contributions on other patches that have been already committed and have been listed separately.
Companies that should be mentioned for having participated at several degrees: - Yahoo! for having offered the machines used for testing on big count of CPUs. - The FreeBSD Foundation for having sponsored my devsummit attendance, which has been instrumental. - Sandvine for having offered offices and infrastructure during development.
(I really hope I didn't forget anyone, if it happened I apologize in advance).
|
221703 |
09-May-2011 |
jkim |
Implement boot-time TSC synchronization test for SMP. This test is executed when the user has indicated that the system has synchronized TSCs or it has P-state invariant TSCs. For the former case, we may clear the tunable if it fails the test to prevent accidental foot-shooting. For the latter case, we may set it if it passes the test to notify the user that it may be usable.
|
221526 |
06-May-2011 |
jhb |
Retire isa_setup_intr() and isa_teardown_intr() and use the generic bus versions instead. They were never needed as bus_generic_intr() and bus_teardown_intr() had been changed to pass the original child device up in 42734, but the ISA bus was not converted to new-bus until 45720.
|
221508 |
05-May-2011 |
mav |
Some changes around LAPIC timer programming.
This fixes heavy interrupt storm and resulting system freeze when using LAPIC timer in one-shot mode under Xen HVM. There, unlike real hardware, programming timer with zero period almost immediately causes interrupt.
|
221393 |
03-May-2011 |
jhb |
Reimplement how PCI-PCI bridges manage their I/O windows. Previously the driver would verify that requests for child devices were confined to any existing I/O windows, but the driver relied on the firmware to initialize the windows and would never grow the windows for new requests. Now the driver actively manages the I/O windows.
This is implemented by allocating a bus resource for each I/O window from the parent PCI bus and suballocating that resource to child devices. The suballocations are managed by creating an rman for each I/O window. The suballocated resources are mapped by passing the bus_activate_resource() call up to the parent PCI bus. Windows are grown when needed by using bus_adjust_resource() to adjust the resource allocated from the parent PCI bus. If the adjust request succeeds, the window is adjusted and the suballocation request for the child device is retried.
When growing a window, the rman_first_free_region() and rman_last_free_region() routines are used to determine if the front or end of the existing I/O window is free. From using that, the smallest ranges that need to be added to either the front or back of the window are computed. The driver will first try to grow the window in whichever direction requires the smallest growth first followed by the other direction if that fails.
Subtractive bridges will first attempt to satisfy requests for child resources from I/O windows (including attempts to grow the windows). If that fails, the request is passed up to the parent PCI bus directly however.
The PCI-PCI bridge driver will try to use firmware-assigned ranges for child BARs first and only allocate a "fresh" range if that specific range cannot be accommodated in the I/O window. This allows systems where the firmware assigns resources during boot but later wipes the I/O windows (some ACPI BIOSen are known to do this) to "rediscover" the original I/O window ranges.
The ACPI Host-PCI bridge driver has been adjusted to correctly honor hw.acpi.host_mem_start and the I/O port equivalent when a PCI-PCI bridge makes a wildcard request for an I/O window range.
The new PCI-PCI bridge driver is only enabled if the NEW_PCIB kernel option is enabled. This is a transition aide to allow platforms that do not yet support bus_activate_resource() and bus_adjust_resource() in their Host-PCI bridge drivers (and possibly other drivers as needed) to use the old driver for now. Once all platforms support the new driver, the kernel option and old driver will be removed.
PR: kern/143874 kern/149306 Tested by: mav
|
221331 |
02-May-2011 |
jkim |
Fix build with clang. Please note there is an LLVM/Clang PR:
http://llvm.org/bugs/show_bug.cgi?id=9379
Reported by: rpaulo, dim
|
221324 |
02-May-2011 |
jhb |
Add implementations of BUS_ADJUST_RESOURCE() to the PCI bus driver, generic PCI-PCI bridge driver, x86 nexus driver, and x86 Host to PCI bridge drivers.
|
221218 |
29-Apr-2011 |
jhb |
Change rman_manage_region() to actually honor the rm_start and rm_end constraints on the rman and reject attempts to manage a region that is out of range. - Fix various places that set rm_end incorrectly (to ~0 or ~0u instead of ~0ul). - To preserve existing behavior, change rman_init() to set rm_start and rm_end to allow managing the full range (0 to ~0ul) if they are not set by the caller when rman_init() is called.
|
221214 |
29-Apr-2011 |
jkim |
Detect VMware guest and set the TSC frequency as reported by the hypervisor. VMware products virtualize TSC and it run at fixed frequency in so-called "apparent time". Although virtualized i8254 also runs in apparent time, TSC calibration always gives slightly off frequency because of the complicated timer emulation and lost-tick correction mechanism.
|
221178 |
28-Apr-2011 |
jkim |
Turn off periodic recalibration of CPU ticker frequency if it is invariant.
|
221173 |
28-Apr-2011 |
attilio |
Add the watchdogs patting during the (shutdown time) disk syncing and disk dumping. With the option SW_WATCHDOG on, these operations are doomed to let watchdog fire, fi they take too long.
I implemented the stubs this way because I really want wdog_kern_* KPI to not be dependant by SW_WATCHDOG being on (and really, the option only enables watchdog activation in hardclock) and also avoid to call them when not necessary (avoiding not-volountary watchdog activations).
Sponsored by: Sandvine Incorporated Discussed with: emaste, des MFC after: 2 weeks
|
221102 |
27-Apr-2011 |
jkim |
Use ACPI-supplied CPU frequencies instead of estimated ones as we are about to use other values from the same table anyway.
MFC after: 3 days
|
220641 |
14-Apr-2011 |
jkim |
Use newly added rdtsc32() for DELAY(9) as well.
|
220637 |
14-Apr-2011 |
jkim |
Work around an emulator problem where virtual CPU advertises TSC is P-state invariant and APERF/MPERF MSRs exist but these MSRs never tick. When we calculate effective frequency from cpu_est_clockrate(), it caused panic of division-by-zero. Now we test whether these MSRs actually increase to avoid such foot-shooting.
Reported by: dim Tested by: dim
|
220632 |
14-Apr-2011 |
jkim |
Use newly added rdtsc32() for the timecounter_get_t method.
|
220613 |
14-Apr-2011 |
jkim |
Add some tunable descriptions about x86 timers.
Requested by: arundel
|
220581 |
12-Apr-2011 |
jkim |
Do not use TSC for DELAY(9) if it not P-state invariant to avoid possible foot-shooting. DELAY() becomes unreliable when TSC frequency varies wildly, especially cpufreq(4) and powerd(8) are used at the same time.
|
220579 |
12-Apr-2011 |
jkim |
Probe capability to find effective frequency. When the TSC is P-state invariant, APERF/MPERF ratio can be used to find effective frequency.
|
220577 |
12-Apr-2011 |
jkim |
Add a new tunable 'machdep.disable_tsc_calibration' to allow skipping TSC frequency calibration. For Intel processors, if brand string from CPUID contains its nominal frequency, this frequency is used instead.
|
220545 |
11-Apr-2011 |
jkim |
Merge two similar functions to reduce duplication.
|
220459 |
08-Apr-2011 |
jkim |
Refactor DELAYDEBUG as it is only useful for correcting i8254 frequency.
|
220433 |
07-Apr-2011 |
jkim |
Use atomic load & store for TSC frequency. It may be overkill for amd64 but safer for i386 because it can be easily over 4 GHz now. More worse, it can be easily changed by user with 'machdep.tsc_freq' tunable (directly) or cpufreq(4) (indirectly). Note it is intentionally not used in performance critical paths to avoid performance regression (but we should, in theory). Alternatively, we may add "virtual TSC" with lower frequency if maximum frequency overflows 32 bits (and ignore possible incoherency as we do now).
|
219700 |
16-Mar-2011 |
jkim |
Revert r219676.
Requested by: jhb, bde
|
219676 |
15-Mar-2011 |
jkim |
Do not let machdep.tsc_freq modify tsc_freq itself. It is bad for i386 as it does not operate atomically. Actually, it serves no purpose.
Noticed by: bde
|
219673 |
15-Mar-2011 |
jkim |
Deprecate tsc_present as the last of its real consumers finally disappeared.
|
219646 |
14-Mar-2011 |
jkim |
When TSC is unavailable, broken or disabled and the current timecounter has better quality than i8254 timer, use it for DELAY(9).
|
219473 |
11-Mar-2011 |
jkim |
Add a tunable "machdep.disable_tsc" to turn off TSC. Specifically, it turns off boot-time CPU frequency calibration, DELAY(9) with TSC, and using TSC as a CPU ticker. Note tsc_present does not change by this tunable.
|
219469 |
10-Mar-2011 |
jkim |
Turn off pointless P-state invariant TSC detection based on CPU model on a virtual machine.
|
219461 |
10-Mar-2011 |
jkim |
Deprecate rarely used tsc_is_broken. Instead, we zero out tsc_freq because it is almost always used with tsc_freq any way.
|
219046 |
25-Feb-2011 |
jkim |
Set C1 "I/O then Halt" capability bit for Intel EIST. Some broken BIOSes refuse to load external SSDTs if this bit is unset for _PDC. It seems Linux and OpenSolaris did the same long ago.
MFC after: 1 week
|
218965 |
23-Feb-2011 |
brucec |
Fix typos - remove duplicate "is".
PR: docs/154934 Submitted by: Eitan Adler <lists at eitanadler.com> MFC after: 3 days
|
218221 |
03-Feb-2011 |
jhb |
Use a dedicated taskqueue with a thread that runs at a software-interrupt priority for the periodic polling of the machine check registers.
|
217616 |
19-Jan-2011 |
mdf |
Introduce signed and unsigned version of CTLTYPE_QUAD, renaming existing uses. Rename sysctl_handle_quad() to sysctl_handle_64().
|
217360 |
13-Jan-2011 |
jhb |
If an interrupt on an I/O APIC is moved to a different CPU after it has started to execute, it seems that the corresponding ISR bit in the "old" local APIC can be cleared. This causes the local APIC interrupt routine to fail to find an interrupt to service. Rather than panic'ing in this case, simply return from the interrupt without sending an EOI to the local APIC. If there are any other pending interrupts in other ISR registers, the local APIC will assert a new interrupt.
Tested by: steve
|
217337 |
13-Jan-2011 |
mdf |
Revert to using bus_size_t for the bounce_zone's alignment member.
Reuqested by: jhb
|
217330 |
12-Jan-2011 |
mdf |
Fix a brain fart. Since this file is shared between i386 and amd64, a bus_size_t may be 32 or 64 bits. Change the bounce_zone alignment field to explicitly be 32 bits, as I can't really imagine a DMA device that needs anything close to 2GB alignment of data.
|
217326 |
12-Jan-2011 |
mdf |
sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.
Commit the kernel changes.
|
217265 |
11-Jan-2011 |
jhb |
Remove unneeded includes of <sys/linker_set.h>. Other headers that use it internally contain nested includes.
Reviewed by: bde
|
217157 |
08-Jan-2011 |
tijl |
Copy powerpc/include/_inttypes.h to x86 and replace i386/amd64/pc98 headers with stubs.
Approved by: kib (mentor)
|
216679 |
23-Dec-2010 |
jhb |
Drop the icu_lock spinlock while pausing briefly after masking the interrupt in the I/O APIC before moving it to a different CPU. If the interrupt had been triggered by the I/O APIC after locking icu_lock but before we masked the pin in the I/O APIC, then this could cause the interrupt to be pending on the "old" CPU and it would finally trigger after we had moved the interrupt to the new CPU. This could cause us to panic as there was no interrupt source associated with the old IDT vector on the old CPU. Dropping the lock after the interrupt is masked but before it is moved allows the interrupt to fire and be handled in this case before it is moved.
Tested by: Daniel Braniss danny of cs huji ac il MFC after: 1 week
|
216592 |
20-Dec-2010 |
tijl |
Merge amd64 and i386 bus.h and move the resulting header to x86. Replace the original amd64 and i386 headers with stubs.
Rename (AMD64|I386)_BUS_SPACE_* to X86_BUS_SPACE_* everywhere.
Reviewed by: imp (previous version), jhb Approved by: kib (mentor)
|
216490 |
16-Dec-2010 |
jhb |
Small style fixes: - Avoid side-effect assignments in if statements when possible. - Don't use ! to check for NULL pointers, explicitly check against NULL. - Explicitly check error return values against 0. - Don't use INTR_MPSAFE for interrupt handlers with only filters as it is meaningless. - Remove unneeded function casts.
|
216337 |
09-Dec-2010 |
jkim |
Remove AMD Family 0Fh, Model 6Bh, Stepping 2 from the list of P-state invariant CPUs. I do not believe this model is P-state invariant any more. Maybe cpufreq(4) was broken at the time of commit. :-(
|
216316 |
09-Dec-2010 |
cperciva |
Replace i386/i386/busdma_machdep.c and amd64/amd64/busdma_machdep.c (which are identical) with a single x86/x86/busdma_machdep.c.
|
216283 |
08-Dec-2010 |
jkim |
Merge sys/amd64/amd64/tsc.c and sys/i386/i386/tsc.c and move to sys/x86/x86.
Discussed with: avg
|
215856 |
26-Nov-2010 |
tijl |
Merge amd64/i386 _align.h by aligning on the size of register_t (copied from powerpc).
Reviewed by: imp, jhb Approved by: kib (mentor)
|
215751 |
23-Nov-2010 |
avg |
x86/local_apic: use newly added ARAT bit definition
ARAT: APIC-Timer-always-running feature.
Suggested by: mav MFC after: 12 days
|
215398 |
16-Nov-2010 |
avg |
hwpstate: use CPU_FOREACH when binding to all available processors
Also, add a comment mentioning _PSD - on some systems it's enough to put one logical CPU into a particular P-state to make other CPUs in the same domain to enter that P-state.
Also, call sched_unbind() after the loop - sched_bind() automatically rebinds from previous CPU to a new one, and the new arrangement of code is safer against early loop exit.
Plus one minor style nit.
MFC after: 10 days
|
215140 |
11-Nov-2010 |
jkim |
Move identical copies of apm_bios.h to sys/x86/include, replace them with stubs, and adjust PC98 stub accordingly.
Reviewed by: imp, nyan
|
215131 |
11-Nov-2010 |
avg |
make it possible to actually enable hwpstate_verbose
Either via the tunable or the sysctl.
MFC after: 3 days
|
215097 |
10-Nov-2010 |
jkim |
Make APM emulation look more closer to its origin. Use device_get_softc(9) instead of hardcoding acpi(4) unit number as we have device_t for it.
|
215072 |
10-Nov-2010 |
jkim |
Refactor acpi_machdep.c for amd64 and i386, move APM emulation into a new file acpi_apm.c, and place it on sys/x86/acpica.
|
215051 |
09-Nov-2010 |
attilio |
Move the mptable.h under x86/include/.
Sponsored by: Sandvine Incorporated MFC after: 14 days
|
215024 |
09-Nov-2010 |
jkim |
Now OsdEnvironment.c is identical on amd64 and i386. Move it to a new home.
|
215012 |
08-Nov-2010 |
jhb |
Move the MADT parser for amd64 and i386 to sys/x86/acpica now that it is identical on both platforms.
|
215009 |
08-Nov-2010 |
jhb |
Sync the APIC startup sequence with amd64: - Register APIC enumerators at SI_SUB_TUNABLES - 1 instead of SI_SUB_CPU - 1. - Probe CPUs at SI_SUB_TUNABLES - 1. This allows i386 to set a truly accurate mp_maxid value rather than always setting it to MAXCPU - 1.
|
215001 |
08-Nov-2010 |
jhb |
Only dump the values of the PMC and CMCI local vector table entries on a local APIC if those LVT entries are valid. This quiets spurious illegal register local APIC errors during boot on a CPU that doesn't support those vectors.
MFC after: 1 week
|
214686 |
02-Nov-2010 |
jhb |
Cosmetic change to revert one of my earlier ones. #if __i386__ && PAE is identical to just #if PAE since PAE is only a valid option for i386.
Submitted by: attilio
|
214681 |
02-Nov-2010 |
jhb |
Further tweaks to the ram_attach() routine: - Use > 2^32 - 1 instead of >= when checking for memory regions above 4G. - Skip SMAP entries > 4G on i386 rather than breaking out of the loop since SMAP entries are not guaranteed to be in order. - Remove 'i' and loop over 'rid' directly in the dump_avail[] case. - Only check for 4G regions in the dump_avail[] case on i386 if PAE is enabled since vm_paddr_t is 32-bit in the !PAE case.
Submitted by: alc
|
214676 |
02-Nov-2010 |
jhb |
Skip SMAP regions above 4GB on i386 since they will not fit into a long. While here, update some comments to better explain the new code flow.
Tested by: dhw
|
214631 |
01-Nov-2010 |
jhb |
Move <machine/apicreg.h> to <x86/apicreg.h>.
|
214630 |
01-Nov-2010 |
jhb |
Move the <machine/mca.h> header to <x86/mca.h>.
|
214629 |
01-Nov-2010 |
jhb |
Add an x86/include directory to the kernel to hold headers that are common to amd64, i386, and pc98. The headers are installed to /usr/include/x86 during an installworld, and an 'x86' symlink is created for kernel builds similar to 'machine' so that the headers can be included as <x86/foo.h>.
Reviewed by: imp
|
214515 |
29-Oct-2010 |
attilio |
- Merge ram_attach() implementation for i386 and amd64 - Rename RES_BUS_SPACE_* into BUS_SPACE_* for consistency - Trim out an unnecessary checking condition
Sponsored by: Sandvine Incorporated Requested and reviewed by: jhb
|
214457 |
28-Oct-2010 |
attilio |
Merge nexus.c from amd64 and i386 to x86 subtree.
Sponsored by: Sandvine Incorporated Tested by: gianni
|
214446 |
28-Oct-2010 |
attilio |
Merge the mptable support from MD bits to x86 subtree.
Sponsored by: Sandvine Incorporated Discussed with: jhb
|
214386 |
26-Oct-2010 |
attilio |
Style fix.
Reported by: bde, dim
|
214380 |
26-Oct-2010 |
attilio |
Remove usage of PRI* macro for style compliancy.
Requested by: bde, jhb Sponsored by: Sandvine Incorporated
|
214373 |
26-Oct-2010 |
attilio |
Merge dump_machdep.c i386/amd64 under the x86 subtree.
Sponsored by: Sandvine Incorporated Tested by: gianni
|
214347 |
25-Oct-2010 |
jhb |
Use 'saveintr' instead of 'savecrit' or 'eflags' to hold the state returned by intr_disable().
Requested by: bde
|
213918 |
16-Oct-2010 |
avg |
atrtc: remove (pre-)historic check of RTC NVRAM at address 0x0e
Old scrolls tell that once upon a time IBM AT BIOS was known to put some useful system diagnostic information into RTC NVRAM. It is not really known if and for how long PC BIOSes followed that convention, but I believe that many, if not all, modern BIOSes do not do that any more (not mentioning other types of x86 firmware). Some diagnostic bits don't even make any sense any longer. The check results in confusing messages upon boot on some systems. So I am removing it.
Discussed with: bde, jhb, mav MFC after: 3 weeks
|
212812 |
18-Sep-2010 |
mav |
Restore pre-r212778 optimization, skipping timer reprogramming when it is not neccessary. It allows to avoid time counter jump of up to 1/18s, when base frequency slightly tuned via machdep.i8254_freq sysctl. Fix few style things.
Suggested by: bde
|
212778 |
17-Sep-2010 |
mav |
Add one-shot mode support to attimer (i8254) event timer.
Unluckily, using one-shot mode is impossible, when same hardware used for time counting. Introduce new tunable hint.attimer.0.timecounter, setting which to 0 disables i8254 time counter and allows one-shot mode. Note, that on some systems there may be no other reliable enough time counters, so this tunable should be used with understanding.
|
212721 |
16-Sep-2010 |
mav |
Few whitespace cleanups and comments tunings.
Submitted by: arundel
|
212541 |
13-Sep-2010 |
mav |
Refactor timer management code with priority to one-shot operation mode. The main goal of this is to generate timer interrupts only when there is some work to do. When CPU is busy interrupts are generating at full rate of hz + stathz to fullfill scheduler and timekeeping requirements. But when CPU is idle, only minimum set of interrupts (down to 8 interrupts per second per CPU now), needed to handle scheduled callouts is executed. This allows significantly increase idle CPU sleep time, increasing effect of static power-saving technologies. Also it should reduce host CPU load on virtualized systems, when guest system is idle.
There is set of tunables, also available as writable sysctls, allowing to control wanted event timer subsystem behavior: kern.eventtimer.timer - allows to choose event timer hardware to use. On x86 there is up to 4 different kinds of timers. Depending on whether chosen timer is per-CPU, behavior of other options slightly differs. kern.eventtimer.periodic - allows to choose periodic and one-shot operation mode. In periodic mode, current timer hardware taken as the only source of time for time events. This mode is quite alike to previous kernel behavior. One-shot mode instead uses currently selected time counter hardware to schedule all needed events one by one and program timer to generate interrupt exactly in specified time. Default value depends of chosen timer capabilities, but one-shot mode is preferred, until other is forced by user or hardware. kern.eventtimer.singlemul - in periodic mode specifies how much times higher timer frequency should be, to not strictly alias hardclock() and statclock() events. Default values are 2 and 4, but could be reduced to 1 if extra interrupts are unwanted. kern.eventtimer.idletick - makes each CPU to receive every timer interrupt independently of whether they busy or not. By default this options is disabled. If chosen timer is per-CPU and runs in periodic mode, this option has no effect - all interrupts are generating.
As soon as this patch modifies cpu_idle() on some platforms, I have also refactored one on x86. Now it makes use of MONITOR/MWAIT instrunctions (if supported) under high sleep/wakeup rate, as fast alternative to other methods. It allows SMP scheduler to wake up sleeping CPUs much faster without using IPI, significantly increasing performance on some highly task-switching loads.
Tested by: many (on i386, amd64, sparc64 and powerc) H/W donated by: Gheorghe Ardelean Sponsored by: iXsystems, Inc.
|
212292 |
07-Sep-2010 |
jhb |
Each processor socket in a QPI system has a special PCI bus for the "uncore" devices (such as the memory controller) in that socket. Stop hardcoding support for two busses, but instead start probing buses at domain 0, bus 255 and walk down until a bus probe fails. Also, do not probe a bus if it has already been enumerated elsewhere (e.g. if ACPI ever enumerates these buses in the future).
|
212004 |
30-Aug-2010 |
rpaulo |
When DTrace is enabled, make sure we don't overwrite the IDT_DTRACE_RET entry with an IRQ for some hardware component.
Reviewed by: jhb Sponsored by: The FreeBSD Foundation
|
211821 |
25-Aug-2010 |
jhb |
Correctly ensure that the CPU family is 0x6, not non-zero.
Submitted by: Dimitry Andric
|
211820 |
25-Aug-2010 |
jhb |
Intel QPI chipsets actually provide two extra "non-core" PCI buses that provide PCI devices for various hardware such as memory controllers, etc. These PCI buses are not enumerated via ACPI however. Add qpi(4) psuedo bus and Host-PCI bridge drivers to enumerate these buses. Currently the driver uses the CPU ID to determine the bridges' presence.
In collaboration with: Joseph Golio @ Isilon Systems MFC after: 2 weeks
|
211756 |
24-Aug-2010 |
mav |
Enable timer interrupt before starting timer. This allows to handle very short periods without interrupt loss.
|
210620 |
29-Jul-2010 |
jhb |
When performing a sanity check on the SRAT table to ensure that each memory domain has an assigned CPU, ignore disabled CPUs. Previously disabled CPUs were counted as being in domain 0.
Reported by: mdf
|
210577 |
28-Jul-2010 |
jhb |
The corrected error count field is dependent on CMCI, not TES.
MFC after: 1 week
|
210552 |
27-Jul-2010 |
jhb |
Add a parser for the ACPI SRAT table for amd64 and i386. It sets PCPU(domain) for each CPU and populates a mem_affinity array suitable for the NUMA support in the physical memory allocator.
Reviewed by: alc
|
210444 |
24-Jul-2010 |
mav |
Increment td->td_intr_nesting_level for LAPIC timer interrupts. Among other things it hints SCHED_ULE to run clock swi handlers on their native CPUs, avoiding many unneeded IPI_PREEMPT calls.
|
210298 |
20-Jul-2010 |
mav |
Fix several un-/signedness bugs of r210290 and r210293. Add one more check.
|
210290 |
20-Jul-2010 |
mav |
Extend timer driver API to report also minimal and maximal supported period lengths. Make MI wrapper code to validate periods in request. Make kernel clock management code to honor these hardware limitations while choosing hz, stathz and profhz values.
|
210054 |
14-Jul-2010 |
mav |
Move timeevents.c to MI code, as it is not x86-specific. I already have it working on Marvell ARM SoCs, and it would be nice to unify timer code between more platforms.
|
210052 |
14-Jul-2010 |
mav |
Remove some unneeded includes. Code now can be built on ARM.
|
209990 |
13-Jul-2010 |
mav |
Rise knowledge about curthread->td_intr_frame by one step. Make timer callback argument really opaque. Not repeat interrupt handler's problem in case somebody will ever need to have both argument and frame.
|
209979 |
13-Jul-2010 |
mav |
Unify pc98 event timer code with the rest of x86.
Reviewed by: nyan@
|
209927 |
12-Jul-2010 |
mav |
Instead of deleting existing IRQ resource, which is not really working for ACPI bus, find wanted IRQ rid or spare one. This should fix panic during boot on systems reporting fancy IRQ numbers for attimer and atrtc.
|
209901 |
11-Jul-2010 |
mav |
Make kernel panic with reasonable message if no usable event timer found.
|
209635 |
01-Jul-2010 |
mav |
Allow attimer to be hinted at ISA if not reported by ISA PNP or ACPI. Rephrase respective atrtc code same way to be more readable.
|
209634 |
01-Jul-2010 |
mav |
Rework r209456: Instead of using fake rid (which ISA doesn't like), delete untrusted IRQ resource and let it be recreated.
|
209456 |
23-Jun-2010 |
mav |
Do not trust IRQ reported by ACPI. There are cases when it is wrong.
|
209440 |
22-Jun-2010 |
mav |
Add "legacy route" support to HPET driver. When enabled, this mode makes HPET to steal IRQ0 from i8254 and IRQ8 from RTC timers. It can be suitable for HPETs without FSB interrupts support, as it gives them two unshared IRQs. It allows them to provide one per-CPU event timer on dual-CPU system, that should be suitable for further tickless kernels.
To enable it, such lines may be added to /boot/loader.conf: hint.atrtc.0.clock=0 hint.attimer.0.clock=0 hint.hpet.0.legacy_route=1
|
209401 |
21-Jun-2010 |
mav |
Fix i386 LINT build broken by r209371. There appeared such legacy thing as APM, that somehow breaking RTC.
|
209371 |
20-Jun-2010 |
mav |
Implement new event timers infrastructure. It provides unified APIs for writing event timer drivers, for choosing best possible drivers by machine independent code and for operating them to supply kernel with hardclock(), statclock() and profclock() events in unified fashion on various hardware.
Infrastructure provides support for both per-CPU (independent for every CPU core) and global timers in periodic and one-shot modes. MI management code at this moment uses only periodic mode, but one-shot mode use planned for later, as part of tickless kernel project.
For this moment infrastructure used on i386 and amd64 architectures. Other archs are welcome to follow, while their current operation should not be affected.
This patch updates existing drivers (i8254, RTC and LAPIC) for the new order, and adds event timers support into the HPET driver. These drivers have different capabilities: LAPIC - per-CPU timer, supports periodic and one-shot operation, may freeze in C3 state, calibrated on first use, so may be not exactly precise. HPET - depending on hardware can work as per-CPU or global, supports periodic and one-shot operation, usually provides several event timers. i8254 - global, limited to periodic mode, because same hardware used also as time counter. RTC - global, supports only periodic mode, set of frequencies in Hz limited by powers of 2.
Depending on hardware capabilities, drivers preferred in following orders, either LAPIC, HPETs, i8254, RTC or HPETs, LAPIC, i8254, RTC. User may explicitly specify wanted timers via loader tunables or sysctls: kern.eventtimer.timer1 and kern.eventtimer.timer2. If requested driver is unavailable or unoperational, system will try to replace it. If no more timers available or "NONE" specified for second, system will operate using only one timer, multiplying it's frequency by few times and uing respective dividers to honor hz, stathz and profhz values, set during initial setup.
|
209339 |
19-Jun-2010 |
mav |
Core i5, same as previously Core2Duo, found to not set P-state for single core lower then set on other cores. Do not try to test P-states on attach on SMP systems. It is hopeless now and will just pollute verbose logs. If needed, check still can be forced via loader tunable.
|
209212 |
15-Jun-2010 |
jhb |
Restore the machine check register banks on resume. For banks being monitored via CMCI, reset the interrupt threshold to 1 on resume.
Reviewed by: jkim MFC after: 2 weeks
|
209154 |
14-Jun-2010 |
mav |
Virtualize pci_remap_msi_irq() call from general MSI code. It allows MSI (FSB interrupts) to be used by non-PCI devices, such as HPET.
|
209059 |
11-Jun-2010 |
jhb |
Update several places that iterate over CPUs to use CPU_FOREACH().
|
208991 |
10-Jun-2010 |
mav |
Do not disable edge-triggered interrupts before migration. DELAY() with interrupt disabled highly probable causes interrupt loss.
|
208922 |
08-Jun-2010 |
jhb |
Move the MD support for PCI message signalled interrupts to the x86 tree as it is identical for i386 and amd64.
|
208921 |
08-Jun-2010 |
jhb |
Move the machine check support code to the x86 tree since it is identical on i386 and amd64.
Requested by: alc
|
208919 |
08-Jun-2010 |
jhb |
Move the I/O APIC code to the x86 tree since it is identical on i386 and amd64.
|
208507 |
24-May-2010 |
jhb |
Add support for corrected machine check interrupts. CMCI is a new local APIC interrupt that fires when a threshold of corrected machine check events is reached. CMCI also includes a count of events when reporting corrected errors in the bank's status register. Note that individual banks may or may not support CMCI. If they do, each bank includes its own threshold register that determines when the interrupt fires. Currently the code uses a very simple strategy where it doubles the threshold on each interrupt until it succeeds in throttling the interrupt to occur only once a minute (this interval can be tuned via sysctl). The threshold is also adjusted on each hourly poll which will lower the threshold once events stop occurring.
Tested by: Sailaja Bangaru sbappana at yahoo com MFC after: 1 month
|
208494 |
24-May-2010 |
mav |
- Implement MI helper functions, dividing one or two timer interrupts with arbitrary frequencies into hardclock(), statclock() and profclock() calls. Same code with minor variations duplicated several times over the tree for different timer drivers and architectures. - Switch all x86 archs to new functions, simplifying the code and removing extra logic from timer drivers. Other archs are also welcome.
|
208479 |
24-May-2010 |
mav |
Restore different APIC init orders for i386 and amd64 unified in r208452. Seems noone of them contents both arch for different reasons.
Submitted by: kib@
|
208452 |
23-May-2010 |
mav |
Unify local_apic.c for x86 archs,
|
206922 |
20-Apr-2010 |
rpaulo |
Fix another instance of lapic_cyclic_clock_func.
|
206421 |
09-Apr-2010 |
attilio |
Default the machdep.lapic_allclocks to be enabled in order to cope with broken atrtc. Now if you want more correct stats on profhz and stathz it may be disabled by setting to 0.
Reported by: A. Akephalos <akephalos dot akephalos at gmail dot com>, Jakub Lach <jakub_lach at mailplus dot pl> MFC: 1 week
|
204641 |
03-Mar-2010 |
attilio |
Improving the clocks auto-tunning by firstly checking if the atrtc may be correctly initialized and just then assign to softclock/profclock. Right now, some atrtc seems reporting strange diagnostic error* making the current pattern bogus.
In order to do that cleanly, lapic_setup_clock(), on both ia32 and amd64, now accepts as arguments the desired sources to handle, and returns the actual ones (LAPIC_CLOCK_NONE is forbidden because otherwise there is no meaning in calling such function). This allows to bring out into commont x86 code the handling part for machdep.lapic_allclocks tunable, which is retained.
Sponsored by: Sandvine Incorporated Tested by: yongari, Richard Todd <rmtodd at ichotolot dot servalan dot com> MFC: 3 weeks X-MFC: r202387, 204309
|
204309 |
25-Feb-2010 |
attilio |
Introduce the new kernel sub-tree x86 which should contain all the code shared and generalized between our current amd64, i386 and pc98.
This is just an initial step that should lead to a more complete effort. For the moment, a very simple porting of cpufreq modules, BIOS calls and the whole MD specific ISA bus part is added to the sub-tree but ideally a lot of code might be added and more shared support should grow.
Sponsored by: Sandvine Incorporated Reviewed by: emaste, kib, jhb, imp Discussed on: arch MFC: 3 weeks
|