Cross Reference: /freebsd-9.3-release/sys/sparc64/sparc64/mp

History log of /freebsd-9.3-release/sys/sparc64/sparc64/mp_machdep.c
Revision	Date	Author	Comments (<<< Hide modified files) (Show modified files >>>)
# 267654	19-Jun-2014	gjb	Copy stable/9 to releng/9.3 as part of the 9.3-RELEASE cycle. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation /freebsd-9.3-release
# 256207	09-Oct-2013	mav	MFC r251703 (by attilio): - Add a BIT_FFS() macro and use it to replace cpusetffs_obj()
# 241681	18-Oct-2012	marius	MFC: r239864 - Unlike cache invalidation and TLB demapping IPIs, reading registers from other CPUs doesn't require locking so get rid of it. As the latter is used for the timecounter on certain machine models, using a spin lock in this case can lead to a deadlock with the upcoming callout(9) rework. - Merge r134227/r167250 from x86: Avoid cross-IPI SMP deadlock by using the smp_ipi_mtx spin lock not only for smp_rendezvous_cpus() but also for the MD cache invalidation and TLB demapping IPIs. - Mark some unused function arguments as such.
# 225736	22-Sep-2011	kensmith	Copy head to stable/9 as part of 9.0-RELEASE release cycle. Approved by: re (implicit)
# 224681	06-Aug-2011	marius	Remove a shortcut which is invalid with MAXCPU > IDR_CHEETAH_MAX_BN_PAIRS. Approved by: re (kib)
# 223806	05-Jul-2011	marius	Remove the IDR_CHEETAH_MAX_BN_PAIRS limit from cheetah_ipi_selected(). This is just a simple approach. For reasons unknown OpenSolaris uses a more sophisticated one involving IPIing the remaining CPUs in reverse order after the first batch of 32.
# 223758	04-Jul-2011	attilio	With retirement of cpumask_t and usage of cpuset_t for representing a mask of CPUs, pc_other_cpus and pc_cpumask become highly inefficient. Remove them and replace their usage with custom pc_cpuid magic (as, atm, pc_cpumask can be easilly represented by (1 << pc_cpuid) and pc_other_cpus by (all_cpus & ~(1 << pc_cpuid))). This change is not targeted for MFC because of struct pcpu members removal and dependency by cpumask_t retirement. MD review by: marcel, marius, alc Tested by: pluknet MD testing by: marcel, marius, gonzo, andreast
# 223719	02-Jul-2011	marius	- For Cheetah- and Zeus-class CPUs don't flush all unlocked entries from the TLBs in order to get rid of the user mappings but instead traverse them an flush only the latter like we also do for the Spitfire-class. Also flushing the unlocked kernel entries can cause instant faults which when called from within cpu_switch() are handled with the scheduler lock held which in turn can cause timeouts on the acquisition of the lock by other CPUs. This was easily seen with a 16-core V890 but occasionally also happened with 2-way machines. While at it, move the SPARC64-V support code entirely to zeus.c. This causes a little bit of duplication but is less confusing than partially using Cheetah-class bits for these. - For SPARC64-V ensure that 4-Mbyte page entries are stored in the 1024- entry, 2-way set associative TLB. - In {d,i}tlb_get_data_sun4u() turn off the interrupts in order to ensure that ASI_{D,I}TLB_DATA_ACCESS_REG actually are read twice back-to-back. Tested by: Peter Jeremy (16-core US-IV), Michael Moll (2-way SPARC64-V)
# 223346	20-Jun-2011	marius	- Remove MD usage of pc_cpumask and pc_other_cpus. [1] - Remove CTASSERTs which no longer need to hold since r222813. Submitted by: attilio [1]
# 222828	07-Jun-2011	marius	Adapt CATR() to r222813. This is somewhat tricky as we can't afford using more than three temporary register in several places CATR() is used so this code trades instructions in for registers. Actually, this still isn't sufficient and CATR() has the side-effect of clobbering %y. Luckily, with the current uses of CATR() this either doesn't matter or we are able to (save and) restore it. Now that there's only one use of AND() and TEST() left inline these.
# 222813	07-Jun-2011	attilio	etire the cpumask_t type and replace it with cpuset_t usage. This is intended to fix the bug where cpu mask objects are capped to 32. MAXCPU, then, can now arbitrarely bumped to whatever value. Anyway, as long as several structures in the kernel are statically allocated and sized as MAXCPU, it is suggested to keep it as low as possible for the time being. Technical notes on this commit itself: - More functions to handle with cpuset_t objects are introduced. The most notable are cpusetobj_ffs() (which calculates a ffs(3) for a cpuset_t object), cpusetobj_strprint() (which prepares a string representing a cpuset_t object) and cpusetobj_strscan() (which creates a valid cpuset_t starting from a string representation). - pc_cpumask and pc_other_cpus are target to be removed soon. With the moving from cpumask_t to cpuset_t they are now inefficient and not really useful. Anyway, for the time being, please note that access to pcpu datas is protected by sched_pin() in order to avoid migrating the CPU while reading more than one (possible) word - Please note that size of cpuset_t objects may differ between kernel and userland. While this is not directly related to the patch itself, it is good to understand that concept and possibly use the patch as a reference on how to deal with cpuset_t objects in userland, when accessing kernland members. - KTR_CPUMASK is changed and now is represented through a string, to be set as the example reported in NOTES. Please additively note that no MAXCPU is bumped in this patch, but private testing has been done until to MAXCPU=128 on a real 8x8x2(htt) machine (amd64). Please note that the FreeBSD version is not yet bumped because of the upcoming pcpu changes. However, note that this patch is not targeted for MFC. People to thank for the time spent on this patch: - sbruno, pluknet and Nicholas Esborn (nick AT desert DOT net) tested several revision of the patches and really helped in improving stability of this work. - marius fixed several bugs in the sparc64 implementation and reviewed patches related to ktr. - jeff and jhb discussed the basic approach followed. - kib and marcel made targeted review on some specific part of the patch. - marius, art, nwhitehorn and andreast reviewed MD specific part of the patch. - marius, andreast, gonzo, nwhitehorn and jceel tested MD specific implementations of the patch. - Other people have made contributions on other patches that have been already committed and have been listed separately. Companies that should be mentioned for having participated at several degrees: - Yahoo! for having offered the machines used for testing on big count of CPUs. - The FreeBSD Foundation for having sponsored my devsummit attendance, which has been instrumental. - Sandvine for having offered offices and infrastructure during development. (I really hope I didn't forget anyone, if it happened I apologize in advance).
# 222531	31-May-2011	nwhitehorn	On multi-core, multi-threaded PPC systems, it is important that the threads be brought up in the order they are enumerated in the device tree (in particular, that thread 0 on each core be brought up first). The SLIST through which we loop to start the CPUs has all of its entries added with SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration and so AP startup would always fail in such situations (causing a machine check or RTAS failure). Fix this by changing the SLIST into an STAILQ, and inserting new CPUs at the end. Reviewed by: jhb
# 216803	29-Dec-2010	marius	On UltraSPARC-III+ and greater take advantage of ASI_ATOMIC_QUAD_LDD_PHYS, which takes an physical address instead of an virtual one, for loading TTEs of the kernel TSB so we no longer need to lock the kernel TSB into the dTLB, which only has a very limited number of lockable dTLB slots. The net result is that we now basically can handle a kernel TSB of any size and no longer need to limit the kernel address space based on the number of dTLB slots available for locked entries. Consequently, other parts of the trap handlers now also only access the the kernel TSB via its physical address in order to avoid nested traps, as does the PMAP bootstrap code as we haven't taken over the trap table at that point, yet. Apart from that the kernel TSB now is accessed via a direct mapping when we are otherwise taking advantage of ASI_ATOMIC_QUAD_LDD_PHYS so no further code changes are needed. Most of this is implemented by extending the patching of the TSB addresses and mask as well as the ASIs used to load it into the trap table so the runtime overhead of this change is rather low. Currently the use of ASI_ATOMIC_QUAD_LDD_PHYS is not yet enabled on SPARC64 CPUs due to lack of testing and due to the fact it might require minor adjustments there. Theoretically it should be possible to use the same approach also for the user TSB, which already is not locked into the dTLB, avoiding nested traps. However, for reasons I don't understand yet OpenSolaris only does that with SPARC64 CPUs. On the other hand I think that also addressing the user TSB physically and thus avoiding nested traps would get us closer to sharing this code with sun4v, which only supports trap level 0 and 1, so eventually we could have a single kernel which runs on both sun4u and sun4v (as does Linux and OpenBSD). Developed at and committed from: 27C3
# 214071	19-Oct-2010	marius	- Wrap exchanging td_intr_frame and calling the event timer callback in a critical section as apparently required by both. I don't think either belongs in the event timer front-ends but the callback should handle this as necessary instead just like for example intr_event_handle() does but this is how the other architectures currently handle it, either explicitly or implicitly. - Further rename and reword references to hardclock as this front-end no longer has a notion of actually calling it.
# 213868	14-Oct-2010	marius	- In the spirit of r212559 add a comment describing what will eventually lower the PIL. - Just as with the AP ensure that the (S)TICK timer(s) are in a known state when starting BSPs.
# 212619	14-Sep-2010	marius	Remove redundant raising of the PIL to PIL_TICK as the respective locore code already did that.
# 212541	13-Sep-2010	mav	Refactor timer management code with priority to one-shot operation mode. The main goal of this is to generate timer interrupts only when there is some work to do. When CPU is busy interrupts are generating at full rate of hz + stathz to fullfill scheduler and timekeeping requirements. But when CPU is idle, only minimum set of interrupts (down to 8 interrupts per second per CPU now), needed to handle scheduled callouts is executed. This allows significantly increase idle CPU sleep time, increasing effect of static power-saving technologies. Also it should reduce host CPU load on virtualized systems, when guest system is idle. There is set of tunables, also available as writable sysctls, allowing to control wanted event timer subsystem behavior: kern.eventtimer.timer - allows to choose event timer hardware to use. On x86 there is up to 4 different kinds of timers. Depending on whether chosen timer is per-CPU, behavior of other options slightly differs. kern.eventtimer.periodic - allows to choose periodic and one-shot operation mode. In periodic mode, current timer hardware taken as the only source of time for time events. This mode is quite alike to previous kernel behavior. One-shot mode instead uses currently selected time counter hardware to schedule all needed events one by one and program timer to generate interrupt exactly in specified time. Default value depends of chosen timer capabilities, but one-shot mode is preferred, until other is forced by user or hardware. kern.eventtimer.singlemul - in periodic mode specifies how much times higher timer frequency should be, to not strictly alias hardclock() and statclock() events. Default values are 2 and 4, but could be reduced to 1 if extra interrupts are unwanted. kern.eventtimer.idletick - makes each CPU to receive every timer interrupt independently of whether they busy or not. By default this options is disabled. If chosen timer is per-CPU and runs in periodic mode, this option has no effect - all interrupts are generating. As soon as this patch modifies cpu_idle() on some platforms, I have also refactored one on x86. Now it makes use of MONITOR/MWAIT instrunctions (if supported) under high sleep/wakeup rate, as fast alternative to other methods. It allows SMP scheduler to wake up sleeping CPUs much faster without using IPI, significantly increasing performance on some highly task-switching loads. Tested by: many (on i386, amd64, sparc64 and powerc) H/W donated by: Gheorghe Ardelean Sponsored by: iXsystems, Inc.
# 211197	11-Aug-2010	jhb	Update various places that store or manipulate CPU masks to use cpumask_t instead of int or u_int. Since cpumask_t is currently u_int on all platforms this should just be a cosmetic change.
# 211071	08-Aug-2010	marius	- As it is not possible for sched_bind(9) to context switch with td_critnest > 1 when not already running on the desired CPU read the TICK counter of the BSP via a direct cross trap request in that case instead. - Treat the STICK based timecounter the same way as the TICK based one regarding its quality and obtaining the counter value from the BSP. Like the TICK timers the STICK ones also are only synchronized during their startup (which might not result in good synchronicity in the first place) but not afterwards and might drift over time, causing problems when the time is read from different CPUs (see r135972).
# 211050	07-Aug-2010	marius	- Introduce a cpu_ipi_single() function pointer in order to send IPIs to single CPUs more efficiently with Cheetah(-class) and Jalapeno CPUs. Besides being used to implement the ipi_cpu() introduced in r210939, cpu_ipi_single() will also be used internally by the sparc64 MD code. - Factor out the Jalapeno support from the Cheetah IPI send functions in order to be able to more easily and efficiently implement support for more than 32 target CPUs as well as a workaround for Cheetah+ erratum 25 for the latter.
# 210601	29-Jul-2010	mav	Adapt sparc64 and sun4v timer code for the new event timers infrastructure. Reviewed by: marius@
# 207537	02-May-2010	marius	Add support for SPARC64 V (and where it already makes sense for other HAL/Fujitsu) CPUs. For the most part this consists of fleshing out the MMU and cache handling, it doesn't add pmap optimizations possible with these CPU, yet, though. With these changes FreeBSD runs stable on Fujitsu Siemens PRIMEPOWER 250 and likely also other models based on SPARC64 V like 450, 650 and 850. Thanks go to Michael Moll for providing access to a PRIMEPOWER 250.
# 207248	26-Apr-2010	marius	Don't bother enabling interrupts before we're ready to handle them. This prevents the firmware of Fujitsu Siemens PRIMEPOWER250, which both causes stray interrupts and erroneously enables interrupts at least when calling SUNW,set-trap-table, in the foot.
# 204152	20-Feb-2010	marius	Some machines can not only consist of CPUs running at different speeds but also of different types, f.e. Sun Fire V890 can be equipped with a mix of UltraSPARC IV and IV+ CPUs, requiring different MMU initialization and different workarounds for model specific errata. Therefore move the CPU implementation number from a global variable to the per-CPU data. Functions which are called before the latter is available are passed the implementation number as a parameter now.
# 203838	13-Feb-2010	marius	- Search the whole OFW device tree instead of only the children of the root nexus device for the CPUs as starting with UltraSPARC IV the 'cpu' nodes hang off of from 'cmp' (chip multi-threading processor) or 'core' or combinations thereof. Also in large UltraSPARC III based machines the 'cpu' nodes hang off of 'ssm' (scalable shared memory) nodes which group snooping-coherency domains together instead of directly from the nexus. It would be great if we could use newbus to deal with the different ways the 'cpu' devices can hang off of pseudo ones but unfortunately both cpu_mp_setmaxid() and sparc64_init() have to work prior to regular device probing. - Add support for UltraSPARC IV and IV+ CPUs. Due to the fact that these are multi-core each CPU has two Fireplane config registers and thus the module/target ID has to be determined differently so the one specific to a certain core is used. Similarly, starting with UltraSPARC IV the individual cores use a different property in the OFW device tree to indicate the CPU/core ID as it no longer is in coincidence with the shared slot/socket ID. This involves changing the MD KTR code to not directly read the UPA module ID either. We use the MID stored in the per-CPU data instead of calling cpu_get_mid() as a replacement in order prevent clobbering any registers as side-effect in the assembler version. This requires CATR() invocations from mp_startup() prior to mapping the per-CPU pages to be removed though. While at it additionally distinguish between CPUs with Fireplane and JBus interconnects as these also use slightly different sizes for the JBus/agent/module/target IDs. - Make sparc64_shutdown_final() static as it's not used outside of machdep.c.
# 194784	23-Jun-2009	jeff	Implement a facility for dynamic per-cpu variables. - Modules and kernel code alike may use DPCPU_DEFINE(), DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined PCPU_. Requires only one extra instruction more than PCPU_ and is virtually the same as __thread for builtin and much faster for shared objects. DPCPU variables can be initialized when defined. - Modules are supported by relocating the module's per-cpu linker set over space reserved in the kernel. Modules may fail to load if there is insufficient space available. - Track space available for modules with a one-off extent allocator. Free may block for memory to allocate space for an extent. Reviewed by: jhb, rwatson, kan, sam, grehan, marius, marcel, stas
# 190106	19-Mar-2009	marius	- Remove the delay in cpu_mp_shutdown() which is no longer necessary since we have stopped using SUNW,stop-self with r186395. - There's no need to wrap kdb_active in #ifdef KDB as it's always available.
# 186395	22-Dec-2008	marius	- According to comments in OpenBSD, E{2,4}50 tend to have fragile firmware versions which wedge when using the OFW test service, so given that we don't really depend on SUNW,stop-self just nuke it altogether instead of risking problems. - At least Fire V880 have a small hardware glitch which causes the reception of IDR_NACKs for CPUs we actually haven't tried to send an IPI to, even not as part of the initial try. According to tests this apparently can be safely ignored though, so just return if checking for the individual IDR_NACKs indicates no outstanding dispatch. Serializing the sending of IPIs between MD and MI code by the combined usage of smp_ipi_mtx makes no difference to this phenomenon. [1] - Provide relevant debugging bits already with the initial panic in case of problems with the IPI dispatch, which would have allowed to diagnose the above problem without a specially built kernel. - In case of cheetah_ipi_selected() base the delay we wait for other CPUs which also might want to dispatch IPIs on the total amount of CPUs instead of just the number of CPUs we let this CPU send IPIs to because in the worst case all CPUs also want to IPI us at the same time. Reported and access for extensive tests provided by: Beat Gaetzi [1]
# 186347	19-Dec-2008	nwhitehorn	Modularize the Open Firmware client interface to allow run-time switching of OFW access semantics, in order to allow future support for real-mode OF access and flattened device frees. OF client interface modules are implemented using KOBJ, in a similar way to the PPC PMAP modules. Because we need Open Firmware to be available before mutexes can be used on sparc64, changes are also included to allow KOBJ to be used very early in the boot process by only using the mutex once we know it has been initialized. Reviewed by: marius, grehan
# 183201	20-Sep-2008	marius	Use the STICK timers only when absolutely necessary, i.e. if a machine consists of CPUs running at different speeds, for driving hardclock as these timers in turn are driven at frequencies as low as 5MHz, resulting in bad granularity compared to the TICK timers. However, don't employ the workaround for the BlackBird erratum #1 when using the TICK timer on machines with cheetah-class CPUs for performance reasons. Reported by: Florian Smeets
# 183142	18-Sep-2008	marius	- Newer firmware versions no longer provide SUNW,stop-self so just disable interrupts and loop forever with these. - Hide all MP-related bits in <machine/smp.h> underneath #ifdef SMP. - Inline ipi_all_but_self(9) and ipi_selected(9). We don't expose any additional bits but save a few cycles by doing so. - Remove ipi_all(9), which actually only called panic(9). It can't be implemented natively anyway and having it removed at least causes MI users to fail already fail when linking.
# 182769	04-Sep-2008	marius	Ensure the caches have the desired configuration (see especially cheetah_cache_enable()).
# 182768	04-Sep-2008	marius	Flesh out MMU and cache handling of cheetah-class CPUs.
# 182730	03-Sep-2008	marius	- USIII-based machines can consist of CPUs running at different frequencies (and having different cache sizes) so use the STICK (System TICK) timer, which was introduced due to this and is driven by the same frequency across all CPUs, instead of the TICK timer, whose frequency varies with the CPU clock, to drive hardclock. We try to use the STICK counter with all CPUs that are USIII or beyond, even when not necessary due to identical CPUs, as we can can also avoid the workaround for the BlackBird erratum #1 there. Unfortunately, using the STICK counter currently causes a hang with USIIIi MP machines for reasons unknown, so we still use the TICK timer there (which is okay as they can only consist of identical CPUs). - Given that we only (try to) synchronize the (S)TICK timers of APs with the BSP during startup, we could end up spinning forever in DELAY(9) if that function is migrated to another CPU while we're spinning due to clock drift afterwards, so pin to the CPU in order to avoid migration. Unfortunately, pinning doesn't work at the point DELAY(9) is required by the low-level console drivers, yet, so switch to a function pointer, which is updated accordingly, for implementing DELAY(9). For USIII and beyond, this would also allow to easily use the STICK counter instead of the TICK one here, there's no benefit in doing so however. While at it, use cpu_spinwait(9) for spinning in the delay- functions. This currently is a NOP though. - Don't set the TICK timer of the BSP to 0 during at startup as there's no need to do so. - Implement cpu_est_clockrate(). - Unfortunately, USIIIi-based machines don't provide a timecounter device besides the STICK and TICK counters (well, in theory the Tomatillo bridges have a performance counter that can be (ab)used as timecounter by configuring it to count bus cycles, though unlike the performance counter of Schizo bridges, the Tomatillo one is broken and counts Sun knows what in this mode). This means that we've to use a (S)TICK counter for timecounting, which has the old problem of not being in sync across CPUs, so provide an additional timecounter function which binds itself to the BSP but has an adequate low priority.
# 182689	02-Sep-2008	marius	- USIII-based machines can consist of CPUs having different cache sizes (and running at different frequencies) so move the cacheinfo to the PCPU data. While at it, remove some redundant and/or unused members from struct cacheinfo. - In sparc64_init don't assume the first CPU node we find in the OFW device tree is the BSP.
# 182020	22-Aug-2008	marius	cosmetic changes and style fixes
# 181701	13-Aug-2008	marius	cosmetic changes and style fixes
# 178443	23-Apr-2008	marius	o Rename ic_eoi to ic_clear to emphasize the functions it points don't send and EOI which works like on amd64/i386 and blocks all interrupts on the relevant interrupt controller. o Replace the post_filter and post_inthread hooks registered when creating the interrupt events with just ic_clear as on sparc64 we don't need to do any disable->EOI->enable dance to unblock all but the relevant interrupt while running the filter or handler; just not clearing the interrupt already has the same effect. o Merge from amd64/i386: - Split the intr_table_lock into an sx lock used for most things, and a spin lock to protect intrcnt_index. - Add support for binding interrupts to CPUs, including for the bus_bind_intr(9) interface, a assign_cpu hook and initially shuffling interrupts arround in a round-robin fashion. Reviewed by: jhb MFC after: 1 month
# 178048	09-Apr-2008	marius	- Add support for IPI_PREEMPT. [1] - Add my copyright to mp_machdep.c for having implemented support for USIII and up and some fixes. Obtained from: sun4v (modulo style(9) bugs) [1]
# 176994	09-Mar-2008	marius	- Do as the comment in pmap_bootstrap() suggests and flush all non-locked TLB entries possibly left over by the firmware and also do so while bootstrapping APs. - Use __FBSDID. MFC after: 1 month
# 176734	02-Mar-2008	jeff	- Remove the old smp cpu topology specification with a new, more flexible tree structure that encodes the level of cache sharing and other properties. - Provide several convenience functions for creating one and two level cpu trees as well as a default flat topology. The system now always has some topology. - On i386 and amd64 create a seperate level in the hierarchy for HTT and multi-core cpus. This will allow the scheduler to intelligently load balance non-uniform cores. Presently we don't detect what level of the cache hierarchy is shared at each level in the topology. - Add a mechanism for testing common topologies that have more information than the MD code is able to provide via the kern.smp.topology tunable. This should be considered a debugging tool only and not a stable api. Sponsored by: Nokia
# 170846	16-Jun-2007	marius	- Add support for sending IPIs with USIII and greater sun4u CPUs. These CPUs use an enhanced layout of the interrupt vector dispatch and dispatch status registers in order to allow sending IPIs to multiple targets simultaneously. Thus support for these CPUs was put in a newly added cheetah_ipi_selected(). This is intended to be pointed to by cpu_ipi_selected, which now is a function pointer, in order to avoid cpu_impl checks once booted. Alternatively it can point to spitfire_ipi_selected(), which was renamed from cpu_ipi_selected(). Consequently cpu_ipi_send() was also renamed to spitfire_ipi_send() (there's no need for a cheetah equivalent of this so far). Initialization of the cpu_ipi_selected pointer and other requirements is done in mp_init(), which was renamed from mp_tramp_alloc(), as cpu_mp_start() isn't called on UP systems while cpu_ipi_selected() is. As a side-effect this allows to make mp_tramp static to sys/sparc64/sparc64/mp_machdep.c. For the sake of avoiding #ifdef SMP and for keeping the history in place cheetah_ipi_selected() and spitfire_ipi_{selected,send}() where not put into/moved to sys/sparc64/sparc64/{cheetah,spitfire}.c - Add some CTASSERTs and KASSERTs ensuring that MAXCPU doesn't exceed the data types we use to store the CPU bit fields or the number of USIII and greater CPUs supported by the current cheetah_ipi_selected() implementation (which for JBus-CPUs is only 4; that should be fine though as according to OpenSolaris there are no sun4u machines with more than 4 JBus-CPUs). - In cpu_mp_start() don't enumerate and start more than MAXCPU CPUs as we can't handle more than that. - In cpu_mp_start() check for upa-portid vs. portid depending on cpu_impl for consistency with nexus(4). - In spitfire_ipi_selected() add KASSERTs ensuring that a CPU isn't told to IPI itself as sun4u CPUs just can't do that. - In spitfire_ipi_send() do a MEMBAR #Sync after writing the interrupt vector data as we want to make sure the payload was actually written before we trigger the dispatch. - In spitfire_ipi_send() also verify IDR_BUSY when checking whether the dispatch was successful as it has to be cleared for this to be the case. - Remove some redundant variables.
# 170303	04-Jun-2007	jeff	Commit 10/14 of sched_lock decomposition. - Use sched_throw() rather than replicating the same cpu_throw() code for each architecture. This also allows the scheduler to use any locking it may want to. - Use the thread_lock() rather than sched_lock when preempting. - The scheduler lock is not required to synchronize release_aps. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
# 169796	20-May-2007	marius	- Staticize cpu_ipi_send() and cpu_mp_unleash() as these aren't referenced outside of mp_machdep.c - Replace a magic 14 with the newly added IDC_ITID_SHIFT macro. - Remove the global mp_boot_mid variable as it's not really necessary and just replacing it with PCPU_GET(mid) doesn't have any impact on performance once booted. - Replace PCPU_GET(cpuid) with the curcpu shortcut. - Replace hardcoded function names in panic strings etc with __func__ so they don't need to be updated when renaming the function. - Use register_t instead of u_long for variables used to hold the return value of intr_disable() so we don't need to apply any knowledge about the actual width of that value here. - Improve the wording of some comments. - Fix several style(9) bugs.
# 161966	03-Sep-2006	marius	Do as the USII CPU manual suggests and leave interrupts enabled for a bit before retrying to resend an IPI in order to avoid deadlocks if the other CPU is also trying to send one. OpenSolaris uses a delay of 1 microsecond here but waiting 2 microseconds with interrupts enabled like Linux does shouldn't hurt but is a bit safer. MFC after: 1 day
# 157240	28-Mar-2006	marius	- We only lock the local per-CPU page in the local dTLB, so accessing the foreign per-CPU pages in cpu_ipi_send() in order to get the module IDs of the other CPUs can cause a page fault. If this happens when doing a TLB shootdown while dealing with another page fault this causes a panic due to the recursive page fault. As I don't spot other code that assumes or requires that accessing foreign per-CPU pages must not page fault solve this by adding a statically allocated (and therefore locked in the kernel pages) array which establishes a FreeBSD CPU ID -> module ID relation and use that in cpu_ipi_selected() (instead of statically allocating the per-CPU pages which would just waste memory on say a dual CPU machine as sun4u theoretically supports up to 128 CPUs or wasting dTLB slots for the foreign per-CPU pages). [1] - Fix a potential race in cpu_ipi_send(); as we don't serialize the access to cpu_ipi_selected() between MI and MD use (only MI-MI and MD-MD) we might catch the NACK bit caused by sending another IPI. Solve this by checking the NACK bit in the contents of the interrupt dispatch status reg read while interrupts were still turned off instead of reading that reg anew after interrupts were turned on again. This is also what the CPU docs suggest to do. - Add a workaround for the SpitFire erratum #54 bug (affecting interrupt dispatch). While public info regarding what this CPU bug actually causes is not available testing shows that with the workaround in place it's less likely to get a "couldn't send ipi" panic, it doesn't solve these panics entirely though. [2] Reported by: kris [1] Some clue from: kmacy [1] Info from: Linux, OpenSolaris [2] Additional testing by: kris MFC after: 3 days
# 155444	07-Feb-2006	phk	Modify the way we account for CPU time spent (step 1) Keep track of time spent by the cpu in various contexts in units of "cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t only when somebody wants to inspect the numbers. For now "cputicks" are still derived from the current timecounter and therefore things should by definition remain sensible also on SMP machines. (The main reason for this first milestone commit is to verify that hypothesis.) On slower machines, the avoided multiplications to normalize timestams at every context switch, comes out as a 5-7% better score on the unixbench/context1 microbenchmark. On more modern hardware no change in performance is seen.
# 152022	03-Nov-2005	jhb	Add stoppcbs[] arrays on Alpha and sparc64 and have each CPU save its current context in the IPI_STOP handler so that we can get accurate stack traces of threads on other CPUs on these two archs like we do now on i386 and amd64. Tested on: alpha, sparc64
# 145150	16-Apr-2005	marius	- Add a workaround for a bug in BlackBird CPUs (said to be part of the SpitFire erratum #54) which can cause writes to the TICK_CMPR register to fail. This seems to fix the dying clocks problem reported by jhb@ and kris@. [1] - In tick_start() don't reset the tick counter of the boot processor to zero. It's initially reset in _start() and afterwards but _before_ tick_start() is called on the BSP the APs synchronise with the tick counter of the BSP in mp_startup(). Resetting the tick counter of the BSP in tick_start() probably also was the cause of problems seen when using the CPU tick counter as timecounter on SMP machines. Not resetting the tick counter of the BSP in mp_startup() makes the tick counters and tick interrupts between the BSP and APs be pretty much in sync as it's supposed to be. This also means there's no longer a real reason to have separate tick_start() and tick_start_ap() so merge them and zap tick_start_ap(). This is also a first step in simplifying the interface to the tick counters in preparation to use alternate clock hardware where available. - Switch to the algorithm used on FreeBSD/ia64 for updating the tick interrupt register and which compensates the clock drift caused by varying delays between when the tick interrupts actually trigger and when they are serviced. Not compensating the clock drift mainly hurts interactive performance especially when using WITNESS. [2] For further information about the algorithm also see the commit log of sys/ia64/ia64/interrupt.c rev. 1.38. On sparc64 the sysctls for monitoring the behaviour of the tick interrupts are machdep.tick.adjust_edges, machdep.tick.adjust_excess, machdep.tick.adjust_missed and machdep.tick.adjust_ticks. - In tick_init() just use tick_stop() for stopping the tick interrupts until a proper handler is set up later. This also stops the system tick interrupt on USIII systems earlier. - In tick_start() check for a rough upper limit of HZ. - Some minor changes, e.g. use FBSDID, remove unused headers, etc. Info obtained from: Linux [1] Ok'ed by: marcel [2] Additional testing by: kris (earlier version of the workaround), jhb X-MFC after: 3 days [1]
# 144637	04-Apr-2005	jhb	Divorce critical sections from spinlocks. Critical sections as denoted by critical_enter() and critical_exit() are now solely a mechanism for deferring kernel preemptions. They no longer have any affect on interrupts. This means that standalone critical sections are now very cheap as they are simply unlocked integer increments and decrements for the common case. Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter() and spinlock_exit(). This KPI is responsible for providing whatever MD guarantees are needed to ensure that a thread holding a spin lock won't be preempted by any other code that will try to lock the same lock. For now all archs continue to block interrupts in a "spinlock section" as they did formerly in all critical sections. Note that I've also taken this opportunity to push a few things into MD code rather than MI. For example, critical_fork_exit() no longer exists. Instead, MD code ensures that new threads have the correct state when they are created. Also, we no longer try to fixup the idlethreads for APs in MI code. Instead, each arch sets the initial curthread and adjusts the state of the idle thread it borrows in order to perform the initial context switch. This change is largely a big NOP, but the cleaner separation it provides will allow for more efficient alternative locking schemes in other parts of the kernel (bare critical sections rather than per-CPU spin mutexes for per-CPU data for example). Reviewed by: grehan, cognet, arch@, others Tested on: i386, alpha, sparc64, powerpc, arm, possibly more
# 135855	27-Sep-2004	kensmith	Some minor print/panic message cleanups.
# 131950	10-Jul-2004	marcel	Update for the KDB framework: o Make debugging code conditional upon KDB instead of DDB. o Call kdb_enter() instead of Debugger(). o Remove implementation of Debugger(). o Check kdb_active instead of db_active. o Call kdb_trap() according to the new world order.
# 123126	03-Dec-2003	jhb	Fix all users of mp_maxid to use the same semantics, namely: 1) mp_maxid is a valid FreeBSD CPU ID in the range 0 .. MAXCPU - 1. 2) For all active CPUs in the system, PCPU_GET(cpuid) <= mp_maxid. Approved by: re (scottl) Tested on: i386, amd64, alpha
# 122947	21-Nov-2003	jhb	- Split cpu_mp_probe() into two parts. cpu_mp_setmaxid() is still called very early (SI_SUB_TUNABLES - 1) and is responsible for setting mp_maxid. cpu_mp_probe() is now called at SI_SUB_CPU and determines if SMP is actually present and sets mp_ncpus and all_cpus. Splitting these up allows an architecture to probe CPUs later than SI_SUB_TUNABLES by just setting mp_maxid to MAXCPU in cpu_mp_setmaxid(). This could allow the CPU probing code to live in a module, for example, since modules sysinit's in modules cannot be invoked prior to SI_SUB_KLD. This is needed to re-enable the ACPI module on i386. - For the alpha SMP probing code, use LOCATE_PCS() instead of duplicating its contents in a few places. Also, add a smp_cpu_enabled() function to avoid duplicating some code. There is room for further code reduction later since much of this code is also present in cpu_mp_start(). - All archs besides i386 still set mp_maxid to the same values they set it to before this change. i386 now sets mp_maxid to MAXCPU. Tested on: alpha, amd64, i386, ia64, sparc64 Approved by: re (scottl)
# 119696	02-Sep-2003	marcel	Preparatory commit to allow prototypes in ofw_machdep.h to contain both newbus types and OFW types. This involves either including <machine/bus.h> or <dev/ofw/openfirm.h>. Reviewed by: jake, jmg, tmm
# 119291	22-Aug-2003	imp	Prefer new location of pci include files (which have only been in the tree for two or more years now), except in a few places where there's code to be compatible with older versions of FreeBSD.
# 113238	08-Apr-2003	jake	Use vm_paddr_t for physical addresses.
# 112993	02-Apr-2003	peter	Commit a partial lazy thread switch mechanism for i386. it isn't as lazy as it could be and can do with some more cleanup. Currently its under options LAZY_SWITCH. What this does is avoid %cr3 reloads for short context switches that do not involve another user process. ie: we can take an interrupt, switch to a kthread and return to the user without explicitly flushing the tlb. However, this isn't as exciting as it could be, the interrupt overhead is still high and too much blocks on Giant still. There are some debug sysctls, for stats and for an on/off switch. The main problem with doing this has been "what if the process that you're running on exits while we're borrowing its address space?" - in this case we use an IPI to give it a kick when we're about to reclaim the pmap. Its not compiled in unless you add the LAZY_SWITCH option. I want to fix a few more things and get some more feedback before turning it on by default. This is NOT a replacement for Bosko's lazy interrupt stuff. This was more meant for the kthread case, while his was for interrupts. Mine helps a little for interrupts, but his helps a lot more. The stats are enabled with options SWTCH_OPTIM_STATS - this has been a pseudo-option for years, I just added a bunch of stuff to it. One non-trivial change was to select a new thread before calling cpu_switch() in the first place. This allows us to catch the silly case of doing a cpu_switch() to the current process. This happens uncomfortably often. This simplifies a bit of the asm code in cpu_switch (no longer have to call choosethread() in the middle). This has been implemented on i386 and (thanks to jake) sparc64. The others will come soon. This is actually seperate to the lazy switch stuff. Glanced at by: jake, jhb
# 112396	19-Mar-2003	jake	Remove a workaround for mysterious junk appearing in the tlb of secondary cpus. It turned out to be a bug in the loader.
# 108187	22-Dec-2002	jake	- Add a spin lock to single thread cache invalidation and tlb flush ipis, which allows ipis to be sent outside of Giant. - Remove the ap boot mutex, which is unused.
# 107103	20-Nov-2002	jhb	Fix compile in the case of SMP defined but DDB not defined. Approved by: re (implicit, DP2 doesn't build w/o this)
# 104271	01-Oct-2002	jake	Get rid of the TODO macro in the few places that still need work; either comment it out or change to explicit panics. It conflicts with things like #if TODO in drivers.
# 102042	18-Aug-2002	jake	Forgot this in last commit.
# 101898	15-Aug-2002	jake	Store the number of itlb and dtlb entries separately; they may be different. Find the prom node for the boot cpu earlier and store it in the per-cpu area, so that cache_init can be called earlier.
# 99936	13-Jul-2002	jake	Try both upa-portid and portid properties when finding the module id of a secondary cpu. Its called portid on UltraSPARCIII machines.
# 98033	08-Jun-2002	jake	Remove test code.
# 98031	08-Jun-2002	jake	Fix bizarre SMP problems. The secondary cpus sometimes start up with junk in their tlb which the prom doesn't clear out, so we have to do so manually before mapping the kernel page table or the cpu can hang due various conditions which cause undefined behaviour from the tlb.
# 97511	29-May-2002	jake	Forgot to commit this file. Catch up to loader->kernel abi changes.
# 97001	20-May-2002	jake	Add SMP aware cache flushing functions, which operate on a single physical page. These send IPIs if necessary in order to keep the caches in sync on all cpus.
# 95132	20-Apr-2002	jake	Add needed include of tick.h.
# 93818	04-Apr-2002	jhb	Change callers of mtx_init() to pass in an appropriate lock type name. In most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used. Tested on: i386, alpha, sparc64
# 93683	02-Apr-2002	tmm	Set mp_maxid so that UMA works with SMP. Submitted by: jake
# 92205	13-Mar-2002	jake	Add support for starting and stopping cpus with ipis. Stop the other cpus when shutting down or entering the debugger. Submitted by: tmm
# 91783	07-Mar-2002	jake	Implement delivery of tlb shootdown ipis. This is currently more fine grained than the other implementations; we have complete control over the tlb, so we only demap specific pages. We take advantage of the ranged tlb flush api to send one ipi for a range of pages, and due to the pm_active optimization we rarely send ipis for demaps from user pmaps. Remove now unused routines to load the tlb; this is only done once outside of the tlb fault handlers. Minor cleanups to the smp startup code. This boots multi user with both cpus active on a dual ultra 60 and on a dual ultra 2.
# 91617	04-Mar-2002	jake	Add support for starting secondary cpus in kernel, as opposed to relying on the loader to do it. Improve smp startup code to be less racy and to defer certain things until the right time. This almost boots single user on my dual ultra 60, it is still very fragile: SMP: AP CPU #1 Launched! Enter full pathname of shell or RETURN for /bin/sh: # ls Debugger("trapsig") Stopped at Debugger+0x1c: ta %xcc, 1 db> heh No such command db>
# 91066	22-Feb-2002	phk	Convert p->p_runtime and PCPU(switchtime) to bintime format.
# 89051	08-Jan-2002	jake	Add initial smp support. This gets as far as allowing the secondary cpu(s) into the kernel, and sync-ing them up to "kernel" mode so we can send them ipis, which also work. Thanks to John Baldwin for providing me with access to the hardware that made this possible. Parts obtained from: bsd/os