History log of /freebsd-9.3-release/sys/sparc64/include/smp.h
Revision Date Author Comments
# 267654 19-Jun-2014 gjb

Copy stable/9 to releng/9.3 as part of the 9.3-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 246014 27-Jan-2013 marius

MFC: r245850

Revert the part of r239864 (MFC'ed to stable/9 in r241681) which removed
obtaining the SMP mutex around reading registers from other CPUs. As it
turns out, the hardware doesn't really like concurrent IPI'ing, which causes
adverse effects. Also, the suspected deadlock when using this spin lock here
while the targeted CPU(s) also hold it, or in the case of nested locks, can't
actually happen. This is due to the fact that on sparc64, spinlock_enter()
only raises the PIL but doesn't disable interrupts completely. Thus direct
cross calls, as used for the register reading (and all other MD IPI needs),
will still be executed by the targeted CPU(s) in that case.
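
A minimal, hypothetical C sketch of the locking pattern described above; the
helper name and the placeholder body are illustrative only, not code from the
tree:

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/smp.h>

/* Assumed helper: serialize a direct cross call behind smp_ipi_mtx. */
static uint64_t
read_remote_reg(u_int cpu)
{
        uint64_t val;

        mtx_lock_spin(&smp_ipi_mtx);
        /* ... issue the direct cross trap to 'cpu' and fetch the value ... */
        val = 0;        /* placeholder for the value returned by the target */
        mtx_unlock_spin(&smp_ipi_mtx);
        /*
         * No deadlock: spinlock_enter() on sparc64 only raises the PIL, so
         * the target CPU still services the cross call even while it holds
         * a spin lock itself.
         */
        return (val);
}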


# 241681 18-Oct-2012 marius

MFC: r239864

- Unlike cache invalidation and TLB demapping IPIs, reading registers from
other CPUs doesn't require locking, so get rid of it. As the latter is used
for the timecounter on certain machine models, using a spin lock in this
case can lead to a deadlock with the upcoming callout(9) rework.
- Merge r134227/r167250 from x86:
Avoid cross-IPI SMP deadlock by using the smp_ipi_mtx spin lock not only
for smp_rendezvous_cpus() but also for the MD cache invalidation and TLB
demapping IPIs.
- Mark some unused function arguments as such.


# 225736 22-Sep-2011 kensmith

Copy head to stable/9 as part of 9.0-RELEASE release cycle.

Approved by: re (implicit)


# 223126 15-Jun-2011 marius

Don't include curcpu in the mask which is used as the IPI cookie as we
have to ignore it when sending the IPI anyway. Actually I can't think of
a good reason why this was ever done that way in the first place as it's
not even useful for debugging.
While at it, replace the use of pc_other_cpus as it's slated for deorbit.


# 222813 07-Jun-2011 attilio

Retire the cpumask_t type and replace it with cpuset_t usage.

This is intended to fix the bug where CPU mask objects are
capped at 32 CPUs. MAXCPU can now be arbitrarily bumped to whatever
value. Anyway, as long as several structures in the kernel are
statically allocated and sized by MAXCPU, it is suggested to keep it
as low as possible for the time being.

Technical notes on this commit itself:
- More functions for handling cpuset_t objects are introduced.
The most notable are cpusetobj_ffs() (which computes ffs(3)
for a cpuset_t object), cpusetobj_strprint() (which prepares a string
representing a cpuset_t object) and cpusetobj_strscan() (which
creates a valid cpuset_t starting from a string representation); a
short usage sketch follows this list.
- pc_cpumask and pc_other_cpus are targeted for removal soon.
With the move from cpumask_t to cpuset_t they are now inefficient
and not really useful. Anyway, for the time being, please note that
access to per-CPU data is protected by sched_pin() in order to avoid
migrating the CPU while possibly reading more than one word.
- Please note that the size of cpuset_t objects may differ between kernel
and userland. While this is not directly related to the patch itself,
it is good to understand that concept and possibly use the patch
as a reference on how to deal with cpuset_t objects in userland, when
accessing kernel members.
- KTR_CPUMASK is changed and is now represented as a string, to be
set as in the example reported in NOTES.
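
A minimal sketch of working with cpuset_t instead of a single cpumask_t word;
the cpusetobj_ffs() prototype is assumed to match the description above (see
sys/cpuset.h):

#include <sys/param.h>
#include <sys/cpuset.h>

/* Return the lowest CPU id marked in a small example set. */
static int
lowest_cpu_example(void)
{
        cpuset_t set;

        CPU_ZERO(&set);                 /* empty set, regardless of MAXCPU */
        CPU_SET(2, &set);               /* mark CPU 2 */
        if (!CPU_ISSET(2, &set))
                return (-1);
        return (cpusetobj_ffs(&set) - 1);       /* ffs(3)-style, 1-based */
}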

Please additionally note that MAXCPU is not bumped in this patch, but
private testing has been done up to MAXCPU=128 on a real 8x8x2 (HTT)
amd64 machine.

Please note that the FreeBSD version is not yet bumped because of
the upcoming pcpu changes. However, note that this patch is not
targeted for MFC.

People to thank for the time spent on this patch:
- sbruno, pluknet and Nicholas Esborn (nick AT desert DOT net) tested
several revisions of the patches and really helped in improving
the stability of this work.
- marius fixed several bugs in the sparc64 implementation and reviewed
patches related to ktr.
- jeff and jhb discussed the basic approach followed.
- kib and marcel made targeted review on some specific part of the
patch.
- marius, art, nwhitehorn and andreast reviewed MD specific part of
the patch.
- marius, andreast, gonzo, nwhitehorn and jceel tested MD specific
implementations of the patch.
- Other people have made contributions on other patches that have been
already committed and have been listed separately.

Companies that should be mentioned for having participated to various
degrees:
- Yahoo! for having offered the machines used for testing on a large
number of CPUs.
- The FreeBSD Foundation for having sponsored my devsummit attendance,
which has been instrumental.
- Sandvine for having offered offices and infrastructure during
development.

(I really hope I didn't forget anyone, if it happened I apologize in
advance).


# 212541 13-Sep-2010 mav

Refactor timer management code with priority given to one-shot operation mode.
The main goal of this is to generate timer interrupts only when there is
some work to do. When a CPU is busy, interrupts are generated at the full rate
of hz + stathz to fulfill scheduler and timekeeping requirements. But when a
CPU is idle, only a minimal set of interrupts (down to 8 interrupts per second
per CPU now), needed to handle scheduled callouts, is executed. This allows a
significant increase in idle CPU sleep time, increasing the effect of static
power-saving technologies. It should also reduce host CPU load on virtualized
systems when the guest system is idle.

There is a set of tunables, also available as writable sysctls, that allow
controlling the event timer subsystem behavior (example settings follow the
list):
kern.eventtimer.timer - allows choosing the event timer hardware to use.
On x86 there are up to 4 different kinds of timers. Depending on whether the
chosen timer is per-CPU, the behavior of the other options differs slightly.
kern.eventtimer.periodic - allows choosing between periodic and one-shot
operation modes. In periodic mode, the current timer hardware is taken as the
only source of time for time events. This mode is quite similar to previous
kernel behavior. One-shot mode instead uses the currently selected timecounter
hardware to schedule all needed events one by one and programs the timer to
generate an interrupt at exactly the specified time. The default value depends
on the chosen timer's capabilities, but one-shot mode is preferred unless
another mode is forced by the user or the hardware.
kern.eventtimer.singlemul - in periodic mode, specifies how many times higher
the timer frequency should be in order to not strictly alias hardclock() and
statclock() events. Default values are 2 and 4, but they can be reduced to 1
if extra interrupts are unwanted.
kern.eventtimer.idletick - makes each CPU receive every timer interrupt
independently of whether it is busy or not. By default this option is
disabled. If the chosen timer is per-CPU and runs in periodic mode, this
option has no effect - all interrupts are generated anyway.
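
For illustration only, the tunables above could be set as loader tunables or
adjusted at runtime with sysctl(8); the values shown merely restate the
defaults mentioned in the text:

kern.eventtimer.periodic=0      # one-shot mode, the preferred default
kern.eventtimer.singlemul=2     # periodic-mode frequency multiplier (2 or 4)
kern.eventtimer.idletick=0      # disabled by default; idle CPUs skip interrupts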

Since this patch modifies cpu_idle() on some platforms, I have also
refactored the x86 one. It now makes use of MONITOR/MWAIT instructions
(if supported) under a high sleep/wakeup rate, as a fast alternative to other
methods. This allows the SMP scheduler to wake up sleeping CPUs much faster
without using an IPI, significantly increasing performance on some highly
task-switching loads.

Tested by: many (on i386, amd64, sparc64 and powerpc)
H/W donated by: Gheorghe Ardelean
Sponsored by: iXsystems, Inc.


# 211197 11-Aug-2010 jhb

Update various places that store or manipulate CPU masks to use cpumask_t
instead of int or u_int. Since cpumask_t is currently u_int on all
platforms this should just be a cosmetic change.


# 211071 08-Aug-2010 marius

- As it is not possible for sched_bind(9) to context switch with
td_critnest > 1 when not already running on the desired CPU, read the
TICK counter of the BSP via a direct cross trap request in that case
instead.
- Treat the STICK based timecounter the same way as the TICK based one
regarding its quality and obtaining the counter value from the BSP.
Like the TICK timers the STICK ones also are only synchronized during
their startup (which might not result in good synchronicity in the
first place) but not afterwards and might drift over time, causing
problems when the time is read from different CPUs (see r135972).


# 211050 07-Aug-2010 marius

- Introduce a cpu_ipi_single() function pointer in order to send IPIs
to single CPUs more efficiently with Cheetah(-class) and Jalapeno CPUs.
Besides being used to implement the ipi_cpu() introduced in r210939,
cpu_ipi_single() will also be used internally by the sparc64 MD code.
- Factor out the Jalapeno support from the Cheetah IPI send functions
in order to be able to more easily and efficiently implement support
for more than 32 target CPUs as well as a workaround for Cheetah+
erratum 25 for the latter.


# 210939 06-Aug-2010 jhb

Add a new ipi_cpu() function to the MI IPI API that can be used to send an
IPI to a specific CPU by its cpuid. Replace calls to ipi_selected() that
constructed a mask for a single CPU with calls to ipi_cpu() instead. This
will matter more in the future when we transition from cpumask_t to
cpuset_t for CPU masks, in which case building a CPU mask is more expensive.

Submitted by: peter, sbruno
Reviewed by: rookie
Obtained from: Yahoo! (x86)
MFC after: 1 month
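
An illustrative sketch of the calling-convention change; the headers and the
IPI vector used here are placeholders, not a statement of the exact MD API:

#include <sys/param.h>
#include <sys/smp.h>
#include <machine/smp.h>

static void
kick_cpu(u_int cpu)
{
        /* Old style: build a single-CPU mask just to reach one CPU. */
        /* ipi_selected(1 << cpu, IPI_AST); */

        /* New style: address the CPU directly by its cpuid. */
        ipi_cpu(cpu, IPI_AST);
}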


# 210601 29-Jul-2010 mav

Adapt sparc64 and sun4v timer code for the new event timers infrastructure.

Reviewed by: marius@


# 209695 04-Jul-2010 marius

- Pin the IPI cache and TLB demap functions in order to prevent migration
between determining the other CPUs and calling cpu_ipi_selected(), which,
apart from generally doing the wrong thing, can lead to a panic when a
CPU is told to IPI itself (which sun4u doesn't support).
Reported and tested by: Nathaniel W Filardo
- Add __unused where appropriate.

MFC after: 3 days
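
A hypothetical sketch of the pinning pattern described in the first item; the
helper name is illustrative:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/proc.h>
#include <sys/sched.h>

static void
ipi_demap_example(void)
{
        sched_pin();    /* curcpu can't change between the two steps below */
        /*
         * 1. Determine the set of *other* CPUs.
         * 2. Hand that set to cpu_ipi_selected(); since we can't migrate,
         *    the set never ends up containing ourselves, so a sun4u CPU is
         *    never asked to IPI itself.
         */
        sched_unpin();
}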


# 204152 20-Feb-2010 marius

Some machines can consist not only of CPUs running at different speeds
but also of different types; for example, a Sun Fire V890 can be equipped
with a mix of UltraSPARC IV and IV+ CPUs, requiring different MMU
initialization and different workarounds for model-specific errata.
Therefore move the
CPU implementation number from a global variable to the per-CPU data.
Functions which are called before the latter is available are passed the
implementation number as a parameter now.


# 196196 13-Aug-2009 attilio

* Completely remove the option STOP_NMI from the kernel. This option
has proven to have a good effect when entering KDB by using an NMI,
but it completely violates all the good rules about keeping interrupts
disabled while holding a spinlock on other occasions. This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Add a new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel when available. In other cases it just matches a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses an NMI on ia32 and amd64
architectures, while on the others it has a normal IPI_STOP effect. It is
the responsibility of maintainers to eventually implement a hard stop
when necessary and possible.
* Use the new IPI facility in order to implement a new SMP kernel
function called stop_cpus_hard(). It is the counterpart of stop_cpus() but
uses the privileged channel for the stopping facility (a usage sketch
follows this entry).
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases.
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanups and comment additions.

This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.

Please don't forget to remove the STOP_NMI option from your kernel
config file.

Reviewed by: jhb
Tested by: pho, bz, rink
Approved by: re (kib)
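
Illustrative only - a sketch of how a debugger entry path might prefer the
hard variant; the prototype and pcpu field assumed here follow the
cpumask_t-era sys/smp.h and may not match the tree exactly:

#include <sys/param.h>
#include <sys/pcpu.h>
#include <sys/smp.h>

static void
stop_others_for_debugger(void)
{
        /*
         * Use the privileged channel (an NMI where available) so that even
         * CPUs spinning with interrupts disabled come to a halt.
         */
        stop_cpus_hard(PCPU_GET(other_cpus));
}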


# 183142 18-Sep-2008 marius

- Newer firmware versions no longer provide SUNW,stop-self so just
disable interrupts and loop forever with these.
- Hide all MP-related bits in <machine/smp.h> underneath #ifdef SMP.
- Inline ipi_all_but_self(9) and ipi_selected(9). We don't expose any
additional bits but save a few cycles by doing so.
- Remove ipi_all(9), which actually only called panic(9). It can't be
implemented natively anyway and having it removed at least causes
MI users to already fail when linking.


# 182730 03-Sep-2008 marius

- USIII-based machines can consist of CPUs running at different
frequencies (and having different cache sizes) so use the STICK
(System TICK) timer, which was introduced due to this and is
driven by the same frequency across all CPUs, instead of the
TICK timer, whose frequency varies with the CPU clock, to drive
hardclock. We try to use the STICK counter with all CPUs that are
USIII or beyond, even when not necessary due to identical CPUs,
as we can also avoid the workaround for the BlackBird erratum
#1 there. Unfortunately, using the STICK counter currently causes
a hang with USIIIi MP machines for reasons unknown, so we still
use the TICK timer there (which is okay as they can only consist
of identical CPUs).
- Given that we only (try to) synchronize the (S)TICK timers of APs
with the BSP during startup, we could end up spinning forever in
DELAY(9) if that function is migrated to another CPU while we're
spinning due to clock drift afterwards, so pin to the CPU in order
to avoid migration. Unfortunately, pinning doesn't yet work at the
point where DELAY(9) is required by the low-level console drivers, so
switch to a function pointer, which is updated accordingly, for
implementing DELAY(9). For USIII and beyond, this would also allow
easily using the STICK counter instead of the TICK one here; there's
no benefit in doing so, however.
While at it, use cpu_spinwait(9) for spinning in the delay
functions. This currently is a NOP, though.
- Don't set the TICK timer of the BSP to 0 at startup as
there's no need to do so.
- Implement cpu_est_clockrate().
- Unfortunately, USIIIi-based machines don't provide a timecounter
device besides the STICK and TICK counters (well, in theory the
Tomatillo bridges have a performance counter that can be (ab)used
as timecounter by configuring it to count bus cycles, though unlike
the performance counter of Schizo bridges, the Tomatillo one is
broken and counts Sun knows what in this mode). This means that
we have to use a (S)TICK counter for timecounting, which has the old
problem of not being in sync across CPUs, so provide an additional
timecounter function which binds itself to the BSP but has an
adequate low priority.


# 178048 09-Apr-2008 marius

- Add support for IPI_PREEMPT. [1]
- Add my copyright to mp_machdep.c for having implemented support for
USIII and up and some fixes.

Obtained from: sun4v (modulo style(9) bugs) [1]


# 170846 16-Jun-2007 marius

- Add support for sending IPIs with USIII and greater sun4u CPUs.
These CPUs use an enhanced layout of the interrupt vector dispatch
and dispatch status registers in order to allow sending IPIs to
multiple targets simultaneously. Thus support for these CPUs was
put in a newly added cheetah_ipi_selected(). This is intended to
be pointed to by cpu_ipi_selected, which now is a function pointer,
in order to avoid cpu_impl checks once booted. Alternatively it
can point to spitfire_ipi_selected(), which was renamed from
cpu_ipi_selected(). Consequently cpu_ipi_send() was also renamed
to spitfire_ipi_send() (there's no need for a cheetah equivalent
of this so far). Initialization of the cpu_ipi_selected pointer
and other requirements is done in mp_init(), which was renamed
from mp_tramp_alloc(), as cpu_mp_start() isn't called on UP
systems while cpu_ipi_selected() is. As a side-effect this allows
making mp_tramp static to sys/sparc64/sparc64/mp_machdep.c (a sketch
of this function-pointer dispatch appears at the end of this entry).
For the sake of avoiding #ifdef SMP and for keeping the history in
place, cheetah_ipi_selected() and spitfire_ipi_{selected,send}()
were not put into/moved to sys/sparc64/sparc64/{cheetah,spitfire}.c.
- Add some CTASSERTs and KASSERTs ensuring that MAXCPU doesn't
exceed the data types we use to store the CPU bit fields or the
number of USIII and greater CPUs supported by the current
cheetah_ipi_selected() implementation (which for JBus-CPUs is
only 4; that should be fine though as according to OpenSolaris
there are no sun4u machines with more than 4 JBus-CPUs).
- In cpu_mp_start() don't enumerate and start more than MAXCPU CPUs
as we can't handle more than that.
- In cpu_mp_start() check for upa-portid vs. portid depending on
cpu_impl for consistency with nexus(4).
- In spitfire_ipi_selected() add KASSERTs ensuring that a CPU isn't
told to IPI itself as sun4u CPUs just can't do that.
- In spitfire_ipi_send() do a MEMBAR #Sync after writing the
interrupt vector data as we want to make sure the payload was
actually written before we trigger the dispatch.
- In spitfire_ipi_send() also verify IDR_BUSY when checking whether
the dispatch was successful as it has to be cleared for this to
be the case.
- Remove some redundant variables.


# 169796 20-May-2007 marius

- Staticize cpu_ipi_send() and cpu_mp_unleash() as these aren't
referenced outside of mp_machdep.c
- Replace a magic 14 with the newly added IDC_ITID_SHIFT macro.
- Remove the global mp_boot_mid variable as it's not really necessary
and just replacing it with PCPU_GET(mid) doesn't have any impact on
performance once booted.
- Replace PCPU_GET(cpuid) with the curcpu shortcut.
- Replace hardcoded function names in panic strings etc with __func__
so they don't need to be updated when renaming the function.
- Use register_t instead of u_long for variables used to hold the
return value of intr_disable() so we don't need to apply any
knowledge about the actual width of that value here.
- Improve the wording of some comments.
- Fix several style(9) bugs.


# 169730 19-May-2007 kan

Include machine/pcb.h to turn the extern struct pcb stoppcbs[]; construct
into valid C.


# 152022 03-Nov-2005 jhb

Add stoppcbs[] arrays on Alpha and sparc64 and have each CPU save its
current context in the IPI_STOP handler so that we can get accurate stack
traces of threads on other CPUs on these two archs like we do now on i386
and amd64.

Tested on: alpha, sparc64


# 143190 06-Mar-2005 alc

Declare as volatile the memory location referenced by a pointer rather than
the pointer's value.
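
The distinction being corrected, as a self-contained C example:

volatile unsigned int *p1;      /* pointer to volatile memory: every access
                                   through *p1 is a real load/store (intended) */
unsigned int *volatile p2;      /* volatile pointer to ordinary memory: only
                                   the pointer variable itself is volatile */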


# 135943 29-Sep-2004 kensmith

We seem to have occasions where sending an IPI takes significantly
longer than 'normal'. The cause is still being tracked down, but
in the meantime there are machines where raising IPI_RETRIES does
help - it's not just a case of the machine staying locked up longer
and then panicking anyway. Several helpful folks on sparc64@ tried
a patch that helped figure out what to raise this number to.

Discussed on: sparc64@
MFC after: 3 days


# 113238 08-Apr-2003 jake

Use vm_paddr_t for physical addresses.


# 112399 19-Mar-2003 jake

- Remove unused cache flushing routines. These will not necessarily work
on future UltraSPARC cpus for which the data cache is not direct mapped.
- Move UltraSPARC I and II (spitfire, blackbird, sapphire, sabre) specific
functions to spitfire.c, and add cheetah.c for UltraSPARC III specific
functions. Initially just cache flushing, but there are a few other
functions that will need to move here.
- Add an ipi handler for data cache flushing on UltraSPARC III.
- Use function pointers to select the right cache flushing functions based
on cpu_impl.

With this it is possible to boot single user from an mfs root on UltraSPARC
III systems, including spinning up secondary processors. There is currently
no support for the host to pci bridge, and no documentation for it is
publicly available.

Thanks to Oleg Derevenetz for providing access to a system with UltraSPARC
III+ cpus.


# 108187 22-Dec-2002 jake

- Add a spin lock to single thread cache invalidation and tlb flush ipis,
which allows ipis to be sent outside of Giant.
- Remove the ap boot mutex, which is unused.


# 100718 26-Jul-2002 jake

Remove the tlb argument to tlb_page_demap (itlb or dtlb), in order to better
match the pmap_invalidate api.


# 99879 12-Jul-2002 tmm

When sending cache flushing IPIs, don't try to IPI the triggering CPU
itself; this causes undefined behaviour on UltraSPARCs. In particular,
the interrupt packet data words will not necessarily be delivered
correctly, which would result in a crash.
This bug also caused the cache-flushing work to be done twice on the
triggering CPU (when it did not cause crashes).

Reviewed by: jake


# 98033 08-Jun-2002 jake

Remove test code.


# 97001 20-May-2002 jake

Add SMP aware cache flushing functions, which operate on a single physical
page. These send IPIs if necessary in order to keep the caches in sync on
all cpus.


# 92199 13-Mar-2002 jake

Make IPI_WAIT use a bit mask of the cpus that a pmap is active on and only
wait for those cpus, instead of all of them by using a count. Oops.
Make the pointer to the mask that the primary cpu spins on volatile, so
gcc doesn't optimize out an important load. Oops again.
Activate tlb shootdown ipi synchronization now that it works. We have
all involved cpus wait until all the others are done. This may not be
necessary, it is mostly for sanity.
Make the trigger level interrupt ipi handler work.

Submitted by: tmm


# 91783 07-Mar-2002 jake

Implement delivery of tlb shootdown ipis. This is currently more fine grained
than the other implementations; we have complete control over the tlb, so we
only demap specific pages. We take advantage of the ranged tlb flush api
to send one ipi for a range of pages, and due to the pm_active optimization
we rarely send ipis for demaps from user pmaps.

Remove now unused routines to load the tlb; this is only done once outside
of the tlb fault handlers.
Minor cleanups to the smp startup code.

This boots multi user with both cpus active on a dual ultra 60 and on a
dual ultra 2.


# 91617 04-Mar-2002 jake

Add support for starting secondary cpus in kernel, as opposed to relying
on the loader to do it. Improve smp startup code to be less racy and to
defer certain things until the right time. This almost boots single user
on my dual ultra 60; it is still very fragile:

SMP: AP CPU #1 Launched!
Enter full pathname of shell or RETURN for /bin/sh:
# ls
Debugger("trapsig")
Stopped at Debugger+0x1c: ta %xcc, 1
db> heh
No such command
db>


# 91157 23-Feb-2002 jake

Include intr_machdep.h only for !LOCORE.


# 89425 16-Jan-2002 jake

Add extern to avoid sloppy common style declarations.

Tripped over by: jhb, mux@sneakerz.org
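
The pattern the commit refers to, illustrated with the stoppcbs[] array that
appears elsewhere in this log (machine/pcb.h and sys/param.h are assumed to be
included for struct pcb and MAXCPU):

/* In a header: with the explicit extern this is only a declaration. */
extern struct pcb stoppcbs[];

/* In exactly one .c file: the definition that allocates the storage. */
struct pcb stoppcbs[MAXCPU];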


# 89051 08-Jan-2002 jake

Add initial smp support. This gets as far as allowing the secondary
cpu(s) into the kernel, and sync-ing them up to "kernel" mode so we can
send them ipis, which also work.

Thanks to John Baldwin for providing me with access to the hardware
that made this possible.

Parts obtained from: bsd/os


# 81334 09-Aug-2001 obrien

The author isn't the [UC] Regents. Correct the copyright language.


# 80709 31-Jul-2001 jake

Flesh out the sparc64 port considerably. This contains:
- mostly complete kernel pmap support, and tested but currently turned
off userland pmap support
- low level assembly language trap, context switching and support code
- fully implemented atomic.h and supporting cpufunc.h
- some support for kernel debugging with ddb
- various header tweaks and filling out of machine dependent structures


# 80708 31-Jul-2001 jake

Add skeleton machine dependent headers and c files for a port of freebsd
to a new architecture. This is the base of the sparc64 port, but contains
limited machine dependent code, and can be used as a base for ports. Included
are:
- standard machine dependent headers, tweaked for a 64 bit, big endian
architecture, including empty versions of all the machine dependent
structures
- a machine independent atomic.h, which can be used until a port has
support for interrupts and the operations really need to be atomic
- stub versions of all the machine dependent functions, which panic
when called and print out the name of the function that needs to
be implemented. Functions which are normally in assembly files are
not included, but this should reduce the number of different undefined
references on the first few compiles from hundreds to 5 or 6
Given minimal startup code and console support it should be trivial to
make this compile and run the first few sysinits on almost any architecture.

Requested by: alfred, imp, jhb