History log of /openbsd-current/sys/kern/kern_srp.c
Revision Date Author Comments
# 1.13 06-Dec-2020 cheloha

srp_finalize(9): tsleep(9) -> tsleep_nsec(9)

srp_finalize(9) spins until the refcount hits zero. Blocking for at
least 1ms each iteration instead of blocking for at most 1 tick is
sufficient.

Discussed with mpi@.

ok claudio@ jmatthew@
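
The polling pattern described in this entry can be sketched in userland C. This is only an illustration, not the kernel code: nanosleep(2) stands in for tsleep_nsec(9), and struct refs_sketch, wait_for_zero() and the max_iters bound are hypothetical names introduced here so the loop can be exercised outside the kernel.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the kernel's reference count. */
struct refs_sketch {
	atomic_uint refs;
};

/*
 * Poll until the count reaches zero, requesting a 1ms sleep on each
 * iteration.  nanosleep(2) is an assumption standing in for
 * tsleep_nsec(9); it sleeps for at least the requested interval
 * unless a signal interrupts it.  max_iters bounds the loop so the
 * sketch always terminates.  Returns the number of sleeps taken.
 */
static unsigned int
wait_for_zero(struct refs_sketch *r, unsigned int max_iters)
{
	const struct timespec one_ms = { .tv_sec = 0, .tv_nsec = 1000000L };
	unsigned int sleeps = 0;

	assert(r != NULL);
	while (atomic_load(&r->refs) != 0 && sleeps < max_iters) {
		nanosleep(&one_ms, NULL);
		sleeps++;
	}
	return sleeps;
}
```

With a count that is already zero the function returns without sleeping. With a count that never drops it gives up after max_iters sleeps.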


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.12 08-Sep-2017 deraadt

If you use sys/param.h, you don't need sys/types.h


Revision tags: OPENBSD_6_1_BASE
# 1.11 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


Revision tags: OPENBSD_6_0_BASE
# 1.10 01-Jun-2016 dlg

add support for using SRPs without the garbage collection machinery.

the gc machinery may sleep during srp_update, which makes it hard
to use from an interrupt context. srp_swap simply swaps the references
in an srp and relies on the caller to schedule work in a process
context where it may sleep with srp_finalize until the reference
is no longer in use.

our network stack currently modifies routing tables in an interrupt
context, so this is built to be used to support rtable updates in
our current stack while supporting concurrent lookups.

ok jmatthew@ mpi@
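
The swap-now, finalize-later split described in this entry can be sketched with C11 atomics. This is a userland illustration under stated assumptions: struct srp_sketch and swap_sketch() are hypothetical names, and the real srp_swap may differ in detail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userland stand-in for struct srp. */
struct srp_sketch {
	_Atomic(void *) ref;
};

/*
 * Publish nv and hand the previous value back to the caller.
 * Nothing here sleeps, so it can run in a context that must not
 * block.  The caller must not free the old value until readers are
 * done with it; that wait is deferred to a context that may sleep.
 */
static void *
swap_sketch(struct srp_sketch *srp, void *nv)
{
	assert(srp != NULL);
	return atomic_exchange(&srp->ref, nv);
}
```

The returned pointer is what the caller would later hand to a deferred task that waits for outstanding references before freeing it.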


# 1.9 18-May-2016 dlg

rename srp_finalize to srp_gc_finalize


# 1.8 18-May-2016 dlg

rework the srp api so it takes an srp_ref struct that the caller provides.

the srp_ref struct is used to track the location of the caller's
hazard pointer so later calls to srp_follow and srp_enter already
know what to clear. this in turn means most of the caveats around
using srps go away. specifically, you can now:

- switch cpus while holding an srp ref
- ie, you can sleep while holding an srp ref
- you can take and release srp refs in any order

the original intent was to simplify use of the api when dealing
with complicated data structures. the caller now no longer has to
track the location of the srp a value was fetched from, the srp_ref
effectively does that for you.

srp lists have been refactored to use srp_refs instead of srpl_iter
structs.

this is in preparation for using srps inside the ART code. ART is a
complicated data structure, and lookups require overlapping holds
of srp references.

ok mpi@ jmatthew@
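
A minimal sketch of the idea behind a caller-provided reference handle, assuming a fixed array of hazard slots. All names here (hazard_sketch, hzref_sketch, enter_sketch, leave_sketch, NSLOTS) are hypothetical. The sketch is single-threaded and leaves out the re-check against the srp that a real hazard pointer read needs.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 4	/* hypothetical number of hazard slots */

struct hazard_sketch {
	void *slots[NSLOTS];
};

/* Caller-provided handle remembering which slot holds the reference. */
struct hzref_sketch {
	void **slot;
};

/* Claim the first free slot for v and record its address in the handle. */
static void *
enter_sketch(struct hazard_sketch *hz, struct hzref_sketch *sr, void *v)
{
	for (size_t i = 0; i < NSLOTS; i++) {
		if (hz->slots[i] == NULL) {
			hz->slots[i] = v;
			sr->slot = &hz->slots[i];
			return v;
		}
	}
	sr->slot = NULL;
	return NULL;	/* out of slots */
}

/* Clear exactly the slot named by the handle: no search, any order. */
static void
leave_sketch(struct hzref_sketch *sr)
{
	assert(sr->slot != NULL);
	*sr->slot = NULL;
	sr->slot = NULL;
}
```

Because the handle stores the slot address, a reference taken first can be released first without disturbing later ones, and the release does not need to know which cpu's slot array it came from.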


Revision tags: OPENBSD_5_9_BASE
# 1.7 23-Nov-2015 mpi

Do not include <sys/atomic.h> inside <sys/refcnt.h>.

Prevent lazy developers, like David and me, from using atomic operations
without including <sys/atomic.h>.

ok dlg@


# 1.6 11-Sep-2015 dlg

unbreak build on UP kernels.

found by deraadt@


# 1.5 11-Sep-2015 dlg

make srp use refcnts so it can use refcnt_finalize instead of
sleep_setup/sleep_finish.


# 1.4 11-Sep-2015 dlg

remove some bits of srp.h i had pasted in here by accident


# 1.3 09-Sep-2015 dlg

implement a singly linked list built with SRPs.

this allows us to build lists of things that can be followed by
multiple cpus.

ok mpi@ claudio@


# 1.2 01-Sep-2015 dlg

mattieu baptiste reported a problem with bpf+srps where the per cpu
hazard pointers were becoming corrupt, causing panics.

the problem turned out to be that bridge_input calls if_input on
behalf of a hardware interface which then calls bpf_mtap at splsoftnet,
while the actual hardware nic calls if_input and bpf_mtap at splnet.
the hardware interrupts ran in the middle of the bpf calls that bridge
makes at softnet. this means the same srps are being entered and
left on the same cpu at different ipls, which led to races because
of the order of operations on the per cpu hazard pointers.

after a lot of experimentation, jmatthew@ figured out how to deal
with this problem without introducing per cpu critical sections
(ie, splhigh calls) in srp_enter and srp_leave, and without introducing
atomic operations.

the solution is to iterate forward through the array of hazard
pointers in srp_enter, and backward in srp_leave to clear. if you
guarantee that you leave srps in the reverse order to entering them,
then you can use the same set of SRPs at different IPLs on the same
CPU.

the ordering requirement is a problem if we want to build linked
data structures out of srps because you need to hold a ref to the
current element containing the next srp to use it, before giving
up the current ref. we're adding srp_follow() to support taking the
next ref and giving up the current one while preserving the structure
of the hazard pointer list. srp_follow() does this by reusing the
hazard pointer for the current reference for the next ref.

both mattieu baptiste and jmatthew@ have been hitting this pretty
hard with a tweaked version of srp+bpf that uses srp_follow instead
of interleaved srp_enter/srp_leave sequences. neither can reproduce
the panics anymore.

thanks to mattieu for the report and tests
ok jmatthew@
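
The forward/backward scan and the slot reuse in srp_follow() can be illustrated with a small single-threaded sketch. It assumes a fixed per-cpu array of slots. The names hazard_sketch, enter_fwd, leave_bwd, follow_sketch and NSLOTS are hypothetical and do not reproduce the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 4	/* hypothetical number of per-cpu hazard slots */

struct hazard_sketch {
	void *slots[NSLOTS];
};

/* Scan forward and claim the first free slot.  Returns its index. */
static int
enter_fwd(struct hazard_sketch *hz, void *v)
{
	assert(v != NULL);
	for (int i = 0; i < NSLOTS; i++) {
		if (hz->slots[i] == NULL) {
			hz->slots[i] = v;
			return i;
		}
	}
	return -1;
}

/* Scan backward and clear the last slot holding v. */
static int
leave_bwd(struct hazard_sketch *hz, void *v)
{
	for (int i = NSLOTS - 1; i >= 0; i--) {
		if (hz->slots[i] == v) {
			hz->slots[i] = NULL;
			return i;
		}
	}
	return -1;
}

/* Move from cur to next by overwriting cur's slot in place. */
static int
follow_sketch(struct hazard_sketch *hz, void *cur, void *next)
{
	for (int i = NSLOTS - 1; i >= 0; i--) {
		if (hz->slots[i] == cur) {
			hz->slots[i] = next;
			return i;
		}
	}
	return -1;
}
```

If code at a lower ipl enters a value into slot 0 and an interrupt then enters the same value into slot 1, the interrupt's backward scan on leave clears slot 1 and leaves the outer reference in slot 0 intact. follow_sketch() keeps the slot layout unchanged while walking a list.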


Revision tags: OPENBSD_5_8_BASE
# 1.1 02-Jul-2015 dlg

introduce srp, which according to the manpage i wrote is short for
"shared reference pointers".

srp allows concurrent access to a data structure by multiple cpus
while avoiding interlocking cpu opcodes. it manages its own reference
counts and the garbage collection of those data structures to avoid
use after frees.

internally srp is a twisted version of hazard pointers, which are
a relative of RCU.

jmatthew wrote the bulk of a hazard pointer implementation and
changed bpf to use it to allow mpsafe access to bpfilters. however,
at s2k15 we were trying to apply it to other data structures but
the memory overhead of every hazard pointer would have blown out
significantly in several use cases. the bulk of our time at s2k15
was spent reworking hazard pointers into srp.

this diff adds the srp api and adds the necessary metadata to struct
cpuinfo on our MP architectures. srp on uniprocessor platforms has
alternate code that is optimised because it knows there'll be no
concurrent access to data by multiple cpus.

srp is made available to the system via param.h, so it should be
available everywhere in the kernel.

the docs likely need improvement cos i'm too close to the implementation.

ok mpi@

