History log of /freebsd-11-stable/sys/kern/subr_taskqueue.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 354406 06-Nov-2019 mav

MFC r354241: Some more taskqueue optimizations.

- Optimize enqueue for two task priority values by adding new tq_hint
field, pointing to the last task inserted into the middle of the list.
In case of more then two priority values it should halve average search.
- Move tq_active insert/remove out of the taskqueue_run_locked loop.
Instead of dirtying few shared cache lines per task introduce different
mechanism to drain active tasks, based on task sequence number counter,
that uses only cache lines already present in cache. Since the new
mechanism does not need ordering, switch tq_active from TAILQ to LIST.
- Move static and dynamic struct taskqueue fields into different cache
lines. Move lock into its own cache line, so that heavy lock spinning
by multiple waiting threads would not affect the running thread.
- While there, correct some TQ_SLEEP() wait messages.

This change fixes certain ZFS write workloads, causing huge congestion
on taskqueue lock. Those workloads combine some large block writes to
saturate the pool and trigger allocation throttling, which uses higher
priority tasks to requeue the delayed I/Os, with many small blocks to
generate deep queue of small tasks for taskqueue to sort.

Sponsored by: iXsystems, Inc.


# 354405 06-Nov-2019 mav

MFC r349220: Add wakeup_any(), cheaper wakeup_one() for taskqueue(9).

wakeup_one() and underlying sleepq_signal() spend additional time trying
to be fair, waking thread with highest priority, sleeping longest time.
But in case of taskqueue there are many absolutely identical threads, and
any fairness between them is quite pointless. It makes even worse, since
round-robin wakeups not only make previous CPU affinity in scheduler quite
useless, but also hide from user chance to see CPU bottlenecks, when
sequential workload with one request at a time looks evenly distributed
between multiple threads.

This change adds new SLEEPQ_UNFAIR flag to sleepq_signal(), making it wakeup
thread that went to sleep last, but no longer in context switch (to avoid
immediate spinning on the thread lock). On top of that new wakeup_any()
function is added, equivalent to wakeup_one(), but setting the flag.
On top of that taskqueue(9) is switchied to wakeup_any() to wakeup its
threads.

As result, on 72-core Xeon v4 machine sequential ZFS write to 12 ZVOLs
with 16KB block size spend 34% less time in wakeup_any() and descendants
then it was spending in wakeup_one(), and total write throughput increased
by ~10% with the same as before CPU usage.


# 341154 28-Nov-2018 markj

MFC r340730, r340731:
Add taskqueue_quiesce(9) and use it to implement taskq_wait().

PR: 227784


# 328392 25-Jan-2018 pkelsey

MFC of r305169:

_taskqueue_start_threads() now fails if it doesn't actually start any threads.

Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D7701


# 323447 11-Sep-2017 ian

MFC r320901-r320902, r320996-r320997, r321002, r321048, r321400, r321743,
r321745

r320901:
Protect access to the AT realtime clock with its own mutex.

The mutex protecting access to the registered realtime clock should not be
overloaded to protect access to the atrtc hardware, which might not even be
the registered rtc. More importantly, the resettodr mutex needs to be
eliminated to remove locking/sleeping restrictions on clock drivers, and
that can't happen if MD code for amd64 depends on it. This change moves the
protection into what's really being protected: access to the atrtc date and
time registers.

This change also adds protection when the clock is accessed from
xentimer_settime(), which bypasses the resettodr locking.

Differential Revision: https://reviews.freebsd.org/D11483

r320902:
Support multiple realtime clocks, and remove locking/sleeping restrictions
on clock drivers.

This tracks multiple concurrent realtime clock drivers in a list sorted by
clock resolution. When system time changes (and periodically) the
clock_settime() methods of all registered clocks are invoked.

To initialize system time, each driver is tried in turn from best to worst
resolution, until one succesfully returns a valid time.

The code no longer holds a mutex while calling the clock_settime() and
clock_gettime() methods of the registered clocks. This allows clock drivers
to do whatever kind of locking or sleeping is necessary (this is especially
important for i2c clock chips since i2c drivers often need to sleep).

A new clock_register_flags() function allows the clock driver to pass
flags. The flags currently defined help support drivers that use their own
techniques to avoid roundoff errors (prevents the 4/5 rounding done by the
subr_rtc code). A driver which may need to wait for resources (such as bus
ownership) may pass a flag to indicate that it will obtain system time for
itself after waiting for resources; this is merely an optimization to avoid
the common code retrieving a timespec that will never get used.

Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D11484

r320996:
Allow setting debug.clocktime as a tunable. Print 64-bit time_t correctly
on 32-bit systems.

r320997:
Minor optimization: instead of converting between days and years using
loops that start in 1970, assume most conversions are going to be for recent
dates and use a precomputed number of days through the end of 2016.

r321002:
Revert r320997. There are reports of it getting the wrong results, so
clearly my testing was insuffficent, and it's best to just revert it
until I get it straightened out.

r321048:
Minor optimization: instead of converting between days and years using loops
that start in 1970, assume most conversions are going to be for recent dates
and use a precomputed number of days through the end of 2016.

This is a do-over of r320997, hopefully this time with 100% more workiness.

The first attempt had an off-by-one error, but instead of just adding
another mysterious +1 adjustment, this rearranges the relationship between
recent_base_year and recent_base_days so that the latter is the number of
days that occurred before the start of the associated year (instead of the
count thru the end of that year). This makes the recent_base stuff work
more like the original loop logic that didn't need any +1 adjustments.

r321400:
Add common code to support realtime clocks that store year without century.

Most realtime clocks store the year as 2 BCD digits. Some add a century bit
to extend the range another hundred years. Every clock driver has its own
code to determine the century and pass a full year value to clock_ct_to_ts().
Now clock drivers can just convert BCD to bin and store the result in the
clocktime struct and let the common code figure out the century. Clocks
with a century bit can just add 100 to year if the century bit is on.

r321743:
Add taskqueue_enqueue_timeout_sbt(), because sometimes you want more control
over the scheduling precision than 'ticks' can offer, and because sometimes
you're already working with sbintime_t units and it's dumb to convert them
to ticks just so they can get converted back to sbintime_t under the hood.

r321745:
Add clock_schedule(), a feature that allows realtime clock drivers to
request that their clock_settime() methods be called at a given offset
from top-of-second. This adds a timeout_task to the rtc_instance so that
each clock can be separately added to taskqueue_thread with the scheduling
it prefers, instead of looping through all the clocks at once with a
single task on taskqueue_thread. If a driver doesn't call clock_schedule()
the default is the old behavior: clock_settime() is queued immediately.


# 315267 14-Mar-2017 hselasky

MFC r314553:

Implement taskqueue_poll_is_busy() for use by the LinuxKPI.
Refer to comment above function for a detailed description.

Discussed with: kib @
Sponsored by: Mellanox Technologies


# 306946 10-Oct-2016 hselasky

MFC r306441 and r306634:
While draining a timeout task prevent the taskqueue_enqueue_timeout()
function from restarting the timer.

Commonly taskqueue_enqueue_timeout() is called from within the task
function itself without any checks for teardown. Then it can happen
the timer stays active after the return of taskqueue_drain_timeout(),
because the timeout and task is drained separately.

This patch factors out the teardown flag into the timeout task itself,
allowing existing code to stay as-is instead of applying a teardown
flag to each and every of the timeout task consumers.

Add assert to taskqueue_drain_timeout() which prevents parallel
execution on the same timeout task.

Update manual page documenting the return value of
taskqueue_enqueue_timeout().

Differential Revision: https://reviews.freebsd.org/D8012
Reviewed by: kib, trasz


# 304704 23-Aug-2016 shurd

MFC r304021: Update iflib to support more NIC designs

- Move group task queue into kern/subr_gtaskqueue.c
- Change intr_enable to return an int so it can be detected if it's not
implemented
- Allow different TX/RX queues per set to be different sizes
- Don't split up TX mbufs before transmit
- Allow a completion queue for TX as well as RX
- Pass the RX budget to isc_rxd_available() to allow an earlier return
and avoid multiple calls

Approved by: sbruno


# 302408 07-Jul-2016 gjb

Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, as nothing has been merged
here.

Additional commits post-branch will follow.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


/freebsd-11-stable/MAINTAINERS
/freebsd-11-stable/cddl
/freebsd-11-stable/cddl/contrib/opensolaris
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/zfs
/freebsd-11-stable/cddl/contrib/opensolaris/lib/libzfs
/freebsd-11-stable/contrib/amd
/freebsd-11-stable/contrib/apr
/freebsd-11-stable/contrib/apr-util
/freebsd-11-stable/contrib/atf
/freebsd-11-stable/contrib/binutils
/freebsd-11-stable/contrib/bmake
/freebsd-11-stable/contrib/byacc
/freebsd-11-stable/contrib/bzip2
/freebsd-11-stable/contrib/com_err
/freebsd-11-stable/contrib/compiler-rt
/freebsd-11-stable/contrib/dialog
/freebsd-11-stable/contrib/dma
/freebsd-11-stable/contrib/dtc
/freebsd-11-stable/contrib/ee
/freebsd-11-stable/contrib/elftoolchain
/freebsd-11-stable/contrib/elftoolchain/ar
/freebsd-11-stable/contrib/elftoolchain/brandelf
/freebsd-11-stable/contrib/elftoolchain/elfdump
/freebsd-11-stable/contrib/expat
/freebsd-11-stable/contrib/file
/freebsd-11-stable/contrib/gcc
/freebsd-11-stable/contrib/gcclibs/libgomp
/freebsd-11-stable/contrib/gdb
/freebsd-11-stable/contrib/gdtoa
/freebsd-11-stable/contrib/groff
/freebsd-11-stable/contrib/ipfilter
/freebsd-11-stable/contrib/ldns
/freebsd-11-stable/contrib/ldns-host
/freebsd-11-stable/contrib/less
/freebsd-11-stable/contrib/libarchive
/freebsd-11-stable/contrib/libarchive/cpio
/freebsd-11-stable/contrib/libarchive/libarchive
/freebsd-11-stable/contrib/libarchive/libarchive_fe
/freebsd-11-stable/contrib/libarchive/tar
/freebsd-11-stable/contrib/libc++
/freebsd-11-stable/contrib/libc-vis
/freebsd-11-stable/contrib/libcxxrt
/freebsd-11-stable/contrib/libexecinfo
/freebsd-11-stable/contrib/libpcap
/freebsd-11-stable/contrib/libstdc++
/freebsd-11-stable/contrib/libucl
/freebsd-11-stable/contrib/libxo
/freebsd-11-stable/contrib/llvm
/freebsd-11-stable/contrib/llvm/projects/libunwind
/freebsd-11-stable/contrib/llvm/tools/clang
/freebsd-11-stable/contrib/llvm/tools/lldb
/freebsd-11-stable/contrib/llvm/tools/llvm-dwarfdump
/freebsd-11-stable/contrib/llvm/tools/llvm-lto
/freebsd-11-stable/contrib/mdocml
/freebsd-11-stable/contrib/mtree
/freebsd-11-stable/contrib/ncurses
/freebsd-11-stable/contrib/netcat
/freebsd-11-stable/contrib/ntp
/freebsd-11-stable/contrib/nvi
/freebsd-11-stable/contrib/one-true-awk
/freebsd-11-stable/contrib/openbsm
/freebsd-11-stable/contrib/openpam
/freebsd-11-stable/contrib/openresolv
/freebsd-11-stable/contrib/pf
/freebsd-11-stable/contrib/sendmail
/freebsd-11-stable/contrib/serf
/freebsd-11-stable/contrib/sqlite3
/freebsd-11-stable/contrib/subversion
/freebsd-11-stable/contrib/tcpdump
/freebsd-11-stable/contrib/tcsh
/freebsd-11-stable/contrib/tnftp
/freebsd-11-stable/contrib/top
/freebsd-11-stable/contrib/top/install-sh
/freebsd-11-stable/contrib/tzcode/stdtime
/freebsd-11-stable/contrib/tzcode/zic
/freebsd-11-stable/contrib/tzdata
/freebsd-11-stable/contrib/unbound
/freebsd-11-stable/contrib/vis
/freebsd-11-stable/contrib/wpa
/freebsd-11-stable/contrib/xz
/freebsd-11-stable/crypto/heimdal
/freebsd-11-stable/crypto/openssh
/freebsd-11-stable/crypto/openssl
/freebsd-11-stable/gnu/lib
/freebsd-11-stable/gnu/usr.bin/binutils
/freebsd-11-stable/gnu/usr.bin/cc/cc_tools
/freebsd-11-stable/gnu/usr.bin/gdb
/freebsd-11-stable/lib/libc/locale/ascii.c
/freebsd-11-stable/sys/cddl/contrib/opensolaris
/freebsd-11-stable/sys/contrib/dev/acpica
/freebsd-11-stable/sys/contrib/ipfilter
/freebsd-11-stable/sys/contrib/libfdt
/freebsd-11-stable/sys/contrib/octeon-sdk
/freebsd-11-stable/sys/contrib/x86emu
/freebsd-11-stable/sys/contrib/xz-embedded
/freebsd-11-stable/usr.sbin/bhyve/atkbdc.h
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.c
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.h
/freebsd-11-stable/usr.sbin/bhyve/console.c
/freebsd-11-stable/usr.sbin/bhyve/console.h
/freebsd-11-stable/usr.sbin/bhyve/pci_fbuf.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.h
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.c
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.h
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.c
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.h
/freebsd-11-stable/usr.sbin/bhyve/rfb.c
/freebsd-11-stable/usr.sbin/bhyve/rfb.h
/freebsd-11-stable/usr.sbin/bhyve/sockstream.c
/freebsd-11-stable/usr.sbin/bhyve/sockstream.h
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.c
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.h
/freebsd-11-stable/usr.sbin/bhyve/usb_mouse.c
/freebsd-11-stable/usr.sbin/bhyve/vga.c
/freebsd-11-stable/usr.sbin/bhyve/vga.h
# 302372 06-Jul-2016 nwhitehorn

Replace a number of conflations of mp_ncpus and mp_maxid with either
mp_maxid or CPU_FOREACH() as appropriate. This fixes a number of places in
the kernel that assumed CPU IDs are dense in [0, mp_ncpus) and would try,
for example, to run tasks on CPUs that did not exist or to allocate too
few buffers on systems with sparse CPU IDs in which there are holes in the
range and mp_maxid > mp_ncpus. Such circumstances generally occur on
systems with SMT, but on which SMT is disabled. This patch restores system
operation at least on POWER8 systems configured in this way.

There are a number of other places in the kernel with potential problems
in these situations, but where sparse CPU IDs are not currently known
to occur, mostly in the ARM machine-dependent code. These will be fixed
in a follow-up commit after the stable/11 branch.

PR: kern/210106
Reviewed by: jhb
Approved by: re (glebius)


# 301208 02-Jun-2016 mjg

taskqueue: plug a leak in _taskqueue_create

While here make some style fixes and postpone the sprintf so that it is
only done when the function can no longer fail.

CID: 1356041


# 300372 21-May-2016 avg

fix loss of taskqueue wakeups (introduced in r300113)

Submitted by: kmacy
Tested by: dchagin


# 300219 19-May-2016 scottl

Adjust the creation of tq_name so it can be freed correctly

Reviewed by: jhb, allanjude
Differential Revision: D6454


# 300113 18-May-2016 scottl

Import the 'iflib' API library for network drivers. From the author:

"iflib is a library to eliminate the need for frequently duplicated device
independent logic propagated (poorly) across many network drivers."

Participation is purely optional. The IFLIB kernel config option is
provided for drivers that want to transition between legacy and iflib
modes of operation. ixl and ixgbe driver conversions will be committed
shortly. We hope to see participation from the Broadcom and maybe
Chelsio drivers in the near future.

Submitted by: mmacy@nextbsd.org
Reviewed by: gallatin
Differential Revision: D5211


# 296272 01-Mar-2016 jhb

Remove taskqueue_enqueue_fast().

taskqueue_enqueue() was changed to support both fast and non-fast
taskqueues 10 years ago in r154167. It has been a compat shim ever
since. It's time for the compat shim to go.

Submitted by: Howard Su <howard0su@gmail.com>
Reviewed by: sephe
Differential Revision: https://reviews.freebsd.org/D5131


# 290805 13-Nov-2015 rrs

This fixes several places where callout_stops return is examined. The
new return codes of -1 were mistakenly being considered "true". Callout_stop
now returns -1 to indicate the callout had either already completed or
was not running and 0 to indicate it could not be stopped. Also update
the manual page to make it more consistent no non-zero in the callout_stop
or callout_reset descriptions.

MFC after: 1 Month with associated callout change.


# 283551 25-May-2015 delphij

MFuser/delphij/zfs-arc-rebase@r281754:

In r256613, taskqueue_enqueue_locked() have been modified to release the
task queue lock before returning. In r276665, taskqueue_drain_all() will
call taskqueue_enqueue_locked() to insert the barrier task into the queue,
but did not reacquire the lock after it but later code expects the lock
still being held (e.g. TQ_SLEEP()).

The barrier task is special and if we release then reacquire the lock,
there would be a small race window where a high priority task could sneak
into the queue. Looking more closely, the race seems to be tolerable but
is undesirable from semantics standpoint.

To solve this, in taskqueue_drain_tq_queue(), instead of directly calling
taskqueue_enqueue_locked(), insert the barrier task directly without
releasing the lock.


# 279300 25-Feb-2015 adrian

Remove taskqueue_start_threads_pinned(); there's noa generic cpuset version of this.

Sponsored by: Norse Corp, Inc.


# 278879 17-Feb-2015 adrian

Implement taskqueue_start_threads_cpuset().

This is a more generic version of taskqueue_start_threads_pinned()
which only supports a single cpuid.

This originally came from John Baldwin <jhb@> who implemented it
as part of a push towards NUMA awareness in drivers. I started implementing
something similar for RSS and NUMA, then found he already did it.

I'd like to axe taskqueue_start_threads_pinned() so it doesn't become
part of a longer-term API. (Read: hps@ wants to MFC things, and
if I don't do this soon, he'll MFC what's here. :-)

I have a follow-up commit which converts the intel drivers over
to using the cpuset version of this function, so we can eventually
nuke the the pinned version.

Tested:

* igb, ixgbe

Obtained from: jhbbsd


# 276665 04-Jan-2015 gibbs

Prevent live-lock and access of destroyed data in taskqueue_drain_all().

Phabric: https://reviews.freebsd.org/D1247
Reviewed by: jhb, avg
Sponsored by: Spectra Logic Corporation

sys/kern_subr_taskqueue.c:
Modify taskqueue_drain_all() processing to use a temporary
"barrier task", rather than rely on a user task that may
be destroyed during taskqueue_drain_all()'s execution. The
barrier task is queued behind all previously queued tasks
and then has its priority elevated so that future tasks
cannot pass it in the queue.

Use a similar barrier scheme to drain threads processing
current tasks. This requires taskqueue_run_locked() to
insert and remove the taskqueue_busy object for the running
thread for every task processed.

share/man/man9/taskqueue.9:
Remove warning about live-lock issues with taskqueue_drain_all()
and indicate that it does not wait for tasks queued after
it begins processing.


# 275345 30-Nov-2014 gibbs

Remove trailing whitespace.


# 269666 07-Aug-2014 ae

Temporary revert r269661, it looks like the patch isn't complete.


# 269661 07-Aug-2014 ae

Use cpuset_setithread() to apply cpu mask to taskq threads.

Sponsored by: Yandex LLC


# 266939 01-Jun-2014 adrian

Pin the right thread.

This _was_ right, a last minute suggestion and not enough testing makes
Adrian a bad boy.

Tested:

* igb(4) with RSS patches, by hand verifying each igb(4) taskqueue
tid from procstat -ka using cpuset -g -t <tid>.


# 266629 24-May-2014 adrian

Add a new taskqueue setup method that takes a cpuid to pin the
taskqueue worker thread(s) to.

For now it isn't a taskqueue/taskthread error to fail to pin
to the given cpuid.

Thanks to rpaulo@, kib@ and jhb@ for feedback.

Tested:

* igb(4), with local RSS patches to pin taskqueues.

TODO:

* ask the doc team for help in documenting the new API call.
* add a taskqueue_start_threads_cpuset() method which takes
a cpuset_t - but this may require a bunch of surgery to
bring cpuset_t into scope.


# 258713 28-Nov-2013 avg

add taskqueue_drain_all

This API has semantics similar to that of taskqueue_drain but acts on
all tasks that might be queued or running on a taskqueue.
A caller must ensure that no new tasks are being enqueued otherwise this
call would be totally meaningless. For example, if the tasks are
enqueued by an interrupt filter then its interrupt must be disabled.

MFC after: 10 days


# 258354 19-Nov-2013 avg

taskqueue_cancel: garbage collect a write-only variable

MFC after: 3 days


# 256862 21-Oct-2013 mav

Add comments that taskqueue_enqueue_locked() returns without the lock.


# 256730 18-Oct-2013 glebius

Revert r256587.

Requested by: zec


# 256613 16-Oct-2013 mav

MFprojects/camlock r254763:
Move tq_enqueue() call out of the queue lock for known handlers (actually
I have found no others in the base system). This reduces queue lock hold
time and congestion spinning under active multithreaded enqueuing.


# 256612 16-Oct-2013 mav

MFprojects/camlock r254685:
Remove TQ_FLAGS_PENDING flag, softly duplicating queue emptiness status.


# 256587 16-Oct-2013 glebius

For VIMAGE kernels store vnet in the struct task, and set vnet context
during task processing.

Reported & tested by: mm


# 254787 24-Aug-2013 mav

MFprojects/camlock r254460:
Remove locking from taskqueue_member(). The list of threads is static
during the taskqueue life cycle, so there is no need to protect it,
taking quite congested lock several more times for each ZFS I/O.


# 248649 23-Mar-2013 will

Extend taskqueue(9) to enable per-taskqueue callbacks.

The scope of these callbacks is primarily to support actions that affect the
taskqueue's thread environments. They are entirely optional, and
consequently are introduced as a new API: taskqueue_set_callback().

This interface allows the caller to specify that a taskqueue requires a
callback and optional context pointer for a given callback type.

The callback types included in this commit can be used to register a
constructor and destructor for thread-local storage using osd(9). This
allows a particular taskqueue to define that its threads require a specific
type of TLS, without the need for a specially-orchestrated task-based
mechanism for startup and shutdown in order to accomplish it.

Two callback types are supported at this point:

- TASKQUEUE_CALLBACK_TYPE_INIT, called by every thread when it starts, prior
to processing any tasks.
- TASKQUEUE_CALLBACK_TYPE_SHUTDOWN, called by every thread when it exits,
after it has processed its last task but before the taskqueue is
reclaimed.

While I'm here:

- Add two new macros, TQ_ASSERT_LOCKED and TQ_ASSERT_UNLOCKED, and use them
in appropriate locations.
- Fix taskqueue.9 to mention taskqueue_start_threads(), which is a required
interface for all consumers of taskqueue(9).

Reviewed by: kib (all), eadler (taskqueue.9), brd (taskqueue.9)
Approved by: ken (mentor)
Sponsored by: Spectra Logic
MFC after: 1 month


# 243341 20-Nov-2012 kib

Add a special meaning to the negative ticks argument for
taskqueue_enqueue_timeout(). Do not rearm the callout if it is
already armed and the ticks is negative. Otherwise rearm it to fire
in abs(ticks) ticks in the future.

The intended use is to call taskqueue_enqueue_timeout() for the given
timeout_task with the same negative ticks argument. As result, the
task is scheduled to execute not further than abs(ticks) ticks in
future, and the consequent enqueues are coalesced until the already
scheduled task is finished.

Reviewed by: rwatson
Tested by: Markus Gebert <markus.gebert@hostpoint.ch>
MFC after: 2 weeks


# 239779 28-Aug-2012 jhb

Shorten the name of the fast SWI taskqueue to "fast taskq" so that
it fits.

Reported by: lev
MFC after: 1 week


# 225570 15-Sep-2011 adrian

Ensure that ta_pending doesn't overflow u_short by capping its value at USHRT_MAX.

If it overflows before the taskqueue can run, the task will be
re-added to the taskqueue and cause a loop in the task list.

Reported by: Arnaud Lacombe <lacombar@gmail.com>
Submitted by: Ryan Stone <rysto32@gmail.com>
Reviewed by: jhb
Approved by: re (kib)
MFC after: 1 day


# 221059 26-Apr-2011 kib

Implement the delayed task execution extension to the taskqueue
mechanism. The caller may specify a timeout in ticks after which the
task will be scheduled.

Sponsored by: The FreeBSD Foundation
Reviewed by: jeff, jhb
MFC after: 1 month


# 215750 23-Nov-2010 avg

taskqueue: drop unused tq_name field

tq_name was used write-only and besides it was just a pointer, so it
could point to some garbage in a temporary buffer that's gone.
This change shouldn't change KPI/KBI as struct taskqueue is private to
subr_taskqueue.c.
If we find a need for tq_name it can be resurrected at any moment.
taskqueue_create() interface is preserved for this purpose.

Suggested by: jhb
MFC after: 10 days


# 215021 08-Nov-2010 jmallett

Use macros rather than inline functions to lock and unlock mutexes, so that
line number information is preserved in witness.

Reviewed by: jhb


# 215011 08-Nov-2010 mdf

Add a taskqueue_cancel(9) to cancel a pending task without waiting for
it to run as taskqueue_drain(9) does.

Requested by: hselasky
Original code: jeff
Reviewed by: jhb
MFC after: 2 weeks


# 213813 13-Oct-2010 mdf

Use a safer mechanism for determining if a task is currently running,
that does not rely on the lifetime of pointers being the same. This also
restores the task KBI.

Suggested by: jhb
MFC after: 1 month


# 213739 12-Oct-2010 mdf

Re-expose and briefly document taskqueue_run(9). The function is used
in at least one 3rd party driver.

Requested by: jhb


# 211928 28-Aug-2010 pjd

Run all tasks from a proper context, with proper priority, etc.

Reviewed by: jhb
MFC after: 1 month


# 211284 13-Aug-2010 pjd

Simplify taskqueue_drain() by using proved macros.


# 210380 22-Jul-2010 mdf

Remove unused variable that snuck in during development.

Approved by: zml (mentor)


# 210377 22-Jul-2010 mdf

Fix taskqueue_drain(9) to not have false negatives. For threaded
taskqueues, more than one task can be running simultaneously.

Also make taskqueue_run(9) static to the file, since there are no
consumers in the base kernel and the function signature needs to change
with this fix.

Remove mention of taskqueue_run(9) and taskqueue_run_fast(9) from the
taskqueue(9) man page.

Reviewed by: jhb
Approved by: zml (mentor)


# 209062 11-Jun-2010 avg

fix a few cases where a string is passed via format argument instead of
via %s

Most of the cases looked harmless, but this is done for the sake of
correctness. In one case it even allowed to drop an intermediate buffer.

Found by: clang
MFC after: 2 week


# 208715 01-Jun-2010 zml

Revert taskqueue(9) related commits until mdf@ is approved and can
resolve issues.

This reverts commits r207439, r208623, r208624


# 208624 28-May-2010 zml

Avoid a wakeup(9) if we can be sure no one is waiting on the task.

Submitted by: Matthew Fleming <matthew.fleming@isilon.com>
Reviewed by: zml, jhb


# 208623 28-May-2010 zml

Revert r207439 and solve the problem differently. The task handler
ta_func may free the task structure, so no references to its members
are valid after the handler has been called. Using a per-queue member
and having waits longer than strictly necessary was suggested by jhb.

Submitted by: Matthew Fleming <matthew.fleming@isilon.com>
Reviewed by: zml, jhb


# 207439 30-Apr-2010 zml

Handle taskqueue_drain(9) correctly on a threaded taskqueue:

taskqueue_drain(9) will not correctly detect whether a task is
currently running. The check is against a field in the taskqueue
struct, but for a threaded queue with more than one thread, multiple
threads can simultaneously be running a task, thus stomping over the
tq_running field.

Submitted by: Matthew Fleming <matthew.fleming@isilon.com>
Reviewed by: jhb
Approved by: dfr (mentor)


# 198411 23-Oct-2009 jhb

- Fix several off-by-one errors when using MAXCOMLEN. The p_comm[] and
td_name[] arrays are actually MAXCOMLEN + 1 in size and a few places that
created shadow copies of these arrays were just using MAXCOMLEN.
- Prefer using sizeof() of an array type to explicit constants for the
array length in a few places.
- Ensure that all of p_comm[] and td_name[] is always zero'd during
execve() to guard against any possible information leaks. Previously
trailing garbage in p_comm[] could be leaked to userland in ktrace
record headers via td_name[].

Reviewed by: bde


# 196358 18-Aug-2009 pjd

Remove unused taskqueue_find() function.

Reviewed by: dfr
Approved by: re (kib)


# 196295 17-Aug-2009 pjd

Remove OpenSolaris taskq port (it performs very poorly in our kernel) and
replace it with wrappers around our taskqueue(9).
To make it possible implement taskqueue_member() function which returns 1
if the given thread was created by the given taskqueue.

Approved by: re (kib)


# 196293 17-Aug-2009 pjd

Because taskqueue_run() can drop tq_mutex, we need to check if the
TQ_FLAGS_ACTIVE flag wasn't removed in the meantime, which means we missed a
wakeup.

Approved by: re (kib)


# 188592 13-Feb-2009 thompsa

Remove semicolon left in the last commit

Spotted by: csjp


# 188548 12-Feb-2009 thompsa

Check the exit flag at the start of the taskqueue loop rather than the end. It
is possible to tear down the taskqueue before the thread has run and the
taskqueue loop would sleep forever.

Reviewed by: sam
MFC after: 1 week


# 188058 03-Feb-2009 imp

Use NULL in preference to 0 for pointers.


# 180588 18-Jul-2008 kmacy

revert local change


# 180583 18-Jul-2008 kmacy

import vendor fixes to cxgb


# 178123 11-Apr-2008 jhb

Use kthread_exit() to terminate a taskqueue thread rather than kproc_exit()
now that the taskqueue threads are kthreads rather than kprocs.

Reported by: kris


# 178015 08-Apr-2008 sam

change taskqueue_start_threads to create threads instead of proc's

Reviewed by: jhb


# 177621 25-Mar-2008 scottl

Implement taskqueue_block() and taskqueue_unblock(). These functions allow
the owner of a queue to block and unblock execution of the tasks in the
queue while allowing tasks to continue to be added queue. Combining this
with taskqueue_drain() allows a queue to be safely disabled. The unblock
function may run (or schedule to run) the queue when it is called, just as
calling taskqueue_enqueue() would.

Reviewed by: jhb, sam


# 172836 20-Oct-2007 julian

Rename the kthread_xxx (e.g. kthread_create()) calls
to kproc_xxx as they actually make whole processes.
Thos makes way for us to add REAL kthread_create() and friends
that actually make theads. it turns out that most of these
calls actually end up being moved back to the thread version
when it's added. but we need to make this cosmetic change first.

I'd LOVE to do this rename in 7.0 so that we can eventually MFC the
new kthread_xxx() calls.


# 170307 04-Jun-2007 jeff

Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
sychronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


# 166188 23-Jan-2007 jeff

- Remove setrunqueue and replace it with direct calls to sched_add().
setrunqueue() was mostly empty. The few asserts and thread state
setting were moved to the individual schedulers. sched_add() was
chosen to displace it for naming consistency reasons.
- Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be
different on all three schedulers where it was only called in one place
each.
- Remove the long ifdef'd out remrunqueue code.
- Remove the now redundant ts_state. Inspect the thread state directly.
- Don't set TSF_* flags from kern_switch.c, we were only doing this to
support a feature in one scheduler.
- Change sched_choose() to return a thread rather than a td_sched. Also,
rely on the schedulers to return the idlethread. This simplifies the
logic in choosethread(). Aside from the run queue links kern_switch.c
mostly does not care about the contents of td_sched.

Discussed with: julian

- Move the idle thread loop into the per scheduler area. ULE wants to
do something different from the other schedulers.

Suggested by: jhb

Tested on: x86/amd64 sched_{4BSD, ULE, CORE}.


# 158904 24-May-2006 sam

When starting up threads in taskqueue_start_threads create them
stopped before adjusting their priority and setting them on the run
q so they cannot race for resources (pointed out by njl).

While here add a console printf on thread create fails; otherwise
noone may notice (e.g. return value is always 0 and caller has no
way to verify).

Reviewed by: jhb, scottl
MFC after: 2 weeks


# 157815 17-Apr-2006 jhb

Change msleep() and tsleep() to not alter the calling thread's priority
if the specified priority is zero. This avoids a race where the calling
thread could read a snapshot of it's current priority, then a different
thread could change the first thread's priority, then the original thread
would call sched_prio() inside msleep() undoing the change made by the
second thread. I used a priority of zero as no thread that calls msleep()
or tsleep() should be specifying a priority of zero anyway.

The various places that passed 'curthread->td_priority' or some variant
as the priority now pass 0.


# 157314 30-Mar-2006 sam

fixup error handling in taskqueue_start_threads: check for kthread_create
failing, print a message when we fail for some reason as most callers do
not check the return value (e.g. 'cuz they're called from SYSINIT)

Reviewed by: scottl
MFC after: 1 week


# 154333 13-Jan-2006 scottl

Add the following to the taskqueue api:

taskqueue_start_threads(struct taskqueue **, int count, int pri,
const char *name, ...);

This allows the creation of 1 or more threads that will service a single
taskqueue. Also rework the taskqueue_create() API to remove the API change
that was introduced a while back. Creating a taskqueue doesn't rely on
the presence of a process structure, and the proc mechanics are much better
encapsulated in taskqueue_start_threads(). Also clean up the
taskqueue_terminate() and taskqueue_free() functions to safely drain
pending tasks and remove all associated threads.

The TASKQUEUE_DEFINE and TASKQUEUE_DEFINE_THREAD macros have been changed
to use the new API, but drivers compiled against the old definitions will
still work. Thus, recompiling drivers is not a strict requirement.


# 154205 10-Jan-2006 scottl

The interlock in taskqueue_terminate() is completely wrong for taskqueues
that use spinlocks. Remove it for now.


# 154167 10-Jan-2006 scottl

Add functions and macros and refactor code to make it easier to manage
fast taskqueues. The following have been added:

TASKQUEUE_FAST_DEFINE() - create a global task queue.
an arbitrary execution context.
TASKQUEUE_FAST_DEFINE_THREAD() - create a global taskqueue that uses a
dedicated kthread.
taskqueue_create_fast() - create a local/private taskqueue.

These are all complimentary of the standard taskqueue functions. They are
primarily useful for fast interrupt handlers that can only use spinlock for
synchronization.

I personally think that the taskqueue API is starting to get too narrow and
hairy, but fixing it will require a major redesign on the API. Such a
redesign would be good but would break compatibility with FreeBSD 6.x, so
it really isn't desirable at this time.

Submitted by: sam


# 153676 23-Dec-2005 scottl

Create the taskqueue_fast handler with INTR_MPSAFE so that it doesn't run
with Giant.

MFC After: 3 days


# 151656 25-Oct-2005 jhb

Use shorter names for the Giant and fast taskqueues so that their names
actually fit.


# 151624 24-Oct-2005 jhb

Revert previous change to this file. I accidentally committed while
fixing spelling in a comment.


# 151623 24-Oct-2005 jhb

Spell hierarchy correctly in comments.

Submitted by: Wojciech A. Koszek dunstan at freebsd dot czest dot pl


# 145729 30-Apr-2005 sam

o enable shutdown of taskqueue threads; the thread servicing the queue checks
a new entry in the taskqueue struct each time it wakes up to see if it
should terminate
o adjust TASKQUEUE_DEFINE_THREAD & co. to record the thread/proc identity for
the shutdown rendezvous
o replace wakeup after adding a task to a queue with wakeup_one; this helps
queues where multiple threads are used to service tasks (e.g. acpi)
o remove NULL check of tq_enqueue method; it should never be NULL

Reviewed by: dfr, njl


# 145473 24-Apr-2005 sam

o eliminate modification of task structures after their run to avoid
modify-after-free races when the task structure is malloc'd
o shrink task structure by removing ta_flags (no longer needed with
avoid fix) and combining ta_pending and ta_priority

Reviewed by: dwhite, dfr
MFC after: 4 days


# 136131 05-Oct-2004 imp

Add taskqueue_drain. This waits for the specified task to finish, if
running, or returns. The calling program is responsible for making sure
that nothing new is enqueued.

# man page coming soon.


# 133305 08-Aug-2004 jmg

rearange some code that handles the thread taskqueue so that it is more
generic. Introduce a new define TASKQUEUE_DEFINE_THREAD that takes a
single arg, which is the name of the queue.

Document these changes.


# 131246 28-Jun-2004 jhb

- Execute all of the tasks on the taskqueue during taskqueue_free() after
the queue has been removed from the global taskqueue_queues list. This
removes the need for the draining queue hack.
- Allow taskqueue_run() to be called with the taskqueue mutex held. It
can still be called without the lock for API compatiblity. In that case
it will acquire the lock internally.
- Don't lock the individual queue mutex in taskqueue_find() until after the
strcmp as the global queues mutex is sufficient for the strcmp.
- Simplify taskqueue_thread_loop() now that it can hold the lock across
taskqueue_run().

Submitted by: bde (mostly)


# 126027 19-Feb-2004 jhb

Tidy up the thread taskqueue implementation and close a lost wakeup race.
Instead of creating a mutex that we msleep on but don't actually lock when
doing the corresponding wakeup(), in the kthread, lock the mutex associated
with our taskqueue and msleep while the queue is empty. Assert that the
queue is locked when the callback function is called to wake the kthread.


# 123614 17-Dec-2003 jhb

Various style fixes.

Submitted by: bde (mostly, if not all)


# 122436 10-Nov-2003 alfred

Fix a bug where the taskqueue kproc was being parented by init
because RFNOWAIT was being passed to kproc_create.

The result was that shutdown took quite a bit longer because this
errant "child" would not respond to termination signals from init
at system shutdown.

RFNOWAIT dissassociates itself from the caller by attaching to init
as a parent proc. We could have had the taskqueue proc listen for
SIGKILL, but being able to SIGKILL a potentially critical system
process doesn't seem like a good idea.


# 119812 06-Sep-2003 sam

correct fast swi taskqueue spinlock name to be different from the sleep lock

Submitted by: Tor Egge <Tor.Egge@cvsup.no.freebsd.org>


# 119789 05-Sep-2003 sam

"fast swi" taskqueue support. This is a taskqueue that uses spinlocks
making it useful for dispatching swi tasks from fast interrupt handlers.

Sponsered by: FreeBSD Foundation


# 119708 03-Sep-2003 ken

Move dynamic sysctl(8) variable creation for the cd(4) and da(4) drivers
out of cdregister() and daregister(), which are run from interrupt context.

The sysctl code does blocking mallocs (M_WAITOK), which causes problems
if malloc(9) actually needs to sleep.

The eventual fix for this issue will involve moving the CAM probe process
inside a kernel thread. For now, though, I have fixed the issue by moving
dynamic sysctl variable creation for these two drivers to a task queue
running in a kernel thread.

The existing task queues (taskqueue_swi and taskqueue_swi_giant) run in
software interrupt handlers, which wouldn't fix the problem at hand. So I
have created a new task queue, taskqueue_thread, that runs inside a kernel
thread. (It also runs outside of Giant -- clients must explicitly acquire
and release Giant in their taskqueue functions.)

scsi_cd.c: Remove sysctl variable creation code from cdregister(), and
move it to a new function, cdsysctlinit(). Queue
cdsysctlinit() to the taskqueue_thread taskqueue once we
have fully registered the cd(4) driver instance.

scsi_da.c: Remove sysctl variable creation code from daregister(), and
move it to move it to a new function, dasysctlinit().
Queue dasysctlinit() to the taskqueue_thread taskqueue once
we have fully registered the da(4) instance.

taskqueue.h: Declare the new taskqueue_thread taskqueue, update some
comments.

subr_taskqueue.c:
Create the new kernel thread taskqueue. This taskqueue
runs outside of Giant, so any functions queued to it would
need to explicitly acquire/release Giant if they need it.

cd.4: Update the cd(4) man page to talk about the minimum command
size sysctl/loader tunable. Also note that the changer
variables are available as loader tunables as well.

da.4: Update the da(4) man page to cover the retry_count,
default_timeout and minimum_cmd_size sysctl variables/loader
tunables. Remove references to /dev/r???, they aren't used
any longer.

cd.9: Update the cd(9) man page to describe the CD_Q_10_BYTE_ONLY
quirk.

taskqueue.9: Update the taskqueue(9) man page to describe the new thread
task queue, and the taskqueue_swi_giant queue.

MFC after: 3 days


# 116182 10-Jun-2003 obrien

Use __FBSDID().


# 111528 26-Feb-2003 scottl

Introduce a new taskqueue that runs completely free of Giant, and in
turns runs its tasks free of Giant too. It is intended that as drivers
become locked down, they will move out of the old, Giant-bound taskqueue
and into this new one. The old taskqueue has been renamed to
taskqueue_swi_giant, and the new one keeps the name taskqueue_swi.


# 101154 01-Aug-2002 jhb

Forced commit to note that the previous log was incorrect. The previous
commit added an assertion that a taskqueue being free'd wasn't being
drained at the same time.


# 101153 01-Aug-2002 jhb

If we fail to write to a vnode during a ktrace write, then we drop all
other references to that vnode as a trace vnode in other processes as well
as in any pending requests on the todo list. Thus, it is possible for a
ktrace request structure to have a NULL ktr_vp when it is destroyed in
ktr_freerequest(). We shouldn't call vrele() on the vnode in that case.

Reported by: bde


# 93818 04-Apr-2002 jhb

Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on: i386, alpha, sparc64


# 88900 05-Jan-2002 jhb

Change the preemption code for software interrupt thread schedules and
mutex releases to not require flags for the cases when preemption is
not allowed:

The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent
switching to a higher priority thread on mutex releease and swi schedule,
respectively when that switch is not safe. Now that the critical section
API maintains a per-thread nesting count, the kernel can easily check
whether or not it should switch without relying on flags from the
programmer. This fixes a few bugs in that all current callers of
swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from
fast interrupt handlers and the swi_sched of softclock needed this flag.
Note that to ensure that swi_sched()'s in clock and fast interrupt
handlers do not switch, these handlers have to be explicitly wrapped
in critical_enter/exit pairs. Presently, just wrapping the handlers is
sufficient, but in the future with the fully preemptive kernel, the
interrupt must be EOI'd before critical_exit() is called. (critical_exit()
can switch due to a deferred preemption in a fully preemptive kernel.)

I've tested the changes to the interrupt code on i386 and alpha. I have
not tested ia64, but the interrupt code is almost identical to the alpha
code, so I expect it will work fine. PowerPC and ARM do not yet have
interrupt code in the tree so they shouldn't be broken. Sparc64 is
broken, but that's been ok'd by jake and tmm who will be fixing the
interrupt code for sparc64 shortly.

Reviewed by: peter
Tested on: i386, alpha


# 85560 26-Oct-2001 jhb

- Change the taskqueue locking to protect the necessary parts of a task
while it is on a queue with the queue lock and remove the per-task locks.
- Remove TASK_DESTROY now that it is no longer needed.
- Go back to inlining TASK_INIT now that it is short again.

Inspired by: dfr


# 85521 26-Oct-2001 jhb

Add locking to taskqueues. There is one mutex per task, one mutex per
queue, and a mutex to protect the global list of taskqueues. The only
visible change is that a TASK_DESTROY() macro has been added to mirror
the TASK_INIT() macro to destroy a task before it is free'd.

Submitted by: Andrew Reiter <awr@watson.org>


# 76666 16-May-2001 alfred

remove include of ipl.h because it no longer exists


# 72238 09-Feb-2001 jhb

- Catch up to the new swi API changes:
- Use swi_* function names.
- Use void * to hold cookies to handlers instead of struct intrhand *.
- In sio.c, use 'driver_name' instead of "sio" as the name of the driver
lock to minimize diffs with cy(4).


# 69774 08-Dec-2000 phk

Staticize some malloc M_ instances.


# 67551 25-Oct-2000 jhb

- Overhaul the software interrupt code to use interrupt threads for each
type of software interrupt. Roughly, what used to be a bit in spending
now maps to a swi thread. Each thread can have multiple handlers, just
like a hardware interrupt thread.
- Instead of using a bitmask of pending interrupts, we schedule the specific
software interrupt thread to run, so spending, NSWI, and the shandlers
array are no longer needed. We can now have an arbitrary number of
software interrupt threads. When you register a software interrupt
thread via sinthand_add(), you get back a struct intrhand that you pass
to sched_swi() when you wish to schedule your swi thread to run.
- Convert the name of 'struct intrec' to 'struct intrhand' as it is a bit
more intuitive. Also, prefix all the members of struct intrhand with
'ih_'.
- Make swi_net() a MI function since there is now no point in it being
MD.

Submitted by: cp


# 66698 05-Oct-2000 jhb

- Heavyweight interrupt threads on the alpha for device I/O interrupts.
- Make softinterrupts (SWI's) almost completely MI, and divorce them
completely from the x86 hardware interrupt code.
- The ihandlers array is now gone. Instead, there is a MI shandlers array
that just contains SWI handlers.
- Most of the former machine/ipl.h files have moved to a new sys/ipl.h.
- Stub out all the spl*() functions on all architectures.

Submitted by: dfr


# 65822 13-Sep-2000 jhb

- Remove the inthand2_t type and use the equivalent driver_intr_t type from
newbus for referencing device interrupt handlers.
- Move the 'struct intrec' type which describes interrupt sources into
sys/interrupt.h instead of making it just be a x86 structure.
- Don't create 'ithd' and 'intrec' typedefs, instead, just use 'struct ithd'
and 'struct intrec'
- Move the code to translate new-bus interrupt flags into an interrupt thread
priority out of the x86 nexus code and into a MI ithread_priority()
function in sys/kern/kern_intr.c.
- Remove now-uneeded x86-specific headers from sys/dev/ata/ata-all.c and
sys/pci/pci_compat.c.


# 64199 03-Aug-2000 hsu

Modify to use fixed STAILQ_LAST().

Reviewed by: dfr


# 61033 28-May-2000 dfr

Add taskqueue system for easy-to-use SWIs among other things.

Reviewed by: arch