History log of /openbsd-current/sys/net/bpfdesc.h
Revision	Date	Author	Comments
# 1.49	15-Aug-2024	dlg	add BIOCSETFNR, which is like BIOCSETF but doesn't reset the buffer or stats. from Matthew Luckie <mjl@luckie.org.nz> via tech@. deraadt@ likes it.
Revision tags: OPENBSD_7_3_BASE OPENBSD_7_4_BASE OPENBSD_7_5_BASE
# 1.48	09-Mar-2023	dlg	add a timeout between capturing a packet and making the buffer readable. before this, there were three reasons that a bpf read will finish. the first is the obvious one: the bpf packet buffer in the kernel fills up. by default this is about 32k, so if you're only capturing a small packet every few seconds, it can take a long time for the buffer to fill up before you can read them. the second is if bpf has been configured to enable immediate mode with ioctl(BIOCIMMEDIATE). this means that when any packet is written into the bpf buffer, the buffer is immediately readable. this is fine if the packet rate is low, but if the packet rate is high you don't get the benefit of buffering many packets that bpf is supposed to provide. the third mechanism is if bpf has been configured with the BIOCSRTIMEOUT ioctl, which sets a maximum wait time on a bpf read. BIOCSRTIMEOUT means that a clock starts ticking down when a program (eg pflogd) reads from bpf. when the clock reaches zero then the read returns with whatever is in the bpf packet buffer. however, there could be nothing in the buffer, and the read will still complete. deraadt@ noticed this behaviour with pflogd. it wants packets logged by pf to end up on disk in a timely fashion, but it's fine with tolerating a bit of delay so it can take advantage of buffering to amortise the cost of the reads per packet. it currently does this with BIOCSRTIMEOUT set to half a second, which means it's always waking up every half second even if there's nothing to log. this diff adds BIOCSWTIMEOUT, which specifies a timeout between when bpf first puts a packet in the capture buffer and when the buffer becomes readable. by default this wait timeout is infinite, meaning the buffer has to be filled before it becomes readable. BIOCSWTIMEOUT can be set to enable the new functionality.
BIOCIMMEDIATE is turned into a variation of BIOCSWTIMEOUT with the wait time set to 0, ie, wait 0 seconds between when a packet is written to the buffer and when the buffer becomes readable. combining BIOCSWTIMEOUT and BIOCIMMEDIATE simplifies the code a lot. for pflogd, this means if there are no packets to capture, pflogd won't wake up every half second to do nothing. however, when a packet is logged by pf, bpf will wait another half second to see if any more packets arrive (or the buffer fills up) before the read fires. discussed a lot with deraadt@ and sashan@ ok sashan@
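The readability rules described in the 1.48 message can be modelled in a few lines of C. This is a minimal userland sketch, not the kernel code: `struct capture_model`, `WTIMEOUT_INFINITE` and `model_readable()` are hypothetical names invented for illustration, and time is passed in as plain nanosecond counters rather than read from a clock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel for the default "wait forever" timeout. */
#define WTIMEOUT_INFINITE UINT64_MAX

/* Hypothetical model of one capture buffer; not the kernel's struct bpf_d. */
struct capture_model {
	uint64_t wtimeout_ns;	/* wait timeout; 0 models BIOCIMMEDIATE */
	uint64_t first_pkt_ns;	/* when the first packet was buffered */
	bool has_packet;	/* at least one packet is buffered */
	bool full;		/* the buffer has filled up */
};

/* Return true if a read on the modelled buffer would complete at now_ns. */
static bool
model_readable(const struct capture_model *m, uint64_t now_ns)
{
	if (m->full)
		return true;	/* a full buffer is always readable */
	if (!m->has_packet)
		return false;	/* the wait clock only starts with a packet */
	if (m->wtimeout_ns == WTIMEOUT_INFINITE)
		return false;	/* default: wait for the buffer to fill */
	/* Readable once the wait timeout has elapsed since the first packet. */
	return now_ns - m->first_pkt_ns >= m->wtimeout_ns;
}
```

With `wtimeout_ns` set to 500000000 the model matches the pflogd case in the message: an empty buffer never becomes readable, and a buffered packet becomes readable half a second after it arrived or as soon as the buffer fills.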
Revision tags: OPENBSD_7_2_BASE
# 1.47	09-Jul-2022	visa	Unwrap klist from struct selinfo as this code no longer uses selwakeup(). OK jsg@
Revision tags: OPENBSD_7_1_BASE
# 1.46	17-Mar-2022	visa	Use the refcnt API in bpf. OK sashan@ bluhm@
Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.45	21-Jan-2021	dlg	let vfs keep track of nonblocking state for us. ok claudio@ mvs@
# 1.44	02-Jan-2021	cheloha	bpf(4): remove ticks Change bd_rtout to a uint64_t of nanoseconds. Update the code in bpfioctl() and bpfread() accordingly. Add a local copy of nsecuptime() to make the diff smaller. This will need to move to kern_tc.c if/when we have another user elsewhere in the kernel. Prompted by mpi@. With input from dlg@. ok dlg@ mpi@ visa@
# 1.43	26-Dec-2020	cheloha	bpf(4): bpf_d struct: replace bd_rdStart member with bd_nreaders member bd_rdStart is strange. It nominally represents the start of a read(2) on a given bpf(4) descriptor, but there are several problems with it: 1. If there are multiple readers, the bd_rdStart is not set by subsequent readers, so their timeout is screwed up. The read timeout should really be tracked on a per-thread basis in bpfread(). 2. We set bd_rdStart for poll(2), select(2), and kevent(2), even though that makes no sense. We should not be setting bd_rdStart in bpfpoll() or bpfkqfilter(). 3. bd_rdStart is buggy. If ticks is 0 when the read starts then bpf_catchpacket() won't wake up the reader. This is a problem inherent to the design of bd_rdStart: it serves as both a boolean and a scalar value, even though 0 is a valid value in the scalar range. So let's replace it with a better struct member. "bd_nreaders" is a count of threads sleeping in bpfread(). It is incremented before a thread goes to sleep in bpfread() and decremented when a thread wakes up. If bd_nreaders is greater than zero when we reach bpf_catchpacket() and fbuf is non-NULL we wake up all readers. The read timeout, if any, is now tracked locally by the thread in bpfread(). Unlike bd_rdStart, bpfpoll() and bpfkqfilter() don't touch bd_nreaders. Prompted by mpi@. Basic idea from dlg@. Lots of input from dlg@. Tested by dlg@ with tcpdump(8) (blocking read) and flow-collector (https://github.com/eait-itig/flow-collector, non-blocking read). ok dlg@
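The third problem in the 1.43 message, one field acting as both a flag and a timestamp, can be shown with a small model. This is an illustrative sketch only: the struct and function names below are hypothetical and much simpler than the real `struct bpf_d` and `bpfread()`.

```c
#include <stdbool.h>

/*
 * Old scheme (modelled): a single field records the tick at which the
 * read started and also signals "a reader is waiting" when non-zero.
 */
struct old_desc_model {
	int rd_start;
};

static void
old_read_begin(struct old_desc_model *d, int ticks_now)
{
	d->rd_start = ticks_now;	/* 0 is a valid tick value */
}

static bool
old_reader_waiting(const struct old_desc_model *d)
{
	return d->rd_start != 0;	/* wrong if the read began at tick 0 */
}

/* New scheme (modelled): an explicit count of sleeping readers. */
struct new_desc_model {
	unsigned int nreaders;
};

static void
new_read_sleep(struct new_desc_model *d)
{
	d->nreaders++;			/* incremented before going to sleep */
}

static void
new_read_wakeup(struct new_desc_model *d)
{
	d->nreaders--;			/* decremented after waking up */
}

static bool
new_reader_waiting(const struct new_desc_model *d)
{
	return d->nreaders > 0;
}
```

In the old model a read that begins at tick 0 is indistinguishable from no read at all, while the counter gives a correct answer for any number of concurrent readers.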
# 1.42	11-Dec-2020	cheloha	bpf(4): BIOCGRTIMEOUT, BIOCSRTIMEOUT: protect bd_rtout with bd_mtx Reading and writing bd_rtout is not an atomic operation, so it needs to be done under the per-descriptor mutex. While here, start annotating locking in bpfdesc.h. There's lots more to do on this front, but you have to start somewhere. Tweaked by mpi@. ok mpi@
Revision tags: OPENBSD_6_8_BASE
# 1.41	13-May-2020	cheloha	bpf(4): separate descriptor non-blocking status from read timeout If you set FIONBIO on a bpf(4) descriptor you enable non-blocking mode and also clobber any read timeout set for the descriptor. The reverse is also true: do BIOCSRTIMEOUT and you'll set a timeout and simultaneously disable non-blocking status. The two are mutually exclusive. This relationship is undocumented and might cause a bug. At the very least it makes reasoning about the code difficult. This patch adds a new member to bpf_d, bd_rnonblock, to store the non-blocking status of the descriptor. The read timeout is still kept in bd_rtout. With this in place, non-blocking status and the read timeout can coexist. Setting one state does not clear the other, and vice versa. Separating the two states also clears the way for changing the bpf(4) read timeout to use the system clock instead of ticks. More on that in a later patch. With insight from dlg@ regarding the purpose of the read timeout. ok dlg@
Revision tags: OPENBSD_6_7_BASE
# 1.40	02-Jan-2020	claudio	Switch bpf to use pgsigio(9) and sigio_init(9) instead of handrolling something with csignal(). OK visa@
# 1.39	21-Oct-2019	sashan	put bpfdesc reference counting back, revert change introduced in 1.175 as: BPF: remove redundant reference counting of filedescriptors Anton@ made problem crystal clear: I've been looking into a similar bpf panic reported by syzkaller, which looks somewhat related. The one reported by syzkaller is caused by issuing ioctl(SIOCIFDESTROY) on the interface which the packet filter is attached to. This will in turn invoke the following functions expressed as an inverted stacktrace: 1. bpfsdetach() 2. vdevgone() 3. VOP_REVOKE() 4. vop_generic_revoke() 5. vgonel() 6. vclean(DOCLOSE) 7. VOP_CLOSE() 8. bpfclose() Note that bpfclose() is called before changing the vnode type. In bpfclose(), the `struct bpf_d` is immediately removed from the global bpf_d_list list and might end up sleeping inside taskq_barrier(systq). Since the bpf file descriptor (fd) is still present and valid, another thread could perform an ioctl() on the fd only to fault since bpfilter_lookup() will return NULL. The vnode is not locked in this path either so it won't end up waiting on the ongoing vclean(). Steps to trigger the similar type of panic are straightforward, let there be two processes running concurrently: process A: while true ; do ifconfig tun0 up ; ifconfig tun0 destroy ; done process B: while true ; do tcpdump -i tun0 ; done panic happens within few secs (Dell PowerEdge 710) OK @visa, OK @anton
Revision tags: OPENBSD_6_6_BASE
# 1.38	18-May-2019	sashan	branches: 1.38.2; BPF: remove redundant reference counting of filedescriptors OK visa@, OK mpi@
# 1.37	15-Apr-2019	sashan	moving BPF to RCU OK visa@
Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.36	24-Jan-2018	dlg	add support for bpf on "subsystems", not just network interfaces bpf assumed that it was being unconditionally attached to network interfaces, and maintained a pointer to a struct ifnet *. this was mostly used to get at the name of the interface, which is how userland asks to be attached to a particular interface. this diff adds a pointer to the name and uses it instead of the interface pointer for these lookups. this in turn allows bpf to be attached to arbitrary subsystems in the kernel which just have to supply a name rather than an interface pointer. for example, bpf could be attached to pf_test so you can see what packets are about to be filtered. mpi@ is using this to look at usb transfers. bpf still uses the interface pointer for bpfwrite, and for enabling and disabling promisc. however, these are nopped out for subsystems. ok mpi@
Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.35	24-Jan-2017	krw	A space here, a space there. Soon we're talking real whitespace rectification.
# 1.34	09-Jan-2017	mpi	Use a mutex to serialize accesses to buffer slots. With this change bpf_catchpacket() no longer needs the KERNEL_LOCK(). Tested by Hrvoje Popovski who reported a recursion in the previous attempt. ok bluhm@
# 1.33	03-Jan-2017	mpi	Revert previous, there's still a problem with recursive entries in bpf_mpath_ether(). Problem reported by Hrvoje Popovski.
# 1.32	02-Jan-2017	mpi	Use a mutex to serialize accesses to buffer slots. With this change bpf_catchpacket() no longer needs the KERNEL_LOCK(). ok bluhm@, jmatthew@
# 1.31	22-Aug-2016	mpi	Call csignal() and selwakeup() from a KERNEL_LOCK'd task. This will allow us to make bpf_tap() KERNEL_LOCK() free. Discussed with dlg@ and input from guenther@
Revision tags: OPENBSD_6_0_BASE
# 1.30	30-Mar-2016	dlg	remove support for BIOCGQUEUE and BIOSGQUEUE. nothing uses them, and the implementation makes incorrect assumptions about mbufs within bpf processing that could lead to some weird failures. ok sthen@ deraadt@ mpi@
Revision tags: OPENBSD_5_9_BASE
# 1.29	03-Dec-2015	mpi	Use SRPL_HEAD() and SRPL_ENTRY() to be consistent with and allow to fallback to a SLIST. ok dlg@, jasper@
# 1.28	09-Sep-2015	dlg	convert bpf to using an srp list for the list of descriptors. this replaces the hand rolled list. the code has always used hand rolled lists, but that gets a bit cumbersome when they're SRPs. requested ages ago by mpi@
# 1.27	01-Sep-2015	dlg	reintroduce bpf.c r1.121. this differs slightly from 1.121 in that it uses the new srp_follow() to walk the list of descriptors on an interface. this is instead of interleaving srp_enter() and srp_leave(), which can lead to races and corruption if you're touching the same SRPs at different IPLs on the same CPU. ok deraadt@ jmatthew@
# 1.26	23-Aug-2015	dlg	back out bpf+srp. it's blowing up in a bridge setup. i'll debug this out of the tree.
# 1.25	16-Aug-2015	dlg	make bpf_mtap mpsafe by using SRPs. this was originally implemented by jmatthew@ last year, and updated by us both during s2k15. there are four data structures that need to be looked after. the first is the bpf interface itself. it is allocated and freed at the same time as an actual interface, so if you're able to send or receive packets, you're able to run bpf on an interface too. don't need to do any work there. the second are bpf descriptors. these represent userland attaching to a bpf interface, so you can have many of them on a single bpf interface. they were arranged in a singly linked list before. now the head and next pointers are replaced with SRP pointers and followed by srp_enter. the list updates are serialised by the kernel lock. the third are the bpf filters. there is an inbound and outbound filter on each bpf descriptor, and a process can replace them at any time. the pointers from the descriptor to those are also changed to be accessed via srp_enter. updates are serialised by the kernel lock. the fourth thing is the ring that bpf writes to for userland to read. there's one of these per descriptor. because these are only updated when a filter matches (which is hopefully a relatively rare event), we take the kernel lock to serialise the writes to the ring. all this together means you can run bpf against a packet without taking the kernel lock unless you actually caught a packet and need to send it to userland. even better, you can run bpf in parallel, so if we ever support multiple rings on a single interface, we can run bpf on each ring on different cpus safely. i've hit this pretty hard in production at work (yay dhcrelay) on myx (which does rx outside the biglock). ok jmatthew@ mpi@ millert@
Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE
# 1.24	10-Feb-2015	pelikan	make bpf(4) able to filter based on a pf(4) queue ID for tcpdump -Q qname ALTQ version has been on tech@ for years, people were generally ok with it. ok henning
# 1.23	05-Oct-2014	lteo	fix typo in comment: correspoding -> corresponding
Revision tags: OPENBSD_5_5_BASE OPENBSD_5_6_BASE
# 1.22	18-Dec-2013	krw	Revert the other part of bpf.c's r1.84. May finally fix RD Thrush's encounter with "timeout_add: to_ticks (-1) < 0". Pointed out by RD Thrush.
# 1.21	12-Nov-2013	dlg	try bpf.c r1.84 again, this time without semantic changes to if statements. cheers to sthen@ and krw@ for properly dealing with the fallout of my first commit.
# 1.20	11-Nov-2013	sthen	Revert bpf.c 1.84 / bpfdesc.h 1.19 for now, "panic: timeout_add: to_ticks (-1) < 0" seen by RD Thrush, http://article.gmane.org/gmane.os.openbsd.bugs/20113 where he has a long-running process using bpf which is active at the time of panic. krw@ agrees with reverting for now.
# 1.19	11-Nov-2013	dlg	replace the user of ticks in a condition like "interval + start < ticks" with "ticks - start > interval" because the latter copes with the ticks value wrapping. pointed out by guenther@ ok krw@
# 1.18	24-Oct-2013	deraadt	Move obvious kernel prototypes (and structures with kernel pointers, obviously only used in the kernel) behind #ifdef _KERNEL
Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE OPENBSD_4_4_BASE OPENBSD_4_5_BASE OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.17	25-Mar-2006	djm	allow bpf(4) to ignore packets based on their direction (inbound or outbound), using a new BIOCSDIRFILT ioctl; guidance, feedback and ok canacar@
Revision tags: OPENBSD_3_9_BASE
# 1.16	21-Nov-2005	millert	Move contents of sys/select.h to sys/selinfo.h in preparation for a userland-visible sys/select.h. Consistent with what Net and Free do. OK deraadt@, tested with full ports build by naddy@.
Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.15	17-Dec-2004	reyk	knf cleanup, convert old k&r-style functions to ansi-style for a consistent style in sys/net/bpf.c. ok henning@, "looks fine" canacar@
Revision tags: OPENBSD_3_6_BASE
# 1.14	22-Jun-2004	canacar	Add a new "filter drop" flag to bpf and related ioctls. When enabled, it notifies the calling interface that the packet matches a bpf filter and should be dropped. ok henning@ markus@ frantzen@
Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.13	28-May-2004	grange	bpf device cloning. Now to have more bpf devices just add device nodes in /dev, no need to recompile kernel anymore. Code from form@pdp-11.org.ru, some help from markus@. ok markus@ canacar@ deraadt@
# 1.12	08-May-2004	canacar	reference count bpf descriptors to protect against disappearing interfaces while asleep in read. ok deraadt@
Revision tags: OPENBSD_3_5_BASE
# 1.11	22-Oct-2003	canacar	Add locking and write filtering to bpf descriptors. Locking prevents dangerous ioctls such as changing the interface and sending signals to be executed by an unprivileged process. A filter can also be applied to packets injected through a bpf descriptor. These features allow programs using bpf descriptors to safely drop/separate privileges. ok frantzen@ henning@ mcbride@
Revision tags: OPENBSD_3_4_BASE
# 1.10	02-Jun-2003	millert	Remove the advertising clause in the UCB license which Berkeley rescinded 22 July 1999. Proofed by myself and Theo.
Revision tags: OPENBSD_3_1_BASE OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.9	14-Mar-2002	millert	First round of __P removal in sys
Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.8	09-Jun-2001	angelos	branches: 1.8.4; By popular demand, protect from multiple inclusion, and fix to use the same naming style.
# 1.7	28-May-2001	dugsong	add BIOC[GS]HDRCMPLT ioctl for BPF, to disable overwriting of link level source address in forged frames. from NetBSD. art@ok
Revision tags: OPENBSD_2_8_BASE OPENBSD_2_9_BASE
# 1.6	19-Jun-2000	jason	de-#ifdef-ize
Revision tags: OPENBSD_2_6_BASE OPENBSD_2_7_BASE SMP_BASE kame_19991208
# 1.5	08-Aug-1999	niklas	branches: 1.5.4; Support detaching of network interfaces. Still work to do in ipf, and other families than inet.
Revision tags: OPENBSD_2_4_BASE OPENBSD_2_5_BASE
# 1.4	26-Jun-1998	deraadt	fix bpf select(); from mts@rare.net
Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.3	31-Aug-1997	deraadt	for non-tty TIOCSPGRP/F_SETOWN/FIOSETOWN pgid setting calls, store uid and euid as well, then deliver them using new csignal() interface which ensures that pgid setting process is permitted to signal the pgid process(es). Thanks to newsham@aloha.net for extensive help and discussion.
Revision tags: OPENBSD_2_1_BASE
# 1.2	24-Feb-1997	niklas	OpenBSD tags + some prototyping police
# 1.1	18-Oct-1995	deraadt	branches: 1.1.1; Initial revision
# 1.48	09-Mar-2023	dlg	add a timeout between capturing a packet and making the buffer readable. before this, there were three reasons that a bpf read will finish. the first is the obvious one: the bpf packet buffer in the kernel fills up. by default this is about 32k, so if you're only capturing a small packet packet every few seconds, it can take a long time for the buffer to fill up before you can read them. the second is if bpf has been configured to enable immediate mode with ioctl(BIOCIMMEDIATE). this means that when any packet is written into the bpf buffer, the buffer is immediately readable. this is fine if the packet rate is low, but if the packet rate is high you don't get the benefit of buffering many packets that bpf is supposed to provide. the third mechanism is if bpf has been configured with the BIOCSRTIMEOUT ioctl, which sets a maximum wait time on a bpf read. BIOCSRTIMEOUT means than a clock starts ticking down when a program (eg pflogd) reads from bpf. when the clock reaches zero then the read returns with whatever is in the bpf packet buffer. however, there could be nothing in the buffer, and the read will still complete. deraadt@ noticed this behaviour with pflogd. it wants packets logged by pf to end up on disk in a timely fashion, but it's fine with tolerating a bit of delay so it can take advantatage of buffering to amortise the cost of the reads per packet. it currently does this with BIOCSRTIMEOUT set to half a second, which means it's always waking up every half second even if there's nothing to log. this diff adds BIOCSWTIMEOUT, which specifies a timeout from when bpf first puts a packet in the capture buffer, and when the buffer becomes readable. by default this wait timeout is infinite, meaning the buffer has to be filled before it becomes readable. BIOCSWTIMEOUT can be set to enable the new functionality. 
BIOCIMMEDIATE is turned into a variation of BIOCSWTIMEOUT with the wait time set to 0, ie, wait 0 seconds between when a packet is written to the buffer and when the buffer becomes readable. combining BIOCSWTIMEOUT and BIOCIMMEDIATE simplifies the code a lot. for pflogd, this means if there are no packets to capture, pflogd won't wake up every half second to do nothing. however, when a packet is logged by pf, bpf will wait another half second to see if any more packets arrive (or the buffer fills up) before the read fires. discussed a lot with deraadt@ and sashan@ ok sashan@
Revision tags: OPENBSD_7_2_BASE
# 1.47	09-Jul-2022	visa	Unwrap klist from struct selinfo as this code no longer uses selwakeup(). OK jsg@
Revision tags: OPENBSD_7_1_BASE
# 1.46	17-Mar-2022	visa	Use the refcnt API in bpf. OK sashan@ bluhm@
Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.45	21-Jan-2021	dlg	let vfs keep track of nonblocking state for us. ok claudio@ mvs@
# 1.44	02-Jan-2021	cheloha	bpf(4): remove ticks Change bd_rtout to a uint64_t of nanoseconds. Update the code in bpfioctl() and bpfread() accordingly. Add a local copy of nsecuptime() to make the diff smaller. This will need to move to kern_tc.c if/when we have another user elsewhere in the kernel. Prompted by mpi@. With input from dlg@. ok dlg@ mpi@ visa@
# 1.43	26-Dec-2020	cheloha	bpf(4): bpf_d struct: replace bd_rdStart member with bd_nreaders member bd_rdStart is strange. It nominally represents the start of a read(2) on a given bpf(4) descriptor, but there are several problems with it: 1. If there are multiple readers, the bd_rdStart is not set by subsequent readers, so their timeout is screwed up. The read timeout should really be tracked on a per-thread basis in bpfread(). 2. We set bd_rdStart for poll(2), select(2), and kevent(2), even though that makes no sense. We should not be setting bd_rdStart in bpfpoll() or bpfkqfilter(). 3. bd_rdStart is buggy. If ticks is 0 when the read starts then bpf_catchpacket() won't wake up the reader. This is a problem inherent to the design of bd_rdStart: it serves as both a boolean and a scalar value, even though 0 is a valid value in the scalar range. So let's replace it with a better struct member. "bd_nreaders" is a count of threads sleeping in bpfread(). It is incremented before a thread goes to sleep in bpfread() and decremented when a thread wakes up. If bd_nreaders is greater than zero when we reach bpf_catchpacket() and fbuf is non-NULL we wake up all readers. The read timeout, if any, is now tracked locally by the thread in bpfread(). Unlike bd_rdStart, bpfpoll() and bpfkqfilter() don't touch bd_nreaders. Prompted by mpi@. Basic idea from dlg@. Lots of input from dlg@. Tested by dlg@ with tcpdump(8) (blocking read) and flow-collector (https://github.com/eait-itig/flow-collector, non-blocking read). ok dlg@
# 1.42	11-Dec-2020	cheloha	bpf(4): BIOCGRTIMEOUT, BIOCSRTIMEOUT: protect bd_rtout with bd_mtx Reading and writing bd_rtout is not an atomic operation, so it needs to be done under the per-descriptor mutex. While here, start annotating locking in bpfdesc.h. There's lots more to do on this front, but you have to start somewhere. Tweaked by mpi@. ok mpi@
Revision tags: OPENBSD_6_8_BASE
# 1.41	13-May-2020	cheloha	bpf(4): separate descriptor non-blocking status from read timeout If you set FIONBIO on a bpf(4) descriptor you enable non-blocking mode and also clobber any read timeout set for the descriptor. The reverse is also true: do BIOCSRTIMEOUT and you'll set a timeout and simultaneously disable non-blocking status. The two are mutually exclusive. This relationship is undocumented and might cause a bug. At the very least it makes reasoning about the code difficult. This patch adds a new member to bpf_d, bd_rnonblock, to store the non-blocking status of the descriptor. The read timeout is still kept in bd_rtout. With this in place, non-blocking status and the read timeout can coexist. Setting one state does not clear the other, and vice versa. Separating the two states also clears the way for changing the bpf(4) read timeout to use the system clock instead of ticks. More on that in a later patch. With insight from dlg@ regarding the purpose of the read timeout. ok dlg@
Revision tags: OPENBSD_6_7_BASE
# 1.40	02-Jan-2020	claudio	Switch bpf to use pgsigio(9) and sigio_init(9) instead of handrolling something with csignal(). OK visa@
# 1.39	21-Oct-2019	sashan	put bpfdesc reference counting back, revert change introduced in 1.175 as: BPF: remove redundant reference counting of filedescriptors Anton@ made problem crystal clear: I've been looking into a similar bpf panic reported by syzkaller, which looks somewhat related. The one reported by syzkaller is caused by issuing ioctl(SIOCIFDESTROY) on the interface which the packet filter is attached to. This will in turn invoke the following functions expressed as an inverted stacktrace: 1. bpfsdetach() 2. vdevgone() 3. VOP_REVOKE() 4. vop_generic_revoke() 5. vgonel() 6. vclean(DOCLOSE) 7. VOP_CLOSE() 8. bpfclose() Note that bpfclose() is called before changing the vnode type. In bpfclose(), the `struct bpf_d` is immediately removed from the global bpf_d_list list and might end up sleeping inside taskq_barrier(systq). Since the bpf file descriptor (fd) is still present and valid, another thread could perform an ioctl() on the fd only to fault since bpfilter_lookup() will return NULL. The vnode is not locked in this path either so it won't end up waiting on the ongoing vclean(). Steps to trigger the similar type of panic are straightforward, let there be two processes running concurrently: process A: while true ; do ifconfig tun0 up ; ifconfig tun0 destroy ; done process B: while true ; do tcpdump -i tun0 ; done panic happens within few secs (Dell PowerEdge 710) OK @visa, OK @anton
Revision tags: OPENBSD_6_6_BASE
# 1.38	18-May-2019	sashan	branches: 1.38.2; BPF: remove redundant reference counting of filedescriptors OK visa@, OK mpi@
# 1.37	15-Apr-2019	sashan	moving BPF to RCU OK visa@
Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.36	24-Jan-2018	dlg	add support for bpf on "subsystems", not just network interfaces bpf assumed that it was being unconditionally attached to network interfaces, and maintained a pointer to a struct ifnet *. this was mostly used to get at the name of the interface, which is how userland asks to be attached to a particular interface. this diff adds a pointer to the name and uses it instead of the interface pointer for these lookups. this in turn allows bpf to be attached to arbitrary subsystems in the kernel which just have to supply a name rather than an interface pointer. for example, bpf could be attached to pf_test so you can see what packets are about to be filtered. mpi@ is using this to look at usb transfers. bpf still uses the interface pointer for bpfwrite, and for enabling and disabling promisc. however, these are nopped out for subsystems. ok mpi@
Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.35	24-Jan-2017	krw	A space here, a space there. Soon we're talking real whitespace rectification.
# 1.34	09-Jan-2017	mpi	Use a mutex to serialize accesses to buffer slots. With this change bpf_catchpacket() no longer need the KERNEL_LOCK(). Tested by Hrvoje Popovski who reported a recursion in the previous attempt. ok bluhm@
# 1.33	03-Jan-2017	mpi	Revert previous, there's still a problem with recursive entries in bpf_mpath_ether(). Problem reported by Hrvoje Popovski.
# 1.32	02-Jan-2017	mpi	Use a mutex to serialize accesses to buffer slots. With this change bpf_catchpacket() no longer need the KERNEL_LOCK(). ok bluhm@, jmatthew@
# 1.31	22-Aug-2016	mpi	Call csignal() and selwakeup() from a KERNEL_LOCK'd task. This will allow us make bpf_tap() KERNEL_LOCK() free. Discussed with dlg@ and input from guenther@
Revision tags: OPENBSD_6_0_BASE
# 1.30	30-Mar-2016	dlg	remove support for BIOCGQUEUE and BIOSGQUEUE nothing uses them, and the implementation make incorrect assumptions about mbufs within bpf processing that could lead to some weird failures. ok sthen@ deraadt@ mpi@
Revision tags: OPENBSD_5_9_BASE
# 1.29	03-Dec-2015	mpi	Use SRPL_HEAD() and SRPL_ENTRY() to be consistent with and allow to fallback to a SLIST. ok dlg@, jasper@
# 1.28	09-Sep-2015	dlg	convert bpf to using an srp list for the list of descriptors. this replaces the hand rolled list. the code has always used hand rolled lists, but that gets a bit cumbersome when theyre SRPs. requested ages ago by mpi@
# 1.27	01-Sep-2015	dlg	reintroduce bpf.c r1.121. this differs slightly from 1.121 in that it uses the new srp_follow() to walk the list of descriptors on an interface. this is instead of interleaving srp_enter() and srp_leave(), which can lead to races and corruption if you're touching the same SRPs at different IPLs on the same CPU. ok deraadt@ jmatthew@
# 1.26	23-Aug-2015	dlg	back out bpf+srp. its blowing up in a bridge setup. ill debug this out of the tree.
# 1.25	16-Aug-2015	dlg	make bpf_mtap mpsafe by using SRPs. this was originally implemented by jmatthew@ last year, and updated by us both during s2k15. there are four data structures that need to be looked after. the first is the bpf interface itself. it is allocated and freed at the same time as an actual interface, so if you're able to send or receive packets, you're able to run bpf on an interface too. dont need to do any work there. the second are bpf descriptors. these represent userland attaching to a bpf interface, so you can have many of them on a single bpf interface. they were arranged in a singly linked list before. now the head and next pointers are replaced with SRP pointers and followed by srp_enter. the list updates are serialised by the kernel lock. the third are the bpf filters. there is an inbound and outbound filter on each bpf descriptor, ann a process can replace them at any time. the pointers from the descriptor to those is also changed to be accessed via srp_enter. updates are serialised by the kernel lock. the fourth thing is the ring that bpf writes to for userland to read. there's one of these per descriptor. because these are only updated when a filter matches (which is hopefully a relatively rare event), we take the kernel lock to serialise the writes to the ring. all this together means you can run bpf against a packet without taking the kernel lock unless you actually caught a packet and need to send it to userland. even better, you can run bpf in parallel, so if we ever support multiple rings on a single interface, we can run bpf on each ring on different cpus safely. ive hit this pretty hard in production at work (yay dhcrelay) on myx (which does rx outside the biglock). ok jmatthew@ mpi@ millert@
Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE
# 1.24	10-Feb-2015	pelikan	make bpf(4) able to filter based on a pf(4) queue ID for tcpdump -Q qname ALTQ version has been on tech@ for years, people were generally ok with it. ok henning
# 1.23	05-Oct-2014	lteo	fix typo in comment: correspoding -> corresponding
Revision tags: OPENBSD_5_5_BASE OPENBSD_5_6_BASE
# 1.22	18-Dec-2013	krw	Revert the other part of bpf.c's r1.84. May finally fix RD Thrush's encounter with "timeout_add: to_ticks (-1) < 0". Pointed out by RD Thrush.
# 1.21	12-Nov-2013	dlg	try bpf.c r1.84 again, this time without semantic changes to if statements. cheers to sthen@ and krw@ for properly dealing with the fallout of my first commit.
# 1.20	11-Nov-2013	sthen	Revert bpf.c 1.84 / bpfdesc.h 1.19 for now, "panic: timeout_add: to_ticks (-1) < 0" seen by RD Thrush, http://article.gmane.org/gmane.os.openbsd.bugs/20113 where he has a long-running process using bpf which is active at the time of panic. krw@ agrees with reverting for now.
# 1.19	11-Nov-2013	dlg	replace the user of ticks in a condition like "interval + start < ticks" with "ticks - start > interval" because the latter copes with the ticks value wrapping. pointed out by guenther@ ok krw@
# 1.18	24-Oct-2013	deraadt	Move obvious kernel prototypes (and structure's with kernel pointers, obviously only used in the kernel) behind #ifdef _KERNEL
Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE OPENBSD_4_4_BASE OPENBSD_4_5_BASE OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.17	25-Mar-2006	djm	allow bpf(4) to ignore packets based on their direction (inbound or outbound), using a new BIOCSDIRFILT ioctl; guidance, feedback and ok canacar@
Revision tags: OPENBSD_3_9_BASE
# 1.16	21-Nov-2005	millert	Move contents of sys/select.h to sys/selinfo.h in preparation for a userland-visible sys/select.h. Consistent with what Net and Free do. OK deraadt@, tested with full ports build by naddy@.
Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.15	17-Dec-2004	reyk	knf cleanup, convert old k&r-style functions to ansi-style for a consistent style in sys/net/bpf.c. ok henning@, "looks fine" canacar@
Revision tags: OPENBSD_3_6_BASE
# 1.14	22-Jun-2004	canacar	Add a new "filter drop" flag to bpf and related ioctls. When enabled, it notifies the calling interface that the packet matches a bpf filter and should be dropped. ok henning@ markus@ frantzen@
Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.13	28-May-2004	grange	bpf device cloning. Now to have more bpf devices just add device nodes in /dev, no need to recompile kernel anymore. Code from form@pdp-11.org.ru, some help from markus@. ok markus@ canacar@ deraadt@
# 1.12	08-May-2004	canacar	reference count bpf descriptors to protect against disappearing interfaces while asleep in read. ok deraadt@
Revision tags: OPENBSD_3_5_BASE
# 1.11	22-Oct-2003	canacar	Add locking and write filtering to bpf descriptors. Locking prevents dangerous ioctls such as changing the interface and sending signals to be executed by an unprivileged process. A filter can also be applied to packets injected through a bpf descriptor. These features allow programs using bpf descriptors to safely drop/separate privileges. ok frantzen@ henning@ mcbride@
Revision tags: OPENBSD_3_4_BASE
# 1.10	02-Jun-2003	millert	Remove the advertising clause in the UCB license which Berkeley rescinded 22 July 1999. Proofed by myself and Theo.
Revision tags: OPENBSD_3_1_BASE OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.9	14-Mar-2002	millert	First round of __P removal in sys
Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.8	09-Jun-2001	angelos	branches: 1.8.4; By popular demand, protect from multiple inclusion, and fix to use the same naming style.
# 1.7	28-May-2001	dugsong	add BIOC[GS]HDRCMPLT ioctl for BPF, to disable overwriting of link level source address in forged frames. from NetBSD. ok art@
Revision tags: OPENBSD_2_8_BASE OPENBSD_2_9_BASE
# 1.6	19-Jun-2000	jason	de-#ifdef-ize
Revision tags: OPENBSD_2_6_BASE OPENBSD_2_7_BASE SMP_BASE kame_19991208
# 1.5	08-Aug-1999	niklas	branches: 1.5.4; Support detaching of network interfaces. Still work to do in ipf, and other families than inet.
Revision tags: OPENBSD_2_4_BASE OPENBSD_2_5_BASE
# 1.4	26-Jun-1998	deraadt	fix bpf select(); from mts@rare.net
Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.3	31-Aug-1997	deraadt	for non-tty TIOCSPGRP/F_SETOWN/FIOSETOWN pgid setting calls, store uid and euid as well, then deliver them using new csignal() interface which ensures that pgid setting process is permitted to signal the pgid process(es). Thanks to newsham@aloha.net for extensive help and discussion.
Revision tags: OPENBSD_2_1_BASE
# 1.2	24-Feb-1997	niklas	OpenBSD tags + some prototyping police
# 1.1	18-Oct-1995	deraadt	branches: 1.1.1; Initial revision
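
The wrap-safe comparison described in revision 1.19 can be illustrated with a small userland sketch. This is not the kernel code: the function names `expired_naive` and `expired_wrapsafe` are hypothetical, and the counter is modelled as an `unsigned int` so that wraparound is well defined in C (the kernel's own `ticks` variable and its types are not reproduced here).

```c
#include <limits.h>

/*
 * Hypothetical helper: the form the commit message calls
 * "interval + start < ticks". The sum can wrap past UINT_MAX,
 * which makes the comparison give the wrong answer.
 */
int
expired_naive(unsigned int start, unsigned int interval, unsigned int ticks)
{
	return (start + interval < ticks);
}

/*
 * Hypothetical helper: the form the commit message calls
 * "ticks - start > interval". The unsigned subtraction yields the
 * elapsed tick count modulo 2^N, so it stays correct across a wrap
 * as long as the real elapsed time is less than one full wrap period.
 */
int
expired_wrapsafe(unsigned int start, unsigned int interval, unsigned int ticks)
{
	return (ticks - start > interval);
}
```

With `start = UINT_MAX - 5` and `interval = 10`, a counter value of `UINT_MAX - 2` means only 3 ticks have elapsed. The naive form still reports expiry because `start + interval` has wrapped to 4, while the wrap-safe form correctly reports that the interval has not elapsed.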
# 1.43	26-Dec-2020	cheloha	bpf(4): bpf_d struct: replace bd_rdStart member with bd_nreaders member bd_rdStart is strange. It nominally represents the start of a read(2) on a given bpf(4) descriptor, but there are several problems with it: 1. If there are multiple readers, the bd_rdStart is not set by subsequent readers, so their timeout is screwed up. The read timeout should really be tracked on a per-thread basis in bpfread(). 2. We set bd_rdStart for poll(2), select(2), and kevent(2), even though that makes no sense. We should not be setting bd_rdStart in bpfpoll() or bpfkqfilter(). 3. bd_rdStart is buggy. If ticks is 0 when the read starts then bpf_catchpacket() won't wake up the reader. This is a problem inherent to the design of bd_rdStart: it serves as both a boolean and a scalar value, even though 0 is a valid value in the scalar range. So let's replace it with a better struct member. "bd_nreaders" is a count of threads sleeping in bpfread(). It is incremented before a thread goes to sleep in bpfread() and decremented when a thread wakes up. If bd_nreaders is greater than zero when we reach bpf_catchpacket() and fbuf is non-NULL we wake up all readers. The read timeout, if any, is now tracked locally by the thread in bpfread(). Unlike bd_rdStart, bpfpoll() and bpfkqfilter() don't touch bd_nreaders. Prompted by mpi@. Basic idea from dlg@. Lots of input from dlg@. Tested by dlg@ with tcpdump(8) (blocking read) and flow-collector (https://github.com/eait-itig/flow-collector, non-blocking read). ok dlg@
# 1.42	11-Dec-2020	cheloha	bpf(4): BIOCGRTIMEOUT, BIOCSRTIMEOUT: protect bd_rtout with bd_mtx Reading and writing bd_rtout is not an atomic operation, so it needs to be done under the per-descriptor mutex. While here, start annotating locking in bpfdesc.h. There's lots more to do on this front, but you have to start somewhere. Tweaked by mpi@. ok mpi@
Revision tags: OPENBSD_6_8_BASE
# 1.41	13-May-2020	cheloha	bpf(4): separate descriptor non-blocking status from read timeout If you set FIONBIO on a bpf(4) descriptor you enable non-blocking mode and also clobber any read timeout set for the descriptor. The reverse is also true: do BIOCSRTIMEOUT and you'll set a timeout and simultaneously disable non-blocking status. The two are mutually exclusive. This relationship is undocumented and might cause a bug. At the very least it makes reasoning about the code difficult. This patch adds a new member to bpf_d, bd_rnonblock, to store the non-blocking status of the descriptor. The read timeout is still kept in bd_rtout. With this in place, non-blocking status and the read timeout can coexist. Setting one state does not clear the other, and vice versa. Separating the two states also clears the way for changing the bpf(4) read timeout to use the system clock instead of ticks. More on that in a later patch. With insight from dlg@ regarding the purpose of the read timeout. ok dlg@
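The coupling described in 1.41 can be sketched in user space as follows. This is only an illustration under stated assumptions: the struct and function names are hypothetical, and the use of -1 in the shared field to mean "non-blocking" is an assumption about the old encoding, not something the log entry states.

```c
#include <stdbool.h>

/* Assumed old layout: one field carries both states. */
struct merged_desc {
	int rtout;	/* > 0: read timeout, -1: non-blocking (assumed encoding) */
};

/* Assumed new layout: one field per state, as in bd_rtout and bd_rnonblock. */
struct split_desc {
	int rtout;	/* read timeout */
	bool rnonblock;	/* non-blocking status */
};

/* Old FIONBIO-like setter: overwrites whatever timeout was stored. */
void
merged_set_nonblock(struct merged_desc *d, bool on)
{
	d->rtout = on ? -1 : 0;
}

/* Old BIOCSRTIMEOUT-like setter: overwrites the non-blocking marker. */
void
merged_set_timeout(struct merged_desc *d, int timeout)
{
	d->rtout = timeout;
}

/* New FIONBIO-like setter: touches only the non-blocking flag. */
void
split_set_nonblock(struct split_desc *d, bool on)
{
	d->rnonblock = on;
}

/* New BIOCSRTIMEOUT-like setter: touches only the timeout. */
void
split_set_timeout(struct split_desc *d, int timeout)
{
	d->rtout = timeout;
}
```

In the merged layout the last setter called wins, which is the mutual exclusion the log entry complains about. In the split layout both settings survive regardless of call order.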
Revision tags: OPENBSD_6_7_BASE
# 1.40	02-Jan-2020	claudio	Switch bpf to use pgsigio(9) and sigio_init(9) instead of handrolling something with csignal(). OK visa@
# 1.39	21-Oct-2019	sashan	put bpfdesc reference counting back, reverting the change introduced in 1.175 ("BPF: remove redundant reference counting of filedescriptors"). anton@ made the problem crystal clear: I've been looking into a similar bpf panic reported by syzkaller, which looks somewhat related. The one reported by syzkaller is caused by issuing ioctl(SIOCIFDESTROY) on the interface which the packet filter is attached to. This will in turn invoke the following functions expressed as an inverted stacktrace: 1. bpfsdetach() 2. vdevgone() 3. VOP_REVOKE() 4. vop_generic_revoke() 5. vgonel() 6. vclean(DOCLOSE) 7. VOP_CLOSE() 8. bpfclose() Note that bpfclose() is called before changing the vnode type. In bpfclose(), the `struct bpf_d` is immediately removed from the global bpf_d_list list and might end up sleeping inside taskq_barrier(systq). Since the bpf file descriptor (fd) is still present and valid, another thread could perform an ioctl() on the fd only to fault since bpfilter_lookup() will return NULL. The vnode is not locked in this path either so it won't end up waiting on the ongoing vclean(). Steps to trigger a similar type of panic are straightforward; let there be two processes running concurrently: process A: while true ; do ifconfig tun0 up ; ifconfig tun0 destroy ; done process B: while true ; do tcpdump -i tun0 ; done The panic happens within a few seconds (Dell PowerEdge 710). OK visa@, OK anton@
Revision tags: OPENBSD_6_6_BASE
# 1.38	18-May-2019	sashan	branches: 1.38.2; BPF: remove redundant reference counting of filedescriptors OK visa@, OK mpi@
# 1.37	15-Apr-2019	sashan	moving BPF to RCU OK visa@
Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.36	24-Jan-2018	dlg	add support for bpf on "subsystems", not just network interfaces bpf assumed that it was being unconditionally attached to network interfaces, and maintained a pointer to a struct ifnet *. this was mostly used to get at the name of the interface, which is how userland asks to be attached to a particular interface. this diff adds a pointer to the name and uses it instead of the interface pointer for these lookups. this in turn allows bpf to be attached to arbitrary subsystems in the kernel which just have to supply a name rather than an interface pointer. for example, bpf could be attached to pf_test so you can see what packets are about to be filtered. mpi@ is using this to look at usb transfers. bpf still uses the interface pointer for bpfwrite, and for enabling and disabling promisc. however, these are nopped out for subsystems. ok mpi@
Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.35	24-Jan-2017	krw	A space here, a space there. Soon we're talking real whitespace rectification.
# 1.34	09-Jan-2017	mpi	Use a mutex to serialize accesses to buffer slots. With this change bpf_catchpacket() no longer needs the KERNEL_LOCK(). Tested by Hrvoje Popovski who reported a recursion in the previous attempt. ok bluhm@
# 1.33	03-Jan-2017	mpi	Revert previous, there's still a problem with recursive entries in bpf_mpath_ether(). Problem reported by Hrvoje Popovski.
# 1.32	02-Jan-2017	mpi	Use a mutex to serialize accesses to buffer slots. With this change bpf_catchpacket() no longer needs the KERNEL_LOCK(). ok bluhm@, jmatthew@
# 1.31	22-Aug-2016	mpi	Call csignal() and selwakeup() from a KERNEL_LOCK'd task. This will allow us to make bpf_tap() KERNEL_LOCK() free. Discussed with dlg@ and input from guenther@
Revision tags: OPENBSD_6_0_BASE
# 1.30	30-Mar-2016	dlg	remove support for BIOCGQUEUE and BIOCSQUEUE. nothing uses them, and the implementation makes incorrect assumptions about mbufs within bpf processing that could lead to some weird failures. ok sthen@ deraadt@ mpi@
Revision tags: OPENBSD_5_9_BASE
# 1.29	03-Dec-2015	mpi	Use SRPL_HEAD() and SRPL_ENTRY() to be consistent with and allow to fallback to a SLIST. ok dlg@, jasper@
# 1.28	09-Sep-2015	dlg	convert bpf to using an srp list for the list of descriptors. this replaces the hand rolled list. the code has always used hand rolled lists, but that gets a bit cumbersome when they're SRPs. requested ages ago by mpi@
# 1.27	01-Sep-2015	dlg	reintroduce bpf.c r1.121. this differs slightly from 1.121 in that it uses the new srp_follow() to walk the list of descriptors on an interface. this is instead of interleaving srp_enter() and srp_leave(), which can lead to races and corruption if you're touching the same SRPs at different IPLs on the same CPU. ok deraadt@ jmatthew@
# 1.26	23-Aug-2015	dlg	back out bpf+srp. it's blowing up in a bridge setup. i'll debug this out of the tree.
# 1.25	16-Aug-2015	dlg	make bpf_mtap mpsafe by using SRPs. this was originally implemented by jmatthew@ last year, and updated by us both during s2k15. there are four data structures that need to be looked after. the first is the bpf interface itself. it is allocated and freed at the same time as an actual interface, so if you're able to send or receive packets, you're able to run bpf on an interface too. don't need to do any work there. the second are bpf descriptors. these represent userland attaching to a bpf interface, so you can have many of them on a single bpf interface. they were arranged in a singly linked list before. now the head and next pointers are replaced with SRP pointers and followed by srp_enter. the list updates are serialised by the kernel lock. the third are the bpf filters. there is an inbound and outbound filter on each bpf descriptor, and a process can replace them at any time. the pointers from the descriptor to those are also changed to be accessed via srp_enter. updates are serialised by the kernel lock. the fourth thing is the ring that bpf writes to for userland to read. there's one of these per descriptor. because these are only updated when a filter matches (which is hopefully a relatively rare event), we take the kernel lock to serialise the writes to the ring. all this together means you can run bpf against a packet without taking the kernel lock unless you actually caught a packet and need to send it to userland. even better, you can run bpf in parallel, so if we ever support multiple rings on a single interface, we can run bpf on each ring on different cpus safely. i've hit this pretty hard in production at work (yay dhcrelay) on myx (which does rx outside the biglock). ok jmatthew@ mpi@ millert@
Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE
# 1.24	10-Feb-2015	pelikan	make bpf(4) able to filter based on a pf(4) queue ID for tcpdump -Q qname ALTQ version has been on tech@ for years, people were generally ok with it. ok henning
# 1.23	05-Oct-2014	lteo	fix typo in comment: correspoding -> corresponding
Revision tags: OPENBSD_5_5_BASE OPENBSD_5_6_BASE
# 1.22	18-Dec-2013	krw	Revert the other part of bpf.c's r1.84. May finally fix RD Thrush's encounter with "timeout_add: to_ticks (-1) < 0". Pointed out by RD Thrush.
# 1.21	12-Nov-2013	dlg	try bpf.c r1.84 again, this time without semantic changes to if statements. cheers to sthen@ and krw@ for properly dealing with the fallout of my first commit.
# 1.20	11-Nov-2013	sthen	Revert bpf.c 1.84 / bpfdesc.h 1.19 for now, "panic: timeout_add: to_ticks (-1) < 0" seen by RD Thrush, http://article.gmane.org/gmane.os.openbsd.bugs/20113 where he has a long-running process using bpf which is active at the time of panic. krw@ agrees with reverting for now.
# 1.19	11-Nov-2013	dlg	replace the use of ticks in a condition like "interval + start < ticks" with "ticks - start > interval" because the latter copes with the ticks value wrapping. pointed out by guenther@ ok krw@
# 1.18	24-Oct-2013	deraadt	Move obvious kernel prototypes (and structures with kernel pointers, obviously only used in the kernel) behind #ifdef _KERNEL
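The difference between the two conditions in 1.19 can be checked with a small sketch. It assumes a 32-bit tick counter and models it with uint32_t so that wraparound is well defined in C; the function names are hypothetical, and the kernel's own tick variable and types may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Old form, "interval + start < ticks".
 * If start + interval wraps past UINT32_MAX the sum becomes a small
 * number and the timeout appears to have expired long ago.
 */
bool
expired_sum(uint32_t now, uint32_t start, uint32_t interval)
{
	return (uint32_t)(start + interval) < now;
}

/*
 * New form, "ticks - start > interval".
 * The elapsed time is computed modulo 2^32, so it stays correct when
 * the counter wraps between start and now.
 */
bool
expired_diff(uint32_t now, uint32_t start, uint32_t interval)
{
	return (uint32_t)(now - start) > interval;
}
```

For example, with start six ticks before the wrap point, an interval of 10, and only two ticks elapsed, the old form reports the timeout as expired while the new form does not. Away from the wrap point both forms agree.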
Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE OPENBSD_4_4_BASE OPENBSD_4_5_BASE OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.17	25-Mar-2006	djm	allow bpf(4) to ignore packets based on their direction (inbound or outbound), using a new BIOCSDIRFILT ioctl; guidance, feedback and ok canacar@
Revision tags: OPENBSD_3_9_BASE
# 1.16	21-Nov-2005	millert	Move contents of sys/select.h to sys/selinfo.h in preparation for a userland-visible sys/select.h. Consistent with what Net and Free do. OK deraadt@, tested with full ports build by naddy@.
Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.15	17-Dec-2004	reyk	knf cleanup, convert old k&r-style functions to ansi-style for a consistent style in sys/net/bpf.c. ok henning@, "looks fine" canacar@
Revision tags: OPENBSD_3_6_BASE
# 1.14	22-Jun-2004	canacar	Add a new "filter drop" flag to bpf and related ioctls. When enabled, it notifies the calling interface that the packet matches a bpf filter and should be dropped. ok henning@ markus@ frantzen@
Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.13	28-May-2004	grange	bpf device cloning. Now to have more bpf devices just add device nodes in /dev, no need to recompile kernel anymore. Code from form@pdp-11.org.ru, some help from markus@. ok markus@ canacar@ deraadt@
# 1.12	08-May-2004	canacar	reference count bpf descriptors to protect against disappearing interfaces while asleep in read. ok deraadt@
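The idea behind 1.12, holding a reference on a descriptor so it cannot be freed while a reader sleeps, can be sketched as below. This is a single-threaded user-space illustration with hypothetical names; it does not reproduce the kernel's functions, and real kernel code would also need locking or atomic operations around the counter.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical descriptor carrying only a reference count. */
struct desc {
	int ref;
};

/* Allocate a descriptor holding one reference for its creator. */
struct desc *
desc_alloc(void)
{
	struct desc *d = calloc(1, sizeof(*d));

	if (d != NULL)
		d->ref = 1;
	return d;
}

/* Take an extra reference, for example before going to sleep in read. */
void
desc_get(struct desc *d)
{
	d->ref++;
}

/*
 * Drop a reference. The descriptor is freed only when the last
 * reference goes away. Returns 1 if it was freed, 0 otherwise.
 */
int
desc_put(struct desc *d)
{
	assert(d->ref > 0);
	if (--d->ref == 0) {
		free(d);
		return 1;
	}
	return 0;
}
```

A reader would call desc_get() before sleeping and desc_put() after waking, so a detach or close that drops its own reference in the meantime does not free memory the reader still uses.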
Revision tags: OPENBSD_3_5_BASE
# 1.11	22-Oct-2003	canacar	Add locking and write filtering to bpf descriptors. Locking prevents dangerous ioctls such as changing the interface and sending signals to be executed by an unprivileged process. A filter can also be applied to packets injected through a bpf descriptor. These features allow programs using bpf descriptors to safely drop/separate privileges. ok frantzen@ henning@ mcbride@
Revision tags: OPENBSD_3_4_BASE
# 1.10	02-Jun-2003	millert	Remove the advertising clause in the UCB license which Berkeley rescinded 22 July 1999. Proofed by myself and Theo.
Revision tags: OPENBSD_3_1_BASE OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.9	14-Mar-2002	millert	First round of __P removal in sys
Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.8	09-Jun-2001	angelos	branches: 1.8.4; By popular demand, protect from multiple inclusion, and fix to use the same naming style.
# 1.7	28-May-2001	dugsong	add BIOC[GS]HDRCMPLT ioctl for BPF, to disable overwriting of link level source address in forged frames. from NetBSD. ok art@
Revision tags: OPENBSD_2_8_BASE OPENBSD_2_9_BASE
# 1.6	19-Jun-2000	jason	de-#ifdef-ize
Revision tags: OPENBSD_2_6_BASE OPENBSD_2_7_BASE SMP_BASE kame_19991208
# 1.5	08-Aug-1999	niklas	branches: 1.5.4; Support detaching of network interfaces. Still work to do in ipf, and other families than inet.
Revision tags: OPENBSD_2_4_BASE OPENBSD_2_5_BASE
# 1.4	26-Jun-1998	deraadt	fix bpf select(); from mts@rare.net
Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.3	31-Aug-1997	deraadt	for non-tty TIOCSPGRP/F_SETOWN/FIOSETOWN pgid setting calls, store uid and euid as well, then deliver them using new csignal() interface which ensures that pgid setting process is permitted to signal the pgid process(es). Thanks to newsham@aloha.net for extensive help and discussion.
Revision tags: OPENBSD_2_1_BASE
# 1.2	24-Feb-1997	niklas	OpenBSD tags + some prototyping police
# 1.1	18-Oct-1995	deraadt	branches: 1.1.1; Initial revision
# 1.41	13-May-2020	cheloha	bpf(4): separate descriptor non-blocking status from read timeout If you set FIONBIO on a bpf(4) descriptor you enable non-blocking mode and also clobber any read timeout set for the descriptor. The reverse is also true: do BIOCSRTIMEOUT and you'll set a timeout and simultaneously disable non-blocking status. The two are mutually exclusive. This relationship is undocumented and might cause a bug. At the very least it makes reasoning about the code difficult. This patch adds a new member to bpf_d, bd_rnonblock, to store the non-blocking status of the descriptor. The read timeout is still kept in bd_rtout. With this in place, non-blocking status and the read timeout can coexist. Setting one state does not clear the other, and vice versa. Separating the two states also clears the way for changing the bpf(4) read timeout to use the system clock instead of ticks. More on that in a later patch. With insight from dlg@ regarding the purpose of the read timeout. ok dlg@
Revision tags: OPENBSD_6_7_BASE
# 1.40	02-Jan-2020	claudio	Switch bpf to use pgsigio(9) and sigio_init(9) instead of handrolling something with csignal(). OK visa@
# 1.39	21-Oct-2019	sashan	put bpfdesc reference counting back, revert change introduced in 1.175 as: BPF: remove redundant reference counting of filedescriptors Anton@ made problem crystal clear: I've been looking into a similar bpf panic reported by syzkaller, which looks somewhat related. The one reported by syzkaller is caused by issuing ioctl(SIOCIFDESTROY) on the interface which the packet filter is attached to. This will in turn invoke the following functions expressed as an inverted stacktrace: 1. bpfsdetach() 2. vdevgone() 3. VOP_REVOKE() 4. vop_generic_revoke() 5. vgonel() 6. vclean(DOCLOSE) 7. VOP_CLOSE() 8. bpfclose() Note that bpfclose() is called before changing the vnode type. In bpfclose(), the `struct bpf_d` is immediately removed from the global bpf_d_list list and might end up sleeping inside taskq_barrier(systq). Since the bpf file descriptor (fd) is still present and valid, another thread could perform an ioctl() on the fd only to fault since bpfilter_lookup() will return NULL. The vnode is not locked in this path either so it won't end up waiting on the ongoing vclean(). Steps to trigger the similar type of panic are straightforward, let there be two processes running concurrently: process A: while true ; do ifconfig tun0 up ; ifconfig tun0 destroy ; done process B: while true ; do tcpdump -i tun0 ; done panic happens within few secs (Dell PowerEdge 710) OK @visa, OK @anton
Revision tags: OPENBSD_6_6_BASE
# 1.38	18-May-2019	sashan	branches: 1.38.2; BPF: remove redundant reference counting of filedescriptors OK visa@, OK mpi@
# 1.37	15-Apr-2019	sashan	moving BPF to RCU OK visa@
Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.36	24-Jan-2018	dlg	add support for bpf on "subsystems", not just network interfaces bpf assumed that it was being unconditionally attached to network interfaces, and maintained a pointer to a struct ifnet *. this was mostly used to get at the name of the interface, which is how userland asks to be attached to a particular interface. this diff adds a pointer to the name and uses it instead of the interface pointer for these lookups. this in turn allows bpf to be attached to arbitrary subsystems in the kernel which just have to supply a name rather than an interface pointer. for example, bpf could be attached to pf_test so you can see what packets are about to be filtered. mpi@ is using this to look at usb transfers. bpf still uses the interface pointer for bpfwrite, and for enabling and disabling promisc. however, these are nopped out for subsystems. ok mpi@
Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.35	24-Jan-2017	krw	A space here, a space there. Soon we're talking real whitespace rectification.
# 1.34	09-Jan-2017	mpi	Use a mutex to serialize accesses to buffer slots. With this change bpf_catchpacket() no longer need the KERNEL_LOCK(). Tested by Hrvoje Popovski who reported a recursion in the previous attempt. ok bluhm@
# 1.33	03-Jan-2017	mpi	Revert previous, there's still a problem with recursive entries in bpf_mpath_ether(). Problem reported by Hrvoje Popovski.
# 1.32	02-Jan-2017	mpi	Use a mutex to serialize accesses to buffer slots. With this change bpf_catchpacket() no longer need the KERNEL_LOCK(). ok bluhm@, jmatthew@
# 1.31	22-Aug-2016	mpi	Call csignal() and selwakeup() from a KERNEL_LOCK'd task. This will allow us make bpf_tap() KERNEL_LOCK() free. Discussed with dlg@ and input from guenther@
Revision tags: OPENBSD_6_0_BASE
# 1.30	30-Mar-2016	dlg	remove support for BIOCGQUEUE and BIOSGQUEUE nothing uses them, and the implementation make incorrect assumptions about mbufs within bpf processing that could lead to some weird failures. ok sthen@ deraadt@ mpi@
Revision tags: OPENBSD_5_9_BASE
# 1.29	03-Dec-2015	mpi	Use SRPL_HEAD() and SRPL_ENTRY() to be consistent with and allow to fallback to a SLIST. ok dlg@, jasper@
# 1.28	09-Sep-2015	dlg	convert bpf to using an srp list for the list of descriptors. this replaces the hand rolled list. the code has always used hand rolled lists, but that gets a bit cumbersome when theyre SRPs. requested ages ago by mpi@
# 1.27	01-Sep-2015	dlg	reintroduce bpf.c r1.121. this differs slightly from 1.121 in that it uses the new srp_follow() to walk the list of descriptors on an interface. this is instead of interleaving srp_enter() and srp_leave(), which can lead to races and corruption if you're touching the same SRPs at different IPLs on the same CPU. ok deraadt@ jmatthew@
# 1.26	23-Aug-2015	dlg	back out bpf+srp. its blowing up in a bridge setup. ill debug this out of the tree.
# 1.25	16-Aug-2015	dlg	make bpf_mtap mpsafe by using SRPs. this was originally implemented by jmatthew@ last year, and updated by us both during s2k15. there are four data structures that need to be looked after. the first is the bpf interface itself. it is allocated and freed at the same time as an actual interface, so if you're able to send or receive packets, you're able to run bpf on an interface too. dont need to do any work there. the second are bpf descriptors. these represent userland attaching to a bpf interface, so you can have many of them on a single bpf interface. they were arranged in a singly linked list before. now the head and next pointers are replaced with SRP pointers and followed by srp_enter. the list updates are serialised by the kernel lock. the third are the bpf filters. there is an inbound and outbound filter on each bpf descriptor, ann a process can replace them at any time. the pointers from the descriptor to those is also changed to be accessed via srp_enter. updates are serialised by the kernel lock. the fourth thing is the ring that bpf writes to for userland to read. there's one of these per descriptor. because these are only updated when a filter matches (which is hopefully a relatively rare event), we take the kernel lock to serialise the writes to the ring. all this together means you can run bpf against a packet without taking the kernel lock unless you actually caught a packet and need to send it to userland. even better, you can run bpf in parallel, so if we ever support multiple rings on a single interface, we can run bpf on each ring on different cpus safely. ive hit this pretty hard in production at work (yay dhcrelay) on myx (which does rx outside the biglock). ok jmatthew@ mpi@ millert@
Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE
# 1.24	10-Feb-2015	pelikan	make bpf(4) able to filter based on a pf(4) queue ID for tcpdump -Q qname. ALTQ version has been on tech@ for years, people were generally ok with it. ok henning
# 1.23	05-Oct-2014	lteo	fix typo in comment: correspoding -> corresponding
Revision tags: OPENBSD_5_5_BASE OPENBSD_5_6_BASE
# 1.22	18-Dec-2013	krw	Revert the other part of bpf.c's r1.84. May finally fix RD Thrush's encounter with "timeout_add: to_ticks (-1) < 0". Pointed out by RD Thrush.
# 1.21	12-Nov-2013	dlg	try bpf.c r1.84 again, this time without semantic changes to if statements. cheers to sthen@ and krw@ for properly dealing with the fallout of my first commit.
# 1.20	11-Nov-2013	sthen	Revert bpf.c 1.84 / bpfdesc.h 1.19 for now, "panic: timeout_add: to_ticks (-1) < 0" seen by RD Thrush, http://article.gmane.org/gmane.os.openbsd.bugs/20113 where he has a long-running process using bpf which is active at the time of panic. krw@ agrees with reverting for now.
# 1.19	11-Nov-2013	dlg	replace the use of ticks in a condition like "interval + start < ticks" with "ticks - start > interval" because the latter copes with the ticks value wrapping. pointed out by guenther@ ok krw@
# 1.18	24-Oct-2013	deraadt	Move obvious kernel prototypes (and structures with kernel pointers, obviously only used in the kernel) behind #ifdef _KERNEL
Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE OPENBSD_4_4_BASE OPENBSD_4_5_BASE OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.17	25-Mar-2006	djm	allow bpf(4) to ignore packets based on their direction (inbound or outbound), using a new BIOCSDIRFILT ioctl; guidance, feedback and ok canacar@
Revision tags: OPENBSD_3_9_BASE
# 1.16	21-Nov-2005	millert	Move contents of sys/select.h to sys/selinfo.h in preparation for a userland-visible sys/select.h. Consistent with what Net and Free do. OK deraadt@, tested with full ports build by naddy@.
Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.15	17-Dec-2004	reyk	knf cleanup, convert old k&r-style functions to ansi-style for a consistent style in sys/net/bpf.c. ok henning@, "looks fine" canacar@
Revision tags: OPENBSD_3_6_BASE
# 1.14	22-Jun-2004	canacar	Add a new "filter drop" flag to bpf and related ioctls. When enabled, it notifies the calling interface that the packet matches a bpf filter and should be dropped. ok henning@ markus@ frantzen@
Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.13	28-May-2004	grange	bpf device cloning. Now to have more bpf devices just add device nodes in /dev, no need to recompile kernel anymore. Code from form@pdp-11.org.ru, some help from markus@. ok markus@ canacar@ deraadt@
# 1.12	08-May-2004	canacar	reference count bpf descriptors to protect against disappearing interfaces while asleep in read. ok deraadt@
Revision tags: OPENBSD_3_5_BASE
# 1.11	22-Oct-2003	canacar	Add locking and write filtering to bpf descriptors. Locking prevents dangerous ioctls such as changing the interface and sending signals to be executed by an unprivileged process. A filter can also be applied to packets injected through a bpf descriptor. These features allow programs using bpf descriptors to safely drop/separate privileges. ok frantzen@ henning@ mcbride@
Revision tags: OPENBSD_3_4_BASE
# 1.10	02-Jun-2003	millert	Remove the advertising clause in the UCB license which Berkeley rescinded 22 July 1999. Proofed by myself and Theo.
Revision tags: OPENBSD_3_1_BASE OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.9	14-Mar-2002	millert	First round of __P removal in sys
Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.8	09-Jun-2001	angelos	branches: 1.8.4; By popular demand, protect from multiple inclusion, and fix to use the same naming style.
# 1.7	28-May-2001	dugsong	add BIOC[GS]HDRCMPLT ioctl for BPF, to disable overwriting of link level source address in forged frames. from NetBSD. art@ok
Revision tags: OPENBSD_2_8_BASE OPENBSD_2_9_BASE
# 1.6	19-Jun-2000	jason	de-#ifdef-ize
Revision tags: OPENBSD_2_6_BASE OPENBSD_2_7_BASE SMP_BASE kame_19991208
# 1.5	08-Aug-1999	niklas	branches: 1.5.4; Support detaching of network interfaces. Still work to do in ipf, and other families than inet.
Revision tags: OPENBSD_2_4_BASE OPENBSD_2_5_BASE
# 1.4	26-Jun-1998	deraadt	fix bpf select(); from mts@rare.net
Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.3	31-Aug-1997	deraadt	for non-tty TIOCSPGRP/F_SETOWN/FIOSETOWN pgid setting calls, store uid and euid as well, then deliver them using new csignal() interface which ensures that pgid setting process is permitted to signal the pgid process(es). Thanks to newsham@aloha.net for extensive help and discussion.
Revision tags: OPENBSD_2_1_BASE
# 1.2	24-Feb-1997	niklas	OpenBSD tags + some prototyping police
# 1.1	18-Oct-1995	deraadt	branches: 1.1.1; Initial revision
# 1.40	02-Jan-2020	claudio	Switch bpf to use pgsigio(9) and sigio_init(9) instead of handrolling something with csignal(). OK visa@
# 1.39	21-Oct-2019	sashan	put bpfdesc reference counting back, revert change introduced in 1.175 as: BPF: remove redundant reference counting of filedescriptors Anton@ made problem crystal clear: I've been looking into a similar bpf panic reported by syzkaller, which looks somewhat related. The one reported by syzkaller is caused by issuing ioctl(SIOCIFDESTROY) on the interface which the packet filter is attached to. This will in turn invoke the following functions expressed as an inverted stacktrace: 1. bpfsdetach() 2. vdevgone() 3. VOP_REVOKE() 4. vop_generic_revoke() 5. vgonel() 6. vclean(DOCLOSE) 7. VOP_CLOSE() 8. bpfclose() Note that bpfclose() is called before changing the vnode type. In bpfclose(), the `struct bpf_d` is immediately removed from the global bpf_d_list list and might end up sleeping inside taskq_barrier(systq). Since the bpf file descriptor (fd) is still present and valid, another thread could perform an ioctl() on the fd only to fault since bpfilter_lookup() will return NULL. The vnode is not locked in this path either so it won't end up waiting on the ongoing vclean(). Steps to trigger the similar type of panic are straightforward, let there be two processes running concurrently: process A: while true ; do ifconfig tun0 up ; ifconfig tun0 destroy ; done process B: while true ; do tcpdump -i tun0 ; done panic happens within few secs (Dell PowerEdge 710) OK @visa, OK @anton
Revision tags: OPENBSD_6_6_BASE
# 1.38	18-May-2019	sashan	branches: 1.38.2; BPF: remove redundant reference counting of filedescriptors OK visa@, OK mpi@
# 1.37	15-Apr-2019	sashan	moving BPF to RCU OK visa@
Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.36	24-Jan-2018	dlg	add support for bpf on "subsystems", not just network interfaces. bpf assumed that it was being unconditionally attached to network interfaces, and maintained a pointer to a struct ifnet *. this was mostly used to get at the name of the interface, which is how userland asks to be attached to a particular interface. this diff adds a pointer to the name and uses it instead of the interface pointer for these lookups. this in turn allows bpf to be attached to arbitrary subsystems in the kernel which just have to supply a name rather than an interface pointer. for example, bpf could be attached to pf_test so you can see what packets are about to be filtered. mpi@ is using this to look at usb transfers. bpf still uses the interface pointer for bpfwrite, and for enabling and disabling promisc. however, these are nopped out for subsystems. ok mpi@
Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.35	24-Jan-2017	krw	A space here, a space there. Soon we're talking real whitespace rectification.
# 1.34	09-Jan-2017	mpi	Use a mutex to serialize accesses to buffer slots. With this change bpf_catchpacket() no longer needs the KERNEL_LOCK(). Tested by Hrvoje Popovski who reported a recursion in the previous attempt. ok bluhm@
# 1.33	03-Jan-2017	mpi	Revert previous, there's still a problem with recursive entries in bpf_mpath_ether(). Problem reported by Hrvoje Popovski.
# 1.32	02-Jan-2017	mpi	Use a mutex to serialize accesses to buffer slots. With this change bpf_catchpacket() no longer needs the KERNEL_LOCK(). ok bluhm@, jmatthew@
# 1.31	22-Aug-2016	mpi	Call csignal() and selwakeup() from a KERNEL_LOCK'd task. This will allow us to make bpf_tap() KERNEL_LOCK() free. Discussed with dlg@ and input from guenther@
Revision tags: OPENBSD_6_0_BASE
# 1.30	30-Mar-2016	dlg	remove support for BIOCGQUEUE and BIOSGQUEUE. nothing uses them, and the implementation makes incorrect assumptions about mbufs within bpf processing that could lead to some weird failures. ok sthen@ deraadt@ mpi@
Revision tags: OPENBSD_5_9_BASE
# 1.29	03-Dec-2015	mpi	Use SRPL_HEAD() and SRPL_ENTRY() to be consistent with and allow to fallback to a SLIST. ok dlg@, jasper@
# 1.28	09-Sep-2015	dlg	convert bpf to using an srp list for the list of descriptors. this replaces the hand rolled list. the code has always used hand rolled lists, but that gets a bit cumbersome when theyre SRPs. requested ages ago by mpi@
Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.15	17-Dec-2004	reyk	knf cleanup, convert old k&r-style functions to ansi-style for a consistent style in sys/net/bpf.c. ok henning@, "looks fine" canacar@
Revision tags: OPENBSD_3_6_BASE
# 1.14	22-Jun-2004	canacar	Add a new "filter drop" flag to bpf and related ioclts. When enabled, it notifies the calling interface that the packet matches a bpf filter and should be dropped. ok henning@ markus@ frantzen@
Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.13	28-May-2004	grange	bpf device cloning. Now to have more bpf devices just add device nodes in /dev, no need to recompile kernel anymore. Code from form@pdp-11.org.ru, some help from markus@. ok markus@ canacar@ deraadt@
# 1.12	08-May-2004	canacar	reference count bpf descriptors to protect against disappearing interfaces while asleep in read. ok deraadt@
Revision tags: OPENBSD_3_5_BASE
# 1.11	22-Oct-2003	canacar	Add locking and write filtering to bpf descriptors. Locking prevents dangerous ioctls such as changing the interface and sending signals to be executed by an unprivileged process. A filter can also be applied to packets injected through a bpf descriptor. These features allow programs using bpf descriptors to safely drop/seperate privileges. ok frantzen@ henning@ mcbride@
Revision tags: OPENBSD_3_4_BASE
# 1.10	02-Jun-2003	millert	Remove the advertising clause in the UCB license which Berkeley rescinded 22 July 1999. Proofed by myself and Theo.
Revision tags: OPENBSD_3_1_BASE OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.9	14-Mar-2002	millert	First round of __P removal in sys
Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.8	09-Jun-2001	angelos	branches: 1.8.4; By popular demand, protect from multiple inclusion, and fix to use the same naming style.
# 1.7	28-May-2001	dugsong	add BIOC[GS]HDRCMPLT ioctl for BPF, to disable overwriting of link level source address in forged frames. from NetBSD. art@ok
Revision tags: OPENBSD_2_8_BASE OPENBSD_2_9_BASE
# 1.6	19-Jun-2000	jason	de-#ifdef-ize
Revision tags: OPENBSD_2_6_BASE OPENBSD_2_7_BASE SMP_BASE kame_19991208
# 1.5	08-Aug-1999	niklas	branches: 1.5.4; Support detaching of network interfaces. Still work to do in ipf, and other families than inet.
Revision tags: OPENBSD_2_4_BASE OPENBSD_2_5_BASE
# 1.4	26-Jun-1998	deraadt	fix bpf select(); from mts@rare.net
Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.3	31-Aug-1997	deraadt	for non-tty TIOCSPGRP/F_SETOWN/FIOSETOWN pgid setting calls, store uid and euid as well, then deliver them using new csignal() interface which ensures that pgid setting process is permitted to signal the pgid process(es). Thanks to newsham@aloha.net for extensive help and discussion.
Revision tags: OPENBSD_2_1_BASE
# 1.2	24-Feb-1997	niklas	OpenBSD tags + some prototyping police
# 1.1	18-Oct-1995	deraadt	branches: 1.1.1; Initial revision
# 1.37	15-Apr-2019	sashan	moving BPF to RCU OK visa@
Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.36	24-Jan-2018	dlg	add support for bpf on "subsystems", not just network interfaces. bpf assumed that it was being unconditionally attached to network interfaces, and maintained a pointer to a struct ifnet *. this was mostly used to get at the name of the interface, which is how userland asks to be attached to a particular interface. this diff adds a pointer to the name and uses it instead of the interface pointer for these lookups. this in turn allows bpf to be attached to arbitrary subsystems in the kernel which just have to supply a name rather than an interface pointer. for example, bpf could be attached to pf_test so you can see what packets are about to be filtered. mpi@ is using this to look at usb transfers. bpf still uses the interface pointer for bpfwrite, and for enabling and disabling promisc. however, these are nopped out for subsystems. ok mpi@
Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.35	24-Jan-2017	krw	A space here, a space there. Soon we're talking real whitespace rectification.
# 1.34	09-Jan-2017	mpi	Use a mutex to serialize accesses to buffer slots. With this change bpf_catchpacket() no longer needs the KERNEL_LOCK(). Tested by Hrvoje Popovski who reported a recursion in the previous attempt. ok bluhm@
# 1.33	03-Jan-2017	mpi	Revert previous, there's still a problem with recursive entries in bpf_mpath_ether(). Problem reported by Hrvoje Popovski.
# 1.32	02-Jan-2017	mpi	Use a mutex to serialize accesses to buffer slots. With this change bpf_catchpacket() no longer needs the KERNEL_LOCK(). ok bluhm@, jmatthew@
# 1.31	22-Aug-2016	mpi	Call csignal() and selwakeup() from a KERNEL_LOCK'd task. This will allow us to make bpf_tap() KERNEL_LOCK() free. Discussed with dlg@ and input from guenther@
Revision tags: OPENBSD_6_0_BASE
# 1.30	30-Mar-2016	dlg	remove support for BIOCGQUEUE and BIOCSQUEUE. nothing uses them, and the implementation makes incorrect assumptions about mbufs within bpf processing that could lead to some weird failures. ok sthen@ deraadt@ mpi@
Revision tags: OPENBSD_5_9_BASE
# 1.29	03-Dec-2015	mpi	Use SRPL_HEAD() and SRPL_ENTRY() to be consistent with, and allow falling back to, a SLIST. ok dlg@, jasper@
# 1.28	09-Sep-2015	dlg	convert bpf to using an srp list for the list of descriptors. this replaces the hand rolled list. the code has always used hand rolled lists, but that gets a bit cumbersome when theyre SRPs. requested ages ago by mpi@
# 1.27	01-Sep-2015	dlg	reintroduce bpf.c r1.121. this differs slightly from 1.121 in that it uses the new srp_follow() to walk the list of descriptors on an interface. this is instead of interleaving srp_enter() and srp_leave(), which can lead to races and corruption if you're touching the same SRPs at different IPLs on the same CPU. ok deraadt@ jmatthew@
# 1.26	23-Aug-2015	dlg	back out bpf+srp. its blowing up in a bridge setup. ill debug this out of the tree.
# 1.25	16-Aug-2015	dlg	make bpf_mtap mpsafe by using SRPs. this was originally implemented by jmatthew@ last year, and updated by us both during s2k15. there are four data structures that need to be looked after. the first is the bpf interface itself. it is allocated and freed at the same time as an actual interface, so if you're able to send or receive packets, you're able to run bpf on an interface too. dont need to do any work there. the second are bpf descriptors. these represent userland attaching to a bpf interface, so you can have many of them on a single bpf interface. they were arranged in a singly linked list before. now the head and next pointers are replaced with SRP pointers and followed by srp_enter. the list updates are serialised by the kernel lock. the third are the bpf filters. there is an inbound and outbound filter on each bpf descriptor, and a process can replace them at any time. the pointers from the descriptor to those are also changed to be accessed via srp_enter. updates are serialised by the kernel lock. the fourth thing is the ring that bpf writes to for userland to read. there's one of these per descriptor. because these are only updated when a filter matches (which is hopefully a relatively rare event), we take the kernel lock to serialise the writes to the ring. all this together means you can run bpf against a packet without taking the kernel lock unless you actually caught a packet and need to send it to userland. even better, you can run bpf in parallel, so if we ever support multiple rings on a single interface, we can run bpf on each ring on different cpus safely. ive hit this pretty hard in production at work (yay dhcrelay) on myx (which does rx outside the biglock). ok jmatthew@ mpi@ millert@
Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE
# 1.24	10-Feb-2015	pelikan	make bpf(4) able to filter based on a pf(4) queue ID for tcpdump -Q qname. ALTQ version has been on tech@ for years, people were generally ok with it. ok henning
# 1.23	05-Oct-2014	lteo	fix typo in comment: correspoding -> corresponding
Revision tags: OPENBSD_5_5_BASE OPENBSD_5_6_BASE
# 1.22	18-Dec-2013	krw	Revert the other part of bpf.c's r1.84. May finally fix RD Thrush's encounter with "timeout_add: to_ticks (-1) < 0". Pointed out by RD Thrush.
# 1.21	12-Nov-2013	dlg	try bpf.c r1.84 again, this time without semantic changes to if statements. cheers to sthen@ and krw@ for properly dealing with the fallout of my first commit.
# 1.20	11-Nov-2013	sthen	Revert bpf.c 1.84 / bpfdesc.h 1.19 for now, "panic: timeout_add: to_ticks (-1) < 0" seen by RD Thrush, http://article.gmane.org/gmane.os.openbsd.bugs/20113 where he has a long-running process using bpf which is active at the time of panic. krw@ agrees with reverting for now.
# 1.19	11-Nov-2013	dlg	replace the use of ticks in a condition like "interval + start < ticks" with "ticks - start > interval" because the latter copes with the ticks value wrapping. pointed out by guenther@ ok krw@
# 1.18	24-Oct-2013	deraadt	Move obvious kernel prototypes (and structures with kernel pointers, obviously only used in the kernel) behind #ifdef _KERNEL
Revision tags: OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE OPENBSD_4_4_BASE OPENBSD_4_5_BASE OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.17	25-Mar-2006	djm	allow bpf(4) to ignore packets based on their direction (inbound or outbound), using a new BIOCSDIRFILT ioctl; guidance, feedback and ok canacar@
Revision tags: OPENBSD_3_9_BASE
# 1.16	21-Nov-2005	millert	Move contents of sys/select.h to sys/selinfo.h in preparation for a userland-visible sys/select.h. Consistent with what Net and Free do. OK deraadt@, tested with full ports build by naddy@.
Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.15	17-Dec-2004	reyk	knf cleanup, convert old k&r-style functions to ansi-style for a consistent style in sys/net/bpf.c. ok henning@, "looks fine" canacar@
Revision tags: OPENBSD_3_6_BASE
# 1.14	22-Jun-2004	canacar	Add a new "filter drop" flag to bpf and related ioctls. When enabled, it notifies the calling interface that the packet matches a bpf filter and should be dropped. ok henning@ markus@ frantzen@
Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.13	28-May-2004	grange	bpf device cloning. Now to have more bpf devices just add device nodes in /dev, no need to recompile kernel anymore. Code from form@pdp-11.org.ru, some help from markus@. ok markus@ canacar@ deraadt@
# 1.12	08-May-2004	canacar	reference count bpf descriptors to protect against disappearing interfaces while asleep in read. ok deraadt@
Revision tags: OPENBSD_3_5_BASE
# 1.11	22-Oct-2003	canacar	Add locking and write filtering to bpf descriptors. Locking prevents dangerous ioctls such as changing the interface and sending signals to be executed by an unprivileged process. A filter can also be applied to packets injected through a bpf descriptor. These features allow programs using bpf descriptors to safely drop/separate privileges. ok frantzen@ henning@ mcbride@
Revision tags: OPENBSD_3_4_BASE
# 1.10	02-Jun-2003	millert	Remove the advertising clause in the UCB license which Berkeley rescinded 22 July 1999. Proofed by myself and Theo.
Revision tags: OPENBSD_3_1_BASE OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.9	14-Mar-2002	millert	First round of __P removal in sys
Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.8	09-Jun-2001	angelos	branches: 1.8.4; By popular demand, protect from multiple inclusion, and fix to use the same naming style.
# 1.7	28-May-2001	dugsong	add BIOC[GS]HDRCMPLT ioctl for BPF, to disable overwriting of link level source address in forged frames. from NetBSD. ok art@
Revision tags: OPENBSD_2_8_BASE OPENBSD_2_9_BASE
# 1.6	19-Jun-2000	jason	de-#ifdef-ize
Revision tags: OPENBSD_2_6_BASE OPENBSD_2_7_BASE SMP_BASE kame_19991208
# 1.5	08-Aug-1999	niklas	branches: 1.5.4; Support detaching of network interfaces. Still work to do in ipf, and other families than inet.
Revision tags: OPENBSD_2_4_BASE OPENBSD_2_5_BASE
# 1.4	26-Jun-1998	deraadt	fix bpf select(); from mts@rare.net
Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.3	31-Aug-1997	deraadt	for non-tty TIOCSPGRP/F_SETOWN/FIOSETOWN pgid setting calls, store uid and euid as well, then deliver them using new csignal() interface which ensures that pgid setting process is permitted to signal the pgid process(es). Thanks to newsham@aloha.net for extensive help and discussion.
Revision tags: OPENBSD_2_1_BASE
# 1.2	24-Feb-1997	niklas	OpenBSD tags + some prototyping police
# 1.1	18-Oct-1995	deraadt	branches: 1.1.1; Initial revision