#
369183 |
|
30-Jan-2021 |
gbe |
kevent(2): Bugfix for wrong EVFILT_TIMER timeouts
When using NOTE_NSECONDS in the kevent(2) API, US_TO_SBT should be used instead of NS_TO_SBT, otherwise the timeout results are misleading.
PR: 252539 Reviewed by: kevans, kib Approved by: kevans Differential Revision: https://reviews.freebsd.org/D28067
(cherry picked from commit 4d0c33be634a929f323117f04e6b1670776f9e37) (cherry picked from commit 6a3ad2d0a7b633bad2bb33f9c4c426dffcc91633)
Git Hash: 8ce9180c09d93b4ef11859be604ef41173d6dbd1 Git Author: jan.kokemueller@gmail.com
|
#
341083 |
|
27-Nov-2018 |
markj |
MFC r340898: Ensure that knotes do not get registered when KQ_CLOSING is set.
PR: 228858 |
#
341078 |
|
27-Nov-2018 |
markj |
MFC r340897: Lock the knlist before releasing the in-flux state in knote_fork().
PR: 228858 |
#
341076 |
|
27-Nov-2018 |
markj |
MFC r340899: Plug some kernel memory disclosures via kevent(2). |
#
340904 |
|
24-Nov-2018 |
markj |
MFC r340734: Avoid unsynchronized updates to kn_status. |
#
337418 |
|
07-Aug-2018 |
dab |
MFC r336761 & r336781:
Allow a EVFILT_TIMER kevent to be updated.
If a timer is updated (re-added) with a different time period (specified in the .data field of the kevent), the new time period has no effect; the timer will not expire until the original time has elapsed. This violates the documented behavior as the kqueue(2) man page says (in part) "Re-adding an existing event will modify the parameters of the original event, and not result in a duplicate entry."
This modification, adapted from a patch submitted by cem@ to PR214987, fixes the kqueue system to allow updating a timer entry. The kevent timer behavior is changed to:
* When a timer is re-added, update the timer parameters to and re-start the timer using the new parameters. * Allow updating both active and already expired timers. * When the timer has already expired, dequeue any undelivered events and clear the count of expirations.
All of these changes address the original PR and also bring the FreeBSD and macOS kevent timer behaviors into agreement.
A few other changes were made along the way:
* Update the kqueue(2) man page to reflect the new timer behavior. * Fix man page style issues in kqueue(2) diagnosed by igor. * Update the timer libkqueue system test to test for the updated timer behavior. * Fix the (test) libkqueue common.h file so that it includes config.h which defines various HAVE_* feature defines, before the #if tests for such variables in common.h. This enables the use of the actual err(3) family of functions. * Fix the usages of the err(3) functions in the tests for incorrect type of variables. Those were formerly undiagnosed due to the disablement of the err(3) functions (see previous bullet point).
PR: 214987 Relnotes: yes Sponsored by: Dell EMC |
#
328454 |
|
26-Jan-2018 |
jhb |
MFC 326184: Decode kevent structures logged via ktrace(2) in kdump.
- Add a new KTR_STRUCT_ARRAY ktrace record type which dumps an array of structures.
The structure name in the record payload is preceded by a size_t containing the size of the individual structures. Use this to replace the previous code that dumped the kevent arrays dumped for kevent(). kdump is now able to decode the kevent structures rather than dumping their contents via a hexdump.
One change from before is that the 'changes' and 'events' arrays are not marked with separate 'read' and 'write' annotations in kdump output. Instead, the first array is the 'changes' array, and the second array (only present if kevent doesn't fail with an error) is the 'events' array. For kevent(), empty arrays are denoted by an entry with an array containing zero entries rather than no record.
- Move kevent decoding tables from truss to libsysdecode.
This adds three new functions to decode members of struct kevent: sysdecode_kevent_filter, sysdecode_kevent_flags, and sysdecode_kevent_fflags.
kdump uses these helper functions to pretty-print kevent fields.
- Move structure definitions for freebsd11 and freebsd32 kevent structures to <sys/event.h> so that they can be shared with userland. The 32-bit structures are only exposed if _WANT_KEVENT32 is defined. The freebsd11 structures are only exposed if _WANT_FREEBSD11_KEVENT is defined. The 32-bit freebsd11 structure requires both.
- Decode freebsd11 kevent structures in truss for the compat11.kevent() system call.
- Log 32-bit kevent structures via ktrace for 32-bit compat kevent() system calls.
- While here, constify the 'void *data' argument to ktrstruct().
Note that this version of the change for 11.x does not include freebsd11 kevent structures or _WANT_FREEBSD11_KEVENT. It also does not include the change to decode the compat11.kevent system call in truss. |
#
320290 |
|
23-Jun-2017 |
kib |
MFC r320038: Style.
Approved by: re (gjb) |
#
315471 |
|
18-Mar-2017 |
kib |
MFC r315238: Use designated initializers for kevent_copyops. |
#
315470 |
|
18-Mar-2017 |
kib |
MFC r315155: Ktracing kevent(2) calls with unusual arguments might leads to an overly large allocation requests.
PR: 217435
MFC r315237: Hide kev_iovlen() definition under #ifdef KTRACE. |
#
311771 |
|
09-Jan-2017 |
kib |
MFC r310615: Change knlist_destroy() to assertion. |
#
311046 |
|
02-Jan-2017 |
kib |
MFC r310613: Style. |
#
311007 |
|
01-Jan-2017 |
kib |
MFC r310554: Some optimizations for kqueue timers. |
#
311006 |
|
01-Jan-2017 |
kib |
MFC r310552: Some style. |
#
310578 |
|
26-Dec-2016 |
kib |
MFC r310302: Do not clear KN_INFLUX when not owning influx state.
PR: 214923 |
#
310469 |
|
23-Dec-2016 |
kib |
MFC r310159: Switch from stdatomic.h to atomic.h for kernel. |
#
303216 |
|
23-Jul-2016 |
kib |
MFC r302936: Explicitely check for the valid range of file descriptor values.
Approved by: re (gjb) |
#
302408 |
|
08-Jul-2016 |
gjb |
Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle. Prune svn:mergeinfo from the new branch, as nothing has been merged here.
Additional commits post-branch will follow.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation |
#
302308 |
|
01-Jul-2016 |
kib |
When a process knote was attached to the process which is already exiting, the knote is activated immediately. If the exit1() later activates knotes, such knote is attempted to be activated second time. Detect the condition by zeroed kn_ptr.p_proc pointer, and avoid excessive activation.
Before r302235, such knotes were removed from the knlist immediately upon activation.
Reported by: truckman Sponsored by: The FreeBSD Foundation Approved by: re (gjb)
|
#
302242 |
|
27-Jun-2016 |
kib |
Fix userspace build after r302235: do not expose bool field of the structure, change it to int.
The real fix is to sanitize user-visible definitions in sys/event.h, e.g. the affected struct knlist is of no use for userspace programs.
Reported and tested by: jkim Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb)
|
#
302235 |
|
27-Jun-2016 |
kib |
When filt_proc() removes event from the knlist due to the process exiting (NOTE_EXIT->knlist_remove_inevent()), two things happen: - knote kn_knlist pointer is reset - INFLUX knote is removed from the process knlist. And, there are two consequences: - KN_LIST_UNLOCK() on such knote is nop - there is nothing which would block exit1() from processing past the knlist_destroy() (and knlist_destroy() resets knlist lock pointers). Both consequences result either in leaked process lock, or dereferencing NULL function pointers for locking.
Handle this by stopping embedding the process knlist into struct proc. Instead, the knlist is allocated together with struct proc, but marked as autodestroy on the zombie reap, by knlist_detach() function. The knlist is freed when last kevent is removed from the list, in particular, at the zombie reap time if the list is empty. As result, the knlist_remove_inevent() is no longer needed and removed.
Other changes:
In filt_procattach(), clear NOTE_EXEC and NOTE_FORK desired events from kn_sfflags for knote registered by kernel to only get NOTE_CHILD notifications. The flags leak resulted in excessive NOTE_EXEC/NOTE_FORK reports.
Fix immediate note activation in filt_procattach(). Condition should be either the immediate CHILD_NOTE activation, or immediate NOTE_EXIT report for the exiting process.
In knote_fork(), do not perform racy check for KN_INFLUX before kq lock is taken. Besides being racy, it did not accounted for notes just added by scan (KN_SCAN).
Some minor and incomplete style fixes.
Analyzed and tested by: Eric Badger <eric@badgerio.us> Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb) Differential revision: https://reviews.freebsd.org/D6859
|
#
300627 |
|
24-May-2016 |
kib |
Silence false LOR report due to the taskqueue mutex and kqueue lock named the same.
Reported by: Doug Luce <doug@freebsd.con.com> Sponsored by: The FreeBSD Foundation
|
#
296775 |
|
12-Mar-2016 |
gibbs |
Provide high precision conversion from ns,us,ms -> sbintime in kevent
In timer2sbintime(), calculate the second and fractional second portions of the sbintime separately. When calculating the the fractional second portion, use a 64bit multiply to prevent excess truncation. This avoids the ~7% error in the original conversion for ns, and smaller errors of the same type for us and ms.
PR: 198139 Reviewed by: jhb MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D5397
|
#
295786 |
|
19-Feb-2016 |
markj |
Ensure that we test the event condition when a disabled kevent is enabled.
r274560 modified kqueue_register() to only test the event condition if the corresponding knote is not disabled. However, this check takes place before the EV_ENABLE flag is used to clear the KN_DISABLED flag on the knote, so enabling a previously-disabled kevent would not result in a notification for a triggered event. This change fixes the problem by testing for EV_ENABLED before possibly checking the event condition.
This change also updates a kqueue regression test to exercise this case.
PR: 206368 Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D5307
|
#
295785 |
|
19-Feb-2016 |
markj |
Return an error if both EV_ENABLE and EV_DISABLE are specified for a kevent.
Currently, this combination results in EV_DISABLE being ignored.
Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D5307
|
#
295012 |
|
28-Jan-2016 |
vangyzen |
kqueue EVFILT_PROC: avoid collision between NOTE_CHILD and NOTE_EXIT
NOTE_CHILD and NOTE_EXIT return something in kevent.data: the parent pid (ppid) for NOTE_CHILD and the exit status for NOTE_EXIT. Do not let the two events be combined, since one would overwrite the other's data.
PR: 180385 Submitted by: David A. Bright <david_a_bright@dell.com> Reviewed by: jhb MFC after: 1 month Sponsored by: Dell Inc. Differential Revision: https://reviews.freebsd.org/D4900
|
#
288145 |
|
23-Sep-2015 |
mjg |
kqueue: simplify kern_kqueue by not refing/unrefing creds too early
No functional changes.
|
#
287366 |
|
01-Sep-2015 |
kib |
Exit notification for EVFILT_PROC removes knote from the knlist. In particular, this invalidates the knote kn_link linkage, making the SLIST_FOREACH() loop accessing undefined values (e.g. trashed by QUEUE_MACRO_DEBUG). If the knote is freed by other thread when kq lock is released or when influx is cleared, e.g. by knote_scan() for kqueue owning the knote, the iteration step would access freed memory.
Use SLIST_FOREACH_SAFE() to fix iteration.
Diagnosed by: avg Tested by: avg, lstewart, pawel Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
#
287362 |
|
01-Sep-2015 |
kib |
Clean up the kqueue use of the uma KPI.
Explain why it is fine to not check for M_NOWAIT failures in kqueue_register(). Remove unneeded check for NULL result from waitable allocation in kqueue_scan(). uma_free(9) handles NULL argument correctly, remove checks for NULL. Remove useless cast and adjust style in knote_alloc().
Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
#
286681 |
|
12-Aug-2015 |
ed |
Perform cleanups in response to D3307.
- Document the kern_kevent_anonymous() function. - Add assertions to ensure that we don't silently leave the kqueue linked from a file descriptor table.
Reviewed by: jmg Differential Revision: https://reviews.freebsd.org/D3364
|
#
286631 |
|
11-Aug-2015 |
ed |
Add support for anonymous kqueues.
CloudABI's polling system calls merge the concept of one-shot polling (poll, select) and stateful polling (kqueue). They share the same data structures.
Extend FreeBSD's kqueue to provide support for waiting for events on an anonymous kqueue. Unlike stateful polling, there is no need to support timeouts, as an additional timer event could be used instead. Furthermore, it makes no sense to use a different number of input and output kevents. Merge this into a single argument.
Obtained from: https://github.com/NuxiNL/freebsd Differential Revision: https://reviews.freebsd.org/D3307
|
#
286309 |
|
05-Aug-2015 |
ed |
Allow the creation of kqueues with a restricted set of Capsicum rights.
On CloudABI we want to create file descriptors with just the minimal set of Capsicum rights in place. The reason for this is that it makes it easier to obtain uniform behaviour across different operating systems.
By explicitly whitelisting the operations, we can return consistent error codes, but also prevent applications from depending OS-specific behaviour.
Extend kern_kqueue() to take an additional struct filecaps that is passed on to falloc_caps(). Update the existing consumers to pass in NULL.
Differential Revision: https://reviews.freebsd.org/D3259
|
#
285670 |
|
18-Jul-2015 |
kib |
The si_status field of the siginfo_t, provided by the waitid(2) and SIGCHLD signal, should keep full 32 bits of the status passed to the _exit(2).
Split the combined p_xstat of the struct proc into the separate exit status p_xexit for normal process exit, and signalled termination information p_xsig. Kernel-visible macro KW_EXITCODE() reconstructs old p_xstat from p_xexit and p_xsig. p_xexit contains complete status and copied out into si_status.
Requested by: Joerg Schilling Reviewed by: jilles (previous version), pho Tested by: pho Sponsored by: The FreeBSD Foundation
|
#
284215 |
|
10-Jun-2015 |
mjg |
Implement lockless resource limits.
Use the same scheme implemented to manage credentials.
Code needing to look at process's credentials (as opposed to thred's) is provided with *_proc variants of relevant functions.
Places which possibly had to take the proc lock anyway still use the proc pointer to access limits.
|
#
283440 |
|
24-May-2015 |
dchagin |
For future use in the Linuxulator:
1. Add a kern_kqueue() counterpart for kqueue() with flags parameter.
2. Be a bit secure. To avoid a double fp lookup add a kern_kevent_fp() counterpart for kern_kevent() with file pointer parameter instead of file descriptor an pass the buck to it.
Suggested by: mjg [2]
Differential Revision: https://reviews.freebsd.org/D1091 Reviewed by: trasz
|
#
283291 |
|
22-May-2015 |
jkim |
CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent.
Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks
|
#
274560 |
|
16-Nov-2014 |
jmg |
prevent doing filter ops locking for staticly compiled filter ops... This significantly reduces lock contention when adding/removing knotes on busy multi-kq system... Next step is to cache these references per kq.. i.e. kq refs it once and keeps a local ref count so that the same refs don't get accessed by many cpus...
only allocate a knote when we might use it...
Add a new flag, _FORCEONESHOT.. This allows a thread to force the delivery of another event in a safe manner, say waking up an idle http connection to force it to be reaped...
If we are _DISABLE'ing a knote, don't bother to call f_event on it, it's disabled, so won't be delivered anyways..
Tested by: adrian
|
#
272528 |
|
04-Oct-2014 |
ian |
Make kevent(2) periodic timer events more reliably periodic. The event callout is now scheduled using the C_ABSOLUTE flag, and the absolute time of each event is calculated as the time the previous event was scheduled for plus the interval. This ensures that latency in processing a given event doesn't perturb the arrival time of any subsequent events.
Reviewed by: jhb
|
#
271976 |
|
22-Sep-2014 |
jhb |
Add a new fo_fill_kinfo fileops method to add type-specific information to struct kinfo_file. - Move the various fill_*_info() methods out of kern_descrip.c and into the various file type implementations. - Rework the support for kinfo_ofile to generate a suitable kinfo_file object for each file and then convert that to a kinfo_ofile structure rather than keeping a second, different set of code that directly manipulates type-specific file information. - Remove the shm_path() and ksem_info() layering violations.
Differential Revision: https://reviews.freebsd.org/D775 Reviewed by: kib, glebius (earlier version)
|
#
271489 |
|
12-Sep-2014 |
jhb |
Fix various issues with invalid file operations: - Add invfo_rdwr() (for read and write), invfo_ioctl(), invfo_poll(), and invfo_kqfilter() for use by file types that do not support the respective operations. Home-grown versions of invfo_poll() were universally broken (they returned an errno value, invfo_poll() uses poll_no_poll() to return an appropriate event mask). Home-grown ioctl routines also tended to return an incorrect errno (invfo_ioctl returns ENOTTY). - Use the invfo_*() functions instead of local versions for unsupported file operations. - Reorder fileops members to match the order in the structure definition to make it easier to spot missing members. - Add several missing methods to linuxfileops used by the OFED shim layer: fo_write(), fo_truncate(), fo_kqfilter(), and fo_stat(). Most of these used invfo_*(), but a dummy fo_stat() implementation was added.
|
#
268843 |
|
18-Jul-2014 |
bapt |
Extend kqueue's EVFILT_TIMER by adding precision unit flags support
Define the precision macros as bits sets to conform with XNU equivalent. Test fflags passed for EVFILT_TIMER and return EINVAL in case an invalid flag is passed.
Phabric: https://phabric.freebsd.org/D421 Reviewed by: kib
|
#
264388 |
|
12-Apr-2014 |
davide |
Hide internal details of sbintime_t implementation wrapping INT64_MAX into SBT_MAX, to make it more robust in case internal type representation will change in the future. All the consumers were migrated to SBT_MAX and every new consumer (if any) should from now use this interface.
Requested by: bapt, jmg, Ryan Lortie (implictly) Reviewed by: mav, bde
|
#
264231 |
|
07-Apr-2014 |
ed |
Implement kqueue(2) for procdesc(4).
kqueue(2) already supports EVFILT_PROC. Add an EVFILT_PROCDESC that behaves the same, but operates on a procdesc(4) instead. Only implement NOTE_EXIT for now. The nice thing about NOTE_EXIT is that it also returns the exit status of the process, meaning that we can now obtain this value, even if pdwait4(2) is still unimplemented.
Notes:
- Simply reuse EVFILT_NETDEV for EVFILT_PROCDESC. As both of these will be used on totally different descriptor types, this should not clash.
- Let procdesc_kqops_event() reuse the same structure as filt_proc(). The only difference is that procdesc_kqops_event() should also be able to deal with the case where the process was already terminated after registration. Simply test this when hint == 0.
- Fix some style(9) issues in filt_proc() to keep it consistent with the newly added procdesc_kqops_event().
- Save the exit status of the process in pd->pd_xstat, as we cannot pick up the proctree_lock from within procdesc_kqops_event().
Discussed on: arch@ Reviewed by: kib@
|
#
264146 |
|
05-Apr-2014 |
kib |
When KN_INFLUX is set on the knote due to kqueue_register() or kqueue_scan() unlocking the kqueue to call f_event, knote() or knote_fork() should not skip the knote. The knote is not going to disappear during the influx time, and the mutual exclusion between scan and knote() is ensured by both code pathes taking knlist lock. The race appears since knlist lock is before kq lock, so KN_INFLUX must be set, kq lock must be dropped and only then knlist lock can be taken. The window between kq unlock and knlist lock causes lost events.
Add a flag KN_SCAN to indicate that KN_INFLUX is set in a manner safe for the knote(), and check for it to ignore KN_INFLUX in the knote*() as needed. Also, in knote(), remove the lockless check for the KN_INFLUX flag, which could also result in the lost notification.
Reported and tested by: Kohji Okuno <okuno.kohji@jp.panasonic.com> Discussed with: jmg Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
263233 |
|
16-Mar-2014 |
rwatson |
Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h.
MFC after: 3 weeks
|
#
260805 |
|
17-Jan-2014 |
adrian |
Add in a default initialiser for the EVOPS_SENDFILE kqueue filterops.
Sponsored by: Netflix, Inc.
|
#
260384 |
|
07-Jan-2014 |
adrian |
Add a compile-time control over the size of KN_HASHSIZE.
This is needed for applications that use a lot of non-filedescriptor knotes.
MFC after: 1 week Sponsored by: Netflix, Inc.
|
#
259633 |
|
19-Dec-2013 |
se |
Fix compilation on 32 bit architectures and use INT64_MAX instead of LONG_MAX for the upper bound check.
|
#
259609 |
|
19-Dec-2013 |
se |
Fix overflow for timeout values of more than 68 years, which is the maximum covered by sbintime (LONG_MAX seconds).
Some programs use timeout values in excess of 1000 years. The conversion to sbintime caused wrap-around on overflow, which resulted in short or negative timeout values. This caused long delays on sockets opened by affected programs (e.g. OpenSSH).
Kernels compiled without -fno-strict-overflow were not affected, apparently because the compiler tested the sign of the timeout value before performing the multiplication that lead to overflow.
When the -fno-strict-overflow option was added to CFLAGS, this optimization was disabled and the test was performed on the result of the multiplication. Negative products were caught and resulted in EINVAL being returned, but wrap-around to positive values just shortened the timeout value to the residue of the result that could be represented by sbintime.
The fix is to cap the timeout values at the maximum that can be represented by sbintime, which is 2^31 - 1 seconds or more than 68 years.
After this change, the kernel can be compiled with -fno-strict-overflow with no ill effects.
MFC after: 3 days
|
#
258181 |
|
15-Nov-2013 |
pjd |
Replace CAP_POLL_EVENT and CAP_POST_EVENT capability rights (which I had a very hard time to fully understand) with much more intuitive rights:
CAP_EVENT - when set on descriptor, the descriptor can be monitored with syscalls like select(2), poll(2), kevent(2).
CAP_KQUEUE_EVENT - When set on a kqueue descriptor, the kevent(2) syscall can be called on this kqueue to with the eventlist argument set to non-NULL value; in other words the given kqueue descriptor can be used to monitor other descriptors. CAP_KQUEUE_CHANGE - When set on a kqueue descriptor, the kevent(2) syscall can be called on this kqueue to with the changelist argument set to non-NULL value; in other words it allows to modify events monitored with the given kqueue descriptor.
Add alias CAP_KQUEUE, which allows for both CAP_KQUEUE_EVENT and CAP_KQUEUE_CHANGE.
Add backward compatibility define CAP_POLL_EVENT which is equal to CAP_EVENT.
Sponsored by: The FreeBSD Foundation MFC after: 3 days
|
#
257597 |
|
03-Nov-2013 |
jilles |
kqueue: Change error for kqueues rlimit from EMFILE to ENOMEM and document this error condition in the kqueue(2) manual page.
Discussed with: kib
|
#
256849 |
|
21-Oct-2013 |
kib |
Add a resource limit for the total number of kqueues available to the user. Kqueue now saves the ucred of the allocating thread, to correctly decrement the counter on close.
Under some specific and not real-world use scenario for kqueue, it is possible for the kqueues to consume memory proportional to the square of the number of the filedescriptors available to the process. Limit allows administrator to prevent the abuse.
This is kernel-mode side of the change, with the user-mode enabling commit following.
Reported and tested by: pho Discussed with: jmg Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
#
255882 |
|
26-Sep-2013 |
kib |
Do not allow negative timeouts for kqueue timers, check for the negative timeout both before and after the conversion to sbintime_t.
For periodic kqueue timer, convert zero timeout into 1ms, to avoid interrupt storm on fast event timers.
Reported and tested by: pho Discussed with: mav Reviewed by: davide Sponsored by: The FreeBSD Foundation Approved by: re (marius)
|
#
255798 |
|
22-Sep-2013 |
kib |
Pre-acquire the filedesc sx when a possibility exists that the later code could need to remove a kqueue from the filedesc list. Global lock is already locked, which causes sleepable after non-sleepable lock acquisition.
Reported and tested by: pho Reviewed by: jmg Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb)
|
#
255675 |
|
18-Sep-2013 |
rdivacky |
Revert r255672, it has some serious flaws, leaking file references etc.
Approved by: re (delphij)
|
#
255672 |
|
18-Sep-2013 |
rdivacky |
Implement epoll support in Linuxulator. This is a tiny wrapper around kqueue to implement epoll subset of functionality. The kqueue user data are 32bit on i386 which is not enough for epoll user data so this patch overrides kqueue fileops to maintain enough space in struct file.
Initial patch developed by me in 2007 and then extended and finished by Yuri Victorovich.
Approved by: re (delphij) Sponsored by: Google Summer of Code Submitted by: Yuri Victorovich <yuri at rawbw dot com> Tested by: Yuri Victorovich <yuri at rawbw dot com>
|
#
255527 |
|
13-Sep-2013 |
kib |
Use TAILQ instead of STAILQ for kqeueue filedescriptors to ensure constant time removal on kqueue close.
Reported and tested by: pho Reviewed by: jmg Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (delphij)
|
#
255219 |
|
05-Sep-2013 |
pjd |
Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way.
The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough.
The structure definition looks like this:
struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; };
The initial CAP_RIGHTS_VERSION is 0.
The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements.
The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future.
To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg.
#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)
We still support aliases that combine few rights, but the rights have to belong to the same array element, eg:
#define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL)
#define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP)
There is new API to manage the new cap_rights_t structure:
cap_rights_t *cap_rights_init(cap_rights_t *rights, ...); void cap_rights_set(cap_rights_t *rights, ...); void cap_rights_clear(cap_rights_t *rights, ...); bool cap_rights_is_set(const cap_rights_t *rights, ...);
bool cap_rights_is_valid(const cap_rights_t *rights); void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src); void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src); bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);
Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg:
cap_rights_t rights;
cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);
There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg:
#define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...);
Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1:
cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);
Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition.
This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x.
Sponsored by: The FreeBSD Foundation
|
#
254932 |
|
26-Aug-2013 |
jmg |
fix up some comments and a white space issue...
MFC after: 3 days
|
#
254356 |
|
15-Aug-2013 |
glebius |
Make sendfile() a method in the struct fileops. Currently only vnode backed file descriptors have this method implemented.
Reviewed by: kib Sponsored by: Nginx, Inc. Sponsored by: Netflix
|
#
254287 |
|
13-Aug-2013 |
jhb |
Some small cleanups to the fixes in r180340: - Set NOTE_TRACKERR before running filt_proc(). If the knote did not have NOTE_FORK set in fflags when registered, then the TRACKERR event could miss being posted. - Don't pass the pid in to filt_proc() for NOTE_FORK events. The special handling for pids is done knote_fork() directly and no longer in filt_proc().
MFC after: 2 weeks
|
#
254072 |
|
07-Aug-2013 |
jhb |
Don't emit a spurious EVFILT_PROC event with no fflags set on process exit if NOTE_EXIT is not being monitored. The rationale is that a listener should only get an event for exit() if they registered interest via NOTE_EXIT. This matches the behavior on OS X. - Don't save the exit status on process exit unless NOTE_EXIT is being monitored. - Add an internal EV_DROP flag that requests kqueue_scan() to free the knote without signalling it to userland and use this when a process exits but the fflags in the knote is zero.
Reviewed by: jmg MFC after: 1 month
|
#
251803 |
|
16-Jun-2013 |
ed |
Change callout use counter to use C11 atomics.
In order to get some coverage of C11 atomics in kernelspace, switch at least one piece of code in kernelspace to use C11 atomics instead of <machine/atomic.h>.
While there, slightly improve the code by adding an assertion to prevent the use count from going negative.
|
#
248092 |
|
09-Mar-2013 |
mav |
Rework overflow checks of r247898 to not let too "intelligent" compiler to optimize it out.
Submitted by: bde
|
#
247917 |
|
07-Mar-2013 |
mav |
Fix off-by-one error in nanoseconds validation.
Submitted by: bde
|
#
247898 |
|
06-Mar-2013 |
mav |
Fix time math overflows and improve zero intervals handling in poll(), select(), nanosleep() and kevent() functions after calloutng changes.
Reported by: bde
|
#
247804 |
|
04-Mar-2013 |
davide |
MFcalloutng: - Rewrite kevent() timeout implementation to allow sub-tick precision. - Make the interval timings for EVFILT_TIMER more accurate. This also removes an hack introduced in r238424.
Sponsored by: Google Summer of Code 2012, iXsystems inc. Tested by: flo, marius, ian, markj, Fabian Keil
|
#
238424 |
|
13-Jul-2012 |
jhb |
Make the interval timings for EVFILT_TIMER more accurate. tvtohz() always adds an extra tick to account for the current partial clock tick. However, that is not appropriate for a repeating timer when the exact tvtohz() value should be used for subsequent intervals. Fix repeating callouts for EVFILT_TIMER by subtracting 1 tick from the tvtohz() result similar to the fix used in realitexpire() for interval timers.
While here, update a few comments to note that if the EVFILT_TIMER code were to move out of kern_event.c, it should move to kern_time.c (where the interval timer code it mimics lives) rather than kern_timeout.c.
MFC after: 1 month
|
#
237084 |
|
14-Jun-2012 |
pjd |
Update comment.
MFC after: 1 month
|
#
233505 |
|
26-Mar-2012 |
melifaro |
- Add knlist_init_rw_reader() function to kqueue(9). Function acquired reader lock if needed. Assert check for reader or writer lock (RA_LOCKED / RA_UNLOCKED) - While here, add knlist_init_mtx.9 to MLINKS and fix some style(9) issues
Reviewed by: glebius Approved by: ae(mentor)
MFC after: 2 weeks
|
#
225617 |
|
16-Sep-2011 |
kmacy |
In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls.
Reviewed by: rwatson Approved by: re (bz)
|
#
225177 |
|
25-Aug-2011 |
attilio |
Fix a deficiency in the selinfo interface: If a selinfo object is recorded (via selrecord()) and then it is quickly destroyed, with the waiters missing the opportunity to awake, at the next iteration they will find the selinfo object destroyed, causing a PF#.
That happens because the selinfo interface has no way to drain the waiters before to destroy the registered selinfo object. Also this race is quite rare to get in practice, because it would require a selrecord(), a poll request by another thread and a quick destruction of the selrecord()'ed selinfo object.
Fix this by adding the seldrain() routine which should be called before to destroy the selinfo objects (in order to avoid such case), and fix the present cases where it might have already been called. Sometimes, the context is safe enough to prevent this type of race, like it happens in device drivers which installs selinfo objects on poll callbacks. There, the destruction of the selinfo object happens at driver detach time, when all the filedescriptors should be already closed, thus there cannot be a race. For this case, mfi(4) device driver can be set as an example, as it implements a full correct logic for preventing this from happening.
Sponsored by: Sandvine Incorporated Reported by: rstone Tested by: pluknet Reviewed by: jhb, kib Approved by: re (bz) MFC after: 3 weeks
|
#
224914 |
|
16-Aug-2011 |
kib |
Add the fo_chown and fo_chmod methods to struct fileops and use them to implement fchown(2) and fchmod(2) support for several file types that previously lacked it. Add MAC entries for chown/chmod done on posix shared memory and (old) in-kernel posix semaphores.
Based on the submission by: glebius Reviewed by: rwatson Approved by: re (bz)
|
#
224797 |
|
12-Aug-2011 |
jonathan |
Rename CAP_*_KEVENT to CAP_*_EVENT.
Change the names of a couple of capability rights to be less FreeBSD-specific.
Approved by: re (kib), mentor (rwatson) Sponsored by: Google Inc
|
#
224778 |
|
11-Aug-2011 |
rwatson |
Second-to-last commit implementing Capsicum capabilities in the FreeBSD kernel for FreeBSD 9.0:
Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op.
Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions.
In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit.
Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent.
Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc
|
#
220245 |
|
01-Apr-2011 |
kib |
After the r219999 is merged to stable/8, rename fallocf(9) to falloc(9) and remove the falloc() version that lacks flag argument. This is done to reduce the KPI bloat.
Requested by: jhb X-MFC-note: do not
|
#
205886 |
|
30-Mar-2010 |
jhb |
Defer freeing a kevent list until after dropping kqueue locks.
LOR: 185 Submitted by: Matthew Fleming @ Isilon MFC after: 1 week
|
#
203875 |
|
14-Feb-2010 |
kib |
Do not leak process lock when current thread is not allowed to see target.
Bumped into by: ed MFC after: 3 days
|
#
201352 |
|
31-Dec-2009 |
brooks |
If a filter has already been added, actually return EEXIST when trying at add it again.
MFC after: 1 week
|
#
201350 |
|
31-Dec-2009 |
brooks |
The devices that supported EVFILT_NETDEV kqueue filters were removed in r195175. Remove all definitions, documentation, and usage.
fifo_misc.c: Remove all kqueue tests as fifo_io.c performs all those that would have remained.
Reviewed by: rwatson MFC after: 3 weeks X-MFC note: don't change vlan_link_state() function signature
|
#
197930 |
|
10-Oct-2009 |
kib |
Postpone dropping fp till both kq_global and kqueue mutexes are unlocked. fdrop() closes file descriptor when reference count goes to zero. Close method for vnodes locks the vnode, resulting in "sleepable after non-sleepable". For pipes, pipe mutex is before kqueue lock, causing LOR.
Reported and tested by: pho MFC after: 2 weeks
|
#
197575 |
|
28-Sep-2009 |
delphij |
Use correct sizeof() object for klist 'list'. Currently, struct klist contained only SLIST_HEAD as its member, thus sizeof(struct klist) would equal to sizeof(struct klist *), so this change makes the code more correct in terms of semantics, but should be a no-op to compiler at this time.
Reported by: MQ <antinvidia at gmail com>
|
#
197407 |
|
22-Sep-2009 |
rdivacky |
Change unsigned foo to u_foo as required by style(9).
Requested by: bde Approved by: ed (mentor)
|
#
197294 |
|
17-Sep-2009 |
rdivacky |
Fix the style of the previous commit.
Approved by: ed (mentor, implicit)
|
#
197293 |
|
17-Sep-2009 |
rdivacky |
Make these argument/variable unsigned as the defines for them don't fit into signed 32bit integer.
Approved by: ed (mentor, implicit) Approved by: sson
|
#
197243 |
|
16-Sep-2009 |
sson |
Add EV_RECEIPT to kevents.
EV_RECEIPT is useful to disambiguating error conditions when multiple events structures are passed to kevent(2). The error code is returned in the data field and EV_ERROR is set.
Approved by: rwatson (co-mentor)
|
#
197242 |
|
16-Sep-2009 |
sson |
Add the EV_DISPATCH flag to kevents.
When the EV_DISPATCH flag is used the event source will be disabled immediately after the delivery of an event. This is similar to the EV_ONESHOT flag but it doesn't delete the event.
Approved by: rwatson (co-mentor)
|
#
197241 |
|
16-Sep-2009 |
sson |
Add EVFILT_USER to kevents.
Add user events support to kernel events which are not associated with any kernel mechanism but are triggered by user level code. This is useful for adding user level events to an event handler that may also be monitoring kernel events.
Approved by: rwatson (co-mentor)
|
#
197240 |
|
16-Sep-2009 |
sson |
Add optional touch event filter hooks to kevents.
The touch event filter is called when a kernel event data is possibly updated. There are two hook points. First, during a kevent() system call. Second, when an event has been triggered.
Approved by: rwatson (co-mentor)
|
#
197134 |
|
12-Sep-2009 |
rwatson |
Use C99 initialization for struct filterops.
Obtained from: Mac OS X Sponsored by: Apple Inc. MFC after: 3 weeks
|
#
195148 |
|
28-Jun-2009 |
stas |
- Turn the third (islocked) argument of the knote call into flags parameter. Introduce the new flag KNF_NOKQLOCK to allow event callers to be called without KQ_LOCK mtx held. - Modify VFS knote calls to always use KNF_NOKQLOCK flag. This is required for ZFS as its getattr implementation may sleep.
Approved by: re (rwatson) Reviewed by: kib MFC after: 2 weeks
|
#
193951 |
|
10-Jun-2009 |
kib |
Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use vnode interlock to protect the knote fields [1]. The locking assumes that shared vnode lock is held, thus we get exclusive access to knote either by exclusive vnode lock protection, or by shared vnode lock + vnode interlock.
Do not use kl_locked() method to assert either lock ownership or the fact that curthread does not own the lock. For shared locks, ownership is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared lock not owned by curthread, causing false positives in kqueue subsystem assertions about knlist lock.
Remove kl_locked method from knlist lock vector, and add two separate assertion methods kl_assert_locked and kl_assert_unlocked, that are supposed to use proper asserts. Change knlist_init accordingly.
Add convenience function knlist_init_mtx to reduce number of arguments for typical knlist initialization.
Submitted by: jhb [1] Noted by: jhb [2] Reviewed by: jhb Tested by: rnoland
|
#
184214 |
|
23-Oct-2008 |
des |
Fix a number of style issues in the MALLOC / FREE commit. I've tried to be careful not to fix anything that was already broken; the NFSv4 code is particularly bad in this respect.
|
#
184205 |
|
23-Oct-2008 |
des |
Retire the MALLOC and FREE macros. They are an abomination unto style(9).
MFC after: 3 months
|
#
180340 |
|
07-Jul-2008 |
kib |
The kqueue_register() function assumes that it is called from the top of the syscall code and acquires various event subsystem locks as needed. The handling of the NOTE_TRACK for EVFILT_PROC is currently done by calling the kqueue_register() from filt_proc() filter, causing recursive entrance of the kqueue code. This results in the LORs and recursive acquisition of the locks.
Implement the variant of the knote() function designed to only handle the fork() event. It mostly copies the knote() body, but also handles the NOTE_TRACK, removing the handling from the filt_proc(), where it causes problems described above. The function is called from the fork1() instead of knote().
When encountering NOTE_TRACK knote, it marks the knote as influx and drops the knlist and kqueue lock. In this context call to kqueue_register is safe from the problems.
An error from the kqueue_register() is reported to the observer as NOTE_TRACKERR fflag.
PR: 108201 Reviewed by: jhb, Pramod Srinivasan <pramod juniper net> (previous version) Discussed with: jmg Tested by: pho MFC after: 2 weeks
|
#
180336 |
|
07-Jul-2008 |
kib |
The r178914 I erronously put the setting of the KQ_FLUXWAIT flag before KQ_FLUX_WAKEUP(). Since the later macro clears the KQ_FLUXWAIT, the kqueue_scan() thread may be not woken up.
Move the setting of KQ_FLUXWAIT after wakeup to correct the issue.
Reported and tested by: pho MFC after: 3 days
|
#
178914 |
|
10-May-2008 |
kib |
Kqueue_scan() may sleep when encountered the influx knotes. On the other hand, it may cause other threads to sleep since kqueue_scan() may mark some knotes as infux. This could lead to the deadlock.
Before kqueue_scan() sleeps, wakeup the threads that are waiting for the influx knotes produced by this thread.
Tested by: pho (previous version) Reviewed by: jmg MFC after: 2 weeks
|
#
178913 |
|
10-May-2008 |
kib |
The kqueue_close() encountering the KN_INFLUX knotes on the kq being closed is the legitimate situation. For instance, filedescriptor with registered events may be closed in parallel with closing the kqueue. Properly handle the case instead of asserting that this cannot happen.
Reported and tested by: pho Reviewed by: jmg MFC after: 2 weeks
|
#
177860 |
|
02-Apr-2008 |
jeff |
- Convert two timeout users to the new callout_reset_curcpu() api.
Sponsored by: Nokia
|
#
177253 |
|
16-Mar-2008 |
rwatson |
In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr.
MFC after: 1 month Discussed with: imp, rink
|
#
175140 |
|
07-Jan-2008 |
jhb |
Make ftruncate a 'struct file' operation rather than a vnode operation. This makes it possible to support ftruncate() on non-vnode file types in the future. - 'struct fileops' grows a 'fo_truncate' method to handle an ftruncate() on a given file descriptor. - ftruncate() moves to kern/sys_generic.c and now just fetches a file object and invokes fo_truncate(). - The vnode-specific portions of ftruncate() move to vn_truncate() in vfs_vnops.c which implements fo_truncate() for vnode file types. - Non-vnode file types return EINVAL in their fo_truncate() method.
Submitted by: rwatson
|
#
174988 |
|
30-Dec-2007 |
jeff |
Remove explicit locking of struct file. - Introduce a finit() which is used to initailize the fields of struct file in such a way that the ops vector is only valid after the data, type, and flags are valid. - Protect f_flag and f_count with atomic operations. - Remove the global list of all files and associated accounting. - Rewrite the unp garbage collection such that it no longer requires the global list of all files and instead uses a list of all unp sockets. - Mark sockets in the accept queue so we don't incorrectly gc them.
Tested by: kris, pho
|
#
174647 |
|
16-Dec-2007 |
jeff |
Refactor select to reduce contention and hide internal implementation details from consumers.
- Track individual selecters on a per-descriptor basis such that there are no longer collisions and after sleeping for events only those descriptors which triggered events must be rescaned. - Protect the selinfo (per descriptor) structure with a mtx pool mutex. mtx pool mutexes were chosen to preserve api compatibility with existing code which does nothing but bzero() to setup selinfo structures. - Use a per-thread wait channel rather than a global wait channel. - Hide select implementation details in a seltd structure which is opaque to the rest of the kernel. - Provide a 'selsocket' interface for those kernel consumers who wish to select on a socket when they have no fd so they no longer have to be aware of select implementation details.
Tested by: kris Reviewed on: arch
|
#
171452 |
|
14-Jul-2007 |
rodrigc |
Revert previous commits which I committed by mistake.
Approved by: re (implicit) Pointy hat to: me
|
#
171450 |
|
14-Jul-2007 |
rodrigc |
The last entry in the ext2_opts array must be NULL, otherwise the kernel with crash in vfs_filteropt() if an invalid mount option is passed to ext2fs.
Approved by: re (kensmith)
|
#
170066 |
|
28-May-2007 |
rwatson |
In kern_kevent(), unconditionally fdrop() fp once fget() has succeeded, as we never have an opportunity to set it to NULL.
Found with: Coverity Prevent(tm) CID: 2161
|
#
170029 |
|
27-May-2007 |
rwatson |
Select a more appealing spelling for the word acquire.
|
#
168355 |
|
04-Apr-2007 |
rwatson |
Replace custom file descriptor array sleep lock constructed using a mutex and flags with an sxlock. This leads to a significant and measurable performance improvement as a result of access to shared locking for frequent lookup operations, reduced general overhead, and reduced overhead in the event of contention. All of these are imported for threaded applications where simultaneous access to a shared file descriptor array occurs frequently. Kris has reported 2x-4x transaction rate improvements on 8-core MySQL benchmarks; smaller improvements can be expected for many workloads as a result of reduced overhead.
- Generally eliminate the distinction between "fast" and regular acquisisition of the filedesc lock; the plan is that they will now all be fast. Change all locking instances to either shared or exclusive locks.
- Correct a bug (pointed out by kib) in fdfree() where previously msleep() was called without the mutex held; sx_sleep() is now always called with the sxlock held exclusively.
- Universally hold the struct file lock over changes to struct file, rather than the filedesc lock or no lock. Always update the f_ops field last. A further memory barrier is required here in the future (discussed with jhb).
- Improve locking and reference management in linux_at(), which fails to properly acquire vnode references before using vnode pointers. Annotate improper use of vn_fullpath(), which will be replaced at a future date.
In fcntl(), we conservatively acquire an exclusive lock, even though in some cases a shared lock may be sufficient, which should be revisited. The dropping of the filedesc lock in fdgrowtable() is no longer required as the sxlock can be held over the sleep operation; we should consider removing that (pointed out by attilio).
Tested by: kris Discussed with: jhb, kris, attilio, jeff
|
#
167211 |
|
04-Mar-2007 |
rwatson |
Remove 'MPSAFE' annotations from the comments above most system calls: all system calls now enter without Giant held, and then in some cases, acquire Giant explicitly.
Remove a number of other MPSAFE annotations in the credential code and tweak one or two other adjacent comments.
|
#
164451 |
|
20-Nov-2006 |
jhb |
Save exit status of an exiting process in kn_data in the knote.
Submitted by: Jared Yanovich ^phirerunner at comcast.net^ MFC after: 2 weeks
|
#
162608 |
|
25-Sep-2006 |
jmg |
remove unnecessary NULL check...
Coverity ID: 1545
|
#
162594 |
|
24-Sep-2006 |
jmg |
hide kqueue_register from public view, and replace it w/ kqfd_register... this eliminates a possible race in aio registering a kevent..
|
#
162592 |
|
24-Sep-2006 |
jmg |
add KTRACE hooks into kevent... This will help people debug their kqueue programs to find out exactly which events were registered and which were returned... This should be lower in kern_kevent, but that would require special munging due to locks and the functions used to copyin/copyout kevents...
If someone wants to teach ktrace how to output pretty kevents, I have a kevent prety printer that can be used...
|
#
159553 |
|
12-Jun-2006 |
jhb |
Use fget() in kqueue_register() instead of doing all the work by hand.
|
#
159173 |
|
02-Jun-2006 |
pjd |
Don't forget to unlock kq lock in low memory situations.
OK'ed by: jmg
|
#
159172 |
|
02-Jun-2006 |
pjd |
Remove confusing done_noglobal label. The KQ_GLOBAL_UNLOCK() macro know how to handle both situations - when kq_global lock is and is not held.
OK'ed by: jmg
|
#
159171 |
|
02-Jun-2006 |
pjd |
Use SLIST_FOREACH_SAFE() macro, because knote_drop() can free an element which can be then used to find next element in the list.
OK'ed by: jmg
|
#
157754 |
|
14-Apr-2006 |
jhb |
Drop the kqueue global mutex as soon as we are finished with it rather than keeping it locked until we exit the function to optimize the case where the lock would be dropped and later reacquired. The optimization was broken when kevent's were moved from UFS to VFS and the knote list lock for a vnode kevent became the lockmgr vnode lock. If one tried to use a kqueue that contained events for a kqueue fd followed by a vnode, then the kq global lock would end up being held when the vnode lock was acquired which could result in sleeping with a mutex held (and subsequent panics) if the vnode lock was contested.
Reviewed by: jmg Tested by: ps (on 6.x) MFC after: 3 days
|
#
157582 |
|
07-Apr-2006 |
jmg |
spell unlock correctly, this is relatively minor as it's rare someone would provide a lock method, and want the default unlock, but it is a bug...
PR: 95356 Submitted by: Stephen Corteselli MFC after: 3 days
|
#
157383 |
|
01-Apr-2006 |
jmg |
mask out any action when copying the flags from the event to the knote..
Pointed out by: Václav Haisman Submitted by: Dan Nelson (slightly modifed patch) MFC after: 3 days
|
#
157267 |
|
29-Mar-2006 |
jmg |
hold the list lock over the f_event and KNOTE_ACTIVATE calls... This closes a race where data could come in before we clear the INFLUX flag, and get skipped over by knote (and hence never be activated, though it should of been)...
Found by: glebius & co. Reviewed by: glebius MFC after: 3 days
|
#
151260 |
|
12-Oct-2005 |
ambrisko |
Add in kqueue support to LIO event notification and fix how it handled notifications when LIO operations completed. These were the problems with LIO event complete notification: - Move all LIO/AIO event notification into one general function so we don't have bugs in different data paths. This unification got rid of several notification bugs one of which if kqueue was used a SIGILL could get sent to the process. - Change the LIO event accounting to count all AIO request that could have been split across the fast path and daemon mode. The prior accounting only kept track of AIO op's in that mode and not the entire list of operations. This could cause a bogus LIO event complete notification to occur when all of the fast path AIO op's completed and not the AIO op's that ended up queued for the daemon.
Suggestions from: alc
|
#
150199 |
|
15-Sep-2005 |
ups |
Fix race condition that caused activation of an event to be ignored immediately after it was deactivated.
Found by: Yahoo! MFC after: 3 days
|
#
147730 |
|
01-Jul-2005 |
ssouhlal |
Fix the recent panics/LORs/hangs created by my kqueue commit by:
- Introducing the possibility of using locks different than mutexes for the knlist locking. In order to do this, we add three arguments to knlist_init() to specify the functions to use to lock, unlock and check if the lock is owned. If these arguments are NULL, we assume mtx_lock, mtx_unlock and mtx_owned, respectively.
- Using the vnode lock for the knlist locking, when doing kqueue operations on a vnode. This way, we don't have to lock the vnode while holding a mutex, in filt_vfsread.
Reviewed by: jmg Approved by: re (scottl), scottl (mentor override) Pointyhat to: ssouhlal Will be happy: everyone
|
#
146950 |
|
03-Jun-2005 |
ps |
Wrap copyin/copyout for kevent so the 32bit wrapper does not have to malloc nchanges * sizeof(struct kevent) AND/OR nevents * sizeof(struct kevent) on every syscall.
Glanced at by: peter, jmg Obtained from: Yahoo! MFC after: 2 weeks
|
#
146603 |
|
24-May-2005 |
jmg |
make stat return an zero'd struct, and be a FIFO again... This is only to fix libc_r since it requires stat to close fd's, and so commented in the code...
PR: threads/75795 Reviewed by: ps MFC after: 1 week
|
#
143776 |
|
18-Mar-2005 |
jmg |
fix aio+kq... I've been running ambrisko's test program for much longer w/o problems than I was before... This simply brings back the knote_delete as knlist_delete which will also drop the knote's, instead of just clearing the list and seeing _ONESHOT...
Fix a race where if a note was _INFLUX and _DETACHED, it could end up being modified... whoopse..
MFC after: 1 week Prodded by: ambrisko and dwhite
|
#
142934 |
|
01-Mar-2005 |
ps |
Use kern_kevent instead of the stackgap for 32bit syscall wrapping.
Submitted by: jhb Tested on: amd64
|
#
142217 |
|
22-Feb-2005 |
rwatson |
When invoking callout_init(), spell '1' as "CALLOUT_MPSAFE".
MFC after: 3 days
|
#
141616 |
|
10-Feb-2005 |
phk |
Make a bunch of malloc types static.
Found by: src/tools/tools/kernxref
|
#
137772 |
|
16-Nov-2004 |
phk |
Move a FILEDESC_UNLOCK upwards to silence witness.
|
#
137647 |
|
13-Nov-2004 |
phk |
Introduce an alias for FILEDESC_{UN}LOCK() with the suffix _FAST.
Use this in all the places where sleeping with the lock held is not an issue.
The distinction will become significant once we finalize the exact lock-type to use for this kind of case.
|
#
136500 |
|
14-Oct-2004 |
jmg |
/me gets the wrong patch out of the pr :( /me had the write patch w/o comments on his test system.
Pointed out by: kuriyama and ache Pointy hat to: jmg
|
#
136492 |
|
13-Oct-2004 |
jmg |
fix a bug where signal events didn't set the flags for attach/detach..
PR: 72234 MFC after: 2 days
|
#
135240 |
|
14-Sep-2004 |
jmg |
unlock global lock in kqueue_scan before msleep'ing to prevent dead lock.. we didn't unlock global lock earlier to prevent just having to reaquire it again..
Found by: peter Reviewed by: ps MFC after: 3 days
|
#
135021 |
|
10-Sep-2004 |
jmg |
remove giant required from kqueue_close..
Reported by: kuriyama MFC after: 3 days
|
#
134859 |
|
06-Sep-2004 |
jmg |
don't call f_detach if the filter has alread removed the knote.. This happens when a proc exits, but needs to inform the user that this has happened.. This also means we can remove the check for detached from proc and sig f_detach functions as this is doing in kqueue now...
MFC after: 5 days
|
#
133794 |
|
16-Aug-2004 |
green |
Allocate the marker, when scanning a kqueue, from the "heap" instead of the stack. When swapped out, a process's kernel stack would be unavailable, and we could get a page fault when scanning the same kqueue.
PR: kern/61849
|
#
133741 |
|
15-Aug-2004 |
jmg |
Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowlege of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using it's filter ops.
Currently, it uses the MTX_DUPOK even though it is not always safe to aquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases).
Reviewed by: green, rwatson (both earlier versions)
|
#
133635 |
|
13-Aug-2004 |
jmg |
looks like rwatson forgot tabs... :)
|
#
133590 |
|
12-Aug-2004 |
rwatson |
Trim trailing white space.
|
#
132554 |
|
22-Jul-2004 |
rwatson |
Push Giant acquisition down into fo_stat() from most callers. Acquire Giant conditional on debug.mpsafenet in the socket soo_stat() routine, unconditionally in vn_statfile() for VFS, and otherwise don't acquire Giant. Accept an unlocked read in kqueue_stat(), and cryptof_stat() is a no-op. Don't acquire Giant in fstat() system call.
Note: in fdescfs, fo_stat() is called while holding Giant due to the VFS stack sitting on top, and therefore there will still be Giant recursion in this case.
|
#
132549 |
|
22-Jul-2004 |
rwatson |
Push acquisition of Giant from fdrop_closed() into fo_close() so that individual file object implementations can optionally acquire Giant if they require it:
- soo_close(): depends on debug.mpsafenet - pipe_close(): Giant not acquired - kqueue_close(): Giant required - vn_close(): Giant required - cryptof_close(): Giant required (conservative)
Notes:
Giant is still acquired in close() even when closing MPSAFE objects due to kqueue requiring Giant in the calling closef() code. Microbenchmarks indicate that this removal of Giant cuts 3%-3% off of pipe create/destroy pairs from user space with SMP compiled into the kernel.
The cryptodev and opencrypto code appears MPSAFE, but I'm unable to test it extensively and so have left Giant over fo_close(). It can probably be removed given some testing and review.
|
#
132174 |
|
15-Jul-2004 |
alfred |
Disable SIGIO for now, leave a comment as to why it's busted and hard to fix.
|
#
132138 |
|
14-Jul-2004 |
alfred |
Make FIOASYNC, FIOSETOWN and FIOGETOWN work on kqueues.
|
#
131562 |
|
04-Jul-2004 |
alfred |
Introduce a new kevent filter. EVFILT_FS that will be used to signal generic filesystem events to userspace. Currently only mount and unmount of filesystems are signalled. Soon to be added, up/down status of NFS.
Introduce a sysctl node used to route requests to/from filesystems based on filesystem ids.
Introduce a new vfsop, vfs_sysctl(mp, req) that is used as the callback/ entrypoint by the sysctl code to change individual filesystems.
|
#
129949 |
|
01-Jun-2004 |
rwatson |
Add GIANT_REQUIRED to kqueue_close(), since kqueue currently requires Giant.
|
#
127982 |
|
07-Apr-2004 |
cperciva |
Fix filt_timer* races: Finish initializing a knote before we pass it to a callout, and use the new callout_drain API to make sure that a callout has finished before we deallocate memory it is using.
PR: kern/64121 Discussed with: gallatin
|
#
126033 |
|
20-Feb-2004 |
green |
Make sure to wake up any select waiters when closing a kqueue (also, not crash). I am fairly sure that only people with SMP and multi-threaded apps using kqueue will be affected by this, so I have a stress-testing program on my web site: <URL:http://green.homeunix.org/~green/getaddrinfo-pthreads-stresstest.c>
|
#
123843 |
|
25-Dec-2003 |
dwmalone |
Don't TAILQ_INIT kq_head twice, once is enough.
|
#
122686 |
|
14-Nov-2003 |
cognet |
Better fix than my previous commit: in exit1(), make sure the p_klist is empty after sending NOTE_EXIT. The process won't report fork() or execve() and won't be able to handle NOTE_SIGNAL knotes anyway. This fixes some race conditions with do_tdsignal() calling knote() while the process is exiting.
Reported by: Stefan Farfeleder <stefan@fafoe.narf.at> MFC after: 1 week
|
#
122352 |
|
09-Nov-2003 |
tanimura |
- Implement selwakeuppri() which allows raising the priority of a thread being waken up. The thread waken up can run at a priority as high as after tsleep().
- Replace selwakeup()s with selwakeuppri()s and pass appropriate priorities.
- Add cv_broadcastpri() which raises the priority of the broadcast threads. Used by selwakeuppri() if collision occurs.
Not objected in: -arch, -current
|
#
122019 |
|
04-Nov-2003 |
cognet |
I believe kbyanc@ really meant this in rev 1.58. Use zpfind() to see if the process became a zombie if pfind() doesn't find it and if the caller wants to know about process death, so that the caller knows the process died even if it happened before the kevent was actually registered.
MFC after: 1 week
|
#
122017 |
|
04-Nov-2003 |
cognet |
Do not attempt to report proc event if NOTE_EXIT has already been received. This fixes a race condition (specifically with signal events) that could lead to the kn being re-inserted into the list after it has been destroyed, which is not something we want to happen.
PR: kern/58258
|
#
121256 |
|
19-Oct-2003 |
dwmalone |
falloc allocates a file structure and adds it to the file descriptor table, acquiring the necessary locks as it works. It usually returns two references to the new descriptor: one in the descriptor table and one via a pointer argument.
As falloc releases the FILEDESC lock before returning, there is a potential for a process to close the reference in the file descriptor table before falloc's caller gets to use the file. I don't think this can happen in practice at the moment, because Giant indirectly protects closes.
To stop the file being completly closed in this situation, this change makes falloc set the refcount to two when both references are returned. This makes life easier for several of falloc's callers, because the first thing they previously did was grab an extra reference on the file.
Reviewed by: iedowse Idea run past: jhb
|
#
116546 |
|
18-Jun-2003 |
phk |
Initialize struct fileops with C99 sparse initialization.
|
#
116182 |
|
11-Jun-2003 |
obrien |
Use __FBSDID().
|
#
113377 |
|
12-Apr-2003 |
kbyanc |
Fix race between a process registering a NOTE_EXIT EVFILT_PROC event and the target process exiting which causes attempts to register the kevent to randomly fail depending on whether the target runs to completion before the parent can call kevent(2). The bug actually effects EVFILT_PROC events on any zombie process, but the most common manifestation is with parents trying to monitor child processes.
MFC after: 2 weeks Sponsored by: NTT Multimedia Communications Labs
|
#
111119 |
|
19-Feb-2003 |
imp |
Back out M_* changes, per decision of the TRB.
Approved by: trb
|
#
110908 |
|
15-Feb-2003 |
alfred |
Do not allow kqueues to be passed via unix domain sockets.
|
#
110906 |
|
15-Feb-2003 |
alfred |
Fix LOR with PROC/filedesc. Introduce fdesc_mtx that will be used as a barrier between free'ing filedesc structures. Basically if you want to access another process's filedesc, you want to hold this mutex over the entire operation.
|
#
109623 |
|
21-Jan-2003 |
alfred |
Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
|
#
109609 |
|
21-Jan-2003 |
hsu |
Rewrite the SMP filedesc locking in knote_attach() in order to 1. eliminate unnecessary loop which frees and re-allocates the just allocated array 2. eliminate the newsize recomputation 3. eliminate unnecessary unlock and relock around free 4. correctly match the free with the malloc into M_KQUEUE instead of M_TEMP 5. eliminate conditional assignment of oldlist, which is equivalent to a simple assignment 6. eliminate the oldlist temporary variable completely
Reviewed by: jhb
|
#
109153 |
|
13-Jan-2003 |
dillon |
Bow to the whining masses and change a union back into void *. Retain removal of unnecessary casts and throw in some minor cleanups to see if anyone complains, just for the hell of it.
|
#
109123 |
|
12-Jan-2003 |
dillon |
Change struct file f_data to un_data, a union of the correct struct pointer types, and remove a huge number of casts from code using it.
Change struct xfile xf_data to xun_data (ABI is still compatible).
If we need to add a #define for f_data and xf_data we can, but I don't think it will be necessary. There are no operational changes in this commit.
|
#
108524 |
|
01-Jan-2003 |
alfred |
When compiling the kernel do not implicitly include filedesc.h from proc.h, this was causing filedesc work to be very painful. In order to make this work split out sigio definitions to thier own header (sigio.h) which is included from proc.h for the time being.
|
#
108255 |
|
24-Dec-2002 |
phk |
White-space changes.
|
#
108238 |
|
23-Dec-2002 |
phk |
Detediousficate declaration of fileops array members by introducing typedefs for them.
|
#
106171 |
|
29-Oct-2002 |
rwatson |
Minor comment typo fix.
Submitted by: Wayne Morrison <tewok@tislabs.com>
|
#
104396 |
|
03-Oct-2002 |
truckman |
hashinit() calls MALLOC(), so release the filedesc lock in knote_attach() before calling hashinit() and relock afterwards, taking care to see that we don't lose a race.
|
#
102003 |
|
17-Aug-2002 |
rwatson |
In continuation of early fileop credential changes, modify fo_ioctl() to accept an 'active_cred' argument reflecting the credential of the thread initiating the ioctl operation.
- Change fo_ioctl() to accept active_cred; change consumers of the fo_ioctl() interface to generally pass active_cred from td->td_ucred. - In fifofs, initialize filetmp.f_cred to ap->a_cred so that the invocations of soo_ioctl() are provided access to the calling f_cred. Pass ap->a_td->td_ucred as the active_cred, but note that this is required because we don't yet distinguish file_cred and active_cred in invoking VOP's. - Update kqueue_ioctl() for its new argument. - Update pipe_ioctl() for its new argument, pass active_cred rather than td_ucred to MAC for authorization. - Update soo_ioctl() for its new argument. - Update vn_ioctl() for its new argument, use active_cred rather than td->td_ucred to authorize VOP_IOCTL() and the associated VOP_GETATTR().
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
101987 |
|
16-Aug-2002 |
rwatson |
Correct white space nits that crept in during my recent merges of trustedbsd_mac material.
|
#
101983 |
|
16-Aug-2002 |
rwatson |
Make similar changes to fo_stat() and fo_poll() as made earlier to fo_read() and fo_write(): explicitly use the cred argument to fo_poll() as "active_cred" using the passed file descriptor's f_cred reference to provide access to the file credential. Add an active_cred argument to fo_stat() so that implementers have access to the active credential as well as the file credential. Generally modify callers of fo_stat() to pass in td->td_ucred rather than fp->f_cred, which was redundantly provided via the fp argument. This set of modifications also permits threads to perform these operations on behalf of another thread without modifying their credential.
Trickle this change down into fo_stat/poll() implementations:
- badfo_poll(), badfo_stat(): modify/add arguments. - kqueue_poll(), kqueue_stat(): modify arguments. - pipe_poll(), pipe_stat(): modify/add arguments, pass active_cred to MAC checks rather than td->td_ucred. - soo_poll(), soo_stat(): modify/add arguments, pass fp->f_cred rather than cred to pru_sopoll() to maintain current semantics. - sopoll(): moidfy arguments. - vn_poll(), vn_statfile(): modify/add arguments, pass new arguments to vn_stat(). Pass active_cred to MAC and fp->f_cred to VOP_POLL() to maintian current semantics. - vn_close(): rename cred to file_cred to reflect reality while I'm here. - vn_stat(): Add active_cred and file_cred arguments to vn_stat() and consumers so that this distinction is maintained at the VFS as well as 'struct file' layer. Pass active_cred instead of td->td_ucred to MAC and to VOP_GETATTR() to maintain current semantics.
- fifofs: modify the creation of a "filetemp" so that the file credential is properly initialized and can be used in the socket code if desired. Pass ap->a_td->td_ucred as the active credential to soo_poll(). If we teach the vnop interface about the distinction between file and active credentials, we would use the active credential here.
Note that current inconsistent passing of active_cred vs. file_cred to VOP's is maintained. It's not clear why GETATTR would be authorized using active_cred while POLL would be authorized using file_cred at the file system level.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
101941 |
|
15-Aug-2002 |
rwatson |
In order to better support flexible and extensible access control, make a series of modifications to the credential arguments relating to file read and write operations to cliarfy which credential is used for what:
- Change fo_read() and fo_write() to accept "active_cred" instead of "cred", and change the semantics of consumers of fo_read() and fo_write() to pass the active credential of the thread requesting an operation rather than the cached file cred. The cached file cred is still available in fo_read() and fo_write() consumers via fp->f_cred. These changes largely in sys_generic.c.
For each implementation of fo_read() and fo_write(), update cred usage to reflect this change and maintain current semantics:
- badfo_readwrite() unchanged - kqueue_read/write() unchanged pipe_read/write() now authorize MAC using active_cred rather than td->td_ucred - soo_read/write() unchanged - vn_read/write() now authorize MAC using active_cred but VOP_READ/WRITE() with fp->f_cred
Modify vn_rdwr() to accept two credential arguments instead of a single credential: active_cred and file_cred. Use active_cred for MAC authorization, and select a credential for use in VOP_READ/WRITE() based on whether file_cred is NULL or not. If file_cred is provided, authorize the VOP using that cred, otherwise the active credential, matching current semantics.
Modify current vn_rdwr() consumers to pass a file_cred if used in the context of a struct file, and to always pass active_cred. When vn_rdwr() is used without a file_cred, pass NOCRED.
These changes should maintain current semantics for read/write, but avoid a redundant passing of fp->f_cred, as well as making it more clear what the origin of each credential is in file descriptor read/write operations.
Follow-up commits will make similar changes to other file descriptor operations, and modify the MAC framework to pass both credentials to MAC policy modules so they can implement either semantic for revocation.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
99009 |
|
29-Jun-2002 |
alfred |
More caddr_t removal, make fo_ioctl take a void * instead of a caddr_t.
|
#
98998 |
|
29-Jun-2002 |
alfred |
More caddr_t removal. Change struct knote's kn_hook from caddr_t to void *.
|
#
96886 |
|
19-May-2002 |
jhb |
Change p_can{debug,see,sched,signal}()'s first argument to be a thread pointer instead of a proc pointer and require the process pointed to by the second argument to be locked. We now use the thread ucred reference for the credential checks in p_can*() as a result. p_canfoo() should now no longer need Giant.
|
#
92751 |
|
20-Mar-2002 |
jeff |
Remove references to vm_zone.h and switch over to the new uma API.
Also, remove maxsockets. If you look carefully you'll notice that the old zone allocator never honored this anyway.
|
#
89749 |
|
24-Jan-2002 |
jlemon |
Add entry for EVFILT_NETDEV, which was inadverdently omitted back in Sept.
|
#
89319 |
|
14-Jan-2002 |
alfred |
Replace ffind_* with fget calls.
Make fget MPsafe.
Make fgetvp and fgetsock use the fget subsystem to reduce code bloat.
Push giant down in fpathconf().
|
#
89306 |
|
13-Jan-2002 |
alfred |
SMP Lock struct file, filedesc and the global file list.
Seigo Tanimura (tanimura) posted the initial delta.
I've polished it quite a bit reducing the need for locking and adapting it for KSE.
Locks:
1 mutex in each filedesc protects all the fields. protects "struct file" initialization, while a struct file is being changed from &badfileops -> &pipeops or something the filedesc should be locked.
1 mutex in each struct file protects the refcount fields. doesn't protect anything else. the flags used for garbage collection have been moved to f_gcflag which was the FILLER short, this doesn't need locking because the garbage collection is a single threaded container. could likely be made to use a pool mutex.
1 sx lock for the global filelist.
struct file * fhold(struct file *fp); /* increments reference count on a file */
struct file * fhold_locked(struct file *fp); /* like fhold but expects file to locked */
struct file * ffind_hold(struct thread *, int fd); /* finds the struct file in thread, adds one reference and returns it unlocked */
struct file * ffind_lock(struct thread *, int fd); /* ffind_hold, but returns file locked */
I still have to smp-safe the fget cruft, I'll get to that asap.
|
#
88633 |
|
29-Dec-2001 |
alfred |
Make AIO a loadable module.
Remove the explicit call to aio_proc_rundown() from exit1(), instead AIO will use at_exit(9).
Add functions at_exec(9), rm_at_exec(9) which function nearly the same as at_exec(9) and rm_at_exec(9), these functions are called on behalf of modules at the time of execve(2) after the image activator has run.
Use a modified version of tegge's suggestion via at_exec(9) to close an exploitable race in AIO.
Fix SYSCALL_MODULE_HELPER such that it's archetecuterally neutral, the problem was that one had to pass it a paramater indicating the number of arguments which were actually the number of "int". Fix it by using an inline version of the AS macro against the syscall arguments. (AS should be available globally but we'll get to that later.)
Add a primative system for dynamically adding kqueue ops, it's really not as sophisticated as it should be, but I'll discuss with jlemon when he's around.
|
#
86341 |
|
14-Nov-2001 |
dillon |
remove holdfp()
Replace uses of holdfp() with fget*() or fgetvp*() calls as appropriate
introduce fget(), fget_read(), fget_write() - these functions will take a thread and file descriptor and return a file pointer with its ref count bumped.
introduce fgetvp(), fgetvp_read(), fgetvp_write() - these functions will take a thread and file descriptor and return a vref()'d vnode.
*_read() requires that the file pointer be FREAD, *_write that it be FWRITE.
This continues the cleanup of struct filedesc and struct file access routines which, when are all through with it, will allow us to then make the API calls MP safe and be able to move Giant down into the fo_* functions.
|
#
84138 |
|
29-Sep-2001 |
jlemon |
Have EVFILT_TIMERS allocate their callouts via malloc() instead of using the static callout list allocated by the system.
Change malloc type from M_TEMP to M_KQUEUE to better track memory.
Add a kern.kq_calloutmax to globally limit the amount of kernel memory that can be allocated by callouts.
Submitted by: iedowse (items 1, 2)
|
#
83805 |
|
21-Sep-2001 |
jhb |
Use the passed in thread to selrecord() instead of curthread.
|
#
83366 |
|
12-Sep-2001 |
julian |
KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
|
#
82710 |
|
01-Sep-2001 |
dillon |
Pushdown Giant for acct(), kqueue(), kevent(), execve(), fork(), vfork(), rfork(), jail().
|
#
79989 |
|
19-Jul-2001 |
jlemon |
Introduce EVFILT_TIMER, which allows a process to establish an arbitrary number of timers, both oneshot and periodic.
Repeatedly reminded to commit by: jayanth Reviewed by: peter (a while back)
|
#
79335 |
|
05-Jul-2001 |
rwatson |
o Replace calls to p_can(..., P_CAN_xxx) with calls to p_canxxx(). The p_can(...) construct was a premature (and, it turns out, awkward) abstraction. The individual calls to p_canxxx() better reflect differences between the inter-process authorization checks, such as differing checks based on the type of signal. This has a side effect of improving code readability. o Replace direct credential authorization checks in ktrace() with invocation of p_candebug(), while maintaining the special case check of KTR_ROOT. This allows ktrace() to "play more nicely" with new mandatory access control schemes, as well as making its authorization checks consistent with other "debugging class" checks. o Eliminate "privused" construct for p_can*() calls which allowed the caller to determine if privilege was required for successful evaluation of the access control check. This primitive is currently unused, and as such, serves only to complicate the API.
Approved by: ({procfs,linprocfs} changes) des Obtained from: TrustedBSD Project
|
#
76166 |
|
01-May-2001 |
markm |
Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files.
Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files.
Sort sys/*.h includes where possible in affected files.
OK'ed by: bde (with reservations)
|
#
75893 |
|
24-Apr-2001 |
jhb |
Change the pfind() and zpfind() functions to lock the process that they find before releasing the allproc lock and returning.
Reviewed by: -smp, dfr, jake
|
#
75451 |
|
12-Apr-2001 |
rwatson |
o Make kqueue's filt_procattach() function use the error value returned by p_can(...P_CAN_SEE), rather than returning EACCES directly. This brings the error code used here into line with similar arrangements elsewhere, and prevents the leakage of pid usage information.
Reviewed by: jlemon Obtained from: TrustedBSD Project
|
#
72969 |
|
24-Feb-2001 |
jlemon |
Add an EV_SET() convenience macro for initializing struct kevent prior to the call to kevent().
Update the copyright notices as well.
|
#
72958 |
|
23-Feb-2001 |
jlemon |
Fix typo in comment (knode -> knote).
|
#
72521 |
|
15-Feb-2001 |
jlemon |
Extend kqueue down to the device layer.
Backwards compatible approach suggested by: peter
|
#
71500 |
|
24-Jan-2001 |
jhb |
Proc locking.
|
#
70834 |
|
09-Jan-2001 |
wollman |
select() DKI is now in <sys/selinfo.h>.
|
#
69781 |
|
08-Dec-2000 |
dwmalone |
Convert more malloc+bzero to malloc+M_ZERO.
Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>
|
#
68883 |
|
18-Nov-2000 |
dillon |
This patchset fixes a large number of file descriptor race conditions. Pre-rfork code assumed inherent locking of a process's file descriptor array. However, with the advent of rfork() the file descriptor table could be shared between processes. This patch closes over a dozen serious race conditions related to one thread manipulating the table (e.g. closing or dup()ing a descriptor) while another is blocked in an open(), close(), fcntl(), read(), write(), etc...
PR: kern/11629 Discussed with: Alexander Viro <viro@math.psu.edu>
|
#
65237 |
|
30-Aug-2000 |
rwatson |
o Centralize inter-process access control, introducing:
int p_can(p1, p2, operation, privused)
which allows specification of subject process, object process, inter-process operation, and an optional call-by-reference privused flag, allowing the caller to determine if privilege was required for the call to succeed. This allows jail, kern.ps_showallprocs and regular credential-based interaction checks to occur in one block of code. Possible operations are P_CAN_SEE, P_CAN_SCHED, P_CAN_KILL, and P_CAN_DEBUG. p_can currently breaks out as a wrapper to a series of static function checks in kern_prot, which should not be invoked directly.
o Commented out capabilities entries are included for some checks.
o Update most inter-process authorization to make use of p_can() instead of manual checks, PRISON_CHECK(), P_TRESPASS(), and kern.ps_showallprocs.
o Modify suser{,_xxx} to use const arguments, as it no longer modifies process flags due to the disabling of ASU.
o Modify some checks/errors in procfs so that ENOENT is returned instead of ESRCH, further improving concealment of processes that should not be visible to other processes. Also introduce new access checks to improve hiding of processes for procfs_lookup(), procfs_getattr(), procfs_readdir(). Correct a bug reported by bp concerning not handling the CREATE case in procfs_lookup(). Remove volatile flag in procfs that caused apparently spurious qualifier warnigns (approved by bde).
o Add comment noting that ktrace() has not been updated, as its access control checks are different from ptrace(), whereas they should probably be the same. Further discussion should happen on this topic.
Reviewed by: bde, green, phk, freebsd-security, others Approved by: bde Obtained from: TrustedBSD Project
|
#
64343 |
|
07-Aug-2000 |
jlemon |
Fix bug with timeout; previously, when attempting to poll the kqueue by passing a zero-valued timeout, the code would always sleep for one tick. Change code to avoid calling tsleep if we have no intention of sleeping.
Bring in bugfix from sys_select.c, r1.60 which also applies here.
Modify error handling slightly; passing in an invalid fd will now result in EBADF returned in the eventlist, while an attempt to change a knote which does not exist will result in ENOENT being returned. Previously such attempts would fail silently without notification.
Pointed out by: nicolas.leonard@animaths.com Rick Reed (rr@yahoo-inc.com)
|
#
64084 |
|
01-Aug-2000 |
jlemon |
Back out rev 1.12; its not clear that this is the right thing to do, and in any event, it wasn't done correctly in the first place.
|
#
63977 |
|
28-Jul-2000 |
peter |
Fix warnings - make kevent args in comment match those in syscalls.master. Deal with consts.
|
#
63943 |
|
27-Jul-2000 |
jlemon |
Have kevent() automatically restart if interrupted by a signal. If this is not desired, then the user can register an EV_SIGNAL filter to explicitly catch a signal event.
Change requested by: jayanth, ps, peter "Why is kevent non-restartable after a signal?"
|
#
63470 |
|
18-Jul-2000 |
jlemon |
Fix a bug which would cause some knotes to get lost when two kqueues were being used in a process at the same time.
Test case provided by: Chris Peiffer <peifferc@CS.Stanford.EDU>
|
#
63452 |
|
18-Jul-2000 |
jlemon |
Simplify kqueue API slightly.
Discussed on: -arch
|
#
62218 |
|
28-Jun-2000 |
chris |
Report a file type (S_IFIFO) in kqueue_stat().
|
#
61962 |
|
22-Jun-2000 |
jlemon |
Add code so that the udata field is preserved across a TRACK event.
When re-adding an event, do not reset the event state. If the event was pending, it will remain pending. This allows the user to change the udata field after the event was registered, while not losing any events which have already occurred.
Reported by: jmg
|
#
61468 |
|
10-Jun-2000 |
jlemon |
malloc(..., M_WAITOK) will not return NULL, so remove the error handling for this case (which was slightly broken anyway)
Fix up some whitespace problems while I'm here too.
Submitted by: alfred (in a slightly different form)
|
#
60938 |
|
26-May-2000 |
jake |
Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen.
Requested by: msmith and others
|
#
60833 |
|
23-May-2000 |
jake |
Change the way that the queue(3) structures are declared; don't assume that the type argument to *_HEAD and *_ENTRY is a struct.
Suggested by: phk Reviewed by: phk Approved by: mdodd
|
#
60759 |
|
21-May-2000 |
green |
Back out NOTE_EXIT status reporting pending discussion.
|
#
60659 |
|
17-May-2000 |
green |
Put the wait(2) exit status in "data" for NOTE_EXIT kevents.
|
#
59997 |
|
04-May-2000 |
jlemon |
Fix one bug where the kn_head list could be manipulated without spl() protection in the case of a copyout error.
Add missing spl calls around the intial activation call that is done when when the kevent is added.
Add two KASSERT macros to help catch errors in the future.
|
#
59290 |
|
16-Apr-2000 |
jlemon |
Add files that I forgot to `cvs add' on last commit.
|
#
341083 |
|
27-Nov-2018 |
markj |
MFC r340898: Ensure that knotes do not get registered when KQ_CLOSING is set.
PR: 228858
|
#
341078 |
|
27-Nov-2018 |
markj |
MFC r340897: Lock the knlist before releasing the in-flux state in knote_fork().
PR: 228858
|
#
341076 |
|
27-Nov-2018 |
markj |
MFC r340899: Plug some kernel memory disclosures via kevent(2).
|
#
340904 |
|
24-Nov-2018 |
markj |
MFC r340734: Avoid unsynchronized updates to kn_status.
|
#
337418 |
|
07-Aug-2018 |
dab |
MFC r336761 & r336781:
Allow a EVFILT_TIMER kevent to be updated.
If a timer is updated (re-added) with a different time period (specified in the .data field of the kevent), the new time period has no effect; the timer will not expire until the original time has elapsed. This violates the documented behavior as the kqueue(2) man page says (in part) "Re-adding an existing event will modify the parameters of the original event, and not result in a duplicate entry."
This modification, adapted from a patch submitted by cem@ to PR214987, fixes the kqueue system to allow updating a timer entry. The kevent timer behavior is changed to:
* When a timer is re-added, update the timer parameters to and re-start the timer using the new parameters. * Allow updating both active and already expired timers. * When the timer has already expired, dequeue any undelivered events and clear the count of expirations.
All of these changes address the original PR and also bring the FreeBSD and macOS kevent timer behaviors into agreement.
A few other changes were made along the way:
* Update the kqueue(2) man page to reflect the new timer behavior. * Fix man page style issues in kqueue(2) diagnosed by igor. * Update the timer libkqueue system test to test for the updated timer behavior. * Fix the (test) libkqueue common.h file so that it includes config.h which defines various HAVE_* feature defines, before the #if tests for such variables in common.h. This enables the use of the actual err(3) family of functions. * Fix the usages of the err(3) functions in the tests for incorrect type of variables. Those were formerly undiagnosed due to the disablement of the err(3) functions (see previous bullet point).
PR: 214987 Relnotes: yes Sponsored by: Dell EMC
|
#
328454 |
|
26-Jan-2018 |
jhb |
MFC 326184: Decode kevent structures logged via ktrace(2) in kdump.
- Add a new KTR_STRUCT_ARRAY ktrace record type which dumps an array of structures.
The structure name in the record payload is preceded by a size_t containing the size of the individual structures. Use this to replace the previous code that dumped the kevent arrays dumped for kevent(). kdump is now able to decode the kevent structures rather than dumping their contents via a hexdump.
One change from before is that the 'changes' and 'events' arrays are not marked with separate 'read' and 'write' annotations in kdump output. Instead, the first array is the 'changes' array, and the second array (only present if kevent doesn't fail with an error) is the 'events' array. For kevent(), empty arrays are denoted by an entry with an array containing zero entries rather than no record.
- Move kevent decoding tables from truss to libsysdecode.
This adds three new functions to decode members of struct kevent: sysdecode_kevent_filter, sysdecode_kevent_flags, and sysdecode_kevent_fflags.
kdump uses these helper functions to pretty-print kevent fields.
- Move structure definitions for freebsd11 and freebsd32 kevent structures to <sys/event.h> so that they can be shared with userland. The 32-bit structures are only exposed if _WANT_KEVENT32 is defined. The freebsd11 structures are only exposed if _WANT_FREEBSD11_KEVENT is defined. The 32-bit freebsd11 structure requires both.
- Decode freebsd11 kevent structures in truss for the compat11.kevent() system call.
- Log 32-bit kevent structures via ktrace for 32-bit compat kevent() system calls.
- While here, constify the 'void *data' argument to ktrstruct().
Note that this version of the change for 11.x does not include freebsd11 kevent structures or _WANT_FREEBSD11_KEVENT. It also does not include the change to decode the compat11.kevent system call in truss.
|
#
320290 |
|
23-Jun-2017 |
kib |
MFC r320038: Style.
Approved by: re (gjb)
|
#
315471 |
|
18-Mar-2017 |
kib |
MFC r315238: Use designated initializers for kevent_copyops.
|
#
315470 |
|
18-Mar-2017 |
kib |
MFC r315155: Ktracing kevent(2) calls with unusual arguments might leads to an overly large allocation requests.
PR: 217435
MFC r315237: Hide kev_iovlen() definition under #ifdef KTRACE.
|
#
311771 |
|
09-Jan-2017 |
kib |
MFC r310615: Change knlist_destroy() to assertion.
|
#
311046 |
|
02-Jan-2017 |
kib |
MFC r310613: Style.
|
#
311007 |
|
01-Jan-2017 |
kib |
MFC r310554: Some optimizations for kqueue timers.
|
#
311006 |
|
01-Jan-2017 |
kib |
MFC r310552: Some style.
|
#
310578 |
|
26-Dec-2016 |
kib |
MFC r310302: Do not clear KN_INFLUX when not owning influx state.
PR: 214923
|
#
310469 |
|
23-Dec-2016 |
kib |
MFC r310159: Switch from stdatomic.h to atomic.h for kernel.
|
#
303216 |
|
23-Jul-2016 |
kib |
MFC r302936: Explicitely check for the valid range of file descriptor values.
Approved by: re (gjb)
|