#
285830 |
|
23-Jul-2015 |
gjb |
- Copy stable/10@285827 to releng/10.2 in preparation for 10.2-RC1 builds. - Update newvers.sh to reflect RC1. - Update __FreeBSD_version to reflect 10.2. - Update default pkg(8) configuration to use the quarterly branch.[1]
Discussed with: re, portmgr [1] Approved by: re (implicit) Sponsored by: The FreeBSD Foundation |
#
284202 |
|
10-Jun-2015 |
kib |
MFC r283601: Add V_MNTREF flag, to indicate that caller of vn_start*_write() already owns a reference on the mount point, and the functions can consume it.
|
#
276649 |
|
04-Jan-2015 |
kib |
MFC r276008: Add VN_OPEN_NAMECACHE flag for vn_open_cred(9), which requests that the created file name was cached. Use the flag for core dumps.
|
#
276500 |
|
01-Jan-2015 |
kib |
MFC r275897: Set NOCACHE flag for CREATE namei() calls, do not specially handle MAKEENTRY in VOP_LOOKUP().
|
#
275957 |
|
20-Dec-2014 |
kib |
MFC r275744: Only sleep interruptible while waiting for suspension end when filesystem specified VFCF_SBDRY flag, i.e. for NFS.
|
#
274787 |
|
21-Nov-2014 |
kib |
MFC r274501: In vfs_write_suspend_umnt(), if suspension cannot be established, do not forget to restore write ops count when returning the error.
|
#
273254 |
|
18-Oct-2014 |
kib |
MFC r272534: Add IO_RANGELOCKED flag for vn_rdwr(9), which specifies that vnode is not locked, but range is.
|
#
272948 |
|
11-Oct-2014 |
kib |
MFC r272538: Slightly reword comment. Move code, which is described by the comment, after it.
|
#
272187 |
|
26-Sep-2014 |
mjg |
MFC r270993:
Fix up proc_realparent to always return correct process.
Prior to the change it would always return initproc for non-traced processes.
This fixes a regression in inferior().
Approved by: re (marius)
|
#
269171 |
|
28-Jul-2014 |
kib |
MFC r268612: Add helper helper vfs_write_suspend_umnt().
Fix the bug in the FFS unmount, when suspension failed, the ufs extattrs were not reinitialized.
|
#
269165 |
|
28-Jul-2014 |
kib |
MFC r268606: Generalize vn_get_ino() to allow filesystems to use custom vnode producer. Convert inline copies of vn_get_ino() in msdosfs and cd9660 into the uses of vn_get_ino_gen().
|
#
268015 |
|
29-Jun-2014 |
kib |
MFC r267491: Use vn_io_fault for the writes from core dumping code.
|
#
267816 |
|
24-Jun-2014 |
kib |
MFC r267564: In msdosfs_setattr(), add a check for result of the utimes(2) permissions test. Refactor the permission checks for utimes(2).
|
#
259817 |
|
24-Dec-2013 |
kib |
MFC r259522: If vn_open_vnode() succeeded in opening the vnode, but subsequent advisory lock cannot be obtained, prevent double-close of the vnode in vn_close() called from the fdrop(), by resetting file' f_ops methods.
|
#
259499 |
|
17-Dec-2013 |
kib |
MFC r258039: Avoid overflow for the page counts.
MFC r258365: Revert back to use int for the page counts. Rearrange the checks to correctly handle overflowing address arithmetic.
|
#
259294 |
|
13-Dec-2013 |
kib |
MFC r257898: Change VFS_PROLOGUE() to evaluate the mp once, convert MNTK_SHARED_WRITES and MNTK_EXTENDED_SHARED tests into inline functions.
|
#
256281 |
|
10-Oct-2013 |
gjb |
Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
|
#
255510 |
|
13-Sep-2013 |
kib |
When opening or closing fifo, ensure that the vnode is locked exclusively. Filesystems are assumed to disable shared locking for the fifo vnode locks, but some do not.
Reported and tested by: olgeni Discussed with: avg Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (glebius)
|
#
254602 |
|
21-Aug-2013 |
kib |
Make the seek a method of the struct fileops.
Tested by: pho Sponsored by: The FreeBSD Foundation
|
#
254415 |
|
16-Aug-2013 |
kib |
Restore the previous sendfile(2) behaviour on the block devices. Provide valid .fo_sendfile method for several missed struct fileops.
Reviewed by: glebius Sponsored by: The FreeBSD Foundation
|
#
254356 |
|
15-Aug-2013 |
glebius |
Make sendfile() a method in the struct fileops. Currently only vnode backed file descriptors have this method implemented.
Reviewed by: kib Sponsored by: Nginx, Inc. Sponsored by: Netflix
|
#
253106 |
|
09-Jul-2013 |
kib |
There are several code sequences like vfs_busy(mp); vfs_write_suspend(mp); which are problematic if other thread starts unmount between two calls. The unmount starts a write, while vfs_write_suspend() drain writers. On the other hand, unmount drains busy references, causing the deadlock.
Add a flag argument to vfs_write_suspend and require the callers of it to specify VS_SKIP_UNMOUNT flag, when the call is performed not in the mount path, i.e. the covered vnode is not locked. The suspension is not attempted if VS_SKIP_UNMOUNT is specified and unmount is in progress.
Reported and tested by: Andreas Longwitz <longwitz@incore.de> Sponsored by: The FreeBSD Foundation MFC after: 3 weeks
|
#
251184 |
|
31-May-2013 |
jhb |
Style fixes to vn_ioctl().
Suggested by: bde
|
#
250220 |
|
03-May-2013 |
jhb |
Fix FIONREAD on regular files. The computed result was being ignored and it was being passed down to VOP_IOCTL() where it promptly resulted in ENOTTY due to a missing else for the past 8 years. While here, use a shared vnode lock while fetching the current file's size.
MFC after: 1 week
|
#
248933 |
|
30-Mar-2013 |
mdf |
Use a shared lock for VOP_GETEXTATTR, as it is a read-like operation.
MFC after: 1 week
|
#
248329 |
|
15-Mar-2013 |
kib |
Separate the copyright lines and the informational block by a blank line.
Requested by: joel MFC after: 2 weeks
|
#
248325 |
|
15-Mar-2013 |
kib |
Add my copyright for the 2012 year work, in particular vn_io_fault() and f_offset locking. Add required Foundation notice for r248319.
Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
#
248319 |
|
15-Mar-2013 |
kib |
Implement the helper function vn_io_fault_pgmove(), intended to use by the filesystem VOP_READ() and VOP_WRITE() implementations in the same way as vn_io_fault_uiomove() over the unmapped buffers. Helper provides the convenient wrapper over the pmap_copy_pages() for struct uio consumers, taking care of the TDP_UIOHELD situations.
Sponsored by: The FreeBSD Foundation Tested by: pho MFC after: 2 weeks
|
#
248084 |
|
09-Mar-2013 |
attilio |
Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes.
The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs.
The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example).
Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho
|
#
247586 |
|
01-Mar-2013 |
pjd |
Remove unnecessary variables.
|
#
246832 |
|
15-Feb-2013 |
pluknet |
vn_io_faults_cnt: - use u_long consistently - use SYSCTL_ULONG to match the type of variable
Reviewed by: kib MFC after: 1 week
|
#
246169 |
|
31-Jan-2013 |
pjd |
Simplify code a bit. This is leftover after Giant removal from VFS.
|
#
245286 |
|
11-Jan-2013 |
kib |
Add flags argument to vfs_write_resume() and remove vfs_write_resume_flags().
Sponsored by: The FreeBSD Foundation
|
#
244925 |
|
01-Jan-2013 |
kib |
The process_deferred_inactive() function locks the vnodes of the ufs mount, which means that is must not be called while the snaplock is owned. The vfs_write_resume(9) does call the function as the VFS_SUSP_CLEAN() method, which is too early and falls into the region still protected by snaplock.
Add yet another flag for the vfs_write_resume_flags() to avoid calling suspension cleanup handler after the suspend is lifted, and use it in the ffs_snapshot() call to vfs_write_resume.
Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
#
244795 |
|
28-Dec-2012 |
kib |
Make it possible to atomically resume writes on the mount and account the write start, by adding a variation of the vfs_write_resume(9) which accepts flags.
Use the new function to prevent a deadlock between parallel suspension and snapshotting a UFS mount. The ffs_snapshot() code performed vfs_write_resume() followed by vn_start_write() while owning the snaplock. If the suspension intervene between resume and vn_start_write(), the deadlock occured after the suspending thread tried to lock the snaplock, most typically during the write in the ffs_copyonwrite().
Reported and tested by: Andreas Longwitz <longwitz@incore.de> Reviewed by: mckusick MFC after: 2 weeks X-MFC-note: make the vfs_write_resume(9) function a macro after the MFC, in HEAD
|
#
243612 |
|
27-Nov-2012 |
pjd |
- Add NOCAPCHECK flag to namei that allows lookup to work even if the process is in capability mode. - Add VN_OPEN_NOCAPCHECK flag for vn_open_cred() to will ne converted into NOCAPCHECK namei flag.
This functionality will be used to enable core dumps for sandboxed processes.
Reviewed by: rwatson Obtained from: WHEEL Systems MFC after: 2 weeks
|
#
242476 |
|
02-Nov-2012 |
kib |
The r241025 fixed the case when a binary, executed from nullfs mount, was still possible to open for write from the lower filesystem. There is a symmetric situation where the binary could already has file descriptors opened for write, but it can be executed from the nullfs overlay.
Handle the issue by passing one v_writecount reference to the lower vnode if nullfs vnode has non-zero v_writecount. Note that only one write reference can be donated, since nullfs only keeps one use reference on the lower vnode. Always use the lower vnode v_writecount for the checks.
Introduce the VOP_GET_WRITECOUNT to read v_writecount, which is currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT to manipulate the v_writecount value, which manages a single bypass reference to the lower vnode. Caling the VOPs instead of directly accessing v_writecount provide the fix described in the previous paragraph.
Tested by: pho MFC after: 3 weeks
|
#
241896 |
|
22-Oct-2012 |
kib |
Remove the support for using non-mpsafe filesystem modules.
In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems.
The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes.
Conducted and reviewed by: attilio Tested by: pho
|
#
241025 |
|
28-Sep-2012 |
kib |
Fix the mis-handling of the VV_TEXT on the nullfs vnodes.
If you have a binary on a filesystem which is also mounted over by nullfs, you could execute the binary from the lower filesystem, or from the nullfs mount. When executed from lower filesystem, the lower vnode gets VV_TEXT flag set, and the file cannot be modified while the binary is active. But, if executed as the nullfs alias, only the nullfs vnode gets VV_TEXT set, and you still can open the lower vnode for write.
Add a set of VOPs for the VV_TEXT query, set and clear operations, which are correctly bypassed to lower vnode.
Tested by: pho (previous version) MFC after: 2 weeks
|
#
240936 |
|
25-Sep-2012 |
pjd |
vn_write() always expects FOF_OFFSET flag, which is asserted at the begining, so there is no need to check for it.
Sponsored by: FreeBSD Foundation MFC after: 2 weeks
|
#
238952 |
|
31-Jul-2012 |
jhb |
Reorder the managament of advisory locks on open files so that the advisory lock is obtained before the write count is increased during open() and the lock is released after the write count is decreased during close().
The first change closes a race where an open() that will block with O_SHLOCK or O_EXLOCK can increase the write count while it waits. If the process holding the current lock on the file then tries to call exec() on the file it has locked, it can fail with ETXTBUSY even though the advisory lock is preventing other threads from succesfully completeing a writable open().
The second change closes a race where a read-only open() with O_SHLOCK or O_EXLOCK may return successfully while the write count is non-zero due to another descriptor that had the advisory lock and was blocking the open() still being in the process of closing. If the process that completed the open() then attempts to call exec() on the file it locked, it can fail with ETXTBUSY even though the other process that held a write lock has closed the file and released the lock.
Reviewed by: kib MFC after: 1 month
|
#
238029 |
|
02-Jul-2012 |
kib |
Extend the KPI to lock and unlock f_offset member of struct file. It now fully encapsulates all accesses to f_offset, and extends f_offset locking to other consumers that need it, in particular, to lseek() and variants of getdirentries().
Ensure that on 32bit architectures f_offset, which is 64bit quantity, always read and written under the mtxpool protection. This fixes apparently easy to trigger race when parallel lseek()s or lseek() and read/write could destroy file offset.
The already broken ABI emulations, including iBCS and SysV, are not converted (yet).
Tested by: pho No objections from: jhb MFC after: 3 weeks
|
#
237365 |
|
21-Jun-2012 |
kib |
Fix locking for f_offset, vn_read() and vn_write() cases only, for now.
It seems that intended locking protocol for struct file f_offset field was as follows: f_offset should always be changed under the vnode lock (except fcntl(2) and lseek(2) did not followed the rules). Since read(2) uses shared vnode lock, FOFFSET_LOCKED block is additionally taken to serialize shared vnode lock owners.
This was broken first by enabling shared lock on writes, then by fadvise changes, which moved f_offset assigned from under vnode lock, and last by vn_io_fault() doing chunked i/o. More, due to uio_offset not yet valid in vn_io_fault(), the range lock for reads was taken on the wrong region.
Change the locking for f_offset to always use FOFFSET_LOCKED block, which is placed before rangelocks in the lock order.
Extract foffset_lock() and foffset_unlock() functions which implements FOFFSET_LOCKED lock, and consistently lock f_offset with it in the vn_io_fault() both for reads and writes, even if MNTK_NO_IOPF flag is not set for the vnode mount. Indicate that f_offset is already valid for vn_read() and vn_write() calls from vn_io_fault() with FOF_OFFSET flag, and assert that all callers of vn_read() and vn_write() follow this protocol.
Extract get_advice() function to calculate the POSIX_FADV_XXX value for the i/o region, and use it were appropriate.
Reviewed by: jhb Tested by: pho MFC after: 2 weeks
|
#
237274 |
|
19-Jun-2012 |
jhb |
Further refine the implementation of POSIX_FADV_NOREUSE.
First, extend the changes in r230782 to better handle the common case of using NOREUSE with sequential reads. A NOREUSE file descriptor will now track the last implicit DONTNEED request it made as a result of a NOREUSE read. If a subsequent NOREUSE read is adjacent to the previous range, it will apply the DONTNEED request to the entire range of both the previous read and the current read. The effect is that each read of a file accessed sequentially will apply the DONTNEED request to the entire range that has been read. This allows NOREUSE to properly handle misaligned reads by flushing each buffer to cache once it has been completely read.
Second, apply the same changes made to read(2) by r230782 and this change to writes. This provides much better performance in the sequential write case as it allows writes to still be clustered. It also provides much better performance for misaligned writes. It does mean that NOREUSE will be generally ineffective for non-sequential writes as the current implementation relies on a future NOREUSE write's implicit DONTNEED request to flush the dirty buffer from the current write.
MFC after: 2 weeks
|
#
236762 |
|
08-Jun-2012 |
jhb |
Split the second half of vn_open_cred() (after a vnode has been found via a lookup or created via VOP_CREATE()) into a new vn_open_vnode() function and use this function in fhopen() instead of duplicating code from vn_open_cred() directly.
Tested by: pho Reviewed by: kib MFC after: 2 weeks
|
#
236517 |
|
03-Jun-2012 |
kib |
Add a knob to disable vn_io_fault.
MFC after: 1 month
|
#
236516 |
|
03-Jun-2012 |
kib |
Count and export the number of prefaulting happen.
MFC after: 1 month
|
#
236321 |
|
30-May-2012 |
kib |
vn_io_fault() is a facility to prevent page faults while filesystems perform copyin/copyout of the file data into the usermode buffer. Typical filesystem hold vnode lock and some buffer locks over the VOP_READ() and VOP_WRITE() operations, and since page fault handler may need to recurse into VFS to get the page content, a deadlock is possible.
The facility works by disabling page faults handling for the current thread and attempting to execute i/o while allowing uiomove() to access the usermode mapping of the i/o buffer. If all buffer pages are resident, uiomove() is successfull and request is finished. If EFAULT is returned from uiomove(), the pages backing i/o buffer are faulted in and held, and the copyin/out is performed using uiomove_fromphys() over the held pages for the second attempt of VOP call.
Since pages are hold in chunks to prevent large i/o requests from starving free pages pool, and since vnode lock is only taken for i/o over the current chunk, the vnode lock no longer protect atomicity of the whole i/o request. Use newly added rangelocks to provide the required atomicity of i/o regardind other i/o and truncations.
Filesystems need to explicitely opt-in into the scheme, by setting the MNTK_NO_IOPF struct mount flag, and optionally by using vn_io_fault_uiomove(9) helper which takes care of calling uiomove() or converting uio into request for uiomove_fromphys().
Reviewed by: bf (comments), mdf, pjd (previous version) Tested by: pho Tested by: flo, Gustau P?rez <gperez entel upc edu> (previous version) MFC after: 2 months
|
#
236043 |
|
26-May-2012 |
kib |
Add a vn_bmap_seekhole(9) vnode helper which can be used by any filesystem which supports VOP_BMAP(9) to implement SEEK_HOLE/SEEK_DATA commands for lseek(2).
MFC after: 2 weeks
|
#
232701 |
|
08-Mar-2012 |
jhb |
Add KTR_VFS traces to track modifications to a vnode's writecount.
|
#
231949 |
|
21-Feb-2012 |
kib |
Fix found places where uio_resid is truncated to int.
Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from the usermode.
Discussed with: bde, das (previous versions) MFC after: 1 month
|
#
230782 |
|
30-Jan-2012 |
jhb |
Refine the implementation of POSIX_FADV_NOREUSE for the read(2) case such that instead of using direct I/O it allows read-ahead similar to POSIX_FADV_NORMAL, but invokes VOP_ADVISE(POSIX_FADV_DONTNEED) after the read(2) has completed to purge just-read data. The write(2) path continues to use direct I/O for POSIX_FADV_NOREUSE for now. Note that NOREUSE works optimally if an application reads and writes full fs blocks.
|
#
227070 |
|
04-Nov-2011 |
jhb |
Add the posix_fadvise(2) system call. It is somewhat similar to madvise(2) except that it operates on a file descriptor instead of a memory region. It is currently only supported on regular files.
Just as with madvise(2), the advice given to posix_fadvise(2) can be divided into two types. The first type provide hints about data access patterns and are used in the file read and write routines to modify the I/O flags passed down to VOP_READ() and VOP_WRITE(). These modes are thus filesystem independent. Note that to ease implementation (and since this API is only advisory anyway), only a single non-normal range is allowed per file descriptor.
The second type of hints are used to hint to the OS that data will or will not be used. These hints are implemented via a new VOP_ADVISE(). A default implementation is provided which does nothing for the WILLNEED request and attempts to move any clean pages to the cache page queue for the DONTNEED request. This latter case required two other changes. First, a new V_CLEANONLY flag was added to vinvalbuf(). This requests vinvalbuf() to only flush clean buffers for the vnode from the buffer cache and to not remove any backing pages from the vnode. This is used to ensure clean pages are not wired into the buffer cache before attempting to move them to the cache page queue. The second change adds a new vm_object_page_cache() method. This method is somewhat similar to vm_object_page_remove() except that instead of freeing each page in the specified range, it attempts to move clean pages to the cache queue if possible.
To preserve the ABI of struct file, the f_cdevpriv pointer is now reused in a union to point to the currently active advice region if one is present for regular files.
Reviewed by: jilles, kib, arch@ Approved by: re (kib) MFC after: 1 month
|
#
225617 |
|
16-Sep-2011 |
kmacy |
In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls.
Reviewed by: rwatson Approved by: re (bz)
|
#
225166 |
|
25-Aug-2011 |
mm |
Generalize ffs_pages_remove() into vn_pages_remove().
Remove mapped pages for all dataset vnodes in zfs_rezget() using new vn_pages_remove() to fix mmapped files changed by zfs rollback or zfs receive -F.
PR: kern/160035, kern/156933 Reviewed by: kib, pjd Approved by: re (kib) MFC after: 1 week
|
#
224914 |
|
16-Aug-2011 |
kib |
Add the fo_chown and fo_chmod methods to struct fileops and use them to implement fchown(2) and fchmod(2) support for several file types that previously lacked it. Add MAC entries for chown/chmod done on posix shared memory and (old) in-kernel posix semaphores.
Based on the submission by: glebius Reviewed by: rwatson Approved by: re (bz)
|
#
221829 |
|
13-May-2011 |
mdf |
Use a name instead of a magic number for kern_yield(9) when the priority should not change. Fetch the td_user_pri under the thread lock. This is probably not necessary but a magic number also seems preferable to knowing the implementation details here.
Requested by: Jason Behmer < jason DOT behmer AT isilon DOT com >
|
#
218424 |
|
08-Feb-2011 |
mdf |
Based on discussions on the svn-src mailing list, rework r218195:
- entirely eliminate some calls to uio_yeild() as being unnecessary, such as in a sysctl handler.
- move should_yield() and maybe_yield() to kern_synch.c and move the prototypes from sys/uio.h to sys/proc.h
- add a slightly more generic kern_yield() that can replace the functionality of uio_yield().
- replace source uses of uio_yield() with the functional equivalent, or in some cases do not change the thread priority when switching.
- fix a logic inversion bug in vlrureclaim(), pointed out by bde@.
- instead of using the per-cpu last switched ticks, use a per thread variable for should_yield(). With PREEMPTION, the only reasonable use of this is to determine if a lock has been held a long time and relinquish it. Without PREEMPTION, this is essentially the same as the per-cpu variable.
|
#
209543 |
|
26-Jun-2010 |
pjd |
Correct arguments order.
|
#
207725 |
|
06-May-2010 |
trasz |
Avoid overflow.
Submitted by: bde@
|
#
207719 |
|
06-May-2010 |
trasz |
Style fixes and removal of unneeded variable.
Submitted by: bde@
|
#
207662 |
|
05-May-2010 |
trasz |
Move checking against RLIMIT_FSIZE into one place, vn_rlimit_fsize().
Reviewed by: kib
|
#
206129 |
|
03-Apr-2010 |
avg |
vn_stat: take into account va_blocksize when setting st_blksize
As currently st_blksize is always PAGE_SIZE, it is playing safe to not use any smaller value. For some cases this might not be optimal, but at least nothing should get broken.
Generally I don't expect this commit to change much for the following reasons (in case of VREG, VDIR): - application I/O and physical I/O are sufficiently decoupled by filesystem code, buffer cache code, cluster and read-ahead logic - not all applications use st_blksize as a hint, some use f_iosize, some use fixed block sizes
I expect writes to the middle of files on ZFS to benefit the most from this change.
Silence from: fs@ MFC after: 2 weeks
|
#
205792 |
|
28-Mar-2010 |
ed |
Rename st_*timespec fields to st_*tim for POSIX 2008 compliance.
A nice thing about POSIX 2008 is that it finally standardizes a way to obtain file access/modification/change times in sub-second precision, namely using struct timespec, which we already have for a very long time. Unfortunately POSIX uses different names.
This commit adds compatibility macros, so existing code should still build properly. Also change all source code in the kernel to work without any of the compatibility macros. This makes it all a less ambiguous.
I am also renaming st_birthtime to st_birthtim, even though it was a local extension anyway. It seems Cygwin also has a st_birthtim.
|
#
205423 |
|
21-Mar-2010 |
ed |
Actually make O_DIRECTORY work.
According to POSIX open() must return ENOTDIR when the path name does not refer to a path name. Change vn_open() to respect this flag. This also simplifies the Linuxolator a bit.
|
#
200273 |
|
08-Dec-2009 |
trasz |
Don't add VAPPEND if the file is not being opened for writing. Note that this only affects cases where open(2) is being used improperly - i.e. when the user specifies O_APPEND without O_WRONLY or O_RDWR.
Reviewed by: rwatson
|
#
198876 |
|
04-Nov-2009 |
trasz |
Revert r198874, pending further discussion.
|
#
198874 |
|
04-Nov-2009 |
trasz |
Make sure we don't end up with VAPPEND without VWRITE, if someone calls open(2) like this: open(..., O_APPEND).
|
#
197579 |
|
28-Sep-2009 |
delphij |
Add two new fcntls to enable/disable read-ahead:
- F_READAHEAD: specify the amount for sequential access. The amount is specified in bytes and is rounded up to nearest block size. - F_RDAHEAD: Darwin compatible version that use 128KB as the sequential access size.
A third argument of zero disables the read-ahead behavior.
Please note that the read-ahead amount is also constrainted by sysctl variable, vfs.read_max, which may need to be raised in order to better utilize this feature.
Thanks Igor Sysoev for proposing the feature and submitting the original version, and kib@ for his valuable comments.
Submitted by: Igor Sysoev <is rambler-co ru> Reviewed by: kib@ MFC after: 1 month
|
#
196733 |
|
01-Sep-2009 |
kib |
Fix mount reference leak when V_XSLEEP is specified to vn_start_write().
Submitted by: tegge
|
#
196692 |
|
31-Aug-2009 |
kib |
Make the mnt_writeopcount and mnt_secondary_writes counters, used by the suspension code, not greater then mnt_ref reference counter value. Increment mnt_ref together with write counter in vn_start_write()/ vn_start_secondary_write(), releasing in vn_finished_write/vn_finished_secondary_write().
Since r186197, unmount code requires that no writers occured after all references are expired. We still could get write counter incremented for freed or reused struct mount, but it seems to be innocent, since corresponding vnode should be referenced and reclaimed then.
Reported by: pho (last half a year), erwin Reviewed by: attilio Tested by: pho, erwin MFC after: 1 week
|
#
195294 |
|
02-Jul-2009 |
kib |
In vn_vget_ino() and their inline equivalents, mnt_ref() the mount point around the sequence that drop vnode lock and then busies the mount point. Not having vlocked node or direct reference to the mp allows for the forced unmount to proceed, making mp unmounted or reused.
Tested by: pho Reviewed by: jeff Approved by: re (kensmith) MFC after: 2 weeks
|
#
194586 |
|
21-Jun-2009 |
kib |
Add another flags argument to vn_open_cred. Use it to specify that some vn_open_cred invocations shall not audit namei path.
In particular, specify VN_OPEN_NOAUDIT for dotdot lookup performed by default implementation of vop_vptocnp, and for the open done for core file. vn_fullpath is called from the audit code, and vn_open there need to disable audit to avoid infinite recursion. Core file is created on return to user mode, that, in particular, happens during syscall return. The creation of the core file is audited by direct calls, and we do not want to overwrite audit information for syscall.
Reported, reviewed and tested by: rwatson
|
#
193762 |
|
08-Jun-2009 |
ps |
Simply shared vnode locking and extend it to also include fsync. Also, in vop_write, no longer assert for exclusive locks on the vnode.
Reviewed by: jhb, kmacy, jeffr
|
#
193511 |
|
05-Jun-2009 |
rwatson |
Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include.
Discussed with: pjd
|
#
193442 |
|
04-Jun-2009 |
ps |
When checking for shared writes, use the struct mount returned from vn_start_write.
Reviewed by: jhb
|
#
193440 |
|
04-Jun-2009 |
ps |
Support shared vnode locks for write operations when the offset is provided on filesystems that support it. This really improves mysql + innodb performance on ZFS.
Reviewed by: jhb, kmacy, jeffr
|
#
191990 |
|
11-May-2009 |
attilio |
Remove the thread argument from the FSD (File-System Dependent) parts of the VFS. Now all the VFS_* functions and relating parts don't want the context as long as it always refers to curthread.
In some points, in particular when dealing with VOPs and functions living in the same namespace (eg. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP.
While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option.
VFS KPI is heavilly changed by this commit so thirdy parts modules needs to be recompiled. Bump __FreeBSD_version in order to signal such situation.
|
#
191895 |
|
07-May-2009 |
kib |
Eliminate the loop and the call to pause(9) in vfs_vget_ino(). If vfs_busy(MBF_NOWAIT) failed, unlock the vnode and sleep in vfs_busy().
Suggested and reviewed by: jeff Tested by: pho MFC after: 3 weeks
|
#
191028 |
|
13-Apr-2009 |
kmacy |
- use a shared lock for reads - remove stale comment
Reviewed by: jeffr
|
#
190888 |
|
10-Apr-2009 |
rwatson |
Remove VOP_LEASE and supporting functions. This hasn't been used since the removal of NQNFS, but was left in in case it was required for NFSv4. Since our new NFSv4 client and server can't use it for their requirements, GC the old mechanism, as well as other unused lease- related code and interfaces.
Due to its impact on kernel programming and binary interfaces, this change should not be MFC'd.
Proposed by: jeff Reviewed by: jeff Discussed with: rmacklem, zach loafman @ isilon
|
#
189696 |
|
11-Mar-2009 |
jhb |
Add a new internal mount flag (MNTK_EXTENDED_SHARED) to indicate that a filesystem supports additional operations using shared vnode locks. Currently this is used to enable shared locks for open() and close() of read-only file descriptors. - When an ISOPEN namei() request is performed with LOCKSHARED, use a shared vnode lock for the leaf vnode only if the mount point has the extended shared flag set. - Set LOCKSHARED in vn_open_cred() for requests that specify O_RDONLY but not O_CREAT. - Use a shared vnode lock around VOP_CLOSE() if the file was opened with O_RDONLY and the mountpoint has the extended shared flag set. - Adjust md(4) to upgrade the vnode lock on the vnode it gets back from vn_open() since it now may only have a shared vnode lock. - Don't enable shared vnode locks on FIFO vnodes in ZFS and UFS since FIFO's require exclusive vnode locks for their open() and close() routines. (My recent MPSAFE patches for UDF and cd9660 already included this change.) - Enable extended shared operations on UFS, cd9660, and UDF.
Submitted by: ups Reviewed by: pjd (ZFS bits) MFC after: 1 month
|
#
187528 |
|
21-Jan-2009 |
kib |
Move the code from ufs_lookup.c used to do dotdot lookup, into the helper function. It is supposed to be useful for any filesystem that has to unlock dvp to walk to the ".." entry in lookup routine.
Requested by: jhb Tested by: pho MFC after: 1 month
|
#
185431 |
|
29-Nov-2008 |
pjd |
Improve KASSERT() call a bit: - Print flags in hex. - Note that flags can be fine and panic can be due unexpected error condition. - Remove redundant new line character.
Eventhough panic message excess 80 characters keep it in one line so it is easier to grep.
|
#
185011 |
|
16-Nov-2008 |
kib |
Revert r184118. There is actually a code in the kernel, for instance in kern_unlinkat(), that expects that vn_start_write() actually fills the mp even when the call failed.
As Tor noted, that pattern relies on the the type stability of the mount points, as well as that suspended mount points are never freed and V_XSLEEP is always passed to vn_start_write() when called on a freed mount point.
Reported by: stass Reviewed by: tegge PR: 123768
|
#
184600 |
|
03-Nov-2008 |
jhb |
Use shared vnode locks instead of exclusive vnode locks for the access(), chdir(), chroot(), eaccess(), fpathconf(), fstat(), fstatfs(), lseek() (when figuring out the current size of the file in the SEEK_END case), pathconf(), readlink(), and statfs() system calls.
Submitted by: ups (mostly) Tested by: pho MFC after: 1 month
|
#
184554 |
|
02-Nov-2008 |
attilio |
Improve VFS locking: - Implement real draining for vfs consumers by not relying on the mnt_lock and using instead a refcount in order to keep track of lock requesters. - Due to the change above, remove the mnt_lock lockmgr because it is now useless. - Due to the change above, vfs_busy() is no more linked to a lockmgr. Change so its KPI by removing the interlock argument and defining 2 new flags for it: MBF_NOWAIT which basically replaces the LK_NOWAIT of the old version (which was unlinked from the lockmgr alredy) and MBF_MNTLSTLOCK which provides the ability to drop the mountlist_mtx once the mnt interlock is held (ability still desired by most consumers). - The stub used into vfs_mount_destroy(), that allows to override the mnt_ref if running for more than 3 seconds, make it totally useless. Remove it as it was thought to work into older versions. If a problem of "refcount held never going away" should appear, we will need to fix properly instead than trust on such hackish solution. - Fix a bug where returning (with an error) from dounmount() was still leaving the MNTK_MWAIT flag on even if it the waiters were actually woken up. Just a place in vfs_mount_destroy() is left because it is going to recycle the structure in any case, so it doesn't matter. - Remove the markercnt refcount as it is useless.
This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and __FreeBSD_version will be modified accordingly.
Discussed with: kib Tested by: pho
|
#
184413 |
|
28-Oct-2008 |
trasz |
Introduce accmode_t. This is required for NFSv4 ACLs - it will be neccessary to add more V* constants, and the variables changed by this patch were often being assigned to mode_t variables, which is 16 bit.
Approved by: rwatson (mentor)
|
#
184118 |
|
21-Oct-2008 |
kib |
Change vn_start_write() to clear *mpp on all failures when non-NULL vp is supplied, since vm_pageout_scan() expects it to be cleared on error.
Submitted by: tegge PR: 123768 MFC after: 1 week
|
#
184074 |
|
20-Oct-2008 |
kib |
Assert that v_holdcnt is non-zero before entering lockmgr in vn_lock and ffs_lock. This cannot catch situations where holdcnt is incremented not by curthread, but I think it is useful.
Reviewed by: tegge, attilio Tested by: pho MFC after: 2 weeks
|
#
183213 |
|
20-Sep-2008 |
kib |
Initialize va_rdev to NODEV and va_fsid to VNOVAL before the VOP_GETATTR() call in vn_stat(). Thus if a file system doesn't initialize those fields in VOP_GETATTR() they will have a sane default value.
Submitted by: Jaakko Heinonen <jh saunalahti fi> Discussed on: freebsd-fs MFC after: 1 month
|
#
183211 |
|
20-Sep-2008 |
kib |
Initialize birthtime fields in vn_stat() to prevent stat(2) from returning uninitialized birthtime. Most file systems don't initialize birthtime properly in their VOP_GETTATTR().
Submitted by: Jaakko Heinonen <jh saunalahti fi> Reviewed by: bde Discussed on: freebsd-fs MFC after: 1 month
|
#
183073 |
|
16-Sep-2008 |
kib |
When attempt is made to suspend a filesystem that is already syspended, wait until the current suspension is lifted instead of silently returning success immediately. The consequences of calling vfs_write() resume when not owning the suspension are not well-defined at best.
Add the vfs_susp_clean() mount method to be called from vfs_write_resume(). Set it to process_deferred_inactive() for ffs, and stop calling it manually.
Add the thread flag TDP_IGNSUSP that allows to bypass the suspension point in the vn_start_write. It is intended for use by VFS in the situations where the suspender want to do some i/o requiring calls to vn_start_write(), and this i/o cannot be done later.
Reviewed by: tegge In collaboration with: pho MFC after: 1 month
|
#
183071 |
|
16-Sep-2008 |
kib |
Garbage-collect vn_write_suspend_wait().
Suggested and reviewed by: tegge Tested by: pho MFC after: 1 month
|
#
182371 |
|
28-Aug-2008 |
attilio |
Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and totally unuseful.
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
|
#
181254 |
|
03-Aug-2008 |
rwatson |
Remove broken code to replace st_mode value with ACCESSPERMS when lstat(2) is called on symlinks -- this code appears never to have worked. The PR this addresses suggests that the intended original behavior is the right one, but as bde points out in the PR comments, we do actually support storing a mode on symlinks, so returning it seems reasonable.
This is consistent with Mac OS X, which despite documentation to the contrary does return the mode set on a symlink, but not some other platforms. The Single Unix Spec requires only that the returned bits be "meaningful", which seems at best unhelpful as advice goes.
PR: 25018 MFC after: 3 days
|
#
177784 |
|
31-Mar-2008 |
kib |
Add the support for the O_EXEC open(2) mode, as specified by the POSIX Extended API Set Part 2 extension specification.
Reviewed by: rwatson, rdivacky Tested by: pho
|
#
177727 |
|
29-Mar-2008 |
jeff |
- Don't allow calls to vn_lock() with no lock type requested. Callers which simply want a reference should use vref(). Callers which want to check validity need to hold a lock while performing any action based on that validity. vn_lock() would always release the interlock before returning making any action synchronous with the validity check impossible.
|
#
177538 |
|
24-Mar-2008 |
jeff |
- Don't acquire the vnode interlock in _vn_lock() unless no lock type is requested. Handle this case specially before the while loop. - Use the held vnode lock to check for VI_DOOMED. The vnode lock and interlock must both be held to set VI_DOOMED so either one held, even shared, is sufficient to check it.
No objection by: kib
|
#
175294 |
|
13-Jan-2008 |
attilio |
VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary.
KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed.
Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
|
#
175202 |
|
10-Jan-2008 |
attilio |
vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed.
Manpage and FreeBSD_version will be updated through further commits.
As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock.
Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
|
#
175140 |
|
07-Jan-2008 |
jhb |
Make ftruncate a 'struct file' operation rather than a vnode operation. This makes it possible to support ftruncate() on non-vnode file types in the future. - 'struct fileops' grows a 'fo_truncate' method to handle an ftruncate() on a given file descriptor. - ftruncate() moves to kern/sys_generic.c and now just fetches a file object and invokes fo_truncate(). - The vnode-specific portions of ftruncate() move to vn_truncate() in vfs_vnops.c which implements fo_truncate() for vnode file types. - Non-vnode file types return EINVAL in their fo_truncate() method.
Submitted by: rwatson
|
#
175106 |
|
05-Jan-2008 |
bde |
In sequential_heuristic(): - spell 16384 as 16384 and not as BKVASIZE. 16384 is (not quite) just a magic size that works well in practice. BKVASIZE should be MAXBSIZE (65536), but is 16384 because i386's don't have enough kva for it to be MAXBSIZE; 16384 works (not so well) for it for much the same reasons that it works well in the heuristic. - expand and/or add comments about this and other details. - don't explicitly inline this function. - fix some other style bugs.
|
#
174988 |
|
30-Dec-2007 |
jeff |
Remove explicit locking of struct file. - Introduce a finit() which is used to initailize the fields of struct file in such a way that the ops vector is only valid after the data, type, and flags are valid. - Protect f_flag and f_count with atomic operations. - Remove the global list of all files and associated accounting. - Rewrite the unp garbage collection such that it no longer requires the global list of all files and instead uses a list of all unp sockets. - Mark sockets in the accept queue so we don't incorrectly gc them.
Tested by: kris, pho
|
#
172930 |
|
24-Oct-2007 |
rwatson |
Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms:
mac_<object>_<method/action> mac_<object>_check_<method/action>
The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names.
All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI.
Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer
|
#
171599 |
|
26-Jul-2007 |
pjd |
When we do open, we should lock the vnode exclusively. This fixes few races: - fifo race, where two threads assign v_fifoinfo, - v_writecount modifications, - v_object modifications, - and probably more...
Discussed with: kib, ups Approved by: re (rwatson)
|
#
170152 |
|
31-May-2007 |
kib |
Revert UF_OPENING workaround for CURRENT. Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file.
Proposed and reviewed by: jhb Reviewed by: daichi (unionfs) Approved by: re (kensmith)
|
#
169671 |
|
18-May-2007 |
kib |
Since renaming of vop_lock to _vop_lock, pre- and post-condition function calls are no more generated for vop_lock. Rename _vop_lock to vop_lock1 to satisfy tools/vnode_if.awk assumption about vop naming conventions. This restores pre/post-condition calls.
|
#
169658 |
|
17-May-2007 |
peter |
Eliminate a micro-optimization that hasn't had any effect for 15+ years.
|
#
166674 |
|
12-Feb-2007 |
mpp |
Add a VNASSERT to vn_close to detect if v_writecount is going to become negative. This will detect the underflow when it happens, instead of having it discovered when the vnode is taken off the freelist, long after the offending process is long gone.
|
#
164248 |
|
13-Nov-2006 |
kmacy |
change vop_lock handling to allowing tracking of callers' file and line for acquisition of lockmgr locks
Approved by: scottl (standing in for mentor rwatson)
|
#
164033 |
|
06-Nov-2006 |
rwatson |
Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking.
Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>
|
#
163606 |
|
22-Oct-2006 |
rwatson |
Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead.
This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd.
Obtained from: TrustedBSD Project Sponsored by: SPARTA
|
#
159917 |
|
24-Jun-2006 |
pjd |
Simplify the code and remove two mutex operations.
MFC after: 2 weeks
|
#
158642 |
|
16-May-2006 |
ps |
Allow concurrent read(2)/readv(2) access to a file. Lock file offset against multiple read calls.
Submitted by: ups Obtained from: Yahoo! MFC after: 2 weeks
|
#
158128 |
|
28-Apr-2006 |
pjd |
vn_start_write() is called only when v_type != VCHR, so corresponding vn_finished_write() should also be called only then.
BTW. I fixed two functions here: vn_rdwr() and vn_write(). The latter seems to be unused.
MFC after: 3 weeks
|
#
157325 |
|
31-Mar-2006 |
jeff |
- Release the references acquired by VOP_GETWRITEMOUNT and vfs_getvfs().
Discussed with: tegge Tested by: kris Sponsored by: Isilon Systems, Inc.
|
#
157230 |
|
28-Mar-2006 |
jhb |
Change vn_open() to honor the MPSAFE flag in the passed in nameidata object and use that instead of testing fdidx against -1 to determine if it should release Giant if Giant was locked due to the requested file residing on a non-MPSAFE VFS.
Discussed with: jeff
|
#
156978 |
|
22-Mar-2006 |
jeff |
- Remove explicit giant acquires and replace it with VFS_LOCK_GIANT.
Sponsored by: Isilon Systems, Inc.
|
#
156574 |
|
11-Mar-2006 |
csjp |
Make sure that we are adding a path token to the audit record in open(2). Do this by making sure we are using the AUDITVNODE1 mask in the namei flags.
Obtained from: TrustedBSD Project
|
#
156560 |
|
11-Mar-2006 |
tegge |
Block secondary writes while expunging active unlinked files.
Fix detection of active unlinked files by checking VI_OWEINACT and VI_DOINGINACT in addition to v_usecount.
Defer inactive handling for unlinked files if the file system is mostly suspended (secondary writes being blocked).
Perform deferred inactive handling after the file system is resumed.
|
#
156451 |
|
08-Mar-2006 |
tegge |
Use vn_start_secondary_write() and vn_finished_secondary_write() as a replacement for vn_write_suspend_wait() to better account for secondary write processing.
Close race where secondary writes could be started after ffs_sync() returned but before the file system was marked as suspended.
Detect if secondary writes or softdep processing occurred during vnode sync loop in ffs_sync() and retry the loop if needed.
|
#
153400 |
|
14-Dec-2005 |
des |
Eradicate caddr_t from the VFS API.
|
#
148668 |
|
03-Aug-2005 |
jeff |
- Replace the series of DEBUG_LOCKS hacks which tried to save the vn_lock caller by saving the stack of the last locker/unlocker in lockmgr. We also put the stack in KTR at the moment.
Contributed by: Antoine Brodin <antoine.brodin@laposte.net>
|
#
147352 |
|
14-Jun-2005 |
jeff |
- Remove vnode lock asserts at the end of vfs syscalls. These asserts were used to ensure that we weren't exiting the syscall with a lock still held. This wasn't safe, however, because we'd already executed a vput() and on a loaded system the vnode may have been free'd by the time we assert. This functionality is also handled by the td_locks assert in userret, which doesn't tell you what the syscall was, but will at least panic before you deadlock.
Sponsored by: Isilon Systems, Inc. Discovred by: Peter Holm Approved by: re (blanket vfs)
|
#
147328 |
|
13-Jun-2005 |
jeff |
- It has long been my suspicion that we don't actually need a loop in vn_lock(). Add an assert that will help me gain more confidence that this is correct.
Sponsored by: Isilon Systems, Inc.
|
#
145587 |
|
27-Apr-2005 |
jeff |
- Stop checking vxthread, we've asserted that it was useless for several weeks.
|
#
145584 |
|
27-Apr-2005 |
jeff |
- Pass the ISOPEN flag to namei so filesystems will know we're about to open them or otherwise access the data.
|
#
144899 |
|
11-Apr-2005 |
jeff |
- Assert that we're no longer doing recursive vn_locks in inactive/reclaim as I'd like to get rid of the vxthread. - Handle lock requests which don't actually want a lock as this is a much more convenient place to handle this condition than in vget(). These requests simply want to know that VI_DOOMED isn't set. - Correct a test at the end of vn_lock, if error !=0 should be if error == 0, this has been broken since I comitted the VI_DOOMED changes, but no one ran into it because vget() duplicated this functionality.
Sponsored by: Isilon Systems, Inc.
|
#
144643 |
|
05-Apr-2005 |
csjp |
Assert that the vnode is locked. This is meant to catch bugs or mis-use of the vnode API in conditions where IO_NODELOCKED has been used without the vnode actually being locked.
|
#
144367 |
|
31-Mar-2005 |
jeff |
- LK_NOPAUSE is a nop now.
Sponsored by: Isilon Systems, Inc.
|
#
144050 |
|
24-Mar-2005 |
jeff |
- Remove some long dead LOOKUP_SHARED code that tracked the lock state. - Always pass LOCKSHARED and rely on namei() to ignore it when LOOKUP_SHARED is not set.
Sponsored by: Isilon Systems, Inc.
|
#
143498 |
|
13-Mar-2005 |
jeff |
- Do a vn_start_write in vn_close, we may write if this is the last ref on an unlinked file. We can't know if this is the case until after we have the lock. - Lock the vnode in vn_close, many filesystems had code which was unsafe without the lock held, and holding it greatly simplifies vgone(). - Adjust vn_lock() to check for the VI_DOOMED flag where appropriate.
Sponsored by: Isilon Systems, Inc.
|
#
142348 |
|
24-Feb-2005 |
csjp |
Add locking assertions into vn_extattr_set, vn_extattr_get and vn_extattr_rm. This is meant to catch conditions where IO_NODELOCKED has been specified without the vnode being locked.
Discussed with: rwatson MFC after: 1 week
|
#
142011 |
|
17-Feb-2005 |
phk |
Introduce vx_wait{l}() and use it instead of home-rolled versions.
|
#
140779 |
|
24-Jan-2005 |
phk |
Don't call VOP_CREATEVOBJECT(), it's the responsibility of the filesystem which owns the vnode.
|
#
140716 |
|
24-Jan-2005 |
jeff |
- Remove GIANT_REQUIRED where giant is no longer required. - Protect access to mnt_kern_flag with the mountpoint mutex. - Use the appropriate nd flags to deal with giant in vn_open_cred(). We currently determine whether the caller is mpsafe by checking for a valid fdidx. Any caller coming from user-space is now mpsafe and supplies a valid fd. No kenrel callers have been converted to mpsafe, so this check is sufficient for now. - Use VFS_LOCK_GIANT instead of manual giant acquisition where appropriate.
Sponsored By: Isilon Systems, Inc.
|
#
140181 |
|
13-Jan-2005 |
phk |
Ditch vfs_object_create() and make the callers call VOP_CREATEVOBJECT() directly.
|
#
140048 |
|
11-Jan-2005 |
phk |
Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC().
I'm not sure why a credential was added to these in the first place, it is not used anywhere and it doesn't make much sense:
The credentials for syncing a file (ability to write to the file) should be checked at the system call level.
Credentials for syncing one or more filesystems ("none") should be checked at the system call level as well.
If the filesystem implementation needs a particular credential to carry out the syncing it would logically have to the cached mount credential, or a credential cached along with any delayed write data.
Discussed with: rwatson
|
#
139804 |
|
06-Jan-2005 |
imp |
/* -> /*- for copyright notices, minor format tweaks as necessary
|
#
137866 |
|
18-Nov-2004 |
phk |
Ok, first blunder: ioctls are not entirely unused on vnodes anymore :-)
Add dropped call to VOP_IOCTL().
|
#
137806 |
|
17-Nov-2004 |
phk |
Push Giant down through ioctl.
Don't grab Giant in the upper syscall/wrapper code
NET_LOCK_GIANT in the socket code (sockets/fifos).
mtx_lock(&Giant) in the vnode code.
mtx_lock(&Giant) in the opencrypto code. (This may actually not be needed, but better safe than sorry).
Devfs grabs Giant if the driver is marked as needing Giant.
|
#
137805 |
|
17-Nov-2004 |
phk |
Push Giant down through select and poll.
Don't grab Giant in the upper syscall/wrapper code
NET_LOCK_GIANT in the socket code (sockets/fifos).
mtx_lock(&Giant) in the vnode code.
Devfs grabs Giant if the driver is marked as needing Giant.
|
#
137753 |
|
15-Nov-2004 |
phk |
Give vn_poll single exit point (to make it easier to insert "mtx_unlock(&Giant)" real soon now).
|
#
137508 |
|
10-Nov-2004 |
phk |
Slim vnodes by another four bytes by eliminating the (now) unused field v_cachedid.
|
#
137483 |
|
09-Nov-2004 |
phk |
Remove vnode->v_cachedfs.
It was only used for the highly dangerous "export all vnodes with a sysctl" function.
|
#
136966 |
|
26-Oct-2004 |
phk |
Put the I/O block size in bufobj->bo_bsize.
We keep si_bsize_phys around for now as that is the simplest way to pull the number out of disk device drivers in devfs_open(). The correct solution would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably mooth when filesystems sit on GEOM, so don't bother for now.
|
#
136964 |
|
26-Oct-2004 |
phk |
Remove unused si_bsize_best field from struct cdev.
|
#
135709 |
|
24-Sep-2004 |
phk |
Hold dev_lock and check for NULL devsw pointer when we service FIODTYPE ioctl.
|
#
135540 |
|
21-Sep-2004 |
phk |
If a vnode has no v_rdev we cannot hope to answer FIODTYPE ioctl.
|
#
133741 |
|
15-Aug-2004 |
jmg |
Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowlege of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using it's filter ops.
Currently, it uses the MTX_DUPOK even though it is not always safe to aquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases).
Reviewed by: green, rwatson (both earlier versions)
|
#
133236 |
|
06-Aug-2004 |
rwatson |
Flag a broad range of VFS operations as GIANT_REQUIRED in order to catch leaking into VFS without Giant.
Inch Giant a little lower in several file descriptor operations on vnodes to cover only VFS operations that need it, rather than file flag reading, etc.
|
#
132554 |
|
22-Jul-2004 |
rwatson |
Push Giant acquisition down into fo_stat() from most callers. Acquire Giant conditional on debug.mpsafenet in the socket soo_stat() routine, unconditionally in vn_statfile() for VFS, and otherwise don't acquire Giant. Accept an unlocked read in kqueue_stat(), and cryptof_stat() is a no-op. Don't acquire Giant in fstat() system call.
Note: in fdescfs, fo_stat() is called while holding Giant due to the VFS stack sitting on top, and therefore there will still be Giant recursion in this case.
|
#
132549 |
|
22-Jul-2004 |
rwatson |
Push acquisition of Giant from fdrop_closed() into fo_close() so that individual file object implementations can optionally acquire Giant if they require it:
- soo_close(): depends on debug.mpsafenet - pipe_close(): Giant not acquired - kqueue_close(): Giant required - vn_close(): Giant required - cryptof_close(): Giant required (conservative)
Notes:
Giant is still acquired in close() even when closing MPSAFE objects due to kqueue requiring Giant in the calling closef() code. Microbenchmarks indicate that this removal of Giant cuts 3%-3% off of pipe create/destroy pairs from user space with SMP compiled into the kernel.
The cryptodev and opencrypto code appears MPSAFE, but I'm unable to test it extensively and so have left Giant over fo_close(). It can probably be removed given some testing and review.
|
#
131934 |
|
10-Jul-2004 |
marcel |
Update for the KDB framework: o Call kdb_enter() instead of Debugger().
|
#
130101 |
|
05-Jun-2004 |
tjr |
Change the types of vn_rdwr_inchunks()'s len and aresid arguments to size_t and size_t *, respectively. Update callers for the new interface. This is a better fix for overflows that occurred when dumping segments larger than 2GB to core files.
|
#
129973 |
|
01-Jun-2004 |
rwatson |
Rather than assert f_type==DTYPE_VNODE, conditionally perform the file lock release based on f_type==DTYPE_VNODE. vn_closefile() is used by non-vnode types as well (fifo).
|
#
129948 |
|
01-Jun-2004 |
rwatson |
Push the VOP_ADVLOCK() call to release advisory locks on vnode file descriptors out of fdrop_locked() and into vn_closefile(). This removes all knowledge of vnodes from fdrop_locked(), since the lock behavior was specific to vnodes. This also removes the specific requirement for Giant in fdrop_locked(), it's now only required by code that it calls into.
Add GIANT_REQUIRED to vn_closefile() since VFS requires Giant.
|
#
129904 |
|
31-May-2004 |
rwatson |
Assert Giant in vn_start_write() and vn_finished_write().
|
#
127911 |
|
05-Apr-2004 |
imp |
Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999.
Approved by: core
|
#
126902 |
|
13-Mar-2004 |
bde |
Align the offset in vn_rdwr_inchunks() so that at most the first and the last chunk are misaligned relative to a MAXBSIZE byte boundary. vn_rdwr_inchunks() is used mainly for elf core dumps, and elf sections are usually perfectly misaligned relative to MAXBSIZE, and chunking prevents the file system from doing much realigning.
This gives a surprisingly large speedup for core dumps -- from 50 to 13 seconds for a 512MB core dump here. The pessimization was mostly from an interaction of the misalignment with IO_DIRECT. It increased the number of i/o's for each chunk by a factor of 5 (3 writes and 2 read-before-writes instead of 1 write).
|
#
123932 |
|
28-Dec-2003 |
bde |
v_vxproc was a bogus name for a thread (pointer).
|
#
120743 |
|
04-Oct-2003 |
jeff |
- If we are called with LK_NOWAIT in vn_lock() we may be holding a mutex and should not sleep while waiting for XLOCK to clear. Care needs to be taken in functions that use this capability to avoid spinning.
|
#
118131 |
|
28-Jul-2003 |
rwatson |
Rename VOP_RMEXTATTR() to VOP_DELETEEXTATTR() for consistency with the kernel ACL interfaces and system call names.
Break out UFS2 and FFS extattr delete and list vnode operations from setextattr and getextattr to deleteextattr and listextattr, which cleans up the implementations, and makes the results more readable, and makes the APIs more clear.
Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
#
118097 |
|
27-Jul-2003 |
phk |
Pass the fdidx argument from vn_open{_cred}() onto VOP_OPEN()
|
#
118094 |
|
27-Jul-2003 |
phk |
Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout.
|
#
118047 |
|
26-Jul-2003 |
phk |
Add a "int fd" argument to VOP_OPEN() which in the future will contain the filedescriptor number on opens from userland.
The index is used rather than a "struct file *" since it conveys a bit more information, which may be useful to in particular fdescfs and /dev/fd/*
For now pass -1 all over the place.
|
#
116699 |
|
22-Jun-2003 |
rwatson |
Prefer the vop_rmextattr() vnode operation for removing extended attributes from objects over vop_setextattr() with a NULL uio; if the file system doesn't support the vop_rmextattr() method, fall back to the vop_setextattr() method.
Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
#
116678 |
|
22-Jun-2003 |
phk |
Add a f_vnode field to struct file.
Several of the subtypes have an associated vnode which is used for stuff like the f*() functions.
By giving the vnode a speparate field, a number of checks for the specific subtype can be replaced simply with a check for f_vnode != NULL, and we can later free f_data up to subtype specific use.
At this point in time, f_data still points to the vnode, so any code I might have overlooked will still work.
|
#
116550 |
|
18-Jun-2003 |
phk |
Introduce a new flag on a file descriptor: DFLAG_SEEKABLE and use that rather than assume that only DTYPE_VNODE is seekable.
|
#
116546 |
|
18-Jun-2003 |
phk |
Initialize struct fileops with C99 sparse initialization.
|
#
116182 |
|
11-Jun-2003 |
obrien |
Use __FBSDID().
|
#
115791 |
|
04-Jun-2003 |
rwatson |
Assert the vnode lock when returning successfully from vn_open_cred().
|
#
114216 |
|
29-Apr-2003 |
kan |
Deprecate machine/limits.h in favor of new sys/limits.h. Change all in-tree consumers to include <sys/limits.h>
Discussed on: standards@ Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>
|
#
112684 |
|
26-Mar-2003 |
tegge |
fp->f_offset doesn't need any protection when it isn't accessed.
|
#
110908 |
|
15-Feb-2003 |
alfred |
Do not allow kqueues to be passed via unix domain sockets.
|
#
109153 |
|
13-Jan-2003 |
dillon |
Bow to the whining masses and change a union back into void *. Retain removal of unnecessary casts and throw in some minor cleanups to see if anyone complains, just for the hell of it.
|
#
109123 |
|
12-Jan-2003 |
dillon |
Change struct file f_data to un_data, a union of the correct struct pointer types, and remove a huge number of casts from code using it.
Change struct xfile xf_data to xun_data (ABI is still compatible).
If we need to add a #define for f_data and xf_data we can, but I don't think it will be necessary. There are no operational changes in this commit.
|
#
108897 |
|
07-Jan-2003 |
green |
In vn_open(), unset ndp->ni_vp when returning failure so that code which expects it to be NULL unless the return value was 0 will work.
Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
#
108356 |
|
28-Dec-2002 |
dillon |
Abstract-out the constants for the sequential heuristic.
No operational changes.
MFC after: 1 day
|
#
108255 |
|
24-Dec-2002 |
phk |
White-space changes.
|
#
108238 |
|
23-Dec-2002 |
phk |
Detediousficate declaration of fileops array members by introducing typedefs for them.
|
#
105902 |
|
25-Oct-2002 |
mckusick |
Within ufs, the ffs_sync and ffs_fsync functions did not always check for and/or report I/O errors. The result is that a VFS_SYNC or VOP_FSYNC called with MNT_WAIT could loop infinitely on ufs in the presence of a hard error writing a disk sector or in a filesystem full condition. This patch ensures that I/O errors will always be checked and returned. This patch also ensures that every call to VFS_SYNC or VOP_FSYNC with MNT_WAIT set checks for and takes appropriate action when an error is returned.
Sponsored by: DARPA & NAI Labs.
|
#
105475 |
|
19-Oct-2002 |
rwatson |
Drop in the MAC check for file creation as part of open().
Approved by: re Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
#
104017 |
|
26-Sep-2002 |
phk |
Under DIAGNOSTIC, complain if ENOIOCTL leaks out through VOP_IOCTL().
|
#
102412 |
|
25-Aug-2002 |
charnier |
Replace various spelling with FALLTHROUGH which is lint()able
|
#
102297 |
|
23-Aug-2002 |
jeff |
- Fix a mistake in my last few commits. The PDROP flag stops msleep from re-acquiring the mutex.
Pointy hat to: me Noticed by: tegge
|
#
102256 |
|
22-Aug-2002 |
jeff |
- Closer inspection revealed a possible deadlock situation in vn_lock() that was introduced by my last commit but not caught by stress testing. Fix that and slightly restructure the code so that it is more readable.
|
#
102255 |
|
22-Aug-2002 |
jeff |
- Make vn_lock() vget() and VOP_LOCK() all behave the same way WRT LK_INTERLOCK. The interlock will never be held on return from these functions even when there is an error. Errors typically only occur when the XLOCK is held which means this isn't the vnode we want anyway. Almost all users of these interfaces expected this behavior even though it was not provided before.
|
#
102254 |
|
22-Aug-2002 |
jeff |
- Return two shared locks to exclusive locks. This was premature. - Document the problems that prevent us from using shared locks.
|
#
102252 |
|
22-Aug-2002 |
jeff |
- Fix interlock handling in vn_lock(). Previously, vn_lock() could return with interlock held in error conditions when the caller did not specify LK_INTERLOCK. - Add several comments to vn_lock() describing the rational behind the code flow since it was not immediately obvious.
|
#
102214 |
|
21-Aug-2002 |
jeff |
- Document two cases, one in vget and the other in vn_lock, where the state of interlock on exit is not consistent. There are probably several bugs relating to this.
|
#
102129 |
|
19-Aug-2002 |
rwatson |
Pass active_cred and file_cred into the MAC framework explicitly for mac_check_vnode_{poll,read,stat,write}(). Pass in fp->f_cred when calling these checks with a struct file available. Otherwise, pass NOCRED. All currently MAC policies use active_cred, but could now offer the cached credential semantic used for the base system security model.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
102112 |
|
19-Aug-2002 |
rwatson |
Break out mac_check_vnode_op() into three seperate checks: mac_check_vnode_poll(), mac_check_vnode_read(), mac_check_vnode_write(). This improves the consistency with other existing vnode checks, and allows policies to avoid implementing switch statements to determine what operations they do and do not want to authorize.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
102003 |
|
17-Aug-2002 |
rwatson |
In continuation of early fileop credential changes, modify fo_ioctl() to accept an 'active_cred' argument reflecting the credential of the thread initiating the ioctl operation.
- Change fo_ioctl() to accept active_cred; change consumers of the fo_ioctl() interface to generally pass active_cred from td->td_ucred. - In fifofs, initialize filetmp.f_cred to ap->a_cred so that the invocations of soo_ioctl() are provided access to the calling f_cred. Pass ap->a_td->td_ucred as the active_cred, but note that this is required because we don't yet distinguish file_cred and active_cred in invoking VOP's. - Update kqueue_ioctl() for its new argument. - Update pipe_ioctl() for its new argument, pass active_cred rather than td_ucred to MAC for authorization. - Update soo_ioctl() for its new argument. - Update vn_ioctl() for its new argument, use active_cred rather than td->td_ucred to authorize VOP_IOCTL() and the associated VOP_GETATTR().
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
101983 |
|
16-Aug-2002 |
rwatson |
Make similar changes to fo_stat() and fo_poll() as made earlier to fo_read() and fo_write(): explicitly use the cred argument to fo_poll() as "active_cred" using the passed file descriptor's f_cred reference to provide access to the file credential. Add an active_cred argument to fo_stat() so that implementers have access to the active credential as well as the file credential. Generally modify callers of fo_stat() to pass in td->td_ucred rather than fp->f_cred, which was redundantly provided via the fp argument. This set of modifications also permits threads to perform these operations on behalf of another thread without modifying their credential.
Trickle this change down into fo_stat/poll() implementations:
- badfo_poll(), badfo_stat(): modify/add arguments. - kqueue_poll(), kqueue_stat(): modify arguments. - pipe_poll(), pipe_stat(): modify/add arguments, pass active_cred to MAC checks rather than td->td_ucred. - soo_poll(), soo_stat(): modify/add arguments, pass fp->f_cred rather than cred to pru_sopoll() to maintain current semantics. - sopoll(): moidfy arguments. - vn_poll(), vn_statfile(): modify/add arguments, pass new arguments to vn_stat(). Pass active_cred to MAC and fp->f_cred to VOP_POLL() to maintian current semantics. - vn_close(): rename cred to file_cred to reflect reality while I'm here. - vn_stat(): Add active_cred and file_cred arguments to vn_stat() and consumers so that this distinction is maintained at the VFS as well as 'struct file' layer. Pass active_cred instead of td->td_ucred to MAC and to VOP_GETATTR() to maintain current semantics.
- fifofs: modify the creation of a "filetemp" so that the file credential is properly initialized and can be used in the socket code if desired. Pass ap->a_td->td_ucred as the active credential to soo_poll(). If we teach the vnop interface about the distinction between file and active credentials, we would use the active credential here.
Note that current inconsistent passing of active_cred vs. file_cred to VOP's is maintained. It's not clear why GETATTR would be authorized using active_cred while POLL would be authorized using file_cred at the file system level.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
101941 |
|
15-Aug-2002 |
rwatson |
In order to better support flexible and extensible access control, make a series of modifications to the credential arguments relating to file read and write operations to cliarfy which credential is used for what:
- Change fo_read() and fo_write() to accept "active_cred" instead of "cred", and change the semantics of consumers of fo_read() and fo_write() to pass the active credential of the thread requesting an operation rather than the cached file cred. The cached file cred is still available in fo_read() and fo_write() consumers via fp->f_cred. These changes largely in sys_generic.c.
For each implementation of fo_read() and fo_write(), update cred usage to reflect this change and maintain current semantics:
- badfo_readwrite() unchanged - kqueue_read/write() unchanged pipe_read/write() now authorize MAC using active_cred rather than td->td_ucred - soo_read/write() unchanged - vn_read/write() now authorize MAC using active_cred but VOP_READ/WRITE() with fp->f_cred
Modify vn_rdwr() to accept two credential arguments instead of a single credential: active_cred and file_cred. Use active_cred for MAC authorization, and select a credential for use in VOP_READ/WRITE() based on whether file_cred is NULL or not. If file_cred is provided, authorize the VOP using that cred, otherwise the active credential, matching current semantics.
Modify current vn_rdwr() consumers to pass a file_cred if used in the context of a struct file, and to always pass active_cred. When vn_rdwr() is used without a file_cred, pass NOCRED.
These changes should maintain current semantics for read/write, but avoid a redundant passing of fp->f_cred, as well as making it more clear what the origin of each credential is in file descriptor read/write operations.
Follow-up commits will make similar changes to other file descriptor operations, and modify the MAC framework to pass both credentials to MAC policy modules so they can implement either semantic for revocation.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
101741 |
|
12-Aug-2002 |
rwatson |
Implement IO_NOMACCHECK in vn_rdwr() -- perform MAC checks (assuming 'options MAC') as long as IO_NOMACCHECK is not set in the IO flags. If IO_NOMACCHECK is set, bypass MAC checks in vn_rdwr(). This allows vn_rdwr() to be used as a utility function inside of file systems where MAC checks have already been performed, or where the operation is being done on behalf of the kernel not the user.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI LAbs
|
#
101525 |
|
08-Aug-2002 |
rwatson |
Due to layering problems, remove the MAC checks from vn_rdwr() -- this VOP wrapper is called from within file systems so can result in odd loopback effects when MAC enforcement is use with the active (as opposed to saved) credential. These checks will be moved elsewhere.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
101308 |
|
04-Aug-2002 |
jeff |
- Replace v_flag with v_iflag and v_vflag - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking.
Idea stolen from: BSD/OS
|
#
101175 |
|
01-Aug-2002 |
rwatson |
Since we have the struct file data pointer cached in vp, use that instead when invoking VOP_POLL().
|
#
101166 |
|
01-Aug-2002 |
rwatson |
Introduce support for Mandatory Access Control and extensible kernel access control
Invoke appropriate MAC framework entry points to authorize a number of vnode operations, including read, write, stat, poll. This permits MAC policies to revoke access to files following label changes, and to limit information spread about the file to user processes.
Note: currently the file cached credential is used for some of these authorization check. We will need to expand some of the MAC entry point APIs to permit multiple creds to be passed to the access control check to allow diverse policy behavior.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
101164 |
|
01-Aug-2002 |
rwatson |
Introduce support for Mandatory Access Control and extensible kernel access control.
Restructure the vn_open_cred() access control checks to invoke the MAC entry point for open authorization. Note that MAC can reject open requests where existing DAC code skips the open authorization check due to O_CREAT. However, the failure mode here is the same as other failure modes following creation, wherein an empty file may be left behind.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
101039 |
|
31-Jul-2002 |
des |
Introduce struct xvnode, which will be used instead of struct vnode for sysctl purposes. Also add two fields to struct vnode, v_cachedfs and v_cachedid, which hold the vnode's device and file id and are filled in by vn_open_cred() and vn_stat().
Sponsored by: DARPA, NAI Labs
|
#
100496 |
|
22-Jul-2002 |
rwatson |
Set VAPPEND in open mode when O_APPEND is specified as an argument to open() of fhopen(). Currently this has no actual affect due to the treatment of VAPPEND in vaccess() and vaccess_acl() as a subset of VWRITE, but when MAC comes in, MAC will distinguish the two. Note: if any file systems are cutting their own permission models, they may wish to now take this into account.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
100201 |
|
16-Jul-2002 |
mckusick |
Change the name of st_createtime to st_birthtime. This change is made to reduce confusion between st_ctime and st_createtime.
Submitted by: Eric Allman <eric@sendmail.org> Sponsored by: DARPA & NAI Labs.
|
#
99009 |
|
29-Jun-2002 |
alfred |
More caddr_t removal, make fo_ioctl take a void * instead of a caddr_t.
|
#
98981 |
|
28-Jun-2002 |
jeff |
Clean up vn_rdwr locking.
- Do shared locks on read. - Only do vn_{start,finished}_write when writing.
|
#
98734 |
|
24-Jun-2002 |
mckusick |
Use proper size in bzero of stat structure.
Submitted by: Jake Burkholder <jake@locore.ca> Sponsored by: DARPA & NAI Labs.
|
#
98644 |
|
22-Jun-2002 |
mckusick |
This patch fixes a size problem with the stat structure for 64-bit architectures that was introduced in the UFS2 code merge two days ago. The stat structure change that caused the problem was the addition of the file create time.
Submitted by: Bruce Evans <bde@zeta.org.au> Sponsored by: DARPA & NAI Labs.
|
#
98542 |
|
21-Jun-2002 |
mckusick |
This commit adds basic support for the UFS2 filesystem. The UFS2 filesystem expands the inode to 256 bytes to make space for 64-bit block pointers. It also adds a file-creation time field, an ability to use jumbo blocks per inode to allow extent like pointer density, and space for extended attributes (up to twice the filesystem block size worth of attributes, e.g., on a 16K filesystem, there is space for 32K of attributes). UFS2 fully supports and runs existing UFS1 filesystems. New filesystems built using newfs can be built in either UFS1 or UFS2 format using the -O option. In this commit UFS1 is the default format, so if you want to build UFS2 format filesystems, you must specify -O 2. This default will be changed to UFS2 when UFS2 proves itself to be stable. In this commit the boot code for reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c) as there is insufficient space in the boot block. Once the size of the boot block is increased, this code can be defined.
Things to note: the definition of SBSIZE has changed to SBLOCKSIZE. The header file <ufs/ufs/dinode.h> must be included before <ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and ufs_lbn_t.
Still TODO: Verify that the first level bootstraps work for all the architectures. Convert the utility ffsinfo to understand UFS2 and test growfs. Add support for the extended attribute storage. Update soft updates to ensure integrity of extended attribute storage. Switch the current extended attribute interfaces to use the extended attribute storage. Add the extent like functionality (framework is there, but is currently never used).
Sponsored by: DARPA & NAI Labs. Reviewed by: Poul-Henning Kamp <phk@freebsd.org>
|
#
96616 |
|
14-May-2002 |
jeff |
Disable the shared locking namei() code for now. It breaks several stacking filesystems. This is on hold until the rest of VFS Locking is reviewed and deemed safe. It can be enabled with 'options LOOKUP_SHARED'.
|
#
94861 |
|
16-Apr-2002 |
jhb |
Lock proctree_lock instead of pgrpsess_lock.
|
#
94646 |
|
14-Apr-2002 |
jeff |
Use VOP_GETVOBJECT instead of accessing the member directly. This fixed an issue with nullfs and NAMEI shared.
Submitted by: Alexander Kabaev
|
#
94262 |
|
09-Apr-2002 |
jeff |
Turn #ifdef LOOKUP_SHARED into #ifndef LOOKUP_EXCLUSIVE to enable this behavior by default. Also, change the options line to reflect this.
If there are no problems reported this will become the only behavior and the knob will be removed in a month or so.
Demanded by: obrien
|
#
93593 |
|
01-Apr-2002 |
jhb |
Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag.
Discussed on: smp@
|
#
93188 |
|
26-Mar-2002 |
bde |
Added used include of <sys/sx.h>. Don't depend on namespace pollution in <sys/file.h>.
|
#
92723 |
|
19-Mar-2002 |
alfred |
Remove __P.
|
#
92310 |
|
15-Mar-2002 |
alfred |
Giant pushdown for read/write/pread/pwrite syscalls.
kern/kern_descrip.c: Aquire Giant in fdrop_locked when file refcount hits zero, this removes the requirement for the caller to own Giant for the most part.
kern/kern_ktrace.c: Aquire Giant in ktrgenio, simplifies locking in upper read/write syscalls.
kern/vfs_bio.c: Aquire Giant in bwillwrite if needed.
kern/sys_generic.c Giant pushdown, remove Giant for: read, pread, write and pwrite. readv and writev aren't done yet because of the possible malloc calls for iov to uio processing.
kern/sys_socket.c Grab giant in the socket fo_read/write functions.
kern/vfs_vnops.c Grab giant in the vnode fo_read/write functions.
|
#
92130 |
|
12-Mar-2002 |
jeff |
This patch adds the "LOCKSHARED" option to namei which causes it to only acquire shared locks on leafs. The stat() and open() calls have been changed to make use of this new functionality. Using shared locks in these cases is sufficient and can significantly reduce their latency if IO is pending to these vnodes. Also, this reduces the number of exclusive locks that are floating around in the system, which helps reduce the number of deadlocks that occur.
A new kernel option "LOOKUP_SHARED" has been added. It defaults to off so this patch can be turned on for testing, and should eventually go away once it is proven to be stable. I have personally been running this patch for over a year now, so it is believed to be fully stable.
Reviewed by: jake, obrien Approved by: jake
|
#
92069 |
|
11-Mar-2002 |
tanimura |
Stop abusing the pgrpsess_lock.
|
#
91690 |
|
05-Mar-2002 |
eivind |
Document all functions, global and static variables, and sysctls. Includes some minor whitespace changes, and re-ordering to be able to document properly (e.g, grouping of variables and the SYSCTL macro calls for them, where the documentation has been added.)
Reviewed by: phk (but all errors are mine)
|
#
91406 |
|
27-Feb-2002 |
jhb |
Simple p_ucred -> td_ucred changes to start using the per-thread ucred reference.
|
#
91140 |
|
23-Feb-2002 |
tanimura |
Lock struct pgrp, session and sigio.
New locks are:
- pgrpsess_lock which locks the whole pgrps and sessions, - pg_mtx which protects the pgrp members, and - s_mtx which protects the session members.
Please refer to sys/proc.h for the coverage of these locks.
Changes on the pgrp/session interface:
- pgfind() needs the pgrpsess_lock held.
- The caller of enterpgrp() is responsible to allocate a new pgrp and session.
- Call enterthispgrp() in order to enter an existing pgrp.
- pgsignal() requires a pgrp lock held.
Reviewed by: jhb, alfred Tested on: cvsup.jp.FreeBSD.org (which is a quad-CPU machine running -current)
|
#
90946 |
|
20-Feb-2002 |
rwatson |
More cleanups relating to vm object allocation failure: make sure we call VOP_CLOSE() with vp unlocked; clean up the return path a little, in as much as our namei/vnode operation return paths can be cleared up. For a return case that was apparently never taken, this sure is ugly.
Reviewed by: jeffr
|
#
90846 |
|
18-Feb-2002 |
iedowse |
Add the braces missed by revision 1.131.
Pointy hat to: rwatson
|
#
90818 |
|
18-Feb-2002 |
rwatson |
When vn_open() is failing because it cannot allocate a vm object, call VOP_CLOSE() on the vnode, so that VOP_OPEN() and VOP_CLOSE() calls are symmetric in all failure cases. This prevents an 'open' reference from being leaked in that unlikely failure scenario.
|
#
90486 |
|
10-Feb-2002 |
rwatson |
Make sure to hold vnode lock when calling into VOP_GETATTR().
Discussed with: mckusick, phk
|
#
90448 |
|
10-Feb-2002 |
rwatson |
Part I: Update extended attribute API and ABI:
o Modify the system call syntax for extattr_{get,set}_{fd,file}() so as not to use the scatter gather API (which appeared not to be used by any consumers, and be less portable), rather, accepts 'data' and 'nbytes' in the style of other simple read/write interfaces. This changes the API and ABI.
o Modify system call semantics so that extattr_get_{fd,file}() return a size_t. When performing a read, the number of bytes read will be returned, unless the data pointer is NULL, in which case the number of bytes of data are returned. This changes the API only.
o Modify the VOP_GETEXTATTR() vnode operation to accept a *size_t argument so as to return the size, if desirable. If set to NULL, the size will not be returned.
o Update various filesystems (pseodofs, ufs) to DTRT.
These changes should make extended attributes more useful and more portable. More commits to rebuild the system call files, as well as update userland utilities to follow.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
89784 |
|
25-Jan-2002 |
phk |
Make st_blksize default to PAGE_SIZE instead of zero.
|
#
89526 |
|
19-Jan-2002 |
dillon |
Remove 'VXLOCK: interlock avoided' warnings. This can now occur in normal operation. The vgonel() code has always called vclean() but until we started proactively freeing vnodes it would never actually be called with a dirty vnode, so this situation did not occur prior to the vnlru() code. Now that we proactively free vnodes when kern.maxvnodes is hit, however, vclean() winds up with work to do and improperly generates the warnings.
Reviewed by: peter Approved by: re (for MFC) MFC after: 1 day
|
#
89306 |
|
13-Jan-2002 |
alfred |
SMP Lock struct file, filedesc and the global file list.
Seigo Tanimura (tanimura) posted the initial delta.
I've polished it quite a bit reducing the need for locking and adapting it for KSE.
Locks:
1 mutex in each filedesc protects all the fields. protects "struct file" initialization, while a struct file is being changed from &badfileops -> &pipeops or something the filedesc should be locked.
1 mutex in each struct file protects the refcount fields. doesn't protect anything else. the flags used for garbage collection have been moved to f_gcflag which was the FILLER short, this doesn't need locking because the garbage collection is a single threaded container. could likely be made to use a pool mutex.
1 sx lock for the global filelist.
struct file * fhold(struct file *fp); /* increments reference count on a file */
struct file * fhold_locked(struct file *fp); /* like fhold but expects file to locked */
struct file * ffind_hold(struct thread *, int fd); /* finds the struct file in thread, adds one reference and returns it unlocked */
struct file * ffind_lock(struct thread *, int fd); /* ffind_hold, but returns file locked */
I still have to smp-safe the fget cruft, I'll get to that asap.
|
#
88149 |
|
18-Dec-2001 |
dillon |
This is a forward port of Peter's vlrureclaim() fix, with some minor mods by me to make it more efficient. The original code had serious balancing problems and could also deadlock easily. This code relegates the vnode reclamation to its own kproc and relaxes the vnode reclamation requirements to better maintain kern.maxvnodes. This code still doesn't balance as well as it could, but it does a much better job then the original code.
Approved by: re@freebsd.org Obtained from: ps, peter, dillon MFS Assuming: Assuming no problems crop up in Yahoo testing MFC after: 7 days
|
#
86278 |
|
11-Nov-2001 |
alfred |
turn vn_open() into a wrapper around vn_open_cred() which allows one to perform a vn_open using temporary/other/fake credentials.
Modify the nfs client side locking code to use vn_open_cred() passing proc0's ucred instead of the old way which was to temporary raise privs while running vn_open(). This should close the race hopefully.
|
#
85369 |
|
23-Oct-2001 |
rwatson |
o vn_open() fails to call VOP_CLOSE() if vfs_object_create fails. Ideally all successful calls to VOP_OPEN() might be reflected in a call to VOP_CLOSE(). For now, simply add a comment reflecting this problem; this should be fixed at some point.
|
#
84811 |
|
11-Oct-2001 |
jhb |
Add missing includes of sys/lock.h.
|
#
83959 |
|
26-Sep-2001 |
dillon |
Make uio_yield() a global. Call uio_yield() between chunks in vn_rdwr_inchunks(), allowing other processes to gain an exclusive lock on the vnode. Specifically: directory scanning, to avoid a race to the root directory, and multiple child processes coring simultaniously so they can figure out that some other core'ing child has an exclusive adv lock and just exit instead.
This completely fixes performance problems when large programs core. You can have hundreds of copies (forked children) of the same binary core all at once and not notice.
MFC after: 3 days
|
#
83366 |
|
12-Sep-2001 |
julian |
KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
|
#
83222 |
|
08-Sep-2001 |
dillon |
This brings in a Yahoo coredump patch from Paul, with additional mods by me (addition of vn_rdwr_inchunks). The problem Yahoo is solving is that if you have large process images core dumping, or you have a large number of forked processes all core dumping at the same time, the original coredump code would leave the vnode locked throughout. This can cause the directory vnode to get locked up, which can cause the parent directory vnode to get locked up, and so on all the way to the root node, locking the entire machine up for extremely long periods of time.
This patch solves the problem in two ways. First it uses an advisory non-blocking lock to abort multiple processes trying to core to the same file. Second (my contribution) it chunks up the writes and uses bwillwrite() to avoid holding the vnode locked while blocking in the buffer cache.
Submitted by: ps Reviewed by: dillon MFC after: 2 weeks
|
#
82211 |
|
23-Aug-2001 |
ache |
vn_stat(): if va_size (u_quad_t) > OFF_MAX, return EOVERFLOW, don't copy it blindly to st_size
|
#
77115 |
|
24-May-2001 |
dillon |
This patch implements O_DIRECT about 80% of the way. It takes a patchset Tor created a while ago, removes the raw I/O piece (that has cache coherency problems), and adds a buffer cache / VM freeing piece.
Essentially this patch causes O_DIRECT I/O to not be left in the cache, but does not prevent it from going through the cache, hence the 80%. For the last 20% we need a method by which the I/O can be issued directly to buffer supplied by the user process and bypass the buffer cache entirely, but still maintain cache coherency.
I also have the code working under -stable but the changes made to sys/file.h may not be MFCable, so an MFC is not on the table yet.
Submitted by: tegge, dillon
|
#
76117 |
|
29-Apr-2001 |
grog |
Revert consequences of changes to mount.h, part 2.
Requested by: bde
|
#
75943 |
|
25-Apr-2001 |
mckusick |
When closing the last reference to an unlinked file, it is freed by the inactive routine. Because the freeing causes the filesystem to be modified, the close must be held up during periods when the filesystem is suspended.
For snapshots to be consistent across crashes, they must write blocks that they copy and claim those written blocks in their on-disk block pointers before the old blocks that they referenced can be allowed to be written.
Close a loophole that allowed unwritten blocks to be skipped when doing ffs_sync with a request to wait for all I/O activity to be completed.
|
#
75858 |
|
23-Apr-2001 |
grog |
Correct #includes to work with fixed sys/mount.h.
|
#
74811 |
|
26-Mar-2001 |
bp |
Previous commit broke interlock locking for !LK_RETRY case.
|
#
74803 |
|
26-Mar-2001 |
bp |
Prevent race condition by using msleep() instead of mtx_unlock()/tsleep().
Reviewed by: alfred
|
#
74437 |
|
19-Mar-2001 |
rwatson |
o Rename "namespace" argument to "attrnamespace" as namespace is a C++ reserved word.
Submitted by: jkh Obtained from: TrustedBSD Project
|
#
74273 |
|
15-Mar-2001 |
rwatson |
o Change the API and ABI of the Extended Attribute kernel interfaces to introduce a new argument, "namespace", rather than relying on a first- character namespace indicator. This is in line with more recent thinking on EA interfaces on various mailing lists, including the posix1e, Linux acl-devel, and trustedbsd-discuss forums. Two namespaces are defined by default, EXTATTR_NAMESPACE_SYSTEM and EXTATTR_NAMESPACE_USER, where the primary distinction lies in the access control model: user EAs are accessible based on the normal MAC and DAC file/directory protections, and system attributes are limited to kernel-originated or appropriately privileged userland requests.
o These API changes occur at several levels: the namespace argument is introduced in the extattr_{get,set}_file() system call interfaces, at the vnode operation level in the vop_{get,set}extattr() interfaces, and in the UFS extended attribute implementation. Changes are also introduced in the VFS extattrctl() interface (system call, VFS, and UFS implementation), where the arguments are modified to include a namespace field, as well as modified to advoid direct access to userspace variables from below the VFS layer (in the style of recent changes to mount by adrian@FreeBSD.org). This required some cleanup and bug fixing regarding VFS locks and the VFS interface, as a vnode pointer may now be optionally submitted to the VFS_EXTATTRCTL() call. Updated documentation for the VFS interface will be committed shortly.
o In the near future, the auto-starting feature will be updated to search two sub-directories to the ".attribute" directory in appropriate file systems: "user" and "system" to locate attributes intended for those namespaces, as the single filename is no longer sufficient to indicate what namespace the attribute is intended for. Until this is committed, all attributes auto-started by UFS will be placed in the EXTATTR_NAMESPACE_SYSTEM namespace.
o The default POSIX.1e attribute names for ACLs and Capabilities have been updated to no longer include the '$' in their filename. As such, if you're using these features, you'll need to rename the attribute backing files to the same names without '$' symbols in front.
o Note that these changes will require changes in userland, which will be committed shortly. These include modifications to the extended attribute utilities, as well as to libutil for new namespace string conversion routines. Once the matching userland changes are committed, a buildworld is recommended to update all the necessary include files and verify that the kernel and userland environments are in sync. Note: If you do not use extended attributes (most people won't), upgrading is not imperative although since the system call API has changed, the new userland extended attribute code will no longer compile with old include files.
o Couple of minor cleanups while I'm there: make more code compilation conditional on FFS_EXTATTR, which should recover a bit of space on kernels running without EA's, as well as update copyright dates.
Obtained from: TrustedBSD Project
|
#
72521 |
|
15-Feb-2001 |
jlemon |
Extend kqueue down to the device layer.
Backwards compatible approach suggested by: peter
|
#
72200 |
|
09-Feb-2001 |
bmilekic |
Change and clean the mutex lock interface.
mtx_enter(lock, type) becomes:
mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)
similarily, for releasing a lock, we now have:
mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument.
The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind.
Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two:
MTX_QUIET and MTX_NOSWITCH
The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers:
mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively.
Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case.
Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those.
Finally, caught up to the interface changes in all sys code.
Contributors: jake, jhb, jasone (in no particular order)
|
#
71576 |
|
24-Jan-2001 |
jasone |
Convert all simplelocks to mutexes and remove the simplelock implementations.
|
#
68885 |
|
18-Nov-2000 |
dillon |
Implement a low-memory deadlock solution.
Removed most of the hacks that were trying to deal with low-memory situations prior to now.
The new code is based on the concept that I/O must be able to function in a low memory situation. All major modules related to I/O (except networking) have been adjusted to allow allocation out of the system reserve memory pool. These modules now detect a low memory situation but rather then block they instead continue to operate, then return resources to the memory pool instead of cache them or leave them wired.
Code has been added to stall in a low-memory situation prior to a vnode being locked.
Thus situations where a process blocks in a low-memory condition while holding a locked vnode have been reduced to near nothing. Not only will I/O continue to operate, but many prior deadlock conditions simply no longer exist.
Implement a number of VFS/BIO fixes
(found by Ian): in biodone(), bogus-page replacement code, the loop was not properly incrementing loop variables prior to a continue statement. We do not believe this code can be hit anyway but we aren't taking any chances. We'll turn the whole section into a panic (as it already is in brelse()) after the release is rolled.
In biodone(), the foff calculation was incorrectly clamped to the iosize, causing the wrong foff to be calculated for pages in the case of an I/O error or biodone() called without initiating I/O. The problem always caused a panic before. Now it doesn't. The problem is mainly an issue with NFS.
Fixed casts for ~PAGE_MASK. This code worked properly before only because the calculations use signed arithmatic. Better to properly extend PAGE_MASK first before inverting it for the 64 bit masking op.
In brelse(), the bogus_page fixup code was improperly throwing away the original contents of 'm' when it did the j-loop to fix the bogus pages. The result was that it would potentially invalidate parts of the *WRONG* page(!), leading to corruption.
There may still be cases where a background bitmap write is being duplicated, causing potential corruption. We have identified a potentially serious bug related to this but the fix is still TBD. So instead this patch contains a KASSERT to detect the problem and panic the machine rather then continue to corrupt the filesystem. The problem does not occur very often.. it is very hard to reproduce, and it may or may not be the cause of the corruption people have reported.
Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>) Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>
|
#
68259 |
|
02-Nov-2000 |
phk |
Take VBLK devices further out of their missery.
This should fix the panic I introduced in my previous commit on this topic.
|
#
67365 |
|
20-Oct-2000 |
jhb |
Catch up to moving headers: - machine/ipl.h -> sys/ipl.h - machine/mutex.h -> sys/mutex.h
|
#
66615 |
|
04-Oct-2000 |
jasone |
Convert lockmgr locks from using simple locks to using mutexes.
Add lockdestroy() and appropriate invocations, which corresponds to lockinit() and must be called to clean up after a lockmgr lock is no longer needed.
|
#
66272 |
|
22-Sep-2000 |
rwatson |
o Introduce vn_extattr_rm(), a helper function in the style of vn_extattr_get() and vn_extattr_set(). vn_extattr_rm() removes the specified extended attribute from a vnode, authorizing the change as the kernel (NULL cred).
Obtained from: TrustedBSD Project
|
#
65463 |
|
05-Sep-2000 |
rwatson |
o vn_extattr_set() will now call appropriate vn_start_write() and vn_finished_write() if IO_NODELOCKED is not set.
Obtained from: TrustedBSD Project
|
#
64405 |
|
08-Aug-2000 |
rwatson |
o Introduce vn_extattr_{get,set}, wrapper routines for VOP_GETEXTATTR and VOP_SETEXTATTR to simplify calling from in-kernel consumers, such as capability code. Both accept a vnode (optionally locked, with ioflg to indicate that), attribute name, and a buffer + buffer length in UIO_SYSSPACE. Both authorize the call as a kernel request, with cred set to NULL for the actual VOP_ calls.
Obtained from: TrustedBSD Project
|
#
63788 |
|
24-Jul-2000 |
mckusick |
This patch corrects the first round of panics and hangs reported with the new snapshot code.
Update addaliasu to correctly implement the semantics of the old checkalias function. When a device vnode first comes into existence, check to see if an anonymous vnode for the same device was created at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than creating a new vnode for the device. This corrects a problem which caused the kernel to panic when taking a snapshot of the root filesystem.
Change the calling convention of vn_write_suspend_wait() to be the same as vn_start_write().
Split out softdep_flushworklist() from softdep_flushfiles() so that it can be used to clear the work queue when suspending filesystem operations.
Access to buffers becomes recursive so that snapshots can recursively traverse their indirect blocks using ffs_copyonwrite() when checking for the need for copy on write when flushing one of their own indirect blocks. This eliminates a deadlock between the syncer daemon and a process taking a snapshot.
Ensure that softdep_process_worklist() can never block because of a snapshot being taken. This eliminates a problem with buffer starvation.
Cleanup change in ffs_sync() which did not synchronously wait when MNT_WAIT was specified. The result was an unclean filesystem panic when doing forcible unmount with heavy filesystem I/O in progress.
Return a zero'ed block when reading a block that was not in use at the time that a snapshot was taken. Normally, these blocks should never be read. However, the readahead code will occationally read them which can cause unexpected behavior.
Clean up the debugging code that ensures that no blocks be written on a filesystem while it is suspended. Snapshots must explicitly label the blocks that they are writing during the suspension so that they do not cause a `write on suspended filesystem' panic.
Reorganize ffs_copyonwrite() to eliminate a deadlock and also to prevent a race condition that would permit the same block to be copied twice. This change eliminates an unexpected soft updates inconsistency in fsck caused by the double allocation.
Use bqrelse rather than brelse for buffers that will be needed soon again by the snapshot code. This improves snapshot performance.
|
#
62976 |
|
11-Jul-2000 |
mckusick |
Add snapshots to the fast filesystem. Most of the changes support the gating of system calls that cause modifications to the underlying filesystem. The gating can be enabled by any filesystem that needs to consistently suspend operations by adding the vop_stdgetwritemount to their set of vnops. Once gating is enabled, the function vfs_write_suspend stops all new write operations to a filesystem, allows any filesystem modifying system calls already in progress to complete, then sync's the filesystem to disk and returns. The function vfs_write_resume allows the suspended write operations to begin again. Gating is not added by default for all filesystems as for SMP systems it adds two extra locks to such critical kernel paths as the write system call. Thus, gating should only be added as needed.
Details on the use and current status of snapshots in FFS can be found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness is not included here. Unless and until you create a snapshot file, these changes should have no effect on your system (famous last words).
|
#
62550 |
|
04-Jul-2000 |
mckusick |
Move the truncation code out of vn_open and into the open system call after the acquisition of any advisory locks. This fix corrects a case in which a process tries to open a file with a non-blocking exclusive lock. Even if it fails to get the lock it would still truncate the file even though its open failed. With this change, the truncation is done only after the lock is successfully acquired.
Obtained from: BSD/OS
|
#
62086 |
|
25-Jun-2000 |
jlemon |
Fix stupid braino in last commit, initialize `vp' before we test vp->v_tag.
Spotted by: dillon
|
#
61963 |
|
22-Jun-2000 |
jlemon |
Add a hack to fail registration of kq events on a non-ufs filesystem, as support for those is non-existent at the moment.
|
#
60938 |
|
26-May-2000 |
jake |
Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen.
Requested by: msmith and others
|
#
60833 |
|
23-May-2000 |
jake |
Change the way that the queue(3) structures are declared; don't assume that the type argument to *_HEAD and *_ENTRY is a struct.
Suggested by: phk Reviewed by: phk Approved by: mdodd
|
#
60474 |
|
12-May-2000 |
asmodai |
Fix comment typo.
Submitted by: nrahlstr
|
#
60041 |
|
05-May-2000 |
phk |
Separate the struct bio related stuff out of <sys/buf.h> into <sys/bio.h>.
<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be made a nested include according to bdes teachings on the subject of nested includes.
Diskdrivers and similar stuff below specfs::strategy() should no longer need to include <sys/buf.> unless they need caching of data.
Still a few bogus uses of struct buf to track down.
Repocopy by: peter
|
#
59794 |
|
30-Apr-2000 |
phk |
Remove unneeded #include <vm/vm_zone.h>
Generated by: src/tools/tools/kerninclude
|
#
59288 |
|
16-Apr-2000 |
jlemon |
Introduce kqueue() and kevent(), a kernel event notification facility.
|
#
58909 |
|
02-Apr-2000 |
dillon |
Change the write-behind code to take more care when starting async I/O's. The sequential read heuristic has been extended to cover writes as well. We continue to call cluster_write() normally, thus blocks in the file will still be reallocated for large (but still random) I/O's, but I/O will only be initiated for truely sequential writes.
This solves a number of annoying situations, especially with DBM (hash method) writes, and also has the side effect of fixing a number of (stupid) benchmarks.
Reviewed-by: mckusick
|
#
55756 |
|
10-Jan-2000 |
phk |
Give vn_isdisk() a second argument where it can return a suitable errno.
Suggested by: bde
|
#
55696 |
|
10-Jan-2000 |
mckusick |
Add bwillwrite to all system calls that create things in the filesystem. Benchmarks that create huge trees of empty files overwhelm the buffer cache.
|
#
54655 |
|
15-Dec-1999 |
eivind |
Introduce NDFREE (and remove VOP_ABORTOP)
|
#
53350 |
|
18-Nov-1999 |
dillon |
Ensure that garbage from the kernel stack does not wind up being returned to user mode in the spare fields of the stat structure.
PR: kern/14966 Reviewed by: dillon@freebsd.org Submitted by: Kelly Yancey kbyanc@posi.net
|
#
52985 |
|
08-Nov-1999 |
peter |
Add a vnode fo_stat() entry point.
|
#
51418 |
|
19-Sep-1999 |
green |
This is what was "fdfix2.patch," a fix for fd sharing. It's pretty far-reaching in fd-land, so you'll want to consult the code for changes. The biggest change is that now, you don't use fp->f_ops->fo_foo(fp, bar) but instead fo_foo(fp, bar), which increments and decrements the fp refcount upon entry and exit. Two new calls, fhold() and fdrop(), are provided. Each does what it seems like it should, and if fdrop() brings the refcount to zero, the fd is freed as well.
Thanks to peter ("to hell with it, it looks ok to me.") for his review. Thanks to msmith for keeping me from putting locks everywhere :)
Reviewed by: peter
|
#
51111 |
|
09-Sep-1999 |
julian |
Changes to centralise the default blocksize behaviour. More likely to follow.
Submitted by: phk@freebsd.org
|
#
50830 |
|
03-Sep-1999 |
julian |
Revert a bunch of contraversial changes by PHK. After a quick think and discussion among various people some form of some of these changes will probably be recommitted.
The reversion requested was requested by dg while discussions proceed. PHK has indicated that he can live with this, and it has been agreed that some form of some of these changes may return shortly after further discussion.
|
#
50727 |
|
01-Sep-1999 |
phk |
Improve the returned values in st_blksize a little bit, avoid accessing union fields not valid for dev_t type.
|
#
50623 |
|
30-Aug-1999 |
phk |
Make bdev userland access work like cdev userland access unless the highly non-recommended option ALLOW_BDEV_ACCESS is used.
(bdev access is evil because you don't get write errors reported.)
Kill si_bsize_best before it kills Matt :-)
Use the specfs routines rather having cloned copies in devfs.
|
#
50477 |
|
28-Aug-1999 |
peter |
$Id$ -> $FreeBSD$
|
#
50459 |
|
27-Aug-1999 |
green |
Add FIODTYPE ioctl for getting d_flags (type) info on a device.
Okayed by: phk
|
#
50346 |
|
25-Aug-1999 |
phk |
Add a couple of missing but unimportant break; statements.
|
#
49683 |
|
13-Aug-1999 |
phk |
oops: Add missing include.
|
#
49682 |
|
13-Aug-1999 |
phk |
Move the special-casing of stat(2)->st_blksize for device files from UFS to the generic level. For chr/blk devices we don't care about the blocksize of the filesystem, we want what the device asked for.
|
#
49413 |
|
04-Aug-1999 |
green |
Fix fd race conditions (during shared fd table usage.) Badfileops is now used in f_ops in place of NULL, and modifications to the files are more carefully ordered. f_ops should also be set to &badfileops upon "close" of a file.
This does not fix other problems mentioned in this PR than the first one.
PR: 11629 Reviewed by: peter
|
#
49101 |
|
26-Jul-1999 |
alc |
Add sysctl and support code to allow directories to be VMIO'd. The default setting for the sysctl is OFF, which is the historical operation.
Submitted by: dillon
|
#
48677 |
|
08-Jul-1999 |
mckusick |
These changes appear to give us benefits with both small (32MB) and large (1G) memory machine configurations. I was able to run 'dbench 32' on a 32MB system without bring the machine to a grinding halt.
* buffer cache hash table now dynamically allocated. This will have no effect on memory consumption for smaller systems and will help scale the buffer cache for larger systems.
* minor enhancement to pmap_clearbit(). I noticed that all the calls to it used constant arguments. Making it an inline allows the constants to propogate to deeper inlines and should produce better code.
* removal of inherent vfs_ioopt support through the emplacement of appropriate #ifdef's, with John's permission. If we do not find a use for it by the end of the year we will remove it entirely.
* removal of getnewbufloops* counters & sysctl's - no longer necessary for debugging, getnewbuf() is now optimal.
* buffer hash table functions removed from sys/buf.h and localized to vfs_bio.c
* VFS_BIO_NEED_DIRTYFLUSH flag and support code added ( bwillwrite() ), allowing processes to block when too many dirty buffers are present in the system.
* removal of a softdep test in bdwrite() that is no longer necessary now that bdwrite() no longer attempts to flush dirty buffers.
* slight optimization added to bqrelse() - there is no reason to test for available buffer space on B_DELWRI buffers.
* addition of reverse-scanning code to vfs_bio_awrite(). vfs_bio_awrite() will attempt to locate clusterable areas in both the forward and reverse direction relative to the offset of the buffer passed to it. This will probably not make much of a difference now, but I believe we will start to rely on it heavily in the future if we decide to shift some of the burden of the clustering closer to the actual I/O initiation.
* Removal of the newbufcnt and lastnewbuf counters that Kirk added. They do not fix any race conditions that haven't already been fixed by the gbincore() test done after the only call to getnewbuf(). getnewbuf() is a static, so there is no chance of it being misused by other modules. ( Unless Kirk can think of a specific thing that this code fixes. I went through it very carefully and didn't see anything ).
* removal of VOP_ISLOCKED() check in flushbufqueues(). I do not think this check is necessary, the buffer should flush properly whether the vnode is locked or not. ( yes? ).
* removal of extra arguments passed to getnewbuf() that are not necessary.
* missed cluster_wbuild() that had to be a cluster_wbuild_wb() in vfs_cluster.c
* vn_write() now calls bwillwrite() *PRIOR* to locking the vnode, which should greatly aid flushing operations in heavy load situations - both the pageout and update daemons will be able to operate more efficiently.
* removal of b_usecount. We may add it back in later but for now it is useless. Prior implementations of the buffer cache never had enough buffers for it to be useful, and current implementations which make more buffers available might not benefit relative to the amount of sophistication required to implement a b_usecount. Straight LRU should work just as well, especially when most things are VMIO backed. I expect that (even though John will not like this assumption) directories will become VMIO backed some point soon.
Submitted by: Matthew Dillon <dillon@backplane.com> Reviewed by: Kirk McKusick <mckusick@mckusick.com>
|
#
48468 |
|
02-Jul-1999 |
phk |
Make sure that stat(2) and friends always return a valid st_dev field.
Pseudo-FS need not fill in the va_fsid anymore, the syscall code will use the first half of the fsid, which now looks like a udev_t with major 255.
|
#
46155 |
|
28-Apr-1999 |
phk |
This Implements the mumbled about "Jail" feature.
This is a seriously beefed up chroot kind of thing. The process is jailed along the same lines as a chroot does it, but with additional tough restrictions imposed on what the superuser can do.
For all I know, it is safe to hand over the root bit inside a prison to the customer living in that prison, this is what it was developed for in fact: "real virtual servers".
Each prison has an ip number associated with it, which all IP communications will be coerced to use and each prison has its own hostname.
Needless to say, you need more RAM this way, but the advantage is that each customer can run their own particular version of apache and not stomp on the toes of their neighbors.
It generally does what one would expect, but setting up a jail still takes a little knowledge.
A few notes:
I have no scripts for setting up a jail, don't ask me for them.
The IP number should be an alias on one of the interfaces.
mount a /proc in each jail, it will make ps more useable.
/proc/<pid>/status tells the hostname of the prison for jailed processes.
Quotas are only sensible if you have a mountpoint per prison.
There are no privisions for stopping resource-hogging.
Some "#ifdef INET" and similar may be missing (send patches!)
If somebody wants to take it from here and develop it into more of a "virtual machine" they should be most welcome!
Tools, comments, patches & documentation most welcome.
Have fun...
Sponsored by: http://www.rndassociates.com/ Run for almost a year by: http://www.servetheweb.com/
|
#
46112 |
|
27-Apr-1999 |
phk |
Suser() simplification:
1: s/suser/suser_xxx/
2: Add new function: suser(struct proc *), prototyped in <sys/proc.h>.
3: s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/
The remaining suser_xxx() calls will be scrutinized and dealt with later.
There may be some unneeded #include <sys/cred.h>, but they are left as an exercise for Bruce.
More changes to the suser() API will come along with the "jail" code.
|
#
45893 |
|
21-Apr-1999 |
alc |
Address several problems in vn_read and vn_write:
1. Make read-ahead work for pread and aio_read.
2. Fix one place where a comparison of uio_offset with -1 wasn't updated to use FOF_OFFSET.
3. Honor O_APPEND in the FOF_OFFSET case.
In addition, use the variable name "ioflag" in both vn_read and vn_write to avoid possible confusion between the variable "flag" and the parameter "flags".
Submitted by: Bruce Evans <bde@zeta.org.au> and me
|
#
45311 |
|
04-Apr-1999 |
dt |
Add standard padding argument to pread and pwrite syscall. That should make them NetBSD compatible.
Add parameter to fo_read and fo_write. (The only flag FOF_OFFSET mean that the offset is set in the struct uio).
Factor out some common code from read/pread/write/pwrite syscalls.
|
#
45051 |
|
26-Mar-1999 |
alc |
Changed vn_read/write such that fp->f_offset isn't touched if uio->uio_offset != -1. This fixes a problem with aio_read/write and permits a straightforward implementation of pread/pwrite.
PR: kern/8669 Submitted by: John Plevyak <jplevyak@inktomi.com> Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
|
#
43426 |
|
30-Jan-1999 |
phk |
Use suser() to determine super-user-ness, don't examine cr_uid directly.
|
#
42900 |
|
20-Jan-1999 |
eivind |
Add 'options DEBUG_LOCKS', which stores extra information in struct lock, and add some macros and function parameters to make sure that the information get to the point where it can be put in the lock structure.
While I'm here, add DEBUG_VFS_LOCKS to LINT.
|
#
42315 |
|
05-Jan-1999 |
eivind |
Remove the 'waslocked' parameter to vfs_object_create().
|
#
40815 |
|
02-Nov-1998 |
peter |
Only do one VOP_ACCESS() per open() instead of two. This should reduce the NFSv3 ACCESS RPC problems a little for busy clients that do a lot of open/close. The nfs code could probably cache the results, but I'm not sure whether this would be legal or useful. The problem is that with a CPU farm, on each open there would be a lookup, getattr then access RPC then the read/write RPC activity. Caching the access results probably isn't going to help much if the clients access lots of files. Having the nfs_access() routine interpret the getattr results is a bit of a hack, but it's how NFSv2 is done and it might be OK for a mount attribute for v3.
|
#
37180 |
|
27-Jun-1998 |
phk |
Report the mode as the result of the VOP_GETATTR rather than the vnodes type, they may not correspond.
|
#
36735 |
|
07-Jun-1998 |
dfr |
This commit fixes various 64bit portability problems required for FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us inline with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change.
The prototype FreeBSD/alpha machdep will follow in a couple of days time.
|
#
35823 |
|
07-May-1998 |
msmith |
In the words of the submitter:
--------- Make callers of namei() responsible for releasing references or locks instead of having the underlying filesystems do it. This eliminates redundancy in all terminal filesystems and makes it possible for stacked transport layers such as umapfs or nullfs to operate correctly.
Quality testing was done with testvn, and lat_fs from the lmbench suite.
Some NFS client testing courtesy of Patrik Kudo.
vop_mknod and vop_symlink still release the returned vpp. vop_rename still releases 4 vnode arguments before it returns. These remaining cases will be corrected in the next set of patches. ---------
Submitted by: Michael Hancock <michaelh@cet.co.jp>
|
#
35113 |
|
10-Apr-1998 |
alex |
Grammar police.
|
#
35105 |
|
08-Apr-1998 |
wosch |
New mount option nosymfollow. If enabled, the kernel lookup() function will not follow symbolic links on the mounted file system and return EACCES (Permission denied).
|
#
35088 |
|
06-Apr-1998 |
peter |
Today is not my lucky day. Fix missing brace and I got a request to use EMLINK instead.
|
#
35086 |
|
06-Apr-1998 |
peter |
Use a different errno (ELOOP (as sef mentioned) since the text that goes with the error sounds ok for the condition) if O_NOFOLLOW gets a link.
|
#
35085 |
|
06-Apr-1998 |
peter |
Rather than let users get fd's to symlink files, make O_NOFOLLOW cause an error if it gets a link (like it does if it gets a socket). The implications of letting users try and do file operations on symlinks themselves were too worrying.
|
#
35082 |
|
06-Apr-1998 |
peter |
Implement a new open(2) flag: O_NOFOLLOW. This will instruct open to not follow symlinks, but to open a handle on the link itself(!). As strange as this might sound, it has several useful applications safe race-free ways of opening files in hostile areas (eg: /tmp, a mode 1777 /var/mail, etc). It also would allow things like fchown() to work on the link rather than having to implement a new syscall specifically for that task.
Reviewed by: phk
|
#
33822 |
|
25-Feb-1998 |
bde |
Removed unused #includes.
|
#
33134 |
|
06-Feb-1998 |
eivind |
Back out DIAGNOSTIC changes.
|
#
33108 |
|
04-Feb-1998 |
eivind |
Turn DIAGNOSTIC into a new-style option.
|
#
32454 |
|
12-Jan-1998 |
dyson |
Fix some vnode management problems, and better mgmt of vnode free list. Fix the UIO optimization code. Fix an assumption in vm_map_insert regarding allocation of swap pagers. Fix an spl problem in the collapse handling in vm_object_deallocate. When pages are freed from vnode objects, and the criteria for putting the associated vnode onto the free list is reached, either put the vnode onto the list, or put it onto an interrupt safe version of the list, for further transfer onto the actual free list. Some minor syntax changes changing pre-decs, pre-incs to post versions. Remove a bogus timeout (that I added for debugging) from vn_lock.
PHK will likely still have problems with the vnode list management, and so do I, but it is better than it was.
|
#
32286 |
|
06-Jan-1998 |
dyson |
Make our v_usecount vnode reference count work identically to the original BSD code. The association between the vnode and the vm_object no longer includes reference counts. The major difference is that vm_object's are no longer freed gratuitiously from the vnode, and so once an object is created for the vnode, it will last as long as the vnode does.
When a vnode object reference count is incremented, then the underlying vnode reference count is incremented also. The two "objects" are now more intimately related, and so the interactions are now much less complex.
When vnodes are now normally placed onto the free queue with an object still attached. The rundown of the object happens at vnode rundown time, and happens with exactly the same filesystem semantics of the original VFS code. There is absolutely no need for vnode_pager_uncache and other travesties like that anymore.
A side-effect of these changes is that SMP locking should be much simpler, the I/O copyin/copyout optimizations work, NFS should be more ponderable, and further work on layered filesystems should be less frustrating, because of the totally coherent management of the vnode objects and vnodes.
Please be careful with your system while running this code, but I would greatly appreciate feedback as soon a reasonably possible.
|
#
32072 |
|
29-Dec-1997 |
dyson |
Fix the decl of vfs_ioopt, allow LFS to compile again, fix a minor problem with the object cache removal.
|
#
32071 |
|
29-Dec-1997 |
dyson |
Lots of improvements, including restructring the caching and management of vnodes and objects. There are some metadata performance improvements that come along with this. There are also a few prototypes added when the need is noticed. Changes include:
1) Cleaning up vref, vget. 2) Removal of the object cache. 3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore. 4) Correct some missing LK_RETRY's in vn_lock. 5) Correct the page range in the code for msync.
Be gentle, and please give me feedback asap.
|
#
31564 |
|
06-Dec-1997 |
sef |
Changes to allow event-based process monitoring and control.
|
#
31443 |
|
29-Nov-1997 |
dyson |
Fix and complete the AIO syscalls. There are some performance enhancements coming up soon, but the code is functional. Docs will be forthcoming.
|
#
31016 |
|
07-Nov-1997 |
phk |
Remove a bunch of variables which were unused both in GENERIC and LINT.
Found by: -Wunused
|
#
30783 |
|
27-Oct-1997 |
bde |
Use 127 instead of CHAR_MAX for the limit on the sequence count. The limit doesn't have anything to do with characters. The count mainly needs to fit in the VOP_READ() ioflag after being left shifted by 16.
Moved vn_lock() before vn_closefile(). vn_lock() was mismerged from Lite2.
Removed some gratuitous braces.
|
#
30137 |
|
06-Oct-1997 |
dyson |
Relax the vnode locking for read only operations.
|
#
29360 |
|
14-Sep-1997 |
peter |
vn_select -> vn_poll
|
#
29041 |
|
02-Sep-1997 |
bde |
Removed unused #includes.
|
#
24625 |
|
04-Apr-1997 |
dfr |
[Previous comment was incorrect for these files] Added calls to VFS lock debugging macros to make fixing filesystems' locking easier.
|
#
24624 |
|
04-Apr-1997 |
dfr |
Add a function vop_sharedlock which a copy of vop_nolock without the implementation #ifdef out. This can be used for now by NFS. As soon as all the other filesystems' locking is fixed, this can go away.
Print the vnode address in vprint for easier debugging.
|
#
24206 |
|
24-Mar-1997 |
bde |
Don't include <sys/ioctl.h> in the kernel. Stage 4: include <sys/ttycom.h> and sometimes <sys/filio.h> instead of <sys/ioctl.h> in miscellaneous files. Most of these files have nothing to do with ttys but need to include <sys/ttycom.h> to get the definitions of TIOC[SG]PGRP which are (ab)used to convert F[SG]ETOWN fcntls into ioctls.
|
#
24131 |
|
23-Mar-1997 |
bde |
Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined. Fixed everything that depended on getting fcntl.h stuff from the wrong place. Most things don't depend on file.h stuff at all.
|
#
23519 |
|
08-Mar-1997 |
guido |
Fix style bugs and other bugs in the NFS fix.
|
#
23474 |
|
07-Mar-1997 |
gpalmer |
Fix (I hope) the NFS hole. This is only compile tested.
Submitted by: (partly) davids@SECNET.COM via BUGTRAQ
|
#
22975 |
|
22-Feb-1997 |
peter |
Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
|
#
22521 |
|
10-Feb-1997 |
dyson |
This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes.
The system boots and can mount UFS filesystems.
Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed.
Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
|
#
21673 |
|
14-Jan-1997 |
jkh |
Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
|
#
21002 |
|
29-Dec-1996 |
dyson |
This commit is the embodiment of some VFS read clustering improvements. Firstly, now our read-ahead clustering is on a file descriptor basis and not on a per-vnode basis. This will allow multiple processes reading the same file to take advantage of read-ahead clustering. Secondly, there previously was a problem with large reads still using the ramp-up algorithm. Of course, that was bogus, and now we read the entire "chunk" off of the disk in one operation. The read-ahead clustering algorithm should use less CPU than the previous also (I hope :-)).
NOTE: THAT LKMS MUST BE REBUILT!!!
|
#
17761 |
|
21-Aug-1996 |
dyson |
Even though this looks like it, this is not a complex code change. The interface into the "VMIO" system has changed to be more consistant and robust. Essentially, it is now no longer necessary to call vn_open to get merged VM/Buffer cache operation, and exceptional conditions such as merged operation of VBLK devices is simpler and more correct.
This code corrects a potentially large set of problems including the problems with ktrace output and loaded systems, file create/deletes, etc.
Most of the changes to NFS are cosmetic and name changes, eliminating a layer of subroutine calls. The direct calls to vput/vrele have been re-instituted for better cross platform compatibility.
Reviewed by: davidg
|
#
14424 |
|
09-Mar-1996 |
dyson |
Remove a now unnecessary function prototype.
|
#
14317 |
|
02-Mar-1996 |
dyson |
Enable VMIO for non-VDIR metadata and block device.
|
#
13490 |
|
19-Jan-1996 |
dyson |
Eliminated many redundant vm_map_lookup operations for vm_mmap. Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish overhead for merged cache. Efficiency improvement for vfs_cluster. It used to do alot of redundant calls to cluster_rbuild. Correct the ordering for vrele of .text and release of credentials. Use the selective tlb update for 486/586/P6. Numerous fixes to the size of objects allocated for files. Additionally, fixes in the various pagers. Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs. Fixes in the swap pager for exhausted resources. The pageout code will not as readily thrash. Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE), thereby improving efficiency of several routines. Eliminate even more unnecessary vm_page_protect operations. Significantly speed up process forks. Make vm_object_page_clean more efficient, thereby eliminating the pause that happens every 30seconds. Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the case of filesystems mounted async. Fix a panic with busy pages when write clustering is done for non-VMIO buffers.
|
#
12913 |
|
17-Dec-1995 |
phk |
Staticize. Unstaticize a function in scsi/scsi_base that was used, with an undocumented option. My last count on the LINT kernel shows: Total symbols: 3647 unref symbols: 463 undef symbols: 4 1 ref symbols: 1751 2 ref symbols: 485 Approaching the pain threshold now.
|
#
12767 |
|
11-Dec-1995 |
dyson |
Changes to support 1Tb filesizes. Pages are now named by an (object,index) pair instead of (object,offset) pair.
|
#
12662 |
|
07-Dec-1995 |
dg |
Untangled the vm.h include file spaghetti.
|
#
11644 |
|
22-Oct-1995 |
dg |
Moved the filesystem read-only check out of the syscalls and into the filesystem layer, as was done in lite-2. Merged in some other cosmetic changes while I was at it. Rewrote most of msdosfs_access() to be more like ufs_access() and to include the FS read-only check.
Obtained from: partially from 4.4BSD-lite2
|
#
11261 |
|
06-Oct-1995 |
phk |
A little hack to avoid a 64bit divide. Can go away if Gcc ever learns to optimise 64bit stuff...
|
#
9588 |
|
20-Jul-1995 |
dg |
vnode_pager_alloc() never returns NULL, so don't check for it.
|
#
9540 |
|
16-Jul-1995 |
bde |
Don't include <sys/tty.h> in drivers that aren't tty drivers or in general files that don't depend on the internals of <sys/tty.h>
|
#
9507 |
|
13-Jul-1995 |
dg |
NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct proc or any VM system structure will have to be rebuilt!!!
Much needed overhaul of the VM system. Included in this first round of changes:
1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages, haspage, and sync operations are supported. The haspage interface now provides information about clusterability. All pager routines now take struct vm_object's instead of "pagers".
2) Improved data structures. In the previous paradigm, there is constant confusion caused by pagers being both a data structure ("allocate a pager") and a collection of routines. The idea of a pager structure has escentially been eliminated. Objects now have types, and this type is used to index the appropriate pager. In most cases, items in the pager structure were duplicated in the object data structure and thus were unnecessary. In the few cases that remained, a un_pager structure union was created in the object to contain these items.
3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now be removed. For instance, vm_object_enter(), vm_object_lookup(), vm_object_remove(), and the associated object hash list were some of the things that were removed.
4) simple_lock's removed. Discussion with several people reveals that the SMP locking primitives used in the VM system aren't likely the mechanism that we'll be adopting. Even if it were, the locking that was in the code was very inadequate and would have to be mostly re-done anyway. The locking in a uni-processor kernel was a no-op but went a long way toward making the code difficult to read and debug.
5) Places that attempted to kludge-up the fact that we don't have kernel thread support have been fixed to reflect the reality that we are really dealing with processes, not threads. The VM system didn't have complete thread support, so the comments and mis-named routines were just wrong. We now use tsleep and wakeup directly in the lock routines, for instance.
6) Where appropriate, the pagers have been improved, especially in the pager_alloc routines. Most of the pager_allocs have been rewritten and are now faster and easier to maintain.
7) The pagedaemon pageout clustering algorithm has been rewritten and now tries harder to output an even number of pages before and after the requested page. This is sort of the reverse of the ideal pagein algorithm and should provide better overall performance.
8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup have been removed. Some other unnecessary casts have also been removed.
9) Some almost useless debugging code removed.
10) Terminology of shadow objects vs. backing objects straightened out. The fact that the vm_object data structure escentially had this backwards really confused things. The use of "shadow" and "backing object" throughout the code is now internally consistent and correct in the Mach terminology.
11) Several minor bug fixes, including one in the vm daemon that caused 0 RSS objects to not get purged as intended.
12) A "default pager" has now been created which cleans up the transition of objects to the "swap" type. The previous checks throughout the code for swp->pg_data != NULL were really ugly. This change also provides the rudiments for future backing of "anonymous" memory by something other than the swap pager (via the vnode pager, for example), and it allows the decision about which of these pagers to use to be made dynamically (although will need some additional decision code to do this, of course).
13) (dyson) MAP_COPY has been deprecated and the corresponding "copy object" code has been removed. MAP_COPY was undocumented and non- standard. It was furthermore broken in several ways which caused its behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will continue to work correctly, but via the slightly different semantics of MAP_PRIVATE.
14) (dyson) Sharing maps have been removed. It's marginal usefulness in a threads design can be worked around in other ways. Both #12 and #13 were done to simplify the code and improve readability and maintain- ability. (As were most all of these changes)
TODO:
1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing this will reduce the vnode pager to a mere fraction of its current size.
2) Rewrite vm_fault and the swap/vnode pagers to use the clustering information provided by the new haspage pager interface. This will substantially reduce the overhead by eliminating a large number of VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be improved to provide both a "behind" and "ahead" indication of contiguousness.
3) Implement the extended features of pager_haspage in swap_pager_haspage(). It currently just says 0 pages ahead/behind.
4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps via a much more general mechanism that could also be used for disk striping of regular filesystems.
5) Do something to improve the architecture of vm_object_collapse(). The fact that it makes calls into the swap pager and knows too much about how the swap pager operates really bothers me. It also doesn't allow for collapsing of non-swap pager objects ("unnamed" objects backed by other pagers).
|
#
9456 |
|
09-Jul-1995 |
dg |
Moved call to VOP_GETATTR() out of vnode_pager_alloc() and into the places that call vnode_pager_alloc() so that a failure return can be dealt with. This fixes a panic seen on NFS clients when a file being opened is deleted on the server before the open completes.
|
#
9358 |
|
28-Jun-1995 |
dg |
Removed extra semicolon.
|
#
9356 |
|
28-Jun-1995 |
dg |
1) Converted v_vmdata to v_object. 2) Removed unnecessary vm_object_lookup()/pager_cache(object, TRUE) pairs after vnode_pager_alloc() calls - the object is already guaranteed to be persistent. 3) Removed some gratuitous casts.
|
#
8876 |
|
30-May-1995 |
rgrimes |
Remove trailing whitespace.
|
#
8417 |
|
10-May-1995 |
dg |
Unlock the vnode before sleeping on an OBJ_DEAD object. Should fix Bruce's hang. Fixed some formatting anomolies and removed some unneeded casts.
|
#
7160 |
|
19-Mar-1995 |
dg |
Removed unnecessary call to vnode_pager_uncache(). We automatically clear the VTEXT flag after all mappers have finished with the object.
|
#
6364 |
|
14-Feb-1995 |
phk |
YFfix
|
#
5455 |
|
09-Jan-1995 |
dg |
These changes embody the support of the fully coherent merged VM buffer cache, much higher filesystem I/O performance, and much better paging performance. It represents the culmination of over 6 months of R&D.
The majority of the merged VM/cache work is by John Dyson.
The following highlights the most significant changes. Additionally, there are (mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to support the new VM/buffer scheme.
vfs_bio.c: Significant rewrite of most of vfs_bio to support the merged VM buffer cache scheme. The scheme is almost fully compatible with the old filesystem interface. Significant improvement in the number of opportunities for write clustering.
vfs_cluster.c, vfs_subr.c Upgrade and performance enhancements in vfs layer code to support merged VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.
vm_object.c: Yet more improvements in the collapse code. Elimination of some windows that can cause list corruption.
vm_pageout.c: Fixed it, it really works better now. Somehow in 2.0, some "enhancements" broke the code. This code has been reworked from the ground-up.
vm_fault.c, vm_page.c, pmap.c, vm_object.c Support for small-block filesystems with merged VM/buffer cache scheme.
pmap.c vm_map.c Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of kernel PTs.
vm_glue.c Much simpler and more effective swapping code. No more gratuitous swapping.
proc.h Fixed the problem that the p_lock flag was not being cleared on a fork.
swap_pager.c, vnode_pager.c Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the code doesn't need it anymore.
machdep.c Changes to better support the parameter values for the merged VM/buffer cache scheme.
machdep.c, kern_exec.c, vm_glue.c Implemented a seperate submap for temporary exec string space and another one to contain process upages. This eliminates all map fragmentation problems that previously existed.
ffs_inode.c, ufs_inode.c, ufs_readwrite.c Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on busy buffers.
Submitted by: John Dyson and David Greenman
|
#
3374 |
|
05-Oct-1994 |
dg |
Stuff object into v_vmdata rather than pager. Not important which at the moment, but will be in the future. Other changes mostly cosmetic, but are made for future VMIO considerations.
Submitted by: John Dyson
|
#
3308 |
|
02-Oct-1994 |
phk |
All of this is cosmetic. prototypes, #includes, printfs and so on. Makes GCC a lot more silent.
|
#
2102 |
|
18-Aug-1994 |
dg |
Moved over my fix for vnode lossage when multiple TIOCSCTTY ioctls are done. This patch was extended to also include a suggested change by Kirk McKusick which allows the control tty to be reasigned to a different tty without losing a vnode.
|
#
1817 |
|
02-Aug-1994 |
dg |
Added $Id$
|
#
1549 |
|
25-May-1994 |
rgrimes |
The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.
Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
|
#
1542 |
|
24-May-1994 |
rgrimes |
This commit was generated by cvs2svn to compensate for changes in r1541, which included commits to RCS files with non-trunk default branches.
|
#
1541 |
|
24-May-1994 |
rgrimes |
BSD 4.4 Lite Kernel Sources
|