#
300600 |
|
24-May-2016 |
kib |
MFC r300083: If IO_SYNC was passed to ffs_truncate(), request synchronous inode update from the final ffs_update().
|
#
256281 |
|
10-Oct-2013 |
gjb |
Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation |
#
251171 |
|
30-May-2013 |
jeff |
- Convert the bufobj lock to rwlock. - Use a shared bufobj lock in getblk() and inmem(). - Convert softdep's lk to rwlock to match the bufobj lock. - Move INFREECNT to b_flags and protect it with the buf lock. - Remove unnecessary locking around bremfree() and BKGRDINPROG.
Sponsored by: EMC / Isilon Storage Division Discussed with: mckusick, kib, mdf
|
#
246289 |
|
03-Feb-2013 |
mckusick |
For UFS2 i_blocks is unsigned. The current "sanity" check that it has gone below zero after the blocks in its inode are freed is a no-op which the compiler fails to warn about because of the use of the DIP macro. Change the sanity check to compare the number of blocks being freed against the value i_blocks. If the number of blocks being freed exceeds i_blocks, just set i_blocks to zero.
Reported by: Pedro Giffuni (pfg@) MFC after: 2 weeks
|
#
234605 |
|
23-Apr-2012 |
trasz |
Remove unused thread argument from vtruncbuf().
Reviewed by: kib
|
#
233629 |
|
28-Mar-2012 |
mckusick |
A refinement of change 232351 to avoid a race with a forcible unmount. While we have a snapshot vnode unlocked to avoid a deadlock with another inode in the same inode block being updated, the filesystem containing it may be forcibly unmounted. When that happens the snapshot vnode is revoked. We need to check for that condition and fail appropriately.
This change will be included along with 232351 when it is MFC'ed to 9.
Spotted by: kib Reviewed by: kib
|
#
233438 |
|
24-Mar-2012 |
mckusick |
Add a third flags argument to ffs_syncvnode to avoid a possible conflict with MNT_WAIT flags that passed in its second argument. This will be MFC'ed together with r232351.
Discussed with: kib
|
#
232837 |
|
11-Mar-2012 |
kib |
Remove superfluous brackets.
Submitted by: alc MFC after: 2 weeks
|
#
232836 |
|
11-Mar-2012 |
kib |
Do schedule delayed writes for async mounts. While there, make some style adjustments, like missed () around return values.
Submitted by: bde Reviewed by: mckusick Tested by: pho MFC after: 2 weeks
|
#
232835 |
|
11-Mar-2012 |
kib |
Do not fall back to slow synchronous i/o when low on memory or buffers. The bawrite() schedules the write to happen immediately, and its use frees the current thread to do more cleanups.
Submitted by: bde Reviewed by: mckusick Tested by: pho MFC after: 2 weeks
|
#
232351 |
|
01-Mar-2012 |
mckusick |
This change avoids a kernel deadlock on "snaplk" when using snapshots on UFS filesystems running with journaled soft updates. This is the first of several bugs that need to be fixed before removing the restriction added in -r230250 to prevent the use of snapshots on filesystems running with journaled soft updates.
The deadlock occurs when holding the snapshot lock (snaplk) and then trying to flush an inode via ffs_update(). We become blocked by another process trying to flush a different inode contained in the same inode block that we need. It holds the inode block for which we are waiting locked. When it tries to write the inode block, it gets blocked waiting for the our snaplk when it calls ffs_copyonwrite() to see if the inode block needs to be copied in our snapshot.
The most obvious place that this deadlock arises is in the ffs_copyonwrite() routine when it updates critical metadata in a snapshot and tries to write it out before proceeding. The fix here is to write the data and indirect block pointer for the snapshot, but to skip the call to ffs_update() to write the snapshot inode. To ensure that we will never have to update a pointer in the inode itself, the ffs_snapshot() routine that creates the snapshot has to ensure that all the direct blocks are allocated as part of the creation of the snapshot.
A less obvious place that this deadlock occurs is when we hold the snaplk because we are deleting a snapshot. In the course of doing the deletion, we need to allocate various soft update dependency structures and allocate some journal space. If we hit a resource limit while doing this we decrease the resources in use by flushing out an existing dirty file to get it to give up the soft dependency resources that it holds. The flush can cause an ffs_update() to be done on the inode for the file that we have selected to flush resulting in the same deadlock as described above when the inode that we have chosen to flush resides in the same inode block as the snapshot inode that we hold. The fix is to defer cleaning up any time that the inode on which we are operating is a snapshot.
Help and review by: Jeff Roberson Tested by: Peter Holm MFC (to 9 only) after: 2 weeks
|
#
225166 |
|
25-Aug-2011 |
mm |
Generalize ffs_pages_remove() into vn_pages_remove().
Remove mapped pages for all dataset vnodes in zfs_rezget() using new vn_pages_remove() to fix mmapped files changed by zfs rollback or zfs receive -F.
PR: kern/160035, kern/156933 Reviewed by: kib, pjd Approved by: re (kib) MFC after: 1 week
|
#
224061 |
|
15-Jul-2011 |
mckusick |
Add an FFS specific mount option to allow a filesystem checker (typically fsck_ffs) to register that it wishes to use FFS specific sysctl's to update the filesystem. This ensures that two checkers cannot run on a given filesystem at the same time and that no other process accidentally or maliciously uses the filesystem updating sysctls inappropriately. This functionality is needed by the journaling soft-updates recovery code.
|
#
223677 |
|
29-Jun-2011 |
alc |
Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this option to vm_object_page_remove() asserts that the specified range of pages is not mapped, or more precisely that none of these pages have any managed mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on the pages.
This change not only saves time by eliminating pointless calls to pmap_remove_all(), but it also eliminates an inconsistency in the use of pmap_remove_all() versus related functions, like pmap_remove_write(). It eliminates harmless but pointless calls to pmap_remove_all() that were being performed on PG_UNMANAGED pages.
Update all of the existing assertions on pmap_remove_all() to reflect this change.
Reviewed by: kib
|
#
223127 |
|
15-Jun-2011 |
mckusick |
Ensure that filesystem metadata contained within persistent snapshots is always kept consistent.
Suggested by: Jeff Roberson
|
#
222958 |
|
10-Jun-2011 |
jeff |
Implement fully asynchronous partial truncation with softupdates journaling to resolve errors which can cause corruption on recovery with the old synchronous mechanism.
- Append partial truncation freework structures to indirdeps while truncation is proceeding. These prevent new block pointers from becoming valid until truncation completes and serialize truncations. - On completion of a partial truncate journal work waits for zeroed pointers to hit indirects. - softdep_journal_freeblocks() handles last frag allocation and last block zeroing. - vtruncbuf/ffs_page_remove moved into softdep_*_freeblocks() so it is only implemented in one place. - Block allocation failure handling moved up one level so it does not proceed with buf locks held. This permits us to do more extensive reclaims when filesystem space is exhausted. - softdep_sync_metadata() is broken into two parts, the first executes once at the start of ffs_syncvnode() and flushes truncations and inode dependencies. The second is called on each locked buf. This eliminates excessive looping and rollbacks. - Improve the mechanism in process_worklist_item() that handles acquiring vnode locks for handle_workitem_remove() so that it works more generally and does not loop excessively over the same worklist items on each call. - Don't corrupt directories by zeroing the tail in fsck. This is only done for regular files. - Push a fsync complete record for files that need it so the checker knows a truncation in the journal is no longer valid.
Discussed with: mckusick, kib (ffs_pages_remove and ffs_truncate parts) Tested by: pho
|
#
215113 |
|
11-Nov-2010 |
kib |
Add function lbn_offset to calculate offset of the indirect block of given level.
Reviewed by: jeff Tested by: pho
|
#
209717 |
|
06-Jul-2010 |
jeff |
- Handle the truncation of an inode with an effective link count of 0 in the context of the process that reduced the effective count. Previously all truncation as a result of unlink happened in the softdep flush thread. This had the effect of being impossible to rate limit properly with the journal code. Now the process issuing unlinks is suspended when the journal files. This has a side-effect of improving rm performance by allowing more concurrent work. - Handle two cases in inactive, one for effnlink == 0 and another when nlink finally reaches 0. - Eliminate the SPACECOUNTED related code since the truncation is no longer delayed.
Discussed with: mckusick
|
#
207141 |
|
24-Apr-2010 |
jeff |
- Merge soft-updates journaling from projects/suj/head into head. This brings in support for an optional intent log which eliminates the need for background fsck on unclean shutdown.
Sponsored by: iXsystems, Yahoo!, and Juniper. With help from: McKusick and Peter Holm
|
#
187790 |
|
27-Jan-2009 |
rwatson |
Following a fair amount of real world experience with ACLs and extended attributes since FreeBSD 5, make the following semantic changes:
- Don't update the inode modification time (mtime) when extended attributes (and hence also ACLs) are added, modified, or removed. - Don't update the inode access tie (atime) when extended attributes (and hence also ACLs) are queried.
This means that rsync (and related tools) won't improperly think that the data in the file has changed when only the ACL has changed.
Note that ffs_reallocblks() has not been changed to not update on an IO_EXT transaction, but currently EAs don't use the cluster write routines so this shouldn't be a problem. If EAs grow support for clustering, then VOP_REALLOCBLKS() will need to grow a flag argument to carry down IO_EXT to UFS.
MFC after: 1 week PR: ports/125739 Reported by: Alexander Zagrebin <alexz@visp.ru> Tested by: pluknet <pluknet@gmail.com>, Greg Byshenk <freebsd@byshenk.net> Discussed with: kib, kientzle, timur, Alexander Bokovoy <ab@samba.org>
|
#
187490 |
|
20-Jan-2009 |
kib |
The r187467 should remove all pages for V_NORMAL case too, because indirect block pages are not removed by the mentioned invocation of the vnode_pager_setsize().
Put a common code into the helper function ffs_pages_remove().
Reported and tested by: dchagin Reviewed by: ups MFC after: 3 weeks
|
#
187468 |
|
20-Jan-2009 |
kib |
When extending inode size, we call vnode_pager_setsize(), to have a address space where to put vnode pages, and then call UFS_BALLOC(), to actually allocate new block and map it. When UFS_BALLOC() returns error, sometimes we forget to revert the vm object size increase, allowing for the pages that are not backed by the logical disk blocks.
Revert vnode_pager_setsize() back when UFS_BALLOC() failed, for ffs_truncate() and ffs_write().
PR: 129956 Reviewed by: ups MFC after: 3 weeks
|
#
187467 |
|
20-Jan-2009 |
kib |
FFS puts the extended attributes blocks at the negative blocks for the vnode, from -1 down. When vinvalbuf(vp, V_ALT) is done for the vnode, it incorrectly does vm_object_page_remove(0, 0), removing all pages from the underlying vm object, not only the pages that back the extended attributes data.
Change vinvalbuf() to not remove any pages from the object when V_NORMAL or V_ALT are specified. Instead, the only in-tree caller in ffs_inode.c:ffs_truncate() that specifies V_ALT explicitely removes the corresponding page range. The V_NORMAL caller does vnode_pager_setsize(vp, 0) immediately after the call to vinvalbuf(V_NORMAL) already.
Reported by: csjp Reviewed by: ups MFC after: 3 weeks
|
#
184205 |
|
23-Oct-2008 |
des |
Retire the MALLOC and FREE macros. They are an abomination unto style(9).
MFC after: 3 months
|
#
183754 |
|
10-Oct-2008 |
attilio |
Remove the struct thread unuseful argument from bufobj interface. In particular following functions KPI results modified: - bufobj_invalbuf() - bufsync()
and BO_SYNC() "virtual method" of the buffer objects set. Main consumers of bufobj functions are affected by this change too and, in particular, functions which changed their KPI are: - vinvalbuf() - g_vfs_close()
Due to the KPI breakage, __FreeBSD_version will be bumped in a later commit.
As a side note, please consider just temporary the 'curthread' argument passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP
Reviewed by: kib Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
|
#
183070 |
|
16-Sep-2008 |
kib |
When downgrading the read-write mount to read-only, do_unmount() sets MNT_RDONLY flag before the VFS_MOUNT() is called. In ufs_inactive() and ufs_itimes_locked(), UFS verifies whether the fs is read-only by checking MNT_RDONLY, but this may cause loss of the IN_MODIFIED flag for inode on the fs being remounted rw->ro.
Introduce UFS_RDONLY() struct ufsmount' method that reports the value of the fs_ronly. The later is set to 1 only after the remount is finished.
Reviewed by: tegge In collaboration with: pho MFC after: 1 month
|
#
177493 |
|
22-Mar-2008 |
jeff |
- Complete part of the unfinished bufobj work by consistently using BO_LOCK/UNLOCK/MTX when manipulating the bufobj. - Create a new lock in the bufobj to lock bufobj fields independently. This leaves the vnode interlock as an 'identity' lock while the bufobj is an io lock. The bufobj lock is ordered before the vnode interlock and also before the mnt ilock. - Exploit this new lock order to simplify softdep_check_suspend(). - A few sync related functions are marked with a new XXX to note that we may not properly interlock against a non-zero bv_cnt when attempting to sync all vnodes on a mountlist. I do not believe this race is important. If I'm wrong this will make these locations easier to find.
Reviewed by: kib (earlier diff) Tested by: kris, pho (earlier diff)
|
#
173464 |
|
08-Nov-2007 |
obrien |
Turn most ffs 'DIAGNOSTIC's into INVARIANTS.
|
#
170174 |
|
31-May-2007 |
jeff |
- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. reads proceed without locks. - Aggregate exiting threads rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to it's exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits.
Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)
|
#
163194 |
|
10-Oct-2006 |
kib |
Do not translate the IN_ACCESS inode flag into the IN_MODIFIED while filesystem is suspending/suspended. Doing so may result in deadlock. Instead, set the (new) IN_LAZYACCESS flag, that becomes IN_MODIFIED when suspend is lifted.
Change the locking protocol in order to set the IN_ACCESS and timestamps without upgrading shared vnode lock to exclusive (see comments in the inode.h). Before that, inode was modified while holding only shared lock.
Tested by: Peter Holm Reviewed by: tegge, bde Approved by: pjd (mentor) MFC after: 3 weeks
|
#
144659 |
|
05-Apr-2005 |
jeff |
- Consistently call 'vp' vp rather than ovp sometimes in ffs_truncate(). Do the same for oip.
Pointed out by: glebius
|
#
143501 |
|
13-Mar-2005 |
jeff |
- Fix an assert now that the XLOCK no longer exists.
Sponsored by: Isilon Systems, Inc.
|
#
142263 |
|
22-Feb-2005 |
jeff |
- Add VOP locking asserts in several functions that have been implicated in recent deadlocks.
|
#
141595 |
|
09-Feb-2005 |
jeff |
- In the softupdates case for ffs_truncate() we use vinvalbuf() to invalidate pending io and dependencies. However, vinvalbuf() rightfully does not call vnode_pager_setsize() for us. We must do this here. This could potentially have caused numerous kinds of bugs, but it was specifically causing msync() deadlocks because msync() was writing flushing pages that should not have been valid.
Sponsored by: Isilon Systems, Inc. Reported by: kkenn
|
#
141526 |
|
08-Feb-2005 |
phk |
Don't use the UFS_* and VFS_* functions where a direct call is possble.
The UFS_ functions are for UFS to call back into VFS. The VFS functions are external entry points into the filesystem.
|
#
141525 |
|
08-Feb-2005 |
phk |
(forced commit to record correct commit message)
Split ffs_fsync() into a VOP_FSYNC() component and an internal part called ffs_syncvnode().
Eliminate unnecessary thread argument and XXX'ed curthread passes for same. Reduce softdep_sync_metadata() from a struct vop_fsync_args to just the vnode argument it needs.
Convert internal VOP_FSYNC() calls to use ffs_syncvnode().
|
#
141522 |
|
08-Feb-2005 |
phk |
For snapshots we need all VOP_LOCKs to be exclusive.
The "business class upgrade" was implemented in UFS's VOP_LOCK implementation ufs_lock() which is the wrong layer, so move it to ffs_lock().
Also, as long as we have not abandonned advanced vfs-stacking we should not preclude it from happening: instead of implementing a copy locally, use the VOP_LOCK_APV(&ufs) to correctly arrive at vop_stdlock() at the bottom.
|
#
140703 |
|
24-Jan-2005 |
jeff |
- Acquire the ufs lock when manipulating some fields of struct fs. - Change arguments to various ffs functions to match their new prototypes.
Sponsored By: Isilon Systems, Inc.
|
#
140220 |
|
14-Jan-2005 |
phk |
Eliminate unused and unnecessary "cred" argument from vinvalbuf()
|
#
140048 |
|
11-Jan-2005 |
phk |
Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC().
I'm not sure why a credential was added to these in the first place, it is not used anywhere and it doesn't make much sense:
The credentials for syncing a file (ability to write to the file) should be checked at the system call level.
Credentials for syncing one or more filesystems ("none") should be checked at the system call level as well.
If the filesystem implementation needs a particular credential to carry out the syncing it would logically have to the cached mount credential, or a credential cached along with any delayed write data.
Discussed with: rwatson
|
#
139825 |
|
07-Jan-2005 |
imp |
/* -> /*- for license, minor formatting changes
|
#
136943 |
|
25-Oct-2004 |
phk |
Loose the v_dirty* and v_clean* alias macros.
Check the count field where we just want to know the full/empty state, rather than using TAILQ_EMPTY() or TAILQ_FIRST().
|
#
136927 |
|
24-Oct-2004 |
phk |
Move the buffer method vector (buf->b_op) to the bufobj.
Extend it with a strategy method.
Add bufstrategy() which do the usual VOP_SPECSTRATEGY/VOP_STRATEGY song and dance.
Rename ibwrite to bufwrite().
Move the two NFS buf_ops to more sensible places, add bufstrategy to them.
Add inlines for bwrite() and bstrategy() which calls through buf->b_bufobj->b_ops->b_{write,strategy}().
Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().
|
#
132775 |
|
28-Jul-2004 |
kan |
Avoid using casts as lvalues. Introduce DIP_SET macro which sets proper inode field based on UFS version. Use DIP ro read values and DIP_SET to modify them throughout FFS code base.
|
#
127975 |
|
07-Apr-2004 |
imp |
Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and irc message from Robert Watson saying that clause 3 can be removed from those files with an NAI copyright that also have only a University of California copyrights.
Approved by: core, rwatson
|
#
121205 |
|
18-Oct-2003 |
phk |
DuH!
bp->b_iooffset (the spot on the disk), not bp->b_offset (the offset in the file)
|
#
121202 |
|
18-Oct-2003 |
phk |
Initialize bp->b_offset before calling VOP_[SPEC]STRATEGY()
|
#
120804 |
|
05-Oct-2003 |
jeff |
- In ffs_update() assert that either the vnode lock or the XLOCK is held.
|
#
116192 |
|
11-Jun-2003 |
obrien |
Use __FBSDID().
|
#
111856 |
|
03-Mar-2003 |
jeff |
- Add a new 'flags' parameter to getblk(). - Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT flag to the initial BUF_LOCK(). This will eventually be used in cases were we want to use a buffer only if it is not currently in use. - Convert all consumers of the getblk() api to use this extra parameter.
Reviwed by: arch Not objected to by: mckusick
|
#
111119 |
|
19-Feb-2003 |
imp |
Back out M_* changes, per decision of the TRB.
Approved by: trb
|
#
109623 |
|
21-Jan-2003 |
alfred |
Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
|
#
108589 |
|
03-Jan-2003 |
phk |
Convert calls to BUF_STRATEGY to VOP_STRATEGY calls. This is a no-op since all BUF_STRATEGY did in the first place was call VOP_STRATEGY.
|
#
103946 |
|
25-Sep-2002 |
jeff |
- Convert locks to use standard macros. - Lock access to the buflists. - Document broken locking. - Use vrefcnt().
|
#
101398 |
|
05-Aug-2002 |
iedowse |
Don't call softdep_slowdown() if soft updates are not active on the filesystem. This causes a panic for kernels compiled without softupdates.
Reported by: luigi
|
#
100344 |
|
19-Jul-2002 |
mckusick |
Add support to UFS2 to provide storage for extended attributes. As this code is not actually used by any of the existing interfaces, it seems unlikely to break anything (famous last words).
The internal kernel interface to manipulate these attributes is invoked using two new IO_ flags: IO_NORMAL and IO_EXT. These flags may be specified in the ioflags word of VOP_READ, VOP_WRITE, and VOP_TRUNCATE. Specifying IO_NORMAL means that you want to do I/O to the normal data part of the file and IO_EXT means that you want to do I/O to the extended attributes part of the file. IO_NORMAL and IO_EXT are mutually exclusive for VOP_READ and VOP_WRITE, but may be specified individually or together in the case of VOP_TRUNCATE. For example, when removing a file, VOP_TRUNCATE is called with both IO_NORMAL and IO_EXT set. For backward compatibility, if neither IO_NORMAL nor IO_EXT is set, then IO_NORMAL is assumed.
Note that the BA_ and IO_ flags have been `merged' so that they may both be used in the same flags word. This merger is possible by assigning the IO_ flags to the low sixteen bits and the BA_ flags the high sixteen bits. This works because the high sixteen bits of the IO_ word is reserved for read-ahead and help with write clustering so will never be used for flags. This merge lets us get away from code of the form:
if (ioflags & IO_SYNC) flags |= BA_SYNC;
For the future, I have considered adding a new field to the vattr structure, va_extsize. This addition could then be exported through the stat structure to allow applications to find out the size of the extended attribute storage and also would provide a more standard interface for truncating them (via VOP_SETATTR rather than VOP_TRUNCATE).
I am also contemplating adding a pathconf parameter (for concreteness, lets call it _PC_MAX_EXTSIZE) which would let an application determine the maximum size of the extended atribute storage.
Sponsored by: DARPA & NAI Labs.
|
#
98658 |
|
23-Jun-2002 |
dillon |
Rename the BALLOC flags from B_* to BA_* to avoid confusion with the struct buf B_ flags.
Approved by: mckusick
|
#
98542 |
|
21-Jun-2002 |
mckusick |
This commit adds basic support for the UFS2 filesystem. The UFS2 filesystem expands the inode to 256 bytes to make space for 64-bit block pointers. It also adds a file-creation time field, an ability to use jumbo blocks per inode to allow extent like pointer density, and space for extended attributes (up to twice the filesystem block size worth of attributes, e.g., on a 16K filesystem, there is space for 32K of attributes). UFS2 fully supports and runs existing UFS1 filesystems. New filesystems built using newfs can be built in either UFS1 or UFS2 format using the -O option. In this commit UFS1 is the default format, so if you want to build UFS2 format filesystems, you must specify -O 2. This default will be changed to UFS2 when UFS2 proves itself to be stable. In this commit the boot code for reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c) as there is insufficient space in the boot block. Once the size of the boot block is increased, this code can be defined.
Things to note: the definition of SBSIZE has changed to SBLOCKSIZE. The header file <ufs/ufs/dinode.h> must be included before <ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and ufs_lbn_t.
Still TODO: Verify that the first level bootstraps work for all the architectures. Convert the utility ffsinfo to understand UFS2 and test growfs. Add support for the extended attribute storage. Update soft updates to ensure integrity of extended attribute storage. Switch the current extended attribute interfaces to use the extended attribute storage. Add the extent like functionality (framework is there, but is currently never used).
Sponsored by: DARPA & NAI Labs. Reviewed by: Poul-Henning Kamp <phk@freebsd.org>
|
#
96506 |
|
13-May-2002 |
phk |
Remove register keyword.
Sponsored by: DARPA & NAI Labs. Submitted by: mckusick
|
#
92728 |
|
19-Mar-2002 |
alfred |
Remove __P.
|
#
90098 |
|
01-Feb-2002 |
mckusick |
When taking a snapshot, we must check for active files that have been unlinked (e.g., with a zero link count). We have to expunge all trace of these files from the snapshot so that they are neither reclaimed prematurely by fsck nor saved unnecessarily by dump.
|
#
89413 |
|
16-Jan-2002 |
mckusick |
Put write on read-only filesystem panic after we have weeded out block and character devices, fifo's, etc.
Submitted by: Bruce Evans <bde@zeta.org.au>
|
#
89384 |
|
15-Jan-2002 |
mckusick |
When downgrading a filesystem from read-write to read-only, operations involving file removal or file update were not always being fully committed to disk. The result was lost files or corrupted file data. This change ensures that the filesystem is properly synced to disk before the filesystem is down-graded.
This delta also fixes a long standing bug in which a file open for reading has been unlinked. When the last open reference to the file is closed, the inode is reclaimed by the filesystem. Previously, if the filesystem had been down-graded to read-only, the inode could not be reclaimed, and thus was lost and had to be later recovered by fsck. With this change, such files are found at the time of the down-grade. Normally they will result in the filesystem down-grade failing with `device busy'. If a forcible down-grade is done, then the affected files will be revoked causing the inode to be released and the open file descriptors to begin failing on attempts to read.
Submitted by: "Sam Leffler" <sam@errno.com>
|
#
87782 |
|
13-Dec-2001 |
mckusick |
When a file is partially truncated, we first check to see if the new file end will land in the middle of a file hole. Since the last block of a file must always be allocated, the hole is filled by allocating a block at that location. If the hole being filled is a direct block, then the truncation may eventually reduce the full sized block down to a fragment. When running with soft updates, it is necessary to FSYNC the file after allocating the block and before creating the fragment to avoid triggering a soft updates inconsistency when the block unexpectedly shrinks.
Found by: Matthew Dillon <dillon@apollo.backplane.com> MFC after: 1 week
|
#
83366 |
|
12-Sep-2001 |
julian |
KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
|
#
76357 |
|
08-May-2001 |
mckusick |
When running with soft updates, track the number of blocks and files that are committed to being freed and reflect these blocks in the counts returned by statfs (and thus also by the `df' command). This change allows programs such as those that do news expiration to know when to stop if they are trying to create a certain percentage of free space. Note that this change does not solve the much harder problem of making this to-be-freed space available to applications that want it (thus on a nearly full filesystem, you may still encounter out-of-space conditions even though the free space will show up eventually). Hopefully this harder problem will be the subject of a future enhancement.
|
#
76132 |
|
29-Apr-2001 |
phk |
VOP_BALLOC was never really a VOP in the first place, so convert it to UFS_BALLOC like the other "between UFS and FFS function interfaces".
|
#
76117 |
|
29-Apr-2001 |
grog |
Revert consequences of changes to mount.h, part 2.
Requested by: bde
|
#
75858 |
|
23-Apr-2001 |
grog |
Correct #includes to work with fixed sys/mount.h.
|
#
70182 |
|
19-Dec-2000 |
mckusick |
Get rid of spurious check in ffs_truncate for i_size == length which fails to set the modification time on the file. The same check a few lines later takes the correct action.
Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
|
#
69967 |
|
13-Dec-2000 |
mckusick |
Preventing runaway kernel soft updates memory, take three. Previously, the syncer process was the only process in the system that could process the soft updates background work list. If enough other processes were adding requests to that list, it would eventually grow without bound. Because some of the work list requests require vnodes to be locked, it was not generally safe to let random processes process the work list while they already held vnodes locked. By adding a flag to the work list queue processing function to indicate whether the calling process could safely lock vnodes, it becomes possible to co-opt other processes into helping out with the work list. Now when the worklist gets too large, other processes can safely help out by picking off those work requests that can be handled without locking a vnode, leaving only the small number of requests requiring a vnode lock for the syncer process. With this change, it appears possible to keep even the nastiest workloads under control.
Submitted by: Paul Saab <ps@yahoo-inc.com>
|
#
68885 |
|
18-Nov-2000 |
dillon |
Implement a low-memory deadlock solution.
Removed most of the hacks that were trying to deal with low-memory situations prior to now.
The new code is based on the concept that I/O must be able to function in a low memory situation. All major modules related to I/O (except networking) have been adjusted to allow allocation out of the system reserve memory pool. These modules now detect a low memory situation but rather then block they instead continue to operate, then return resources to the memory pool instead of cache them or leave them wired.
Code has been added to stall in a low-memory situation prior to a vnode being locked.
Thus situations where a process blocks in a low-memory condition while holding a locked vnode have been reduced to near nothing. Not only will I/O continue to operate, but many prior deadlock conditions simply no longer exist.
Implement a number of VFS/BIO fixes
(found by Ian): in biodone(), bogus-page replacement code, the loop was not properly incrementing loop variables prior to a continue statement. We do not believe this code can be hit anyway but we aren't taking any chances. We'll turn the whole section into a panic (as it already is in brelse()) after the release is rolled.
In biodone(), the foff calculation was incorrectly clamped to the iosize, causing the wrong foff to be calculated for pages in the case of an I/O error or biodone() called without initiating I/O. The problem always caused a panic before. Now it doesn't. The problem is mainly an issue with NFS.
Fixed casts for ~PAGE_MASK. This code worked properly before only because the calculations use signed arithmatic. Better to properly extend PAGE_MASK first before inverting it for the 64 bit masking op.
In brelse(), the bogus_page fixup code was improperly throwing away the original contents of 'm' when it did the j-loop to fix the bogus pages. The result was that it would potentially invalidate parts of the *WRONG* page(!), leading to corruption.
There may still be cases where a background bitmap write is being duplicated, causing potential corruption. We have identified a potentially serious bug related to this but the fix is still TBD. So instead this patch contains a KASSERT to detect the problem and panic the machine rather then continue to corrupt the filesystem. The problem does not occur very often.. it is very hard to reproduce, and it may or may not be the cause of the corruption people have reported.
Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>) Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>
|
#
63897 |
|
26-Jul-2000 |
mckusick |
Clean up the snapshot code so that it no longer depends on the use of the SF_IMMUTABLE flag to prevent writing. Instead put in explicit checking for the SF_SNAPSHOT flag in the appropriate places. With this change, it is now possible to rename and link to snapshot files. It is also possible to set or clear any of the owner, group, or other read bits on the file, though none of the write or execute bits can be set. There is also an explicit test to prevent the setting or clearing of the SF_SNAPSHOT flag via chflags() or fchflags(). Note also that the modify time cannot be changed as it needs to accurately reflect the time that the snapshot was taken.
Submitted by: Robert Watson <rwatson@FreeBSD.org>
|
#
60041 |
|
05-May-2000 |
phk |
Separate the struct bio related stuff out of <sys/buf.h> into <sys/bio.h>.
<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be made a nested include according to bdes teachings on the subject of nested includes.
Diskdrivers and similar stuff below specfs::strategy() should no longer need to include <sys/buf.> unless they need caching of data.
Still a few bogus uses of struct buf to track down.
Repocopy by: peter
|
#
59762 |
|
29-Apr-2000 |
phk |
s/biowait/bufwait/g
Prodded by: several.
|
#
59760 |
|
29-Apr-2000 |
phk |
Remove unneeded #include <sys/kernel.h>
|
#
59241 |
|
15-Apr-2000 |
rwatson |
Introduce extended attribute support for FFS, allowing arbitrary (name, value) pairs to be associated with inodes. This support is used for ACLs, MAC labels, and Capabilities in the TrustedBSD security extensions, which are currently under development.
In this implementation, attributes are backed to data vnodes in the style of the quota support in FFS. Support for FFS extended attributes may be enabled using the FFS_EXTATTR kernel option (disabled by default). Userland utilities and man pages will be committed in the next batch. VFS interfaces and man pages have been in the repo since 4.0-RELEASE and are unchanged.
o ufs/ufs/extattr.h: UFS-specific extattr defines o ufs/ufs/ufs_extattr.c: bulk of support routines o ufs/{ufs,ffs,mfs}/*.[ch]: hooks and extattr.h includes o contrib/softupdates/ffs_softdep.c: extattr.h includes o conf/options, conf/files, i386/conf/LINT: added FFS_EXTATTR
o coda/coda_vfsops.c: XXX required extattr.h due to ufsmount.h (This should not be the case, and will be fixed in a future commit)
Currently attributes are not supported in MFS. This will be fixed.
Reviewed by: adrian, bp, freebsd-fs, other unthanked souls Obtained from: TrustedBSD Project
|
#
58934 |
|
02-Apr-2000 |
phk |
Move B_ERROR flag to b_ioflags and call it BIO_ERROR.
(Much of this done by script)
Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.
Move b_pblkno and b_iodone_chain to struct bio while we transition, they will be obsoleted once bio structs chain/stack.
Add bio_queue field for struct bio aware disksort.
Address a lot of stylistic issues brought up by bde.
|
#
58349 |
|
20-Mar-2000 |
phk |
Rename the existing BUF_STRATEGY() to DEV_STRATEGY()
substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)
substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)
This patch is machine generated except for the ccd.c and buf.h parts.
|
#
58345 |
|
20-Mar-2000 |
phk |
Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new field in struct buf: b_iocmd. The b_iocmd is enforced to have exactly one bit set.
B_WRITE was bogusly defined as zero giving rise to obvious coding mistakes.
Also eliminate the redundant struct buf flag B_CALL, it can just as efficiently be done by comparing b_iodone to NULL.
Should you get a panic or drop into the debugger, complaining about "b_iocmd", don't continue. It is likely to write on your disk where it should have been reading.
This change is a step in the direction towards a stackable BIO capability.
A lot of this patch were machine generated (Thanks to style(9) compliance!)
Vinum users: Greg has not had time to test this yet, be careful.
|
#
50477 |
|
27-Aug-1999 |
peter |
$Id$ -> $FreeBSD$
|
#
48801 |
|
13-Jul-1999 |
mckusick |
Create the macro DOINGASYNC to check whether the MNT_ASYNC flag has been set for a mount point. Insert missing checks to ensure that all write operations are done asynchronously when the MNT_ASYNC option has been requested.
Submitted by: Craig A Soules <soules+@andrew.cmu.edu> Reviewed by: Kirk McKusick <mckusick@mckusick.com>
|
#
46349 |
|
02-May-1999 |
alc |
The VFS/BIO subsystem contained a number of hacks in order to optimize piecemeal, middle-of-file writes for NFS. These hacks have caused no end of trouble, especially when combined with mmap(). I've removed them. Instead, NFS will issue a read-before-write to fully instantiate the struct buf containing the write. NFS does, however, optimize piecemeal appends to files. For most common file operations, you will not notice the difference. The sole remaining fragment in the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache coherency issues with read-merge-write style operations. NFS also optimizes the write-covers-entire-buffer case by avoiding the read-before-write. There is quite a bit of room for further optimization in these areas.
The VM system marks pages fully-valid (AKA vm_page_t->valid = VM_PAGE_BITS_ALL) in several places, most noteably in vm_fault. This is not correct operation. The vm_pager_get_pages() code is now responsible for marking VM pages all-valid. A number of VM helper routines have been added to aid in zeroing-out the invalid portions of a VM page prior to the page being marked all-valid. This operation is necessary to properly support mmap(). The zeroing occurs most often when dealing with file-EOF situations. Several bugs have been fixed in the NFS subsystem, including bits handling file and directory EOF situations and buf->b_flags consistancy issues relating to clearing B_ERROR & B_INVAL, and handling B_DONE.
getblk() and allocbuf() have been rewritten. B_CACHE operation is now formally defined in comments and more straightforward in implementation. B_CACHE for VMIO buffers is based on the validity of the backing store. B_CACHE for non-VMIO buffers is based simply on whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear, and vise-versa). biodone() is now responsible for setting B_CACHE when a successful read completes. B_CACHE is also set when a bdwrite() is initiated and when a bwrite() is initiated. VFS VOP_BWRITE routines (there are only two - nfs_bwrite() and bwrite()) are now expected to set B_CACHE. This means that bowrite() and bawrite() also set B_CACHE indirectly.
There are a number of places in the code which were previously using buf->b_bufsize (which is DEV_BSIZE aligned) when they should have been using buf->b_bcount. These have been fixed. getblk() now clears B_DONE on return because the rest of the system is so bad about dealing with B_DONE.
Major fixes to NFS/TCP have been made. A server-side bug could cause requests to be lost by the server due to nfs_realign() overwriting other rpc's in the same TCP mbuf chain. The server's kernel must be recompiled to get the benefit of the fixes.
Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
|
#
43311 |
|
27-Jan-1999 |
dillon |
Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
|
#
42374 |
|
07-Jan-1999 |
bde |
Don't pass unused unused timestamp args to UFS_UPDATE() or waste time initializing them. This almost finishes centralizing (in-core) timestamp updates in ufs_itimes().
|
#
42354 |
|
06-Jan-1999 |
bde |
UFS_UPDATE() takes a boolean `waitfor' arg, so don't pass it the value MNT_WAIT when we mean boolean `true' or check for that value not being passed. There was no problem in practice because MNT_WAIT had the magic value of 1.
|
#
42350 |
|
06-Jan-1999 |
bde |
Backed out rev.1.47. It just broke my optimisations for lazy syncing of timestamps in rev.1.45. The soft updates bug was elsewhere.
Forgotten by: luoqi
|
#
40790 |
|
31-Oct-1998 |
peter |
Use TAILQ macros for clean/dirty block list processing. Set b_xflags rather than abusing the list next pointer with a magic number.
|
#
40648 |
|
25-Oct-1998 |
phk |
Nitpicking and dusting performed on a train. Removes trivial warnings about unused variables, labels and other lint.
|
#
39281 |
|
15-Sep-1998 |
luoqi |
Restore pre-v1.44 behavior: always copy modified in-core inode to disk buffer. Otherwise some in-core inode changes might be lost, including important meta data (e.g. size) if softupdates is enabled.
|
#
37384 |
|
04-Jul-1998 |
julian |
VOP_STRATEGY grows an (struct vnode *) argument as the value in b_vp is often not really what you want. (and needs to be frobbed). more cleanups will follow this. Reviewed by: Bruce Evans <bde@freebsd.org>
|
#
37363 |
|
03-Jul-1998 |
bde |
Sync timestamp changes for inodes of special files to disk as late as possible (when the inode is reclaimed). Temporarily only do this if option UFS_LAZYMOD configured and softupdates aren't enabled. UFS_LAZYMOD is intentionally left out of /sys/conf/options.
This is mainly to avoid almost useless disk i/o on battery powered machines. It's silly to write to disk (on the next sync or when the inode becomes inactive) just because someone hit a key or something wrote to the screen or /dev/null.
PR: 5577 Previous version reviewed by: phk
|
#
37362 |
|
03-Jul-1998 |
bde |
Centralized in-core inode update. Update the in-core inode directly in ufs_setattr() so that there is no need to pass timestamps to UFS_UPDATE() (everything else just needs the current time). Ignore the passed-in timestamps in UFS_UPDATE() and always call ufs_itimes() (was: itimes()) to do the update. The timestamps are still passed so that all the callers don't need to be changed yet.
|
#
36990 |
|
14-Jun-1998 |
julian |
Slight change to directory cleanup Makes soft updates a bit cleaner. Eliminates some warnings about 'corrupted directories' from fsck.
|
#
35696 |
|
04-May-1998 |
dyson |
Correct an error that I made where the vtruncbuf was changed back to vinvalbuf, but I incorrectly added the "V_SAVE|V_SAVEMETA" flags. Submitted by: Luoqi Chen <luoqi@watermarkgroup.com>
|
#
35526 |
|
30-Apr-1998 |
dyson |
Fix an error that I made with an optimization. In the case of softupdates, we need to do vtruncbuf the old way. Luoqi caught, found the bug and submitted this fix. Submitted by: Luoqi Chen <luoqi@chen.ml.org>
|
#
34961 |
|
30-Mar-1998 |
phk |
Eradicate the variable "time" from the kernel, using various measures. "time" wasn't a atomic variable, so splfoo() protection were needed around any access to it, unless you just wanted the seconds part.
Most uses of time.tv_sec now uses the new variable time_second instead.
gettime() changed to getmicrotime(0.
Remove a couple of unneeded splfoo() protections, the new getmicrotime() is atomic, (until Bruce sets a breakpoint in it).
A couple of places needed random data, so use read_random() instead of mucking about with time which isn't random.
Add a new nfs_curusec() function.
Mark a couple of bogosities involving the now disappeard time variable.
Update ffs_update() to avoid the weird "== &time" checks, by fixing the one remaining call that passwd &time as args.
Change profiling in ncr.c to use ticks instead of time. Resolution is the same.
Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call hzto() which subtracts time" sequences.
Reviewed by: bde
|
#
34901 |
|
26-Mar-1998 |
phk |
Add two new functions, get{micro|nano}time.
They are atomic, but return in essence what is in the "time" variable. gettime() is now a macro front for getmicrotime().
Various patches to use the two new functions instead of the various hacks used in their absence.
Some puntuation and grammer patches from Bruce.
A couple of XXX comments.
|
#
34696 |
|
19-Mar-1998 |
dyson |
Fix vfs_bio_awrite usage, and correct vtruncbuf usage.
|
#
34611 |
|
15-Mar-1998 |
dyson |
Some VM improvements, including elimination of alot of Sig-11 problems. Tor Egge and others have helped with various VM bugs lately, but don't blame him -- blame me!!!
pmap.c: 1) Create an object for kernel page table allocations. This fixes a bogus allocation method previously used for such, by grabbing pages from the kernel object, using bogus pindexes. (This was a code cleanup, and perhaps a minor system stability issue.)
pmap.c: 2) Pre-set the modify and accessed bits when prudent. This will decrease bus traffic under certain circumstances.
vfs_bio.c, vfs_cluster.c: 3) Rather than calculating the beginning virtual byte offset multiple times, stick the offset into the buffer header, so that the calculated offset can be reused. (Long long multiplies are often expensive, and this is a probably unmeasurable performance improvement, and code cleanup.)
vfs_bio.c: 4) Handle write recursion more intelligently (but not perfectly) so that it is less likely to cause a system panic, and is also much more robust.
vfs_bio.c: 5) getblk incorrectly wrote out blocks that are incorrectly sized. The problem is fixed, and writes blocks out ONLY when B_DELWRI is true.
vfs_bio.c: 6) Check that already constituted buffers have fully valid pages. If not, then make sure that the B_CACHE bit is not set. (This was a major source of Sig-11 type problems.)
vfs_bio.c: 7) Fix a potential system deadlock due to an incorrectly specified sleep priority while waiting for a buffer write operation. The change that I made opens the system up to serious problems, and we need to examine the issue of process sleep priorities.
vfs_cluster.c, vfs_bio.c: 8) Make clustered reads work more correctly (and more completely) when buffers are already constituted, but not fully valid. (This was another system reliability issue.)
vfs_subr.c, ffs_inode.c: 9) Create a vtruncbuf function, which is used by filesystems that can truncate files. The vinvalbuf forced a file sync type operation, while vtruncbuf only invalidates the buffers past the new end of file, and also invalidates the appropriate pages. (This was a system reliabiliy and performance issue.)
10) Modify FFS to use vtruncbuf.
vm_object.c: 11) Make the object rundown mechanism for OBJT_VNODE type objects work more correctly. Included in that fix, create pager entries for the OBJT_DEAD pager type, so that paging requests that might slip in during race conditions are properly handled. (This was a system reliability issue.)
vm_page.c: 12) Make some of the page validation routines be a little less picky about arguments passed to them. Also, support page invalidation change the object generation count so that we handle generation counts a little more robustly.
vm_pageout.c: 13) Further reduce pageout daemon activity when the system doesn't need help from it. There should be no additional performance decrease even when the pageout daemon is running. (This was a significant performance issue.)
vnode_pager.c: 14) Teach the vnode pager to handle race conditions during vnode deallocations.
|
#
34266 |
|
08-Mar-1998 |
julian |
Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman) Submitted by: Kirk McKusick (mcKusick@mckusick.com) Obtained from: WHistle development tree
|
#
34206 |
|
07-Mar-1998 |
dyson |
This mega-commit is meant to fix numerous interrelated problems. There has been some bitrot and incorrect assumptions in the vfs_bio code. These problems have manifest themselves worse on NFS type filesystems, but can still affect local filesystems under certain circumstances. Most of the problems have involved mmap consistancy, and as a side-effect broke the vfs.ioopt code. This code might have been committed seperately, but almost everything is interrelated.
1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that are fully valid. 2) Rather than deactivating erroneously read initial (header) pages in kern_exec, we now free them. 3) Fix the rundown of non-VMIO buffers that are in an inconsistent (missing vp) state. 4) Fix the disassociation of pages from buffers in brelse. The previous code had rotted and was faulty in a couple of important circumstances. 5) Remove a gratuitious buffer wakeup in vfs_vmio_release. 6) Remove a crufty and currently unused cluster mechanism for VBLK files in vfs_bio_awrite. When the code is functional, I'll add back a cleaner version. 7) The page busy count wakeups assocated with the buffer cache usage were incorrectly cleaned up in a previous commit by me. Revert to the original, correct version, but with a cleaner implementation. 8) The cluster read code now tries to keep data associated with buffers more aggressively (without breaking the heuristics) when it is presumed that the read data (buffers) will be soon needed. 9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The delay loop waiting is not useful for filesystem locks, due to the length of the time intervals. 10) Correct and clean-up spec_getpages. 11) Implement a fully functional nfs_getpages, nfs_putpages. 12) Fix nfs_write so that modifications are coherent with the NFS data on the server disk (at least as well as NFS seems to allow.) 13) Properly support MS_INVALIDATE on NFS. 14) Properly pass down MS_INVALIDATE to lower levels of the VM code from vm_map_clean. 15) Better support the notion of pages being busy but valid, so that fewer in-transit waits occur. (use p->busy more for pageouts instead of PG_BUSY.) Since the page is fully valid, it is still usable for reads. 16) It is possible (in error) for cached pages to be busy. Make the page allocation code handle that case correctly. (It should probably be a printf or panic, but I want the system to handle coding errors robustly. I'll probably add a printf.) 17) Correct the design and usage of vm_page_sleep. It didn't handle consistancy problems very well, so make the design a little less lofty. After vm_page_sleep, if it ever blocked, it is still important to relookup the page (if the object generation count changed), and verify it's status (always.) 18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up. 19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush. 20) Fix vm_pager_put_pages and it's descendents to support an int flag instead of a boolean, so that we can pass down the invalidate bit.
|
#
33134 |
|
06-Feb-1998 |
eivind |
Back out DIAGNOSTIC changes.
|
#
33108 |
|
04-Feb-1998 |
eivind |
Turn DIAGNOSTIC into a new-style option.
|
#
32976 |
|
01-Feb-1998 |
dyson |
Back out recent laptop sync changes. They had significant errors.
|
#
32951 |
|
31-Jan-1998 |
dyson |
Support more intelligent sync operations for MNT_NOATIME. PR: kern/5577 Submitted by: Craig Leres <leres@ee.lbl.gov>
|
#
32286 |
|
06-Jan-1998 |
dyson |
Make our v_usecount vnode reference count work identically to the original BSD code. The association between the vnode and the vm_object no longer includes reference counts. The major difference is that vm_object's are no longer freed gratuitiously from the vnode, and so once an object is created for the vnode, it will last as long as the vnode does.
When a vnode object reference count is incremented, then the underlying vnode reference count is incremented also. The two "objects" are now more intimately related, and so the interactions are now much less complex.
When vnodes are now normally placed onto the free queue with an object still attached. The rundown of the object happens at vnode rundown time, and happens with exactly the same filesystem semantics of the original VFS code. There is absolutely no need for vnode_pager_uncache and other travesties like that anymore.
A side-effect of these changes is that SMP locking should be much simpler, the I/O copyin/copyout optimizations work, NFS should be more ponderable, and further work on layered filesystems should be less frustrating, because of the totally coherent management of the vnode objects and vnodes.
Please be careful with your system while running this code, but I would greatly appreciate feedback as soon a reasonably possible.
|
#
30492 |
|
16-Oct-1997 |
phk |
Another VFS cleanup "kilo commit"
1. Remove VOP_UPDATE, it is (also) an UFS/{FFS,LFS,EXT2FS,MFS} intereface function, and now lives in the ufsmount structure.
2. Remove VOP_SEEK, it was unused.
3. Add mode default vops:
VOP_ADVLOCK vop_einval VOP_CLOSE vop_null VOP_FSYNC vop_null VOP_IOCTL vop_enotty VOP_MMAP vop_einval VOP_OPEN vop_null VOP_PATHCONF vop_einval VOP_READLINK vop_einval VOP_REALLOCBLKS vop_eopnotsupp
And remove identical functionality from filesystems
4. Add vop_stdpathconf, which returns the canonical stuff. Use it in the filesystems. (XXX: It's probably wrong that specfs and fifofs sets this vop, shouldn't it come from the "host" filesystem, for instance ufs or cd9660 ?)
5. Try to make system wide VOP functions have vop_* names.
6. Initialize the um_* vectors in LFS.
(Recompile your LKMS!!!)
|
#
30474 |
|
16-Oct-1997 |
phk |
VFS mega cleanup commit (x/N)
1. Add new file "sys/kern/vfs_default.c" where default actions for VOPs go. Implement proper defaults for ABORTOP, BWRITE, LEASE, POLL, REVOKE and STRATEGY. Various stuff spread over the entire tree belongs here.
2. Change VOP_BLKATOFF to a normal function in cd9660.
3. Kill VOP_BLKATOFF, VOP_TRUNCATE, VOP_VFREE, VOP_VALLOC. These are private interface functions between UFS and the underlying storage manager layer (FFS/LFS/MFS/EXT2FS). The functions now live in struct ufsmount instead.
4. Remove a kludge of VOP_ functions in all filesystems, that did nothing but obscure the simplicity and break the expandability. If a filesystem doesn't implement VOP_FOO, it shouldn't have an entry for it in its vnops table. The system will try to DTRT if it is not implemented. There are still some cruft left, but the bulk of it is done.
5. Fix another VCALL in vfs_cache.c (thanks Bruce!)
|
#
29041 |
|
02-Sep-1997 |
bde |
Removed unused #includes.
|
#
24101 |
|
22-Mar-1997 |
bde |
Fixed some invalid (non-atomic) accesses to `time', mostly ones of the form `tv = time'. Use a new function gettime(). The current version just forces atomicicity without fixing precision or efficiency bugs. Simplified some related valid accesses by using the central function.
|
#
22975 |
|
22-Feb-1997 |
peter |
Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
|
#
22521 |
|
10-Feb-1997 |
dyson |
This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes.
The system boots and can mount UFS filesystems.
Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed.
Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
|
#
21673 |
|
14-Jan-1997 |
jkh |
Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
|
#
19424 |
|
05-Nov-1996 |
dg |
Eliminate an unnecessary synchronous write (and an 8K bcopy+bzero) when truncating/deleting large files.
Reviewed by: mckusick, dyson Submitted by: Kirk McKusick <mckusick@mckusick.com>, modified for FreeBSD by me.
|
#
18397 |
|
19-Sep-1996 |
nate |
In sys/time.h, struct timespec is defined as:
/* * Structure defined by POSIX.4 to be like a timeval. */ struct timespec { time_t ts_sec; /* seconds */ long ts_nsec; /* and nanoseconds */ };
The correct names of the fields are tv_sec and tv_nsec.
Reminded by: James Drobina <jdrobina@infinet.com>
|
#
13490 |
|
19-Jan-1996 |
dyson |
Eliminated many redundant vm_map_lookup operations for vm_mmap. Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish overhead for merged cache. Efficiency improvement for vfs_cluster. It used to do alot of redundant calls to cluster_rbuild. Correct the ordering for vrele of .text and release of credentials. Use the selective tlb update for 486/586/P6. Numerous fixes to the size of objects allocated for files. Additionally, fixes in the various pagers. Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs. Fixes in the swap pager for exhausted resources. The pageout code will not as readily thrash. Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE), thereby improving efficiency of several routines. Eliminate even more unnecessary vm_page_protect operations. Significantly speed up process forks. Make vm_object_page_clean more efficient, thereby eliminating the pause that happens every 30seconds. Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the case of filesystems mounted async. Fix a panic with busy pages when write clustering is done for non-VMIO buffers.
|
#
13260 |
|
05-Jan-1996 |
wollman |
Convert QUOTA to new-style option.
|
#
12767 |
|
11-Dec-1995 |
dyson |
Changes to support 1Tb filesizes. Pages are now named by an (object,index) pair instead of (object,offset) pair.
|
#
12662 |
|
07-Dec-1995 |
dg |
Untangled the vm.h include file spaghetti.
|
#
12111 |
|
05-Nov-1995 |
dyson |
Make MNT_ASYNC more effective for UFS. It should not be too much more dangerous than the original MNT_ASYNC. There might be some minor security considerations due to data writes not being posted as promptly as before. Meta-data operations are still not quite as fast as Linux, but streaming I/O is still higher.
|
#
10078 |
|
16-Aug-1995 |
dg |
Honor -async mount option when doing the inode update.
Obtained from: 4.4BSD-Lite2
|
#
9886 |
|
04-Aug-1995 |
dg |
Use the correct flags (IO_SYNC -> B_SYNC) when deciding to do a sync or async write in the section that changes the filesize. The bug resulted in the updates always being async.
Obtained from: 4.4BSD-Lite2
|
#
7399 |
|
26-Mar-1995 |
dg |
Removed third arg (vmio) to allocbuf() that was added with the original merged cache changes, and figure it out based on the B_VMIO buffer flag. Fixes a problem where delayed write VMIO buffers would sometimes get recopied into kernel-alloced memory.
Submitted by: John Dyson
|
#
6875 |
|
04-Mar-1995 |
dg |
Removed obsolete vtrace() remnants.
|
#
5455 |
|
09-Jan-1995 |
dg |
These changes embody the support of the fully coherent merged VM buffer cache, much higher filesystem I/O performance, and much better paging performance. It represents the culmination of over 6 months of R&D.
The majority of the merged VM/cache work is by John Dyson.
The following highlights the most significant changes. Additionally, there are (mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to support the new VM/buffer scheme.
vfs_bio.c: Significant rewrite of most of vfs_bio to support the merged VM buffer cache scheme. The scheme is almost fully compatible with the old filesystem interface. Significant improvement in the number of opportunities for write clustering.
vfs_cluster.c, vfs_subr.c Upgrade and performance enhancements in vfs layer code to support merged VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.
vm_object.c: Yet more improvements in the collapse code. Elimination of some windows that can cause list corruption.
vm_pageout.c: Fixed it, it really works better now. Somehow in 2.0, some "enhancements" broke the code. This code has been reworked from the ground-up.
vm_fault.c, vm_page.c, pmap.c, vm_object.c Support for small-block filesystems with merged VM/buffer cache scheme.
pmap.c vm_map.c Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of kernel PTs.
vm_glue.c Much simpler and more effective swapping code. No more gratuitous swapping.
proc.h Fixed the problem that the p_lock flag was not being cleared on a fork.
swap_pager.c, vnode_pager.c Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the code doesn't need it anymore.
machdep.c Changes to better support the parameter values for the merged VM/buffer cache scheme.
machdep.c, kern_exec.c, vm_glue.c Implemented a seperate submap for temporary exec string space and another one to contain process upages. This eliminates all map fragmentation problems that previously existed.
ffs_inode.c, ufs_inode.c, ufs_readwrite.c Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on busy buffers.
Submitted by: John Dyson and David Greenman
|
#
5248 |
|
27-Dec-1994 |
bde |
Use the same current time throughout ffs_update().
Update some macro names in comments.
Don't use MNT_WAIT for something not related to mounting.
|
#
3768 |
|
22-Oct-1994 |
dg |
Restrict fs_maxfilesize to 2^40, and check against this in ffs_truncate(). This is part of a bug fix from Kirk McKusick to work around problems in FFS related to the blkno of a 64bit offset not fitting into an int. Note the proper solution would be to deal with 64bit block numbers, but doing this would require sweeping changes; some other day perhaps.
Submitted by: Marshall Kirk McKusick
|
#
3487 |
|
09-Oct-1994 |
phk |
Cosmetics. make gcc less noisy. Still some way to go here.
|
#
2460 |
|
02-Sep-1994 |
dg |
panic if length is < 0 in ffs_truncate().
|
#
2384 |
|
29-Aug-1994 |
dg |
"bogus" fixes from 1.1.5 to work around some cache coherency problems.
|
#
1826 |
|
03-Aug-1994 |
dg |
Changed occurrances of "itrunc" to "ffs_truncate" to make Bruce happy.
|
#
1821 |
|
02-Aug-1994 |
dg |
Completed (hopefully) the kernel support for old style "fastlinks".
|
#
1817 |
|
02-Aug-1994 |
dg |
Added $Id$
|
#
1549 |
|
25-May-1994 |
rgrimes |
The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.
Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
|
#
1542 |
|
24-May-1994 |
rgrimes |
This commit was generated by cvs2svn to compensate for changes in r1541, which included commits to RCS files with non-trunk default branches.
|
#
1541 |
|
24-May-1994 |
rgrimes |
BSD 4.4 Lite Kernel Sources
|