#
356905 |
|
20-Jan-2020 |
eugen |
MFC r323157 by 323157: fix recovery information with sector sizes up to 64K.
Original commit log:
The fsck recovery information introduced in revision 322297, which lets fsck find backup superblocks, only works on disks with sector sizes up to 4K. This update allows the recovery information to be created by newfs and used by fsck on disks with sector sizes up to 64K. Note that FFS currently only allows filesystems to be mounted from disks with sectors of up to 8K. Relaxing this limitation will be the subject of another commit.
For example, this allows newfs to work on GELI volumes with 8K sectors.
PR: 243413
Approved by: mckusick
Relnotes: Yes
|
#
344861 |
|
07-Mar-2019 |
mckusick |
MFC of 344552 and 344732
Have fsck_ffs adjust size for files with hole at end
Tighten last lbn calculation
Sponsored by: Netflix
|
#
344376 |
|
20-Feb-2019 |
kevans |
MFC r304850, r305480, r324550-r324551, r324655, r324684: correct mis-merge
Some of these commits were improperly MFC'd in the sys/boot => stand mega-MFC, others were simply missed. Correct that mistake now by manually merging the few that were missed and record-only merge on the others.
r304850: Unused variables and cstyle fix for loader dosfs
r305480: Renumber the advertising clause.
r324550: Add $FreeBSD$ to ancient sources that it's missing from.
r324551: Move lib/libstand to sys/boot/libsa
Move the sources to sys/boot. Make adjustments related to the move. Kill LIBSTAND_SRC since it's no longer needed.
r324655: Remove the libstand directory which is now empty.
r324684: Remove lib/libstand again, accidentally readded in r324683
|
#
331722 |
|
29-Mar-2018 |
eadler |
Revert r330897:
This was intended to be a non-functional change. It wasn't. The commit message was thus wrong. In addition it broke arm, and merged crypto related code.
Revert with prejudice.
This revert skips files touched in r316370 since that commit was since MFCed. This revert also skips files that require $FreeBSD$ property changes.
Thank you to those who helped me get out of this mess including but not limited to gonzo, kevans, rgrimes.
Requested by: gjb (re)
|
#
330897 |
|
14-Mar-2018 |
eadler |
Partial merge of the SPDX changes
These changes are incomplete but are making it difficult to determine what other changes can/should be merged.
No objections from: pfg
|
#
322806 |
|
23-Aug-2017 |
mckusick |
MFC of 322200, 322201, 322271, and 322297
322200: Remove (broken) search for alternate superblocks
322201: Show differences when alternate superblock fails to match
322271: Cleanup for 322200.
322297: Restore fsck_ffs ability to find alternate superblocks
Discussed with: kib, imp
Differential Revision: https://reviews.freebsd.org/D11589
|
#
302408 |
|
07-Jul-2016 |
gjb |
Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle. Prune svn:mergeinfo from the new branch, as nothing has been merged here.
Additional commits post-branch will follow.
Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation
|
#
298804 |
|
29-Apr-2016 |
pfg |
UFS: spelling fixes on comments.
No functional change.
|
#
262678 |
|
02-Mar-2014 |
pfg |
ufs: small formatting fixes.
Cleanup some extra space. Use of tabs vs. spaces. No functional change.
MFC after: 3 days
Reviewed by: mckusick
|
#
248623 |
|
22-Mar-2013 |
mckusick |
The purpose of this change to the FFS layout policy is to reduce the running time for a full fsck. It also reduces the random access time for large files and speeds the traversal time for directory tree walks.
The key idea is to reserve a small area in each cylinder group immediately following the inode blocks for the use of metadata, specifically indirect blocks and directory contents. The new policy is to preferentially place metadata in the metadata area and everything else in the blocks that follow the metadata area.
The size of this area can be set when creating a filesystem using newfs(8) or changed in an existing filesystem using tunefs(8). Both utilities use the `-k held-for-metadata-blocks' option to specify the amount of space to be held for metadata blocks in each cylinder group. By default, newfs(8) sets this area to half of minfree (typically 4% of the data area).
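As a rough illustration of the default described above, the sketch below computes a per-cylinder-group metadata reservation equal to half of minfree. The function name, its parameters, and the rounding down to a whole block are assumptions made for this example; the commit message does not show the actual newfs(8) code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: fragments per cylinder group held for metadata
 * when the "half of minfree" default applies.
 *   fpg         - fragments per cylinder group
 *   frag        - fragments per block
 *   minfree_pct - minfree as a percentage (for example 8)
 */
static int64_t
default_metaspace(int64_t fpg, int32_t frag, int32_t minfree_pct)
{
	assert(fpg > 0 && frag > 0 && minfree_pct >= 0);

	/* Half of minfree percent is minfree_pct / 200 of the group. */
	int64_t half = fpg * minfree_pct / 200;

	/* Assumption: round down so the area is a whole number of blocks. */
	return (half - half % frag);
}
```

With an 8% minfree, the result is about 4% of the group, matching the figure quoted in the commit message.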
This work was inspired by a paper presented at Usenix's FAST '13: www.usenix.org/conference/fast13/ffsck-fast-file-system-checker
Details of this implementation appear in the April 2013 issue of ;login: www.usenix.org/publications/login/april-2013-volume-38-number-2. A copy of the April 2013 ;login: paper can also be downloaded from: www.mckusick.com/publications/faster_fsck.pdf.
Reviewed by: kib
Tested by: Peter Holm
MFC after: 4 weeks
|
#
243250 |
|
18-Nov-2012 |
trasz |
Fix build of kdump(1).
|
#
243245 |
|
18-Nov-2012 |
trasz |
Add UFS writesuspension mechanism, designed to allow userland processes to modify on-disk metadata for filesystems mounted for write.
Reviewed by: kib, mckusick
Sponsored by: FreeBSD Foundation
|
#
242379 |
|
30-Oct-2012 |
trasz |
Fix a problem with geom_label(4) not recognizing UFS labels on filesystems extended using growfs(8). The problem is that geom_label checks whether the filesystem size recorded in the UFS superblock is equal to the provider (i.e. device) size. This check cannot be removed, for backward compatibility. On the other hand, in most cases growfs(8) cannot set fs_size in the superblock to match the provider size because, unlike newfs(8), it cannot recompute cylinder group sizes.
To fix this problem, add another superblock field, fs_providersize, used only for this purpose. The geom_label(4) will attach if either fs_size (filesystem created with newfs(8)) or fs_providersize (filesystem expanded using growfs(8)) matches the device size.
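The attach rule described above can be sketched as a single predicate. The struct below is a hypothetical subset of the superblock, and the assumption that both size fields are expressed in fragments is mine; the commit message only states that either field may match the device size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the superblock fields involved in the check. */
struct sb_sizes {
	int32_t fs_fsize;		/* fragment size in bytes */
	int64_t fs_size;		/* filesystem size (assumed: fragments) */
	int64_t fs_providersize;	/* provider size (assumed: fragments) */
};

/* Attach if either recorded size matches the provider size. */
static bool
label_matches_provider(const struct sb_sizes *sb, int64_t provider_bytes)
{
	assert(sb->fs_fsize > 0);

	int64_t provider_frags = provider_bytes / sb->fs_fsize;

	return (sb->fs_size == provider_frags ||
	    sb->fs_providersize == provider_frags);
}
```

A filesystem created by newfs(8) would match through fs_size, while one grown by growfs(8) would match through fs_providersize.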
PR: kern/165962
Reviewed by: mckusick
Sponsored by: FreeBSD Foundation
|
#
227382 |
|
09-Nov-2011 |
gleb |
Use implementation independent inoNN_t scalars for on-disk UFS structures
Approved by: mdf (mentor)
|
#
224061 |
|
15-Jul-2011 |
mckusick |
Add an FFS specific mount option to allow a filesystem checker (typically fsck_ffs) to register that it wishes to use FFS specific sysctl's to update the filesystem. This ensures that two checkers cannot run on a given filesystem at the same time and that no other process accidentally or maliciously uses the filesystem updating sysctls inappropriately. This functionality is needed by the journaling soft-updates recovery code.
|
#
222958 |
|
10-Jun-2011 |
jeff |
Implement fully asynchronous partial truncation with softupdates journaling to resolve errors which can cause corruption on recovery with the old synchronous mechanism.
- Append partial truncation freework structures to indirdeps while truncation is proceeding. These prevent new block pointers from becoming valid until truncation completes and serialize truncations.
- On completion of a partial truncate journal work waits for zeroed pointers to hit indirects.
- softdep_journal_freeblocks() handles last frag allocation and last block zeroing.
- vtruncbuf/ffs_page_remove moved into softdep_*_freeblocks() so it is only implemented in one place.
- Block allocation failure handling moved up one level so it does not proceed with buf locks held. This permits us to do more extensive reclaims when filesystem space is exhausted.
- softdep_sync_metadata() is broken into two parts, the first executes once at the start of ffs_syncvnode() and flushes truncations and inode dependencies. The second is called on each locked buf. This eliminates excessive looping and rollbacks.
- Improve the mechanism in process_worklist_item() that handles acquiring vnode locks for handle_workitem_remove() so that it works more generally and does not loop excessively over the same worklist items on each call.
- Don't corrupt directories by zeroing the tail in fsck. This is only done for regular files.
- Push a fsync complete record for files that need it so the checker knows a truncation in the journal is no longer valid.
Discussed with: mckusick, kib (ffs_pages_remove and ffs_truncate parts)
Tested by: pho
|
#
218602 |
|
12-Feb-2011 |
kib |
Use the native sector size of the device backing the UFS volume for SU+J journal blocks, instead of hard coding a 512 byte sector size. The journal needs to write each block atomically, which can only be guaranteed at the device sector size, not larger. Attempting to write less than the sector size results in driver errors.
Note that this is the first structure in UFS that depends on the sector size. Other elements are written in the units of fragments.
In collaboration with: pho
Reviewed by: jeff
Tested by: bz, pho
|
#
216796 |
|
29-Dec-2010 |
kib |
Add kernel side support for BIO_DELETE/TRIM on UFS.
The FS_TRIM fs flag indicates that the administrator requested issuing of TRIM commands for the volume. UFS will only send the command to the disk if the disk reports the GEOM::candelete attribute.
Since the disk queue is reordered, a data block is marked as free in the bitmap only after the TRIM command has completed. Because updating the bitmap may need to sleep waiting for i/o to finish, the TRIM bio_done routine schedules a taskqueue task to set the bitmap bit.
Based on the patch by: mckusick
Reviewed by: mckusick, pjd
Tested by: pho
MFC after: 1 month
|
#
215113 |
|
11-Nov-2010 |
kib |
Add function lbn_offset to calculate offset of the indirect block of given level.
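The commit message does not show the function body. One plausible reading, sketched below under that assumption, is that the offset for a given indirection level is the number of logical blocks spanned by one pointer at that level, i.e. the pointers-per-indirect-block count raised to the power of the level. The name `lbn_span` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the helper named in the commit.
 *   nindir - block pointers held by one indirect block
 *   level  - indirection level (0 = direct pointer)
 * Returns nindir raised to the power level.
 */
static int64_t
lbn_span(int64_t nindir, int level)
{
	assert(nindir > 0 && level >= 0);

	int64_t res = 1;

	while (level-- > 0)
		res *= nindir;
	return (res);
}
```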
Reviewed by: jeff
Tested by: pho
|
#
212617 |
|
14-Sep-2010 |
mckusick |
Update comments in soft updates code to more fully describe the addition of journalling. Only functional change is to tighten a KASSERT.
Reviewed by: jeff Roberson
|
#
207141 |
|
24-Apr-2010 |
jeff |
- Merge soft-updates journaling from projects/suj/head into head. This brings in support for an optional intent log which eliminates the need for background fsck on unclean shutdown.
Sponsored by: iXsystems, Yahoo!, and Juniper.
With help from: McKusick and Peter Holm
|
#
203784 |
|
11-Feb-2010 |
mckusick |
One last pass to get all the unsigned comparisons correct.
|
#
203763 |
|
10-Feb-2010 |
mckusick |
This fix corrects a problem in the file system that treats large inode numbers as negative rather than unsigned. For a default (16K block) file system, this bug began to show up at a file system size above about 16Tb.
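The class of bug described above can be illustrated with a minimal sketch. The function names are hypothetical, and the comparison shown is only one example of where a signed interpretation goes wrong; the commit message does not list the affected expressions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Buggy form: the inode number is compared as a signed 32-bit value, so
 * any inode number above 2^31 - 1 looks negative. (The conversion is
 * implementation-defined in C; it wraps on common platforms.)
 */
static bool
ino_at_least_signed(uint32_t ino, uint32_t lowest)
{
	return ((int32_t)ino >= (int32_t)lowest);
}

/* Fixed form: compare as unsigned values. */
static bool
ino_at_least_unsigned(uint32_t ino, uint32_t lowest)
{
	return (ino >= lowest);
}
```

Small inode numbers behave identically in both versions, which is why the problem only appears once a filesystem is large enough to hand out inode numbers with the top bit set.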
To fully handle this problem, newfs must be updated to ensure that it will never create a filesystem with more than 2^32 inodes. That patch will be forthcoming soon.
Reported by: Scott Burns, John Kilburg, Bruce Evans
Followup by: Jeff Roberson
PR: 133980
MFC after: 2 weeks
|
#
202113 |
|
11-Jan-2010 |
mckusick |
Background:
A directory passes through several intermediate states while it is being renamed. First its new name is created, causing it to have two names (from possibly different parents). Next, if it has different parents, its value of ".." is changed from pointing to the old parent to pointing to the new parent. Concurrently, its old name is removed, bringing it back into a consistent state. When fsck encounters an extra name for a directory, it offers to remove the "extraneous hard link"; when it finds that the names have been changed but the update to ".." has not happened, it offers to rewrite ".." to point at the correct parent. Both of these changes were considered unexpected, so they would cause fsck in preen mode or in background mode to fail, requiring a manual fsck run to fix these problems. Fsck running in preen mode or background mode now corrects these expected inconsistencies that arise during directory rename. The functionality added with this update is used by fsck running in background mode to make these fixes.
Solution:
This update adds three new fsck sysctl commands to support background fsck in correcting expected inconsistencies that arise from incomplete directory rename operations. They are:
setcwd(dirinode) - Set the current directory to dirinode in the filesystem associated with the snapshot.
setdotdot(oldvalue, newvalue) - Verify that the inode number for ".." in the current directory is oldvalue then change it to newvalue.
unlink(nameptr, oldvalue) - Verify that the inode number associated with nameptr in the current directory is oldvalue then unlink it.
As with all other fsck sysctls, these new ones may only be used by processes with appropriate privilege.
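The verify-then-change semantics of setdotdot can be sketched with a small in-memory model. Everything here (the struct, the function name, the boolean return) is hypothetical and stands in for the real sysctl handler, which the commit message does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory model of a directory's ".." entry. */
struct dir_model {
	uint32_t dotdot_ino;
};

/*
 * Change ".." from oldvalue to newvalue, but only if it still holds
 * oldvalue. Returns false and leaves the entry untouched otherwise,
 * so a stale request from the checker cannot clobber a newer state.
 */
static bool
model_setdotdot(struct dir_model *dp, uint32_t oldvalue, uint32_t newvalue)
{
	assert(dp != NULL);

	if (dp->dotdot_ino != oldvalue)
		return (false);
	dp->dotdot_ino = newvalue;
	return (true);
}
```

The same pattern applies to unlink(nameptr, oldvalue): the expected old inode number guards the operation.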
Reported by: jeff
Security issues: rwatson
|
#
200796 |
|
21-Dec-2009 |
trasz |
Implement NFSv4 ACL support for UFS.
Reviewed by: rwatson
|
#
179295 |
|
24-May-2008 |
rodrigc |
Fix comments to replace SBSIZE with SBLOCKSIZE, since SBSIZE was renamed to SBLOCKSIZE in version 1.33
Reviewed by: mckusick
|
#
163841 |
|
31-Oct-2006 |
pjd |
Add gjournal specific code to the UFS file system:
- Add FS_GJOURNAL flag which enables gjournal support on a file system.
- Add cg_unrefs field to the cylinder group structure which holds the number of unreferenced (orphaned) inodes in the given cylinder group.
- Add fs_unrefs field to the super block structure which holds the total number of unreferenced (orphaned) inodes.
- When a file or a directory is orphaned (last reference is removed, but the object is still open), increase the fs_unrefs and cg_unrefs fields, which hint to fsck which cylinder groups to search for such (orphaned) objects.
- When a file is last closed, decrease the {fs,cg}_unrefs fields.
- Add VV_DELETED vnode flag which points at orphaned objects.
Sponsored by: home.pl
|
#
142123 |
|
20-Feb-2005 |
delphij |
The recomputation of the file system summary at mount time can be a very slow process, especially for large file systems that have just been recovered from a crash.
Since the summary is already re-sync'ed every 30 seconds, we will not lag behind too much after a crash. With this consideration in mind, it is more reasonable to transfer the responsibility to background fsck, to reduce the delay after a crash.
Add a new sysctl variable, vfs.ffs.compute_summary_at_mount, to control this behavior. When set to nonzero, we will get the "old" behavior, in which the summary is computed immediately at mount time.
Add five new sysctl variables to adjust ndir, nbfree, nifree, nffree and numclusters respectively. Teach fsck_ffs about these APIs; however, it intentionally does not check for their existence, since kernels without these sysctls must have recomputed the summary and hence no adjustments are necessary.
This change has eliminated the usual tens of minutes of delay of mounting large dirty volumes.
Reviewed by: mckusick
MFC After: 1 week
|
#
140702 |
|
24-Jan-2005 |
jeff |
- Mark the struct fs members that require the ufsmount mutex.
- Define some macros for manipulating the fs_active bitmap.
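The commit message does not list the macros it adds. A minimal sketch of what per-cylinder-group bitmap macros of this kind can look like follows; all names are hypothetical, and these versions are plain (non-atomic) operations on an array of unsigned int words.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical bitmap helpers: one bit per cylinder group. */
#define DEMO_BITS_PER_WORD	(CHAR_BIT * sizeof(unsigned int))
#define DEMO_CG_WORD(map, cg)	((map)[(cg) / DEMO_BITS_PER_WORD])
#define DEMO_CG_BIT(cg)		(1u << ((cg) % DEMO_BITS_PER_WORD))

/* Set, clear, and test the bit for cylinder group cg. */
#define DEMO_CG_SET(map, cg)	(DEMO_CG_WORD(map, cg) |= DEMO_CG_BIT(cg))
#define DEMO_CG_CLEAR(map, cg)	(DEMO_CG_WORD(map, cg) &= ~DEMO_CG_BIT(cg))
#define DEMO_CG_ISSET(map, cg)	\
	((DEMO_CG_WORD(map, cg) & DEMO_CG_BIT(cg)) != 0)
```

Kernel code that updates such a bitmap from several contexts would need atomic operations or a lock around these; the sketch leaves that out.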
Sponsored By: Isilon Systems, Inc.
|
#
139825 |
|
07-Jan-2005 |
imp |
/* -> /*- for license, minor formatting changes
|
#
136336 |
|
09-Oct-2004 |
njl |
Fix fsbtodb() for UFS1. This fixes an overflow for file sizes >1 TB, allowing for sizes up to 4 TB. This doesn't affect UFS2 since b is already a 64 bit type, coincidental with daddr_t.
Submitted by: bde
|
#
134011 |
|
19-Aug-2004 |
jhb |
Generalize the UFS bad magic value used to determine when a filesystem has only been partly initialized via newfs(8) so that it applies to both UFS1 and UFS2.
Submitted by: "Xin LI" delphij at frontfree dot net
MFC: maybe?
|
#
129895 |
|
31-May-2004 |
krion |
- Fix typo
Approved by: tobez
|
#
127975 |
|
07-Apr-2004 |
imp |
Remove the advertising clause from the University of California Regents' license, per the letter dated July 22, 1999 and an irc message from Robert Watson saying that clause 3 can be removed from those files with an NAI copyright that also have only a University of California copyright.
Approved by: core, rwatson
|
#
127818 |
|
03-Apr-2004 |
mux |
Fix the remaining warnings of growfs(8) on my sparc64 box with WARNS=6. I don't change the WARNS level in the Makefile because I haven't tested this on other archs.
The fs.h fix was suggested by: marcel
Reviewed by: md5(1)
|
#
122783 |
|
16-Nov-2003 |
wes |
Write the UFS2 superblock with a 'BAD' magic number at the beginning of newfs, to signify the newfs operation has not yet completed. Rewrite the superblock with the correct magic number once all of the cylinder groups have been created to show the operation has finished.
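The two-phase write described above can be modelled in a few lines. The struct, the function names, and the numeric magic values below are illustrative placeholders for this sketch, not a copy of the newfs(8) code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only; see fs.h for the real constants. */
#define DEMO_BAD_MAGIC	0x19960408
#define DEMO_GOOD_MAGIC	0x19540119

/* Hypothetical stand-in for the on-disk superblock. */
struct demo_sb {
	int32_t magic;
	int32_t ncg;		/* cylinder groups to create */
	int32_t cgs_done;	/* cylinder groups written so far */
};

/* Phase 1: mark the filesystem as under construction. */
static void
demo_newfs_begin(struct demo_sb *sb, int32_t ncg)
{
	assert(sb != NULL && ncg > 0);
	sb->magic = DEMO_BAD_MAGIC;
	sb->ncg = ncg;
	sb->cgs_done = 0;
}

/* Phase 2: create every cylinder group, then publish the real magic. */
static void
demo_newfs_finish(struct demo_sb *sb)
{
	while (sb->cgs_done < sb->ncg)
		sb->cgs_done++;		/* stands in for writing one group */
	sb->magic = DEMO_GOOD_MAGIC;
}

/* A reader only trusts a superblock carrying the final magic number. */
static bool
demo_is_complete(const struct demo_sb *sb)
{
	return (sb->magic == DEMO_GOOD_MAGIC);
}
```

If newfs is interrupted between the two phases, the superblock still carries the 'BAD' value and the filesystem is recognizably incomplete.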
Sponsored by: St. Bernard Software
|
#
111238 |
|
21-Feb-2003 |
mckusick |
This patch fixes a bug in the logical block calculation macros so that they convert to 64-bit values before shifting rather than afterwards. Once fixed, they can be used rather than inline expanded.
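The difference between shifting before and after widening can be shown with a small sketch. The function names are hypothetical, and unsigned 32-bit arithmetic is used so that the wraparound in the buggy form is well defined in C; the types used by the original macros are not shown in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy order: shift in 32 bits, then widen. High bits are already lost. */
static int64_t
blk_to_bytes_narrow(uint32_t blk, int bshift)
{
	assert(bshift >= 0 && bshift < 32);
	return ((int64_t)(uint32_t)(blk << bshift));
}

/* Fixed order: widen to 64 bits first, then shift. */
static int64_t
blk_to_bytes_wide(uint32_t blk, int bshift)
{
	assert(bshift >= 0 && bshift < 32);
	return ((int64_t)blk << bshift);
}
```

For example, block number 2^20 with a 16K block size (shift of 14) is byte offset 2^34, which the narrow version cannot represent.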
Sponsored by: DARPA & NAI Labs.
|
#
109053 |
|
10-Jan-2003 |
marcel |
o Improve wording of the comment that accompanies fs_pad. The padding is not specific to non-i386 architectures. It is caused by non-i386 specific alignment requirements of fs_swuid.
o Add a CTASSERT to catch a change in the size of struct fs at compile-time rather than run-time.
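A compile-time size check of this kind can be sketched with C11's `_Static_assert`, used here as a stand-in for the kernel's CTASSERT macro. The struct is a made-up 64-byte example, not struct fs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical on-disk structure with explicit spare space. */
struct demo_ondisk {
	int32_t magic;
	int32_t flags;
	int64_t size;
	uint8_t pad[48];	/* spare bytes reserved for future fields */
};

/*
 * Fails the build, rather than corrupting disks at run time, if a field
 * is added or an alignment change alters the structure size.
 */
_Static_assert(sizeof(struct demo_ondisk) == 64,
    "on-disk structure size changed");
```

When a new field is carved out of the padding, the pad array shrinks by the same amount and the assertion keeps passing.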
Ok'd: gordon
Tested on: i386 ia64
|
#
109034 |
|
09-Jan-2003 |
gordon |
Fix superblock alignment problems on non-i386 platforms. Also change fs_uuid to fs_swuid, making it more descriptive.
Submitted by: marcel
Reviewed by: peter
Pointy hat to: gordon
|
#
108970 |
|
08-Jan-2003 |
gordon |
Steal some space from fs_fsmnt to create fs_volname and fs_uuid. The volname will be used to support volume names with the help of a GEOM module (to be committed). uuid will be used to deal with conflicting volume names (which doesn't work just yet).
Approved by: mckusick@
|
#
107294 |
|
27-Nov-2002 |
mckusick |
Create a new 32-bit fs_flags word in the superblock. Add code to move the old 8-bit fs_old_flags to the new location the first time that the filesystem is mounted by a new kernel. One of the unused flags in fs_old_flags is used to indicate that the flags have been moved. Leave the fs_old_flags word intact so that it will work properly if used on an old kernel.
Change the fs_sblockloc superblock location field to be in units of bytes instead of in units of filesystem fragments. The old units did not work properly when the fragment size exceeded the superblock size (8192). Update old fs_sblockloc values at the same time that the flags are moved.
Suggested by: BOUWSMA Barry <freebsd-misuser@netscum.dyndns.dk>
Sponsored by: DARPA & NAI Labs.
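The one-time migration described in this commit can be sketched as follows. The struct, the marker bit value, and the idea of passing the new byte offset in as a parameter are assumptions for illustration; the commit message does not show how the kernel derives the updated fs_sblockloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker bit taken from the unused old flags. */
#define DEMO_FLAGS_UPDATED	0x80

/* Hypothetical subset of the superblock. */
struct demo_fs {
	uint8_t  fs_old_flags;	/* legacy 8-bit flags, kept for old kernels */
	uint32_t fs_flags;	/* new 32-bit flags word */
	int64_t  fs_sblockloc;	/* superblock location */
};

/*
 * On first mount by a new kernel: copy the old flags into the new word,
 * record the superblock location in bytes, and mark the move as done.
 * The old flag bits themselves are left intact.
 */
static void
demo_upgrade_flags(struct demo_fs *fs, int64_t sblock_byte_off)
{
	assert(fs != NULL);

	if (fs->fs_old_flags & DEMO_FLAGS_UPDATED)
		return;			/* already migrated */
	fs->fs_flags = fs->fs_old_flags;
	fs->fs_sblockloc = sblock_byte_off;
	fs->fs_old_flags |= DEMO_FLAGS_UPDATED;
}
```

Because the marker lives in the old word, a second mount skips the copy and does not overwrite flags that were set only in the new 32-bit word.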
|
#
105112 |
|
14-Oct-2002 |
rwatson |
Define two new superblock file system flags:
FS_ACLS        Administrative enable/disable of extended ACL support
FS_MULTILABEL  Administrative flag to indicate to the MAC Framework that objects in the file system are individually labeled using extended attributes.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
Reviewed by: (in principle) mckusick, phk
|
#
98542 |
|
21-Jun-2002 |
mckusick |
This commit adds basic support for the UFS2 filesystem. The UFS2 filesystem expands the inode to 256 bytes to make space for 64-bit block pointers. It also adds a file-creation time field, an ability to use jumbo blocks per inode to allow extent like pointer density, and space for extended attributes (up to twice the filesystem block size worth of attributes, e.g., on a 16K filesystem, there is space for 32K of attributes). UFS2 fully supports and runs existing UFS1 filesystems. New filesystems built using newfs can be built in either UFS1 or UFS2 format using the -O option. In this commit UFS1 is the default format, so if you want to build UFS2 format filesystems, you must specify -O 2. This default will be changed to UFS2 when UFS2 proves itself to be stable. In this commit the boot code for reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c) as there is insufficient space in the boot block. Once the size of the boot block is increased, this code can be defined.
Things to note: the definition of SBSIZE has changed to SBLOCKSIZE. The header file <ufs/ufs/dinode.h> must be included before <ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and ufs_lbn_t.
Still TODO:
- Verify that the first level bootstraps work for all the architectures.
- Convert the utility ffsinfo to understand UFS2 and test growfs.
- Add support for the extended attribute storage.
- Update soft updates to ensure integrity of extended attribute storage.
- Switch the current extended attribute interfaces to use the extended attribute storage.
- Add the extent like functionality (framework is there, but is currently never used).
Sponsored by: DARPA & NAI Labs.
Reviewed by: Poul-Henning Kamp <phk@freebsd.org>
|
#
96755 |
|
16-May-2002 |
trhodes |
More s/file system/filesystem/g
|
#
96473 |
|
12-May-2002 |
phk |
ARGH! SBLOCK is not unused. Try to get this right.
BBSIZE belongs in <sys/disklabel.h> (but shouldn't be a constant).
Define SBLOCK again, using the right math.
Sponsored by: DARPA & NAI Labs.
|
#
96472 |
|
12-May-2002 |
phk |
Remove #define for BBOFF, it is assumed == 0 so many places that we might as well forget about it. In fact the only thing which used it was the SBOFF macro.
Sponsored by: DARPA & NAI Labs.
|
#
96471 |
|
12-May-2002 |
phk |
Remove unused BBLOCK and SBLOCK #defines.
Sponsored by: DARPA & NAI Labs.
|
#
93736 |
|
03-Apr-2002 |
phk |
Move the FFS parameter MAXFRAG from <sys/param.h> to <ufs/ffs/fs.h>
Sponsored by: DARPA & NAI Labs.
|
#
89450 |
|
17-Jan-2002 |
mckusick |
Fix a bug introduced in ffs_snapshot.c -r1.25 and fs.h -r1.26 which caused incomplete snapshots to be taken. When background fsck would run on these snapshots, the result would be files being incorrectly released which would subsequently panic the kernel with ``handle_workitem_freefile: inodedep survived'', ``handle_written_inodeblock: live inodedep'', and ``handle_workitem_remove: lost inodedep'' errors.
|
#
88138 |
|
18-Dec-2001 |
mckusick |
Change the atomic_set_char to atomic_set_int and atomic_clear_char to atomic_clear_int to ease the implementation for the sparc64.
Requested by: Jake Burkholder <jake@locore.ca>
|
#
88025 |
|
16-Dec-2001 |
iedowse |
Move the new superblock field `fs_active' into the region of the superblock that is already set up to handle pointer types. This fixes an accidental change in the superblock size on 64-bit platforms caused by revision 1.24.
|
#
87827 |
|
13-Dec-2001 |
mckusick |
Minimize the time necessary to suspend operations on a filesystem when taking a snapshot. The two time-consuming operations are scanning all the filesystem bitmaps to determine which blocks are in use and scanning all the other snapshots so as to be able to expunge their blocks from the view of the current snapshot. The bitmap scanning is broken into two passes. Before suspending the filesystem all bitmaps are scanned. After the suspension, those bitmaps that changed after being scanned the first time are rescanned. Typically there are few bitmaps that need to be rescanned. The expunging of other snapshots is now done after the suspension is released by observing that we can easily identify any blocks that were allocated to them after the suspension (they will be marked as `not needing to be copied' in the just created snapshot). For all the gory details, see the ``Running fsck in the Background'' paper in the Usenix BSDCon 2002 Conference Proceedings, pages 55-64.
|
#
79769 |
|
15-Jul-2001 |
peter |
Use a fixed type for times in on-disk structures for ufs rather than something that could potentially change like time_t.
|
#
76357 |
|
08-May-2001 |
mckusick |
When running with soft updates, track the number of blocks and files that are committed to being freed and reflect these blocks in the counts returned by statfs (and thus also by the `df' command). This change allows programs such as those that do news expiration to know when to stop if they are trying to create a certain percentage of free space. Note that this change does not solve the much harder problem of making this to-be-freed space available to applications that want it (thus on a nearly full filesystem, you may still encounter out-of-space conditions even though the free space will show up eventually). Hopefully this harder problem will be the subject of a future enhancement.
|
#
75503 |
|
14-Apr-2001 |
mckusick |
This checkin adds support in ufs/ffs for the FS_NEEDSFSCK flag. It is described in ufs/ffs/fs.h as follows:
/*
 * Filesystem flags.
 *
 * Note that the FS_NEEDSFSCK flag is set and cleared only by the
 * fsck utility. It is set when background fsck finds an unexpected
 * inconsistency which requires a traditional foreground fsck to be
 * run. Such inconsistencies should only be found after an uncorrectable
 * disk error. A foreground fsck will clear the FS_NEEDSFSCK flag when
 * it has successfully cleaned up the filesystem. The kernel uses this
 * flag to enforce that inconsistent filesystems be mounted read-only.
 */
#define FS_UNCLEAN    0x01 /* filesystem not clean at mount */
#define FS_DOSOFTDEP  0x02 /* filesystem using soft dependencies */
#define FS_NEEDSFSCK  0x04 /* filesystem needs sync fsck before mount */
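The read-only enforcement mentioned in the comment can be sketched as a simple predicate over the flags word. The flag values are the ones quoted in the commit message; the function name is hypothetical, and the real mount path may apply further checks (for example on FS_UNCLEAN) that this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as quoted in the commit message. */
#define FS_UNCLEAN	0x01	/* filesystem not clean at mount */
#define FS_DOSOFTDEP	0x02	/* filesystem using soft dependencies */
#define FS_NEEDSFSCK	0x04	/* filesystem needs sync fsck before mount */

/* Hypothetical policy check: a read-write mount is refused while set. */
static bool
demo_may_mount_rw(uint32_t fs_flags)
{
	return ((fs_flags & FS_NEEDSFSCK) == 0);
}
```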
|
#
75377 |
|
10-Apr-2001 |
mckusick |
Directory layout preference improvements from Grigoriy Orlov <gluk@ptci.ru>. His description of the problem and solution follow. My own tests show speedups on typical filesystem intensive workloads of 5% to 12% which is very impressive considering the small amount of code change involved.
------
One day I noticed that some file operations run much faster on small file systems than on big ones. I've looked at the ffs algorithms, thought about them, and redesigned the dirpref algorithm.
First I want to describe the results of my tests. These results are old and I have improved the algorithm after these tests were done. Nevertheless they show how big the performance speedup may be. I have done two file/directory intensive tests on two OpenBSD systems with the old and new dirpref algorithms. The first test is "tar -xzf ports.tar.gz", the second is "rm -rf ports". The ports.tar.gz file is the ports collection from the OpenBSD 2.8 release. It contains 6596 directories and 13868 files. The test systems are:
1. Celeron-450, 128Mb, two IDE drives, the system at wd0, file system for test is at wd1.
   Size of test file system is 8 Gb, number of cg=991, size of cg is 8m, block size = 8k, fragment size = 1k.
   OpenBSD-current from Dec 2000 with BUFCACHEPERCENT=35
2. PIII-600, 128Mb, two IBM DTLA-307045 IDE drives at i815e, the system at wd0, file system for test is at wd1.
   Size of test file system is 40 Gb, number of cg=5324, size of cg is 8m, block size = 8k, fragment size = 1k.
   OpenBSD-current from Dec 2000 with BUFCACHEPERCENT=50
You can get more info about the test systems and methods at: http://www.ptci.ru/gluk/dirpref/old/dirpref.html
Test Results
                 tar -xzf ports.tar.gz                 rm -rf ports
mode       old dirpref  new dirpref  speedup    old dirpref  new dirpref  speedup
First system
normal         667          472        1.41        477          331         1.44
async          285          144        1.98        130           14         9.29
sync           768          616        1.25        477          334         1.43
softdep        413          252        1.64        241           38         6.34
Second system
normal         329           81        4.06        263.5         93.5       2.81
async          302           25.7     11.75        112            2.26     49.56
sync           281           57.0      4.93        263           90.5       2.9
softdep        341           40.6      8.4         284            4.76     59.66
The "old dirpref" and "new dirpref" columns give the test time in seconds. The speedup column is the ratio old dirpref / new dirpref.
------
Algorithm description
The old dirpref algorithm is described in comments:
/*
 * Find a cylinder to place a directory.
 *
 * The policy implemented by this algorithm is to select from
 * among those cylinder groups with above the average number of
 * free inodes, the one with the smallest number of directories.
 */
A new directory is allocated in a different cylinder group than its parent directory, resulting in a directory tree that is spread across all the cylinder groups. This spreading out results in non-optimal access to the directories and files. When we have a small filesystem it is not a problem, but when the filesystem is big the performance degradation becomes very apparent.
What do I mean by a big file system?
1. A big filesystem is a filesystem which occupies 20-30 or more percent of total drive space, i.e. first and last cylinder are physically located relatively far from each other.
2. It has a relatively large number of cylinder groups, for example more cylinder groups than 50% of the buffers in the buffer cache.
The first results in long access times, while the second results in many buffers being used by metadata operations. Such operations use cylinder group blocks and on-disk inode blocks. The cylinder group block (fs->fs_cblkno) contains struct cg, inode and block bit maps. It is 2k in size for the default filesystem parameters. If new and parent directories are located in different cylinder groups then the system performs more input/output operations and uses more buffers. On filesystems with many cylinder groups, lots of cache buffers are used for metadata operations.
My solution for this problem is very simple. I allocate many directories in one cylinder group. I also do some things so that the new allocation method does not cause excessive fragmentation and so that directory inodes are not located far from their files' inodes and data. The algorithm is:
/*
 * Find a cylinder group to place a directory.
 *
 * The policy implemented by this algorithm is to allocate a
 * directory inode in the same cylinder group as its parent
 * directory, but also to reserve space for its files inodes
 * and data. Restrict the number of directories which may be
 * allocated one after another in the same cylinder group
 * without intervening allocation of files.
 *
 * If we allocate a first level directory then force allocation
 * in another cylinder group.
 */
My early versions of dirpref gave me good results for a wide range of file operations and different filesystem capacities except for one case: those applications that create their entire directory structure first and only later fill this structure with files.
My solution for such and similar cases is to limit a number of directories which may be created one after another in the same cylinder group without intervening file creations. For this purpose, I allocate an array of counters at mount time. This array is linked to the superblock fs->fs_contigdirs[cg]. Each time a directory is created the counter increases and each time a file is created the counter decreases. A 60Gb filesystem with 8mb/cg requires 10kb of memory for the counters array.
The maxcontigdirs is a maximum number of directories which may be created without an intervening file creation. I found in my tests that the best performance occurs when I restrict the number of directories in one cylinder group such that all its files may be located in the same cylinder group. There may be some deterioration in performance if all the file inodes are in the same cylinder group as its containing directory, but their data partially resides in a different cylinder group. The maxcontigdirs value is calculated to try to prevent this condition. Since there is no way to know how many files and directories will be allocated later I added two optimization parameters in superblock/tunefs. They are:
int32_t fs_avgfilesize; /* expected average file size */
int32_t fs_avgfpdir;    /* expected # of files per directory */
These parameters have reasonable defaults but may be tweaked for special uses of a filesystem. They are only necessary in rare cases like better tuning a filesystem being used to store a squid cache.
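The counter scheme and the role of the two tuning parameters can be sketched as below. This is a simplified model under stated assumptions: the struct, the function names, the fixed group count, and the way the limit is estimated (free bytes per group divided by the expected size of one directory's files, clamped to 1..255) are all illustrative, not the code from this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NCG	4	/* hypothetical number of cylinder groups */

/* Hypothetical in-core state: one small counter per cylinder group. */
struct demo_dirpref {
	uint8_t contigdirs[DEMO_NCG];
	uint8_t maxcontigdirs;
};

/* Each directory creation bumps the group's counter. */
static void
demo_dir_created(struct demo_dirpref *dp, int cg)
{
	if (dp->contigdirs[cg] < 255)
		dp->contigdirs[cg]++;
}

/* Each file creation lowers it again. */
static void
demo_file_created(struct demo_dirpref *dp, int cg)
{
	if (dp->contigdirs[cg] > 0)
		dp->contigdirs[cg]--;
}

/* A group accepts another directory only while under the limit. */
static bool
demo_cg_accepts_dir(const struct demo_dirpref *dp, int cg)
{
	return (dp->contigdirs[cg] < dp->maxcontigdirs);
}

/* Simplified estimate of the limit from the two tuning parameters. */
static uint8_t
demo_maxcontigdirs(int64_t free_bytes_per_cg, int32_t avgfilesize,
    int32_t avgfpdir)
{
	assert(avgfilesize > 0 && avgfpdir > 0);

	int64_t dirsize = (int64_t)avgfilesize * avgfpdir;
	int64_t n = free_bytes_per_cg / dirsize;

	if (n < 1)
		n = 1;
	if (n > 255)
		n = 255;
	return ((uint8_t)n);
}
```

With 8 MB free per group, a 16 KB average file size and 64 files per directory, this estimate allows 8 consecutive directories before the allocator would look elsewhere.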
I have been using this algorithm for about 3 months. I have done a lot of testing on filesystems with different capacities, average filesize, average number of files per directory, and so on. I think this algorithm has no negative impact on filesystem performance. It works better than the default one in all cases. The new dirpref will greatly improve untarring/removing/copying of big directories, decrease load on cvs servers and much more. The new dirpref doesn't speed up a compilation process, but also doesn't slow it down.
Obtained from: Grigoriy Orlov <gluk@ptci.ru>
|
#
74548 |
|
21-Mar-2001 |
mckusick |
Add kernel support for running fsck on active filesystems.
|
#
73942 |
|
07-Mar-2001 |
mckusick |
Fixes to track snapshot copy-on-write checking in the specinfo structure rather than assuming that the device vnode would reside in the FFS filesystem (which is obviously a broken assumption with the device filesystem).
|
#
71073 |
|
15-Jan-2001 |
iedowse |
The ffs superblock includes a 128-byte region for use by temporary in-core pointers to summary information. An array in this region (fs_csp) could overflow on filesystems with a very large number of cylinder groups (~16000 on i386 with 8k blocks). When this happens, other fields in the superblock get corrupted, and fsck refuses to check the filesystem.
Solve this problem by replacing the fs_csp array in 'struct fs' with a single pointer, and add padding to keep the length of the 128-byte region fixed. Update the kernel and userland utilities to use just this single pointer.
With this change, the kernel no longer makes use of the superblock fields 'fs_csshift' and 'fs_csmask'. Add a comment to newfs/mkfs.c to indicate that these fields must be calculated for compatibility with older kernels.
Reviewed by: mckusick
|
#
62553 |
|
04-Jul-2000 |
mckusick |
Get userland visible flags added for snapshots to give a few days advance preparation for them to get migrated into place so that subsequent changes in utilities will not fail to compile for lack of up-to-date header files in /usr/include.
|
#
58155 |
|
17-Mar-2000 |
mckusick |
Use 64-bit math to calculate if we have hit our freespace limit. Necessary for coherent results on filesystems bigger than 0.5Tb.
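As a rough illustration of why the wider arithmetic matters, the sketch below computes free space after subtracting the reserved percentage. The function and parameter names are hypothetical and the formula is a simplified assumption rather than the actual macro; the point is only that the product of a 32-bit fragment count and a percentage can exceed the 32-bit range unless one operand is widened first.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical free-space calculation, in fragments.
 * dsize is the data area size in fragments and minfree_pct is the
 * reserved percentage. Widening dsize to 64 bits before multiplying
 * keeps dsize * minfree_pct from overflowing a 32-bit intermediate.
 */
static int64_t free_frags(int64_t nbfree, int32_t frags_per_block,
                          int64_t nffree, int32_t dsize, int32_t minfree_pct)
{
    assert(minfree_pct >= 0 && minfree_pct <= 100);
    int64_t reserved = (int64_t)dsize * minfree_pct / 100;
    return nbfree * frags_per_block + nffree - reserved;
}
```

For example, with dsize = 300000000 fragments and an 8% reserve, the intermediate product is 2400000000, which does not fit in a signed 32-bit integer, while the 64-bit version yields a reserve of 24000000 fragments.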
|
#
50477 |
|
27-Aug-1999 |
peter |
$Id$ -> $FreeBSD$
|
#
34266 |
|
08-Mar-1998 |
julian |
Reviewed by: dyson@freebsd.org (John Dyson), dg@root.com (David Greenman)
Submitted by: Kirk McKusick (mcKusick@mckusick.com)
Obtained from: Whistle development tree
|
#
24171 |
|
24-Mar-1997 |
bde |
Fixed corrupted newline and corrupted tab in previous commit.
|
#
24149 |
|
23-Mar-1997 |
guido |
Add generation number randomization. Newly created filesystems will now automatically have random generation numbers. The kernel way of handling those also changed. Further, it is advised to run fsirand on all your NFS-exported filesystems. The code is mostly copied from OpenBSD, with the randomization changed to use /dev/urandom.
Reviewed by: Garrett
Obtained from: OpenBSD
|
#
22975 |
|
22-Feb-1997 |
peter |
Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
|
#
22521 |
|
10-Feb-1997 |
dyson |
This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes.
The system boots and can mount UFS filesystems.
Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed.
Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>
|
#
21673 |
|
14-Jan-1997 |
jkh |
Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
|
#
18899 |
|
12-Oct-1996 |
bde |
Fixed lblktosize(). It overflowed at 2G. This bug only affected ufs_read() and ufs_write().
Found by: looking at warnings for comparing the result of lblktosize() (which is usually daddr_t = long) with file sizes (which are u_quad_t for ufs). File sizes should probably be off_t's to avoid warnings when they are compared with file offsets, so the fixed lblktosize() casts to off_t instead of u_quad_t.
Added definition of smalllblktosize(). It is the same as the old lblktosize() and is more efficient for small block numbers on 32-bit machines.
Use smalllblktosize() instead of its expansion in blksize() and dblksize(). This keeps the line length short and makes it more obvious that the shift can't overflow.
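The overflow and its fix can be illustrated with a small sketch. The function names here are hypothetical stand-ins for the macros, int64_t stands in for off_t, and the block shift is passed directly instead of being read from the superblock; this is an assumption-laden illustration, not the header's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Widened conversion from logical block number to byte offset:
 * the block number is cast to 64 bits before shifting, so results
 * at or beyond 2G are representable.
 */
static int64_t lblk_to_size(int32_t blk, int bshift)
{
    assert(blk >= 0 && bshift >= 0 && bshift < 32);
    return (int64_t)blk << bshift;
}

/*
 * Narrow variant for block numbers known to be small: the shift is
 * done in 32 bits, which is cheaper on 32-bit machines but only
 * valid while the result stays below 2G.
 */
static int32_t small_lblk_to_size(int32_t blk, int bshift)
{
    assert(blk >= 0 && bshift >= 0 && bshift < 31);
    assert(blk <= (INT32_MAX >> bshift));
    return blk << bshift;
}
```

With 8K blocks (a shift of 13), logical block 262144 maps to byte offset 2147483648, the first value that no longer fits in a signed 32-bit result; the widened version handles it, while the narrow one is meant only for small block numbers.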
|
#
13765 |
|
30-Jan-1996 |
mpp |
Fix a bunch of spelling errors in the comment fields of a bunch of system include files.
|
#
8876 |
|
30-May-1995 |
rgrimes |
Remove trailing whitespace.
|
#
6994 |
|
10-Mar-1995 |
dg |
Increased default minfree to 8%.
|
#
2176 |
|
21-Aug-1994 |
paul |
Made idempotent.
|
#
1817 |
|
02-Aug-1994 |
dg |
Added $Id$
|
#
1542 |
|
24-May-1994 |
rgrimes |
This commit was generated by cvs2svn to compensate for changes in r1541, which included commits to RCS files with non-trunk default branches.
|
#
1541 |
|
24-May-1994 |
rgrimes |
BSD 4.4 Lite Kernel Sources
|