History log of /freebsd-11-stable/sys/ufs/ufs/ufs_dirhash.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 326845 14-Dec-2017 kib

MFC r326657:
Fix livelock in ufsdirhash_create().


# 302408 07-Jul-2016 gjb

Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, as nothing has been merged
here.

Additional commits post-branch will follow.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


/freebsd-11-stable/MAINTAINERS
/freebsd-11-stable/cddl
/freebsd-11-stable/cddl/contrib/opensolaris
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/zfs
/freebsd-11-stable/cddl/contrib/opensolaris/lib/libzfs
/freebsd-11-stable/contrib/amd
/freebsd-11-stable/contrib/apr
/freebsd-11-stable/contrib/apr-util
/freebsd-11-stable/contrib/atf
/freebsd-11-stable/contrib/binutils
/freebsd-11-stable/contrib/bmake
/freebsd-11-stable/contrib/byacc
/freebsd-11-stable/contrib/bzip2
/freebsd-11-stable/contrib/com_err
/freebsd-11-stable/contrib/compiler-rt
/freebsd-11-stable/contrib/dialog
/freebsd-11-stable/contrib/dma
/freebsd-11-stable/contrib/dtc
/freebsd-11-stable/contrib/ee
/freebsd-11-stable/contrib/elftoolchain
/freebsd-11-stable/contrib/elftoolchain/ar
/freebsd-11-stable/contrib/elftoolchain/brandelf
/freebsd-11-stable/contrib/elftoolchain/elfdump
/freebsd-11-stable/contrib/expat
/freebsd-11-stable/contrib/file
/freebsd-11-stable/contrib/gcc
/freebsd-11-stable/contrib/gcclibs/libgomp
/freebsd-11-stable/contrib/gdb
/freebsd-11-stable/contrib/gdtoa
/freebsd-11-stable/contrib/groff
/freebsd-11-stable/contrib/ipfilter
/freebsd-11-stable/contrib/ldns
/freebsd-11-stable/contrib/ldns-host
/freebsd-11-stable/contrib/less
/freebsd-11-stable/contrib/libarchive
/freebsd-11-stable/contrib/libarchive/cpio
/freebsd-11-stable/contrib/libarchive/libarchive
/freebsd-11-stable/contrib/libarchive/libarchive_fe
/freebsd-11-stable/contrib/libarchive/tar
/freebsd-11-stable/contrib/libc++
/freebsd-11-stable/contrib/libc-vis
/freebsd-11-stable/contrib/libcxxrt
/freebsd-11-stable/contrib/libexecinfo
/freebsd-11-stable/contrib/libpcap
/freebsd-11-stable/contrib/libstdc++
/freebsd-11-stable/contrib/libucl
/freebsd-11-stable/contrib/libxo
/freebsd-11-stable/contrib/llvm
/freebsd-11-stable/contrib/llvm/projects/libunwind
/freebsd-11-stable/contrib/llvm/tools/clang
/freebsd-11-stable/contrib/llvm/tools/lldb
/freebsd-11-stable/contrib/llvm/tools/llvm-dwarfdump
/freebsd-11-stable/contrib/llvm/tools/llvm-lto
/freebsd-11-stable/contrib/mdocml
/freebsd-11-stable/contrib/mtree
/freebsd-11-stable/contrib/ncurses
/freebsd-11-stable/contrib/netcat
/freebsd-11-stable/contrib/ntp
/freebsd-11-stable/contrib/nvi
/freebsd-11-stable/contrib/one-true-awk
/freebsd-11-stable/contrib/openbsm
/freebsd-11-stable/contrib/openpam
/freebsd-11-stable/contrib/openresolv
/freebsd-11-stable/contrib/pf
/freebsd-11-stable/contrib/sendmail
/freebsd-11-stable/contrib/serf
/freebsd-11-stable/contrib/sqlite3
/freebsd-11-stable/contrib/subversion
/freebsd-11-stable/contrib/tcpdump
/freebsd-11-stable/contrib/tcsh
/freebsd-11-stable/contrib/tnftp
/freebsd-11-stable/contrib/top
/freebsd-11-stable/contrib/top/install-sh
/freebsd-11-stable/contrib/tzcode/stdtime
/freebsd-11-stable/contrib/tzcode/zic
/freebsd-11-stable/contrib/tzdata
/freebsd-11-stable/contrib/unbound
/freebsd-11-stable/contrib/vis
/freebsd-11-stable/contrib/wpa
/freebsd-11-stable/contrib/xz
/freebsd-11-stable/crypto/heimdal
/freebsd-11-stable/crypto/openssh
/freebsd-11-stable/crypto/openssl
/freebsd-11-stable/gnu/lib
/freebsd-11-stable/gnu/usr.bin/binutils
/freebsd-11-stable/gnu/usr.bin/cc/cc_tools
/freebsd-11-stable/gnu/usr.bin/gdb
/freebsd-11-stable/lib/libc/locale/ascii.c
/freebsd-11-stable/sys/cddl/contrib/opensolaris
/freebsd-11-stable/sys/contrib/dev/acpica
/freebsd-11-stable/sys/contrib/ipfilter
/freebsd-11-stable/sys/contrib/libfdt
/freebsd-11-stable/sys/contrib/octeon-sdk
/freebsd-11-stable/sys/contrib/x86emu
/freebsd-11-stable/sys/contrib/xz-embedded
/freebsd-11-stable/usr.sbin/bhyve/atkbdc.h
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.c
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.h
/freebsd-11-stable/usr.sbin/bhyve/console.c
/freebsd-11-stable/usr.sbin/bhyve/console.h
/freebsd-11-stable/usr.sbin/bhyve/pci_fbuf.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.h
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.c
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.h
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.c
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.h
/freebsd-11-stable/usr.sbin/bhyve/rfb.c
/freebsd-11-stable/usr.sbin/bhyve/rfb.h
/freebsd-11-stable/usr.sbin/bhyve/sockstream.c
/freebsd-11-stable/usr.sbin/bhyve/sockstream.h
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.c
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.h
/freebsd-11-stable/usr.sbin/bhyve/usb_mouse.c
/freebsd-11-stable/usr.sbin/bhyve/vga.c
/freebsd-11-stable/usr.sbin/bhyve/vga.h
# 298433 21-Apr-2016 pfg

sys: use our roundup2/rounddown2() macros when param.h is available.

rounddown2 tends to produce longer lines than the original code
and when the code has a high indentation level it was not really
advantageous to do the replacement.

This tries to strike a balance between readability using the macros
and flexibility of having the expressions, so not everything is
converted.


# 283735 29-May-2015 kib

Remove several write-only variables, all reported by the gcc 4.9
buildkernel run.

Some of them were write-only under some kernel options, e.g. variables
keeping values only used by CTR() macros. It costs nothing to the
code readability and correctness to eliminate the warnings in those
cases too by removing the local cached values used only for
single-access.

Review: https://reviews.freebsd.org/D2665
Reviewed by: rodrigc
Looked at by: bjk
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 274835 21-Nov-2014 davide

Use the correct variable name.


# 274834 21-Nov-2014 davide

Make ufs_dirhashreclaimperc a percentage for real and
rename it to ufs_dirhashreclaimpercent, as suggested
by jhb@. As an added bonus this avoids divide-by-zero
errors.

Requested by: jhb, markj
Reviewied by: jhb, markj


# 270588 25-Aug-2014 davide

Rather than using an hardcoded reclaim age, rely on an LRU-like approach
for dirhash cache, setting a target percent to reclaim (exposed via
SYSCTL). This allows to always make some amount of progress keeping the maximum
reclaim age dynamic.

Tested by: pho
Reviewed by: jhb


# 254986 28-Aug-2013 ivoras

Take a very small step toward the Century of the Anchovy by increasing the
time dirhash entries stay in memory before being considered for eviction to
1 minute.


# 230221 16-Jan-2012 ivoras

Add a bit of verbosity to the comment.


# 219388 07-Mar-2011 kib

Simplify uses of the web of pointers.

Reviewed by: mckusick
MFC after: 1 week


# 219384 07-Mar-2011 jhb

The UFS dirhash code was attempting to update shared state in the dirhash
from multiple threads while holding a shared lock during a lookup operation.
This could result in incorrect ENOENT failures which could then be
permanently stored in the name cache.

Specifically, the dirhash code optimizes the case that a single thread is
walking a directory sequentially opening (or stat'ing) each file. It uses
state in the dirhash structure to determine if a given lookup is using the
optimization. If the optimization fails, it disables it and restarts the
lookup. The problem arises when two threads both attempt the optimization
and fail. The first thread will restart the loop, but the second thread
will incorrectly think that it did not try the optimization and will only
examine a subset of the directory entires in its hash chain. As a result,
it may fail to find its directory entry and incorrectly fail with ENOENT.

To make this safe for use with shared locks, simplify the state stored in
the dirhash and move some of the state (the part that determines if the
current thread is trying the optimization) into a local variable. One
result is that we will now try the optimization more often. We still
update the value under the shared lock, but it is a single atomic store
similar to i_diroff that is stored in UFS directory i-nodes for the
non-dirhash lookup.

Reviewed by: kib
MFC after: 1 week


# 214359 25-Oct-2010 ivoras

Bring vfs.ufs.dirhash_maxmem into the age of the fruitbat and make it
autotuned. It is only an upper bound (the memory is not always allocated)
and the system contains a vm_lowmem handler so nothing will crash and burn
if it's tuned too high.

Reviewed by: mckusick


# 207141 24-Apr-2010 jeff

- Merge soft-updates journaling from projects/suj/head into head. This
brings in support for an optional intent log which eliminates the need
for background fsck on unclean shutdown.

Sponsored by: iXsystems, Yahoo!, and Juniper.
With help from: McKusick and Peter Holm


# 201717 07-Jan-2010 mckusick

KASSERT that condition raised by Coverity cannot happen.

Found by: Coverity Prevent (tm)
KASSERT by: sam


# 195003 25-Jun-2009 snb

Fix a bug reported by pho@ where one can induce a panic by decreasing
vfs.ufs.dirhash_maxmem below the current amount of memory used by dirhash. When
ufsdirhash_build() is called with the memory in use greater than dirhash_maxmem,
it attempts to free up memory by calling ufsdirhash_recycle(). If successful in
freeing enough memory, ufsdirhash_recycle() leaves the dirhash list locked. But
at this point in ufsdirhash_build(), the list is not explicitly unlocked after
the call(s) to ufsdirhash_recycle(). When we next attempt to lock the dirhash
list, we will get a "panic: _mtx_lock_sleep: recursed on non-recursive mutex
dirhash list".

Tested by: pho
Approved by: dwmalone (mentor)
MFC after: 3 weeks


# 194387 17-Jun-2009 snb

Keep dirhash tailq locked throughout the entirety of ufsdirhash_destroy() to fix
a potential race pointed out by pjd. Also use TAILQ_FOREACH_SAFE to iterate over
dirhashes in ufsdirhash_lowmem(), so that we can continue iterating even after a
dirhash is destroyed.

Suggested by: pjd
Tested by: pho
Approved by: dwmalone (mentor)


# 193375 03-Jun-2009 snb

Add vm_lowmem event handler for dirhash. This will cause dirhashes to be
deleted when the system is low on memory. This ought to allow an increase to
vfs.ufs.dirhash_maxmem on machines that have lots of memory, without
degrading performance by having too much memory reserved for dirhash when
other things need it. The default value for dirhash_maxmem is being kept at
2MB for now, though.

This work was mostly done during the 2008 Google Summer of Code.

Approved by: dwmalone (mentor), re
MFC after: 3 months


# 187474 20-Jan-2009 jhb

Add a comment explaining why the "bufwait" / "dirhash" LOR reported by
WITNESS will not actually result in a deadlock.

Discussed with: kib
MFC after: 1 week


# 185102 19-Nov-2008 jhb

Fix typo.


# 184651 04-Nov-2008 jhb

Quiet a WITNESS warning with the dirhash sx locks by setting the DUPOK
flag. Specifically, if two threads race to create a dirhash for a
directory, then one might already have created a private dirhash
structure (and locked it) when it realizes the directory now has a
structure and tries to lock that one.


# 184205 23-Oct-2008 des

Retire the MALLOC and FREE macros. They are an abomination unto style(9).

MFC after: 3 months


# 183280 22-Sep-2008 jhb

Close a race between concurrent calls to ufsdirhash_recycle() and
ufsdirhash_free() introduced in my last commit by removing the dirhash
about to be free'd in ufsdirhash_free() from the global dirhash list
before dropping the sx lock.

Tested by: kris


# 183080 16-Sep-2008 jhb

Fix a race with shared lookups on UFS. If the the dirhash code reached the
cap on memory usage, then shared LOOKUP operations could start free'ing
dirhash structures. Without these fixes, concurrent free's on the same
directory could result in one of the threads blocked on a lock in a dirhash
structure free'd by the other thread.
- Replace the lockmgr lock in the dirhash structure with an sx lock.
- Use a reference count managed with ufsdirhash_hold()/drop() to determine
when to free the dirhash structures. The directory i-node holds a
reference while the dirhash is attached to an i-node. Code that wishes
to lock the dirhash while holding a shared vnode lock must first
acquire a private reference to the dirhash while holding the vnode
interlock before acquiring the dirhash sx lock. After acquiring the sx
lock, it drops the private reference after checking to see if the
dirhash is still used by the directory i-node.


# 178110 11-Apr-2008 jeff

- Use a lockmgr lock rather than a mtx to protect dirhash. This lock
may be held for the duration of the various dirhash operations which
avoids many complex unlock/lock/revalidate sequences.
- Permit shared locks on lookup. To protect the ip->i_dirhash pointer we
use the vnode interlock in the shared case. Callers holding the
exclusive vnode lock can run without fear of concurrent modification to
i_dirhash.
- Hold an exclusive dirhash lock when creating the dirhash structure for
the first time or when re-creating a dirhash structure which has been
recycled.

Tested by: kris, pho


# 151897 31-Oct-2005 rwatson

Normalize a significant number of kernel malloc type names:

- Prefer '_' to ' ', as it results in more easily parsed results in
memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion. Similar changes are required for UMA zone names.


# 149178 17-Aug-2005 iedowse

In the ufsdirhash_build() failure case for corrupted directories
or unreadable blocks, make sure to destroy the mutex we created.
Also fix an unrelated typo in a comment.

Found by: Peter Holm's stress tests
Reviewed by: dwmalone
MFC after: 3 days


# 141631 10-Feb-2005 phk

Make a some SYSCTL_NODEs and some of FFS's VFS_ methods static.


# 139825 07-Jan-2005 imp

/* -> /*- for license, minor formatting changes


# 133837 16-Aug-2004 dwmalone

When looking for some extra data to include in the hash, use the
address of the dirhash, rather than the first sizeof(struct dirhash
*) bytes of the structure (which, thankfully, seem to be constant).

Submitted by: Ted Unangst <tedu@zeitbombe.org>
MFC after: 2 weeks


# 125854 15-Feb-2004 dwmalone

Abstract dirhash's locking using macros. This should make it easier to
use the same dirhash code on different branches/platforms.

Reviewed by: Ted Unangst <tedu@zeitbombe.org>
Reviewed by: iedowse
MFC after: 3 weeks


# 116192 11-Jun-2003 obrien

Use __FBSDID().


# 111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


# 109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# 107868 14-Dec-2002 iedowse

Undo the adjustment of the total memory used by dirhash in the case
where allocating the dirhash structure fails. Fix a few typos in
comments and update copyright.

MFC after: 1 week


# 99101 30-Jun-2002 iedowse

Remove the bogus SYSINIT from ufs_dirhash.c and instead add a call
to ufsdirhash_init() from ufs_init(). Add uninit() functions
corresponding the ufs, dirhash, quota and ihash init() functions.


# 96874 18-May-2002 iedowse

Fix a typo where sizeof(daddr_t) was specified instead of sizeof(doff_t).
Now that daddr_t is 64-bit, this caused hash blocks to be allocated
twice as large as they need to be.


# 93818 04-Apr-2002 jhb

Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on: i386, alpha, sparc64


# 92807 20-Mar-2002 dwmalone

Two minor changes to dirhash, which result in some marginal benchmark
improvements.

1) If deleting an entry results in a chain of deleted slots ending in an
empty slot, then we can be a bit more aggressive about marking slots as
empty.

2) The last stage of the FNV hash is to xor the last byte of data
into the hash. This means that filenames which differ only in
the last byte will be placed close to one another in the hash
table, which forms longer chains. To work around this common
case, we also hash in the address of the dirhash structure.

news/cancel = news/articles/control/cancel for a tradspool inn server
squid2 = squid level 2 directory (dirs called 00->FF)
squid3 = squid level 3 directory (files called 00001F00->00001FFF)

mean #probes for
home dir mh inbox news/cancel tmp squid2 squid3
old successful 1.02 3.19 4.07 1.10 7.85 2.06
new successful 1.04 1.32 1.27 1.04 1.93 1.17

old unsuccessful 1.08 4.50 5.37 1.17 10.76 2.69
new unsuccessful 1.08 1.73 1.64 1.17 2.89 1.37

Reviewed by: iedowse
MFC after: 2 weeks


# 92768 20-Mar-2002 jeff

Remove references to vm_zone.h and switch over to the new uma API.


# 92098 11-Mar-2002 iedowse

Fix a bug in ufsdirhash_adjfree() that caused it to incorrectly
update the free-space statistics in some cases. The problem affected
directory blocks when the free space dropped below the size of the
maximum allowed entry size. When this happened, the free-space
summary information could claim that there are no further blocks
that can fit a maximum-size entry, even if there are.

The effect of this bug is that the directory may be enlarged even
though there is space within the directory for the new entry. This
wastes disk space and has a negative impact on performance.

Fix it by correctly computing the dh_firstfree array index, adding
a helper macro for clarity. Put an extra sanity check into
ufsdirhash_checkblock() to detect the situation in future.

Found by: dwmalone
Reviewed by: dwmalone
MFC after: 1 week


# 86350 14-Nov-2001 iedowse

Oops, when trying the dirhash sequential-access optimisation,
compare the slot offset against the predicted offset, not a boolean
flag. This typo effectively disabled the sequential optimisation,
but was otherwise harmless.

Not surprisingly, fixing this improves performance in the sequential
access case. I am seeing a 7% speedup on one machine here; using
dirhash when sequentially looking up directory entries is now about
5% faster instead of 2% slower than the non-dirhash case.

Submitted by: KOIE Hidetaka <koie@suri.co.jp>
MFC after: 1 week


# 85512 25-Oct-2001 iedowse

Default to not performing ufs_dirhash's extensive directory-block
sanity check after every directory modification. This check can be
re-enabled at any time by setting the sysctl "vfs.ufs.dirhash_docheck"
to 1.

This group of sanity tests was there to ensure that any UFS_DIRHASH
bugs could be caught by a panic before a potentially corrupted
directory block would be written to disk. It has served its main
purpose now, so disable it in the interest of performance.

MFC after: 1 week


# 84811 11-Oct-2001 jhb

Add missing includes of sys/lock.h.


# 82364 26-Aug-2001 iedowse

Stop using dirhash when a directory is removed, and ensure that we
never attempt to hash directories once they are deleted. This fixes
a problem where operations on a deleted directory could trigger
dirhash sanity panics.


# 80456 27-Jul-2001 iedowse

Disable the dirhash sanity check that panics if an unused directory
entry (d_ino == 0) is found in a position that is not the start of
a DIRBLKSIZ block.

While such entries cannot occur normally (ufs always extends the
previous entry to cover the free space instead), they do not cause
problems and fsck does not fix them, so panicking is bad.


# 79690 13-Jul-2001 iedowse

Return a locked struct buf from ufsdirhash_lookup() to avoid one
extra getblk/brelse sequence for each lookup. We already had this
buf in ufsdirhash_lookup(), so there was no point in brelse'ing it
only to have the caller immediately reaquire the same buffer.

This should make the case of sequential lookups marginally faster;
in my tests, sequential lookups with dirhash enabled are now only
around 1% slower than without dirhash.


# 79561 10-Jul-2001 iedowse

Bring in dirhash, a simple hash-based lookup optimisation for large
directories. When enabled via "options UFS_DIRHASH", in-core hash
arrays are maintained for large directories. These allow all
directory operations to take place quickly instead of requiring
long linear searches. For now anyway, dirhash is not enabled by
default.

The in-core hash arrays have a memory requirement that is approximately
half the size of the size of the on-disk directory file. A number
of new sysctl variables allow control over which directories get
hashed and over the maximum amount of memory that dirhash will use:

vfs.ufs.dirhash_minsize
The minimum on-disk directory size for which hashing should be
used. The default is 2560 (2.5k).

vfs.ufs.dirhash_maxmem
The system-wide maximum total memory to be used by dirhash data
structures. The default is 2097152 (2MB).

The current amount of memory being used by dirhash is visible
through the read-only sysctl variable vfs.ufs.dirhash_maxmem.
Finally, some extra sanity checks that are enabled by default, but
which may have an impact on performance, can be disabled by setting
vfs.ufs.dirhash_docheck to 0.

Discussed on: -fs, -hackers