#
0cd9cde7 |
|
06-Apr-2024 |
Jake Freeland <jfree@FreeBSD.org> |
ktrace: Record namei violations with KTR_CAPFAIL Report namei path lookups while tracing Capsicum violations with CAPFAIL_NAMEI. vfs caching is also ignored while tracing, to mimic capability mode behavior. Reviewed by: markj Approved by: markj (mentor) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D40680
|
#
55edc40e |
|
04-Jan-2024 |
Mark Johnston <markj@FreeBSD.org> |
file: Remove the fd parameter to fgetvp_lookup() and fgetvp_lookup_smr() The fd is always obtained from nameidata, so just fetch it from there instead. No functional change intended. Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D43257
|
#
29363fb4 |
|
23-Nov-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove ancient SCCS tags. Remove ancient SCCS tags from the tree with automated scripting, plus two minor fixups to keep things compiling. All the common forms in the tree were removed with a perl script. Sponsored by: Netflix
|
#
bb8ecf25 |
|
19-Oct-2023 |
Dmitry Chagin <dchagin@FreeBSD.org> |
vfs cache: Fallback to namei to resolve symlinks with leading / in target for non-native ABI This is a temporary solution to fix the PR before the release. During 15.0 development, symlink handling between vfs and the namecache needs to be refactored. PR: 273414 Reported by: Vincent Milum Jr, Dan Kotowski, glebius Tested by: Dan Kotowski, glebius Reviewed by: Differential Revision: https://reviews.freebsd.org/D41806 MFC after: 3 days
|
#
8b622172 |
|
04-Oct-2023 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs cache: add 2 more optimization ideas
|
#
cd2105d6 |
|
04-Oct-2023 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs cache: denote a known bug in cache_remove_cnp
|
#
0f15054f |
|
22-Sep-2023 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs cache: plug a hypothetical corner case when freeing cache_zap_unlocked_bucket is called with a bunch of addresses and without any locks held, forcing it to revalidate everything from scratch. It did not account for a case where the entry is reallocated with everything the same except for the target vnode. Should the target use a different lock than the one expected, freeing would proceed without being properly synchronized. Note this is almost impossible to hit in practice.
|
#
2749c222 |
|
04-Oct-2023 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs cache: sanitize debug counters They are triggered very rarely, so there is no need for per-CPU distribution. At the same time, the non-per-CPU ones should still use atomics so as not to lose any updates.
|
#
4862e8ac |
|
03-Oct-2023 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs cache: describe various optimization ideas While here, report a sample result from running on Sapphire Rapids.

An access(2) loop slapped into will-it-scale, like so:

    while (1) {
        int error = access(tmpfile, R_OK);
        assert(error == 0);
        (*iterations)++;
    }

.. operating on /usr/obj/usr/src/amd64.amd64/sys/GENERIC/vnode_if.c

In operations per second:

    lockless: 3462164
    locked:   1362376

While over 3.4 million may seem like a big number, a critical look shows it should be significantly higher.

A poor man's profiler, counting how many times a given routine was sampled:

    dtrace -w -n 'profile:::profile-4999 /execname == "a.out"/ { @[sym(arg0)] = count(); } tick-5s { system("clear"); trunc(@, 40); printa("%40a %@16d\n", @); clear(@); }'

    [snip]
    kernel`kern_accessat                 231
    kernel`cpu_fetch_syscall_args        324
    kernel`cache_fplookup_cross_mount    340
    kernel`namei                         346
    kernel`amd64_syscall                 352
    kernel`tmpfs_fplookup_vexec          388
    kernel`vput                          467
    kernel`vget_finish                   499
    kernel`lockmgr_unlock                529
    kernel`lockmgr_slock                 558
    kernel`vget_prep_smr                 571
    kernel`vput_final                    578
    kernel`vdropl                       1070
    kernel`memcmp                       1174
    kernel`0xffffffff80                 2080
    0x0                                 2231
    kernel`copyinstr_smap               2492
    kernel`cache_fplookup               9246
|
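The access(2) loop quoted in commit 4862e8ac can be turned into a small standalone measurement. This is a minimal sketch, assuming a POSIX system: the hypothetical helper `access_rate()` runs a fixed number of iterations in a single thread and reports calls per second. It is not the will-it-scale harness the commit used, and its numbers are not comparable to the ones quoted above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: a bounded, single-threaded variant of the loop in
 * the commit message. Calls access(path, R_OK) `iterations` times and
 * returns the achieved rate in calls per second.
 */
double access_rate(const char *path, unsigned long iterations)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (unsigned long i = 0; i < iterations; i++) {
        int error = access(path, R_OK);
        assert(error == 0);     /* same sanity check as the original loop */
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double secs = (double)(end.tv_sec - start.tv_sec) +
        (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    return secs > 0 ? (double)iterations / secs : 0.0;
}
```

For example, `access_rate("/usr/obj/usr/src/amd64.amd64/sys/GENERIC/vnode_if.c", 1000000)` roughly approximates one worker of the benchmark on a machine where that file exists.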
#
38a375c4 |
|
03-Oct-2023 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs cache: s/vfs.cache_fast_lookup/vfs.cache.param.fast_lookup
|
#
bb124a0f |
|
22-Sep-2023 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs cache: retire dothits and dotdothits counters They demonstrate nothing, and in the case of dotdot they are not even hits. This is just a count of lookups with "..", which is not worth mentioning.
|
#
33fdf1af |
|
22-Sep-2023 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs cache: mark vfs.cache.param.size as read-only It was not meant to be writable and writes don't work correctly as they fail to resize the hash.
|
#
02ef039c |
|
22-Sep-2023 |
Olivier Certner <olce.freebsd@certner.fr> |
vfs cache: Drop known argument of internal cache_recalc_neg_min() 'ncnegminpct' is always passed, so just drop the unneeded parameter. Sponsored by: The FreeBSD Foundation Reviewed by: mjg Differential Revision: https://reviews.freebsd.org/D41763
|
#
07f52c4b |
|
14-Sep-2023 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs cache: garbage collect the fullpathfail2 counter The conditions it checks cannot legally be true (modulo races against forced unmount), so assert on them instead.
|
#
32988c14 |
|
02-Sep-2023 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs cache: fix a hang when bumping vnode limit too high Overflow in cache_changesize would make the value flip to 0 and stay there, as 0 << 1 is still 0. Note that callers limit the outcome to something below u_int. Also note that the entire vnode handling code, both in the vfs layer as a whole and in this file, can't decide whether to use long, u_long or u_int.
|
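The hang fixed in commit 32988c14 comes from a doubling loop whose shift can wrap to 0, after which `0 << 1` never makes progress. This is a hypothetical sketch (the name `grow_checked()` is illustrative, not the kernel's cache_changesize code) showing a doubling loop that stops before the shift can overflow.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper: return the smallest power of two >= want, clamped
 * to the largest power of two an unsigned int can hold. An unguarded
 * `size <<= 1` would wrap to 0 once size reaches 2^(bits-1), and a loop
 * like `while (size < want) size <<= 1;` would then spin forever.
 */
unsigned int grow_checked(unsigned int want)
{
    unsigned int size = 1;

    while (size < want) {
        if (size > UINT_MAX / 2)    /* the next shift would wrap to 0 */
            break;
        size <<= 1;
    }
    assert(size != 0);              /* the result can never collapse to 0 */
    return size;
}
```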
#
685dc743 |
|
16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
#
dbac8474 |
|
29-Jul-2023 |
Dmitry Chagin <dchagin@FreeBSD.org> |
vfs: Delete a duplicated inclusion of sys/capsicum.h Reviewed by: Differential Revision: https://reviews.freebsd.org/D41223 MFC after: 1 week
|
#
ba8cc6d7 |
|
12-Mar-2023 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: use __enum_uint8 for vtype and vstate This whacks hackery around only reading v_type once. Bump __FreeBSD_version to 1400093
|
#
d7614c01 |
|
04-Jul-2023 |
Konstantin Belousov <kib@FreeBSD.org> |
vn_path_to_global_path_hardlink(): initialize len before calling vn_fullpath_hardlink(). Otherwise we get random failures when the len is automatically clipped. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
d6b900c9 |
|
03-Jul-2023 |
Konstantin Belousov <kib@FreeBSD.org> |
vn_path_to_global_path_hardlink(): avoid freeing non-initialized pointer Reported by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
60bd7f97 |
|
30-May-2023 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs cache: restore sorted order of CACHE_FPL_SUPPORTED_CN_FLAGS
|
#
3d2fec7d |
|
29-May-2023 |
Dmitry Chagin <dchagin@FreeBSD.org> |
namei: Add the ability for the ABI to specify an alternate root path For now a non-native ABI (i.e., Linux) uses the kern_alternate_path() facility to dynamically reroot lookups. First, an attempt is made to look up the file in /compat/linux/original-path. If that fails, the lookup is done in /original-path. That requires a bit of code in every ABI syscall implementation where path name translation is needed. Also, our kern_alternate_path() does not properly look up absolute symlinks in the second attempt, i.e., it does not append the /compat/linux part to the resolved link. The change is intended to avoid this by specifying the ABI root directory for namei(), using one call to pwd_altroot() at exec time into the ABI. In that case namei() will dynamically reroot lookups as mentioned above. PR: 72920 Reviewed by: kib Differential revision: https://reviews.freebsd.org/D38933 MFC after: 2 months
|
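The two-step lookup order described in commit 3d2fec7d (try the path under the ABI root, then the path as given) can be sketched in userspace. This is a hypothetical illustration: `altroot_lookup()` is not a FreeBSD function, it only checks existence with access(2), and it deliberately does not model the absolute-symlink problem the commit fixes.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: try altroot + path first (e.g. /compat/linux/etc/passwd),
 * then fall back to path itself. The chosen path is written to out.
 * Returns 0 on success, -1 if neither candidate exists or out is too small.
 */
int altroot_lookup(const char *altroot, const char *path,
    char *out, size_t outlen)
{
    int n;

    assert(path[0] == '/');     /* only absolute paths are rerooted */

    n = snprintf(out, outlen, "%s%s", altroot, path);
    if (n >= 0 && (size_t)n < outlen && access(out, F_OK) == 0)
        return 0;

    n = snprintf(out, outlen, "%s", path);
    if (n >= 0 && (size_t)n < outlen && access(out, F_OK) == 0)
        return 0;

    return -1;
}
```

Doing this once in namei(), as the commit describes, avoids repeating the same fallback in every ABI syscall that takes a path.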
#
0e0c47ec |
|
19-Apr-2023 |
Igor Ostapenko <pm@igoro.pro> |
vfs cache: fix vfs.cache.stats.* name typos Two vfs.cache.stats names are fixed: - s/.dotdothis/.dotdothits/ - s/.posszaps/.poszaps/ Signed-off-by: Igor Ostapenko <pm@igoro.pro> [mjg: massaged the header a little bit]
|
#
26b96487 |
|
07-Apr-2023 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: more informative panic for missing fplookup ops
|
#
5f6df177 |
|
03-Nov-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: validate that vop vectors provide all or none fplookup vops In order to prevent later surprises.
|
#
22eb66d9 |
|
23-Mar-2023 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs cache: always assert on ndp->ni_resflags
|
#
c16c4ea6 |
|
23-Mar-2023 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs cache: return ENOTDIR for not_a_dir/{.,..} lookups Reported by: Oliver Kiddle PR: 270419 MFC: 3 days
|
#
dbcd7e7e |
|
21-Feb-2023 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs cache: whack set-but-not-used warn in cache_purgevfs Reported by: kib Sponsored by: Rubicon Communications, LLC ("Netgate")
|
#
a1d74b2d |
|
04-Dec-2022 |
Doug Rabson <dfr@FreeBSD.org> |
Allow realpath to work for file mounts For file mounts, the directory vnode is not available from namei and this prevents the use of vn_fullpath_hardlink. In this case, we can use the vnode which was covered by the file mount with vn_fullpath. This also disallows file mounts over files with link counts greater than one to ensure a deterministic path to the mount point. Reviewed by: mjg, kib Tested by: pho
|
#
78d35459 |
|
02-Dec-2022 |
Doug Rabson <dfr@FreeBSD.org> |
Add vn_path_to_global_path_hardlink This is similar to vn_path_to_global_path but allows for regular files which may not be present in the cache. Reviewed by: mjg, kib Tested by: pho
|
#
8f7859e8 |
|
14-Dec-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: retire the now unused SAVESTART flag Bump __FreeBSD_version to 1400075 Tested by: pho
|
#
85dac03e |
|
17-Nov-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: stop using NDFREE It provides nothing but a branchfest and next to no consumers want it anyway. Tested by: pho
|
#
d653aaec |
|
24-Oct-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add cache_assert_no_entries
|
#
5b5b7e2c |
|
17-Sep-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: always retain path buffer after lookup This removes some of the complexity needed to maintain HASBUF and allows removing the injection of SAVENAME by filesystems. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D36542
|
#
7388fb71 |
|
27-Jun-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: drop the vfs.cache_rename_add tunable The functionality has been in use since Jan 2021 -- long enough(tm).
|
#
c9b04ee4 |
|
02-Apr-2022 |
Gordon Bergling <gbe@FreeBSD.org> |
kern: Fix two typos in source code comments - s/accomodate/accommodate/ MFC after: 3 days
|
#
0c805718 |
|
24-Mar-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: fix memory leak on lookup with fds with ioctl caps Reviewed by: markj PR: 262515 Noted by: firk@cantconnect.ru Differential Revision: https://reviews.freebsd.org/D34667
|
#
bb92cd7b |
|
24-Mar-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: NDFREE(&nd, NDF_ONLY_PNBUF) -> NDFREE_PNBUF(&nd)
|
#
6ff3e8a3 |
|
19-Mar-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add a comment about a realpath bug
|
#
02fc4e31 |
|
13-Mar-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: use flexible array member ... instead of 0-sizing the array
|
#
afb08a6d |
|
03-Mar-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: hide hash stats behind DEBUG_CACHE They take a long time to dump and hinder sysctl -a when used with DIAGNOSTIC.
|
#
1d65a9b4 |
|
09-Feb-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: improve vnode vs name assertion in cache_enter_time
|
#
611470a5 |
|
09-Feb-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: remove NOCACHE handling from cache_fplookup_noentry It was copy-pasted from the locked lookup. As a LOOKUP operation cannot have the flag set, it always ended up setting MAKEENTRY.
|
#
7e1d3eef |
|
25-Nov-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: remove the unused thread argument from NDINIT* See b4a58fbf640409a1 ("vfs: remove cn_thread") Bump __FreeBSD_version to 1400043.
|
#
7e9680d3 |
|
14-Nov-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: whack "set but not used" warnings
|
#
9a0bee9f |
|
22-Oct-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
Make vn_fullpath_hardlink() externally callable Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32611
|
#
628c3b30 |
|
27-Oct-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: only let non-dir descriptors through when doing EMPTYPATH lookups Otherwise things like realpath against a file and '.' end up with an illegal state of having a regular vnode for the parent. Reported by: syzbot+9aa5439dd9c708aeb1a8@syzkaller.appspotmail.com
|
#
1045352f |
|
17-Oct-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: only assert on flags when dealing with EMPTYPATH Reported by: syzbot+bd48ee0843206a09e6b8@syzkaller.appspotmail.com Fixes: 7dd419cabc6bb9e0 ("cache: add empty path support")
|
#
7dd419ca |
|
26-Sep-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add empty path support This avoids spurious drop-offs, as EMPTY is passed regardless of the actual path name. Pushing the work inside the lookup instead of just ignoring the flag allows avoiding the check for an empty pathname in all other lookups.
|
#
b4a58fbf |
|
01-Oct-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: remove cn_thread It is always curthread. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D32453
|
#
a2cb65b8 |
|
18-Sep-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: count vnodes in cache_purgevfs
|
#
b65ad701 |
|
23-Aug-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: retire cache_fast_revlookup sysctl Sponsored by: Rubicon Communications, LLC ("Netgate")
|
#
b30e7cb7 |
|
07-Aug-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add OPENREAD and OPENWRITE to fast path lookup
|
#
844aa31c |
|
08-Jul-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add cache_enter_time_flags
|
#
12288bd9 |
|
10-May-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: fix lockless absolute symlink traversal to non-fp mounts Said lookups would incorrectly fail with EOPNOTSUPP. Reported by: kib
|
#
c8bbb127 |
|
10-May-2021 |
Mark Johnston <markj@FreeBSD.org> |
vfs: Fix error handling in vn_fullpath_hardlink() vn_fullpath_any_smr() will return a positive error number if the caller-supplied buffer isn't big enough. In this case the error must be propagated up, otherwise we may copy out uninitialized bytes. Reported by: syzkaller+KMSAN Reviewed by: mjg, kib MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30198
|
#
074abacc |
|
10-Apr-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: remove incomplete lockless lockout support during resize This is already properly handled thanks to 2 step hash replacement.
|
#
4f0279e0 |
|
15-Apr-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: extend mismatch vnode assert print to include the name
|
#
72b3b5a9 |
|
08-Apr-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: replace vfs_smr_quiesce with vfs_smr_synchronize This ends up using a smr specific method. Suggested by: markj Tested by: pho
|
#
13b3862e |
|
06-Apr-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: update an assert on CACHE_FPL_STATUS_ABORTED Since symlink support was added, it can get upgraded to CACHE_FPL_STATUS_DESTROYED. Reported by: bdrewery
|
#
f79bd71d |
|
11-Feb-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add high level overview Differential Revision: https://reviews.freebsd.org/D28675
|
#
dc532884 |
|
29-Mar-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: fix resizing in face of lockless lookup Reported by: pho Tested by: pho
|
#
1239a722 |
|
27-Feb-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: temporarily drop the assert that dvp != vp when adding an entry Historically it was allowed for any name, but arguably it should never even be attempted. Allow it again since there is a release pending and allowing it is bug-compatible with previous behavior. Reported by: otis
|
#
39e0c3f6 |
|
09-Feb-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: assorted comment fixups
|
#
2f8a8446 |
|
05-Feb-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: remove the largely obsolete general description Examples of inconsistencies with the current state: - references LRU of all entries, removed years ago - references a non-existent lock (neglist) - claims negative entries have a NULL target It will be replaced with a more accurate and more informative description. In the meantime take it out so it stops misleading.
|
#
0e1594e6 |
|
05-Feb-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: fix vfs:namecache:lookup:miss probe call sites
|
#
2e96132a |
|
05-Feb-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: drop spurious arg from panic in cache_validate vp is already reported when noting mismatch
|
#
b54ed778 |
|
03-Feb-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: comment on FNV
|
#
45456abc |
|
02-Feb-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: fix trailing slash support in face of permission problems Reported by: Johan Hendriks <joh.hendriks gmail.com> Tested by: kevans
|
#
6f19dc21 |
|
31-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add delayed degenerate path handling
|
#
bbfb1edd |
|
31-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: move hash computation into the parsing loop
|
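Commits b54ed778 and bbfb1edd concern the FNV hash and computing it inside the path parsing loop. The sketch below shows the idea of finding a component's length and hashing it in a single pass. It assumes the 32-bit FNV-1 variant (multiply, then xor) with the constants that FreeBSD's sys/fnv_hash.h uses; the helper name `parse_component()` is hypothetical, and how the kernel combines the name hash with the parent directory is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1 32-bit offset basis and prime. */
#define FNV1_32_INIT  2166136261u
#define FNV_32_PRIME  16777619u

/*
 * Hypothetical helper: scan one path component starting at name, stopping
 * at '/' or the terminating NUL, and hash it in the same loop instead of
 * measuring the length first and hashing afterwards.
 * Stores the hash in *hashp and returns the component length.
 */
size_t parse_component(const char *name, uint32_t *hashp)
{
    uint32_t hash = FNV1_32_INIT;
    size_t len = 0;

    assert(name != NULL && hashp != NULL);
    while (name[len] != '/' && name[len] != '\0') {
        hash *= FNV_32_PRIME;           /* FNV-1: multiply first ... */
        hash ^= (uint8_t)name[len];     /* ... then xor in the byte */
        len++;
    }
    *hashp = hash;
    return len;
}
```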
#
e027e24b |
|
25-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add trailing slash support Tested by: pho
|
#
8cbd164a |
|
26-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: handle NOFOLLOW requests for symlinks Tested by: pho
|
#
5c325977 |
|
27-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add missing MNT_NOSYMFOLLOW check to symlink traversal
|
#
5fc384d1 |
|
27-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: fallback when encountering a mount point during .. lookup The current abort is overzealous.
|
#
a098a831 |
|
26-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: tidy up handling of foo/bar lookups where foo is not a directory The code was performing an avoidable check for doomed state to account for foo being a VDIR but turning VBAD. Now that dooming puts a vnode in a permanent "modify" state this is no longer necessary as the final status check will catch it.
|
#
a51eca79 |
|
26-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: stop referring to removing entries as invalidating them Said use is a remnant from the old code and clashes with the NCF_INVALID flag.
|
#
6943671b |
|
25-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: convert cache_fplookup_parse to void now that it always succeeds
|
#
e7cf562a |
|
25-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: change ->v_cache_dd synchronisation rules Instead of resorting to seqc modification take advantage of immutability of entries and check if the entry still matches after everything got prepared.
|
#
6f084276 |
|
25-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: make ->v_cache_dd accesses atomic-clean for lockless usage
|
#
6ef8fede |
|
25-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: make ->nc_flag accesses atomic-clean for lockless usage
|
#
ffcf8f97 |
|
23-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: store vnodes in local vars in cache_zap_locked
|
#
868643e7 |
|
24-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: assorted cleanups
|
#
1c7a65ad |
|
24-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: track calls to cache_symlink_alloc with unsupported size While here assert on size passed to free.
|
#
02ec31bd |
|
23-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add back target entry on rename
|
#
739ecbcf |
|
23-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add symlink support to lockless lookup Reviewed by: kib (previous version) Tested by: pho (previous version) Differential Revision: https://reviews.freebsd.org/D27488
|
#
2171b8e8 |
|
20-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: augment sdt probe in cache_fplookup_dot Same as 6d386b4c ("cache: save a branch in cache_fplookup_next")
|
#
aae03cfe |
|
20-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: whitespace nit in cache_fplookup_modifying
|
#
57dab029 |
|
19-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: fix some typos
|
#
84ab77ad |
|
19-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: drop write-only var from cache_fplookup_preparse
|
#
6d386b4c |
|
19-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: save a branch in cache_fplookup_next Previously the code would branch on top to find out whether it should call the SDT probe and bump the numposhits counter, depending on cache_fplookup_cross_mount. Arguably it should be done regardless of what said function returns.
|
#
70ba7770 |
|
12-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: extend vfs:namei:lookup:return probe with nameidata
|
#
8ddea0b1 |
|
08-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: just assign ni_resflags = NIRES_ABS It is guaranteed to be 0 on entry.
|
#
fee405e0 |
|
06-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: stop checkpointing cn_flags They are only modified, if ever, for the last component.
|
#
ac771547 |
|
06-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: stop checkpointing cn_nameptr For aborts cn_nameptr is the same as cn_pnbuf. For partial results the same cn_nameptr is to be used.
|
#
0f1fc3a3 |
|
01-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: stop manipulating pathlen It is a copy-pasto from regular lookup. Add debug to ensure the result is the same.
|
#
f2b794e1 |
|
06-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: unengrish the comment in previous commit Reported by: rpokala, brd
|
#
deabdc68 |
|
05-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: stop pre-checking seqc when starting the lookup Tested by: pho
|
#
71a6a0b5 |
|
31-Dec-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: skip checking for spurious slashes if possible Tested by: pho
|
#
33f3e81d |
|
01-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: combine fast path enabled status into one flag Tested by: pho
|
#
dbbbc07c |
|
31-Dec-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: split handling of 0 and non-0 error codes Tested by: pho
|
#
a1a8f8ad |
|
31-Dec-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: deinline state handling The intent is to reduce branchfest when finishing the lookup. Tested by: pho
|
#
05803be0 |
|
05-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: stop setting cn_nameptr on entry as matches cn_pnbuf already While here tidy up other asserts.
|
#
3814bea0 |
|
03-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: drop the now spurious doomed check when crossing a mount point
|
#
82397d79 |
|
31-Dec-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: denote vnode being a mount point with VIRF_MOUNTPOINT Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D27794
|
#
51bf55fa |
|
01-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: stop checkpointing cn_namelen The variable is recomputed by regular lookup from the get go.
|
#
7220a10b |
|
31-Dec-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: predict on no spurious slashes in cache_fpl_handle_root This is a step towards speculatively not handling them.
|
#
30a2fc91 |
|
31-Dec-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: postpone NAME_MAX check as it may be unnecessary
|
#
eca899bd |
|
31-Dec-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: remove spurious null check in sdt probe
|
#
1365b5f8 |
|
28-Dec-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: fold NCF_WHITE check into the rest Tested by: pho
|
#
d7c62d98 |
|
28-Dec-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: call cache_fplookup_modifying in neg Tested by: pho
|
#
6fe7de1a |
|
28-Dec-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: refactor cache_fpl_handle_root to fit the rest of the code better Tested by: pho
|
#
e17e01bd |
|
28-Dec-2020 |
Mateusz Guzik <mjguzik@gmail.com> |
cache: refactor dot handling Tested by: pho
|
#
4651db56 |
|
28-Dec-2020 |
Mateusz Guzik <mjguzik@gmail.com> |
cache: remove a branch from mount point checking Tested by: pho
|
#
0b5bd1af |
|
27-Dec-2020 |
Mateusz Guzik <mjguzik@gmail.com> |
cache: support lockless lookup of degenerate paths Tested by: pho
|
#
1d6eb976 |
|
27-Dec-2020 |
Mateusz Guzik <mjguzik@gmail.com> |
cache: save on branching when parsing the path by inserting a sentinel Tested by: pho
|
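Commit 1d6eb976 saves branching during path parsing by inserting a sentinel. The sketch below shows the general technique under an assumption about how it applies: the string terminator is temporarily replaced with '/', so the inner scan only tests for one character. The helper `count_components()` is hypothetical and is not the kernel's parser.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: count the non-empty components of a writable,
 * NUL-terminated path of length len. The terminator is swapped for a
 * '/' sentinel during the scan and restored afterwards.
 */
size_t count_components(char *buf, size_t len)
{
    char *end = buf + len;
    size_t n = 0;

    assert(*end == '\0');
    *end = '/';                     /* sentinel: every scan stops here at the latest */
    for (char *p = buf; p < end; ) {
        char *q = p;
        while (*q != '/')           /* no separate '\0' test needed */
            q++;
        if (q > p)                  /* skip empty components from "//" */
            n++;
        p = q + 1;
    }
    *end = '\0';                    /* restore the terminator */
    return n;
}
```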
#
67297766 |
|
27-Dec-2020 |
Mateusz Guzik <mjguzik@gmail.com> |
cache: hoist trailing slash and degenerate path handling out of the loop Tested by: pho
|
#
0c09f4b0 |
|
28-Dec-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: work around corner case of dvp == tvp in cache_fplookup_final_modifying Fixes a panic where the kernel would unlock an unheld lock coming from rename looking up "foo/." as the source. Reported by: markj (syzkaller)
|
#
4ab7d9f4 |
|
27-Dec-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: reduce engrish in previous commit
|
#
0714f921 |
|
27-Dec-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: save on some branching in common case mount point traversal
|
#
002e18eb |
|
27-Dec-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: add FAILIFEXISTS flag Both FreeBSD and Linux mkdir -p walk the tree up ignoring any EEXIST on the way and both are used a lot when building respective kernels. This poses a problem as spurious locking avoidably interferes with concurrent operations like getdirentries on affected directories. Work around the problem by adding FAILIFEXISTS flag. In case of lockless lookup this manages to avoid any work to begin with, there is no speed up for the locked case but perhaps this can be augmented later on. For simplicity the only supported semantics are as used by mkdir. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D27789
|
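The `mkdir -p` pattern that motivated FAILIFEXISTS (and commit c7520caa) calls mkdir(2) on every prefix of the path and ignores EEXIST, so most calls look up directories that already exist. This is a minimal userspace sketch of that pattern with a hypothetical `mkdir_p()` helper; it is not FreeBSD's mkdir(1) source, and as a simplification it also reports success if the final component already exists as a non-directory.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: create every component of path, treating EEXIST
 * as success. Returns 0 on success, -1 with errno set otherwise.
 */
int mkdir_p(const char *path, mode_t mode)
{
    char buf[4096];
    size_t len;

    assert(path != NULL);
    len = strlen(path);
    if (len == 0 || len >= sizeof(buf)) {
        errno = (len == 0) ? ENOENT : ENAMETOOLONG;
        return -1;
    }
    memcpy(buf, path, len + 1);

    for (char *p = buf + 1; ; p++) {
        if (*p != '/' && *p != '\0')
            continue;
        char saved = *p;
        *p = '\0';                  /* cut the path after this component */
        if (mkdir(buf, mode) != 0 && errno != EEXIST)
            return -1;
        if (saved == '\0')
            break;
        *p = saved;                 /* restore the separator and continue */
    }
    return 0;
}
```

Running it twice on the same path makes every call hit EEXIST, which is exactly the case where lockless lookup with FAILIFEXISTS can avoid the work altogether.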
#
ff97bc03 |
|
27-Dec-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: simplify lockless dot lookups
|
#
abd7ded4 |
|
27-Dec-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: modification and last entry filling support in lockless lookup v2 The previous patch failed to set the ISDOTDOT flag when appropriate, which in turn failed to properly handle degenerate lookups. While here sprinkle some extra assertions. Tested by: pho (previous version)
|
#
623daa69 |
|
27-Dec-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: assert internal flags are not passed by namei
|
#
a1fc1f10 |
|
27-Dec-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
Revert "cache: modification and last entry filling support in lockless lookup" This reverts commit 6dbb07ed6872ae7988b9b705e322c94658eba6d1. Some ports unreliably fail to build with rmdir getting ENOTEMPTY.
|
#
6dbb07ed |
|
27-Dec-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: modification and last entry filling support in lockless lookup Tested by: pho (previous version)
|
#
906a73e7 |
|
23-Dec-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: fix up cache_hold_vnode comment
|
#
8ab96e26 |
|
13-Dec-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: fix up bad predicts - last level fallback normally sees CREATE; the code should be optimized to not get there for said case - fast path commonly fails with ENOENT
|
#
d3bbf8af |
|
11-Dec-2020 |
Ryan Libby <rlibby@FreeBSD.org> |
cache_fplookup: quiet gcc -Wreturn-type Reviewed by: markj, mjg Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D27555
|
#
f6dd1aef |
|
09-Nov-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: group mount per-cpu vars into one struct While here move frequently read stuff into the same cacheline. This shrinks struct mount by 64 bytes. Tested by: pho
|
#
4bfebc8d |
|
30-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add cache_vop_mkdir and rename cache_rename to cache_vop_rename
|
#
d681c51d |
|
26-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add missing NIRES_ABS handling
|
#
eb65cde4 |
|
24-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: assorted typo fixes
|
#
029cfccc |
|
24-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add the missing NC_NOMAKEENTRY and NC_KEEPPOSENTRY to lockless lookup They are de facto ignored.
|
#
acb41008 |
|
23-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: batch updates to numcache in case of mass removal
|
#
208cb7c4 |
|
23-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: refactor alloc/free This in particular centralizes manipulation of numcache.
|
#
1d444056 |
|
23-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: fold branch prediction into cache_ncp_canuse
|
#
c13d7d1f |
|
23-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: fix some typos
|
#
f878526f |
|
23-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: drop write-only vars
|
#
38628389 |
|
23-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: reduce memory waste in struct namecache The previous scheme for calculating the total size was doing sizeof on the struct and then adding the wanted space for the buffer. nc_name is at offset 58 while sizeof(struct namecache) is 64. With CACHE_PATH_CUTOFF of 39 bytes and 1 byte of padding we were allocating 104 bytes for the entry and never accounting for the 6 byte padding, wasting that space.
|
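Commit 38628389 sizes entries from the offset of the name buffer rather than from sizeof the whole struct. With the commit's numbers, 64 + 39 + 1 = 104 bytes were allocated where 58 + 39 + 1 = 98 suffice, which is the 6 bytes of padding. The sketch below uses a hypothetical `struct entry` (not the real struct namecache layout) to show why sizeof overcounts when the name is a flexible array member.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a cache entry with an inline name buffer.
 * name is a flexible array member, so sizeof(struct entry) may include
 * tail padding that starts after offsetof(struct entry, name).
 */
struct entry {
    void          *dvp;
    void          *vp;
    unsigned char  flag;
    unsigned char  nlen;
    char           name[];
};

/* Old scheme: sizeof the struct plus room for the name and its NUL. */
size_t entry_size_sizeof(size_t namelen)
{
    return sizeof(struct entry) + namelen + 1;
}

/* Tighter scheme: the name starts at offsetof(..., name), so tail padding
 * counted by sizeof is not needed for the name buffer. */
size_t entry_size_offsetof(size_t namelen)
{
    return offsetof(struct entry, name) + namelen + 1;
}
```

On a typical LP64 target this hypothetical struct also has 6 bytes of tail padding, but the exact figure depends on the ABI.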
#
c7520caa |
|
22-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: prevent avoidable evictions on mkdir of existing directories mkdir -p /foo/bar/baz will mkdir each path component and ignore EEXIST. The NOCACHE lookup will make the namecache unnecessarily evict the existing entry, and then fallback to the fs lookup routine eventually leading namei to return an error as the directory is already there. For invocations like mkdir -p /usr/obj/usr/src/sys/GENERIC/modules this triggers fallbacks to the slowpath for concurrently executing lookups. Tested by: pho Discussed with: kib
|
#
54f09403 |
|
22-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: assert the created entry does not point to itself
|
#
2f1c3505 |
|
20-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: drop the spurious slash_prefixed argument
|
#
8ecd87a3 |
|
20-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: drop spurious cred argument from VOP_VPTOCNP
|
#
6d5d469f |
|
19-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: promote negative entries based on more than one hit During tinderbox and similar workloads negative entries get at least one hit before they get evicted. In the current scheme this avoidably promotes them. Be conservative and stick to 2 hits for now.
|
#
665c8c3e |
|
19-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: refactor negative promotion/demotion handling This will simplify policy changes.
|
#
4c4aa848 |
|
17-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: shorten names of debug stats
|
#
67655714 |
|
17-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: don't automatically evict negative entries if usage is low The previous scheme only looked at negative entry count in relation to the total count, leading to tons of spurious evictions if the cache is not significantly populated. Instead, only try the above if negative entry count goes beyond namecache capacity.
|
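Commit 67655714 stops evicting negative entries just because they form a large share of a sparsely populated cache. The sketch below contrasts the two triggers under stated assumptions: the ratio check is taken as `neg > total * pct / 100`, and the new floor as a percentage of capacity (an ncnegminpct-like knob, see commit 02ef039c). The exact kernel formulas are not given in the commit, and all function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Old shape (assumed): evict when negative entries exceed ratio_pct of all entries. */
bool should_evict_old(unsigned long neg, unsigned long total,
    unsigned long ratio_pct)
{
    return neg > total * ratio_pct / 100;
}

/*
 * New shape (assumed): the ratio check is only consulted once negative
 * entries exceed a floor derived from cache capacity, so a mostly empty
 * cache does not keep evicting.
 */
bool should_evict_new(unsigned long neg, unsigned long total,
    unsigned long capacity, unsigned long ratio_pct, unsigned long min_pct)
{
    unsigned long neg_min = capacity * min_pct / 100;

    if (neg <= neg_min)
        return false;
    return should_evict_old(neg, total, ratio_pct);
}
```

With a capacity of 100000 and a 5% floor, 100 negative entries out of 200 would trigger eviction under the old shape but not under the new one.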
#
e98c3bc6 |
|
17-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: rework sysctl vfs.cache tree Split everything into neg, debug, param and stat categories. The legacy nchstats sysctl (queried e.g., by systat) remains untouched. While here rename some vars to be easier on the eye.
|
#
fa7c73d3 |
|
17-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: factor negative lookup out of cache_fplookup_next
|
#
41e6b184 |
|
17-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: avoid smr in cache_neg_evict in favor of the already held bucket lock
|
#
c38d8e1e |
|
17-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: rework parts of negative entry management - declutter sysctl vfs.cache by moving relevant entries into vfs.cache.neg - add a little more parallelism to eviction by replacing the global lock with an atomically modified counter - track more statistics The code needs further effort.
|
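Commit c38d8e1e replaces a global eviction lock with an atomically modified counter, and commit 430dc451 makes the negative lists a fixed-size array. One plausible reading, which is an assumption rather than something the commits spell out, is a shared cursor that spreads evicting threads across the lists. The sketch below uses C11 atomics; `neg_next_list()` and `NEG_NLISTS` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define NEG_NLISTS 16   /* hypothetical static number of negative lists */

/* Shared cursor (zero-initialized as a static object). */
static atomic_uint neg_cycle;

/*
 * Pick the next negative list to evict from. Concurrent callers receive
 * distinct values from atomic_fetch_add, so they tend to land on
 * different lists instead of queueing on a single lock.
 */
unsigned int neg_next_list(void)
{
    unsigned int c = atomic_fetch_add_explicit(&neg_cycle, 1,
        memory_order_relaxed);
    return c % NEG_NLISTS;
}
```

Because NEG_NLISTS is a power of two, the modulo stays consistent when the unsigned counter wraps around.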
#
b31b5e9c |
|
17-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: remove entries before trying to add new ones, not after Should allow positive entries to replace negative ones in case the cache is full.
|
#
d6eee350 |
|
16-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add a probe reporting addition of duplicate entries
|
#
a59b0ac3 |
|
15-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: flip inverted condition in previous It happened to not affect correctness in that the fallback code would simply neglect to promote the entry.
|
#
e7602e04 |
|
15-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: support negative entry promotion in slowpath smr
|
#
571bc3d1 |
|
15-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: elide vhold/vdrop around promoting negative entry
|
#
640e6162 |
|
15-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: dedup code for negative promotion
|
#
c97c8746 |
|
15-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: neglist -> nl; negstate -> ns No functional changes.
|
#
43777a20 |
|
15-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: split hotlist between existing negative lists This simplifies the code while allowing for concurrent negative eviction down the road. Cache misses increased slightly due to higher rate of evictions allowed by the change. The current algorithm remains too aggressive.
|
#
430dc451 |
|
15-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: make neglist an array given the static size
|
#
dd28b379 |
|
09-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: support lockless dirfd lookups
|
#
eb88fed4 |
|
09-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: fix vexec panic when racing against vgone Use of dead_vnodeops would result in a panic instead of returning the intended EOPNOTSUPP error. While here make sure to abort, not just try to return a partial result. The former allows the regular lookup to restart from scratch, while the latter leaves it stuck with an unusable vnode. Reported by: kevans
|
#
4e226610 |
|
05-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: fix pwd use-after-free in setting up fallback Since the code exits smr section prior to calling pwd_hold, the used pwd can be freed and a new one allocated with the same address, making the comparison erroneously true. Note it is very unlikely anyone ran into it.
|
#
aa34e791 |
|
02-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: update the commentary for path parsing
|
#
b5ab177a |
|
01-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: properly report ENOTDIR on foo/bar lookups where foo is a file Reported by: fernape
|
#
4301a5a7 |
|
30-Sep-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: push the lock into cache_purge_impl
|
#
d4cac594 |
|
29-Sep-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: use cache_has_entries where appropriate instead of opencoding it
|
#
1b2edd6e |
|
23-Sep-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: eliminate cache_zap_locked_vnode It is only ever called for negative entries and for those it is just a wrapper around cache_zap_negative_locked_vnode_kl which always succeeds. This also fixes a bug where cache_lookup_fallback should have been calling cache_zap_locked_bucket instead. Note that in order to trigger the bug NOCACHE must not be set, which currently only happens when creating a new coredump (and then the coredump-to-be has to have a negative entry).
|
#
a3d9bf49 |
|
23-Sep-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: drop the force flag from purgevfs The optional scan is wasteful, thus it is removed altogether from unmount. Callers which always want it anyway remain unaffected.
|
#
a952feff |
|
23-Sep-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: reimplement purgevfs to iterate vnodes instead of the entire hash The entire cache scan was a leftover from the old implementation. It is incredibly wasteful in presence of several mount points and does not win much even for single ones.
|
#
efeec5f0 |
|
23-Sep-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: clean up atomic ops on numneg and numcache - use subtract instead of adding -1 - drop the useless _rel fence Note this should be converted to a scalable scheme.
|
#
da62ed4f |
|
08-Sep-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: drop write-only tvp_seqc vars
|
#
84ecea90 |
|
27-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: don't update timestamps on found entry
|
#
5f08d440 |
|
27-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: assorted clean ups In particular remove spurious comments, duplicate assertions and the inconsistently done KTR support.
|
#
12441fcb |
|
27-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: ncp = NULL early to account for sdt probes in failure path CID: 1432106
|
#
1e9a0b39 |
|
25-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: relock on failure in cache_zap_locked_vnode This gets rid of bogus scheme of yielding in hopes the blocking thread will make progress.
|
#
075f58f2 |
|
25-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: stop null checking in cache_free
|
#
66fa11c8 |
|
25-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: make it mandatory to request both timestamps or neither
|
#
eef63775 |
|
25-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: convert bucketlocks to a mutex By now bucket locks are almost never taken for anything but writing and converting to mutex simplifies the code.
|
#
32f3d082 |
|
25-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: only evict negative entries on CREATE when ISLASTCN is set
|
#
935e1518 |
|
25-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: decouple smr and locked lookup in the slowpath Tested by: pho
|
#
d3476dad |
|
25-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: factor dotdot lookup out of cache_lookup Tested by: pho
|
#
f9cdb077 |
|
24-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: remove leftover assert in vn_fullpath_any_smr It is only valid when !slash_prefixed. For slash_prefixed the length is properly accounted for later. Reported by: markj (syzkaller)
|
#
e35406c8 |
|
24-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: lockless reverse lookup This enables fully scalable operation for getcwd and significantly improves realpath. For example: PATH_CUSTOM=/usr/src ./getcwd_processes -t 104 before: 1550851 after: 380135380 Tested by: pho
|
#
feabaaf9 |
|
24-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: drop the always curthread argument from reverse lookup routines Note VOP_VPTOCNP keeps getting it as temporary compatibility for zfs. Tested by: pho
|
#
f0696c5e |
|
24-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: perform reverse lookup using v_cache_dd if possible Tested by: pho
|
#
ce575cd0 |
|
24-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: populate v_cache_dd for non-VDIR entries It makes v_cache_dd into a little bit of a misnomer and it may be addressed later. Tested by: pho
|
#
1e448a15 |
|
22-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: stronger vnode asserts in cache_enter_time
|
#
760a430b |
|
22-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: add a work around for vp_crossmp bug to realpath The actual bug is not yet addressed as it will get much easier after other problems are addressed (most notably rename contract). The only affected in-tree consumer is realpath. Everyone else happens to be performing lookups within a mount point, having a side effect of ni_dvp being set to mount point's root vnode in the worst case. Reported by: pho
|
#
17838b58 |
|
20-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: don't use cache_purge_negative when renaming It avoidably scans (and evicts) unrelated entries. Instead take advantage of the passed componentname and perform a hash lookup for the exact one. Sample data from buildworld, probed on cache_purge_negative extended to count both scanned and evicted entries on each call, is below. At most it has to evict 1.
evicted
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@                          19506
               1 |@@@@@                                    5820
               2 |@@@@@@                                   7751
               4 |@@@@@                                    6506
               8 |@@@@@                                    5996
              16 |@@@                                      4029
              32 |@                                        1489
              64 |                                         193
             128 |                                         109
             256 |                                         56
             512 |                                         16
            1024 |                                         7
            2048 |                                         3
            4096 |                                         1
            8192 |                                         1
           16384 |                                         0
scanned
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@                                       2456
               1 |@                                        1496
               2 |@@                                       2728
               4 |@@@                                      4171
               8 |@@@@                                     5122
              16 |@@@@                                     5335
              32 |@@@@@                                    6279
              64 |@@@@                                     5671
             128 |@@@@                                     4558
             256 |@@                                       3123
             512 |@@                                       2790
            1024 |@@                                       2449
            2048 |@@                                       3021
            4096 |@                                        1398
            8192 |@                                        886
           16384 |                                         0
|
#
39f88150 |
|
20-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add cache_rename, a dedicated helper to use for renames While here make both tmpfs and ufs use it. No functional changes.
|
#
16be9f99 |
|
20-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: reimplement cache_lookup_nomakeentry as cache_remove_cnp This in particular removes unused arguments.
|
#
6c55d6e0 |
|
19-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: when adding an already existing entry assert on a complete match
|
#
7c75f14f |
|
19-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: tidy up the comment above cache_prehash
|
#
3c5d2ed7 |
|
16-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add NOCAPCHECK to the list of supported flags for lockless lookup It is de facto supported in that lockless lookup does not do any capability checks.
|
#
8ab4beca |
|
16-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: use namei_zone for getcwd allocations instead of malloc. Note that this should probably be wrapped with a dedicated API and other vn_getcwd callers did not get converted.
|
#
5e79447d |
|
09-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: let SAVESTART pass through The flag is only passed for non-LOOKUP ops and those fall back to the slowpath.
|
#
bb48255c |
|
09-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: resize struct namecache to a multiple of alignment For example struct namecache on amd64 is 100 bytes, but it has to occupy 104. Use the extra bytes to support longer names.
|
#
8b62cebe |
|
10-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: remove unused variables from cache_fplookup_parse
|
#
03337743 |
|
10-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: clear MNTK_FPLOOKUP if MNT_UNION is set Elides checking it during lookup.
|
#
c571b995 |
|
10-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: strlcpy -> memcpy
|
#
3ba0e517 |
|
10-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: partially support file create/delete/rename in lockless lookup Perform the lookup until the last 2 elements and fall back to the slowpath. Tested by: pho Sponsored by: The FreeBSD Foundation
|
#
21d5af2b |
|
10-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: drop the thread argument from vfs_fplookup_vexec It is guaranteed curthread. Tested by: pho Sponsored by: The FreeBSD Foundation
|
#
e910c93e |
|
05-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add more predicts for failing conditions
|
#
95888901 |
|
05-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: plug uninitialized variable use CID: 1431128
|
#
e1b1971c |
|
05-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: don't ignore size passed to nchinittbl
|
#
2b86f9d6 |
|
05-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: convert the hash from LIST to SLIST This reduces struct namecache by sizeof(void *). Negative side is that we have to find the previous element (if any) when removing an entry, but since we normally don't expect collisions it should be fine. Note this adds cache_get_hash calls which can be eliminated.
|
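The trade-off described in the commit above (one pointer saved per entry, at the price of finding the predecessor on removal) can be sketched as follows. This is a hypothetical standalone model rather than the kernel's queue(3)-based code; struct entry and chain_remove() are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hash-chain node: a single "next" link, as with SLIST. */
struct entry {
	struct entry *next;
	int key;
};

/*
 * Unlink e from the chain rooted at *headp. A singly linked node has no
 * back pointer, so walk from the head to find whichever link points at e.
 * Tracking the address of that link avoids special-casing the head.
 * Cost is O(chain length), which stays cheap when collisions are rare.
 * Returns 1 if e was found and unlinked, 0 otherwise.
 */
static int
chain_remove(struct entry **headp, struct entry *e)
{
	struct entry **linkp;

	assert(headp != NULL && e != NULL);
	for (linkp = headp; *linkp != NULL; linkp = &(*linkp)->next) {
		if (*linkp == e) {
			*linkp = e->next;
			e->next = NULL;
			return 1;
		}
	}
	return 0;
}
```

A doubly linked LIST node can unlink itself in O(1) through its back pointer; the SLIST variant gives that up in exchange for a smaller entry.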
#
cf8ac0de |
|
05-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: reduce zone alignment to 8 bytes It used to be sizeof of the given struct to accommodate 32 bit mips doing 64 bit loads, but the same can be achieved by requiring just 64 bit alignment. While here reorder struct namecache so that most commonly used fields are closer.
|
#
d61ce7ef |
|
05-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: convert ncneghash into a macro It is a read-only var with value known at compilation time.
|
#
2840f07d |
|
05-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: cleanup lockless entry point - remove spurious bzero - assert ni_lcf, it has to be set by namei by this point
|
#
8ccf01e0 |
|
05-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: stop messing with cn_lkflags See r363882.
|
#
27c4618d |
|
05-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: stop messing with cn_flags This removes flag setting/unsetting carried over from regular lookup. Flags still get set for compatibility when falling back. Note .. and . handling can get partially folded together.
|
#
db99ec56 |
|
04-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: support lockless dotdot lookup Tested by: pho
|
#
b403aa12 |
|
04-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add NCF_WIP flag This allows making half-constructed entries visible to the lockless lookup, which now can check for either "not yet fully constructed" or "no longer valid" state. This will be used for .. lookup.
|
#
6e10434c |
|
04-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add cache_purge_vgone cache_purge locklessly checks whether the vnode at hand has any namecache entries. This can race with a concurrent purge which managed to remove the last entry, but may not be done touching the vnode. Make sure we observe the relevant vnode lock as not taken before proceeding with vgone. Paired with the fact that doomed vnodes cannot receive entries this restores the invariant that there are no namecache-related writing users past cache_purge in vgone. Reported by: pho
|
#
1164f7a5 |
|
04-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: factor away failed vexec handling
|
#
0439b00e |
|
04-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: assorted tidy ups
|
#
18bd02e2 |
|
04-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: factor away lockless dot lookup and add missing stat + sdt probe
|
#
17a66c70 |
|
04-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: add vfs_op_thread_enter/exit _crit variants and employ them in the namecache. Eliminates all spurious checks for preemption.
|
#
0311b05f |
|
04-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add missing numcache decrement on insertion failure
|
#
7ad2f110 |
|
02-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: store precomputed namecache hash in the vnode This significantly speeds up path lookup. For example, access(2) on ufs on /usr/obj/usr/src/amd64.amd64/sys/GENERIC/vnode_if.c on Cascade Lake, ops/s: before: 2535298 after: 2797621, which is over +10%. The reversed order of computation here does not seem to matter for hash distribution. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D25921
|
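A minimal sketch of the idea behind the commit above: the directory's share of the hash is computed once and stored, so each lookup only hashes the component name, seeded with that stored value. The commit message does not name the hash function; FNV-1 is assumed here for illustration, and struct dir, dir_init() and name_hash() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard 32-bit FNV-1 parameters (assumed hash, for illustration). */
#define FNV1_32_INIT 0x811c9dc5u
#define FNV_32_PRIME 0x01000193u

/* FNV-1: for each byte, multiply by the prime, then xor in the byte. */
static uint32_t
fnv1_32(const void *buf, size_t len, uint32_t hval)
{
	const unsigned char *p = buf;

	while (len-- > 0) {
		hval *= FNV_32_PRIME;
		hval ^= *p++;
	}
	return hval;
}

/* Hypothetical directory object carrying a precomputed hash seed. */
struct dir {
	uint32_t nchash;
};

/* Done once when the directory object is set up, not on every lookup. */
static void
dir_init(struct dir *d, uintptr_t identity)
{
	d->nchash = fnv1_32(&identity, sizeof(identity), FNV1_32_INIT);
}

/* Per-lookup cost: hash only the name, starting from the stored seed. */
static uint32_t
name_hash(const struct dir *d, const char *name, size_t len)
{
	assert(d != NULL && (name != NULL || len == 0));
	return fnv1_32(name, len, d->nchash);
}
```

Because each FNV-1 step (multiply by an odd prime, xor with a byte) is a bijection on 32-bit values, two directories with different seeds always produce different hashes for the same name.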
#
838984de |
|
02-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: move namecache initialisation into cache_vnode_init
|
#
8a7ec170 |
|
01-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: reshuffle struct cache_fpl and nameidata_saved Shaves 16 bytes.
|
#
5a394433 |
|
01-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: mark climb_mount as __noinline
|
#
cb90ef28 |
|
30-Jul-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: drop the useless numchecks counter
|
#
40492735 |
|
30-Jul-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: add support for WANTPARENT and LOCKPARENT to lockless lookup This makes the realpath syscall operational with the new lookup. Note that the walk to obtain the full path name still takes locks. Tested by: pho Differential Revision: https://reviews.freebsd.org/D23917
|
#
8230d293 |
|
30-Jul-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: support negative entry promotion in lockless lookup Tested by: pho
|
#
4057e3ea |
|
30-Jul-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: add NOMACCHECK and AUDITVNODE2 to lockless lookup They are both nops since lookup does not progress with either mac or audit enabled. Tested by: pho
|
#
9dbd12fb |
|
25-Jul-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: add support for !LOCKLEAF to lockless lookup Tested by: pho (in a patchset) Differential Revision: https://reviews.freebsd.org/D23916
|
#
c42b77e6 |
|
25-Jul-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: lockless lookup Provides full scalability as long as all visited filesystems support the lookup and terminal vnodes are different. Inner workings are explained in the comment above cache_fplookup. Capabilities and fd-relative lookups are not supported and will result in immediate fallback to regular code. Symlinks, ".." in the path, mount points without support for lockless lookup and mismatched counters will result in an attempt to get a reference to the directory vnode and continue in regular lookup. If this fails, the entire operation is aborted and regular lookup starts from scratch. However, care is taken that data is not copied again from userspace. Sample benchmark: incremental -j 104 bzImage on tmpfs: before: 142.96s user 1025.63s system 4924% cpu 23.731 total after: 147.36s user 313.40s system 3216% cpu 14.326 total Sample microbenchmark: access calls to separate files in /tmpfs, 104 workers, ops/s: before: 2165816 after: 151216530 Reviewed by: kib Tested by: pho (in a patchset) Differential Revision: https://reviews.freebsd.org/D25578
|
#
29f3e5ea |
|
14-Jul-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: make negative shrinker round robin on all lists every time Previously it would check 4, 3, 2, 1 lists. In practice by the time it is getting called all lists have some elements and consequently this does not result in new evictions. Nonetheless, the code is clearer. Tested by: pho
|
#
a110fa2e |
|
14-Jul-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: remove numcalls The counter is not very useful and if necessary the value can be found by summing up other counters.
|
#
4516c7ee |
|
14-Jul-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: count dropped entries
|
#
654e644e |
|
14-Jul-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: remove neg_locked argument from cache_zap_locked Tested by: pho
|
#
ffb0abdd |
|
14-Jul-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: remove a useless argument from cache_negative_insert
|
#
9f8d4521 |
|
14-Jul-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: create a dedicated struct for negative entries .. and stuff it into the unused target vnode field This gets rid of concurrent nc_flag modifications racing with the shrinker and consequently fixes a bug where such a change could have been missed when cache_ncp_invalidate was being issued. Reported by: zeising Tested by: pho, zeising Fixes: r362828 ("cache: lockless forward lookup with smr")
|
#
d2385020 |
|
01-Jul-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add missing call to cache_ncp_invalid for negative hits Note the dtrace probe can fire even if the entry is gone, but I don't think that's worth fixing.
|
#
d129e0eb |
|
01-Jul-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: fix misplaced fence in cache_ncp_invalidate The intent was to mark the entry as invalid before cache_zap starts messing with it. While here add some comments.
|
#
5d1c042d |
|
30-Jun-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: lockless forward lookup with smr This eliminates the need to take bucket locks in the common case. Concurrent lookup utilizing the same vnodes is still bottlenecked on referencing and locking path components, this will be taken care of separately. Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D23913
|
#
d869a17e |
|
06-Mar-2020 |
Mark Johnston <markj@FreeBSD.org> |
Use COUNTER_U64_DEFINE_EARLY() in places where it simplifies things. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23978
|
#
8d03b99b |
|
01-Mar-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
fd: move vnodes out of filedesc into a dedicated structure The new structure is copy-on-write. With the assumption that path lookups are significantly more frequent than chdirs and chrooting this is a win. This provides stable root and jail root vnodes without the need to reference them on lookup, which in turn means less work on globally shared structures. Note this also happens to fix a bug where jail vnode was never referenced, meaning subsequent access on lookup could run into use-after-free. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23884
|
#
7029da5c |
|
26-Feb-2020 |
Pawel Biernacki <kaktus@FreeBSD.org> |
Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718
|
#
0573d0a9 |
|
20-Feb-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: add realpathat syscall realpath(3) is used a lot e.g., by clang and is a major source of getcwd and fstatat calls. This can be done more efficiently in the kernel. This works by performing a regular lookup while saving the name and found parent directory. If the terminal vnode is a directory we can resolve it using usual means. Otherwise we can use the name saved by lookup and resolve the parent. See the review for sample syscall counts. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23574
|
#
6a5abb1e |
|
02-Feb-2020 |
Kyle Evans <kevans@FreeBSD.org> |
Provide O_SEARCH O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping permissions checks on the directory itself after the initial open(). This is close to the semantics we've historically applied for O_EXEC on a directory, which is UB according to POSIX. Conveniently, O_SEARCH on a file is also explicitly undefined behavior according to POSIX, so O_EXEC would be a fine choice. The spec goes on to state that O_SEARCH and O_EXEC need not be distinct values, but they're not defined to be the same value. This was pointed out as an incompatibility with other systems that had made its way into libarchive, which had assumed that O_EXEC was an alias for O_SEARCH. This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a directory is checked in vn_open_vnode already, so for completeness we add a NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not re-check that when descending in namei. [0] https://pubs.opengroup.org/onlinepubs/9699919799/ Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23247
|
#
7739d927 |
|
01-Feb-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: replace kern___getcwd with vn_getcwd The previous routine was resulting in extra data copies most notably in linux_getcwd.
|
#
921e7210 |
|
01-Feb-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: return the total length from vn_fullpath1 This removes strlen from getcwd.
|
#
4511dd9d |
|
01-Feb-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: remove vnode -> path lookup disablement It seems to be of little to no use even when debugging. Interested parties can resurrect it and gate compilation with a macro.
|
#
45757984 |
|
01-Feb-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: consistently use size_t for buflen around VOP_VPTOCNP
|
#
64034553 |
|
20-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: revert r352613 now that vhold does not take locks
|
#
8bba93c7 |
|
20-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: make numcachehv use counter(9) on all archs Requested by: kib
|
#
059cb484 |
|
19-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: counter_u64_add_protected -> counter_u64_add Fixes booting on RISC-V where it does happen to not be equivalent. Reported by: lwhsu
|
#
13990335 |
|
18-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: convert numcachehv to counter(9) on 64-bit platforms
|
#
69283067 |
|
11-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: incomplete pass at converting more ints to u_long Most notably numvnodes and freevnodes were u_long, but parameters used to govern them remained as ints.
|
#
b249ce48 |
|
03-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: drop the mostly unused flags argument from VOP_UNLOCK Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D21427
|
#
abd80ddb |
|
08-Dec-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: introduce v_irflag and make v_type smaller The current vnode layout is not smp-friendly by having frequently read data avoidably sharing cachelines with very frequently modified fields. In particular v_iflag inspected for VI_DOOMED can be found in the same line with v_usecount. Instead make it available in the same cacheline as the v_op, v_data and v_type which all get read all the time. v_type is avoidably 4 bytes while the necessary data will easily fit in 1. Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new flag field with a new value: VIRF_DOOMED. Reviewed by: kib, jeff Differential Revision: https://reviews.freebsd.org/D22715
|
#
588e69e2 |
|
26-Nov-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: stop reusing .. entries on enter It almost never happens in practice anyway. With this eliminated ->nc_vp cannot change vnodes, removing an obstacle on the road to lockless lookup.
|
#
2ac930e3 |
|
26-Nov-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: fix numcache accounting on entry . entries are never created and .. can reuse existing entries, meaning the early count bump is both spurious and leading to overcounting in certain cases.
|
#
36afce39 |
|
26-Nov-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: hide "doingcache" behind DEBUG_CACHE
|
#
d578a425 |
|
19-Nov-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: minor stat cleanup Remove duplicated stats and move numcachehv from debug to vfs.cache.
|
#
708cf7eb |
|
27-Sep-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: decrease ncnegfactor to 5 The current mechanism is bogus in several ways:
- the limit is a percentage of total entries added, which means negative entries get evicted all the time even if there are plenty of resources
- the evicting code is barely concurrent, which makes it unable to remove entries fast enough when doing something as simple as -j 104 buildworld
- there is no support for performing mass removal if necessary
The vast majority of negative entries never get any hits. Only evicting them when the filesystem demands it results in a significant growth of the namecache with almost no improvement in the hit ratio. Sample result after about 90 minutes of poudriere -j 104:
              current   no evict   % of the original
numneg         219737    2013157   916
numneghits  266711906  263544562    98 [1]
[1] this may look funny but there is a certain dose of variation to the build
The number was chosen as something which mostly eliminates spurious evictions during lighter workloads but still keeps the total at bay. Sponsored by: The FreeBSD Foundation
|
#
e6431418 |
|
27-Sep-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: stop requeuing negative entries on the hot list Turns out it does not improve hit ratio, but it does come with a cost stemming from dirtying hit entries. Sample result: hit counts of evicted entries after 2 buildworlds
before:
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@                180865
               1 |@@@@@@@                                  49150
               2 |@@@                                      19067
               4 |@                                        9825
               8 |@                                        7340
              16 |@                                        5952
              32 |@                                        5243
              64 |@                                        4446
             128 |                                         3556
             256 |                                         3035
             512 |                                         1705
            1024 |                                         1078
            2048 |                                         365
            4096 |                                         95
            8192 |                                         34
           16384 |                                         26
           32768 |                                         23
           65536 |                                         8
          131072 |                                         6
          262144 |                                         0
after:
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@                184004
               1 |@@@@@@                                   47577
               2 |@@@                                      19446
               4 |@                                        10093
               8 |@                                        7470
              16 |@                                        5544
              32 |@                                        5475
              64 |@                                        5011
             128 |                                         3451
             256 |                                         3002
             512 |                                         1729
            1024 |                                         1086
            2048 |                                         363
            4096 |                                         86
            8192 |                                         26
           16384 |                                         25
           32768 |                                         24
           65536 |                                         7
          131072 |                                         5
          262144 |                                         0
Sponsored by: The FreeBSD Foundation
|
#
312196df |
|
27-Sep-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: make negative list shrinking a little bit concurrent Continue protecting demotion from the hotlist and selection of the target list with the ncneg_shrink_lock lock, but drop it before relocking to zap the node. While here count how many times we skipped shrinking due to the lock being already taken. Sponsored by: The FreeBSD Foundation
|
#
95c6dd89 |
|
27-Sep-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: stop recalculating upper limit each time a new entry is added Sponsored by: The FreeBSD Foundation
|
#
93a85508 |
|
23-Sep-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: tidy up handling of negative entries - track the total count of hot entries - pre-read the lock when shrinking since it is typically already taken - place the lock in its own cacheline - shorten the hold time of hot lock list when zapping Sponsored by: The FreeBSD Foundation
|
#
afe257e3 |
|
23-Sep-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: count evictions of negative entries Sponsored by: The FreeBSD Foundation
|
#
7505cffa |
|
22-Sep-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: try to avoid vhold if locks held Sponsored by: The FreeBSD Foundation
|
#
cd2112c3 |
|
22-Sep-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: jump in negative success instead of positive Sponsored by: The FreeBSD Foundation
|
#
b088a4d6 |
|
10-Sep-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: avoid excessive relocking on entry removal during lookup Due to lock ordering issues (bucket lock held, vnode locks wanted) the code starts with trylocking, which in the face of contention often fails. Prior to the change it would loop back with a possible yield. Instead note we know what locks are needed and can take them in the right order, avoiding retries. Then we can safely re-lookup and see if the entry we are looking for is still there. On a 104-way box poudriere would result in constant retries during an 11h run as seen in the vfs.cache.zap_and_exit_bucket_fail counter. before: 408866592 after : 0 However, a new stat reports: vfs.cache.zap_and_exit_bucket_relock_success: 32638 Note this is only a bandaid over current design issues. Tested by: pho Sponsored by: The FreeBSD Foundation
|
#
a6cacb0d |
|
10-Sep-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: change the formula for calculating lock array sizes It used to be mp_ncpus * 64, but this gives unnecessarily big values for small machines and at the same time constrains bigger ones. In particular this helps on a 104-way box for which the count is now doubled. While here make cache_purgevfs less likely. Currently it is not efficient in the face of contention due to lock ordering issues. These are fixable but not worth it at the moment. Sponsored by: The FreeBSD Foundation
|
#
1214618c |
|
10-Sep-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: assorted cleanups Sponsored by: The FreeBSD Foundation
|
#
e3c3248c |
|
03-Sep-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: implement usecount implying holdcnt vnodes have 2 reference counts - holdcnt to keep the vnode itself from getting freed and usecount to denote it is actively used. Previously all operations bumping usecount would also bump holdcnt, which is not necessary. We can detect if usecount is already > 0 (in which case holdcnt is also > 0) and utilize it to avoid bumping holdcnt on our own. This saves on atomic ops. Reviewed by: kib Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21471
|
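The counting scheme in the commit above can be modeled with C11 atomics. This is a hypothetical userspace sketch, not the vnode code: struct obj, obj_ref() and obj_rele() are illustrative names. The slow path that drops a redundant hold after losing a race is this sketch's own way of keeping the invariant that exactly one hold is implied while usecount is nonzero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical two-counter object; a model, not struct vnode. */
struct obj {
	atomic_int holdcnt;  /* keeps the memory from being freed */
	atomic_int usecount; /* active users; usecount > 0 implies one hold */
};

/* Take a use reference. */
static void
obj_ref(struct obj *o)
{
	int old = atomic_load(&o->usecount);

	/* Fast path: existing users already imply a hold, so skip holdcnt. */
	while (old > 0) {
		if (atomic_compare_exchange_weak(&o->usecount, &old, old + 1))
			return;
	}
	/*
	 * Slow path (usecount looked like 0): take the hold first. If another
	 * thread won the 0 -> 1 transition meanwhile, its hold covers us too,
	 * so drop the extra one.
	 */
	atomic_fetch_add(&o->holdcnt, 1);
	if (atomic_fetch_add(&o->usecount, 1) != 0)
		atomic_fetch_sub(&o->holdcnt, 1);
}

/* Drop a use reference; the 1 -> 0 transition releases the implied hold. */
static bool
obj_rele(struct obj *o)
{
	int old = atomic_fetch_sub(&o->usecount, 1);

	assert(old > 0);
	if (old == 1) {
		atomic_fetch_sub(&o->holdcnt, 1);
		return true;
	}
	return false;
}
```

When usecount is already nonzero and uncontended, taking a reference costs a single compare-and-swap on usecount; only the 0 -> 1 and 1 -> 0 transitions touch holdcnt.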
#
caaa7cee |
|
22-Jul-2019 |
Alan Somers <asomers@FreeBSD.org> |
[skip ci] Fix the comment for cache_purge(9) This is a merge of r348738 from projects/fuse2 Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation
|
#
571908e2 |
|
06-Jun-2019 |
Alan Somers <asomers@FreeBSD.org> |
[skip ci] Fix the comment for cache_purge(9) Sponsored by: The FreeBSD Foundation
|
#
daec9284 |
|
21-May-2019 |
Conrad Meyer <cem@FreeBSD.org> |
Include ktr.h in more compilation units Similar to r348026, exhaustive search for uses of CTRn() and cross reference ktr.h includes. Where it was obvious that an OS compat header of some kind included ktr.h indirectly, .c files were left alone. Some of these files clearly got ktr.h via header pollution in some scenarios, or tinderbox would not be passing prior to this revision, but go ahead and explicitly include it in files using it anyway. Like r348026, these CUs did not show up in tinderbox as missing the include. Reported by: peterj (arm64/mp_machdep.c) X-MFC-With: r347984 Sponsored by: Dell EMC Isilon
|
#
8ba6c139 |
|
12-May-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: fix a brainfart in r347505 If bumping the counter takes it over the limit, we have to decrement it back. The previous code would only bump the counter after adding the entry (thus allowing the cache to go over the limit). Sponsored by: The FreeBSD Foundation
|
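The bump-then-roll-back pattern from the two counter commits here can be sketched as follows. The names numcache, cache_limit, cache_reserve and cache_release are hypothetical stand-ins, and C11 atomics replace the kernel's atomic(9) calls. This is an illustration of the technique, not the committed code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical globals standing in for the entry counter and its limit. */
static atomic_ulong numcache;
static unsigned long cache_limit = 4;

/*
 * Reserve a slot before inserting an entry. atomic_fetch_add returns the
 * old value; if the new value exceeds the limit, undo the bump and fail.
 */
static bool
cache_reserve(void)
{
	unsigned long old = atomic_fetch_add(&numcache, 1);

	if (old + 1 > cache_limit) {
		atomic_fetch_sub(&numcache, 1);
		return (false);
	}
	return (true);
}

/* Give the slot back when an entry is removed. */
static void
cache_release(void)
{
	atomic_fetch_sub(&numcache, 1);
}
```

Because the increment happens first, a concurrent reader may briefly see the counter above the limit. The rollback keeps that overshoot transient instead of letting the cache grow past the limit.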
#
5bf50787 |
|
12-May-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: bump numcache on entry, while here fix lnumcache type Sponsored by: The FreeBSD Foundation
|
#
63ad3b65 |
|
12-May-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: push sdt probes in cache_zap_locked to code doing the work Avoids branching to check which probe to evaluate. The very same check was being done later to do the actual work. Sponsored by: The FreeBSD Foundation
|
#
691d4ab6 |
|
10-Apr-2019 |
Alan Somers <asomers@FreeBSD.org> |
fix cache_lookup's documentation cache_lookup's documentation got dislocated by r324378. Relocate and expand it. Reviewed by: jhb, kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation
|
#
22443809 |
|
29-Nov-2018 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: retire cache_enter compat shim It was added over 6 years ago for binary compat. The cache_enter macro remains as it expands to cache_enter_time. Sponsored by: The FreeBSD Foundation
|
#
7ffbcfe2 |
|
20-Jun-2018 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Sometimes it is helpful to get the path for a vnode. Implement a ddb function walking the namecache to do this. Reviewed by: jhb, mjg Inspired by: gdb macro from jhb (old version) Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D14898
|
#
e9b1074b |
|
18-May-2018 |
Matt Macy <mmacy@FreeBSD.org> |
cache_lookup: remove an unused variable and initialize a used one
|
#
e1703ef5 |
|
01-Dec-2017 |
Mark Johnston <markj@FreeBSD.org> |
Plug a name cache lock leak. Reviewed by: mjg MFC after: 1 week Sponsored by: Dell EMC Isilon
|
#
51369649 |
|
20-Nov-2017 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
|
#
ce80021f |
|
05-Nov-2017 |
Mateusz Guzik <mjg@FreeBSD.org> |
namecache: bump numcache after dropping all locks This makes no difference correctness-wise, but shortens total hold time.
|
#
119b826a |
|
05-Nov-2017 |
Mateusz Guzik <mjg@FreeBSD.org> |
namecache: wlock buckets in cache_lookup_nomakeentry Since the case of an empty chain was already covered, it is very likely that the existing entry is matching. Skipping readlocking saves on lock upgrade.
|
#
ba324b59 |
|
05-Nov-2017 |
Mateusz Guzik <mjg@FreeBSD.org> |
namecache: skip locking in cache_lookup_nomakeentry if there is no entry
|
#
a52058f0 |
|
05-Nov-2017 |
Mateusz Guzik <mjg@FreeBSD.org> |
namecache: skip locking in cache_purge_negative if there are no entries
|
#
ac850e5a |
|
01-Nov-2017 |
Mateusz Guzik <mjg@FreeBSD.org> |
namecache: fix .. check broken after r324378 wtf by: mjg Diagnosed by: avg
|
#
5644fffa |
|
01-Nov-2017 |
Mateusz Guzik <mjg@FreeBSD.org> |
namecache: ncnegfactor 16 -> 12 It is used on each new entry addition to decide whether to whack an existing negative entry in order to prevent a blow out in size, but the parameter was set years ago and never revisited. Building with poudriere results in about 400 evictions per second which unnecessarily grab entries from the hot list. With the new parameter there are next to no evictions of the sort.
|
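The commit message describes the eviction decision but not its exact expression. The sketch below assumes the test compares the negative-entry count, scaled by ncnegfactor, against the total entry count. should_evict_negative is a hypothetical helper used only to show the effect of the parameter change.

```c
#include <stdbool.h>

/*
 * Hypothetical restatement of the per-insertion check: evict one negative
 * entry once negative entries exceed 1/ncnegfactor of all entries.
 */
static bool
should_evict_negative(unsigned long numneg, unsigned long numcache,
    unsigned long ncnegfactor)
{
	return (numneg * ncnegfactor > numcache);
}
```

Under that assumption, lowering the factor from 16 to 12 raises the share of negative entries tolerated before eviction from 1/16 (6.25%) to 1/12 (about 8.3%). That would explain why evictions nearly stop.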
#
709939a7 |
|
06-Oct-2017 |
Mateusz Guzik <mjg@FreeBSD.org> |
namecache: factor out ~MAKEENTRY lookups from the common path Lookups of the sort are rare compared to regular ones and successful ones result in removing entries from the cache. In the current code buckets are rlocked and a trylock dance is performed, which can fail and cause a restart. Fixing it will require a little bit of surgery and in order to keep the code maintainable the 2 cases have to be split. MFC after: 1 week
|
#
c2dc6d5d |
|
27-Sep-2017 |
John Baldwin <jhb@FreeBSD.org> |
Use UMA_ALIGNOF() for name cache UMA zones. This fixes kernel crashes due to misaligned accesses to the 64-bit time_t embedded in struct namecache_ts in MIPS n32 kernels. MFC after: 1 week Sponsored by: DARPA / AFRL
|
#
0bbae6f3 |
|
10-Sep-2017 |
Mateusz Guzik <mjg@FreeBSD.org> |
namecache: clean up struct namecache_ts handling namecache_ts differs from mere namecache by a few fields placed mid struct. The access to the last element (the name) is thus special-cased. The standard solution is to put new fields at the very beginning and embed the original struct. The pointer shuffled around points to the embedded part. If needed, access to new fields can be gained through __containerof. MFC after: 1 week
|
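The embedding technique described above can be sketched in portable C. The structs here are hypothetical and simplified: the real entries end in a variable-length name, while this sketch uses a fixed-size array. containerof is a local equivalent of FreeBSD's __containerof, built on offsetof.

```c
#include <stddef.h>

/* Recover a pointer to the outer struct from a pointer to its member. */
#define containerof(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define NCF_TS	0x01	/* hypothetical flag: entry is embedded in nc_ts */

struct nc {
	unsigned char flags;
	unsigned char nlen;
	char name[16];		/* fixed size here for simplicity */
};

struct nc_ts {
	long long ts;		/* extra field placed BEFORE the embedded struct */
	struct nc nc;		/* pointers handed around point here */
};

/* Return the timestamp if this entry carries one, else 0. */
static long long
nc_get_ts(struct nc *ncp)
{
	if ((ncp->flags & NCF_TS) == 0)
		return (0);
	return (containerof(ncp, struct nc_ts, nc)->ts);
}
```

Code that only needs the name never has to know which variant it holds, because `ncp->name` is at the same offset from `ncp` in both.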
#
dad74ce9 |
|
08-Sep-2017 |
Mateusz Guzik <mjg@FreeBSD.org> |
namecache: fold the unlock label into the only consumer No functional changes. MFC after: 1 week
|
#
da8f32a7 |
|
08-Sep-2017 |
Mateusz Guzik <mjg@FreeBSD.org> |
namecache: factor out dot lookup into a dedicated function The intent is to move uncommon cases out of the way. MFC after: 1 week
|
#
8066a14a |
|
03-May-2017 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: stop holding the ncneg_hot lock across purging Only non-hot entries are purged so the lock is not needed in the first place. This saves one lock/unlock pair. MFC after: 1 week
|
#
a3b7d0fb |
|
06-Apr-2017 |
Brooks Davis <brooks@FreeBSD.org> |
Regen after r316594.
|
#
dfecf51d |
|
29-Jan-2017 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: use vrefact for '.' lookups and refing the rdir in fullpath
|
#
17071ff2 |
|
27-Jan-2017 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: annotate with __read_mostly and __exclusive_cache_line MFC after: 1 month
|
#
4938d867 |
|
29-Dec-2016 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: sprinkle __predict_false
|
#
b3770753 |
|
28-Dec-2016 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: move shrink lock init to nchinit This gets rid of unnecessary sysinit usage. While here also rename the lock to be consistent with the rest.
|
#
0569bc9c |
|
29-Dec-2016 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: depessimize hashing macros/inlines All hash sizes are power-of-2, but the compiler does not know that for sure and 'foo % size' forces doing a division. Store the size - 1 and use 'foo & hash' instead which allows mere shift.
|
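The commit relies on the identity hash % n == hash & (n - 1), which holds whenever n is a power of 2. The minimal sketch below uses hypothetical names (struct hashtab, bucket_mod, bucket_mask), not the actual vfs_cache.c macros. It stores the mask n - 1 once, so the lookup becomes a single bitwise AND instead of a division.

```c
#include <stdint.h>

struct hashtab {
	uint32_t nbuckets;	/* must be a power of 2 */
	uint32_t mask;		/* nbuckets - 1, stored once */
};

/* Returns 0 on success, -1 if nbuckets is not a power of 2. */
static int
hashtab_init(struct hashtab *ht, uint32_t nbuckets)
{
	/* A power of 2 has exactly one bit set. */
	if (nbuckets == 0 || (nbuckets & (nbuckets - 1)) != 0)
		return (-1);
	ht->nbuckets = nbuckets;
	ht->mask = nbuckets - 1;
	return (0);
}

/* Generic form: with a runtime size the compiler must emit a division. */
static uint32_t
bucket_mod(const struct hashtab *ht, uint32_t hash)
{
	return (hash % ht->nbuckets);
}

/* Masked form: same result for power-of-2 sizes, a single AND. */
static uint32_t
bucket_mask(const struct hashtab *ht, uint32_t hash)
{
	return (hash & ht->mask);
}
```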
#
6dd9661b |
|
29-Dec-2016 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: drop the NULL check from VP2VNODELOCK Now that negative entries are annotated with a dedicated flag, NULL vnodes are no longer passed.
|
#
25e578de |
|
12-Dec-2016 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: use vrefact in getcwd and fchdir
|
#
8b0e0c91 |
|
23-Nov-2016 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: ensure that the number of bucket locks does not exceed hash size The size can be changed by side effect of modifying kern.maxvnodes. Since numbucketlocks was not modified, setting a sufficiently low value would give more locks than actual buckets, which would then lead to corruption. Force the number of buckets to be not smaller. Note this should not matter for real world cases. Reported and tested by: pho
|
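Here is one way to see why having more locks than buckets leads to corruption. Assume that both a bucket and its lock are chosen by masking the same hash, with both sizes powers of 2. A lock table larger than the bucket table then lets two entries in the same bucket map to different locks, so neither lock fully protects that bucket. This is a hypothetical sketch: bucket_index, lock_index and clamp_nlocks are invented names.

```c
#include <stdint.h>

/* Both sizes are assumed to be powers of 2. */
static uint32_t
bucket_index(uint32_t hash, uint32_t nbuckets)
{
	return (hash & (nbuckets - 1));
}

static uint32_t
lock_index(uint32_t hash, uint32_t nlocks)
{
	return (hash & (nlocks - 1));
}

/* Never use more locks than buckets, so each bucket maps to one lock. */
static uint32_t
clamp_nlocks(uint32_t nlocks, uint32_t nbuckets)
{
	return (nlocks > nbuckets ? nbuckets : nlocks);
}
```

With 4 buckets and 16 locks, hashes 1 and 5 share bucket 1 but would take locks 1 and 5. After clamping to 4 locks, both take lock 1.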
#
6ce45c6a |
|
14-Nov-2016 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: plug a write-only variable in cache_negative_zap_one
|
#
317cac6d |
|
14-Nov-2016 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: fix a race between entry removal and demotion The negative list shrinker can demote an entry with only hotlist + neglist locks held. On the other hand entry removal possibly sets the NCF_DVDROP without the aforementioned locks held prior to detaching it from the respective neglist, which can lose the update made by the shrinker. Reported and tested by: truckman
|
#
9bd4f0a2 |
|
07-Nov-2016 |
Konstantin Belousov <kib@FreeBSD.org> |
vn_fullpath1() checked VV_ROOT and then dereferenced vp->v_mount->mnt_vnodecovered unlocked. This allowed unmount to race. Lock vnode after we noticed the VV_ROOT flag. See the comments for an explanation of why the unlocked check for the flag is considered safe. Reported and tested by: avg Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
bb697a20 |
|
20-Oct-2016 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: fix up a corner case in r307650 If no negative entry is found on the last list, the ncp pointer will be left uninitialized and a non-null value will make the function assume an entry was found. Fix the problem by initializing to NULL on entry. Reported by: glebius
|
#
a45a1a25 |
|
19-Oct-2016 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: split negative entry LRU into multiple lists This splits the ncneg_mtx lock while preserving the hit ratio at least during buildworld. Create N dedicated lists for new negative entries. Entries with at least one hit get promoted to the hot list, where they get requeued every M hits. Shrinking demotes one hot entry and performs a round-robin shrinking of regular lists. Reviewed by: kib
|
#
f71d0856 |
|
07-Oct-2016 |
Konstantin Belousov <kib@FreeBSD.org> |
Limit scope of the optimization in r306608 to dounmount() caller only. Other uses of cache_purgevfs() do rely on the cache purge for correct operations, when paths are invalidated without unmount. Reported and tested by: jkim Discussed with: mjg Sponsored by: The FreeBSD Foundation
|
#
4876636e |
|
02-Oct-2016 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: ignore purgevfs requests for filesystems with few vnodes purgevfs is purely optional and induces lock contention in workloads which frequently mount and unmount filesystems. In particular, poudriere will do this for filesystems with 4 vnodes or fewer. Full cache scan is clearly wasteful. Since there is no explicit counter for namecache entries, the number of vnodes used by the target fs is checked. The default limit is the number of bucket locks. Reviewed by: kib
|
#
1d2541fd |
|
22-Sep-2016 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: get rid of the global lock Add a table of vnode locks and use them along with bucketlocks to provide concurrent modification support. The approach taken is to preserve the current behaviour of the namecache and just lock all relevant parts before any changes are made. Lookups still require the relevant bucket to be locked. Discussed with: kib Tested by: pho
|
#
69a28758 |
|
15-Sep-2016 |
Ed Maste <emaste@FreeBSD.org> |
Renumber license clauses in sys/kern to avoid skipping #3
|
#
a2781533 |
|
10-Sep-2016 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: improve scalability by introducing bucket locks An array of bucket locks is added. All modifications still require the global cache_lock to be held for writing. However, most readers only need the relevant bucket lock and in effect can run concurrently to the writer as long as they use a different lock. See the added comment for more details. This is an intermediate step towards removal of the global lock. Reviewed by: kib Tested by: pho
|
#
591df145 |
|
04-Sep-2016 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: defer freeing entries until after the global lock is dropped This also defers vdrop for held vnodes. Glanced at by: kib
|
#
31977b42 |
|
04-Sep-2016 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: manage negative entry list with a dedicated lock Since negative entries are managed with a LRU list, a hit requires a modification. Currently the code tries to upgrade the global lock if needed and is forced to retry the lookup if it fails. Provide a dedicated lock for use when the cache is only shared-locked. Reviewed by: kib MFC after: 1 week
|
#
b9042ae1 |
|
04-Sep-2016 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: put all negative entry management code into dedicated functions Reviewed by: kib MFC after: 1 week
|
#
e3043798 |
|
29-Apr-2016 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
sys/kern: spelling fixes in comments. No functional change.
|
#
0791e0c0 |
|
24-Feb-2016 |
Konstantin Belousov <kib@FreeBSD.org> |
Provide more correct sizing of the KVA consumed by a vnode, used by the virtvnodes calculation. Include the size of fs-specific v_data as the nfs nclnode inline, the NFS nclnode is bigger than either ZFS znode or UFS inode. Include the size of namecache_ts and short cache path element, multiplied by the name cache population factor, again inline. Inline defines are used to avoid pollution of the vnode.h with the subsystem-private objects. Non-significant unsynchronized changes of the definitions are fine, we do not care about that precision, and e.g. ZFS consumes much malloced memory per vnode for reasons unaccounted in the formula. Lower the partition of kmem dedicated to vnodes, from 1/7 to 1/10. The measures reduce vnode cache pressure on kmem and bring the vnode cache memory use below some apparent thresholds that were exceeded by r291244 due to more robust vnode reuse. Reported and tested by: marius (i386, previous version) Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
#
b0632ab4 |
|
20-Jan-2016 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: minor changes 1. vhold and zap immediately instead of postponing few lines later 2. increment numneg after new entry is added No functional changes. No objections: kib
|
#
baa2bcf5 |
|
20-Jan-2016 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: perform . lookup without the namecache lock Reviewed by: kib
|
#
db709ecb |
|
20-Jan-2016 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: provide a helper for computing the hash Reviewed by: kib
|
#
76583fa2 |
|
20-Jan-2016 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: use counter(9) API to maintain statistics Previously the code would just increment statistics while only holding a shared lock, in effect losing updates. Separate tracking for nchstats is removed as values can be obtained from existing counters. Note that some fields are updated by external consumers and are left unfixed. This should not be a serious issue as this structure looks quite obsolete. No strong objections: kib
|
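The idea behind counter(9) is that each CPU updates its own slot and readers sum the slots. Updates are therefore not lost the way plain increments under a shared lock were. The user-space sketch below uses hypothetical names (struct pcpu_counter, NCPU_SKETCH) and simulates CPUs with an index. In the kernel, counter_u64_add() and counter_u64_fetch() fill these roles and make the per-CPU update safe against preemption and migration.

```c
#include <stdint.h>

#define NCPU_SKETCH	4	/* hypothetical CPU count */

struct pcpu_counter {
	/* One slot per CPU; real per-CPU storage keeps these apart in memory. */
	uint64_t slot[NCPU_SKETCH];
};

/* Add to the calling CPU's slot only, so writers never share a word. */
static void
pcpu_counter_add(struct pcpu_counter *c, unsigned cpu, uint64_t n)
{
	c->slot[cpu % NCPU_SKETCH] += n;
}

/* Readers sum all slots; the result is a snapshot, not an atomic total. */
static uint64_t
pcpu_counter_fetch(const struct pcpu_counter *c)
{
	uint64_t sum = 0;

	for (unsigned i = 0; i < NCPU_SKETCH; i++)
		sum += c->slot[i];
	return (sum);
}
```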
#
6b53d1bc |
|
06-Jan-2016 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: ansify functions and fix some style issues No functional changes.
|
#
36160958 |
|
16-Dec-2015 |
Mark Johnston <markj@FreeBSD.org> |
Fix style issues around existing SDT probes. - Use SDT_PROBE<N>() instead of SDT_PROBE(). This has no functional effect at the moment, but will be needed for some future changes. - Don't hardcode the module component of the probe identifier. This is set automatically by the SDT framework. MFC after: 1 week
|
#
2f2f522b |
|
27-Sep-2015 |
Andriy Gapon <avg@FreeBSD.org> |
save some bytes by using more concise SDT_PROBE<n> instead of SDT_PROBE SDT_PROBE requires 5 parameters whereas SDT_PROBE<n> requires n parameters where n is typically smaller than 5. Perhaps SDT_PROBE should be made a private implementation detail. MFC after: 20 days
|
#
17518b1a |
|
05-Sep-2015 |
Kirk McKusick <mckusick@FreeBSD.org> |
Track changes to kern.maxvnodes and appropriately increase or decrease the size of the name cache hash table (mapping file names to vnodes) and the vnode hash table (mapping mount point and inode number to vnode). An appropriate locking strategy is the key to changing hash table sizes while they are in active use. Reviewed by: kib Tested by: Peter Holm Differential Revision: https://reviews.freebsd.org/D2265 MFC after: 2 weeks
|
#
752fc07d |
|
16-Jul-2015 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: implement v_holdcnt/v_usecount manipulation using atomic ops Transitions 0->1 and 1->0 (which decide e.g. on putting the vnode on the free list) of either counter are still guarded with vnode interlock. Reviewed by: kib (earlier version) Tested by: pho
|
#
6289b482 |
|
21-Apr-2015 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Modify kern___getcwd() to take max pathlen limit as an additional argument. This will be used for the Linux emulation layer - for Linux, PATH_MAX is 4096 and not 1024. Differential Revision: https://reviews.freebsd.org/D2335 Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation
|
#
f3519155 |
|
17-Apr-2015 |
Kirk McKusick <mckusick@FreeBSD.org> |
More accurately collect name-cache statistics in sysctl functions sysctl_debug_hashstat_nchash() and sysctl_debug_hashstat_rawnchash(). These changes are in preparation for allowing changes in the size of the vnode hash tables driven by increases and decreases in the maximum number of vnodes in the system. Reviewed by: kib@ Phabric: D2265
|
#
9f7a06f2 |
|
04-Jan-2015 |
Dmitry Chagin <dchagin@FreeBSD.org> |
Indeed, instead of hiding the kern___getcwd() bug by bogus cast in r276564, change path type to char * (pathnames are always char *). And remove bogus casts of malloc(). kern___getcwd() internally doesn't actually use or support u_char * paths, except to copy them to a normal char * path. These changes are not visible to libc as libc/gen/getcwd.c misdeclares __getcwd() as taking a plain char * path. While here remove _SYS_SYSPROTO_H_ for __getcwd() syscall as we always have sysproto.h. Pointed out by: bde MFC after: 1 week
|
#
f0188618 |
|
21-Oct-2014 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Fix multiple incorrect SYSCTL arguments in the kernel: - Wrong integer type was specified. - Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes. - Logical OR where binary OR was expected. - Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs. - Properly assert the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function. - Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement. - Updated "EXAMPLES" section in SYSCTL manual page. MFC after: 3 days Sponsored by: Mellanox Technologies
|
#
bcdd3bce |
|
03-Aug-2014 |
Sergey Kandaurov <pluknet@FreeBSD.org> |
vn_path_to_global_path: update comment.
|
#
fe200470 |
|
27-Dec-2013 |
Konstantin Belousov <kib@FreeBSD.org> |
Fix accounting for the negative cache entries when reusing v_cache_dd. Having ncneg diverge from the actual length of the ncneg tailq causes NULL dereference. Add assertion that an entry taken from ncneg queue is indeed negative. Reported by and discussed with: avg Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
d9fae5ab |
|
26-Nov-2013 |
Andriy Gapon <avg@FreeBSD.org> |
dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE In its stead use the Solaris / illumos approach of emulating '-' (dash) in probe names with '__' (two consecutive underscores). Reviewed by: markj MFC after: 3 weeks
|
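The naming convention can be illustrated with a small string transform. It rewrites each `__` in a C identifier to `-` in the visible probe name. sdt_name_to_probe is a hypothetical helper for illustration, not part of the SDT framework.

```c
#include <stddef.h>

/*
 * Copy src into dst (at most dstlen bytes including the terminator),
 * turning each pair of underscores into a single dash.
 */
static void
sdt_name_to_probe(const char *src, char *dst, size_t dstlen)
{
	size_t j = 0;

	if (dstlen == 0)
		return;
	for (size_t i = 0; src[i] != '\0' && j + 1 < dstlen; i++) {
		if (src[i] == '_' && src[i + 1] == '_') {
			dst[j++] = '-';
			i++;	/* skip the second underscore */
		} else {
			dst[j++] = src[i];
		}
	}
	dst[j] = '\0';
}
```

For example, an identifier spelled `op__start` in C would surface as the probe name `op-start`, while a single underscore is left untouched.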
#
54366c0b |
|
25-Nov-2013 |
Attilio Rao <attilio@FreeBSD.org> |
- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invocation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip
|
#
4633a4c3 |
|
09-Jul-2013 |
Andriy Gapon <avg@FreeBSD.org> |
namecache sdt: freebsd doesn't support structured characters yet :-) MFC after: 7 days
|
#
3289d587 |
|
20-Mar-2013 |
Kirk McKusick <mckusick@FreeBSD.org> |
When renaming a directory from one parent directory to another, we need to call ufs_checkpath() to walk from our new location to the root of the filesystem to ensure that we do not encounter ourselves along the way. Until now, we accomplished this by reading the ".." entries of each directory in our path until we reached the root (or encountered an error). This change tries to avoid the I/O of reading the ".." entries by first looking them up in the name cache and only doing the I/O when the name cache lookup fails. Reviewed by: kib Tested by: Peter Holm MFC after: 4 weeks
|
#
5050aa86 |
|
22-Oct-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho
|
#
5e99212d |
|
02-Mar-2012 |
Rick Macklem <rmacklem@FreeBSD.org> |
Post r230394, the Lookup RPC counts for both NFS clients increased significantly. Upon investigation this was caused by name cache misses for lookups of "..". For name cache entries for non-".." directories, the cache entry serves double duty. It maps both the named directory plus ".." for the parent of the directory. As such, two ctime values (one for each of the directory and its parent) need to be saved in the name cache entry. This patch adds an entry for ctime of the parent directory to the name cache. It also adds an additional uma zone for large entries with this time value, in order to minimize memory wastage. As well, it fixes a couple of cases where the mtime of the parent directory was being saved instead of ctime for positive name cache entries. With this patch, Lookup RPC counts return to values similar to pre-r230394 kernels. Reported by: bde Discussed with: kib Reviewed by: jhb MFC after: 2 weeks
|
#
7dfdd83d |
|
24-Feb-2012 |
Maxim Konovalov <maxim@FreeBSD.org> |
o Reduce chances for integer overflow. o More verbose sysctl description added. MFC after: 2 weeks Sponsored by: Nginx, Inc.
|
#
bf40d24a |
|
06-Feb-2012 |
John Baldwin <jhb@FreeBSD.org> |
Rename cache_lookup_times() to cache_lookup() and retire the old API and ABI stub for cache_lookup().
|
#
d5210589 |
|
25-Jan-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Fix remaining calls to cache_enter() in both NFS clients to provide appropriate timestamps. Restore the assertions which verify that NCF_TS is set when timestamp is asked for. Reviewed by: jhb (previous version) MFC after: 2 weeks
|
#
7a7e609a |
|
23-Jan-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Apparently, both nfs clients do not use cache_enter_time() consistently, creating some namecache entries without NCF_TS flag. This causes panic due to failed assertion. As a temporary relief, remove the assert. Return epoch timestamp for the entries without timestamp if asked. While there, consolidate the code which returns timestamps, into a helper cache_out_ts(). Discussed with: jhb MFC after: 2 weeks
|
#
c2b396f2 |
|
21-Jan-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Remove the nc_time and nc_ticks elements from struct namecache, and provide struct namecache_ts which is the old struct namecache. Only allocate struct namecache_ts if non-null struct timespec *tsp was passed to cache_enter_time, otherwise use struct namecache. Change struct namecache allocation and deallocation macros into static functions, since logic becomes somewhat twisty. Provide accessor for the nc_name member of struct namecache to hide difference between struct namecache and namecache_ts. The aim of the change is to not waste 20 bytes per small namecache entry. Reviewed by: jhb MFC after: 2 weeks X-MFC-note: after r230394
|
#
5aefb4cb |
|
20-Jan-2012 |
John Baldwin <jhb@FreeBSD.org> |
Close a race in NFS lookup processing that could result in stale name cache entries on one client when a directory was renamed on another client. The root cause for the stale entry being trusted is that each per-vnode nfsnode structure has a single 'n_ctime' timestamp used to validate positive name cache entries. However, if there are multiple entries for a single vnode, they all share a single timestamp. To fix this, extend the name cache to allow filesystems to optionally store a timestamp value in each name cache entry. The NFS clients now fetch the timestamp associated with each name cache entry and use that to validate cache hits instead of the timestamps previously stored in the nfsnode. Another part of the fix is that the NFS clients now use timestamps from the post-op attributes of RPCs when adding name cache entries rather than pulling the timestamps out of the file's attribute cache. The latter is subject to races with other lookups updating the attribute cache concurrently. Some more details: - Add a variant of nfsm_postop_attr() to the old NFS client that can return a vattr structure with a copy of the post-op attributes. - Handle lookups of "." as a special case in the NFS clients since the name cache does not store name cache entries for ".", so we cannot get a useful timestamp. It didn't really make much sense to recheck the attributes on the directory to validate the namecache hit for "." anyway. - ABI compat shims for the name cache routines are present in this commit so that it is safe to MFC. MFC after: 2 weeks
|
#
9cbe30e1 |
|
15-Jan-2012 |
Martin Matuska <mm@FreeBSD.org> |
Fix omissions in r230129: kern_jail.c: initialize fullpath_disabled to zero vfs_cache.c: add missing dot in comment Reported by: kib MFC after: 1 month
|
#
f6e633a9 |
|
14-Jan-2012 |
Martin Matuska <mm@FreeBSD.org> |
Introduce vn_path_to_global_path() This function updates path string to vnode's full global path and checks the size of the new path string against the pathlen argument. In vfs_domount(), sys_unmount() and kern_jail_set() this new function is used to update the supplied path argument to the respective global path. Unbreaks jailed zfs(8) with enforce_statfs set to 1. Reviewed by: kib MFC after: 1 month
|
#
7a7ce668 |
|
12-Dec-2011 |
Andriy Gapon <avg@FreeBSD.org> |
put sys/systm.h at its proper place or add it if missing Reported by: lstewart, tinderbox Pointyhat to: avg, attilio MFC after: 1 week MFC with: r228430
|
#
f82360ac |
|
19-Nov-2011 |
Konstantin Belousov <kib@FreeBSD.org> |
Existing VOP_VPTOCNP() interface has a fatal flaw that is critical for nullfs. The problem is that resulting vnode is only required to be held on return from the successful call to vop, instead of being referenced. Nullfs VOP_INACTIVE() method reclaims the vnode, which in combination with the VOP_VPTOCNP() interface means that the directory vnode returned from VOP_VPTOCNP() is reclaimed in advance, causing vn_fullpath() to error with EBADF or like. Change the interface for VOP_VPTOCNP(), now the dvp must be referenced. Convert all in-tree implementations of VOP_VPTOCNP(), which is trivial, because vhold(9) and vref(9) are similar in the locking prerequisites. Out-of-tree fs implementation of VOP_VPTOCNP(), if any, should have no trouble with the fix. Tested by: pho Reviewed by: mckusick MFC after: 3 weeks (subject of re approval)
|
#
6472ac3d |
|
07-Nov-2011 |
Ed Schouten <ed@FreeBSD.org> |
Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
|
#
8451d0dd |
|
16-Sep-2011 |
Kip Macy <kmacy@FreeBSD.org> |
In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
|
#
8d065a39 |
|
14-Nov-2010 |
Rebecca Cran <brucec@FreeBSD.org> |
Fix some more style(9) issues.
|
#
b389be97 |
|
14-Nov-2010 |
Rebecca Cran <brucec@FreeBSD.org> |
Fix style(9) issues from r215281 and r215282. MFC after: 1 week
|
#
2baa5cdd |
|
13-Nov-2010 |
Rebecca Cran <brucec@FreeBSD.org> |
Add some descriptions to sys/kern sysctls. PR: kern/148710 Tested by: Chip Camden <sterling at camdensoftware.com> MFC after: 1 week
|
#
3a40a00d |
|
30-Oct-2010 |
Konstantin Belousov <kib@FreeBSD.org> |
Remove sysctl debug.ncnegfactor, it is renamed to vfs.ncnegfactor. MFC: do not
|
#
a7d5f7eb |
|
19-Oct-2010 |
Jamie Gritton <jamie@FreeBSD.org> |
A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
|
#
420cfbb4 |
|
16-Oct-2010 |
Konstantin Belousov <kib@FreeBSD.org> |
Provide vfs.ncsizefactor instead of hard-coding namecache ratio. Move debug.ncnegfactor to vfs.ncnegfactor [1]. Provide some descriptions for the namecache related sysctls [1]. Based on the submission by: Rogier R. Mulhuijzen <drwilco drwilco net> [1] MFC after: 2 weeks X-MFC-note: remove debug.ncnegfactor in HEAD after MFC
|
#
79856499 |
|
22-Aug-2010 |
Rui Paulo <rpaulo@FreeBSD.org> |
Add an extra comment to the SDT probes definition. This allows us to use '-' in probe names, matching the probe names in Solaris.[1] Add userland SDT probes definitions to sys/sdt.h. Sponsored by: The FreeBSD Foundation Discussed with: rwaston [1]
|
#
60ae52f7 |
|
21-Jun-2010 |
Ed Schouten <ed@FreeBSD.org> |
Use ISO C99 integer types in sys/kern where possible. There are only about 100 occurrences of the BSD-specific u_int*_t datatypes in sys/kern. The ISO C99 integer types are used here more often.
|
#
22df1496 |
|
03-May-2010 |
Konstantin Belousov <kib@FreeBSD.org> |
MFC r206894: The cache_enter(9) function shall not be called for doomed dvp. Assert this. Verify that dvp is not reclaimed before calling cache_enter().
|
#
5673e3cb |
|
20-Apr-2010 |
Konstantin Belousov <kib@FreeBSD.org> |
The cache_enter(9) function shall not be called for doomed dvp. Assert this. In the reported panic, vdestroy() fired the assertion "vp has namecache for ..", because pseudofs may end up doing cache_enter() with reclaimed dvp, after dotdot lookup temporary unlocked dvp. Similar problem exists in ufs_lookup() for "." lookup, when vnode lock needs to be upgraded. Verify that dvp is not reclaimed before calling cache_enter(). Reported and tested by: pho Reviewed by: kan MFC after: 2 weeks
|
#
c9975476 |
|
17-Apr-2010 |
Konstantin Belousov <kib@FreeBSD.org> |
MFC r206671: Fix typo.
|
#
3e22320c |
|
15-Apr-2010 |
Konstantin Belousov <kib@FreeBSD.org> |
Fix typo. MFC after: 3 days
|
#
106c3802 |
|
14-Aug-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
MFC r196203: Correctly handle unlock for !MAKEENTRY case. Approved by: re (rwatson)
|
#
8f408451 |
|
14-Aug-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
Correctly handle unlock for !MAKEENTRY case, after successful attempt of lock upgrade cache shall be unlocked from write. Reported by: Lucius Windschuh <lwindschuh googlemail com> Reviewed by: kan Approved by: re (rwatson)
|
#
c808c963 |
|
21-Jun-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
Add explicit struct ucred * argument for VOP_VPTOCNP, to be used by vn_open_cred in default implementation. Valid struct ucred is needed for audit and MAC, and curthread credentials may be wrong. This further requires modifying the interface of vn_fullpath(9), but it is out of scope of this change. Reviewed by: rwatson
|
#
8a444404 |
|
05-Jun-2009 |
Joe Marcus Clarke <marcus@FreeBSD.org> |
Unlock the cache lock before returning when we run out of buffer space trying to fill in the full path name. Reported by: David Naylor <naylor.b.david@gmail.com> Approved by: kib
|
#
1358a795 |
|
31-May-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
Unbreak the build. Add missed probes. Reviewed by: rwatson Pointy hat to: me
|
#
0449e6e1 |
|
31-May-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
Eliminate code duplication in vn_fullpath1() around the cache lookups and calls to vn_vptocnp() by moving more of the common code to vn_vptocnp(). Rename vn_vptocnp() to vn_vptocnp_locked() to signify that cache is locked around the call. Do not track buffer position by both the pointer and offset, use only buflen to record the start of the free space. Export vn_vptocnp() for external consumers as a wrapper around vn_vptocnp_locked() that locks the cache and handles hold counts. Tested by: pho
|
#
348496ad |
|
17-Apr-2009 |
Alexander Kabaev <kan@FreeBSD.org> |
More fallout from negative dotdot caching. Negative entries should be removed from and reinserted to proper ncneg list. Reported by: pho Submitted by: kib
|
#
9cf67722 |
|
14-Apr-2009 |
Alexander Kabaev <kan@FreeBSD.org> |
Redo previous change using simpler patch that happens to be also more correct. Submitted by: tor
|
#
eed8a9ed |
|
14-Apr-2009 |
Alexander Kabaev <kan@FreeBSD.org> |
Fix yet another negative dotdot entry fallout. Reported by: pho
|
#
9d75482f |
|
11-Apr-2009 |
Alexander Kabaev <kan@FreeBSD.org> |
Fix v_cache_dd handling for negative entries. v_cache_dd pointer was not populated in parent directory if negative entry was being created, yet entry itself was added to the nc_neg list. It was possible for parent vnode to get discarded later, leaving negative entry pointing to now unused memory block. Reported by: dho Reviewed by: kib
|
#
fd409594 |
|
11-Apr-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
When zapping v_cache_dd for !MAKEENTRY case in cache_lookup(), we shall lock cache as writer. Reviewed by: kan
|
#
3f54086e |
|
10-Apr-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
Cache_lookup() for DOTDOT drops dvp vnode lock, allowing dvp to be reclaimed. Check the condition and return ENOENT then. In nfs_lookup(), respect ENOENT return from cache_lookup() when it is caused by dvp reclaim. Reported and tested by: pho
|
#
5d5c1748 |
|
07-Apr-2009 |
Robert Watson <rwatson@FreeBSD.org> |
Nul-terminate strings in the VFS name cache, which negligibly changes the size and cost of name cache entries, but makes adding debugging and tracing easier. Add SDT DTrace probes for various namecache events: vfs:namecache:enter:done - new entry in the name cache, passed parent directory vnode pointer, name added to the cache, and child vnode pointer. vfs:namecache:enter_negative:done - new negative entry in the name cache, passed parent vnode pointer, name added to the cache. vfs:namecache:fullpath:enter - call to vn_fullpath1() is made, passed the vnode to resolve to a name. vfs:namecache:fullpath:hit - vn_fullpath1() successfully resolved a search for the parent of an object using the namecache, passed the discovered parent directory vnode pointer, name, and child vnode pointer. vfs:namecache:fullpath:miss - vn_fullpath1() failed to resolve a search for the parent of an object using the namecache, passed the child vnode pointer. vfs:namecache:fullpath:return - vn_fullpath1() has completed, passed the error number, and if that is zero, the vnode to resolve, and the returned path. vfs:namecache:lookup:hit - positive name cache entry hit, passed the parent directory vnode pointer, name, and child vnode pointer. vfs:namecache:lookup:hit_negative - negative name cache entry hit, passed the parent directory vnode pointer and name. vfs:namecache:lookup:miss - name cache miss, passed the parent directory pointer and the full remaining component name (not terminated after the cache miss component). vfs:namecache:purge:done - name cache purge for a vnode, passed the vnode pointer to purge. vfs:namecache:purge_negative:done - name cache purge of negative entries for children of a vnode, passed the vnode pointer to purge. vfs:namecache:purgevfs - name cache purge for a mountpoint, passed the mount pointer. Separate probes will also be invoked for each cache entry zapped. 
vfs:namecache:zap:done - name cache entry zapped, passed the parent directory vnode pointer, name, and child vnode pointer. vfs:namecache:zap_negative:done - negative name cache entry zapped, passed the parent directory vnode pointer and name. For any probes involving an extant name cache entry (enter, hit, zap), we use the nul-terminated string for the name component. For misses, the remainder of the path, including later components, is provided as an argument instead since there is no handy nul-terminated version of the string around. This is arguably a bug. MFC after: 1 month Sponsored by: Google, Inc. Reviewed by: jhb, kan, kib (earlier version)
|
#
bb6418cb |
|
04-Apr-2009 |
Alexander Kabaev <kan@FreeBSD.org> |
Revert change 190655 temporarily. It breaks many setups where nullfs is used and needs to be revisited.
|
#
0e875eca |
|
02-Apr-2009 |
Peter Wemm <peter@FreeBSD.org> |
vn_vptocnp() unlocks the name cache and forgets to re-lock it before returning in one error case, and mistakenly unlocks it for the umount -f case.
|
#
607fc40b |
|
29-Mar-2009 |
Alexander Kabaev <kan@FreeBSD.org> |
Replace v_dd vnode pointer with v_cache_dd pointer to struct namecache in directory vnodes. Allow namecache dotdot entry to be created pointing from child vnode to parent vnode if no existing links in opposite direction exist. Use direct link from parent to child for dotdot lookups otherwise. This restores more efficient dotdot caching in NFS filesystems which was lost when vnodes stopped being type stable. Reviewed by: kib
|
#
049ce093 |
|
24-Mar-2009 |
John Baldwin <jhb@FreeBSD.org> |
When a file lookup fails due to encountering a doomed vnode from a forced unmount, consistently return ENOENT rather than EBADF. Reviewed by: kib MFC after: 1 month
|
#
15fb32c0 |
|
20-Mar-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
Do not underflow the buffer and then report the problem. Check for the condition before the buffer write. Also, since buflen is unsigned, previous check was ignored. Reviewed by: marcus Tested by: pho
|
#
83817ce3 |
|
20-Mar-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
Remove unneeded braces to reduce used vertical screen space. The location was missed in r190140.
|
#
91940072 |
|
20-Mar-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
Do not forget to adjust buflen for the first resolution of the path from namecache. While there, compare pointers for equality. Reviewed by: marcus Tested by: pho
|
#
065fc451 |
|
20-Mar-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
The nc_nlen member of the struct namecache contains the length of the cached name, not the length + 1. PR: 132620, 132542 Reported by: bf2006a yahoo com Tested by: bf2006a, pho Reviewed by: marcus
|
#
c4a8c2ee |
|
20-Mar-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
When ktracing namei operations, log the result of __getcwd(). MFC after: 1 week
|
#
bf5c835e |
|
20-Mar-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
Remove unneeded braces to reduce used vertical screen space.
|
#
4ab2a9a0 |
|
09-Mar-2009 |
John Baldwin <jhb@FreeBSD.org> |
Move the debug.hashstat sysctl tree under DIAGNOSTIC. I measured the debug.hashstat.rawnchash sysctl in particular as taking 7 milliseconds on a 3GHz Intel Xeon (4x2) running 7.1. It accounted for almost a quarter of the total runtime of 'sysctl -a'. It also performs lots of copyout's while holding the namecache lock (this does not attempt to fix that). MFC after: 2 weeks
|
#
03964c8e |
|
19-Feb-2009 |
John Baldwin <jhb@FreeBSD.org> |
Enable caching of negative pathname lookups in the NFS client. To avoid stale entries, we save a copy of the directory's modification time when the first negative cache entry was added in the directory's NFS node. When a negative cache entry is hit during a pathname lookup, the parent directory's modification time is checked. If it has changed, all of the negative cache entries for that parent are purged and the lookup falls back to using the RPC. This required adding a new cache_purge_negative() method to the name cache to purge only negative cache entries for a given directory. Submitted by: mohans, Rick Macklem, Ricardo Labiaga @ NetApp Reviewed by: mohans
|
#
9078981a |
|
28-Jan-2009 |
John Baldwin <jhb@FreeBSD.org> |
Convert the global mutex protecting the directory lookup name cache from a mutex to a reader/writer lock. Lookup operations first grab a read lock and perform the lookup. If the operation results in a need to modify the cache, then it tries to do an upgrade. If that fails, it drops the read lock, obtains a write lock, and redoes the lookup.
|
#
8a7ef10b |
|
23-Jan-2009 |
John Baldwin <jhb@FreeBSD.org> |
- Mark all standalone INT/LONG/QUAD sysctl's MPSAFE. This is done inside the SYSCTL() macros and thus does not need to be done for all of the nodes scattered across the source tree. - Mark the name-cache related sysctl's (including debug.hashstat.*) MPSAFE. - Mark vm.loadavg MPSAFE. - Remove GIANT_REQUIRED from vmtotal() (everything in this routine already has sufficient locking) and mark vm.vmtotal MPSAFE. - Mark the vm.stats.(sys|vm).* sysctls MPSAFE.
|
#
58c1607e |
|
19-Jan-2009 |
Stephen McKay <mckay@FreeBSD.org> |
Add a limit on namecache entries. In normal operation, the number of cache entries is roughly equal to the number of active vnodes. However, when most of the recently accessed vnodes have many hard links, the number of cache entries can be 32000 times as large, exhausting kernel memory and provoking a panic in kmem_malloc(). MFC after: 2 weeks
|
#
83e73926 |
|
29-Dec-2008 |
Konstantin Belousov <kib@FreeBSD.org> |
In r185557, the check for an existing negative entry for the given name did not compare nc_dvp with the supplied parent directory vnode pointer. Add the check and note that the branches for vp != NULL and vp == NULL are now the same, and thus can be merged. Reported and reviewed by: kan Tested by: pho MFC after: 2 weeks
|
#
4769218f |
|
23-Dec-2008 |
Joe Marcus Clarke <marcus@FreeBSD.org> |
Do not KASSERT when vp->v_dd is NULL. Only directories which have had ".." looked up would have v_dd set to a non-NULL value. This fixes a panic seen when running installworld on a diskless system with a separate /usr file system. Submitted by: cracauer Approved by: kib
|
#
86dcb537 |
|
23-Dec-2008 |
Konstantin Belousov <kib@FreeBSD.org> |
Keep the hold on the vnode during VOP_VPTOCNP() call, allowing the vop implementation to drop vnode lock, if needed. Reported and tested by: pho
|
#
b9022449 |
|
11-Dec-2008 |
Joe Marcus Clarke <marcus@FreeBSD.org> |
Add a new VOP, VOP_VPTOCNP, which translates a vnode to its component name on a best-effort basis. Teach vn_fullpath to use this new VOP if a regular VFS cache lookup fails. This VOP is designed to supplement the VFS cache to provide a better chance that a vnode-to-name lookup will succeed. Currently, an implementation for devfs is being committed. The default implementation is to return ENOENT. A big thanks to kib for the mentorship on this, and to pho for running it through his stress test suite. Reviewed by: arch Approved by: kib
|
#
d6568724 |
|
02-Dec-2008 |
Konstantin Belousov <kib@FreeBSD.org> |
Shared lookup makes it possible to create several negative cache entries for one name. Then, creating inode with that name would remove one entry, leaving others dormant. Reclaiming the vnode would uncover negative entries, causing false return of ENOENT from the calls like stat, that do not create inode. Prevent creation of the duplicated negative entries. Reported and debugged with: pho Reviewed by: jhb X-MFC: after shared lookup changes
|
#
ef61995e |
|
25-Nov-2008 |
Joe Marcus Clarke <marcus@FreeBSD.org> |
Move vn_fullpath1() outside of FILEDESC locking. This is being done in advance of teaching vn_fullpath1() how to query file systems for vnode-to-name mappings when cache lookups fail. Thanks to kib for guidance and patience on this process. Reviewed by: kib Approved by: kib
|
#
d7f03759 |
|
19-Oct-2008 |
Ulf Lilleengen <lulf@FreeBSD.org> |
- Import the HEAD csup code which is the basis for the cvsmode work.
|
#
d2722d70 |
|
24-Sep-2008 |
John Baldwin <jhb@FreeBSD.org> |
Part 1 of making shared lookups more resilient with respect to forced unmounts. When we upgrade a vnode lock from shared to exclusive during a name cache lookup, fail the lookup with EBADF if the vnode is invalidated while we are waiting for the exclusive lock. Also, for correctness (though I'm not sure it can occur in practice), downgrade an exclusively locked vnode if it should be share locked. Tested by: pho
|
#
cbb598af |
|
18-Sep-2008 |
John Baldwin <jhb@FreeBSD.org> |
Sort includes.
|
#
969bf150 |
|
23-Aug-2008 |
John Baldwin <jhb@FreeBSD.org> |
Fix a race condition with concurrent LOOKUP namecache operations for a vnode not in the namecache when shared lookups are enabled (vfs.lookup_shared=1, it is currently off by default) and the filesystem supports shared lookups (e.g. NFS client). Specifically, if multiple concurrent LOOKUPs both miss in the name cache in parallel, each lookup may end up adding an entry to the namecache, resulting in duplicate entries in the namecache for the same pathname. A subsequent removal of the mapping of that pathname to that vnode (via remove or rename) would only evict one of the entries from the name cache. As a result, subsequent lookups for that pathname would still return the old vnode. This race was observed with shared lookups over NFS where a file was updated by writing a new file out to a temporary file name and then renaming that temporary file to the "real" file to effect atomic updates of a file. Other processes on the same client that were periodically reading the file would occasionally receive an ESTALE error from open(2) because the VOP_GETATTR() in nfs_open() would receive that error when given the stale vnode. The fix here is to check for duplicates in cache_enter() and just return if an entry for this same directory and leaf file name for this vnode is already in the cache. The check for duplicates is done by walking the per-vnode list of name cache entries. It is expected that this list should be very small in the common case (usually 0 or 1 entries during a cache_enter() since most files only have 1 "leaf" name). Reviewed by: ups, scottl MFC after: 2 months
|
#
cbd3ba3e |
|
16-Aug-2008 |
Alfred Perlstein <alfred@FreeBSD.org> |
Prevent crashes due to unlocked access to hash buckets in two sysctls. Use CACHE_LOCK to prevent crashes. Sysctls fixed: debug.hashstat.nchash and debug.hashstat.rawnchash. Obtained from: Juniper Networks MFC After: 1 week
|
#
dfc714fb |
|
31-Jul-2008 |
Christian S.J. Peron <csjp@FreeBSD.org> |
Currently, BSM audit pathname token generation for chrooted or jailed processes does not produce absolute pathname tokens. It is required that audited pathnames are generated relative to the global root mount point. This modification changes our implementation of audit_canon_path(9) and introduces a new function: vn_fullpath_global(9) which performs a vnode -> pathname translation relative to the global mount point based on the contents of the name cache. Much like vn_fullpath, vn_fullpath_global is a wrapper function which calls vn_fullpath1. Further, the string parsing routines have been converted to use the sbuf(9) framework. This change also removes the conditional acquisition of Giant, since the vn_fullpath1 method will not dip into file system dependent code. The vnode locking was modified to use vhold()/vdrop() instead of vref() and vrele(). This will modify the hold count instead of modifying the user count. This makes more sense since it's the kernel that requires the reference to the vnode. This also makes sure that the vnode does not get recycled while we hold the reference to it. [1] Discussed with: rwatson Reviewed by: kib [1] MFC after: 2 weeks
|
#
b03d7207 |
|
09-Apr-2008 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
- Use LK_TYPE_MASK where needed. Actually after sys/sys/lockmgr.h:1.69 it is no longer needed, but for now we still want to be consistent with other similar checks in the tree. - Call ASSERT_VOP_ELOCKED() only when vget() returns 0. Reviewed by: jeff
|
#
0a3af16a |
|
31-Mar-2008 |
Konstantin Belousov <kib@FreeBSD.org> |
Add the utility function vn_commname() to retrieve the command name from the vfs namecache, when available. Reviewed by: rwatson, rdivacky Tested by: pho
|
#
237fdd78 |
|
16-Mar-2008 |
Robert Watson <rwatson@FreeBSD.org> |
In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink
|
#
81c794f9 |
|
25-Feb-2008 |
Attilio Rao <attilio@FreeBSD.org> |
Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is always curthread. As KPI gets broken by this patch, manpages and __FreeBSD_version will be updated by further commits. Tested by: Andrea Barberio <insomniac at slackware dot it>
|
#
22db15c0 |
|
13-Jan-2008 |
Attilio Rao <attilio@FreeBSD.org> |
VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjunction with 'thread' argument passing which is always curthread. Remove the unneeded extra argument and pass curthread explicitly to lower layer functions, when necessary. The KPI is broken by this change, which should affect several ports, so the version bump and manpage update will be committed separately. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
|
#
cb05b60a |
|
09-Jan-2008 |
Attilio Rao <attilio@FreeBSD.org> |
vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This change makes the code cleaner and in particular removes an annoying dependence, helping the next lockmgr() cleanup. The KPI has, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, it is worth noting that the next commits will address a similar cleanup of VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
|
#
e6d64a0f |
|
22-Nov-2007 |
Kris Kennaway <kris@FreeBSD.org> |
Remove remaining Giant acquisition around vn_fullpath1. This was missed in r1.106 and has not been required for some years now. Reviewed by: jeff MFC After: 1 week
|
#
b4d7e298 |
|
21-Sep-2007 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Fix some locking cases where we ask for an exclusively locked vnode, but get a shared-locked vnode instead when vfs.lookup_shared is set to 1. Discussed with: kib, kris Tested by: kris Approved by: re (kensmith)
|
#
dfe97ff4 |
|
18-Jun-2007 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
We only flush entries related to the given file system. Currently there are no 'invalid' cache entries - file system is responsible for keeping it that way. The comment should have been updated in rev.1.25.
|
#
6e042171 |
|
25-May-2007 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
To avoid a deadlock when handling .. directory during a lookup, we unlock parent vnode and relock it after locking child vnode. The problem was that we always relock it exclusively, even when it was share-locked. Discussed with: jeff
|
#
b4c85af9 |
|
25-May-2007 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
We no longer need to put namecache entries onto temporary mplist. It was useful in revision 1.86, but should have been removed in 1.89.
|
#
950afe99 |
|
25-May-2007 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
The cache_leaf_test() function seems to be unused, so remove it.
|
#
f013ccb7 |
|
22-May-2007 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
- Remove redundant initialization. - Compare pointer with NULL.
|
#
5e3f7694 |
|
04-Apr-2007 |
Robert Watson <rwatson@FreeBSD.org> |
Replace custom file descriptor array sleep lock constructed using a mutex and flags with an sxlock. This leads to a significant and measurable performance improvement as a result of access to shared locking for frequent lookup operations, reduced general overhead, and reduced overhead in the event of contention. All of these are important for threaded applications where simultaneous access to a shared file descriptor array occurs frequently. Kris has reported 2x-4x transaction rate improvements on 8-core MySQL benchmarks; smaller improvements can be expected for many workloads as a result of reduced overhead. - Generally eliminate the distinction between "fast" and regular acquisition of the filedesc lock; the plan is that they will now all be fast. Change all locking instances to either shared or exclusive locks. - Correct a bug (pointed out by kib) in fdfree() where previously msleep() was called without the mutex held; sx_sleep() is now always called with the sxlock held exclusively. - Universally hold the struct file lock over changes to struct file, rather than the filedesc lock or no lock. Always update the f_ops field last. A further memory barrier is required here in the future (discussed with jhb). - Improve locking and reference management in linux_at(), which fails to properly acquire vnode references before using vnode pointers. Annotate improper use of vn_fullpath(), which will be replaced at a future date. In fcntl(), we conservatively acquire an exclusive lock, even though in some cases a shared lock may be sufficient, which should be revisited. The dropping of the filedesc lock in fdgrowtable() is no longer required as the sxlock can be held over the sleep operation; we should consider removing that (pointed out by attilio). Tested by: kris Discussed with: jhb, kris, attilio, jeff
|
#
873fbcd7 |
|
05-Mar-2007 |
Robert Watson <rwatson@FreeBSD.org> |
Further system call comment cleanup: - Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde) - Remove extra blank lines in some cases. - Add extra blank lines in some cases. - Remove no-op comments consisting solely of the function name, the word "syscall", or the system call name. - Add punctuation. - Re-wrap some comments.
|
#
4f0840f3 |
|
15-Jun-2006 |
Christian S.J. Peron <csjp@FreeBSD.org> |
Axe Giant from vn_fullpath(9). The vnode -> pathname lookup should be filesystem agnostic. We are not touching any file system specific functions in this code path. Since we have a cache lock, there is really no need to keep Giant around here. This eliminates Giant acquisitions for any syscall which is auditing pathnames. Discussed with: jeff
|
#
e98b5a89 |
|
16-Apr-2006 |
John-Mark Gurney <jmg@FreeBSD.org> |
remove duplicate sizeof vnode entry (debug.sizeof.vnode already existed)... move ncsize into debug.sizeof and rename to namecache...
|
#
2f0bca55 |
|
06-Feb-2006 |
Jeff Roberson <jeff@FreeBSD.org> |
- Don't check v_mount for NULL to determine if a vnode has been recycled. Use the more appropriate VI_DOOMED flag instead. Sponsored by: Isilon Systems, Inc. MFC After: 1 week
|
#
32b6dcd8 |
|
16-Jun-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Fix a leaked reference to a vnode via v_dd. We rely on cache_purge() and cache_zap() to clear the v_dd pointers when a directory vnode is forcibly discarded. For this to work, all vnodes with v_dd pointers to a directory must also have name cache entries linked via v_cache_dst to that dvp otherwise we could not find them at cache_purge() time. The following code snippet could break this guarantee by unlinking a directory before fetching its dotdot. The dotdot lookup would initialize the v_dd field of the unlinked directory which could never be cleared. To fix this we don't initialize v_dd for orphaned vnodes. printf("rmdir: %d\n", rmdir("../foo")); /* foo is cwd */ printf("chdir: %d\n", chdir("..")); printf("%s\n", getwd(NULL)); Sponsored by: Isilon Systems, Inc. Discovered by: kkenn Approved by: re (blanket vfs)
|
#
6bd8103d |
|
12-Jun-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Clear v_dd in cache_zap() instead of cache_purge() as cache_purge() may not be called in all cases where we free the cnp. Sponsored by: Isilon Systems, Inc.
|
#
eff2d126 |
|
12-Jun-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Add KTR_VFS messages for various name cache related events. Sponsored by: Isilon Systems, Inc.
|
#
1b2da2d0 |
|
11-Jun-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Assert that we're not adding a doomed vnode to the name cache. Sponsored by: Isilon Systems, Inc.
|
#
4585e3ac |
|
13-Apr-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Change all filesystems and vfs_cache to relock the dvp once the child is locked in the ISDOTDOT case. See vfs_lookup.c r1.79 for details. Sponsored by: Isilon Systems, Inc.
|
#
7ce7f713 |
|
29-Mar-2005 |
David Schultz <das@FreeBSD.org> |
Eliminate v_id and v_ddid. The name cache now holds references to vnodes whose names it caches, so we no longer need a `generation number' to tell us if a referenced vnode is invalid. Replace the use of the parent's v_id in the hash function with the address of the parent vnode. Tested by: Peter Holm Glanced at by: jeff, phk
|
#
dd33f0d9 |
|
29-Mar-2005 |
David Schultz <das@FreeBSD.org> |
Merge kern___cwd() and vn_fullpath(), which were virtually identical, except for places where people forget to update one of them. We now collect only one set of stats for both of these routines. Other changes in this commit include: - Start acquiring Giant again in vn_fullpath(), since it is required when crossing a mount point. - Expand the scope of the cache lock to avoid dropping it and picking it up again for every pathname component. This also makes it trivial to avoid races in stats collection. - Assert that nc_dvp == v_dd for directories instead of returning an error to userland when this is not true. AFAIK, it should always be true when v_dd is non-null. - For vn_fullpath(), handle the first (non-directory) vnode separately. Glanced at by: jeff, phk
|
#
5280e61f |
|
28-Mar-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Move the logic that locks and refs the new vnode from vfs_cache_lookup() to cache_lookup(). This allows us to acquire the vnode interlock before dropping the cache lock. This protects the vnode's identity until we have locked it. Sponsored by: Isilon Systems, Inc.
|
#
571211c4 |
|
29-Mar-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Get rid of the old LOOKUP_SHARED code. namei() now supplies the proper lock flags via cn_lkflag. Sponsored by: Isilon Systems, Inc.
|
#
b75719af |
|
29-Mar-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Invalidate the children's v_dd pointers when we cache_purge() a directory. Otherwise the stale pointer may be accessed after a vnode is freed. Sponsored by: Isilon Systems, Inc.
|
#
f7b404d8 |
|
28-Mar-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Remove an unused variable. Sponsored by: Isilon Systems, Inc.
|
#
ee5a0a2d |
|
28-Mar-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- We no longer have to bother with PDIRUNLOCK, lookup() handles it for us. Sponsored by: Isilon Systems, Inc.
|
#
fdd6a3ff |
|
23-Mar-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- All of the bugs which led to the complication of the LOOKUP_SHARED config option have now been fixed. All filesystems are properly locked and checked via DEBUG_VFS_LOCKS. Remove the workaround code. Sponsored by: Isilon Systems, Inc.
|
#
2adc2b87 |
|
09-Feb-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Make a SYSCTL_NODE and a mutex static
|
#
799cc2dc |
|
24-Jan-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Simplify the cache locking. The lock order relationship with the vnode lock is much simpler than I originally thought it would be. Now, the cache lock is always acquired before the vnode lock. - Provide some gotos in __getcwd() to simplify the unlocking a bit. - Move Giant acquisition down into __getcwd(). Sponsored By: Isilon Systems, Inc.
|
#
9454b2d8 |
|
06-Jan-2005 |
Warner Losh <imp@FreeBSD.org> |
/* -> /*- for copyright notices, minor format tweaks as necessary
|
#
7f8a436f |
|
05-Apr-2004 |
Warner Losh <imp@FreeBSD.org> |
Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core
|
#
98d7d155 |
|
05-Oct-2003 |
Jeff Roberson <jeff@FreeBSD.org> |
- Apply a big giant lock around the namecache. This has been sitting in my tree since BSDcon.
|
#
c2935410 |
|
13-Jun-2003 |
Dag-Erling Smørgrav <des@FreeBSD.org> |
Make the VFS cache use zones instead of malloc(9). This results in a small but noticeable increase in performance for name lookup operations. The code uses two zones, one for short names (less than 32 characters) and one for long names (up to NAME_MAX). Since most file names are fairly short, this saves a considerable amount of space that would otherwise be wasted if we always allocated NAME_MAX bytes. The cutoff value of 32 characters was picked arbitrarily and may benefit from some tweaking; it could also be made into a tunable. Submitted by: hmp
|
#
ffe92432 |
|
11-Jun-2003 |
Dag-Erling Smørgrav <des@FreeBSD.org> |
Whitespace cleanup.
|
#
677b542e |
|
10-Jun-2003 |
David E. O'Brien <obrien@FreeBSD.org> |
Use __FBSDID().
|
#
cc34e37e |
|
20-Mar-2003 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Backout the getcwd changes, a more comprehensive effort will be needed.
|
#
9eaf5abc |
|
16-Mar-2003 |
Poul-Henning Kamp <phk@FreeBSD.org> |
(This commit certainly increases the need for a wash&clean of vfs_cache.c, but I decided that it was important for this patch to not bit-rot, and since it is mainly moving code around, the total amount of entropy is epsilon /phk) This is a patch to move the common parts of linux_getcwd() back into kern/vfs_cache.c so that the standard FreeBSD libc getcwd() can use its extended functionality. The linux syscall linux_getcwd() in compat/linux/linux_getcwd.c has been rewritten to use it too. It should be possible to simplify libc's getcwd() after this. No doubt this code needs some cleaning up, since I've left in the sysctl variables I used for debugging. PR: 48169 Submitted by: James Whitwell <abacau@yahoo.com.au>
|
#
a163d034 |
|
18-Feb-2003 |
Warner Losh <imp@FreeBSD.org> |
Back out M_* changes, per decision of the TRB. Approved by: trb
|
#
1f5a94d5 |
|
15-Feb-2003 |
Andrew R. Reiter <arr@FreeBSD.org> |
- Update a couple of comments to make sense with what today's code is doing (stale comments make arr something something ;)).
|
#
da8f0c84 |
|
15-Feb-2003 |
Andrew R. Reiter <arr@FreeBSD.org> |
- Remove old comment for PURGE() as it no longer exists and implied it was a comment to cache_zap(). - Add a comment to quickly state what cache_zap() does. Reviewed by: phk, mux
|
#
44956c98 |
|
21-Jan-2003 |
Alfred Perlstein <alfred@FreeBSD.org> |
Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
|
#
48b52b7a |
|
02-Sep-2002 |
Ian Dowse <iedowse@FreeBSD.org> |
Split up __getcwd so that kernel callers of the internal version can specify whether the buffer is in user or system space.
|
#
18c6acee |
|
05-Aug-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
- Move a VOP assert to the right place. Spotted by: i386 tinderbox
|
#
e6e370a7 |
|
04-Aug-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
- Replace v_flag with v_iflag and v_vflag - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking. Idea stolen from: BSD/OS
|
#
210a5a71 |
|
28-Jun-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
nuke caddr_t.
|
#
0e2d6cc8 |
|
14-May-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
Disable the shared locking namei() code for now. It breaks several stacking filesystems. This is on hold until the rest of VFS Locking is reviewed and deemed safe. It can be enabled with 'options LOOKUP_SHARED'.
|
#
a59f8b9e |
|
08-Apr-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
Turn #ifdef LOOKUP_SHARED into #ifndef LOOKUP_EXCLUSIVE to enable this behavior by default. Also, change the options line to reflect this. If there are no problems reported this will become the only behavior and the knob will be removed in a month or so. Demanded by: obrien
|
#
cf4ce70b |
|
07-Apr-2002 |
David Malone <dwmalone@FreeBSD.org> |
Remove a comment which relates to the old name cache code, which was replaced in 1997. Approved by: phk
|
#
4d77a549 |
|
19-Mar-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Remove __P.
|
#
8de00f4a |
|
11-Mar-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
This patch adds the "LOCKSHARED" option to namei which causes it to only acquire shared locks on leafs. The stat() and open() calls have been changed to make use of this new functionality. Using shared locks in these cases is sufficient and can significantly reduce their latency if IO is pending to these vnodes. Also, this reduces the number of exclusive locks that are floating around in the system, which helps reduce the number of deadlocks that occur. A new kernel option "LOOKUP_SHARED" has been added. It defaults to off so this patch can be turned on for testing, and should eventually go away once it is proven to be stable. I have personally been running this patch for over a year now, so it is believed to be fully stable. Reviewed by: jake, obrien Approved by: jake
|
#
eb8e6d52 |
|
05-Mar-2002 |
Eivind Eklund <eivind@FreeBSD.org> |
Document all functions, global and static variables, and sysctls. Includes some minor whitespace changes, and re-ordering to be able to document properly (e.g, grouping of variables and the SYSCTL macro calls for them, where the documentation has been added.) Reviewed by: phk (but all errors are mine)
|
#
362912eb |
|
17-Feb-2002 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Remove cache_purgeleafdirs(), it has been #if 0 for quite some time.
|
#
9e209b12 |
|
13-Jan-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Include sys/_lock.h and sys/_mutex.h to reduce namespace pollution. Requested by: jhb
|
#
426da3bc |
|
13-Jan-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
SMP Lock struct file, filedesc and the global file list. Seigo Tanimura (tanimura) posted the initial delta. I've polished it quite a bit reducing the need for locking and adapting it for KSE. Locks: 1 mutex in each filedesc protects all the fields. protects "struct file" initialization, while a struct file is being changed from &badfileops -> &pipeops or something the filedesc should be locked. 1 mutex in each struct file protects the refcount fields. doesn't protect anything else. the flags used for garbage collection have been moved to f_gcflag which was the FILLER short, this doesn't need locking because the garbage collection is a single threaded container. could likely be made to use a pool mutex. 1 sx lock for the global filelist. struct file * fhold(struct file *fp); /* increments reference count on a file */ struct file * fhold_locked(struct file *fp); /* like fhold but expects file to be locked */ struct file * ffind_hold(struct thread *, int fd); /* finds the struct file in thread, adds one reference and returns it unlocked */ struct file * ffind_lock(struct thread *, int fd); /* ffind_hold, but returns file locked */ I still have to smp-safe the fget cruft, I'll get to that asap.
|
#
45fb069a |
|
21-Oct-2001 |
Dag-Erling Smørgrav <des@FreeBSD.org> |
Convert textvp_fullpath() into the more generic vn_fullpath() which takes a struct thread * and a struct vnode * instead of a struct proc *. Temporarily add a textvp_fullpath macro for compatibility.
|
#
b5810bab |
|
30-Sep-2001 |
Matthew Dillon <dillon@FreeBSD.org> |
After extensive testing it has been determined that adding complexity to avoid removing higher level directory vnodes from the namecache has no perceivable effect and will be removed. This is especially true when vmiodirenable is turned on, which it is by default now. ( vmiodirenable makes a huge difference in directory caching ). The vfs.vmiodirenable and vfs.nameileafonly sysctls have been left in to allow further testing, but I expect to rip out vfs.nameileafonly soon too. I have also determined through testing that the real problem with numvnodes getting too large is due to the VM Page cache preventing the vnode from being reclaimed. The directory stuff made only a tiny dent relative to Poul's original code, enough so that some tests succeeded. But tests with several million small files show that the bigger problem is the VM Page cache. This will have to be addressed by a future commit. MFC after: 3 days
|
#
b40ce416 |
|
12-Sep-2001 |
Julian Elischer <julian@FreeBSD.org> |
KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to the previous -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
|
#
7476f7e8 |
|
04-Sep-2001 |
Ian Dowse <iedowse@FreeBSD.org> |
Fix a memory leak in __getcwd() that can occur after a filesystem has been forcibly unmounted. If the filesystem root vnode is reached and it has no associated mountpoint (vp->v_mount == NULL), __getcwd would return without freeing 'buf'. Add the missing free() call. PR: kern/30306 Submitted by: Mike Potanin <potanin@mccme.ru> MFC after: 1 week
|
#
fb919e4d |
|
01-May-2001 |
Mark Murray <markm@FreeBSD.org> |
Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h from kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)
|
#
60fb0ce3 |
|
28-Apr-2001 |
Greg Lehey <grog@FreeBSD.org> |
Revert consequences of changes to mount.h, part 2. Requested by: bde
|
#
d98dc34f |
|
23-Apr-2001 |
Greg Lehey <grog@FreeBSD.org> |
Correct #includes to work with fixed sys/mount.h.
|
#
759cb263 |
|
18-Apr-2001 |
Seigo Tanimura <tanimura@FreeBSD.org> |
Reclaim directory vnodes held in namecache if few free vnodes are available. Only directory vnodes holding no child directory vnodes held in v_cache_src are recycled, so that directory vnodes near the root of the filesystem hierarchy remain in namecache and directory vnodes are not reclaimed in cascade. The period of vnode reclaiming attempt and the number of vnodes attempted to reclaim can be tuned via sysctl(2). Suggested by: tegge Approved by: phk
|
#
9d10eb0c |
|
10-Apr-2001 |
Peter Wemm <peter@FreeBSD.org> |
Create debug.hashstat.[raw]nchash and debug.hashstat.[raw]nfsnode to enable easy access to the hash chain stats. The raw prefixed versions dump an integer array to userland with the chain lengths. This cheats and calls it an array of 'struct int' rather than 'int' or sysctl -a faithfully dumps out the 128K array on an average machine. The non-raw versions return 4 integers: count, number of chains used, maximum chain length, and percentage utilization (fixed point, multiplied by 100). The raw forms are more useful for analyzing the hash distribution, while the other form can be read easily by humans and stats loggers.
|
#
439fea92 |
|
19-Mar-2001 |
Peter Wemm <peter@FreeBSD.org> |
Use the same API as the example code. Allow the initial hash value to be passed in, as the examples do. Incrementally hash in the dvp->v_id (using the official api) rather than add it. This seems to help power-of-two predictable filename trees where the filenames repeat on a power-of-two cycle and the directory trees have power-of-two components in it. The simple add then mask was causing things like 12000+ entry collision chains while most other entries have between 0 and 3 entries each. This way seems to improve things.
|
#
6eb39ac8 |
|
17-Mar-2001 |
Peter Wemm <peter@FreeBSD.org> |
Use a generic implementation of the Fowler/Noll/Vo hash (FNV hash). Make the name cache hash as well as the nfsnode hash use it. As a special tweak, create an unsigned version of register_t. This allows us to use a special tweak for the 64 bit versions that significantly speeds up the i386 version (ie: int64 XOR int64 is slower than int64 XOR int32). The code layout is a little strange for the string function, but I was able to get between 5 to 10% improvement over the original version I started with. The layout affects gcc code generation choices and this way was fastest on x86 and alpha. Note that 'CPUTYPE=p3' etc makes a fair difference to this. It is around 45% faster with -march=pentiumpro on a p6 cpu.
|
#
959b7375 |
|
08-Dec-2000 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Staticize some malloc M_ instances.
|
#
138e514c |
|
06-Dec-2000 |
Peter Wemm <peter@FreeBSD.org> |
Untangle vfsinit() a bit. Use separate sysinit functions rather than having a super-function calling bits all over the place.
|
#
aa542997 |
|
19-Nov-2000 |
Robert Watson <rwatson@FreeBSD.org> |
o Export nchstats ("VFS cache effectiveness statistics") using SYSCTL_OPAQUE. This removes a reason that systat requires setgid kmem. More to come.
|
#
3ff1a2f4 |
|
17-Sep-2000 |
Boris Popov <bp@FreeBSD.org> |
Add new flag PDIRUNLOCK to the component.cn_flags which should be set by filesystem lookup() routine if it unlocks parent directory. This flag should be carefully tracked by filesystems if they want to work properly with nullfs and other stacked filesystems. VFS takes advantage of this flag to perform semantically correct usage of vrele() instead of vput() if parent directory already unlocked. If filesystem fails to track this flag then previous codepath in VFS left unchanged. Convert UFS code to set PDIRUNLOCK flag if necessary. Other filesystems will be changed after some period of testing. Reviewed in general by: mckusick, dillon, adrian Obtained from: NetBSD
|
#
67b23794 |
|
09-Sep-2000 |
Boris Popov <bp@FreeBSD.org> |
Change variable naming to be consistent with the rest of VFS code. Reduce number of indirections by using already fetched values.
|
#
9701cd40 |
|
05-Jul-2000 |
John Baldwin <jhb@FreeBSD.org> |
Support for unsigned integer and long sysctl variables. Update the SYSCTL_LONG macro to be consistent with other integer sysctl variables and require an initial value instead of assuming 0. Update several sysctl variables to use the unsigned types. PR: 15251 Submitted by: Kelly Yancey <kbyanc@posi.net>
|
#
e3975643 |
|
25-May-2000 |
Jake Burkholder <jake@FreeBSD.org> |
Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen. Requested by: msmith and others
|
#
740a1973 |
|
23-May-2000 |
Jake Burkholder <jake@FreeBSD.org> |
Change the way that the queue(3) structures are declared; don't assume that the type argument to *_HEAD and *_ENTRY is a struct. Suggested by: phk Reviewed by: phk Approved by: mdodd
|
#
b7db1901 |
|
26-Apr-2000 |
Brian Feldman <green@FreeBSD.org> |
Move procfs_fullpath() to vfs_cache.c, with a rename to textvp_fullpath(). There's no excuse to have code in synthetic filestores that allows direct references to the textvp anymore. Feature requested by: msmith Feature agreed to by: warner Move requested by: phk Move agreed to by: bde
|
#
8a2852b1 |
|
21-Apr-2000 |
Brian Feldman <green@FreeBSD.org> |
Move the declaration of "struct namecache" to vnode.h, as it can be useful elsewhere. Note, of course, that in an ideal world nothing should need to see our VFS implementation :-/
|
#
194a0b6c |
|
13-Feb-2000 |
Peter Wemm <peter@FreeBSD.org> |
Avoid a panic in __getcwd(2) when combined with umount -f.
|
#
3b6fb885 |
|
02-Oct-1999 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Before we start to mess with the VFS name-cache clean things up a little bit: Isolate the namecache in its own file, and give it a dedicated malloc type.
|
#
c3aac50f |
|
27-Aug-1999 |
Peter Wemm <peter@FreeBSD.org> |
$Id$ -> $FreeBSD$
|
#
22f054e2 |
|
24-Apr-1999 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Fix a braino in the v_id wraparound code. Give more (current) details in comment. PR: 11307 Spotted by: Ville-Pertti Keinonen <will@iki.fi>
|
#
355a2610 |
|
09-Sep-1998 |
Bruce Evans <bde@FreeBSD.org> |
Don't use CTL_VFS at the wrong level. This caused loops in the sysctl tree if CTL_VFS happened to get assigned as a type number to a vfs that has some vfs sysctls.
|
#
1aa9ea7c |
|
19-Dec-1997 |
Bruce Evans <bde@FreeBSD.org> |
Removed some bogus casts.
|
#
4a11ca4e |
|
07-Nov-1997 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Remove a bunch of variables which were unused both in GENERIC and LINT. Found by: -Wunused
|
#
cec0f20c |
|
16-Oct-1997 |
Poul-Henning Kamp <phk@FreeBSD.org> |
VFS mega cleanup commit (x/N) 1. Add new file "sys/kern/vfs_default.c" where default actions for VOPs go. Implement proper defaults for ABORTOP, BWRITE, LEASE, POLL, REVOKE and STRATEGY. Various stuff spread over the entire tree belongs here. 2. Change VOP_BLKATOFF to a normal function in cd9660. 3. Kill VOP_BLKATOFF, VOP_TRUNCATE, VOP_VFREE, VOP_VALLOC. These are private interface functions between UFS and the underlying storage manager layer (FFS/LFS/MFS/EXT2FS). The functions now live in struct ufsmount instead. 4. Remove a kludge of VOP_ functions in all filesystems, that did nothing but obscure the simplicity and break the expandability. If a filesystem doesn't implement VOP_FOO, it shouldn't have an entry for it in its vnops table. The system will try to DTRT if it is not implemented. There is still some cruft left, but the bulk of it is done. 5. Fix another VCALL in vfs_cache.c (thanks Bruce!)
|
#
138ec1f7 |
|
15-Oct-1997 |
Poul-Henning Kamp <phk@FreeBSD.org> |
vnops megacommit 1. Use the default function to access all the specfs operations. 2. Use the default function to access all the fifofs operations. 3. Use the default function to access all the ufs operations. 4. Fix VCALL usage in vfs_cache.c 5. Use VOCALL to access specfs functions in devfs_vnops.c 6. Staticize most of the spec and fifofs vnops functions. 7. Make UFS panic if it lacks bits of the underlying storage handling.
|
#
46c320ba |
|
24-Sep-1997 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Add one more counter so we can truly find out how good our name cache is. If we don't find something and don't want to have found something, it's actually a success.
|
#
00544193 |
|
24-Sep-1997 |
Poul-Henning Kamp <phk@FreeBSD.org> |
A couple of handles to tweak, more statistics.
|
#
4d1122bd |
|
04-Sep-1997 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Revert to the previous hashing, double the hashtable size instead.
|
#
119b6f4c |
|
03-Sep-1997 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Use 2^N hash sizes rather than primesize, this replaces a division with an and. (Submitted by davidg) Preemptively record ".." values. Reviewed by: phk
|
#
e4ba6a82 |
|
02-Sep-1997 |
Bruce Evans <bde@FreeBSD.org> |
Removed unused #includes.
|
#
a051452a |
|
31-Aug-1997 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Change the 0xdeadb hack to a flag called VDOOMED. Introduce VFREE which indicates that vnode is on freelist. Rename vholdrele() to vdrop(). Create vfree() and vbusy() to add/delete vnode from freelist. Add vfree()/vbusy() to keep (v_holdcnt != 0 || v_usecount != 0) vnodes off the freelist. Generalize vhold()/v_holdcnt to mean "do not recycle". Fix reassignbuf()'s lack of use of vhold(). Use vhold() instead of checking v_cache_src list. Remove vtouch(), the vnodes are always vget'ed soon enough after for it to have any measurable effect. Add sysctl debug.freevnodes to keep track of things. Move cache_purge() up in getnewvnodes to avoid race. Decrement v_usecount after VOP_INACTIVE(), put a vhold() on it during VOP_INACTIVE() Unmacroize vhold()/vdrop() Print out VDOOMED and VFREE flags (XXX: should use %b) Reviewed by: dyson
|
#
0fa2443f |
|
26-Aug-1997 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Uncut&paste cache_lookup(). This unifies several copies of the same 50 lines of code, which were in theory identical. The filesystems have a new method: vop_cachedlookup, which is the meat of the lookup, and use vfs_cache_lookup() for their vop_lookup method. vfs_cache_lookup() will check the namecache and pass on to the vop_cachedlookup method in case of a miss. It's still the task of the individual filesystems to populate the namecache with cache_enter(). Filesystems that do not use the namecache will just provide the vop_lookup method as usual.
|
#
2401f27c |
|
04-Aug-1997 |
Poul-Henning Kamp <phk@FreeBSD.org> |
remove unused MAXVNODEUSE macro.
|
#
b15a966e |
|
04-May-1997 |
Poul-Henning Kamp <phk@FreeBSD.org> |
1. Add a {pointer, v_id} pair to the vnode to store the reference to the ".." vnode. This is cheaper storagewise than keeping it in the namecache, and it makes more sense since it's a 1:1 mapping. 2. Also handle the case of "." more intelligently rather than stuff the namecache with pointless entries. 3. Add two lists to the vnode and hang namecache entries which go from or to this vnode. When cleaning a vnode, delete all namecache entries it invalidates. 4. Never reuse namecache entries, malloc new ones when we need them, free old ones when they die. No longer a hard limit on how many we can have. 5. Remove the upper limit on namelength of namecache entries. 6. Make a global list for negative namecache entries, limit their number to a sysctl'able (debug.ncnegfactor) fraction of the total namecache. Currently the default fraction is 1/16th. (Suggestions for better default wanted!) 7. Assign v_id correctly in the face of 32bit rollover. 8. Remove the LRU list for namecache entries, not needed. Remove the #ifdef NCH_STATISTICS stuff, it's not needed either. 9. Use the vnode freelist as a true LRU list, also for namecache accesses. 10. Reuse vnodes more aggressively but also more selectively, if we can't reuse, malloc a new one. There is no longer a hard limit on their number, they grow to the point where we don't reuse potentially usable vnodes. A vnode will not get recycled if it still has pages in core or if it is the source of namecache entries (Yes, this does indeed work :-) "." and ".." are not namecache entries any longer...) 11. Do not overload the v_id field in namecache entries with whiteout information, use a char sized flags field instead, so we can get rid of the vpid and v_id fields from the namecache struct. Since we're linked to the vnodes and purged when they're cleaned, we don't have to check the v_id any more. 12. NFS knew about the limitation on name length in the namecache, it shouldn't and doesn't now. 
Bugs: The namecache statistics no longer includes the hits for ".." and "." hits. Performance impact: Generally in the +/- 0.5% for "normal" workstations, but I hope this will allow the system to be selftuning over a bigger range of "special" applications. The case where RAM is available but unused for cache because we don't have any vnodes should be gone. Future work: Straighten out the namecache statistics. "desiredvnodes" is still used to (bogusly ?) size hash tables in the filesystems. I have still to find a way to safely free unused vnodes back so their number can shrink when not needed. There are a few uses of the v_id field left in the filesystems, scheduled for demolition at a later time. Maybe a one slot cache for unused namecache entries should be implemented to decrease the malloc/free frequency.
|
#
d8d6519c |
|
08-Mar-1997 |
Bruce Evans <bde@FreeBSD.org> |
Fixed the hash formula. Lite2 doesn't have phashinit(), so Lite2's hash formula uses `& nchash'. This is very broken when nchash is a prime number instead of 1 less than a power of 2, but the Lite2 formula was merged in. Merged some cosmetic changes from Lite2, rev.1.21 and Lite1. The merge was difficult because the Lite2 code is essentially ours (phk's) except where Lite2 improved or broke it. Summary of the Lite2 changes: - in the copyright, phk's rights have been transferred to the Regents. This change should be reviewed. - nchENOENT went away; the "no" vnode is now simply 0. - comments were improved. - style was "improved". - goto instead of Fanatism (sic) was considered bad :-). - there are some small changes to support whiteouts. - new cache entries are added in more cases. More work is required near here to change the hash table size if kern.desiredvnodes is changed using sysctl. - rescanning of the hash bucket in cache_purgevfs() was removed. This change should be reviewed.
|
#
6875d254 |
|
22-Feb-1997 |
Peter Wemm <peter@FreeBSD.org> |
Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
|
#
996c772f |
|
09-Feb-1997 |
John Dyson <dyson@FreeBSD.org> |
This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
|
#
1130b656 |
|
14-Jan-1997 |
Jordan K. Hubbard <jkh@FreeBSD.org> |
Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
|
#
edbfedac |
|
11-Mar-1996 |
Peter Wemm <peter@FreeBSD.org> |
Import 4.4BSD-Lite2 onto the vendor branch, note that in the kernel, all files are off the vendor branch, so this should not change anything. A "U" marker generally means that the file was not changed in between the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally means that there was a change. [note new unused (in this form) syscalls.conf, to be 'cvs rm'ed]
|
#
bd7e5f99 |
|
18-Jan-1996 |
John Dyson <dyson@FreeBSD.org> |
Eliminated many redundant vm_map_lookup operations for vm_mmap. Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish overhead for merged cache. Efficiency improvement for vfs_cluster. It used to do a lot of redundant calls to cluster_rbuild. Correct the ordering for vrele of .text and release of credentials. Use the selective tlb update for 486/586/P6. Numerous fixes to the size of objects allocated for files. Additionally, fixes in the various pagers. Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs. Fixes in the swap pager for exhausted resources. The pageout code will not as readily thrash. Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE), thereby improving efficiency of several routines. Eliminate even more unnecessary vm_page_protect operations. Significantly speed up process forks. Make vm_object_page_clean more efficient, thereby eliminating the pause that happens every 30 seconds. Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the case of filesystems mounted async. Fix a panic with busy pages when write clustering is done for non-VMIO buffers.
|
#
79c0c4b7 |
|
22-Dec-1995 |
Poul-Henning Kamp <phk@FreeBSD.org> |
kern_conf.c: remove a now unused variable. vfs_cache.c: Fix a very rare problem in the vnode-cache. Submitted by: Terry Lambert <terry@lambert.org>
|
#
f708ef1b |
|
14-Dec-1995 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Another mega commit to staticize things.
|
#
a98ca469 |
|
29-Oct-1995 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Second batch of cleanup changes. This time mostly making a lot of things static and some unused variables here and there.
|
#
28f8db14 |
|
29-Jul-1995 |
Bruce Evans <bde@FreeBSD.org> |
Eliminate sloppy common-style declarations. There should be none left for the LINT configuration.
|
#
9b2e5354 |
|
30-May-1995 |
Rodney W. Grimes <rgrimes@FreeBSD.org> |
Remove trailing whitespace.
|
#
cf8ad510 |
|
14-Apr-1995 |
David Greenman <dg@FreeBSD.org> |
Fixed serious off by one bug I introduced that will likely cause the machine to panic whenever the name cache fills up. Submitted by: John Dyson
|
#
22e53424 |
|
03-Apr-1995 |
David Greenman <dg@FreeBSD.org> |
kern_subr.c: Added a new type to uiomove - "UIO_NOCOPY" which causes it to update pointers and counts, but doesn't do any data copying. This is needed for upcoming changes to the way that the vnode pager does its page outs. Added a new hash init function call "phashinit" that allocates and initializes a prime number sized hash table. vfs_cache.c: Changed hashing algorithm to use the remainder of dividing by a prime number to improve the distribution characteristics. Uses new phashinit function in kern_subr.c.
|
#
d7e3d98a |
|
19-Mar-1995 |
David Greenman <dg@FreeBSD.org> |
Patch from Kirk McKusick to fix a bug introduced in Poul's vfs_cache rewrite.
|
#
47f19694 |
|
11-Mar-1995 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Update a couple of counters.
|
#
914e6eb7 |
|
10-Mar-1995 |
David Greenman <dg@FreeBSD.org> |
Whoops, back out that last change - I misread what Poul had done there.
|
#
dbd90d41 |
|
10-Mar-1995 |
David Greenman <dg@FreeBSD.org> |
Don't thrash the name cache while trying to fill up the object cache. (Make a new cache entry until desiredvnodes is reached).
|
#
b2e10d6d |
|
09-Mar-1995 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Clean up and improve the namecache. 1. We always keep one 16th of the vnodes on the freelist, so that the namecache doesn't get trashed. It used to be that it wasn't a problem, but the only vnodes getting released these days are directories and things which get forced out of the VM/cache. The latter is not numerous enough to keep the pool of vnodes needed for the namecache sufficiently big. 2. Purge invalid entries in the namecache as soon as we notice them. This avoids a stale entry pushing out a valid entry on the LRU list. 3. Speed up the lookup in the namecache by avoiding a special case branch. 4. Make the cache purge routines do the thing they're supposed to, and in a decently efficient manner. 5. Make the size of the namecache follow the number of vnodes, so that we can always point to all the vnodes we have in core. 6. Readability has gone way up. 7. Added a "options NCH_STATISTICS" feature that will gather more detailed statistics on the performance of the namecache. Reviewed by: davidg
|
#
a0e8a1e2 |
|
07-Mar-1995 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Another little optimization to the nameicache. If an entry is stale, ditch it.
|
#
2425396b |
|
07-Mar-1995 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Improve the quality of the hash used in the namei-cache.
|
#
30f467d8 |
|
05-Mar-1995 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Update vfs_cache.c to use the <sys/queue.h> macros. This makes it easier to read, but doesn't change the speed. Reviewed by: phk Obtained from: via NetBSD
|
#
797f2d22 |
|
02-Oct-1994 |
Poul-Henning Kamp <phk@FreeBSD.org> |
All of this is cosmetic. prototypes, #includes, printfs and so on. Makes GCC a lot more silent.
|
#
3c4dd356 |
|
02-Aug-1994 |
David Greenman <dg@FreeBSD.org> |
Added $Id$
|
#
26f9a767 |
|
25-May-1994 |
Rodney W. Grimes <rgrimes@FreeBSD.org> |
The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch. Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
|
#
df8bae1d |
|
24-May-1994 |
Rodney W. Grimes <rgrimes@FreeBSD.org> |
BSD 4.4 Lite Kernel Sources
|