History log of /freebsd-10.0-release/sys/fs/
Revision Date Author Comments
264267 08-Apr-2014 delphij

Fix NFS deadlock vulnerability. [SA-14:05]

Fix "Heartbleed" vulnerability and ECDSA Cache Side-channel
Attack in OpenSSL. [SA-14:06]

Approved by: so

259065 07-Dec-2013 gjb

- Copy stable/10 (r259064) to releng/10.0 as part of the
10.0-RELEASE cycle.
- Update __FreeBSD_version [1]
- Set branch name to -RC1

[1] 10.0-CURRENT __FreeBSD_version value ended at '55', so
start releng/10.0 at '100' so the branch is started with
a value ending in zero.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

257122 25-Oct-2013 kib

MFC r256502:
Similar to debug.iosize_max_clamp sysctl, introduce
devfs_iosize_max_clamp sysctl, which allows/disables SSIZE_MAX-sized
i/o requests on the devfs files.

Approved by: re (glebius)


257121 25-Oct-2013 kib

MFC r256501:
Remove two instances of ARGSUSED comment, and wrap lines nearby the
code that is to be changed.

Approved by: re (glebius)


256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


255867 25-Sep-2013 jmg

NULL stale pointers (should be a no-op as they should no longer be
used)...

Reviewed by: dteske
Approved by: re (kib)
Sponsored by: Vicor
MFC after: 3 days


255866 25-Sep-2013 jmg

fix a bug where we access a bread buffer after we have brelse'd it...
The kernel normally didn't unmap/context switch away before we accessed
the buffer most of the time, but under heavy I/O pressure and lots of
mount/unmounting this would cause a fault on nofault panic...

Reviewed by: dteske
Approved by: re (kib)
Sponsored by: Vicor
MFC after: 3 days


255442 10-Sep-2013 des

Fix the length calculation for the final block of a sendfile(2)
transmission which could be tricked into rounding up to the nearest
page size, leaking up to a page of kernel memory. [SA-13:11]

In IPv6 and NetATM, stop SIOCSIFADDR, SIOCSIFBRDADDR, SIOCSIFDSTADDR
and SIOCSIFNETMASK at the socket layer rather than pass them on to the
link layer without validation or credential checks. [SA-13:12]

Prevent cross-mount hardlinks between different nullfs mounts of the
same underlying filesystem. [SA-13:13]

Security: CVE-2013-5666
Security: FreeBSD-SA-13:11.sendfile
Security: CVE-2013-5691
Security: FreeBSD-SA-13:12.ifioctl
Security: CVE-2013-5710
Security: FreeBSD-SA-13:13.nullfs
Approved by: re


255338 07-Sep-2013 pfg

ext2fs: temporarily disable htree directory index.

Our code does not yet consider the case of hash collisions. This
is a rather annoying situation where two or more files that
happen to have the same hash value will not appear accessible.

The situation is not difficult to work around, but given that things
will just work without enabling htree we will save possible
embarrassments for the next release.

Reported by: Kevin Lo


255240 05-Sep-2013 pjd

Handle cases where capability rights are not provided.

Reported by: kib


255219 05-Sep-2013 pjd

Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

struct cap_rights {
uint64_t cr_rights[CAP_RIGHTS_VERSION + 2];
};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.
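The bit layout described above can be sketched in C as follows. This is only an illustration, not the committed header: the SK_-prefixed names are hypothetical, and the shift values are derived from the description (two version bits at the top, then five index bits, which leaves 57 bits for rights in each element).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the numbers follow the layout described above. */
#define SK_VERSION_SHIFT 62  /* top two bits: number of elements - 2 */
#define SK_INDEX_SHIFT   57  /* next five bits: one-hot array index */
#define SK_INDEX_BITS    5
#define SK_RIGHT_BITS    57  /* bits left for rights in each element */

/* Combine a one-hot index bit with the bit that names the right. */
#define SK_CAPRIGHT(idx, bit) ((1ULL << (SK_INDEX_SHIFT + (idx))) | (bit))

/* Number of array elements, decoded from the first element. */
static inline int
sk_nelems(uint64_t first)
{
	return (int)(first >> SK_VERSION_SHIFT) + 2;
}

/* Raw five-bit index field of a right or of an array element. */
static inline unsigned
sk_index_field(uint64_t right)
{
	return (unsigned)((right >> SK_INDEX_SHIFT) &
	    ((1U << SK_INDEX_BITS) - 1));
}
```

Under these assumptions two elements give 2 * 57 = 114 rights and five elements give 5 * 57 = 285, which agrees with the figures quoted earlier in the message.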

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, e.g.:

#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine a few rights, but the rights have to belong
to the same array element, e.g.:

#define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL)
#define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL)

#define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
void cap_rights_set(cap_rights_t *rights, ...);
void cap_rights_clear(cap_rights_t *rights, ...);
bool cap_rights_is_set(const cap_rights_t *rights, ...);

bool cap_rights_is_valid(const cap_rights_t *rights);
void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, e.g.:

cap_rights_t rights;

cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, e.g.:

#define cap_rights_set(rights, ...) \
__cap_rights_set((rights), __VA_ARGS__, 0ULL)
void __cap_rights_set(cap_rights_t *rights, ...);
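How such a sentinel-terminated variadic wrapper can work is sketched below. It is a self-contained illustration under stated assumptions, not the kernel implementation: all sk_ and SK_ names are hypothetical, the structure is fixed at two elements, and the index decoding assumes the one-hot index field sits in bits 57 to 61.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>

/* Hypothetical rights built with the assumed layout (index bit at 57 + idx). */
#define SK_CAPRIGHT(idx, bit) ((1ULL << (57 + (idx))) | (bit))
#define SK_READ   SK_CAPRIGHT(0, 0x0000000000000001ULL)
#define SK_WRITE  SK_CAPRIGHT(0, 0x0000000000000002ULL)
#define SK_PDKILL SK_CAPRIGHT(1, 0x0000000000000800ULL)

struct sk_rights {
	uint64_t cr_rights[2];
};

/* Consume rights until the 0ULL sentinel that the macro appends. */
static inline void
sk_rights_vset(struct sk_rights *r, ...)
{
	va_list ap;
	unsigned long long right, field;
	int idx;

	va_start(ap, r);
	while ((right = va_arg(ap, unsigned long long)) != 0) {
		/* Turn the one-hot index field into an array index. */
		field = (right >> 57) & 0x1f;
		for (idx = 0; (field >>= 1) != 0; idx++)
			;
		assert(idx < 2);
		r->cr_rights[idx] |= right;
	}
	va_end(ap);
}

/* The caller never writes the terminator; the macro supplies it. */
#define sk_rights_set(r, ...) sk_rights_vset((r), __VA_ARGS__, 0ULL)
```

A zero terminator is safe in this scheme because every valid right carries at least its index bit and therefore can never be 0.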

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belong to the same array element this way is
correct, but is not advised. It should only be used for alias definitions.
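The check that makes such a mix detectable can be sketched as a power-of-two test on the index field. The helper name and the bit positions are assumptions made for illustration; the log does not show the actual assertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical rights built with the assumed layout (index bit at 57 + idx). */
#define SK_CAPRIGHT(idx, bit) ((1ULL << (57 + (idx))) | (bit))
#define SK_LOOKUP SK_CAPRIGHT(0, 0x0000000000000400ULL)
#define SK_FCHMOD SK_CAPRIGHT(0, 0x0000000000002000ULL)
#define SK_PDKILL SK_CAPRIGHT(1, 0x0000000000000800ULL)

/*
 * True when exactly one bit is set in the five-bit index field, i.e.
 * every right OR-ed into the value belongs to the same array element.
 */
static inline bool
sk_single_element(uint64_t right)
{
	uint64_t field = (right >> 57) & 0x1f;

	return (field != 0 && (field & (field - 1)) == 0);
}
```

OR-ing rights from the same element leaves a single index bit set, while OR-ing rights from different elements sets two, so the second case fails the test.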

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by: The FreeBSD Foundation


255216 04-Sep-2013 rmacklem

Crashes have been observed for NFSv4.1 mounts when the system
is being shut down which were caused by the nfscbd_pool being
destroyed before the backchannel is disabled. This patch is
believed to fix the problem, by simply avoiding ever destroying
the nfscbd_pool. Since the NFS client module cannot be unloaded,
this should not cause a memory leak.

MFC after: 2 weeks


255136 01-Sep-2013 rmacklem

Forced dismounts of NFS mounts can fail when thread(s) are stuck
waiting for an RPC reply from the server while holding the mount
point busy (mnt_lockref incremented). This happens because dounmount()
msleep()s waiting for mnt_lockref to become 0, before calling
VFS_UNMOUNT(). This patch adds a new VFS operation called VFS_PURGE(),
which the NFS client implements as purging RPCs in progress. Making
this call before checking mnt_lockref fixes the problem, by ensuring
that the VOP_xxx() calls will fail and unbusy the mount point.

Reported by: sbruno
Reviewed by: kib
MFC after: 2 weeks


255008 28-Aug-2013 ken

Support storing 7 additional file flags in tmpfs:

UF_SYSTEM, UF_SPARSE, UF_OFFLINE, UF_REPARSE, UF_ARCHIVE, UF_READONLY,
and UF_HIDDEN.

Sort the file flags tmpfs supports alphabetically. tmpfs now
supports the same flags as UFS, with the exception of SF_SNAPSHOT.

Reported by: bdrewery, antoine
Sponsored by: Spectra Logic


254925 26-Aug-2013 jhb

Remove most of the remaining sysctl name list macros. They were only
ever intended for use in sysctl(8) and it has not used them for many
years.

Reviewed by: bde
Tested by: exp-run by bdrewery


254741 23-Aug-2013 delphij

Allow tmpfs to be mounted inside a jail.


254627 21-Aug-2013 ken

Expand the use of stat(2) flags to allow storing some Windows/DOS
and CIFS file attributes as BSD stat(2) flags.

This work is intended to be compatible with ZFS, the Solaris CIFS
server's interaction with ZFS, somewhat compatible with MacOS X,
and of course compatible with Windows.

The Windows attributes that are implemented were chosen based on
the attributes that ZFS already supports.

The summary of the flags is as follows:

UF_SYSTEM: Command line name: "system" or "usystem"
ZFS name: XAT_SYSTEM, ZFS_SYSTEM
Windows: FILE_ATTRIBUTE_SYSTEM

This flag means that the file is used by the
operating system. FreeBSD does not enforce any
special handling when this flag is set.

UF_SPARSE: Command line name: "sparse" or "usparse"
ZFS name: XAT_SPARSE, ZFS_SPARSE
Windows: FILE_ATTRIBUTE_SPARSE_FILE

This flag means that the file is sparse. Although
ZFS may modify this in some situations, there is
not generally any special handling for this flag.

UF_OFFLINE: Command line name: "offline" or "uoffline"
ZFS name: XAT_OFFLINE, ZFS_OFFLINE
Windows: FILE_ATTRIBUTE_OFFLINE

This flag means that the file has been moved to
offline storage. FreeBSD does not have any special
handling for this flag.

UF_REPARSE: Command line name: "reparse" or "ureparse"
ZFS name: XAT_REPARSE, ZFS_REPARSE
Windows: FILE_ATTRIBUTE_REPARSE_POINT

This flag means that the file is a Windows reparse
point. ZFS has special handling code for reparse
points, but we don't currently have the other
supporting infrastructure for them.

UF_HIDDEN: Command line name: "hidden" or "uhidden"
ZFS name: XAT_HIDDEN, ZFS_HIDDEN
Windows: FILE_ATTRIBUTE_HIDDEN

This flag means that the file may be excluded from
a directory listing if the application honors it.
FreeBSD has no special handling for this flag.

The name and bit definition for UF_HIDDEN are
identical to the definition in MacOS X.

UF_READONLY: Command line name: "urdonly", "rdonly", "readonly"
ZFS name: XAT_READONLY, ZFS_READONLY
Windows: FILE_ATTRIBUTE_READONLY

This flag means that the file may not be written or
appended to, but its attributes may be changed.

ZFS currently enforces this flag, but Illumos
developers have discussed disabling enforcement.

The behavior of this flag is different than MacOS X.
MacOS X uses UF_IMMUTABLE to represent the DOS
readonly permission, but that flag has a stronger
meaning than the semantics of DOS readonly permissions.

UF_ARCHIVE: Command line name: "uarch", "uarchive"
ZFS name: XAT_ARCHIVE, ZFS_ARCHIVE
Windows: FILE_ATTRIBUTE_ARCHIVE

The UF_ARCHIVE flag means that the file has changed and
needs to be archived. The meaning is the same as
the Windows FILE_ATTRIBUTE_ARCHIVE attribute, and
the ZFS XAT_ARCHIVE and ZFS_ARCHIVE attributes.

msdosfs and ZFS have special handling for this flag,
i.e. they will set it when the file changes.

sys/param.h: Bump __FreeBSD_version to 1000047 for the
addition of new stat(2) flags.

chflags.1: Document the new command line flag names
(e.g. "system", "hidden") available to the
user.

ls.1: Reference chflags(1) for a list of file flags
and their meanings.

strtofflags.c: Implement the mapping between the new
command line flag names and new stat(2)
flags.

chflags.2: Document all of the new stat(2) flags, and
explain the intended behavior in a little
more detail. Explain how they map to
Windows file attributes.

Different filesystems behave differently
with respect to flags, so warn the
application developer to take care when
using them.

zfs_vnops.c: Add support for getting and setting the
UF_ARCHIVE, UF_READONLY, UF_SYSTEM, UF_HIDDEN,
UF_REPARSE, UF_OFFLINE, and UF_SPARSE flags.

All of these flags are implemented using
attributes that ZFS already supports, so
the on-disk format has not changed.

ZFS currently doesn't allow setting the
UF_REPARSE flag, and we don't really have
the other infrastructure to support reparse
points.

msdosfs_denode.c,
msdosfs_vnops.c: Add support for getting and setting
UF_HIDDEN, UF_SYSTEM and UF_READONLY
in MSDOSFS.

It supported SF_ARCHIVED, but this has been
changed to be UF_ARCHIVE, which has the same
semantics as the DOS archive attribute instead
of inverse semantics like SF_ARCHIVED.

After discussion with Bruce Evans, change
several things in the msdosfs behavior:

Use UF_READONLY to indicate whether a file
is writeable instead of file permissions, but
don't actually enforce it.

Refuse to change attributes on the root
directory, because it is special in FAT
filesystems, but allow most other attribute
changes on directories.

Don't set the archive attribute on a directory
when its modification time is updated.
Windows and DOS don't set the archive attribute
in that scenario, so we are now bug-for-bug
compatible.

smbfs_node.c,
smbfs_vnops.c: Add support for UF_HIDDEN, UF_SYSTEM,
UF_READONLY and UF_ARCHIVE in SMBFS.

This is similar to changes that Apple has
made in their version of SMBFS (as of
smb-583.8, posted on opensource.apple.com),
but not quite the same.

We map SMB_FA_READONLY to UF_READONLY,
because UF_READONLY is intended to match
the semantics of the DOS readonly flag.
The MacOS X code maps both UF_IMMUTABLE
and SF_IMMUTABLE to SMB_FA_READONLY, but
the immutable flags have stronger meaning
than the DOS readonly bit.

stat.h: Add definitions for UF_SYSTEM, UF_SPARSE,
UF_OFFLINE, UF_REPARSE, UF_ARCHIVE, UF_READONLY
and UF_HIDDEN.

The definition of UF_HIDDEN is the same as
the MacOS X definition.

Add commented-out definitions of
UF_COMPRESSED and UF_TRACKED. They are
defined in MacOS X (as of 10.8.2), but we
do not implement them (yet).

ufs_vnops.c: Add support for getting and setting
UF_ARCHIVE, UF_HIDDEN, UF_OFFLINE, UF_READONLY,
UF_REPARSE, UF_SPARSE, and UF_SYSTEM in UFS.
Alphabetize the flags that are supported.

These new flags are only stored; UFS does
not take any action if the flag is set.

Sponsored by: Spectra Logic
Reviewed by: bde (earlier version)


254602 21-Aug-2013 kib

Make the seek a method of the struct fileops.

Tested by: pho
Sponsored by: The FreeBSD Foundation


254601 21-Aug-2013 kib

Extract the general-purpose code from tmpfs to perform uiomove from
the page queue of some vm object.

Discussed with: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation


254415 16-Aug-2013 kib

Restore the previous sendfile(2) behaviour on the block devices.
Provide valid .fo_sendfile method for several missed struct fileops.

Reviewed by: glebius
Sponsored by: The FreeBSD Foundation


254337 14-Aug-2013 rmacklem

Fix several performance related issues in the new NFS server's
DRC for NFS over TCP.
- Increase the size of the hash tables.
- Create a separate mutex for each hash list of the TCP hash table.
- Single thread the code that deletes stale cache entries.
- Add a tunable called vfs.nfsd.tcphighwater, which can be increased
to allow the cache to grow larger, avoiding the overhead of frequent
scans to delete stale cache entries.
(The default value will result in frequent scans to delete stale cache
entries, analogous to what the pre-patched code does.)
- Add a tunable called vfs.nfsd.cachetcp that can be used to disable
DRC caching for NFS over TCP, since the old NFS server didn't DRC cache TCP.
It also adjusts the size of nfsrc_floodlevel dynamically, so that it is
always greater than vfs.nfsd.tcphighwater.

For UDP the algorithm remains the same as the pre-patched code, but the
tunable vfs.nfsd.udphighwater can be used to allow the cache to grow
larger and reduce the overhead caused by frequent scans for stale entries.
UDP also uses a larger hash table size than the pre-patched code.

Reported by: wollman
Tested by: wollman (earlier version of patch)
Submitted by: ivoras (earlier patch)
Reviewed by: jhb (earlier version of patch)
MFC after: 1 month


254326 14-Aug-2013 pfg

ext2fs: update format specifiers for ext4 type.

Previous bandaid was not appropriate and didn't really work for
all platforms. While here, cleanup the surrounding code to match
ffs_checkoverlap().

Reported by: dim, jmallet and bde
MFC after: 3 weeks


254286 13-Aug-2013 pfg

ext2fs: update format specifiers for ext4 type.

Reported by: Sam Fourman Jr.
MFC after: 3 weeks


254283 13-Aug-2013 pfg

Define ext2fs local types and use them.

Add definitions for e2fs_daddr_t, e4fs_daddr_t in addition
to the already existing e2fs_lbn_t and adjust them for ext4.
Other than making the code more readable these changes should
fix problems related to big filesystems.

Setting the proper types can be tricky so the process was
helped by looking at UFS. In our implementation, logical block
numbers can be negative and the code depends on it. In ext2,
block numbers are unsigned so it is convenient to keep
e2fs_daddr_t unsigned and use the complete 32 bits. In the
case of e4fs_daddr_t, while the value should be unsigned, for
ext4 we only need to support 48 bits so preserving an extra
bit from the sign is not an issue.
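The reasoning about widths can be checked with a few lines of C. The typedefs below are assumptions chosen to match the description (an unsigned 32-bit type for ext2 and a signed 64-bit type for ext4); they use hypothetical sk_ names and are not copied from the ext2fs headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed widths, following the description above. */
typedef uint32_t sk_e2fs_daddr_t;	/* ext2: unsigned, full 32 bits */
typedef int64_t  sk_e4fs_daddr_t;	/* ext4: signed, 48 bits needed */

/* Largest block number that a 48-bit field can hold. */
#define SK_EXT4_MAX_BLOCK ((1ULL << 48) - 1)

/* True when blk can be stored in the ext2 type without truncation. */
static inline int
sk_e2_representable(uint64_t blk)
{
	return (blk <= UINT32_MAX);
}

/* True when blk fits in 48 bits and so stays non-negative in int64_t. */
static inline int
sk_e4_representable(uint64_t blk)
{
	return (blk <= SK_EXT4_MAX_BLOCK);
}
```

Since 48 bits is well below the 63 value bits of a signed 64-bit integer, giving up the sign bit costs nothing for ext4, whereas a signed 32-bit type would lose half of the ext2 range.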

While here also drop the ext2_setblock() prototype that was
never used.

Discussed with: mckusick, bde
MFC after: 3 weeks


254260 12-Aug-2013 pfg

Add read-only support for extents in ext2fs.

Basic support for extents was implemented by Zheng Liu as part
of his Google Summer of Code in 2010. This support is read-only
at this time.

In addition to extents we also support the huge_file extension
for read-only purposes. This works nicely with the additional
support for birthtime/nanosec timestamps and dir_index that
have been added lately.

The implementation may not work for all ext4 filesystems as
it doesn't support some features that are being enabled by
default on recent Linux, such as flex_bg. Nevertheless, the feature
should be very useful for migration or simple access in
filesystems that have been converted from ext2/3 or don't use
incompatible features.

Special thanks to Zheng Liu for his dedication and continued
work to support ext2 in FreeBSD.

Submitted by: Zheng Liu (lz@)
Reviewed by: Mike Ma, Christoph Mallon (previous version)
Sponsored by: Google Inc.
MFC after: 3 weeks


254138 09-Aug-2013 attilio

The soft and hard busy mechanisms rely on the vm object lock to work.
Unify the two concepts into a real, minimal sxlock, where the shared
acquisition represents the soft busy and the exclusive acquisition
represents the hard busy.
The old VPO_WANTED mechanism becomes the hard path for this new lock
and it becomes per-page rather than per-object.
The vm_object lock becomes an interlock for this functionality:
it can be held in either read or write mode.
However, if the vm_object lock is held in read mode while acquiring
or releasing the busy state, the thread owner cannot make any
assumption on the busy state unless it is also busying it.

Also:
- Add a new flag to directly shared-busy pages while vm_page_alloc
and vm_page_grab are being executed. This will be very helpful
once these functions happen under a read object lock.
- Move the swapping sleep into its own per-object flag

The KPI is heavily changed; this is why the version is bumped.
It is very likely that some VM ports users will need to change
their own code.

Sponsored by: EMC / Isilon storage division
Discussed with: alc
Reviewed by: jeff, kib
Tested by: gavin, bapt (older version)
Tested by: pho, scottl


254104 08-Aug-2013 pfg

Small typo.

MFC after: 3 days


253967 05-Aug-2013 kib

The tmpfs_alloc_vp() is used to instantiate vnode for the tmpfs node,
in particular, from the tmpfs_lookup VOP method. If LK_NOWAIT is not
specified in the lkflags, the lookup is supposed to return an alive
vnode whenever the underlying node is valid.

Currently, the tmpfs_alloc_vp() returns ENOENT if the vnode attached
to node exists and is being reclaimed. This causes spurious ENOENT
errors from lookup on tmpfs and corresponding random 'No such file'
failures from syscalls working with tmpfs files.

Fix this by waiting for the doomed vnode to be detached from the tmpfs
node if sleepable allocation is requested.

Note that filesystems which use vfs_hash.c correctly handle the case
due to vfs_hash_get() looping when vget() returns ENOENT for sleepable
requests.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


253953 05-Aug-2013 attilio

Revert r253939:
We cannot busy a page before doing pagefaults.
In fact, it can deadlock against the vnode lock, as it tries to vget().
Other functions, right now, have an opposite lock ordering, like
vm_object_sync(), which acquires the vnode lock first and then
sleeps on the busy mechanism.

Before this patch is reinserted we need to break this ordering.

Sponsored by: EMC / Isilon storage division
Reported by: kib


253939 04-Aug-2013 attilio

The page hold mechanism is fast but it has a couple of drawbacks:
- It does not let pages respect the LRU policy
- It bloats the active/inactive queues with a few pages

Try to avoid it as much as possible with the long-term target to
completely remove it.
Use the soft-busy mechanism to protect page content accesses during
short-term operations (like uiomove_fromphys()).

After this change only vm_fault_quick_hold_pages() is still using the
hold mechanism for page content access.
There is an additional complexity there as the quick path cannot
immediately access the page object to busy the page, and the slow path
cannot busy more than one page at a time (to avoid deadlocks).

Fixing this primitive could lead to complete removal of the page hold
mechanism.

Sponsored by: EMC / Isilon storage division
Discussed with: alc
Reviewed by: jeff
Tested by: pho


253927 04-Aug-2013 attilio

Remove the unnecessary soft busy of the page before doing vn_rdwr() in
kern_sendfile().
The page is already wired so it will not be subject to pagefaults.
The content cannot be effectively protected as it is full of races
already.
Multiple accesses to the same indexes are serialized through vn_rdwr().

Sponsored by: EMC / Isilon storage division
Reviewed by: alc, jeff
Tested by: pho


253861 01-Aug-2013 pfg

Add license for the half MD4 algorithm used in ext2_half_md4().

The htree implementation uses code derived from the
RSA Data Security, Inc. MD4 Message-Digest Algorithm.

Add a proper licensing statement for the code and clarify
the corresponding comments.

Approved by: core (hrs)


253742 28-Jul-2013 marius

- Add const-qualifiers to the arguments of isonum_*().
- According to ISO 9660 7.1.2, isonum_712() should return a signed value.
- Try to get isonum_*() closer to style(9).


253677 26-Jul-2013 avg

make path matching in devfs rules consistent and sane (and safer)

Before this change path matching had the following features:
- for device nodes the patterns were matched against full path
- in the above case '/' in a path could be matched by a wildcard
- for directories and links only the last component was matched

So, for example, a pattern like 're*' could match the following entries:
- re0 device
- responder/u0 device
- zvol/recpool directory

Although it was possible to work around this behavior (once it was spotted
and understood), it was very confusing and contrary to documentation.

Now we always match a full path for all types of devfs entries (devices,
directories, links) and a '/' has to be matched explicitly.
This behavior follows the shell globbing rules.
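The difference between the two behaviours can be illustrated with the userland fnmatch(3) function. This is a sketch that uses FNM_PATHNAME as a stand-in for the new rule; the log does not show how devfs implements its matching, and the sk_ helper names are hypothetical.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>

/* Full-path match in which wildcards never match a '/'. */
static inline bool
sk_path_matches(const char *pattern, const char *path)
{
	return (fnmatch(pattern, path, FNM_PATHNAME) == 0);
}

/* For comparison: a match in which wildcards may cross a '/'. */
static inline bool
sk_path_matches_loose(const char *pattern, const char *path)
{
	return (fnmatch(pattern, path, 0) == 0);
}
```

With the strict helper, "re*" matches "re0" but neither "responder/u0" nor "zvol/recpool"; a rule intended for the latter has to spell out the directory, for example "zvol/re*".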

This change was originally developed by Jaakko Heinonen.
Many thanks!

PR: kern/122838
Submitted by: jh
MFC after: 4 weeks


253657 25-Jul-2013 pfg

ext2fs: Return EINVAL for negative uio_offset as in UFS.

While here drop old comment that doesn't really apply.

MFC after: 1 month
Discussed with: gleb


253652 25-Jul-2013 pfg

ext2fs: Drop a check that wasn't supposed to be in r253651.

MFC after: 1 month


253651 25-Jul-2013 pfg

ext2fs: Don't assume that on-disk format of a directory is the same
as in <sys/dirent.h>

ext2_readdir() has always been very fs specific and different
with respect to its ufs_ counterpart. Recent changes from UFS
have made it possible to share more closely the implementation.

MFUFS r252438:
Always start parsing at DIRBLKSIZ aligned offset, skip first entries if
uio_offset is not DIRBLKSIZ aligned. Return EINVAL if buffer is too
small for single entry.

Preallocate buffer for cookies.

Skip entries with zero inode number.
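The alignment rule from the merged change amounts to a small piece of offset arithmetic, sketched below. The helper names are hypothetical, the directory block size is passed in as a parameter, and it is assumed to be a power of two with a non-negative offset.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Offset at which parsing starts: the beginning of the directory block
 * that contains the requested offset.  dirblksiz must be a power of two.
 */
static inline int64_t
sk_parse_start(int64_t offset, int64_t dirblksiz)
{
	return (offset & ~(dirblksiz - 1));
}

/* Bytes of leading entries to skip before reaching the requested offset. */
static inline int64_t
sk_skip_bytes(int64_t offset, int64_t dirblksiz)
{
	return (offset - sk_parse_start(offset, dirblksiz));
}
```

Starting at a block boundary means the parser always begins on a valid entry header, even when the caller resumes from an offset in the middle of a block.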

Reviewed by: gleb, Zheng Liu
MFC after: 1 month


253619 24-Jul-2013 pfg

fuse: revert kernel_header update.

It seems to be causing problems due to the lack of the new features.

Found by: bapt
Pointed hat: pfg


253573 23-Jul-2013 nwhitehorn

tmpfs works perfectly fine with -o union -- there is no reason to exclude it
from the list of options.


253506 20-Jul-2013 rmacklem

The NFSv4 server incorrectly assumed that the high order words of
the attribute bitmap argument would be non-zero. This caused an
interoperability problem for a recent patch to the Linux NFSv4 client.
The Linux folks have changed their patch to avoid this, but this
patch fixes the problem on the server.

Reported and tested by: Andre Heider (a.heider@gmail.com)
MFC after: 3 days


253498 20-Jul-2013 pfg

fuse: revert birthtime support.

The creation time support breaks the data structures used in Linux
fuse. libfuse carries its own header.

Revert the changes for now. We will try to get an agreement with the
fuse upstream maintainers to avoid having to patch the library
headers all the time.


253479 20-Jul-2013 pfg

Adjust outsizes:

Recalculate FUSE_COMPAT_ENTRY_OUT_SIZE and COMPAT_ATTR_OUT_SIZE.
These were wrong in the previous commit. They are actually unused
in FreeBSD though.

Pointed out by: Jan Beich


253478 20-Jul-2013 pfg

Adjust outsizes:

When birthtime was added (r253331) we missed adding the weight
of the new fields in FUSE_COMPAT_ENTRY_OUT_SIZE and
COMPAT_ATTR_OUT_SIZE. Adjust them accordingly.

Pointed out by: Jan Beich


253344 15-Jul-2013 pfg

Update fuse_kernel header.

Bring in the changes from the FUSE kernel interface 7.10
(available under a BSD license).

After 7.10 the Linux FUSE developers added support for a
controversial CUSE driver and some Linux-specific
features that are unlikely to find their way into FreeBSD.

We currently don't implement any of the new features so we
are *not* bumping the FUSE_KERNEL_MINOR_VERSION. The header
should, nevertheless, serve as a template to add the new
features in a compatible manner.

While here adopt some minor cleanups from the upstream version
like removing FUSE_MAJOR and FUSE_MINOR which were never
used. Also add multiple inclusion header guards.


253331 13-Jul-2013 pfg

Add creation timestamp (birthtime) support for fuse.

I was keeping this #ifdef'd for reference with the MacFUSE change[1]
but on second thought, this is a FreeBSD-only header so the SVN
history should be enough.

Add missing padding while here.

Reference [1]:
http://code.google.com/p/macfuse/source/detail?spec=svn1686&r=1360


253276 12-Jul-2013 pfg

Add creation timestamp (birthtime) support for fuse.

This is based on similar support in MacFUSE.


253173 10-Jul-2013 pfg

Implement 1003.1-2001 pathconf() keys.

This is based on r106058 in UFS.

MFC after: 1 month


253098 09-Jul-2013 pfg

Reinstate the assertion from r253045.

UFS r232732 reverted the change as the real problem was to be fixed
at the syscall level.

Reported by: bde


253050 09-Jul-2013 pfg

Enhancement when writing an entire block of a file.

Merge from UFS r231313:

This change first attempts the uiomove() to the newly allocated
(and dirty) buffer and only zeros it if the uiomove() fails. The
effect is to eliminate the gratuitous zeroing of the buffer in
the usual case where the uiomove() successfully fills it.
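The copy-first, zero-on-failure ordering can be sketched in portable C as follows. This is an illustration only: the callback type stands in for uiomove(), and every sk_ name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for uiomove(); returns 0 on success. */
typedef int (*sk_copy_fn)(void *dst, size_t len);

/*
 * Fill a freshly allocated block.  The copy is attempted first and the
 * block is zeroed only when the copy fails, so the common successful
 * case avoids writing the buffer twice.
 */
static inline int
sk_fill_block(void *blk, size_t len, sk_copy_fn copy)
{
	int error = copy(blk, len);

	if (error != 0)
		memset(blk, 0, len);	/* never leave stale bytes behind */
	return (error);
}

/* Example callback that fills the whole block successfully. */
static inline int
sk_copy_ok(void *dst, size_t len)
{
	memset(dst, 0xAA, len);
	return (0);
}

/* Example callback that fails after a partial write. */
static inline int
sk_copy_fail(void *dst, size_t len)
{
	if (len > 0)
		memset(dst, 0xAA, len / 2);
	return (14);	/* arbitrary non-zero error code */
}
```

Zeroing on the failure path matters because the buffer is new and may otherwise expose whatever it held before, including a partially copied prefix.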

MFC after: 3 days


253049 09-Jul-2013 rmacklem

Add support for host-based (Kerberos 5 service principal) initiator
credentials to the kernel rpc. Modify the NFSv4 client to add
support for the gssname and allgssname mount options to use this
capability. Requires the gssd daemon to be running with the "-h" option.

Reviewed by: jhb


253045 08-Jul-2013 pfg

Avoid a panic and return EINVAL instead.

Merge from UFS r232692:
syscall() fuzzing can trigger this panic.

MFC after: 3 days


252956 07-Jul-2013 pfg

Implement SEEK_HOLE/SEEK_DATA for ext2fs.

Merged from r236044 on UFS.

MFC after: 3 days


252907 07-Jul-2013 pfg

Fix some typos.

MFC after: 1 week


252890 06-Jul-2013 pfg

Initial implementation of the HTree directory index.

This is a port of NetBSD's GSoC 2012 Ext3 HTree directory indexing
by Vyacheslav Matyushin. It was cleaned up and enhanced for FreeBSD
by Zheng Liu (lz@).

This is an excellent example of work shared among different projects:
Vyacheslav was able to look at an early prototype from Zheng Liu who
was also able to check the code from Haiku (with permission).

As in Linux, the feature is not available by default and must be
enabled explicitly with tune2fs. We still do not support the
workarounds required in readdir for NFS.

Submitted by: Zheng Liu
Tested by: Mike Ma
Sponsored by: Google Inc.
MFC after: 1 week


252714 04-Jul-2013 kib

The tvp vnode on rename is usually unlinked. Drop the cached null
vnode for tvp to allow the free of the lower vnode, if needed.

PR: kern/180236
Tested by: smh
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


252558 03-Jul-2013 davide

- Fix double frees/use after free.
- Allocate using smb_rq_alloc() instead of inlining it.

Reported by: uqs
Found with: Coverity Scan


252528 03-Jul-2013 rmacklem

A problem with the old NFS client where large writes to large files
would sometimes result in a corrupted file was reported via email.
This problem appears to have been caused by r251719 (reverting
r251719 fixed the problem). Although I have not been able to
reproduce this problem, I suspect it is caused by another thread
increasing np->n_size after the mtx_unlock(&np->n_mtx) but before
the vnode_pager_setsize() call. Since the np->n_mtx mutex serializes
updates to np->n_size, doing the vnode_pager_setsize() with the
mutex locked appears to avoid the problem.
Unfortunately, vnode_pager_setsize() where the new size is smaller,
cannot be called with a mutex held.
This patch returns the semantics to be close to pre-r251719 (actually
pre-r248567, r248581, r248567 for the new client) such that the call to
vnode_pager_setsize() is only delayed until after the mutex is
unlocked when np->n_size is shrinking. Since the file is growing
when being written, I believe this will fix the corruption.
A better solution might be to replace the mutex with a sleep lock,
but that is a non-trivial conversion, so this fix is hoped to be
sufficient in the meantime.

Reported by: David G. Lawrence (dg@dglawrence.com)
Tested by: David G. Lawrence (to be done soon)
Reviewed by: kib
MFC after: 1 week


252397 30-Jun-2013 pfg

ext2fs: Use the complete random() range in i_gen.

i_gen is unsigned in ext2fs so we can handle the complete
32 bits.

MFC after: 1 week


252364 29-Jun-2013 pfg

Bring some updates from ufs_lookup to ext2fs.

r156418:

Don't set IN_CHANGE and IN_UPDATE on inodes for potentially suspended
file systems. This could cause deadlocks when creating snapshots.
(We can't do snapshots on ext2fs but it is useful to keep things in sync).

r183079:

- Only set i_offset in the parent directory's i-node during a lookup for
non-LOOKUP operations.
- Relax a VOP assertion for a DELETE lookup.

r187528:

Move the code from ufs_lookup.c used to do dotdot lookup, into
the helper function. It is supposed to be useful for any filesystem
that has to unlock dvp to walk to the ".." entry in lookup routine.

MFC after: 5 days


252355 28-Jun-2013 davide

Properly use v_data field. This magically worked (even if wrong) until
now because v_data is the first field of the structure, but it's not
something we should rely on.


252353 28-Jun-2013 davide

Garbage collect a useless check. smp should never be NULL.


252352 28-Jun-2013 davide

Plug a couple of leaks in smbfs_lookup().


252259 26-Jun-2013 pfg

Minor sorting.

MFC after: 3 days


252103 23-Jun-2013 pfg

Define and use e2fs_lbn_t in ext2fs.

In line with what is done in UFS, define an internal type
e2fs_lbn_t for the logical block numbers.

This change is basically a no-op as the new type is unchanged
(int32_t) but it may be useful as bumping this may be required
for ext4fs.

Also, as pointed out by Bruce Evans:

- Use daddr_t for daddr in ext2_bmaparray(). This seems to
improve reliability with the reallocblks option.
- Add a cast to the fsbtodb() macro as in UFS.

Reviewed by: bde
MFC after: 3 days


252100 22-Jun-2013 rmacklem

Fix r252074 so that it builds on 64bit arches.


252074 21-Jun-2013 rmacklem

The NFSv4.1 LayoutCommit operation requires a valid offset and length.
(0, 0 is not sufficient.) This patch adds a loop over the file layouts, using
the offset and length of each file layout in a separate LayoutCommit.


252072 21-Jun-2013 rmacklem

When the NFSv4.1 client is writing to a pNFS Data Server (DS), the
file's size attribute does not get updated. As such, it is necessary
to invalidate the attribute cache before clearing NMODIFIED for pNFS.

MFC after: 2 weeks


252067 21-Jun-2013 rmacklem

Since some NFSv4 servers enforce the requirement for a reserved port#,
enable use of the (no)resvport mount option for NFSv4. I had thought
that the RFC required that non-reserved port #s be allowed, but I couldn't
find it in the RFC.

MFC after: 2 weeks


252012 20-Jun-2013 pfg

Rename some prefixes in the Block Group Descriptor fields to ext4bgd_

Change prefix to avoid confusion and denote that these fields
are generally only available starting with ext4.

MFC after: 3 days


251952 18-Jun-2013 pfg

More ext2fs header cleanups:

- Set MAXMNTLEN nearer to where it is used.
- Move EXT2_LINK_MAX to ext2_dir.h .

MFC after: 3 days


251823 17-Jun-2013 pfg

Rename remaining DIAGNOSTIC to INVARIANTS.

MFC after: 3 days


251809 16-Jun-2013 pfg

Re-sort ext2fs headers to make things easier to find.

In the ext2fs driver we have a mixture of headers:

- The ext2_ prefixed headers have strong influence from NetBSD
and carry specific ext2/3/4 information.
- The unprefixed headers are inspired by UFS and carry implementation
specific information.

Do some small adjustments so that the information is easier to
find coming from either UFS or the NetBSD implementation.

MFC after: 3 days


251677 13-Jun-2013 pfg

Relax some unnecessary unsigned type changes in ext2fs.

While the changes in r245820 are in line with the ext2 spec,
the code derived from UFS can use negative values so it is
better to relax some types to keep them as they were, and
somewhat more similar to UFS. While here clean some casts.

Some of the original types are still wrong and will require
more work.

Discussed with: bde
MFC after: 3 days


251658 12-Jun-2013 pfg

Turn DIAGNOSTICs to INVARIANTS in ext2fs.

This is done to be consistent with what other filesystems, and
particularly ffs, already do (see r173464).

MFC after: 5 days


251612 11-Jun-2013 pfg

s/file system/filesystem/g

Based on r96755 from UFS.

MFC after: 3 days


251562 09-Jun-2013 pfg

e2fs_bpg and e2fs_isize are always unsigned.

The superblock in ext2fs defines all the fields as unsigned but for
some reason the in-memory superblock was carrying e2fs_bpg and
e2fs_isize as signed.

We should preserve the specified types for consistency.

MFC after: 5 days


251505 07-Jun-2013 alc

Add missing VM object unlocks in an error case.

Reviewed by: kib


251452 06-Jun-2013 alc

Don't busy the page unless we are likely to release the object lock.

Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division


251423 05-Jun-2013 alc

Relax the vm object locking. Use a read lock.

Sponsored by: EMC / Isilon Storage Division


251383 04-Jun-2013 alc

Eliminate unnecessary vm object locking from tmpfs_nocacheread().


251346 03-Jun-2013 pfg

ext2fs: space vs tab.

Obtained from: Christoph Mallon
MFC after: 3 days


251344 03-Jun-2013 pfg

ext2fs: Small cosmetic fixes.

Make a long macro readable and sort a header.

Obtained from: Christoph Mallon
MFC after: 3 days


251336 03-Jun-2013 pfg

ext2fs: Update Block Group Descriptor struct.

Uncover some, previously reserved, fields that are used by Ext4.
These are currently unused but it is good to have them for future
reference.

Reviewed by: bde
MFC after: 3 days


251171 31-May-2013 jeff

- Convert the bufobj lock to rwlock.
- Use a shared bufobj lock in getblk() and inmem().
- Convert softdep's lk to rwlock to match the bufobj lock.
- Move INFREECNT to b_flags and protect it with the buf lock.
- Remove unnecessary locking around bremfree() and BKGRDINPROG.

Sponsored by: EMC / Isilon Storage Division
Discussed with: mckusick, kib, mdf


251149 30-May-2013 kib

Assert that OBJ_TMPFS flag on the vm object for the tmpfs node is
cleared when the tmpfs node is going away.

Tested by: bdrewery, pho


251079 28-May-2013 rmacklem

Post-r248567, there were times when the client would return a
truncated directory for some NFS servers. This turned out to
be because the size of a directory reported by an NFS server
can be smaller than the ufs-like directory created from the
RPC XDR in the client. This patch fixes the problem by changing
r248567 so that vnode_pager_setsize() is only done for regular files.

Reported and tested by: hartmut.brandt@dlr.de
Reviewed by: kib
MFC after: 1 week


250852 21-May-2013 kib

Do not leak the NULLV_NOUNLOCK flag from the nullfs_unlink_lowervp(),
for the case when the nullfs vnode is not reclaimed. Otherwise, later
reclamation would not unlock the lower vnode.

Reported by: antoine
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


250657 15-May-2013 des

Fix typo in comment.

Submitted by: Alex Weber <alexwebr@gmail.com>
MFC after: 1 week


250580 12-May-2013 rmacklem

Add support for the eofflag to nfs_readdir() in the new NFS
client so that it works under a unionfs mount.

Submitted by: Jared Yanovich (slovichon@gmail.com)
Reviewed by: kib
MFC after: 2 weeks


250576 12-May-2013 eadler

Fix several typos

PR: kern/176054
Submitted by: Christoph Mallon <christoph.mallon@gmx.de>
MFC after: 3 days


250567 12-May-2013 jilles

fdescfs: Supply a real value for d_type in readdir.

All the fdescfs nodes (except . and ..) appear as character devices to
stat(), so DT_CHR is correct.


250505 11-May-2013 kib

- Fix nullfs vnode reference leak in nullfs_reclaim_lowervp(). The
null_hashget() obtains the reference on the nullfs vnode, which must
be dropped.

- Fix a wart which existed from the introduction of the nullfs
caching, do not unlock lower vnode in the nullfs_reclaim_lowervp().
It should be innocent, but now it is also formally safe. Inform the
nullfs_reclaim() about this using the NULLV_NOUNLOCK flag set on
nullfs inode.

- Add a callback to the upper filesystems for the lower vnode
unlinking. When inactivating a nullfs vnode, check if the lower
vnode was unlinked, indicated by nullfs flag NULLV_DROP or VV_NOSYNC
on the lower vnode, and reclaim upper vnode if so. This allows
nullfs to purge cached vnodes for the unlinked lower vnode, avoiding
excessive caching.

Reported by: Göran Löwkrantz <goran.lowkrantz@ismobile.com>
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


250310 06-May-2013 kib

Avoid deactivating the page if it is already on a queue, only requeue
the page. This both reduces the amount of queue locking and avoids
moving the active page to inactive list just because the page was read
or written.

Based on the suggestion by: alc
Reviewed by: alc
Tested by: pho


250238 04-May-2013 davide

Change VM_OBJECT_LOCK/UNLOCK() -> VM_OBJECT_WLOCK/WUNLOCK() to reflect
the recent switch of the vm object lock to a rwlock.

Reported by: attilio


250237 04-May-2013 davide

Overhaul locking in netsmb, getting rid of the obsolete lockmgr() primitive.
This solves a long standing LOR between smb_conn and smb_vc.

Tested by: martymac, pho (previous version)


250236 04-May-2013 davide

Completely rewrite the interface to smbdev switching from dev_clone
to cdevpriv(9). This commit changes the semantics of mount_smbfs
in userland as well, which now passes a file descriptor in order
to mount a specific filesystem instance.

Reviewed by: attilio, ed
Tested by: martymac


250193 02-May-2013 kib

The fsync(2) call should sync the vnode in such a way that even after
a system crash that happens after a successful fsync() return, the data
is accessible. For msdosfs, this means that FAT entries for the file
must be written.

Since we do not track the FAT blocks containing entries for the
current file, just do a sloppy sync of the devvp vnode for the mount,
whose buffers, among other things, contain FAT blocks.

Simultaneously, for deupdat():
- optimize by clearing the modified flags before short-circuiting a
return, if the mount is read-only;
- only ignore the rest of the function for denode with DE_MODIFIED
flag clear when the waitfor argument is false. The directory buffer
for the entry might be of delayed write;
- microoptimize by comparing the updated directory entry with the
current block content;
- try to cluster the write, fall back to bawrite() if low on
resources.

Based on the submission by: bde
MFC after: 2 weeks


250190 02-May-2013 kib

Fix the v_object leak for non-regular tmpfs vnodes.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation


250189 02-May-2013 kib

For the new regular tmpfs vnode, v_object is initialized before
insmntque() is called. The standard insmntque destructor resets the
vop vector to deadfs one, and calls vgone() on the vnode. As a result,
v_object is kept unchanged, which triggers an assertion in the reclaim
code, on insmntque() failure. Also, in this case, OBJ_TMPFS flag on
the backed vm object is not cleared.

Provide the tmpfs insmntque() destructor which properly clears
OBJ_TMPFS flag and resets v_object.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation


250188 02-May-2013 kib

The page read or written could be wired. Do not requeue if the page
is not on a queue.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation


250055 29-Apr-2013 des

Fix a bug that allows NFS clients to issue READDIR on files.

PR: kern/178016
Security: CVE-2013-3266
Security: FreeBSD-SA-13:05.nfsserver


250030 28-Apr-2013 kib

Rework the handling of the tmpfs node backing swap object and tmpfs
vnode v_object to avoid double-buffering. Use the same object both as
the backing store for tmpfs node and as the v_object.

Besides reducing memory use by up to 2x when mapping
files from tmpfs, it also makes tmpfs read and write operations copy
half as many bytes.

VM subsystem was already slightly adapted to tolerate OBJT_SWAP object
as v_object. Now the vm_object_deallocate() is modified to not
reinstantiate OBJ_ONEMAPPING flag and help the VFS to correctly handle
VV_TEXT flag on the last dereference of the tmpfs backing object.

Reviewed by: alc
Tested by: pho, bf
MFC after: 1 month


249630 18-Apr-2013 rmacklem

When an NFS unmount occurs, once vflush() writes the last dirty
buffer for the last vnode on the mount back to the server, it
returns. At that point, the code continues with the unmount,
including freeing up the nfs specific part of the mount structure.
It is possible that an nfsiod thread will try to check for an
empty I/O queue in the nfs specific part of the mount structure
after it has been free'd by the unmount. This patch avoids this problem by
setting the iodmount entries for the mount back to NULL while holding the
mutex in the unmount and checking the appropriate entry is non-NULL after
acquiring the mutex in the nfsiod thread.

Reported and tested by: pho
Reviewed by: kib
MFC after: 2 weeks
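
Note: a minimal single-threaded model of the pattern described in this
entry, not the committed code. The unmount side clears the per-iod mount
pointer while holding the lock, and the nfsiod side re-checks the pointer
after taking the same lock before touching the mount. All sk_* names, the
slot count and the flag that stands in for the mutex are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_NIOD 4			/* hypothetical number of iod slots */

struct sk_nmount {
	int bufq_len;			/* pending I/O requests on this mount */
};

static bool sk_iod_locked;		/* models the nfsiod mutex */
static struct sk_nmount *sk_iodmount[SK_NIOD];

static void
sk_lock(void)
{
	assert(!sk_iod_locked);
	sk_iod_locked = true;
}

static void
sk_unlock(void)
{
	assert(sk_iod_locked);
	sk_iod_locked = false;
}

/* Unmount side: detach the mount from every iod slot under the lock. */
static void
sk_unmount_detach(struct sk_nmount *nmp)
{
	sk_lock();
	for (int i = 0; i < SK_NIOD; i++) {
		if (sk_iodmount[i] == nmp)
			sk_iodmount[i] = NULL;
	}
	sk_unlock();
}

/*
 * nfsiod side: look at the queue only if the slot still points at a
 * mount. Returns false when the mount has already been detached.
 */
static bool
sk_iod_queue_empty(int slot, bool *emptyp)
{
	bool attached;

	sk_lock();
	attached = (sk_iodmount[slot] != NULL);
	if (attached)
		*emptyp = (sk_iodmount[slot]->bufq_len == 0);
	sk_unlock();
	return (attached);
}
```

Once sk_unmount_detach() has run, sk_iod_queue_empty() returns false for
that slot without dereferencing the (possibly freed) mount structure.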


249623 18-Apr-2013 rmacklem

Both NFS clients can deadlock when using the "rdirplus" mount
option. This can occur when an nfsiod thread that already holds
a buffer lock attempts to acquire a vnode lock on an entry in
the directory (a LOR) when another thread holding the vnode lock
is waiting on an nfsiod thread. This patch avoids the deadlock by disabling
readahead for this case, so the nfsiod threads never do readdirplus.
Since readaheads for directories need the directory offset cookie
from the previous read, they cannot normally happen in parallel.
As such, testing by jhb@ and myself didn't find any performance
degradation when this patch is applied. If there is a case where
this results in a significant performance degradation, mounting
without the "rdirplus" option can be done to re-enable readahead
for directories.

Reported and tested by: jhb
Reviewed by: jhb
MFC after: 2 weeks


249596 17-Apr-2013 ken

Move the NFS FHA (File Handle Affinity) code from sys/nfsserver to
sys/nfs, since it is now shared by the two NFS servers.

Suggested by: rmacklem
Sponsored by: Spectra Logic
MFC after: 2 weeks


249592 17-Apr-2013 ken

Revamp the old NFS server's File Handle Affinity (FHA) code so that
it will work with either the old or new server.

The FHA code keeps a cache of currently active file handles for
NFSv2 and v3 requests, so that read and write requests for the same
file are directed to the same group of threads (reads) or thread
(writes). It does not currently work for NFSv4 requests. They are
more complex, and will take more work to support.

This improves read-ahead performance, especially with ZFS, if the
FHA tuning parameters are configured appropriately. Without the
FHA code, concurrent reads that are part of a sequential read from
a file will be directed to separate NFS threads. This has the
effect of confusing the ZFS zfetch (prefetch) code and makes
sequential reads significantly slower with clients like Linux that
do a lot of prefetching.

The FHA code has also been updated to direct write requests to nearby
file offsets to the same thread in the same way it batches reads,
and the FHA code will now also send writes to multiple threads when
needed.

This improves sequential write performance in ZFS, because writes
to a file are now more ordered. Since NFS writes (generally
less than 64K) are smaller than the typical ZFS record size
(usually 128K), out of order NFS writes to the same block can
trigger a read in ZFS. Sending them down the same thread increases
the odds of their being in order.

In order for multiple write threads per file in the FHA code to be
useful, writes in the NFS server have been changed to use a LK_SHARED
vnode lock, and upgrade that to LK_EXCLUSIVE if the filesystem
doesn't allow multiple writers to a file at once. ZFS is currently
the only filesystem that allows multiple writers to a file, because
it has internal file range locking. This change does not affect the
NFSv4 code.

This improves random write performance to a single file in ZFS, since
we can now have multiple writers inside ZFS at one time.

I have changed the default tuning parameters to a 22 bit (4MB)
window size (from 256K) and unlimited commands per thread as a
result of my benchmarking with ZFS.

The FHA code has been updated to allow configuring the tuning
parameters from loader tunable variables in addition to sysctl
variables. The read offset window calculation has been slightly
modified as well. Instead of having separate bins, each file
handle has a rolling window of bin_shift size. This minimizes
glitches in throughput when shifting from one bin to another.

sys/conf/files:
Add nfs_fha_new.c and nfs_fha_old.c. Compile nfs_fha.c
when either the old or the new NFS server is built.

sys/fs/nfs/nfsport.h,
sys/fs/nfs/nfs_commonport.c:
Bring in changes from Rick Macklem to newnfs_realign that
allow it to operate in blocking (M_WAITOK) or non-blocking
(M_NOWAIT) mode.

sys/fs/nfs/nfs_commonsubs.c,
sys/fs/nfs/nfs_var.h:
Bring in a change from Rick Macklem to allow telling
nfsm_dissect() whether or not to wait for mallocs.

sys/fs/nfs/nfsm_subs.h:
Bring in changes from Rick Macklem to create a new
nfsm_dissect_nonblock() inline function and
NFSM_DISSECT_NONBLOCK() macro.

sys/fs/nfs/nfs_commonkrpc.c,
sys/fs/nfsclient/nfs_clkrpc.c:
Add the malloc wait flag to a newnfs_realign() call.

sys/fs/nfsserver/nfs_nfsdkrpc.c:
Setup the new NFS server's RPC thread pool so that it will
call the FHA code.

Add the malloc flag argument to newnfs_realign().

Unstaticize newnfs_nfsv3_procid[] so that we can use it in
the FHA code.

sys/fs/nfsserver/nfs_nfsdsocket.c:
In nfsrvd_dorpc(), add NFSPROC_WRITE to the list of RPC types
that use the LK_SHARED lock type.

sys/fs/nfsserver/nfs_nfsdport.c:
In nfsd_fhtovp(), if we're starting a write, check to see
whether the underlying filesystem supports shared writes.
If not, upgrade the lock type from LK_SHARED to LK_EXCLUSIVE.

sys/nfsserver/nfs_fha.c:
Remove all code that is specific to the NFS server
implementation. Anything that is server-specific is now
accessed through a callback supplied by that server's FHA
shim in the new softc.

There are now separate sysctls and tunables for the FHA
implementations for the old and new NFS servers. The new
NFS server has its tunables under vfs.nfsd.fha, the old
NFS server's tunables are under vfs.nfsrv.fha as before.

In fha_extract_info(), use callouts for all server-specific
code. Getting file handles and offsets is now done in the
individual server's shim module.

In fha_hash_entry_choose_thread(), change the way we decide
whether two reads are in proximity to each other.
Previously, the calculation was a simple shift operation to
see whether the offsets were in the same power of 2 bucket.
The issue was that there would be a bucket (and therefore
thread) transition, even if the reads were in close
proximity. When there is a thread transition, reads wind
up going somewhat out of order, and ZFS gets confused.

The new calculation simply tries to see whether the offsets
are within 1 << bin_shift of each other. If they are, the
reads will be sent to the same thread.

The effect of this change is that for sequential reads, if
the client doesn't exceed the max_reqs_per_nfsd parameter
and the bin_shift is set to a reasonable value (22, or
4MB works well in my tests), the reads in any sequential
stream will largely be confined to a single thread.

Change fha_assign() so that it takes a softc argument. It
is now called from the individual server's shim code, which
will pass in the softc.

Change fhe_stats_sysctl() so that it takes a softc
parameter. It is now called from the individual server's
shim code. Add the current offset to the list of things
printed out about each active thread.

Change the num_reads and num_writes counters in the
fha_hash_entry structure to 32-bit values, and rename them
num_rw and num_exclusive, respectively, to reflect their
changed usage.

Add an enable sysctl and tunable that allows the user to
disable the FHA code (when vfs.XXX.fha.enable = 0). This
is useful for before/after performance comparisons.

nfs_fha.h:
Move most structure definitions out of nfs_fha.c and into
the header file, so that the individual server shims can
see them.

Change the default bin_shift to 22 (4MB) instead of 18
(256K). Allow unlimited commands per thread.

sys/nfsserver/nfs_fha_old.c,
sys/nfsserver/nfs_fha_old.h,
sys/fs/nfsserver/nfs_fha_new.c,
sys/fs/nfsserver/nfs_fha_new.h:
Add shims for the old and new NFS servers to interface with
the FHA code, and callbacks for the

The shims contain all of the code and definitions that are
specific to the NFS servers.

They setup the server-specific callbacks and set the server
name for the sysctl and loader tunable variables.

sys/nfsserver/nfs_srvkrpc.c:
Configure the RPC code to call fhaold_assign() instead of
fha_assign().

sys/modules/nfsd/Makefile:
Add nfs_fha.c and nfs_fha_new.c.

sys/modules/nfsserver/Makefile:
Add nfs_fha_old.c.

Reviewed by: rmacklem
Sponsored by: Spectra Logic
MFC after: 2 weeks
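
Note: a minimal sketch of the read proximity change described above for
fha_hash_entry_choose_thread(), not the committed code. The function names
are hypothetical, and treating "within 1 << bin_shift" as a strict
less-than comparison is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Old scheme as described: both offsets fall in the same power-of-2 bucket. */
static bool
sk_same_bucket(uint64_t a, uint64_t b, unsigned bin_shift)
{
	return ((a >> bin_shift) == (b >> bin_shift));
}

/* New scheme as described: the offsets are within 1 << bin_shift of each other. */
static bool
sk_within_window(uint64_t a, uint64_t b, unsigned bin_shift)
{
	uint64_t dist = (a > b) ? a - b : b - a;

	return (dist < ((uint64_t)1 << bin_shift));
}
```

With bin_shift = 22, two reads 64K apart that straddle a 4MB boundary land
in different buckets under the old test but are still treated as close by
the window test, which is the bucket transition the entry describes.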


249588 17-Apr-2013 gabor

- Correct spelling in comments

Submitted by: Christoph Mallon <christoph.mallon@gmx.de> (via private mail)


249583 17-Apr-2013 gabor

- Correct misspellings of the word necessary

Submitted by: Christoph Mallon <christoph.mallon@gmx.de> (via private mail)


249218 06-Apr-2013 jeff

Prepare to replace the buf splay with a trie:

- Don't insert BKGRDMARKER bufs into the splay or dirty/clean buf lists.
No consumers need to find them there and it complicates the tree.
These flags are all FFS specific and could be moved out of the buf
cache.
- Use pbgetvp() and pbrelvp() to associate the background and journal
bufs with the vp. Not only is this much cheaper it makes more sense
for these transient bufs.
- Fix the assertions in pbget* and pbrel*. It's not safe to check list
pointers which were never initialized. Use the BX flags instead. We
also check B_PAGING in reassignbuf() so this should cover all cases.

Discussed with: kib, mckusick, attilio
Sponsored by: EMC / Isilon Storage Division


248967 01-Apr-2013 kib

Strip the unneeded spaces, mostly at the end of lines.

MFC after: 3 days


248610 22-Mar-2013 pjd

- Constify local path variable for chflagsat().
- Use correct format characters (%lx) for u_long.

This fixes the build broken in r248599.


248597 21-Mar-2013 pjd

- Make 'flags' argument to chflags(2), fchflags(2) and lchflags(2) of type
u_long. Before this change it was of type int for syscalls, but prototypes
in sys/stat.h and documentation for chflags(2) and fchflags(2) (but not
for lchflags(2)) stated that it was u_long. Now some related functions
use u_long type for flags (strtofflags(3), fflagstostr(3)).
- Make path argument of type 'const char *' for consistency.

Discussed on: arch
Sponsored by: The FreeBSD Foundation


248581 21-Mar-2013 kib

Initialize the variable to avoid (false) compiler warning about
use of an uninitialized local.

Reported by: Ivan Klymenko <fidaj@ukr.net>
MFC after: 2 weeks


248567 21-Mar-2013 kib

Do not call vnode_pager_setsize() while an NFS node mutex is
locked. vnode_pager_setsize() might sleep waiting for the page after
EOF to be unbusied.

Call vnode_pager_setsize() both for the regular and directory vnodes.

Reported by: mich
Reviewed by: rmacklem
Discussed with: avg, jhb
MFC after: 2 weeks


248500 19-Mar-2013 emaste

Fix remainder calculation when biosize is not a power of 2

In common configurations biosize is a power of two, but is not required to
be so. Thanks to markj@ for spotting an additional case beyond my original
patch.

Reviewed by: rmacklem@
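
Note: a minimal sketch of why the distinction matters, not the committed
diff. Masking with biosize - 1 yields the remainder only when biosize is a
power of two, while the modulo operator is valid for any non-zero biosize.
The helper names are hypothetical, and the log does not show which
expressions the NFS client actually used.

```c
#include <stdint.h>

/* Remainder by masking: correct only if biosize is a power of 2. */
static uint64_t
sk_rem_by_mask(uint64_t off, uint64_t biosize)
{
	return (off & (biosize - 1));
}

/* Remainder by modulo: correct for any non-zero biosize. */
static uint64_t
sk_rem_by_mod(uint64_t off, uint64_t biosize)
{
	return (off % biosize);
}
```

For biosize = 32768 both helpers agree. For biosize = 24576 (not a power
of two) and off = 40000, the modulo gives 15424 while the mask gives 7232.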


248422 17-Mar-2013 kib

Remove negative name cache entry pointing to the target name, which
could be instantiated while tdvp was unlocked.

Reported by: Rick Miller <vmiller at hostileadmin com>
Tested by: pho
MFC after: 1 week


248282 14-Mar-2013 kib

Add currently unused flag argument to the cluster_read(),
cluster_write() and cluster_wbuild() functions. The flags to be
allowed are a subset of the GB_* flags for getblk().

Sponsored by: The FreeBSD Foundation
Tested by: pho


248255 13-Mar-2013 jhb

Revert 195703 and 195821 as this special stop handling in NFS is now
implemented via VFCF_SBDRY rather than passing PBDRY to individual
sleep calls.


248188 12-Mar-2013 glebius

Finish r243882: mechanically substitute flags from historic mbuf
allocator with malloc(9) flags within sys.

Sponsored by: Nginx, Inc.


248101 09-Mar-2013 davide

smbfs_lookup() in the DOTDOT case operates on dvp->n_parent without
proper locking, which does nothing to prevent reclaim of the vnode.
Avoid this by not going over the wire in this case and relying on a
subsequent smbfs_getattr() call to restore consistency.
While I'm here, change a couple of SMBVDEBUG() into MPASS().
smbfs_smb_lookup() doesn't and shouldn't know about '.' and '..'

Reported by: pho's stress2 suite


248099 09-Mar-2013 davide

- Initialize variable in smbfs_rename() to silence a compiler warning
- Fix smbfs_mkdir() return value (in case of error).

Reported by: pho


248097 09-Mar-2013 attilio

Garbage collect NWFS and NCP bits which have been completely disconnected
from the tree for a few months.

This patch is not targeted for MFC.


248084 09-Mar-2013 attilio

Switch the vm_object mutex to be a rwlock. This will enable in the
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.

The change is mostly mechanical, but a few notes are in order:
* The KPI changes as follows:
- VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
- VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
- VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
- VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
(in order to avoid visibility of implementation details)
- The read-mode operations are added:
VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* Because vm/vm_pager.h avoids namespace pollution (its consumers had to
include sys/mutex.h directly to cater for its inline functions
using VM_OBJECT_LOCK()), all the vm/vm_pager.h consumers
must now also include sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
the compat layer because the name clash between FreeBSD and solaris
versions must be avoided.
For this purpose zfs redefines the vm_object locking functions
directly, isolating the FreeBSD components in specific compat stubs.

The KPI is heavily broken by this commit. Third-party ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff
Reviewed by: pjd (ZFS specific review)
Discussed with: alc
Tested by: pho


247665 02-Mar-2013 attilio

Garbage collect NTFS bits which have been completely disconnected from
the tree for a few months.

This patch is not targeted for MFC.


247640 02-Mar-2013 attilio

Garbage collect PORTALFS bits which have been completely disconnected from
the tree for a few months.

This patch is not targeted for MFC.


247635 02-Mar-2013 attilio

Garbage collect CODAFS bits which have been completely disconnected from
the tree for a few months.

This patch is not targeted for MFC.


247628 02-Mar-2013 attilio

Garbage collect HPFS bits which have already been completely disconnected
from the tree for a few months (please note that the userland bits
were disconnected a long time ago, thus there is no need
to update the OLD* entries).

This is not targeted for MFC.


247619 02-Mar-2013 jilles

nullfs: Improve f_flags in statfs().

Include some flags of the nullfs mount itself:
MNT_RDONLY, MNT_NOEXEC, MNT_NOSUID, MNT_UNION, MNT_NOSYMFOLLOW.

This allows userland code calling statfs() or fstatfs() to see these flags.
In particular, this allows opendir() to detect that a -t nullfs -o union
mount needs deduplication (otherwise at least . and .. are returned twice)
and allows rtld to detect a -t nullfs -o noexec mount as noexec.

Turn off the MNT_ROOTFS flag from the underlying filesystem because the
nullfs mount is definitely not the root filesystem.

Reviewed by: kib
MFC after: 1 week
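
Note: a minimal sketch of the flag merge described above, not the
committed code. The bit values are placeholders rather than the real
MNT_* values from sys/mount.h, and all sk_/SK_ names are hypothetical.

```c
#include <stdint.h>

/* Placeholder bit values for illustration only. */
#define SK_MNT_RDONLY		0x01u
#define SK_MNT_NOEXEC		0x02u
#define SK_MNT_NOSUID		0x04u
#define SK_MNT_UNION		0x08u
#define SK_MNT_NOSYMFOLLOW	0x10u
#define SK_MNT_ROOTFS		0x20u

/* Flags taken from the nullfs mount itself, as listed in the entry. */
#define SK_NULLFS_OWN_FLAGS						\
	(SK_MNT_RDONLY | SK_MNT_NOEXEC | SK_MNT_NOSUID |		\
	 SK_MNT_UNION | SK_MNT_NOSYMFOLLOW)

/*
 * Combine the lower filesystem's flags (minus ROOTFS) with the
 * selected flags of the nullfs mount.
 */
static uint64_t
sk_nullfs_statfs_flags(uint64_t lower_flags, uint64_t nullfs_mnt_flags)
{
	uint64_t flags;

	flags = lower_flags & ~(uint64_t)SK_MNT_ROOTFS;
	flags |= nullfs_mnt_flags & SK_NULLFS_OWN_FLAGS;
	return (flags);
}
```

A read-only, noexec nullfs mount over a root filesystem then reports the
read-only and noexec bits and never reports the ROOTFS bit.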


247602 02-Mar-2013 pjd

Merge Capsicum overhaul:

- Capability is no longer a separate descriptor type. Now every descriptor
has its own set of capability rights.

- The cap_new(2) system call is left, but it is no longer documented and
should not be used in new code.

- The new syscall cap_rights_limit(2) should be used instead of
cap_new(2), which limits capability rights of the given descriptor
without creating a new one.

- The cap_getrights(2) syscall is renamed to cap_rights_get(2).

- If CAP_IOCTL capability right is present we can further reduce allowed
ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed
ioctls can be retrieved with cap_ioctls_get(2) syscall.

- If CAP_FCNTL capability right is present we can further reduce fcntls
that can be used with the new cap_fcntls_limit(2) syscall and retrieve
them with cap_fcntls_get(2).

- To support ioctl and fcntl white-listing the filedesc structure was
heavily modified.

- The audit subsystem, kdump and procstat tools were updated to
recognize new syscalls.

- Capability rights were revised and even though I tried hard to provide
backward API and ABI compatibility there are some incompatible changes
that are described in detail below:

CAP_CREATE old behaviour:
- Allow for openat(2)+O_CREAT.
- Allow for linkat(2).
- Allow for symlinkat(2).
CAP_CREATE new behaviour:
- Allow for openat(2)+O_CREAT.

Added CAP_LINKAT:
- Allow for linkat(2). ABI: Reuses CAP_RMDIR bit.
- Allow to be target for renameat(2).

Added CAP_SYMLINKAT:
- Allow for symlinkat(2).

Removed CAP_DELETE. Old behaviour:
- Allow for unlinkat(2) when removing non-directory object.
- Allow to be source for renameat(2).

Removed CAP_RMDIR. Old behaviour:
- Allow for unlinkat(2) when removing directory.

Added CAP_RENAMEAT:
- Required for source directory for the renameat(2) syscall.

Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR):
- Allow for unlinkat(2) on any object.
- Required if target of renameat(2) exists and will be removed by this
call.

Removed CAP_MAPEXEC.

CAP_MMAP old behaviour:
- Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and
PROT_WRITE.
CAP_MMAP new behaviour:
- Allow for mmap(2)+PROT_NONE.

Added CAP_MMAP_R:
- Allow for mmap(PROT_READ).
Added CAP_MMAP_W:
- Allow for mmap(PROT_WRITE).
Added CAP_MMAP_X:
- Allow for mmap(PROT_EXEC).
Added CAP_MMAP_RW:
- Allow for mmap(PROT_READ | PROT_WRITE).
Added CAP_MMAP_RX:
- Allow for mmap(PROT_READ | PROT_EXEC).
Added CAP_MMAP_WX:
- Allow for mmap(PROT_WRITE | PROT_EXEC).
Added CAP_MMAP_RWX:
- Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).

Renamed CAP_MKDIR to CAP_MKDIRAT.
Renamed CAP_MKFIFO to CAP_MKFIFOAT.
Renamed CAP_MKNOD to CAP_MKNODAT.

CAP_READ old behaviour:
- Allow pread(2).
- Disallow read(2), readv(2) (if there is no CAP_SEEK).
CAP_READ new behaviour:
- Allow read(2), readv(2).
- Disallow pread(2) (CAP_SEEK was also required).

CAP_WRITE old behaviour:
- Allow pwrite(2).
- Disallow write(2), writev(2) (if there is no CAP_SEEK).
CAP_WRITE new behaviour:
- Allow write(2), writev(2).
- Disallow pwrite(2) (CAP_SEEK was also required).

Added convenient defines:

#define CAP_PREAD (CAP_SEEK | CAP_READ)
#define CAP_PWRITE (CAP_SEEK | CAP_WRITE)
#define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ)
#define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE)
#define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
#define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W)
#define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X)
#define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X)
#define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
#define CAP_RECV CAP_READ
#define CAP_SEND CAP_WRITE

#define CAP_SOCK_CLIENT \
(CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
#define CAP_SOCK_SERVER \
(CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
CAP_SETSOCKOPT | CAP_SHUTDOWN)

Added defines for backward API compatibility:

#define CAP_MAPEXEC CAP_MMAP_X
#define CAP_DELETE CAP_UNLINKAT
#define CAP_MKDIR CAP_MKDIRAT
#define CAP_RMDIR CAP_UNLINKAT
#define CAP_MKFIFO CAP_MKFIFOAT
#define CAP_MKNOD CAP_MKNODAT
#define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER)

Sponsored by: The FreeBSD Foundation
Reviewed by: Christoph Mallon <christoph.mallon@gmx.de>
Many aspects discussed with: rwatson, benl, jonathan
ABI compatibility discussed with: kib
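
Note: a minimal sketch of how composite rights such as CAP_PREAD interact
with the new CAP_READ behaviour listed above. The bit values are
placeholders rather than the real ones from sys/capability.h, and
sk_rights_allow() is a hypothetical helper, not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for illustration only. */
#define SK_CAP_READ	0x1ULL
#define SK_CAP_WRITE	0x2ULL
#define SK_CAP_SEEK	0x4ULL

/* Composite rights, mirroring the shape of the defines in the entry. */
#define SK_CAP_PREAD	(SK_CAP_SEEK | SK_CAP_READ)
#define SK_CAP_PWRITE	(SK_CAP_SEEK | SK_CAP_WRITE)

/* An operation is allowed only if every required bit is held. */
static bool
sk_rights_allow(uint64_t have, uint64_t need)
{
	return ((have & need) == need);
}
```

A descriptor limited to the read bit alone passes the check for read(2)
but fails the check for pread(2), which also needs the seek bit.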


247312 26-Feb-2013 alc

Eliminate a duplicate #include.

Sponsored by: EMC / Isilon Storage Division


247297 26-Feb-2013 attilio

Merge from vmobj-rwlock branch:
Remove unused inclusion of vm/vm_pager.h and vm/vnode_pager.h.

Sponsored by: EMC / Isilon storage division
Tested by: pho
Reviewed by: alc


247116 21-Feb-2013 jhb

Further refine the handling of stop signals in the NFS client. The
changes in r246417 were incomplete as they did not add explicit calls to
sigdeferstop() around all the places that previously passed SBDRY to
_sleep(). In addition, nfs_getcacheblk() could trigger a write RPC from
getblk() resulting in sigdeferstop() recursing. Rather than manually
deferring stop signals in specific places, change the VFS_*() and VOP_*()
methods to defer stop signals for filesystems which request this behavior
via a new VFCF_SBDRY flag. Note that this has to be a VFC flag rather than
a MNTK flag so that it works properly with VFS_MOUNT() when the mount is
not yet fully constructed. For now, only the NFS clients set this new
flag in VFS_SET().

A few other related changes:
- Add an assertion to ensure that TDF_SBDRY doesn't leak to userland.
- When a lookup request uses VOP_READLINK() to follow a symlink, mark
the request as being on behalf of the thread performing the lookup
(cnp_thread) rather than using a NULL thread pointer. This causes
NFS to properly handle signals during this VOP on an interruptible
mount.

PR: kern/176179
Reported by: Russell Cattelan (sigdeferstop() recursion)
Reviewed by: kib
MFC after: 1 month


247072 21-Feb-2013 imp

The request queue is already locked, so we don't need the splsoftclock/splx
here to note future work.


246921 17-Feb-2013 kib

Do not update the fsinfo block on each update of any fat block, this
is excessive. Postpone the flush of the fsinfo to VFS_SYNC(),
remembering the need for update with the flag MSDOSFS_FSIMOD, stored
in pm_flags.

FAT32 specification describes both FSI_Free_Count and FSI_Nxt_Free as
the advisory hints, not requiring them to be correct.

Based on the patch from bde, modified by me.

Reviewed by: bde
MFC after: 2 weeks
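
The deferred-flush idea described above can be sketched as a plain dirty-flag pattern. Everything named ex_* or EX_* here is hypothetical (including the flag value); the actual change uses MSDOSFS_FSIMOD in pm_flags and flushes from VFS_SYNC().

```c
#include <stdint.h>

#define EX_FSIMOD 0x01u	/* stand-in for MSDOSFS_FSIMOD; value is made up */

struct ex_mount {
	uint32_t flags;		/* plays the role of pm_flags */
	unsigned fsinfo_writes;	/* counts simulated fsinfo flushes */
};

/* A FAT block changed: only remember that fsinfo is stale. */
static void
ex_fat_updated(struct ex_mount *mp)
{
	mp->flags |= EX_FSIMOD;
}

/* Sync time: write fsinfo once if needed, then clear the flag. */
static void
ex_sync(struct ex_mount *mp)
{
	if ((mp->flags & EX_FSIMOD) != 0) {
		mp->fsinfo_writes++;	/* the real code would write the block */
		mp->flags &= ~EX_FSIMOD;
	}
}
```

Any number of FAT updates between two syncs then costs a single fsinfo write in this model.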


246793 14-Feb-2013 bapt

Revert r246791 as it needs a security review first

Reported by: gavin, rwatson


246791 14-Feb-2013 bapt

Allow fdescfs to be mounted from inside a jail

MFC after: 1 week


246634 10-Feb-2013 pfg

ext2fs: Use prototype declarations for function definitions

Submitted by: Christoph Mallon
MFC after: 2 weeks


246603 10-Feb-2013 attilio

Remove racy checks on resident and cached pages for
tmpfs_mapped{read, write}() functions:
- tmpfs_mapped{read, write}() are only called within VOP_{READ, WRITE}(),
which check before-hand to work only on valid VREG vnodes. Also the
vnode is locked for the duration of the work, making vnode reclaiming
impossible, during the operation. Hence, vobj can never be NULL.
- Currently check on resident pages and cached pages without vm object
lock held is racy and can do even more harm than good, as a page could
be transitioning between these 2 pools and then be skipped entirely.
Skip the checks as lookups on empty splay trees are very cheap.

Discussed with: alc
Tested by: flo
MFC after: 2 weeks


246564 08-Feb-2013 pfg

ext2fs: Replace redundant EXT2_MIN_BLOCK with EXT2_MIN_BLOCK_SIZE.

Submitted by: Christoph Mallon
MFC after: 2 weeks


246563 08-Feb-2013 pfg

ext2fs: make e2fs_maxcontig local and remove tautological check.

e2fs_maxcontig was modelled after UFS when bringing the
"Orlov allocator" to ext2. On UFS fs_maxcontig is kept in the
superblock and is used by userland tools (fsck and growfs).

In ext2 this information is volatile so it is not available
for userland tools, so in this case it doesn't make sense
to carry it in the in-memory superblock.

Also remove a pointless check for MAX(1, x) > 0.

Submitted by: Christoph Mallon
MFC after: 2 weeks


246562 08-Feb-2013 pfg

Remove unused MAXSYMLINKLEN macro.

Reviewed by: mckusick
PR: kern/175794
MFC after: 1 week


246472 07-Feb-2013 kib

Stop translating the ERESTART error from the open(2) into EINTR.
Posix requires that open(2) is restartable for SA_RESTART.

For non-posix objects, in particular, devfs nodes, still disable
automatic restart of the opens. The open call to a driver could have
significant side effects for the hardware.

Noted and reviewed by: jilles
Discussed with: bde
MFC after: 2 weeks


246417 06-Feb-2013 jhb

Rework the handling of stop signals in the NFS client. The changes in
195702, 195703, and 195821 prevented a thread from suspending while holding
locks inside of NFS by forcing the thread to fail sleeps with EINTR or
ERESTART but defer the thread suspension to the user boundary. However,
this had the effect that stopping a process during an NFS request could
abort the request and trigger EINTR errors that were visible to userland
processes (previously the thread would have suspended and completed the
request once it was resumed).

This change instead effectively masks stop signals while in the NFS client.
It uses the existing TDF_SBDRY flag to effect this since SIGSTOP cannot
be masked directly. Also, instead of setting PBDRY on individual sleeps,
the NFS client now sets the TDF_SBDRY flag around each NFS request and
stop signals are masked for all sleeps during that region (the previous
change missed sleeps in lockmgr locks). The end result is that stop
signals sent to threads performing an NFS request are completely
ignored until after the NFS request has finished processing and the
thread prepares to return to userland. This restores the behavior of
stop signals being transparent to userland processes while still
preventing threads from suspending while holding NFS locks.

Reviewed by: kib
MFC after: 1 month


246352 05-Feb-2013 pfg

ext2fs: move assignment where it is not dead.

Submitted by: Christoph Mallon
MFC after: 2 weeks


246351 05-Feb-2013 pfg

ext2fs: Remove unused em_e2fsb definition.

Submitted by: Christoph Mallon
MFC after: 2 weeks


246350 05-Feb-2013 pfg

ext2fs: Remove useless rootino local variable.

Submitted by: Christoph Mallon
MFC after: 2 weeks


246349 05-Feb-2013 pfg

ext2fs: Correct off-by-one errors in FFTODT() and DDTOFT().

Submitted by: Christoph Mallon
MFC after: 2 weeks


246348 05-Feb-2013 pfg

ext2fs: Use nitems().

Submitted by: Christoph Mallon
MFC after: 2 weeks
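
A small sketch of the idiom, using a local EX_NITEMS macro with the same shape as nitems() from sys/param.h so that it builds outside FreeBSD. The table and function are illustrative only.

```c
#include <stddef.h>

/* Same shape as the nitems() macro in FreeBSD's sys/param.h. */
#define EX_NITEMS(x) (sizeof((x)) / sizeof((x)[0]))

static const int ex_table[] = { 1, 2, 3, 5, 8 };

/* Sum the table without hard-coding its length. */
static int
ex_sum_table(void)
{
	int sum = 0;

	for (size_t i = 0; i < EX_NITEMS(ex_table); i++)
		sum += ex_table[i];
	return (sum);
}
```

The loop bound follows the array definition automatically, so adding an entry to the table cannot leave a stale element count behind.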


246347 05-Feb-2013 pfg

ext2fs: Use EXT2_LINK_MAX instead of LINK_MAX

Submitted by: Christoph Mallon
MFC after: 2 weeks


246258 02-Feb-2013 pfg

ext2fs: general cleanup.

- Remove unused extern declarations in fs.h
- Correct comments in ext2_dir.h
- Several panic() messages showed wrong function names.
- Remove commented out stray line in ext2_alloc.c.
- Remove the unused macro EXT2_BLOCK_SIZE_BITS() and the then
write-only member e2fs_blocksize_bits from struct m_ext2fs.
- Remove the unused macro EXT2_FIRST_INO() and the then write-only
member e2fs_first_inode from struct m_ext2fs.
- Remove EXT2_DESC_PER_BLOCK() and the member e2fs_descpb from
struct m_ext2fs.
- Remove the unused members e2fs_bmask, e2fs_dbpg and
e2fs_mount_opt from struct m_ext2fs
- Correct harmless off-by-one error for fspath in ext2_vfsops.c.
- Remove the unused and broken macros EXT2_ADDR_PER_BLOCK_BITS()
and EXT2_DESC_PER_BLOCK_BITS().
- Remove the !_KERNEL versions of the EXT2_* macros.

Submitted by: Christoph Mallon
MFC after: 2 weeks


246219 01-Feb-2013 kib

The MSDOSFSMNT_WAITONFAT flag is bogus and broken. It does less than
track the MNT_SYNCHRONOUS flag. It is set to the latter at mount time
but not updated by MNT_UPDATE.

Use MNT_SYNCHRONOUS to decide to write the FAT updates synchronously.

Submitted by: bde
MFC after: 1 week


246218 01-Feb-2013 kib

Backup FATs were sometimes marked dirty by copying their first block
from the primary FAT, and then they were not marked clean on unmount.
Force marking them clean when appropriate.

Submitted by: bde
MFC after: 1 week


246217 01-Feb-2013 kib

The directory entry for dotdot was corrupted in the FAT32 case when moving
a directory to a subdir of the root directory from somewhere else.

For all directory moves that change the parent directory, the dotdot
entry must be fixed up. For msdosfs, the root directory is magic for
non-FAT32. It is less magic for FAT32, but needs the same magic for
the dotdot fixup. It didn't have it.

Both chkdsk and fsck_msdosfs fix the corrupt directory entries with no
problems.

The fix is to use the same magic for dotdot in msdosfs_rename() as in
msdosfs_mkdir().

For msdosfs_mkdir(), document the magic. When writing the dotdot entry
in mkdir, use an explicitly set pcl variable instead of relying on the
start cluster of the root directory typically having a value < 65536.

Submitted by: bde
MFC after: 1 week


246216 01-Feb-2013 kib

The mountmsdosfs() function had an insane sanity test, remove it.

Trying FAT32 on a small partition failed to mount because
pmp->pm_Sectors was nonzero. Normally, FAT32 file systems are so
large that the 16-bit pm_Sectors can't hold the size. This is
indicated by setting it to 0 and using only pm_HugeSectors. But at
least old versions of newfs_msdos use the 16-bit field if possible,
and msdosfs supports this except for breaking its own support in the
sanity check. This is quite different from the handling of pm_FATsecs
-- now the 16-bit value is always ignored for FAT32 except for
checking that it is 0, and newfs_msdos doesn't use the 16-bit value
for FAT32.

Submitted by: bde
MFC after: 1 week
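
The two-field size convention described above can be sketched as follows. The helper name and its parameters are hypothetical; they stand in for the 16-bit pm_Sectors and the 32-bit pm_HugeSectors values mentioned in the commit message.

```c
#include <stdint.h>

/*
 * Hypothetical helper: pick the volume size from the two BPB fields.
 * A non-zero 16-bit count wins; zero means "look at the 32-bit field".
 */
static uint32_t
ex_total_sectors(uint16_t sectors16, uint32_t huge_sectors)
{
	return (sectors16 != 0 ? sectors16 : huge_sectors);
}
```

A small FAT32 volume whose size fits in 16 bits is then handled the same way as a large one, instead of being rejected.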


246215 01-Feb-2013 kib

Fix a backwards comment in markvoldirty().

Submitted by: bde
MFC after: 1 week


246213 01-Feb-2013 kib

Assert that the mbuf in the chain has sane length. The proper place for
this check is somewhere in the network code, but this assertion has
already proven useful in catching what seem to be driver bugs that
cause NFS to scramble random memory.

Discussed with: rmacklem
MFC after: 1 week


245977 27-Jan-2013 kib

Be conservative and do not try to consume more bytes than were
requested from the server for the read operation. The server should not
reply with too large a size, but the client should be resilient too.

Reviewed by: rmacklem
MFC after: 1 week
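
A minimal sketch of the defensive clamp, assuming a hypothetical helper that receives the length the client asked for and the length the server claims to have returned. It is not the code from the NFS client.

```c
#include <stddef.h>

/* Hypothetical helper: never consume more than the caller asked for. */
static size_t
ex_clamp_reply_len(size_t requested, size_t reported)
{
	return (reported > requested ? requested : reported);
}
```

A short read passes through unchanged, while an oversized reply is cut down to the requested length.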


245952 26-Jan-2013 pfg

Clean some 'svn:executable' properties in the tree.

Submitted by: Christoph Mallon
MFC after: 3 days


245950 26-Jan-2013 pfg

Cosmetic off-by-one

Technically, the case when all the blocks are released
is not a sanity check.
Move the comment further while here.

Suggested by: bde
MFC after: 3 days


245909 25-Jan-2013 jhb

Further cleanups to use of timestamps in NFS:
- Use NFSD_MONOSEC (which maps to time_uptime) instead of the seconds
portion of wall-time stamps to manage timeouts on events.
- Remove unused nd_starttime from the per-request structure in the new
NFS server.
- Use nanotime() for the modification time on a delegation to get as
precise a time as possible.
- Use time_second instead of extracting the second from a call to
getmicrotime().

Submitted by: bde (3)
Reviewed by: bde, rmacklem
MFC after: 2 weeks


245844 23-Jan-2013 pfg

ext2fs: fix a check for negative block numbers.

The previous change accidentally left the subtraction we
were trying to avoid in case that i_blocks could become
negative.

Reported by: bde
MFC after: 4 days


245820 22-Jan-2013 pfg

ext2fs: make some inode fields match the ext2 spec.

Ext2fs uses unsigned fields in its dinode struct.
FreeBSD can have negative values in some of those
fields and the inode is meant to interact with the
system so we have never respected the unsigned
nature of most of those fields.

Block numbers and the NFS generation number do
not need to be signed so redefine them as
unsigned to better match the on-disk information.

MFC after: 1 week


245817 22-Jan-2013 pfg

ext2fs: temporarily disable the reallocation code.

Testing with fsx has revealed problems and in order to
hunt the bugs properly we need reduce the complexity.

This seems to help but is not a complete solution.

MFC after: 3 days


245613 18-Jan-2013 delphij

Make it possible to force async at server side on new NFS server, similar
to the old one's nfs.nfsrv.async.

Please note that by enabling this option (default is disabled), the system
could potentially have silent data corruption if the server crashes
before a write is committed to non-volatile storage, as the client side has
no way to tell if the data is already written.

Submitted by: rmacklem
MFC after: 2 weeks


245612 18-Jan-2013 pfg

ext2fs: Add some DOINGASYNC check to match ffs.

This is mostly cosmetic.

Reviewed by: bde
MFC after: 3 days


245611 18-Jan-2013 jhb

Use vfs_timestamp() to set file timestamps rather than invoking
getmicrotime() or getnanotime() directly in NFS.

Reviewed by: rmacklem, bde
MFC after: 1 week


245566 17-Jan-2013 jhb

Remove a no-longer-used variable after the previous change to use
VA_UTIMES_NULL.

Submitted by: bde, rmacklem
MFC after: 1 week


245508 16-Jan-2013 jhb

Use the VA_UTIMES_NULL flag to detect when NULL was passed to utimes()
instead of comparing the desired time against the current time as a
heuristic.

Reviewed by: rmacklem
MFC after: 1 week


245495 16-Jan-2013 kib

Remove the filtering of the acceptable mount options for nullfs, added
in r245004. Although the report was for noatime option which is
non-functional for the nullfs, other standard options like nosuid or
noexec are useful with it.

Reported by: Dewayne Geraghty <dewayne.geraghty@heuristicsystems.com.au>
MFC after: 3 days


245476 15-Jan-2013 jhb

- More properly handle interrupted NFS requests on an interruptible mount
by returning an error of EINTR rather than EACCES.
- While here, bring back some (but not all) of the NFS RPC statistics lost
when krpc was committed.

Reviewed by: rmacklem
MFC after: 1 week


245408 14-Jan-2013 kib

The current default size of the nullfs hash table used to look up the
existing nullfs vnode by the lower vnode is only 16 slots. Since the
default mode for the nullfs is to cache the vnodes, the hash has
extremely long chains.

Size the nullfs hashtbl based on the current value of
desiredvnodes. Use vfs_hash_index() to calculate the hash bucket for a
given vnode.

Pointy hat to: kib
Diagnosed and reviewed by: peter
Tested by: peter, pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 5 days


245262 10-Jan-2013 kib

When nullfs mount is forcibly unmounted and nullfs vnode is reclaimed,
get back the leased write reference from the lower vnode. There is no
other path which can correct v_writecount on the lowervp.

Reported by: flo
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 days


245164 08-Jan-2013 bapt

Add support for IO_APPEND flag in fuse
This makes open(..., O_APPEND) actually work on fuse filesystems.

Reviewed by: attilio


245121 07-Jan-2013 pfg

ext2fs: clean up the dinode structure.

It was plagued with style errors and the offsets had been lost.
While here took the time to update the fields according to the
latest ext4 documentation.

Reviewed by: bde
MFC after: 3 days


245115 06-Jan-2013 gleb

tmpfs: Replace directory entry linked list with RB-Tree.

Use file name hash as a tree key, handle duplicate keys. Both VOP_LOOKUP
and VOP_READDIR operations utilize same tree for search. Directory
entry offset (cookie) is either file name hash or incremental id in case
of hash collisions (duplicate-cookies). Keep sorted per directory list
of duplicate-cookie entries to facilitate cookie number allocation.

Don't fail if the previous VOP_READDIR() offset is no longer valid; start
with the next dirent instead. Other file systems handle it similarly.

Work around the race-prone update of the tn_readdir_last[pn] fields.

Add tmpfs_dir_destroy() to free all dirents.

Set NFS cookies in tmpfs_dir_getdents(). Return EJUSTRETURN from
tmpfs_dir_getdents() instead of hard coded -1.

Mark directory traversal routines static as they are no longer
used outside of tmpfs_subr.c


245033 04-Jan-2013 kib

Fix reversed condition in the assertion.

Pointy hat to: kib
MFC after: 13 days


245004 03-Jan-2013 kib

Add the "nocache" nullfs mount option, which disables the caching of
the free nullfs vnodes, switching nullfs behaviour to pre-r240285.
The option is mostly intended as the last-resort when higher pressure
on the vnode cache due to doubling of the vnode counts is not
desirable.

Note that disabling the cache costs more than 2x wall time in the
metadata-hungry scenarios. The default is "cache".

Tested and benchmarked by: pho (previous version)
MFC after: 2 weeks


245000 03-Jan-2013 kib

Remove the last use of the deprecated MNT_VNODE_FOREACH interface in
the tree.

With the help from: mjg
Tested by: Ronald Klop <ronald-freebsd8@klop.yi.org>
MFC after: 2 weeks


244643 23-Dec-2012 kib

Do not force a writer to the devfs file to drain the buffer writes.

Requested and tested by: Ian Lepore <freebsd@damnhippie.dyndns.org>
MFC after: 2 weeks


244475 20-Dec-2012 pfg

More constant renaming in preparation for newer features.

We also try to make better use of the fs flags instead of
trying to adapt the code according to the fs structures. In
the case of subsecond timestamps and birthtime we now
check that the feature is explicitly enabled: previously
we only checked that the reserved space was available and
silently wrote them.

This approach is much safer, especially if the filesystem
happens to use embedded inodes or support EAs.

Discussed with: Zheng Liu
MFC after: 5 days


244056 09-Dec-2012 rmacklem

Add "nfsstat -m" support for the two new NFS mount options
added by r244042.


244042 08-Dec-2012 rmacklem

Move the NFSv4.1 client patches over from projects/nfsv4.1-client
to head. I don't think the NFS client behaviour will change unless
the new "minorversion=1" mount option is used. It includes basic
NFSv4.1 support plus support for pNFS using the Files Layout only.
All problems detected during an NFSv4.1 Bakeathon testing event
in June 2012 have been resolved in this code and it has been tested
against the NFSv4.1 server available to me.
Although not reviewed, I believe that kib@ has looked at it.


243882 05-Dec-2012 glebius

Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually


243782 02-Dec-2012 rmacklem

Add an nfssvc() option to the kernel for the new NFS client
which dumps out the actual options being used by an NFS mount.
This will be used to implement a "-m" option for nfsstat(1).

Reviewed by: alfred
MFC after: 2 weeks


243652 28-Nov-2012 pfg

Update some definitions or make them match NetBSD's headers.

Bring several definitions required for newer ext4 features.

Rename EXT2F_COMPAT_HTREE to EXT2F_COMPAT_DIRHASHINDEX since it
is not being used yet and the new name is more compatible with
NetBSD and Linux.

This change is purely cosmetic and has no effect on the real
code.

Obtained from: NetBSD
MFC after: 3 days


243641 28-Nov-2012 pfg

Partially bring r242520 to ext2fs.

When a file is first being written, the dynamic block reallocation
(implemented by ext2_reallocblks) relocates the file's blocks
so as to cluster them together into a contiguous set of blocks on
the disk.

When the cluster crosses the boundary into the first indirect block,
the first indirect block is initially allocated in a position
immediately following the last direct block. Block reallocation
would usually destroy locality by moving the indirect block out of
the way to keep the data blocks contiguous.

The issue was diagnosed long ago by Bruce Evans on ffs and surfaced
on ext2fs when block reallocation was ported. This is only a partial
solution based on the similarities with FFS. We still require more
review of the allocation details that vary in ext2fs.

Reported by: bde
MFC after: 1 week


243548 26-Nov-2012 davide

- smbfs_rename() might return an error value without correctly upgrading
the vnode use count, and this might cause the kernel to panic if compiled
with WITNESS enabled.
- Be sure to put the '\0' terminator to the rpath string.

Sponsored by: iXsystems inc.


243397 22-Nov-2012 davide

- Remove reset of vpp pointer in some places since it's not really
useful and has the side effect of obfuscating the code a bit.
- Remove spurious references to simple_lock.

Reported by: attilio [1]
Sponsored by: iXsystems inc.


243396 22-Nov-2012 davide

Until now, smbfs_fullpath() computed the full path starting from the
vnode and following back the chain of n_parent pointers up to the root,
without acquiring the locks of the n_parent vnodes analyzed during the
computation. This is immediately wrong because if the vnode lock is not
held there's no guarantee on the validity of the vnode pointer or the data.
In order to fix, store the whole path in the smbnode structure so that
smbfs_fullpath() can use this information.

Discussed with: kib
Reported and tested by: pho
Sponsored by: iXsystems inc.


243340 20-Nov-2012 kib

Remove the check and panic for an impossible condition. The NULL
lowervp vnode v_vnlock would cause panic due to NULL pointer
dereference much earlier.

MFC after: 1 week


243311 19-Nov-2012 attilio

r16312 has not been relevant for many years (likely since VFS
received granular locking), but the comment present in UFS has been
incorrectly copied into other filesystems' code several times.

Remove comments that make no sense now.

Reviewed by: kib
MFC after: 3 days


243142 16-Nov-2012 kib

In pget(9), if PGET_NOTWEXIT flag is not specified, also search the
zombie list for the pid. This allows several kern.proc sysctls to
report useful information for zombies.

Hold the allproc_lock around all searches instead of relocking it.
Remove private pfind_locked() from the new nfs client code.

Requested and reviewed by: pjd
Tested by: pho
MFC after: 3 weeks


243039 14-Nov-2012 kib

Remove M_USE_RESERVE from the devfs cdp allocator, which is one of two
uses of M_USE_RESERVE in the kernel. This allocation is not special.

Reviewed by: alc
Tested by: pho
MFC after: 2 weeks


243038 14-Nov-2012 davide

Get rid of some old debug code. It provides checks similar to the one
offered by RedZone so there's no need to keep it.

Sponsored by: iXsystems inc.


243033 14-Nov-2012 davide

Fix the lookup in the DOTDOT case in the same way as other filesystems do,
i.e. inlining the vn_vget_ino() algorithm.

Sponsored by: iXsystems inc.


242875 10-Nov-2012 attilio

- Protect mnt_data and mnt_flags under the mount interlock
- Move mp->mnt_stat manipulation to where all of it happens

Reported by: davide
Discussed with: kib
Tested by: flo
MFC after: 2 months
X-MFC: 241519, 242536,242616, 242727


242833 09-Nov-2012 attilio

Complete MPSAFE VFS interface and remove MNTK_MPSAFE flag.
Porters should refer to __FreeBSD_version 1000021 for this change as
it may have happened at the same timeframe.


242727 08-Nov-2012 attilio

- Current caching mode is completely broken because it simply relies
on timing of the operations and not real lookup, bringing too many
false positives. Remove the whole mechanism. If it needs to be
implemented, next time it should really be done in the proper way.
- Fix VOP_GETATTR() in order to cope with userland bugs that would
change the type of file and not panic. Instead, it treats the entry as
if it did not exist.

Reported and tested by: flo
MFC after: 2 months
X-MFC: 241519, 242536,242616


242616 05-Nov-2012 attilio

fuse_io* must be able to crunch also VDIR vnodes.
Update assert appropriately.

Reported and Tested by: flo
MFC after: 2 months
X-MFC: 241519,242536


242536 03-Nov-2012 attilio

Fix a bug where operations were carried on even if not implemented,
leading to handling of an invalid fdip object.

Reported and tested by: flo
MFC after: 2 months
X-MFC: 241519


242476 02-Nov-2012 kib

The r241025 fixed the case when a binary, executed from nullfs mount,
was still possible to open for write from the lower filesystem. There
is a symmetric situation where the binary could already have file
descriptors opened for write, but it can be executed from the nullfs
overlay.

Handle the issue by passing one v_writecount reference to the lower
vnode if nullfs vnode has non-zero v_writecount. Note that only one
write reference can be donated, since nullfs only keeps one use
reference on the lower vnode. Always use the lower vnode v_writecount
for the checks.

Introduce the VOP_GET_WRITECOUNT to read v_writecount, which is
currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT
to manipulate the v_writecount value, which manages a single bypass
reference to the lower vnode. Calling the VOPs instead of directly
accessing v_writecount provides the fix described in the previous
paragraph.

Tested by: pho
MFC after: 3 weeks


242387 31-Oct-2012 davide

- Do not put in the mntqueue half-constructed vnodes.
- Change the code so that it relies on vfs_hash rather than on a
home-made hashtable.
- There's no need to inline fnv_32_buf().

Reviewed by: delphij
Tested by: pho
Sponsored by: iXsystems inc.


242386 31-Oct-2012 davide

Fix panic due to page faults while in kernel mode, under conditions of
VM pressure. The reason is that in some codepaths pointers to stack
variables were passed from one thread to another.

In collaboration with: pho
Reported by: pho's stress2 suite
Sponsored by: iXsystems inc.


242384 31-Oct-2012 davide

Change the code to use %jd as printf() placeholder for uio_offset and
cast to intmax_t.

Suggested by: pjd
Sponsored by: iXsystems inc.
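
The pattern named in this commit can be sketched in userland as below. The helper is hypothetical; the point is only the combination of the %jd conversion with an explicit cast to intmax_t, which keeps the format correct whatever the width of off_t.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Format an off_t portably: cast to intmax_t and print with %jd. */
static int
ex_format_offset(char *buf, size_t len, off_t offset)
{
	return (snprintf(buf, len, "%jd", (intmax_t)offset));
}
```

Without the cast, the argument type would depend on the platform's off_t and could disagree with the conversion specifier.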


242097 25-Oct-2012 davide

Fix build in case we have SMBVDEBUG turned on.

Reviewed by: gnn
Approved by: gnn
Sponsored by: iXsystems inc.


242092 25-Oct-2012 davide

- Remove the references to the deprecated zalloc kernel interface
- Use M_ZERO flag in malloc() rather than bzero()
- malloc() with M_WAITOK can't return NULL so there's no need to check

Reviewed by: alc
Approved by: alc


241896 22-Oct-2012 kib

Remove the support for using non-mpsafe filesystem modules.

In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by: attilio
Tested by: pho


241844 22-Oct-2012 eadler

remove duplicate semicolons where possible.

Approved by: cperciva
MFC after: 1 week


241702 18-Oct-2012 ed

Remove unneeded D_NEEDMINOR.

This is only needed when using clonelists. This got removed in r238693.


241561 14-Oct-2012 rmacklem

Add two new options to the nfssvc(2) syscall that allow
processes running as root to suspend/resume execution
of the kernel nfsd threads. An earlier version of this
patch was tested by Vincent Hoffman (vince at unsane.co.uk)
and John Hickey (jh at deterlab.net).

Reviewed by: kib
MFC after: 2 weeks


241554 14-Oct-2012 kib

Grammar fixes.

Submitted by: bf
MFC after: 1 week


241548 14-Oct-2012 kib

Replace the XXX comment with the proper description.

MFC after: 1 week


241521 14-Oct-2012 attilio

Rename s/DEBUG()/FS_DEBUG() and s/DEBUG2G()/FS_DEBUG2G() in order to
avoid a name clash in sparc64.

MFC after: 2 months
X-MFC: r241519


241519 14-Oct-2012 attilio

Import a FreeBSD port of the FUSE Linux module.
This was developed during two Summer of Code projects and was recently
revived by gnn.
The functionality in this commit entirely mirrors the content of the
fusefs-kmod port, which no longer needs to be installed on -CURRENT setups.

In order to get some sparse technical notes, please refer to:
http://lists.freebsd.org/pipermail/freebsd-fs/2012-March/013876.html

or to the project branch:
svn://svn.freebsd.org/base/projects/fuse/

which also contains the granular history of changes made during port
refinements. This commit does not come from the branch reintegration
itself because svn does not seem to behave properly for this functionality
at the moment.

Partly Sponsored by: Google, Summer of Code program 2005, 2011
Originally submitted by: ilya, Csaba Henk <csaba-ml AT creo DOT hu >
In collaboration with: pho
Tested by: flo, gnn, Gustau Perez,
Kevin Oberman <rkoberman AT gmail DOT com>
MFC after: 2 months


241025 28-Sep-2012 kib

Fix the mis-handling of the VV_TEXT on the nullfs vnodes.

If you have a binary on a filesystem which is also mounted over by
nullfs, you could execute the binary from the lower filesystem, or
from the nullfs mount. When executed from lower filesystem, the lower
vnode gets VV_TEXT flag set, and the file cannot be modified while the
binary is active. But, if executed as the nullfs alias, only the
nullfs vnode gets VV_TEXT set, and you still can open the lower vnode
for write.

Add a set of VOPs for the VV_TEXT query, set and clear operations,
which are correctly bypassed to lower vnode.

Tested by: pho (previous version)
MFC after: 2 weeks


241011 27-Sep-2012 mdf

Fix up kernel sources to be ready for a 64-bit ino_t.

Original code by: Gleb Kurtsou


240720 20-Sep-2012 rmacklem

Modify the NFSv4 client so that it can handle owner
and owner_group strings that consist entirely of
digits, interpreting them as the uid/gid number.
This change was needed since new (>= 3.3) Linux
servers reply with these strings by default.
This change is mandated by the rfc3530bis draft.
Reported on freebsd-stable@ under the Subject
heading "Problem with Linux >= 3.3 as NFSv4 server"
by Norbert Aschendorff on Aug. 20, 2012.

Tested by: norbert.aschendorff at yahoo.de
Reviewed by: jhb
MFC after: 2 weeks
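
The rule described above can be sketched as a small parser. The function below is hypothetical and is not the code from the NFSv4 client; it only illustrates treating an all-digit owner string as a numeric id and rejecting everything else so that a caller could fall back to a name lookup.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: if `owner` is made up entirely of decimal digits,
 * store its value in *idp and return true; otherwise return false so the
 * caller can fall back to a name lookup.
 */
static bool
ex_owner_to_id(const char *owner, uint32_t *idp)
{
	unsigned long long val;
	const char *p;

	if (*owner == '\0')
		return (false);
	for (p = owner; *p != '\0'; p++) {
		if (!isdigit((unsigned char)*p))
			return (false);
	}
	errno = 0;
	val = strtoull(owner, NULL, 10);
	if (errno != 0 || val > UINT32_MAX)
		return (false);
	*idp = (uint32_t)val;
	return (true);
}
```

In this sketch a string such as "1001" yields the id 1001, while a string containing any non-digit character, an empty string, or a value that does not fit in 32 bits is rejected.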


240539 15-Sep-2012 ed

Prefer __containerof() above member2struct().

The first does proper checking of the argument types, while the latter
does not.
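
To illustrate what such a macro does, here is a simplified stand-in written with offsetof(). EX_CONTAINEROF, struct ex_node and ex_node_from_link() are invented for this sketch; the real __containerof() additionally type-checks the member pointer, which this simplified version does not.

```c
#include <stddef.h>

/*
 * Simplified stand-in for __containerof(); the real macro in
 * sys/cdefs.h additionally type-checks the member pointer.
 */
#define EX_CONTAINEROF(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct ex_node {
	int id;
	int link;	/* embedded member we hold a pointer to */
};

/* Recover the enclosing ex_node from a pointer to its `link` field. */
static struct ex_node *
ex_node_from_link(int *linkp)
{
	return (EX_CONTAINEROF(linkp, struct ex_node, link));
}
```

Given a pointer to the embedded member, the macro subtracts the member's offset to arrive at the start of the enclosing structure.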


240464 13-Sep-2012 kib

The deadfs VOPs for vop_ioctl and vop_bmap call themselves recursively,
which is an elaborate way to cause a kernel panic. Change the VOP
implementations to return EBADF for a reclaimed vnode.

While the calls to vop_bmap should not reach deadfs, it is indeed
possible for vop_ioctl, because the VOP locking protocol is to pass
the vnode to VOP unlocked. The actual panic was observed when ioctl
was called on procfs filedescriptor which pointed to an exited
process.

Reported by: zont
Tested by: pho
MFC after: 1 week


240379 12-Sep-2012 kevlo

Add VFCF_READONLY flag that indicates ntfs and xfs file systems are
only supported as read-only.


240358 11-Sep-2012 kevlo

Prevent nump NULL pointer dereference in bmap_getlbns()


240355 11-Sep-2012 kevlo

Fix style nit


240289 09-Sep-2012 rmacklem

Add a simple printf() based debug facility to the new nfs client.
Use it for a printf() that can be harmlessly generated for mmap()'d
files. It will be used extensively for the NFSv4.1 client.
Debugging printf()s are enabled by setting vfs.nfs.debuglevel to
a non-zero value. The higher the value, the more debugging printf()s.

Reviewed by: jhb
MFC after: 2 weeks


240285 09-Sep-2012 kib

Allow shared lookups for nullfs mounts, if the lower filesystem supports
it. There are two problems which must be addressed for the use of shared
lookups to have a measurable effect on nullfs scalability:

1. When vfs_lookup() calls VOP_LOOKUP() for nullfs, which passes lookup
operation to lower fs, resulting vnode is often only shared-locked. Then
null_nodeget() cannot instantiate covering vnode for lower vnode, since
insmntque1() and null_hashins() require exclusive lock on the lower.

Change the assert that lower vnode is exclusively locked to only
require any lock. If null hash failed to find pre-existing nullfs
vnode for lower vnode and the vnode is shared-locked, the lower vnode
lock is upgraded.

2. Nullfs reclaims its vnodes on deactivation. This is due to the
inability of nullfs to detect reclamation of the lower vnode. Reclamation
of a nullfs vnode at deactivation time prevents a reference to the lower
vnode from becoming stale.

Change nullfs VOP_INACTIVE to not reclaim the vnode, instead use the
VFS_RECLAIM_LOWERVP to get notification and reclaim upper vnode
together with the reclamation of the lower vnode.

Note that the nullfs reclamation procedure calls vput() on the lowervp
vnode, temporarily unlocking the vnode being reclaimed. This seems to be
fine for MPSAFE filesystems, but non-MPSAFE code often puts a partially
initialized vnode on some globally visible list, and later can decide
that the half-constructed vnode is not needed. If a nullfs mount is
created above such a filesystem, then other threads might catch such an
improperly initialized vnode. Instead of trying to overcome this case,
e.g. by recursing the lower vnode lock in null_reclaim_lowervp(), I
decided to rely on nearby removal of the support for non-MPSAFE
filesystems.

In collaboration with: pho
MFC after: 3 weeks


239636 24-Aug-2012 pfg

Add some basic definitions for a future htree implementation.

MFC after: 3 days


239372 18-Aug-2012 kevlo

Fix typo


239359 17-Aug-2012 mjg

Remove unused member of struct indir (in_exists) from UFS and EXT2 code.

Reviewed by: mckusick
Approved by: trasz (mentor)
MFC after: 1 week


239303 15-Aug-2012 hselasky

Streamline use of cdevpriv and correct some corner cases.

1) It is not useful to call "devfs_clear_cdevpriv()" from
"d_close" callbacks, since for example read, write, ioctl and
so on might be sleeping at the time "d_close" is called,
and the freed private data can then still be accessed.
Examples: dtrace, linux_compat, ksyms (all fixed by this patch)

2) In sys/dev/drm* there are some cases in which memory will
be freed twice, if open fails, first by code in the open
routine, secondly by the cdevpriv destructor. Move registration
of the cdevpriv to the end of the drm open routines.

3) devfs_clear_cdevpriv() is not called if the "d_open" callback
registered cdevpriv data and the "d_open" callback function
returned an error. Fix this.

Discussed with: phk
MFC after: 2 weeks


239246 14-Aug-2012 kib

Do not leave invalid pages in the object after the short read for
network file systems (not only NFS proper). Short reads cause pages
other than the requested one, which were not filled by read response,
to stay invalid.

Change the vm_page_readahead_finish() interface to not take the error
code, but instead to make a decision to free or to (de)activate the
page only by its validity. As result, not requested invalid pages are
freed even if the read RPC indicated success.

Noted and reviewed by: alc
MFC after: 1 week


239065 05-Aug-2012 kib

After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason
to pull vm_param.h was removed. Other big dependency of vm_page.h on
vm_param.h are PA_LOCK* definitions, which are only needed for
in-kernel code, because modules use KBI-safe functions to lock the
pages.

Stop including vm_param.h into vm_page.h. Include vm_param.h
explicitly for the kernel code which needs it.

Suggested and reviewed by: alc
MFC after: 2 weeks


239040 04-Aug-2012 kib

Reduce code duplication and exposure of direct access to struct
vm_page oflags by providing helper function
vm_page_readahead_finish(), which handles completed reads for pages
with indexes other than the requested one, for VOP_GETPAGES().

Reviewed by: alc
MFC after: 1 week


239039 04-Aug-2012 kib

The header uma_int.h is internal uma header, unused by this source
file. Do not include it needlessly.

Reviewed by: alc
MFC after: 1 week


238936 31-Jul-2012 davidxu

Comparing the current pipe code with the one in 8.3-STABLE r236165,
I found that 8.3 is a historical BSD version that uses sockets to
implement FIFO pipes; it uses a per-file seqcount to compare with the
writer generation stored in the per-pipe object. The concept is that
after all writers are gone, the pipe enters the next generation, and all
old readers that have not closed the pipe should get an indication that
the pipe is disconnected: they should get EPIPE, SIGPIPE or POLLHUP in
poll(). But a newcomer should not know that previous writers are gone;
it should treat the pipe as a fresh session.
I am trying to bring the FIFO pipe back to the historical behavior. It
is still unclear whether a single EOF flag can represent both
SBS_CANTSENDMORE and SBS_CANTRCVMORE, which the socket-based version
uses, but I have run the poll regression test in the tools directory,
and the output is now the same as on 8.3-STABLE.
I think the output "not ok 18 FIFO state 6b: poll result 0 expected 1.
expected POLLHUP; got 0" might be bogus, because a newcomer should not
know that old writers are gone. I got the same behavior on Linux.
Our implementation always returns POLLIN for a disconnected pipe even
when it should return POLLHUP, but I think it is not wise to remove
POLLIN, for compatibility reasons; this is our historical behavior.

Regression test: /usr/src/tools/regression/poll


238928 31-Jul-2012 davidxu

When a thread is blocked in the direct write state, it only sets the
PIPE_DIRECTW flag but not PIPE_WANTW. The FIFO pipe code does not
understand this internal state: when a FIFO peer reader closes the pipe
and wants to notify the writer, it checks PIPE_WANTW and, if it is not
set, skips calling wakeup(), so the blocked writer never notices. In
general, the writer should return from the syscall with the EPIPE error
code and may get a SIGPIPE signal. Setting PIPE_WANTW fixes the problem;
turning off direct write should fix it too. This bug was found via
PR/170203.

Another bug in the FIFO pipe code is that when a peer closes the pipe,
the other end, blocked in select() or poll(), is not notified: the call
to pipeselwakeup() was missing.

A third problem was found in the poll regression test: the existing code
cannot pass the 6b, 6c and 6d tests, but FreeBSD-4 works. This commit
does not fix that problem; I still need to study more to find the cause.

PR: 170203
Tested by: Garrett Copper <yanegomi at gmail dot com>


238697 22-Jul-2012 kevlo

Use NULL instead of 0 for pointers


238539 16-Jul-2012 brueffer

Simplify error handling by moving the allocation of np down to where it is
actually used. While here, improve style a little.

Submitted by: mjg
MFC after: 2 weeks


238491 15-Jul-2012 brueffer

Save a bzero() by using M_ZERO.

Obtained from: Dragonfly BSD (change 4faaf07c3d7ddd120deed007370aaf4d90b72ebb)
MFC after: 2 weeks


238320 10-Jul-2012 attilio

Remove a check on MNTK_UPDATE that is not really necessary as it is
handled in a code snippet above.


238315 10-Jul-2012 attilio

- Remove the unused and not completed write support for NTFS.
- Fix a bug where vfs_mountedfrom() is called also when the filesystem
is not mounted successfully.

Tested by: pho


238059 03-Jul-2012 kevlo

Fix a typo


238029 02-Jul-2012 kib

Extend the KPI to lock and unlock f_offset member of struct file. It
now fully encapsulates all accesses to f_offset, and extends f_offset
locking to other consumers that need it, in particular, to lseek() and
variants of getdirentries().

Ensure that on 32bit architectures f_offset, which is a 64bit quantity,
is always read and written under the mtxpool protection. This fixes an
apparently easy to trigger race where parallel lseek()s, or lseek() and
read/write, could destroy the file offset.

The already broken ABI emulations, including iBCS and SysV, are not
converted (yet).

Tested by: pho
No objections from: jhb
MFC after: 3 weeks
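
The reason a 64-bit offset needs a lock on 32-bit machines is that a plain load or store may be split into two 32-bit accesses, so a concurrent reader can observe a torn value. The following userland sketch shows the general idea with a POSIX mutex. It is only an analogy: `struct xfile` and the `xfile_*` functions are hypothetical, and the kernel change uses its own f_offset locking KPI and the mtxpool rather than a per-object pthread mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical file object: a 64-bit offset guarded by its own mutex. */
struct xfile {
	pthread_mutex_t	lock;
	int64_t		offset;
};

static void
xfile_init(struct xfile *fp)
{
	assert(fp != NULL);
	pthread_mutex_init(&fp->lock, NULL);
	fp->offset = 0;
}

/* Read the offset under the lock so both halves come from one update. */
static int64_t
xfile_get_offset(struct xfile *fp)
{
	int64_t off;

	pthread_mutex_lock(&fp->lock);
	off = fp->offset;
	pthread_mutex_unlock(&fp->lock);
	return (off);
}

/* Relative seek: update and read back the offset as one locked step. */
static int64_t
xfile_seek_cur(struct xfile *fp, int64_t delta)
{
	int64_t off;

	pthread_mutex_lock(&fp->lock);
	fp->offset += delta;
	off = fp->offset;
	pthread_mutex_unlock(&fp->lock);
	return (off);
}
```

Because every access goes through the same lock, two parallel relative seeks cannot interleave their read-modify-write steps, and a reader never sees the high and low words of two different offsets.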


237987 02-Jul-2012 kib

Do not override an error from uiomove() with (non-)error result from
bwrite(). VFS needs to know about EFAULT from uiomove() and does not
care much that partially filled block writeback after EFAULT was
successful. Early return without error causes short write to be
reported to usermode.

Reported and tested by: andreast
MFC after: 3 weeks


237367 21-Jun-2012 kib

Enable deadlock avoidance code for NFS client.

MFC after: 2 weeks


237244 18-Jun-2012 rmacklem

Fix the NFSv4 client for the case where mmap'd files are
written, but not msync'd by a process. A VOP_PUTPAGES()
called when VOP_RECLAIM() happens will usually fail, since
the NFSv4 Open has already been closed by VOP_INACTIVE().
Add a vm_object_page_clean() call to the NFSv4 client's
VOP_INACTIVE(), so that the write happens before the NFSv4
Open is closed. kib@ suggested using vgone() instead and
I will explore this, but this patch fixes things in the
meantime. For some reason, the VOP_PUTPAGES() is still
attempted in VOP_RECLAIM(), but having this fail doesn't
cause any problems except a "stateid0 in write" being logged.

Reviewed by: kib
MFC after: 1 week


237200 17-Jun-2012 rmacklem

Move the nfsrpc_close() call in ncl_reclaim() for the NFSv4 client
to below the vnode_destroy_vobject() call, since that is where
writes are flushed.

Suggested by: kib
MFC after: 1 week


236687 06-Jun-2012 kib

Improve handling of uiomove(9) errors for the NFS client.

Do not brelse() the buffer unconditionally with BIO_ERROR set if
uiomove() failed. The brelse() treats most buffers with BIO_ERROR as
B_INVAL, dropping their content. Instead, if the write request
covered the whole buffer, remember the cached state and brelse() with
BIO_ERROR set only if the buffer was not cached previously.

Update the buffer dirtyoff/dirtyend based on the progress recorded by
uiomove() in passed struct uio, even in the presence of
error. Otherwise, usermode could see changed data in the backed pages,
but later the buffer is destroyed without write-back.

If uiomove() failed for IO_UNIT request, try to truncate the vnode
back to the pre-write state, and rewind the progress in passed uio
accordingly, following the FFS behaviour.

Reviewed by: rmacklem (some time ago)
Tested by: pho
MFC after: 1 month


236313 30-May-2012 kib

Capitalize start of sentence.

MFC after: 3 days


236188 28-May-2012 marcel

Catch a corner case where ssegs could be 0 and thus i would be 0 and
we index suinfo out of bounds (i.e. -1).

Approved by: gber


236140 27-May-2012 ed

Fix style and consistency:

- Use tabs, not spaces.
- Add tab after #define.
- Don't mix the use of BSD and ISO C unsigned integer types. Prefer the
ISO C ones.


235984 25-May-2012 gleb

Use C99-style initialization for struct dirent in preparation for
changing the structure.

Sponsored by: Google Summer of Code 2011
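
C99 designated initializers name each member explicitly, so the initializer keeps working if members are later added or reordered, which is what makes them useful ahead of a structure change. A minimal sketch follows; `struct xdirent` is a hypothetical layout loosely modeled on a directory entry, not the real struct dirent, and the literal 4 is an assumed stand-in for DT_DIR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical directory-entry layout used only for this illustration. */
struct xdirent {
	uint32_t	d_fileno;
	uint16_t	d_reclen;
	uint8_t		d_type;
	uint8_t		d_namlen;
	char		d_name[256];
};

/*
 * Build a "." entry. Members are matched by name rather than position,
 * and any member not mentioned is zero-initialized.
 */
static struct xdirent
make_dot_entry(uint32_t fileno)
{
	assert(fileno != 0);	/* zero is reserved for placeholder entries */
	struct xdirent de = {
		.d_fileno = fileno,
		.d_reclen = sizeof(struct xdirent),
		.d_type = 4,	/* assumed value standing in for DT_DIR */
		.d_namlen = 1,
		.d_name = "."
	};
	return (de);
}
```

A positional initializer such as `{ fileno, sizeof(...), 4, 1, "." }` would silently assign values to the wrong members if a field were inserted in the middle of the structure; the named form does not have that problem.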


235922 24-May-2012 mav

Revert devfs part of r235911. I was unaware about old but unfinished
discussion between kib@ and gibbs@ about it.


235911 24-May-2012 mav

MFprojects/zfsd:
Revamp the CAM enclosure services driver.
This updated driver uses an in-kernel daemon to track state changes and
publishes physical path location information for disk elements into the
CAM device database.

Sponsored by: Spectra Logic Corporation
Sponsored by: iXsystems, Inc.
Submitted by: gibbs, will, mav


235568 17-May-2012 rmacklem

A problem with the NFSv4 server was reported by Andrew Leonard
to freebsd-fs@, where the setfacl of an NFSv4 acl would fail.
This was caused by the VOP_ACLCHECK() call for ZFS replying
EOPNOTSUPP. After discussion with rwatson@, it was determined
that a call to VOP_ACLCHECK() before doing VOP_SETACL() is not
required. This patch fixes the problem by deleting the
VOP_ACLCHECK() call.

Tested by: Andrew Leonard (previous version)
MFC after: 1 week


235537 17-May-2012 gber

Import work done under project/nand (@235533) into head.

The NAND Flash environment consists of several distinct components:
- NAND framework (drivers harness for NAND controllers and NAND chips)
- NAND simulator (NANDsim)
- NAND file system (NAND FS)
- Companion tools and utilities
- Documentation (manual pages)

This work is still experimental. Please use with caution.

Obtained from: Semihalf
Supported by: FreeBSD Foundation, Juniper Networks


235508 16-May-2012 pfg

Fix a couple of issues that appear to be inherited from the old
8.x code:
- If the lock cannot be acquired immediately unlocks 'bar' vnode
and then locks both vnodes in order.
- wrong vnode type panics from cache_enter_time after calls by
ext2_lookup.

The fix merges the fixes from ufs/ufs_lookup.c.

Submitted by: Mateusz Guzik
Approved by: jhb@ (mentor)
Reviewed by: kib@
MFC after: 1 week


235503 16-May-2012 gleb

Skip directory entries with zero inode number during traversal.

Entries with zero inode number are considered placeholders by libc and
UFS. Fix remaining uses of VOP_READDIR in kernel: vop_stdvptocnp,
unionfs.

Sponsored by: Google Summer of Code 2011
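
The traversal rule from this entry can be sketched as a simple filter over a buffer of entries. The types and names below (`struct xent`, `count_live_entries`) are hypothetical and only illustrate the convention that a zero inode number marks a placeholder; they are not the kernel's VOP_READDIR consumers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified directory entry. */
struct xent {
	uint32_t	ino;	/* 0 marks a placeholder entry */
	const char	*name;
};

/* Count the entries a consumer should act on, skipping placeholders. */
static size_t
count_live_entries(const struct xent *ents, size_t n)
{
	size_t i, live = 0;

	assert(ents != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (ents[i].ino == 0)
			continue;	/* placeholder, not a real name */
		live++;
	}
	return (live);
}
```

A consumer that forgets the check would treat the placeholder as a real name, which is the class of bug the remaining in-kernel VOP_READDIR users were fixed for.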


235381 12-May-2012 rmacklem

Fix two cases in the new NFS server where a tsleep() is
used, when the code should actually protect the tested
variable with a mutex. Since the tsleep()s had a 10sec
timeout, the race would have only delayed the allocation
of a new clientid for a client. The sleeps will also
rarely occur, since having a callback in progress when
a client acquires a new clientid, is unlikely.
in practice, since having a callback in progress when
a fresh clientid is being acquired by a client is unlikely.

MFC after: 1 month


235332 12-May-2012 rmacklem

PR# 165923 reported intermittent write failures for dirty
memory mapped pages being written back on an NFS mount.
Since any thread can call VOP_PUTPAGES() to write back a
dirty page, the credentials of that thread may not have
write access to the file on an NFS server. (Often the uid
is 0, which may be mapped to "nobody" in the NFS server.)
Although there is no completely correct fix for this
(NFS servers check access on every write RPC instead of at
open/mmap time), this patch avoids the common cases by
holding onto a credential that recently opened the file
for writing and uses that credential for the write RPCs
being done by VOP_PUTPAGES() for both NFS clients.

Tested by: Joel Ray Holveck (joelh at juniper.net)
PR: kern/165923
Reviewed by: kib
MFC after: 2 weeks


235241 10-May-2012 pluknet

Fix mount interlock oversights from the previous change in r234386.

Reported by: dougb
Submitted by: Mateusz Guzik <mjguzik at gmail com>
Reviewed by: Kirk McKusick
Tested by: pho


235136 08-May-2012 jwd

Use the common api helper routine instead of freeing the namei
buffer directly.

Approved by: rmacklem (mentor)
MFC after: 1 month


234944 03-May-2012 daichi

fixed a unionfs_readdir math issue

PR: 132987
Submitted by: Matthew Fleming <mfleming@isilon.com>


234867 01-May-2012 daichi

- fixed a vnode lock hang-up issue.
- fixed an incorrect lock status issue.
- fixed an incorrect lock issue of unionfs root vnode removed.
(pointed out by keith)
- fixed an infinite loop issue.
(pointed out by dumbbell)
- changed to do LK_RELEASE expressly when unlocked.

Submitted by: ozawa@ongs.co.jp


234742 27-Apr-2012 rmacklem

It was reported via email that some non-FreeBSD NFS servers
do not include file attributes in the reply to an NFS create RPC
under certain circumstances.
This resulted in a vnode of type VNON that was not usable.
This patch adds an NFS getattr RPC to nfs_create() for this case,
to fix the problem. It was tested by the person that reported
the problem and confirmed to fix this case for their server.

Tested by: Steven Haber (steven.haber at isilon.com)
MFC after: 2 weeks


234740 27-Apr-2012 rmacklem

Fix a leak of namei lookup path buffers that occurs when a
ZFS volume is exported via the new NFS server. The leak occurred
because the new NFS server code didn't handle the case where
a file system sets the SAVENAME flag in its VOP_LOOKUP() and
ZFS does this for the DELETE case.

Tested by: Oliver Brandmueller (ob at gruft.de), hrs
PR: kern/167266
MFC after: 1 month


234607 23-Apr-2012 trasz

Remove unused thread argument to vrecycle().

Reviewed by: kib


234605 23-Apr-2012 trasz

Remove unused thread argument from vtruncbuf().

Reviewed by: kib


234482 20-Apr-2012 mckusick

This change creates a new list of active vnodes associated with
a mount point. Active vnodes are those with a non-zero use or hold
count, e.g., those vnodes that are not on the free list. Note that
this list is in addition to the list of all the vnodes associated
with a mount point.

To avoid adding another set of linkage pointers to the vnode
structure, the active list uses the existing linkage pointers
used by the free list (previously named v_freelist, now renamed
v_actfreelist).

This update adds the MNT_VNODE_FOREACH_ACTIVE interface that loops
over just the active vnodes associated with a mount point (typically
less than 1% of the vnodes associated with the mount point).

Reviewed by: kib
Tested by: Peter Holm
MFC after: 2 weeks


234422 18-Apr-2012 jh

Return EOPNOTSUPP rather than EPERM for the SF_SNAPSHOT flag because
tmpfs doesn't support snapshots.

Suggested by: bde


234386 17-Apr-2012 mckusick

Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL.
The primary changes are that the user of the interface no longer
needs to manage the mount-mutex locking and that the vnode that
is returned has its mutex locked (thus avoiding the need to check
to see if it is DOOMED or other possible end of life scenarios).

To minimize compatibility issues for third-party developers, the
old MNT_VNODE_FOREACH interface will remain available so that this
change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH
will be removed in head.

The reason for this update is to prepare for the addition of the
MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the
active vnodes associated with a mount point (typically less than
1% of the vnodes associated with the mount point).

Reviewed by: kib
Tested by: Peter Holm
MFC after: 2 weeks


234347 16-Apr-2012 jh

Sync tmpfs_chflags() with the recent changes to UFS:

- Add a check for unsupported file flags.
- Return EPERM when a user without PRIV_VFS_SYSFLAGS privilege attempts
to toggle SF_SETTABLE flags.


234346 16-Apr-2012 jh

tmpfs: Allow update mounts only for certain options.

Since r230208 update mounts were allowed if the list of mount options
contained the "export" option. This is not correct as tmpfs doesn't
really support updating all options.

Reviewed by: kevlo, trociny


234325 15-Apr-2012 gleb

Provide better description for vfs.tmpfs.memory_reserved sysctl.

Suggested by: Anton Yuzhaninov <citrin@citrin.ru>


234203 13-Apr-2012 jh

Apply changes from r234103 to ext2fs:

Return EPERM from ext2_setattr() when a user without PRIV_VFS_SYSFLAGS
privilege attempts to toggle SF_SETTABLE flags.

Flags are now stored to ip->i_flags in one place after all checks.

Also, remove SF_NOUNLINK from the checks because ext2fs doesn't support
that flag.

Reviewed by: bde


234139 11-Apr-2012 jh

Restore the blank line incorrectly removed in r234104.

Pointed out by: bde


234104 10-Apr-2012 jh

Apply changes from r233787 to ext2fs:

- Use more natural ip->i_flags instead of vap->va_flags in the final
flags check.
- Style improvements.

No functional change intended.

MFC after: 2 weeks


234064 09-Apr-2012 attilio

- Introduce a cache-miss optimization for consistency with other
accesses of the cache member of vm_object objects.
- Use novel vm_page_is_cached() for checks outside of the vm subsystem.

Reviewed by: alc
MFC after: 2 weeks
X-MFC: r234039


234025 08-Apr-2012 mckusick

Add I/O accounting to msdos filesystem.

Suggested and reviewed by: kib


234000 07-Apr-2012 gleb

tmpfs supports only INT_MAX nodes due to limitations of unit number
allocator.

Replace UINT32_MAX checks with INT_MAX. Keeping more than 2^31 nodes in
memory is not likely to become possible in foreseeable future and would
require new unit number allocator.

Discussed with: delphij
MFC after: 2 weeks


233999 07-Apr-2012 gleb

Add vfs_getopt_size. Support human readable file system options in tmpfs.

Increase maximum tmpfs file system size to 4GB*PAGE_SIZE on 32 bit archs.

Discussed with: delphij
MFC after: 2 weeks
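
Human readable size options are strings such as "64k" or "4g" that must be turned into byte counts. The sketch below shows one way such parsing could work in userland. It is not the implementation of vfs_getopt_size: `parse_size` is a hypothetical helper, and the accepted suffixes (k, m, g, t as powers of 1024) and the rejection of negative values are assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a decimal number with an optional k/m/g/t suffix into bytes.
 * Returns 0 on success, or -1 on malformed input or overflow.
 */
static int
parse_size(const char *s, int64_t *out)
{
	char *end;
	long long v;
	int shift = 0;

	assert(s != NULL && out != NULL);
	errno = 0;
	v = strtoll(s, &end, 10);
	if (end == s || errno == ERANGE || v < 0)
		return (-1);
	switch (tolower((unsigned char)*end)) {
	case 't': shift = 40; end++; break;
	case 'g': shift = 30; end++; break;
	case 'm': shift = 20; end++; break;
	case 'k': shift = 10; end++; break;
	case '\0': break;
	default: return (-1);
	}
	if (*end != '\0')
		return (-1);		/* trailing garbage after the suffix */
	if (v > (INT64_MAX >> shift))
		return (-1);		/* scaled value would overflow */
	*out = (int64_t)v << shift;
	return (0);
}
```

Checking against `INT64_MAX >> shift` before scaling keeps the multiplication from overflowing, which matters once sizes above 4GB become legal on 32-bit systems.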


233998 07-Apr-2012 gleb

Add reserved memory limit sysctl to tmpfs.

Cleanup available and used memory functions.
Check if free pages available before allocating new node.

Discussed with: delphij


233101 17-Mar-2012 kib

Add sysctl vfs.nfs.nfs_keep_dirty_on_error to switch the nfs client
behaviour on error from write RPC back to behaviour of old nfs client.
When set to non-zero, the pages for which write failed are kept dirty.

PR: kern/165927
Reviewed by: alc
MFC after: 2 weeks


232960 14-Mar-2012 gleb

Prevent tmpfs_rename() deadlock in a way similar to UFS

Unlock vnodes and try to lock them one by one. Relookup fvp and tvp.

Approved by: mdf (mentor)


232959 14-Mar-2012 gleb

Don't enforce LK_RETRY to get existing vnode in tmpfs_alloc_vp()

Doomed vnode is hardly of any use here, besides all callers handle error
case. vfs_hash_get() does the same.

Don't mess with vnode holdcount, vget() takes care of it already.

Approved by: mdf (mentor)


232918 13-Mar-2012 kevlo

Use NULL instead of 0


232823 11-Mar-2012 kib

Update comment.

Submitted by: gianni


232821 11-Mar-2012 kib

Remove fifo.h. The only used function declaration from the header is
migrated to sys/vnode.h.

Submitted by: gianni


232703 08-Mar-2012 pfg

Add support for ns timestamps and birthtime to the ext2/3 driver.

When using big inodes there is sufficient space in ext3 to
keep extra resolution and birthtime (creation) timestamps.
The appropriate fields in the on-disk inode have been approved
for a long time but support for this in ext3 has not been
widely distributed.

In preparation for ext4 most linux distributions have enabled
by default such bigger inodes and some people use nanosecond
timestamps in ext3. We now support those when the inode is big
enough and while we do recognize the EXT4F_ROCOMPAT_EXTRA_ISIZE,
we maintain the extra timestamps even when they are not used.

An additional note by Bruce Evans:
We blindly accept unrepresentable tv_nsec in VOP_SETATTR(), but
all file systems have always done that. When POSIX gets around
to specifying the behaviour, it will probably require certain
rounding to the fs's resolution and not rejecting the request.
This unfortunately means that syscalls that set times can't
really tell if they succeeded without reading back the times
using stat() or similar and checking that they were set close
enough.

Reviewed by: bde
Approved by: jhb (mentor)
MFC after: 2 weeks


232701 08-Mar-2012 jhb

Add KTR_VFS traces to track modifications to a vnode's writecount.


232641 07-Mar-2012 kib

The pipe_poll() performs lockless access to the vnode to test
fifo_iseof() condition, allowing the v_fifoinfo to be reset and freed
by fifo_cleanup().

Precalculate EOF at the places where fo_wgen is changed, and cache the
state in a new pipe state flag PIPE_SAMEWGEN.

Reported and tested by: bf
Submitted by: gianni
MFC after: 1 week (a backport)


232541 05-Mar-2012 kib

Apply inlined vn_vget_ino() algorithm for ".." lookup in pseudofs.

Reported and tested by: pho
MFC after: 2 weeks


232493 04-Mar-2012 kib

Remove unneeded cast to u_int. The values are small enough to fit into
int, besides the use of MIN macro which performs type promotions.

Submitted by: bde
MFC after: 3 weeks


232485 04-Mar-2012 kevlo

Remove unnecessary casts


232483 04-Mar-2012 kevlo

Clean up style(9) nits


232467 03-Mar-2012 rmacklem

The name caching changes of r230394 exposed an intermittent bug
in the new NFS server for NFSv4, where it would report ENOENT
when the file actually existed on the server. This turned out
to be caused by not initializing ni_topdir before calling lookup()
and there was a rare case where the value on the stack location
assigned to ni_topdir happened to be a pointer to a ".." entry,
such that "dp == ndp->ni_topdir" succeeded in lookup().
This patch initializes ni_topdir to fix the problem.

MFC after: 5 days


232420 03-Mar-2012 rmacklem

Post r230394, the Lookup RPC counts for both NFS clients increased
significantly. Upon investigation this was caused by name cache
misses for lookups of "..". For name cache entries for non-".."
directories, the cache entry serves double duty. It maps both the
named directory plus ".." for the parent of the directory. As such,
two ctime values (one for each of the directory and its parent) need
to be saved in the name cache entry.
This patch adds an entry for ctime of the parent directory to the
name cache. It also adds an additional uma zone for large entries
with this time value, in order to minimize memory wastage.
As well, it fixes a couple of cases where the mtime of the parent
directory was being saved instead of ctime for positive name cache
entries. With this patch, Lookup RPC counts return to values similar
to pre-r230394 kernels.

Reported by: bde
Discussed with: kib
Reviewed by: jhb
MFC after: 2 weeks


232401 02-Mar-2012 jhb

Similar to the fixes in 226967 and 226987, purge any name cache entries
associated with the previous vnode (if any) associated with the target of
a rename(). Otherwise, a lookup of the target pathname concurrent with a
rename() could re-add a name cache entry after the namei(RENAME) lookup
in kern_renameat() had purged the target pathname.

MFC after: 2 weeks


232383 02-Mar-2012 kib

Do not expose unlocked unconstructed nullfs vnode on mount list.
Lock the native nullfs vnode lock before switching the locks.

Tested by: pho
MFC after: 1 week


232327 01-Mar-2012 rmacklem

Fix the NFS clients so that they use copyin() instead of bcopy(),
when doing direct I/O. This direct I/O code is not enabled by default.

Submitted by: kib (earlier version)
Reviewed by: kib
MFC after: 1 week


232307 29-Feb-2012 mm

Add "export" to devfs_opts[] and return EOPNOTSUPP if called with it.
Fixes mountd warnings.

Reported by: kib
MFC after: 1 week


232305 29-Feb-2012 kib

Allow shared locks for reads when lower filesystem accepts shared locking.

Tested by: pho
MFC after: 1 week


232304 29-Feb-2012 kib

Document that null_nodeget() cannot take shared-locked lowervp due to
insmntque() requirements.

Tested by: pho
MFC after: 1 week


232303 29-Feb-2012 kib

In null_reclaim(), assert that reclaimed vnode is fully constructed,
instead of accepting half-constructed vnode. Previous code cannot decide
what to do with such vnode anyway, and although processing it for hash
removal, panicked later when getting rid of nullfs reference on lowervp.

While there, remove initializations from the declaration block.

Tested by: pho
MFC after: 1 week


232301 29-Feb-2012 kib

Always request exclusive lock for the lower vnode in nullfs_vget().
The null_nodeget() requires exclusive lock on lowervp to be able to
insmntque() new vnode.

Reported by: rea
Tested by: pho
MFC after: 1 week


232299 29-Feb-2012 kib

Move the code to destroy half-constructed nullfs vnode into helper
function null_destroy_proto() from null_insmntque_dtr(). Also
apply null_destroy_proto() in null_nodeget() when we raced and a vnode
is found in the hash, so the currently allocated protonode shall be
destroyed.

Lock the vnode interlock around reassigning the v_vnlock.

In fact, this path will not be exercised after several later commits,
since null_nodeget() cannot take shared-locked lowervp at all due to
insmntque() requirements.

Reported by: rea
Tested by: pho
MFC after: 1 week


232296 29-Feb-2012 kib

Merge a split multi-line comment.

MFC after: 1 week


232278 29-Feb-2012 mm

Add procfs to jail-mountable filesystems.

Reviewed by: jamie
MFC after: 1 week


232100 24-Feb-2012 kevlo

Remove an unused structure and unnecessary cast


232099 24-Feb-2012 kevlo

Check if the user has necessary permissions on the device


232059 23-Feb-2012 mm

To improve control over the use of mount(8) inside a jail(8), introduce
a new jail parameter node with the following parameters:

allow.mount.devfs:
allow mounting the devfs filesystem inside a jail

allow.mount.nullfs:
allow mounting the nullfs filesystem inside a jail

Both parameters are disabled by default (equals the behavior before
devfs and nullfs in jails). Administrators have to explicitly allow
mounting devfs and nullfs for each jail. The value "-1" of the
devfs_ruleset parameter is removed in favor of the new allow setting.

Reviewed by: jamie
Suggested by: pjd
MFC after: 2 weeks


232055 23-Feb-2012 kmacy

merge pipe and fifo implementations

Also reviewed by: jhb, jilles (initial revision)
Tested by: pho, jilles

Submitted by: gianni
Reviewed by: bde


232050 23-Feb-2012 rmacklem

hrs@ reported a panic to freebsd-stable@ under the subject line
"panic in 8.3-PRERELEASE" on Feb. 22, 2012. This panic was caused
by use of a mix of tsleep() and msleep() calls on the same event
in the new NFS server DRC code. It did "mtx_unlock(); tsleep();"
in two places, which kib@ noted introduced a slight risk that the
wakeup() would occur before the tsleep(), resulting in a 10sec
delay before waking up. This patch fixes the problem by replacing
"mtx_unlock(); tsleep();" with mtx_sleep(..PDROP..). It also
changes a nfsmsleep() call to mtx_sleep() so that the code uses
mtx_sleep() consistently within the file.

Tested by: hrs (in progress)
Reviewed by: jhb
MFC after: 5 days
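
The lost-wakeup window described in this entry exists whenever the lock is dropped and the sleep is started as two separate steps. A userland analogy using POSIX condition variables is sketched below; `struct xevent` and the `xevent_*` functions are hypothetical names. One difference from the kernel fix should be kept in mind: pthread_cond_wait() reacquires the mutex before returning, whereas mtx_sleep() with PDROP returns with the mutex released.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical event: a flag protected by a mutex, plus a condition variable. */
struct xevent {
	pthread_mutex_t	mtx;
	pthread_cond_t	cv;
	bool		done;
};

static void
xevent_init(struct xevent *ev)
{
	assert(ev != NULL);
	pthread_mutex_init(&ev->mtx, NULL);
	pthread_cond_init(&ev->cv, NULL);
	ev->done = false;
}

/* Waker: change the state and signal while holding the mutex. */
static void
xevent_post(struct xevent *ev)
{
	pthread_mutex_lock(&ev->mtx);
	ev->done = true;
	pthread_cond_signal(&ev->cv);
	pthread_mutex_unlock(&ev->mtx);
}

/*
 * Sleeper: pthread_cond_wait() releases the mutex and blocks as one
 * atomic step, so a post cannot slip in between the unlock and the
 * sleep. The flag is rechecked in a loop after every wakeup.
 */
static void
xevent_wait(struct xevent *ev)
{
	pthread_mutex_lock(&ev->mtx);
	while (!ev->done)
		pthread_cond_wait(&ev->cv, &ev->mtx);
	pthread_mutex_unlock(&ev->mtx);
}
```

If the post happens before the waiter arrives, the waiter sees `done` already set under the mutex and never sleeps, so there is no timeout-length stall of the kind the "mtx_unlock(); tsleep();" sequence could produce.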


231998 22-Feb-2012 kib

Use DOINGASYNC() to test for async allowance, to honor VFS syncing requests.

Noted by: bde
MFC after: 1 week


231949 21-Feb-2012 kib

Fix found places where uio_resid is truncated to int.

Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the
sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from
the usermode.

Discussed with: bde, das (previous versions)
MFC after: 1 month


231932 20-Feb-2012 kevlo

Remove an unnecessary cast.


231852 17-Feb-2012 bz

Merge multi-FIB IPv6 support from projects/multi-fibv6/head/:

Extend the so far IPv4-only support for multiple routing tables (FIBs)
introduced in r178888 to IPv6 providing feature parity.

This includes an extended rtalloc(9) KPI for IPv6, the necessary
adjustments to the network stack, and user land support as in netstat.

Sponsored by: Cisco Systems, Inc.
Reviewed by: melifaro (basically)
MFC after: 10 days


231805 16-Feb-2012 rmacklem

Delete a couple of out of date comments that are no longer true in
the new NFS client.

Requested by: bde
MFC after: 1 week


231669 14-Feb-2012 tijl

Replace PRIdMAX with "jd" in a printf call. Cast the corresponding value to
intmax_t instead of uintmax_t, because the original type is off_t.
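
The point about signedness can be shown with a short userland sketch. The "j" length modifier with the "d" conversion expects an intmax_t argument, and since off_t is a signed type the value is widened to intmax_t rather than uintmax_t. The helper name `format_offset` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Format an off_t portably. The cast matches the "%jd" conversion and
 * preserves negative values, which a cast to uintmax_t would not.
 */
static int
format_offset(char *buf, size_t len, off_t off)
{
	assert(buf != NULL && len > 0);
	return (snprintf(buf, len, "%jd", (intmax_t)off));
}
```

Passing a uintmax_t to a signed conversion is a type mismatch, and a negative offset converted to uintmax_t would become a very large positive number before it ever reached the formatter.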


231379 10-Feb-2012 ed

Merge si_name and __si_namebuf.

The si_name pointer always points to the __si_namebuf member inside the
same object. Remove it and rename __si_namebuf to si_name.


231269 09-Feb-2012 mm

Allow mounting nullfs(5) inside jails.

This is now possible thanks to r230129.

MFC after: 1 month


231267 09-Feb-2012 mm

Add support for mounting devfs inside jails.

A new jail(8) option "devfs_ruleset" defines the ruleset enforcement for
mounting devfs inside jails. A value of -1 disables mounting devfs in
jails, a value of zero means no restrictions. Nested jails can only
have mounting devfs disabled or inherit parent's enforcement as jails are
not allowed to view or manipulate devfs(8) rules.

Utilizes new functions introduced in r231265.

Reviewed by: jamie
MFC after: 1 month


231265 09-Feb-2012 mm

Introduce the "ruleset=number" option for devfs(5) mounts.
Add support for updating the devfs mount (currently only changing the
ruleset number is supported).
Check mnt_optnew with vfs_filteropt(9).

This new option sets the specified ruleset number as the active ruleset
of the new devfs mount and applies all its rules at mount time. If the
specified ruleset doesn't exist, a new empty ruleset is created.

MFC after: 1 month


231168 07-Feb-2012 pfg

Update the data structures with some fields reserved for
ext4 but that can be used in ext3 mode.

Also adjust the internal inode to carry the birthtime,
like in UFS, which is starting to get some use when
big inodes are available.

Right now these are just placeholders for features
to come.

Approved by: jhb (mentor)
MFC after: 2 weeks


231133 07-Feb-2012 rmacklem

r228827 fixed a problem where copying of NFSv4 open credentials into
a credential structure would corrupt it. This happened when the
p argument was != NULL. However, I now realize that the copying of
open credentials should only happen for p == NULL, since that indicates
that it is a read-ahead or write-behind. This patch fixes this.
After this commit, r228827 could be reverted, but I think the code is
clearer and safer with the patch, so I am going to leave it in.
Without this patch, it was possible that a NFSv4 VOP_SETATTR() could have
changed the credentials of the caller. This would have happened if
the process doing the VOP_SETATTR() did not have the file open, but
some other process running as a different uid had the file open for writing
at the same time.

MFC after: 5 days


231088 06-Feb-2012 jhb

Rename cache_lookup_times() to cache_lookup() and retire the old API and
ABI stub for cache_lookup().


231075 06-Feb-2012 kib

Current implementations of sync(2) and syncer vnode fsync() VOP use
mnt_noasync counter to temporarily remove MNTK_ASYNC mount option, which
is needed to guarantee a synchronous completion of the initiated i/o
before syscall or VOP return. Global removal of MNTK_ASYNC option is
harmful because not only i/o started from corresponding thread becomes
synchronous, but all i/o is synchronous on the filesystem which is
initiated during sync(2) or syncer activity.

Instead of removing MNTK_ASYNC from mnt_kern_flag, provide a local
thread flag to disable async i/o for current thread only. Use the
opportunity to move DOINGASYNC() macro into sys/vnode.h and
consistently use it through places which tested for MNTK_ASYNC.

Some testing demonstrated 60-70% improvements in run time for the
metadata-intensive operations on async-mounted UFS volumes, but still
with great deviation due to other reasons.

Reviewed by: mckusick
Tested by: scottl
MFC after: 2 weeks


230803 31-Jan-2012 rmacklem

When a "mount -u" switches an NFS mount point from TCP to UDP,
any thread doing an I/O RPC with a transfer size greater than
NFS_UDPMAXDATA will be hung indefinitely, retrying the RPC.
After a discussion on freebsd-fs@, I decided to add a warning
message for this case, as suggested by Jeremy Chadwick.

Suggested by: freebsd at jdc.parodius.com (Jeremy Chadwick)
MFC after: 2 weeks


230605 27-Jan-2012 rmacklem

A problem with respect to data read through the buffer cache for both
NFS clients was reported to freebsd-fs@ under the subject "NFS
corruption in recent HEAD" on Nov. 26, 2011. This problem occurred when
a TCP mounted root fs was changed to using UDP. I believe that this
problem was caused by the change in mnt_stat.f_iosize that occurred
because rsize was decreased to the maximum supported by UDP. This
patch fixes the problem by using v_bufobj.bo_bsize instead of f_iosize,
since the former is set to f_iosize when the vnode is allocated, but
does not change for a given vnode when f_iosize changes.

Reported by: pjd
Reviewed by: kib
MFC after: 2 weeks


230559 26-Jan-2012 rmacklem

Revert r230516, since it doesn't really fix the problem.


230552 25-Jan-2012 kib

Fix remaining calls to cache_enter() in both NFS clients to provide
appropriate timestamps. Restore the assertions which verify that
NCF_TS is set when timestamp is asked for.

Reviewed by: jhb (previous version)
MFC after: 2 weeks


230547 25-Jan-2012 jhb

Add a timeout on positive name cache entries in the NFS client. That is,
we will only trust a positive name cache entry for a specified amount of
time before falling back to a LOOKUP RPC, even if the ctime for the file
handle matches the cached copy in the name cache entry. The timeout is
configured via a new 'nametimeo' mount option and defaults to 60 seconds.
It may be set to zero to disable positive name caching entirely.

Reviewed by: rmacklem
MFC after: 1 week
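
The timeout rule described in this entry can be pictured with a small
userland sketch. It is not the NFS client's code: the struct, its fields and
the helper name ex_entry_trusted() are hypothetical, and only the rule itself
(ctime must match, the entry must be younger than the timeout, and a timeout
of zero disables positive caching) is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical positive name cache entry; not the kernel's layout. */
struct ex_ncentry {
	time_t cached_at;	/* when the entry was added */
	time_t ctime;		/* ctime recorded with the entry */
};

/*
 * Return true if the cached entry may be used without a LOOKUP RPC.
 * A nametimeo of zero disables positive name caching entirely.
 */
bool
ex_entry_trusted(const struct ex_ncentry *e, time_t now, time_t file_ctime,
    time_t nametimeo)
{
	assert(e != NULL);
	if (nametimeo == 0)
		return (false);
	if (now - e->cached_at >= nametimeo)
		return (false);		/* too old: fall back to LOOKUP */
	return (e->ctime == file_ctime);
}
```

With the default of 60 seconds, an entry cached at t=100 would still be
trusted at t=130 but not at t=160, even if the ctime still matches.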


230516 25-Jan-2012 rmacklem

If a mount -u is done to either NFS client that switches it
from TCP to UDP and the rsize/wsize/readdirsize is greater
than NFS_MAXDGRAMDATA, it is possible for a thread doing an
I/O RPC to get stuck repeatedly doing retries. This happens
because the RPC will use a rsize/wsize/readdirsize that won't
work for UDP and, as such, it will keep failing indefinitely.
This patch returns an error for this case, to avoid the problem.
A discussion on freebsd-fs@ seemed to indicate that returning
an error was preferable to silently ignoring the "udp"/"mntudp"
option.
This problem was discovered while investigating a problem reported
by pjd@ via email.

MFC after: 2 weeks


230394 20-Jan-2012 jhb

Close a race in NFS lookup processing that could result in stale name cache
entries on one client when a directory was renamed on another client. The
root cause for the stale entry being trusted is that each per-vnode nfsnode
structure has a single 'n_ctime' timestamp used to validate positive name
cache entries. However, if there are multiple entries for a single vnode,
they all share a single timestamp. To fix this, extend the name cache
to allow filesystems to optionally store a timestamp value in each name
cache entry. The NFS clients now fetch the timestamp associated with
each name cache entry and use that to validate cache hits instead of the
timestamps previously stored in the nfsnode. Another part of the fix is
that the NFS clients now use timestamps from the post-op attributes of
RPCs when adding name cache entries rather than pulling the timestamps out
of the file's attribute cache. The latter is subject to races with other
lookups updating the attribute cache concurrently. Some more details:
- Add a variant of nfsm_postop_attr() to the old NFS client that can return
a vattr structure with a copy of the post-op attributes.
- Handle lookups of "." as a special case in the NFS clients since the name
cache does not store name cache entries for ".", so we cannot get a
useful timestamp. It didn't really make much sense to recheck the
attributes on the directory to validate the namecache hit for "."
anyway.
- ABI compat shims for the name cache routines are present in this commit
so that it is safe to MFC.

MFC after: 2 weeks


230345 20-Jan-2012 rmacklem

Martin Cracauer reported a problem to freebsd-current@ under the
subject "Data corruption over NFS in -current". During investigation
of this, I came across an ugly bogusity in the new NFS client where
it replaced the cr_uid with the one used for the mount. This was
done so that "system operations" like the NFSv4 Renew would be
performed as the user that did the mount. However, if any other
thread shares the credential with the one doing this operation,
it could do an RPC (or just about anything else) as the wrong cr_uid.
This patch fixes the above, by using the mount credentials instead of
the one provided as an argument for this case. It appears
to have fixed Martin's problem.
This patch is needed for NFSv4 mounts and NFSv3 mounts against
some non-FreeBSD servers that do not put post operation attributes
in the NFSv3 Statfs RPC reply.

Tested by: Martin Cracauer (cracauer at cons.org)
Reviewed by: jhb
MFC after: 2 weeks


230304 18-Jan-2012 rea

Subject: NULLFS: properly destroy node hash

Use hashdestroy() instead of naive free().

Approved by: kib
MFC after: 2 weeks


230252 17-Jan-2012 kevlo

Return EOPNOTSUPP since we only support update mounts for NFS export.

Spotted by: trociny


230249 17-Jan-2012 mckusick

Make sure all intermediate variables holding mount flags (mnt_flag)
and that all internal kernel calls passing mount flags are declared
as uint64_t so that flags in the top 32-bits are not lost.

MFC after: 2 weeks
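
The kind of loss this entry guards against can be shown in a few lines of
userland C. This is only an illustration: EX_FLAG_HIGH, EX_FLAG_LOW and both
helper functions are made-up names, not real MNT_* flags or kernel routines.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values; EX_FLAG_HIGH lives in the top 32 bits. */
#define EX_FLAG_HIGH	(UINT64_C(1) << 40)
#define EX_FLAG_LOW	(UINT64_C(1) << 3)

/* Buggy pattern: a 32-bit intermediate silently drops the high bits. */
uint64_t
ex_pass_flags_narrow(uint64_t flags)
{
	uint32_t tmp = (uint32_t)flags;	/* top 32 bits are lost here */

	assert(tmp == (flags & UINT32_MAX));
	return (tmp);
}

/* Fixed pattern: every intermediate is as wide as the flags word. */
uint64_t
ex_pass_flags_wide(uint64_t flags)
{
	uint64_t tmp = flags;

	return (tmp);
}
```

Passing EX_FLAG_HIGH | EX_FLAG_LOW through the narrow helper returns only
EX_FLAG_LOW, while the wide helper preserves both bits.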


230208 16-Jan-2012 kevlo

Add nfs export support to tmpfs(5)

Reviewed by: kib


230180 16-Jan-2012 alc

When tmpfs_write() resets an extended file to its original size after an
error, we want tmpfs_reg_resize() to ignore I/O errors and unconditionally
update the file's size.

Reviewed by: kib
MFC after: 3 weeks


230145 15-Jan-2012 trociny

Abrogate nchr argument in proc_getargv() and proc_getenvv(): we always want
to read strings completely to know the actual size.

As a side effect it fixes the issue with kern.proc.args and kern.proc.env
sysctls, which didn't return the size of available data when calling
sysctl(3) with the NULL argument for oldp.

Note, in get_ps_strings(), which does actual work for proc_getargv() and
proc_getenvv(), we still have a safety limit on the size of data read in
case of a corrupted process stack.

Suggested by: kib
MFC after: 3 days


230132 15-Jan-2012 uqs

Convert files to UTF-8


230120 14-Jan-2012 alc

Neither tmpfs_nocacheread() nor tmpfs_mappedwrite() needs to call
vm_object_pip_{add,subtract}() on the swap object because the swap
object can't be destroyed while the vnode is exclusively locked.
Moreover, even if the swap object could have been destroyed during
tmpfs_nocacheread() and tmpfs_mappedwrite() this code is broken
because vm_object_pip_subtract() does not wake up the sleeping thread
that is trying to destroy the swap object.

Free invalid pages after an I/O error. There is no virtue in keeping
them around in the swap object creating more work for the page daemon.
(I believe that any non-busy page in the swap object will now always
be valid.)

vm_pager_get_pages() does not return a standard errno, so its return
value should not be returned by tmpfs without translation to an errno
value.

There is no reason for the wakeup on vpg in tmpfs_mappedwrite() to
occur with the swap object locked.

Eliminate printf()s from tmpfs_nocacheread() and tmpfs_mappedwrite().
(The swap pager will already spam your console if data corruption is
imminent.)

Reviewed by: kib
MFC after: 3 weeks


230100 14-Jan-2012 rmacklem

Tai Horgan reported via email that there were two places in
the new NFSv4 server where the code follows the wrong list.
Fortunately, for these fairly rare cases, the lc_stateid[]
lists are normally empty. This patch fixes the code to
follow the correct list.

Reported by: tai.horgan at isilon.com
Discussed with: zack
MFC after: 2 weeks


229956 11-Jan-2012 rmacklem

jwd@ reported via email that the "CacheSize" field reported by "nfsstat -e -s"
would go negative after using the "-z" option to zero out the stats.
This patch fixes that by not zeroing out the srvcache_size field
for "-z", since it is the size of the cache and not a counter.

MFC after: 2 weeks


229821 08-Jan-2012 alc

Correct an error of omission in the implementation of the truncation
operation on POSIX shared memory objects and tmpfs. Previously, neither of
these modules correctly handled the case in which the new size of the object
or file was not a multiple of the page size. Specifically, they did not
handle partial page truncation of data stored on swap. As a result, stale
data might later be returned to an application.

Interestingly, a data inconsistency was less likely to occur under tmpfs
than POSIX shared memory objects. The reason being that a different mistake
by the tmpfs truncation operation helped avoid a data inconsistency. If the
data was still resident in memory in a PG_CACHED page, then the tmpfs
truncation operation would reactivate that page, zero the truncated portion,
and leave the page pinned in memory. More precisely, the benevolent error
was that the truncation operation didn't add the reactivated page to any of
the paging queues, effectively pinning the page. This page would remain
pinned until the file was destroyed or the page was read or written. With
this change, the page is now added to the inactive queue.

Discussed with: jhb
Reviewed by: kib (an earlier version)
MFC after: 3 weeks
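
The partial-page case can be sketched in userland C as follows. This is an
assumption-laden illustration, not the tmpfs or shm code: the backing store
is a plain byte array, EX_PAGE_SIZE is fixed at 4096, and
ex_zero_partial_page() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EX_PAGE_SIZE	4096u	/* assumed page size for the sketch */

/*
 * After truncating to new_size, zero the bytes from new_size up to the
 * end of the page that contains it, so that stale data cannot reappear
 * if the object is later extended.  Returns the number of bytes zeroed.
 */
size_t
ex_zero_partial_page(unsigned char *store, size_t store_len, size_t new_size)
{
	size_t off = new_size % EX_PAGE_SIZE;
	size_t n;

	assert(new_size <= store_len);
	if (off == 0)
		return (0);		/* page aligned: nothing to zero */
	n = EX_PAGE_SIZE - off;
	if (n > store_len - new_size)
		n = store_len - new_size;
	memset(store + new_size, 0, n);
	return (n);
}
```

Truncating an 8192-byte store to 5000 bytes zeroes bytes 5000 through 8191;
truncating to 4096 zeroes nothing because the new size is page aligned.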


229802 08-Jan-2012 rmacklem

opt_inet6.h was missing from some files in the new NFS subsystem.
The effect of this was, for clients mounted via inet6 addresses,
that the DRC cache would never have a hit in the server. It also
broke NFSv4 callbacks when an inet6 address was the only one available
in the client. This patch fixes the above, plus deletes opt_inet6.h
from a couple of files it is not needed for.

MFC after: 2 weeks


229694 06-Jan-2012 jh

r222004 changed sbuf_finish() to not clear the buffer error status. As a
consequence sbuf_len() will return -1 for buffers which had the error
status set prior to sbuf_finish() call. This causes a problem in
pfs_read() which purposely uses a fixed size sbuf to discard bytes which
are not needed to fulfill the read request.

Work around the problem by using the full buffer length when
sbuf_finish() indicates an overflow. An overflowed sbuf with fixed size
is always full.

PR: kern/163076
Approved by: des
MFC after: 2 weeks
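
The workaround can be pictured with a toy fixed-size buffer. This sketch does
not use the sbuf(9) API: struct ex_fixedbuf and the ex_*() helpers are
hypothetical, and unlike a real sbuf the toy buffer reserves no byte for a
terminating NUL. It only mirrors the behaviour described above, where the
length query reports -1 once an overflow has been recorded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy fixed-size buffer that remembers whether it overflowed. */
struct ex_fixedbuf {
	char	data[8];
	size_t	len;
	int	overflowed;
};

/* Append s, keeping what fits and recording an overflow otherwise. */
void
ex_cat(struct ex_fixedbuf *b, const char *s)
{
	size_t n = strlen(s);
	size_t room = sizeof(b->data) - b->len;

	if (n > room) {
		n = room;
		b->overflowed = 1;
	}
	memcpy(b->data + b->len, s, n);
	b->len += n;
}

/* Length query that fails with -1 once the error status is set. */
long
ex_len(const struct ex_fixedbuf *b)
{
	return (b->overflowed ? -1 : (long)b->len);
}

/* Workaround: an overflowed fixed-size buffer is always full. */
size_t
ex_usable_len(const struct ex_fixedbuf *b)
{
	long n = ex_len(b);

	assert(n >= -1);
	return (n < 0 ? sizeof(b->data) : (size_t)n);
}
```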


229692 06-Jan-2012 jh

Check the return value of sbuf_finish() in pfs_readlink() and return
ENAMETOOLONG if the buffer overflowed.

Approved by: des
MFC after: 2 weeks


229600 05-Jan-2012 dim

In sys/fs/nullfs/null_subr.c, in a KASSERT, output the correct vnode
pointer 'lowervp' instead of 'vp', which is uninitialized at that point.

Reviewed by: kib
MFC after: 1 week


229431 03-Jan-2012 kib

Do the vput() for the lowervp in the null_nodeget() for error case too.
Several callers of null_nodeget() did the cleanup itself, but several
missed it, most prominent being null_bypass(). Remove the cleanup from
the callers, now null_nodeget() handles lowervp free itself.

Reported and tested by: pho
MFC after: 1 week


229428 03-Jan-2012 kib

Document the state of the lowervp vnode for null_nodeget().

Tested by: pho
MFC after: 1 week


229407 03-Jan-2012 pfg

Minor cleanups to ntfs code

bzero -> memset
rename variables to avoid shadowing.

PR: 142401
Obtained from: NetBSD
Approved by: jhb (mentor)


229363 03-Jan-2012 alc

Don't pass VM_ALLOC_ZERO to vm_page_grab() in tmpfs_mappedwrite() and
tmpfs_nocacheread(). It is both unnecessary and a pessimization. It
results in either the page being zeroed twice or zeroed first and then
overwritten by an I/O operation.

MFC after: 3 weeks


229272 02-Jan-2012 ed

Use strchr() and strrchr().

It seems strchr() and strrchr() are used more often than index() and
rindex(). Therefore, simply migrate all kernel code to use it.

For the XFS code, remove an empty line to make the code identical to
the code in the Linux kernel.


229200 01-Jan-2012 ed

Migrate ufs and ext2fs from skpc() to memcchr().

While there, remove a useless check from the code. memcchr() always
returns characters unequal to 0xff in this case, so inosused[i] ^ 0xff
can never be equal to zero. Also, the fact that memcchr() returns a
pointer instead of the number of bytes until the end, makes conversion
to an offset far more easy.
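
Why a pointer return makes the offset conversion simple can be seen in a
short userland sketch. ex_memcchr() is an illustrative stand-in written for
this example, not the kernel's memcchr() implementation, and
ex_first_free_byte() is a hypothetical caller.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return a pointer to the first byte in buf[0..len) that differs from c,
 * or NULL if every byte equals c.
 */
const unsigned char *
ex_memcchr(const unsigned char *buf, int c, size_t len)
{
	assert(buf != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		if (buf[i] != (unsigned char)c)
			return (&buf[i]);
	}
	return (NULL);
}

/*
 * Index of the first bitmap byte that is not 0xff (i.e. has a free bit),
 * or -1 when the whole map is in use.  The offset is a plain pointer
 * subtraction.
 */
long
ex_first_free_byte(const unsigned char *map, size_t len)
{
	const unsigned char *p = ex_memcchr(map, 0xff, len);

	return (p == NULL ? -1 : (long)(p - map));
}
```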


228864 24-Dec-2011 kevlo

Discard local array based on return values.

Pointed out by: uqs
Found with: Coverity Prevent(tm)
CID: 10089


228827 23-Dec-2011 rmacklem

During investigation of an NFSv4 client crash reported by glebius@,
jhb@ spotted that nfscl_getstateid() might modify credentials when
called from nfsrpc_read() for the case where p != NULL, whereas
nfsrpc_read() only did a crdup() to get new credentials for p == NULL.
This bug was introduced by r195510, since pre-r195510 nfscl_getstateid()
only modified credentials for the p == NULL case. This patch modifies
nfsrpc_read()/nfsrpc_write() so that they do crdup() for the p != NULL case.
It is conceivable that this bug caused the crash reported by glebius@, but
that will not be determined for some time, since the crash occurred after
about 1 month of operation.

Tested by: glebius
Reviewed by: jhb
MFC after: 2 weeks


228796 22-Dec-2011 kevlo

Discarding local array based on return values


228757 21-Dec-2011 rmacklem

jwd@ reported a problem via email where the old NFS client would
get a reply of EEXIST from an NFS server when a Mkdir RPC was retried,
for an NFS over UDP mount.
Upon investigation, it was found that the client was retransmitting
the Mkdir RPC request over UDP, but with a different xid. As such,
the retransmitted message would miss the Duplicate Request Cache
in the server, causing it to reply EEXIST. The kernel client side
UDP rpc code has two timers. The first one causes a retransmit using
the same xid and socket and was set to a fixed value of 3 seconds.
(The default can be overridden via CLSET_RETRY_TIMEOUT.)
The second one creates a new socket and xid and should be larger
than the first. However, both NFS clients were setting the second
timer to nm_timeo ("timeout=<value>" mount argument), which defaulted to
1 second, so the first timer would never time out.
This patch fixes both NFS clients so that they set the first timer
using nm_timeo and makes the second timer larger than the first one.

Reported by: jwd
Tested by: jwd
Reviewed by: jhb
MFC after: 2 weeks


228583 16-Dec-2011 pfg

Style cleanups by jh@.
Fix a comment from the previous commit.
Use M_ZERO instead of bzero() in ext2_vfsops.c
Add include guards from PR.

PR: 162564
Approved by: jhb (mentor)
MFC after: 2 weeks


228560 16-Dec-2011 rmacklem

Patch the new NFS server in a manner analogous to r228520 for the
old NFS server, so that it correctly handles a count == 0 argument
for Commit.

PR: kern/118126
MFC after: 2 weeks


228539 15-Dec-2011 pfg

Bring in reallocblk to ext2fs.

The feature has been standard for a while in UFS as a means to reduce
fragmentation, therefore maintaining consistent performance with
filesystem aging. This is also very similar to what ext4 calls
"delayed allocation".

In his 2010 GSoC, Zheng Liu ported and benchmarked the missing
FANCY_REALLOC code to find more consistent performance improvements than
with the preallocation approach.

PR: 159233
Author: Zheng Liu <gnehzuil AT SPAMFREE gmail DOT com>
Sponsored by: Google Inc.
Approved by: jhb (mentor)
MFC after: 2 weeks


228507 14-Dec-2011 pfg

Merge ext2_readwrite.c into ext2_vnops.c as done in UFS in r101729.

This removes the obfuscations mentioned in ext2_readwrite and
places the clustering function in a location similar to other
UFS-based implementations.

No performance or functional changes are expected from
this move.

PR: kern/159232
Suggested by: bde
Approved by: jhb (mentor)
MFC after: 2 weeks


228361 09-Dec-2011 jhb

Explicitly use curthread while manipulating td_fpop during last close
of a devfs file descriptor in devfs_close_f(). The passed in td argument
may be NULL if the close was invoked by garbage collection of open
file descriptors in pending control messages in the socket buffer of a
UNIX domain socket after it was closed.

PR: kern/151758
Submitted by: Andrey Shidakov andrey shidakov ru
Submitted by: Ruben van Staveren ruben verweg com
Reviewed by: kib
MFC after: 2 weeks


228263 04-Dec-2011 kib

Initialize fifoinfo fi_wgen field on open. The only important thing is the
difference between fi_wgen and f_seqcount, so the change is purely
cosmetic, but it makes the code easier to understand.

Submitted by: gianni
MFC after: 2 weeks


228260 04-Dec-2011 rmacklem

This patch adds a sysctl to the NFSv4 server which optionally disables the
check for a UTF-8 compliant file name. Enabling this sysctl results in
an NFSv4 server that is non-RFC3530 compliant, therefore it is not enabled
by default. However, enabling this sysctl results in NFSv3 compatible
behaviour and fixes the problem reported by "dan at sunsaturn.com"
to freebsd-current@ on Nov. 14, 2011 under the subject "NFSV4 readlink_stat".

Tested by: dan at sunsaturn.com
Reviewed by: zack
MFC after: 2 weeks


228217 03-Dec-2011 rmacklem

Post r223774, the NFSv4 client no longer has multiple instances
of the same lock_owner4 string. As such, the handling of cleanup
of lock_owners could be simplified. This simplification permitted
the client to do a ReleaseLockOwner operation when the process that
the lock_owner4 string represents, has exited. This permits the
server to release any storage related to the lock_owner4 string
before the associated open is closed. Without this change, it
is possible to exhaust a server's storage when a long running
process opens a file and then many child processes do locking
on the file, because the open doesn't get closed. A similar patch
was applied to the Linux NFSv4 client recently so that it wouldn't
exhaust a server's storage.

Reviewed by: zack
MFC after: 2 weeks


228185 01-Dec-2011 jhb

Enhance the sequential access heuristic used to perform readahead in the
NFS server and reuse it for writes as well to allow writes to the backing
store to be clustered.
- Use a prime number for the size of the heuristic table (1017 is not
prime).
- Move the logic to locate a heuristic entry from the table and compute
the sequential count out of VOP_READ() and into a separate routine.
- Use the logic from sequential_heuristic() in vfs_vnops.c to update the
seqcount when a sequential access is performed rather than just
increasing seqcount by 1. This lets the clustering count ramp up
faster.
- Allow for some reordering of RPCs and if it is detected leave the current
seqcount as-is rather than dropping back to a seqcount of 1. Also,
when out of order access is encountered, cut seqcount in half rather than
dropping it all the way back to 1 to further aid with reordering.
- Fix the new NFS server to properly update the next offset after a
successful VOP_READ() so that the readahead actually works.

Some of these changes came from an earlier patch by Bjorn Gronwall that was
forwarded to me by bde@.

Discussed with: bde, rmacklem, fs@
Submitted by: Bjorn Gronwall (1, 4)
MFC after: 2 weeks


228156 30-Nov-2011 kib

Rename vm_page_set_valid() to vm_page_set_valid_range().
The vm_page_set_valid() is the most reasonable name for the m->valid
accessor.

Reviewed by: attilio, alc


228023 27-Nov-2011 kevlo

Add unicode support to ntfs

Obtained from: imura


227834 22-Nov-2011 trociny

In procfs_doproccmdline() if arguments are not cached read them from
the process stack.

Suggested by: kib
Reviewed by: kib
Tested by: pho
MFC after: 2 weeks


227822 22-Nov-2011 ivoras

Avoid panics from recursive rename operations. Not a perfect patch but
good enough for now.

PR: kern/159418
Submitted by: Gleb Kurtsou
Reviewed by: kib
MFC after: 1 month


227817 22-Nov-2011 kib

Put all the messages from msdosfs under the MSDOSFS_DEBUG ifdef.
They are confusing to user, and not informative for general consumption.

MFC after: 1 week


227809 22-Nov-2011 rmacklem

This patch enables the new/default NFS server's use of shared
vnode locking for read, readdir, readlink, getattr and access.
It is hoped that this will improve server performance for these
operations, since they will no longer be serialized for a given
file/vnode.


227802 21-Nov-2011 delphij

Improve the way to calculate available pages in tmpfs:

- Don't deduct wired pages from total usable counts because it does not
make any sense. To make things worse, on systems where swap size is
smaller than physical memory and use a lot of wired pages (e.g. ZFS),
tmpfs can suddenly have free space of 0 because of this;
- Count cached pages as available; [1]
- Don't count inactive pages as available, technically we could but that
might be too aggressive; [1]

[1] Suggested by kib@

MFC after: 1 week


227796 21-Nov-2011 rmacklem

Clean up some cruft in the NFSv4 client left over from the
OpenBSD port, so that it is more readable. No logic change
is made by this commit.

MFC after: 2 weeks


227760 20-Nov-2011 rmacklem

Add two arguments to the nfsrpc_rellockown() function in the NFSv4
client. This does not change the client's behaviour, but prepares
the code so that nfsrpc_rellockown() can be called elsewhere in a
future commit.

MFC after: 2 weeks


227744 20-Nov-2011 rmacklem

Since the nfscl_cleanup() function isn't used by the FreeBSD NFSv4 client,
delete the code and fix up the related comments. This should not have
any functional effect on the client.

MFC after: 2 weeks


227743 20-Nov-2011 rmacklem

Post r223774 the NFSv4 client never uses the linked list with the
head nfsc_defunctlockowner. This patch simply removes the code that
loops through this always empty list, since the code no longer does
anything useful. It should not have any effect on the client's
behaviour.

MFC after: 2 weeks


227697 19-Nov-2011 kib

Existing VOP_VPTOCNP() interface has a fatal flaw that is critical for
nullfs. The problem is that resulting vnode is only required to be
held on return from the successful call to vop, instead of being
referenced.

Nullfs VOP_INACTIVE() method reclaims the vnode, which in combination
with the VOP_VPTOCNP() interface means that the directory vnode
returned from VOP_VPTOCNP() is reclaimed in advance, causing
vn_fullpath() to error with EBADF or like.

Change the interface for VOP_VPTOCNP(), now the dvp must be
referenced. Convert all in-tree implementations of VOP_VPTOCNP(),
which is trivial, because vhold(9) and vref(9) are similar in the
locking prerequisites. Out-of-tree fs implementation of VOP_VPTOCNP(),
if any, should have no trouble with the fix.

Tested by: pho
Reviewed by: mckusick
MFC after: 3 weeks (subject of re approval)


227696 19-Nov-2011 kib

Do not use NULLVPTOLOWERVP() in the null_print(). If diagnostic is compiled
in, and show vnode is used from ddb on the faulty nullfs vnode, we get
panic instead of vnode dump.

MFC after: 1 week


227695 19-Nov-2011 kib

Use the plain panic calls, without additional printing around them.
The debugger and dumping support is adequate.

Tested by: pho
MFC after: 1 week


227650 18-Nov-2011 kevlo

Add unicode support to msdosfs and smbfs; original patches from imura,
bug fixes by Kuan-Chung Chiu <buganini at gmail dot com>.

Tested by me in production for several days at work.


227576 16-Nov-2011 kib

Fix build, use %d for int value formatting.


227550 16-Nov-2011 pho

Handle invalid large values for getdirentries(2) data buffer size.

In collaboration with: kib
Reviewed by: des
Reported by: The iknowthis syscall fuzzer.
MFC after: 1 week


227543 15-Nov-2011 rmacklem

Modify the new NFS client so that nfs_fsync() only calls ncl_flush()
for regular files. Since other file types don't write into the
buffer cache, calling ncl_flush() is almost a no-op. However, it does
clear the NMODIFIED flag and this shouldn't be done by nfs_fsync() for
directories.

MFC after: 2 weeks


227527 15-Nov-2011 pho

Removed extra PRELE() call.

MFC after: 1 week


227517 15-Nov-2011 rmacklem

Move the setting of the default value for nm_wcommitsize to
before the nfs_decode_args() call in the new NFS client, so
that a specified command line value won't be overwritten.
Also, modify the calculation for small values of desiredvnodes
to avoid an unusually large value or a divide by zero crash.
It seems that the default value for nm_wcommitsize is very
conservative and may need to change at some time.

PR: kern/159351
Submitted by: onwahe at gmail.com (earlier version)
Reviewed by: jhb
MFC after: 2 weeks


227507 14-Nov-2011 jhb

Finish making 'wcommitsize' an NFS client mount option.

Reviewed by: rmacklem
MFC after: 1 week


227504 14-Nov-2011 jhb

Sync with the old NFS client: Remove an obsolete comment.


227494 14-Nov-2011 rmacklem

Since NFSv4 byte range locking only works for regular files,
add a sanity check for the vnode type to the NFSv4 client.

MFC after: 2 weeks


227493 13-Nov-2011 rmacklem

Move the assignment of default values for some mount options
to before the nfs_decode_args() call in the new NFS client,
so they don't overwrite the value specified on the command line.

MFC after: 2 weeks


227489 13-Nov-2011 eadler

- fix duplicate "a a" in some comments

Submitted by: eadler
Approved by: simon
MFC after: 3 days


227393 09-Nov-2011 kib

Lock the thread lock around block that retrieves td_wmesg. Otherwise,
procfs could see a thread with assigned td_wchan but still NULL td_wmesg.

Reported and tested by: pho
MFC after: 1 week


227310 07-Nov-2011 marcel

Don asbestos garment and remove the warning about TMPFS being experimental
-- highly experimental even. So far the closest to a bug in TMPFS that people
have gotten to relates to how ZFS can take away from the memory that TMPFS
needs. One can argue that such is not a bug in TMPFS. Irrespective, even if
there is a bug here and there in TMPFS, it's not to our own advantage to
scare people away from using TMPFS. I for one have been using it, even with
ZFS, very successfully.


227309 07-Nov-2011 ed

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


227293 07-Nov-2011 ed

Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.

This means that their use is restricted to a single C file.


227267 06-Nov-2011 ed

Remove MALLOC_DECLAREs of nonexisting malloc-pools.

After careful grepping, it seems none of these pools can be found in our
source tree. They are not in use, nor are they defined.


227104 05-Nov-2011 kib

Fix typo.

MFC after: 3 days


227069 04-Nov-2011 jhb

Move the cleanup of f_cdevpriv when the reference count of a devfs
file descriptor drops to zero out of _fdrop() and into devfs_close_f()
as it is only relevant for devfs file descriptors.

Reviewed by: kib
MFC after: 1 week


227062 03-Nov-2011 kib

Fix kernel panic when d_fdopen csw method is called for NULL fp.
This may happen when kernel consumer calls VOP_OPEN().

Reported by: Tavis Ormandy <taviso cmpxchg8b com> through delphij
MFC after: 3 days


226987 01-Nov-2011 pho

Added missing cache purge of the "from" argument for rename().

Reported by: Anton Yuzhaninov <citrin citrin ru>
In collaboration with: kib
MFC after: 1 week


226688 24-Oct-2011 kib

The use of VOP_ISLOCKED() without a check for the return values can cause
false positives. Replace the #ifdef block with the proper
ASSERT_VOP_UNLOCKED() assert.

Tested by: pho
MFC after: 1 week


226687 24-Oct-2011 kib

The only possible error return from null_nodeget() is due to insmntque1
failure (the getnewvnode cannot return an error). In this case, the
null_insmntque_dtr() already unlocked the reclaimed vnode, so VOP_UNLOCK()
in the nullfs_mount() after null_nodeget() failure is wrong.

Tested by: pho
MFC after: 1 week


226686 24-Oct-2011 kib

The covered vnode must be relocked if it was unlocked. Remove VOP_ISLOCKED
test because of this and also because it can lead to false positives.

Tested by: pho
MFC after: 1 week


226681 24-Oct-2011 pho

Only unlock if the lock is exclusive.

Reported by: Subbsd <subbsd gmail com>
Discussed with: kib


226497 18-Oct-2011 des

Trace attempts to open a portal device.

Ceterum censeo portalfs esse delendam.


226234 10-Oct-2011 trasz

Make unionfs also clear VAPPEND when clearing VWRITE, since VAPPEND
is just a modifier for VWRITE.

Submitted by: rmacklem


226041 05-Oct-2011 kib

Export devfs inode number allocator for the kernel consumers.

Reviewed by: jhb
MFC after: 2 weeks


225617 16-Sep-2011 kmacy

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


225418 06-Sep-2011 kib

Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify aflags.

Document the changes to flags field to only require the page lock.

Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.

Reviewed by: alc, attilio
Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by: re (bz)


225356 03-Sep-2011 rmacklem

Fix the NFS servers so that they can do a Lookup of "..",
which requires that ni_strictrelative be set to 0, post-r224810.

Tested by: swills (earlier version), geo dot liaskos at gmail.com
Approved by: re (kib)


225049 20-Aug-2011 rmacklem

Fix the NFSv4 server so that it returns NFSERR_SYMLINK when
an attempt to do an Open operation on any type of file other
than VREG is done. A recent discussion on the IETF working group's
mailing list (nfsv4@ietf.org) decided that NFSERR_SYMLINK
should be returned for all non-regular files and not just symlinks,
so that the Linux client would work correctly.
This change does not affect the FreeBSD NFSv4 client and is not
believed to have a negative effect on other NFSv4 clients.

Reviewed by: zkirsch
Approved by: re (kib)
MFC after: 2 weeks


224915 16-Aug-2011 kib

Do not return success and a string "unknown" when vn_fullpath() was unable
to resolve the path of the text vnode of the process. The behaviour is
very confusing for any consumer of the procfs, in particular, java.

Reported and tested by: bf
MFC after: 2 weeks
Approved by: re (bz)


224914 16-Aug-2011 kib

Add the fo_chown and fo_chmod methods to struct fileops and use them
to implement fchown(2) and fchmod(2) support for several file types
that previously lacked it. Add MAC entries for chown/chmod done on
posix shared memory and (old) in-kernel posix semaphores.

Based on the submission by: glebius
Reviewed by: rwatson
Approved by: re (bz)


224911 16-Aug-2011 jonathan

Fix a merge conflict.

r224086 added "goto out"-style error handling to nfssvc_nfsd(), in order
to reliably call NFSEXITCODE() before returning. Our Capsicum changes,
based on the old "return (error)" model, did not merge nicely.

Approved by: re (kib), mentor (rwatson)
Sponsored by: Google Inc


224778 11-Aug-2011 rwatson

Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *. With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by: re (bz)
Submitted by: jonathan
Sponsored by: Google Inc
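
A rough user-space sketch of the two ideas in this commit: checking a
required rights mask when a descriptor is converted, and narrowing a
mapping's maximum protection by the rights held. All names and bit
values here (rights_sk_t, CAP_SK_*, PROT_SK_*, ERR_SK_NOTCAPABLE) are
hypothetical and are not the kernel's definitions.

```c
#include <stdint.h>

typedef uint64_t rights_sk_t;

/* Hypothetical capability rights. */
#define CAP_SK_READ   0x1ULL
#define CAP_SK_WRITE  0x2ULL

/* Hypothetical protection bits and error value. */
#define PROT_SK_READ        0x1
#define PROT_SK_WRITE       0x2
#define ERR_SK_NOTCAPABLE   1

/* Succeed only if every needed right is among the rights held. */
static int
cap_check_sk(rights_sk_t held, rights_sk_t needed)
{
    return ((held & needed) == needed ? 0 : ERR_SK_NOTCAPABLE);
}

/*
 * Narrow a mapping's maximum protection to what the rights allow, so a
 * later protection change cannot exceed the rights held at map time.
 */
static int
narrow_maxprot_sk(int maxprot, rights_sk_t held)
{
    if ((held & CAP_SK_READ) == 0)
        maxprot &= ~PROT_SK_READ;
    if ((held & CAP_SK_WRITE) == 0)
        maxprot &= ~PROT_SK_WRITE;
    return (maxprot);
}
```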


224743 09-Aug-2011 kib

Do not update mountpoint generation counter to the value which was not
yet acted upon by devfs_populate().

Submitted by: Kohji Okuno <okuno.kohji jp panasonic com>
Approved by: re (bz)
MFC after: 1 week


224637 03-Aug-2011 zack

Fix an NFS server issue where it was not correctly setting the eof flag when a
READ had hit the end of the file. Also, clean up some cruft in the code.

Approved by: re (kib)
Reviewed by: rmacklem
MFC after: 2 weeks


224606 02-Aug-2011 rmacklem

Fix a LOR in the NFS client which could cause a deadlock.
This was reported to the mailing list freebsd-net@freebsd.org
on July 21, 2011 under the subject "LOR with nfsclient sillyrename".
The LOR occurred when nfs_inactive() called vrele(sp->s_dvp)
while holding the vnode lock on the file in s_dvp. This patch
modifies the client so that it performs the vrele(sp->s_dvp)
as a separate task to avoid the LOR. This fix was discussed
with jhb@ and kib@, who both proposed variations of it.

Tested by: pho, jlott at averesystems.com
Submitted by: jhb (earlier version)
Reviewed by: kib
Approved by: re (kib)
MFC after: 2 weeks


224554 31-Jul-2011 rmacklem

Fix rename in the new NFS server so that it does not require a
recursive vnode lock on the directory for the case where the
new file name is in the same directory as the old one. The patch
handles this as a special case, recognized by the new directory
having the same file handle as the old one and just VREF()s the old
dir vnode for this case, instead of doing a second VFS_FHTOVP() to get it.
This is required so that the server will work for file systems like
msdosfs, that do not support recursive vnode locking.
This problem was discovered during recent testing by pho@
when exporting an msdosfs file system via the new NFS server.

Tested by: pho
Reviewed by: zkirsch
Approved by: re (kib)
MFC after: 2 weeks


224532 30-Jul-2011 rmacklem

The new NFS client failed to vput() the new vnode if a setattr
failed after the file was created in nfs_create(). This would
probably only happen during a forced dismount. The old NFS client
does have a vput() for this case. Detected by pho during recent
testing, where an open syscall returned with a vnode still locked.

Tested by: pho
Approved by: re (kib)
MFC after: 2 weeks


224290 24-Jul-2011 mckusick

This update changes the mnt_flag field in the mount structure from
32 bits to 64 bits and eliminates the unused mnt_xflag field. The
existing mnt_flag field is completely out of bits, so this update
gives us room to expand. Note that the f_flags field in the statfs
structure is already 64 bits, so the expanded mnt_flag field can
be exported without having to make any changes in the statfs structure.

Approved by: re (bz)


224121 17-Jul-2011 zack

Revert revision 224079 as Rick pointed out that I would be calling VOP_PATHCONF
without the vnode lock held.

Implicitly approved by: zml (mentor)


224117 16-Jul-2011 rmacklem

The new NFSv4 client handled NFSERR_GRACE as a fatal error
for the remove and rename operations. Some NFSv4 servers will
report NFSERR_GRACE for these operations. This patch changes
the behaviour of the client so that it handles NFSERR_GRACE
like NFSERR_DELAY for non-state related operations like
remove and rename. It also exempts the delegreturn operation
from handling within newnfs_request() for NFSERR_DELAY/NFSERR_GRACE
so that it can handle NFSERR_GRACE in the same manner as before.
This problem was resolved thanks to discussion with bfields at fieldses.org.
The problem was identified at the recent NFSv4 interoperability
bakeathon.

MFC after: 2 weeks


224086 16-Jul-2011 zack

Add DEXITCODE plumbing to NFS.

Isilon has the concept of an in-memory exit-code ring that saves the last exit
code of a function and allows for stack tracing. This is very helpful when
debugging tough issues.

This patch is essentially a no-op for BSD at this point, until we upstream
the dexitcode logic itself. The patch adds DEXITCODE calls to every NFS
function that returns an errno error code. A number of code paths were also
reorganized to have single exit paths, to reduce code duplication.

Submitted by: David Kwan <dkwan@isilon.com>
Reviewed by: rmacklem
Approved by: zml (mentor)
MFC after: 2 weeks
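
The single-exit-path pattern mentioned above might look roughly like
the following. Since the dexitcode logic itself is not upstream, the
ring buffer, the EXITCODE_SK() macro and copy_name_sk() are all
hypothetical illustrations, not the Isilon implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define EXIT_RING_SK_SIZE 8

/* Hypothetical in-memory ring of recent exit codes. */
static int exit_ring_sk[EXIT_RING_SK_SIZE];
static unsigned int exit_ring_next_sk;

/* Hypothetical exit-code hook: record the code in the ring. */
#define EXITCODE_SK(error) \
    (exit_ring_sk[exit_ring_next_sk++ % EXIT_RING_SK_SIZE] = (error))

static int
copy_name_sk(char *dst, size_t dstlen, const char *src)
{
    int error = 0;

    if (dst == NULL || src == NULL) {
        error = EINVAL;
        goto out;
    }
    if (strlen(src) >= dstlen) {
        error = ENAMETOOLONG;
        goto out;
    }
    strcpy(dst, src);
out:
    /* Every return passes through here, so the hook runs exactly once. */
    EXITCODE_SK(error);
    return (error);
}
```

With early "return (error)" statements the hook would have to be
repeated at each return; the single "out" label keeps it in one place.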


224083 16-Jul-2011 zack

Simple find/replace of VOP_ISLOCKED -> NFSVOPISLOCKED. This is done so that NFSVOPISLOCKED can be modified later to add enhanced logging and assertions.

Reviewed by: rmacklem
Approved by: zml (mentor)
MFC after: 2 weeks


224082 16-Jul-2011 zack

Simple find/replace of VOP_UNLOCK -> NFSVOPUNLOCK. This is done so that NFSVOPUNLOCK can be modified later to add enhanced logging and assertions.

Reviewed by: rmacklem
Approved by: zml (mentor)
MFC after: 2 weeks


224081 16-Jul-2011 zack

Simple find/replace of vn_lock -> NFSVOPLOCK. This is done so that NFSVOPLOCK can be modified later to add enhanced logging and assertions.

Reviewed by: rmacklem
Approved by: zml (mentor)
MFC after: 2 weeks


224080 16-Jul-2011 zack

Remove unnecessary thread pointer from VOPLOCK macros and current users.

Reviewed by: rmacklem
Approved by: zml (mentor)
MFC after: 2 weeks


224079 16-Jul-2011 zack

Change loadattr and fillattr to ask the file system for the pathconf variable.

Small modification where VOP_PATHCONF was being called directly.

Reviewed by: rmacklem
Approved by: zml (mentor)
MFC after: 2 weeks


224078 16-Jul-2011 zack

Move nfsvno_pathconf to be accessible to sys/fs/nfs; no functionality change.

Reviewed by: rmacklem
Approved by: zml (mentor)
MFC after: 2 weeks


224077 16-Jul-2011 zack

Small acl patch to return the aclerror that comes back from nfsrv_dissectacl(). This fixes a problem where ATTRNOTSUPP was being returned instead of BADOWNER.

Reviewed by: rmacklem
Approved by: zml (mentor)
MFC after: 2 weeks


223988 13-Jul-2011 kib

While fixing the looping of a thread while devfs vnode is reclaimed,
r179247 introduced a possibility of devfs_allocv() returning spurious
ENOENT. If the vnode is selected by vnlru daemon for reclamation, then
devfs_allocv() can get ENOENT from vget() due to devfs_close() dropping
vnode lock around the call to cdevsw d_close method.

Use LK_RETRY in the vget() call, and do some part of the devfs_reclaim()
work in devfs_allocv(), clearing vp->v_data and de->de_vnode. Retry the
allocation of the vnode, now with de->de_vnode == NULL.

The check vp->v_data == NULL at the start of devfs_close() cannot be
affected by the change, since vnode lock must be held while VI_DOOMED
is set, and only dropped after the check.

Reported and tested by: Kohji Okuno <okuno.kohji jp panasonic com>
Reviewed by: attilio
MFC after: 3 weeks


223971 13-Jul-2011 rmacklem

r222389 introduced a case where the NFSv4 client could
loop in nfscl_getcl() when a forced dismount is in progress,
because nfsv4_lock() will return 0 without sleeping when
MNTK_UNMOUNTF is set.
This patch fixes it so it won't loop calling nfsv4_lock()
for this case.

MFC after: 2 weeks


223843 07-Jul-2011 jonathan

Make a comment more accurate.

This comment refers to CAP_NT_SMBS, which does not exist; it should refer to SMB_CAP_NT_SMBS.
Fixing this comment makes it easier for people interested in Capsicum to grep around for
capability rights, whose identifiers are of the form 'CAP_[A-Z_]'.

Approved by: mentor (rwatson), re (Capsicum blanket)
Sponsored by: Google Inc


223774 04-Jul-2011 rmacklem

The algorithm used by nfscl_getopen() could have resulted in
multiple instances of the same lock_owner when a process both
inherited an open file descriptor plus opened the same file itself.
Since some NFSv4 servers cannot handle multiple instances of
the same lock_owner string, this patch changes the algorithm
used by nfscl_getopen() in the new NFSv4 client to keep that
from happening. The new algorithm is simpler, since there is
no longer any need to ascend the process's parentage tree because
all NFSv4 Closes for a file are done at VOP_INACTIVE()/VOP_RECLAIM(),
making the Opens indistinct w.r.t. use with Lock Ops.
This problem was discovered at the recent NFSv4 interoperability
Bakeathon.

MFC after: 2 weeks


223747 03-Jul-2011 rmacklem

Modify the new NFSv4 client so that it appends a file handle
to the lock_owner4 string that goes on the wire. Also, add
code to do a ReleaseLockOwner Op on the lock_owner4 string
before a Close. Apparently not all NFSv4 servers handle multiple
instances of the same lock_owner4 string, at least not in a
compatible way. This patch avoids having multiple instances,
except for one unusual case, which will be fixed by a future commit.
Found at the recent NFSv4 interoperability Bakeathon.

Tested by: tdh at excfb.com
MFC after: 2 weeks


223677 29-Jun-2011 alc

Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this
option to vm_object_page_remove() asserts that the specified range of pages
is not mapped, or more precisely that none of these pages have any managed
mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on
the pages.

This change not only saves time by eliminating pointless calls to
pmap_remove_all(), but it also eliminates an inconsistency in the use of
pmap_remove_all() versus related functions, like pmap_remove_write(). It
eliminates harmless but pointless calls to pmap_remove_all() that were being
performed on PG_UNMANAGED pages.

Update all of the existing assertions on pmap_remove_all() to reflect this
change.

Reviewed by: kib


223657 28-Jun-2011 rmacklem

Fix the new NFSv4 client so that it doesn't fill the cached
mode attribute in as 0 when doing writes. The change adds
the Mode attribute plus the others except Owner and Owner_group
to the list requested by the NFSv4 Write Operation. This fixed
a problem where an executable file built by "cc" would get mode
0111 instead of 0755 for some NFSv4 servers.
Found at the recent NFSv4 interoperability Bakeathon.

Tested by: tdh at excfb.com
MFC after: 2 weeks


223441 22-Jun-2011 rmacklem

Plug an mbuf leak in the new NFS client that occurred when a
server replied NFS3ERR_JUKEBOX/NFS4ERR_DELAY to an rpc.
This affected both NFSv3 and NFSv4. Found during testing
at the recent NFSv4 interoperability Bakeathon.

MFC after: 2 weeks


223436 22-Jun-2011 rmacklem

Fix the new NFSv4 client so that it uses the same uid as
was used for doing a mount when performing system operations
on AUTH_SYS mounts. This resolved an issue when mounting
a Linux server. Found during testing at the recent
NFSv4 interoperability Bakeathon.

MFC after: 2 weeks


223373 21-Jun-2011 rmacklem

Fix the new NFSv4 server so that it checks for VREAD_ACL when
a client does a Getattr for an ACL and not VREAD_ATTRIBUTES.
This was found during the recent NFSv4 interoperability Bakeathon.

MFC after: 2 weeks


223349 20-Jun-2011 rmacklem

Fix the new NFSv4 server so that it only allows Lookup of
directories and symbolic links when traversing non-exported
file systems. Found during the recent NFSv4 interoperability
Bakeathon.

MFC after: 2 weeks


223348 20-Jun-2011 rmacklem

Fix the new NFSv4 server so that it allows Access and Readlink
operations while traversing non-exported file systems. This is
required for some non-FreeBSD clients to do NFSv4 mounts. Found during
the recent NFSv4 interoperability Bakeathon.

MFC after: 2 weeks


223312 19-Jun-2011 rmacklem

Fix a number of places where the new NFS server did not
lock the mutex when manipulating rc_flag in the DRC cache.
This is believed to fix a hung server that was reported
to the freebsd-fs@ list on June 9 under the subject heading
"New NFS server stress test hang", where all the threads
were waiting for the RC_LOCKED flag to clear.

Tested by: jwd at slowblink.com
MFC after: 2 weeks


223309 19-Jun-2011 rmacklem

Fix the kgssapi so that it can be loaded as a module. Currently
the NFS subsystems use five of the rpcsec_gss/kgssapi entry points,
but since it was not obvious which others might be useful, all
nineteen were included. Basically the nineteen entry points are
set in a structure called rpc_gss_entries and inline functions
defined in sys/rpc/rpcsec_gss.h check for the entry points being
non-NULL and then call them. A default value is returned otherwise.
Requested by rwatson.

Reviewed by: jhb
MFC after: 2 weeks
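
The entry-point arrangement described above can be sketched as a table
of function pointers plus inline wrappers. The structure below has one
member instead of nineteen, and every name in it (gss_entries_sk,
gss_table_sk, the wrapper and the module function) is hypothetical.

```c
#include <stddef.h>

/* Hypothetical table of entry points filled in when the module loads. */
struct gss_entries_sk {
    int (*is_installed)(const char *mech);
};

/* Zero-initialized: all entry points are NULL until a module sets them. */
static struct gss_entries_sk gss_table_sk;

/* Inline wrapper: call through the table if set, else return a default. */
static inline int
gss_is_installed_call_sk(const char *mech)
{
    if (gss_table_sk.is_installed != NULL)
        return ((*gss_table_sk.is_installed)(mech));
    return (0);    /* default when the module is not loaded */
}

/* Stand-in for the module's own implementation. */
static int
module_is_installed_sk(const char *mech)
{
    return (mech != NULL);
}
```

Callers link only against the wrapper, so they build and run whether or
not the module providing the real function is present.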


223280 18-Jun-2011 rmacklem

Add DTrace support to the new NFS client. This is essentially
cloned from the old NFS client, plus additions for NFSv4. A
review of this code is in progress, however it was felt by the
reviewer that it could go in now, before code slush. Any changes
required by the review can be committed as bug fixes later.


222722 05-Jun-2011 rmacklem

Add support for flock(2) locks to the new NFSv4 client. I think this
should be ok, since the client now delays NFSv4 Close operations
until VOP_INACTIVE()/VOP_RECLAIM(). As such, there should be no
risk that the NFSv4 Open is closed while an associated byte range lock
still exists.

Tested by: avg
MFC after: 2 weeks


222719 05-Jun-2011 rmacklem

The new NFSv4 client was erroneously using "p" instead of
"p_leader" for the "id" for POSIX byte range locking. I think
this would only have affected processes created by rfork(2)
with the RFTHREAD flag specified. This patch fixes that by
passing the "id" down through the various functions from
nfs_advlock().

MFC after: 2 weeks


222718 05-Jun-2011 rmacklem

Fix the new NFSv4 client so that it doesn't crash when
a mount is done for a VIMAGE kernel.

Tested by: glz at hidden-powers dot com
Reviewed by: bz
MFC after: 2 weeks


222663 04-Jun-2011 rmacklem

Modify the new NFS server so that the NFSv3 Pathconf RPC
doesn't return an error when the underlying file system
lacks support for any of the four _PC_xxx values used, by
falling back to default values.

Tested by: avg
MFC after: 2 weeks


222586 01-Jun-2011 kib

In the VOP_PUTPAGES() implementations, change the default error from
VM_PAGER_AGAIN to VM_PAGER_ERROR for the unwritten pages. Return
VM_PAGER_AGAIN for the partially written page. Always forward at least
one page in the loop of vm_object_page_clean().

VM_PAGER_ERROR causes the page reactivation and does not clear the
page dirty state, so the write is not lost.

The change fixes an infinite loop in vm_object_page_clean() when the
filesystem returns permanent errors for some page writes.

Reported and tested by: gavin
Reviewed by: alc, rmacklem
MFC after: 1 week


222540 31-May-2011 rmacklem

Fix the new NFS client so that it doesn't do an NFSv3
Pathconf RPC for cases where the reply doesn't include
the answer. This fixes a problem reported by avg@ where
the NFSv3 Pathconf RPC would fail when "ls -l" did an
lpathconf(2) for _PC_ACL_NFS4.

Tested by: avg
MFC after: 2 weeks


222389 27-May-2011 rmacklem

Fix the new NFS client so that it handles NFSv4 state
correctly during a forced dismount. This required that
the exclusive and shared (refcnt) sleep lock functions check
for MNTK_UNMOUNTF before sleeping, so that they won't block
while nfscl_umount() is getting rid of the state. As
such, a "struct mount *" argument was added to the locking
functions. I believe the only remaining case where a forced
dismount can get hung in the kernel is when a thread is
already attempting to do a TCP connect to a dead server
when the krpc client structure called nr_client is NULL.
This will only happen just after a "mount -u" with options
that force a new TCP connection is done, so it shouldn't
be a problem in practice.

MFC after: 2 weeks


222329 26-May-2011 rmacklem

Add a check for MNTK_UNMOUNTF at the beginning of nfs_sync()
in the new NFS client so that a forced dismount doesn't
get stuck in the VFS_SYNC() call that happens before
VFS_UNMOUNT() in dounmount().
Additional changes are needed before forced dismounts will work.

MFC after: 2 weeks


222291 25-May-2011 rmacklem

Add some missing mutex locking to the new NFS client.

MFC after: 2 weeks


222289 25-May-2011 rmacklem

Fix the new NFS client so that it correctly sets the "must_commit"
argument for a write RPC when it succeeds for the first one and
fails for a subsequent RPC within the same call to the function.
This makes it compatible with the old NFS client for this case.

MFC after: 2 weeks


222233 23-May-2011 rmacklem

Set the MNT_NFS4ACLS flag for an NFSv4 client mount
if the NFSv4 server supports it. Requested by trasz.

MFC after: 2 weeks


222187 22-May-2011 alc

Eliminate duplicate #include's.


222167 22-May-2011 rmacklem

Add a lock flags argument to the VFS_FHTOVP() file system
method, so that callers can indicate the minimum vnode
locking requirement. This will allow some file systems to choose
to return a LK_SHARED locked vnode when LK_SHARED is specified
for the flags argument. This patch only adds the flag. It
does not change any file system to use it and all callers
specify LK_EXCLUSIVE, so file system semantics are not changed.

Reviewed by: kib


222075 18-May-2011 rmacklem

Add a sanity check for the existence of an "addr" option
to both NFS clients. This avoids the crash reported by
Sergey Kandaurov (pluknet@gmail.com) to the freebsd-fs@
list with subject "[old nfsclient] different nmount()
args passed from mount vs mount_nfs" dated May 17, 2011.

Tested by: pluknet at gmail.com (old nfs client)
MFC after: 2 weeks


221973 15-May-2011 rmacklem

Change the sysctl naming for the old and new NFS clients
to vfs.oldnfs.xxx and vfs.nfs.xxx respectively. This makes
the default nfs client use vfs.nfs.xxx after r221124.


221867 14-May-2011 jhb

Merge comments about converting directory entries to be more direct and
concise.

Inspired by: Gleb Kurtsou


221615 08-May-2011 rmacklem

Change the new NFS server so that it uses vfs.nfsd naming
for its sysctls instead of vfs.newnfs. This separates the
names from the ones used by the client.


221537 06-May-2011 rmacklem

Set the initial value of maxfilesize to OFF_MAX in the
new NFS client. It will then be reduced to whatever the
server says it can support. There might be an argument
that this could be one block larger, but since NFS is
a byte granular system, I chose not to do that.

Suggested by: Matt Dillon
Tested by: Daniel Braniss (earlier version)
MFC after: 2 weeks


221523 06-May-2011 mav

Increase NFS_TICKINTVL value from 10 to 500. That callout now does useful
things only once per second, so the other 99 calls per second were useless
and just kept an idle system from sleeping properly.

Reviewed by: rmacklem
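
The arithmetic behind the change, assuming NFS_TICKINTVL is expressed
in milliseconds (which is consistent with the "99 calls per second"
figure above). The helper name is hypothetical.

```c
/* Calls per second for a periodic callout with the given interval in ms. */
static int
calls_per_second_sk(int interval_ms)
{
    return (1000 / interval_ms);
}
```

Under that assumption an interval of 10 gives 100 calls per second and
an interval of 500 gives 2.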


221517 06-May-2011 rmacklem

Change the new NFS server so that it returns 0 when the f_bavail
or f_ffree fields of "struct statfs" are negative, since the
values that go on the wire are unsigned and will appear to be
very large positive values otherwise. This makes the handling
of a negative f_bavail compatible with the old/regular NFS server.

MFC after: 2 weeks
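
A minimal sketch of the clamping described above. The function name is
hypothetical; only the idea (negative signed count becomes 0 before it
is sent as an unsigned value) comes from the commit message.

```c
#include <stdint.h>

/* Convert a signed statfs-style count to the unsigned on-the-wire value. */
static uint64_t
count_to_wire_sk(int64_t v)
{
    return (v < 0 ? 0 : (uint64_t)v);
}
```

Without the check, a plain cast of -1 to uint64_t yields UINT64_MAX,
which a client would see as an enormous amount of free space.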


221467 05-May-2011 rmacklem

Fix the new NFS client so that it handles the 64bit fields
that are now in "struct statfs" for NFSv3 and NFSv4. Since
the ffiles value is uint64_t on the wire, I clip the value
to INT64_MAX to avoid setting f_ffree negative.

Tested by: kib
MFC after: 2 weeks
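
The clipping described above, sketched in isolation. The function name
is hypothetical; the assumption is simply that the destination field is
a signed 64-bit integer.

```c
#include <stdint.h>

/* Clip an unsigned on-the-wire count so it fits a signed 64-bit field. */
static int64_t
wire_to_ffree_sk(uint64_t wire)
{
    return (wire > (uint64_t)INT64_MAX ? INT64_MAX : (int64_t)wire);
}
```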


221462 04-May-2011 rmacklem

Add a comment noting that the NFS code assumes that the
values of error numbers in sys/errno.h will be the same
as the ones specified by the NFS RFCs and that the code
needs to be fixed if error numbers are changed in sys/errno.h.

Suggested by: Peter Jeremy
MFC after: 2 weeks
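
One way such an assumption could also be checked at compile time is
with static assertions. This is a sketch, not what the commit does (it
only adds a comment). The NFS3ERR_SK_* names are hypothetical; the
numeric values are the ones RFC 1813 assigns to NFS3ERR_PERM,
NFS3ERR_NOENT, NFS3ERR_IO, NFS3ERR_ACCES and NFS3ERR_EXIST.

```c
#include <errno.h>

/* A few NFSv3 status values as given in RFC 1813. */
enum {
    NFS3ERR_SK_PERM  = 1,
    NFS3ERR_SK_NOENT = 2,
    NFS3ERR_SK_IO    = 5,
    NFS3ERR_SK_ACCES = 13,
    NFS3ERR_SK_EXIST = 17
};

/* The build fails if the local errno values ever drift from these. */
_Static_assert(NFS3ERR_SK_PERM == EPERM, "EPERM differs from NFS3ERR_PERM");
_Static_assert(NFS3ERR_SK_NOENT == ENOENT, "ENOENT differs from NFS3ERR_NOENT");
_Static_assert(NFS3ERR_SK_IO == EIO, "EIO differs from NFS3ERR_IO");
_Static_assert(NFS3ERR_SK_ACCES == EACCES, "EACCES differs from NFS3ERR_ACCES");
_Static_assert(NFS3ERR_SK_EXIST == EEXIST, "EEXIST differs from NFS3ERR_EXIST");

/* With the assertions in place, the mapping can be the identity. */
static int
nfs_err_to_errno_sk(int nfserr)
{
    return (nfserr);
}
```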


221439 04-May-2011 rmacklem

Add kernel support for NFSSVC_ZEROCLTSTATS and NFSSVC_ZEROSRVSTATS
so that they can be used by nfsstat(1) to implement the "-z" option
for the new NFS subsystem.

MFC after: 2 weeks


221438 04-May-2011 rmacklem

Revert r221306, since NFSSVC_ZEROSTATS zero'd both client and
server stats, when separate modifiers for NFSSVC_GETSTATS for
each of client and server stats is what is required by nfsstat(1).


221436 04-May-2011 ru

Implemented a mount option "nocto" that disables cache coherency
checking at open time. It may improve performance for read-only
NFS mounts. Use deliberately.

MFC after: 1 week
Reviewed by: rmacklem, jhb (earlier version)


221429 04-May-2011 ru

In ncl_printf(), call vprintf() instead of printf().

MFC after: 3 days


221306 01-May-2011 rmacklem

Add the kernel support needed to zero out the nfsstats
structure for the new NFS subsystem. This will be used
by nfsstats.c to implement the "-z" option.

MFC after: 2 weeks


221261 30-Apr-2011 kib

Clarify the comment.

MFC after: 1 week


221205 29-Apr-2011 rmacklem

The build was broken by r221190 for 64bit arches like amd64.
This patch fixes it.

MFC after: 2 weeks


221190 28-Apr-2011 rmacklem

Fix the new NFS client so that it handles the "nfs_args" value
in mnt_optnew. This is needed so that the old mount(2) syscall
works and that is needed so that amd(8) works. The code was
basically just cribbed from sys/nfsclient/nfs_vfsops.c with minor
changes. This patch is mainly to fix the new NFS client so that
amd(8) works with it. Thanks go to Craig Rodrigues for helping with
this.

Tested by: Craig Rodrigues (for amd)
MFC after: 2 weeks


221183 28-Apr-2011 jhb

Update a comment since ext2fs does not use SU.

Reviewed by: kib


221176 28-Apr-2011 jhb

The b_dep field of buffers is always empty for ext2fs, it is only used
for SU in FFS.

Reported by: kib


221166 28-Apr-2011 jhb

Sync with several changes in UFS/FFS:
- 77115: Implement support for O_DIRECT.
- 98425: Fix a performance issue introduced in 70131 that was causing
reads before writes even when writing full blocks.
- 98658: Rename the BALLOC flags from B_* to BA_* to avoid confusion with
the struct buf B_ flags.
- 100344: Merge the BA_ and IO_ flags so that they may both be used in
the same flags word. This merger is possible by assigning the IO_ flags
to the low sixteen bits and the BA_ flags to the high sixteen bits.
- 105422: Fix a file-rewrite performance case.
- 129545: Implement IO_INVAL in VOP_WRITE() by marking the buffer as
"no cache".
- Re-add the DOINGASYNC() macro and use it to control asynchronous writes.
Change i-node updates to honor DOINGASYNC() instead of always being
synchronous.
- Use a PRIV_VFS_RETAINSUGID check instead of checking cr_uid against 0
directly when deciding whether or not to clear suid and sgid bits.

Submitted by: Pedro F. Giffuni giffunip at yahoo
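
The flag-word split from item 100344 can be sketched as follows. The
masks follow the low-sixteen/high-sixteen description above; the two
individual flag values and all names are hypothetical, not the actual
IO_ or BA_ definitions.

```c
#include <stdint.h>

/* IO_-style flags live in the low sixteen bits, BA_-style in the high. */
#define IO_SK_MASK    0x0000ffffu
#define BA_SK_MASK    0xffff0000u

/* Hypothetical example flags, one from each half. */
#define IO_SK_SYNC    0x00000004u
#define BA_SK_CLRBUF  0x00010000u

/* Extract each family from a combined flags word. */
static uint32_t
io_flags_sk(uint32_t flags)
{
    return (flags & IO_SK_MASK);
}

static uint32_t
ba_flags_sk(uint32_t flags)
{
    return (flags & BA_SK_MASK);
}
```

Because the two masks do not overlap, a caller can OR flags from both
families into one argument and each consumer can pick out its own half.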


221139 27-Apr-2011 rmacklem

Fix module names and dependencies so the NFS clients will
load correctly as modules after r221124.


221128 27-Apr-2011 jhb

Use a private EXT2_ROOTINO constant instead of redefining ROOTINO.

Submitted by: Pedro F. Giffuni giffunip at yahoo


221126 27-Apr-2011 jhb

Various style fixes including using uint*_t instead of u_int*_t.

Submitted by: Pedro F. Giffuni giffunip at yahoo


221124 27-Apr-2011 rmacklem

This patch changes head so that the default NFS client is now the new
NFS client (which I guess is no longer experimental). The fstype "newnfs"
is now "nfs" and the regular/old NFS client is now fstype "oldnfs".
Although mounts via fstype "nfs" will usually work without userland
changes, an updated mount_nfs(8) binary is needed for kernels built with
"options NFSCL" but not "options NFSCLIENT". Updated mount_nfs(8) and
mount(8) binaries are needed to do mounts for fstype "oldnfs".
The GENERIC kernel configs have been changed to use options
NFSCL and NFSD (the new client and server) instead of NFSCLIENT and NFSSERVER.
For kernels being used on diskless NFS root systems, "options NFSCL"
must be in the kernel config.
Discussed on freebsd-fs@.


221066 26-Apr-2011 rmacklem

Fix a kernel linking problem introduced by r221032, r221040
when building kernels that don't have "options NFS_ROOT"
specified. I plan on moving the functions that use these
data structures into the shared code in sys/nfs/nfs_diskless.c
in a future commit. At that time, these definitions will no
longer be needed in nfs_vfsops.c and nfs_clvfsops.c.

MFC after: 2 weeks


221040 25-Apr-2011 rmacklem

Modify the experimental (newnfs) NFS client so that it uses the
same diskless NFS root code as the regular client, which
was moved to sys/nfs by r221032. This fixes the newnfs
client so that it can do an NFSv3 diskless root file system.

MFC after: 2 weeks


221018 25-Apr-2011 rmacklem

Fix the experimental NFS client so that it does not bogusly
set the f_flags field of "struct statfs". This had the interesting
effect of making the NFSv4 mounts "disappear" after r221014,
since NFSMNT_NFSV4 and MNT_IGNORE became the same bit.

MFC after: 2 weeks


221014 25-Apr-2011 rmacklem

Modify the experimental NFS client so that it uses the same
"struct nfs_args" as the regular NFS client. This is needed
so that the old mount(2) syscall will work and it makes
sharing of the diskless NFS root code easier. Early in the
porting exercise I introduced a new revision of nfs_args, but
didn't actually need it, thanks to nmount(2). I re-introduced the
NFSMNT_KERB flag, since it does essentially the same thing and
the old one would not have been used because it never worked.
I also added a few new NFSMNT_xxx flags to sys/nfsclient/nfs_args.h
that are used by the experimental NFS client.

MFC after: 2 weeks


220928 21-Apr-2011 rmacklem

Remove the nm_mtx mutex locking from the test for
nm_maxfilesize. This value rarely, if ever, changes
and the nm_mtx mutex is locked/unlocked earlier in
the function, which should be sufficient to avoid
getting a stale cached value for it. There is a
discussion w.r.t. what these tests should be, but
I've left them basically the same as the regular
NFS client for now.

Suggested by: pjd
MFC after: 2 weeks


220921 21-Apr-2011 rmacklem

Revert r220906, since the vp isn't always locked when
nfscl_request() is called. It will need a more involved
patch.


220906 20-Apr-2011 rmacklem

Add a check for VI_DOOMED at the beginning of nfscl_request()
so that it won't try and use vp->v_mount to do an RPC during
a forced dismount. There needs to be at least one more kernel
commit, plus a change to the umount(8) command before forced
dismounts will work for the experimental NFS client.

MFC after: 2 weeks


220877 20-Apr-2011 rmacklem

Modify the offset + size checks for read and write in the
experimental NFS client to take care of overflows for the calls
above the buffer cache layer in a manner similar to r220876.
Thanks go to dillon at apollo.backplane.com for providing the
snippet of code that does this.

MFC after: 2 weeks


220876 20-Apr-2011 rmacklem

Modify the offset + size checks for read and write in the
experimental NFS client to take care of overflows. Thanks
go to dillon at apollo.backplane.com for providing the
snippet of code that does this.

MFC after: 2 weeks
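
An overflow-safe offset + size check of the general kind described
above might look like this. It is not the snippet credited in the
commit message, which is not reproduced here; the function name,
parameter types and choice of error values are assumptions made for
illustration.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Check that [offset, offset + resid) stays within maxfilesize without
 * ever computing offset + resid, which could overflow.
 */
static int
check_io_range_sk(int64_t offset, uint64_t resid, int64_t maxfilesize)
{
    if (offset < 0 || maxfilesize < 0)
        return (EINVAL);
    /* maxfilesize - offset cannot overflow: both are non-negative. */
    if (offset > maxfilesize || resid > (uint64_t)(maxfilesize - offset))
        return (EFBIG);
    return (0);
}
```

The comparison is rearranged so that only a subtraction of two
non-negative values is performed, which is always representable.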


220810 19-Apr-2011 rmacklem

Fix up handling of the nfsmount structure in read and write
within the experimental NFS client. Mostly add mutex locking
and use the same rsize, wsize during the operation by keeping
a local copy of it. This is another change that brings it
closer to the regular NFS client.

MFC after: 2 weeks


220807 18-Apr-2011 rmacklem

Revert r220761 since, as kib@ pointed out, the case of
adding the check to nfsrpc_close() isn't useful. Also,
the check in nfscl_getcl() must be more involved, since
it needs to check before and after the acquisition of
the refcnt on nfsc_lock, while the mutex that protects
the client state data is held.


220764 18-Apr-2011 rmacklem

Add a vput() to nfs_lookitup() in the experimental NFS client
for a case that will probably never happen. It can only
happen if a server were to successfully lookup a file, but not
return attributes for that file. Although technically allowed
by the NFSv3 RFC, I doubt any server would ever do this.
However, if it did, the client would have not vput()'d the
new vnode when it needed to do so.

MFC after: 2 weeks


220763 18-Apr-2011 rmacklem

Add vput() calls in two places in the experimental NFS client
that would be needed if, in the future, nfscl_loadattrcache()
were to return an error. Currently nfscl_loadattrcache()
never returns an error, so these cases never currently happen.

MFC after: 2 weeks


220762 18-Apr-2011 rmacklem

Change the mutex locking for several locations in the
experimental NFS client's vnode op functions to make
them compatible with the regular NFS client. I'll admit
I'm not sure that the mutex locks around the assignments
are needed, but the regular client has them, so I added them.
Also, add handling of the case of partial attributes in
setattr to be compatible with the regular client.

MFC after: 2 weeks


220761 17-Apr-2011 rmacklem

Add checks for MNTK_UNMOUNTF at the beginning of three
functions, so that threads don't get stuck in them during
a forced dismount. nfs_sync/VFS_SYNC() needs this, since it is
called by dounmount() before VFS_UNMOUNT(). The nfscl_nget()
case makes sure that a thread doing a VOP_OPEN() or
VOP_ADVLOCK() call doesn't get blocked before attempting
the RPC. Attempted RPCs don't block, since they all
fail once a forced dismount is in progress.
The third one at the beginning of nfsrpc_close()
is done so threads don't get blocked while doing VOP_INACTIVE()
as the vnodes are cleared out.
These three changes, plus a change to the umount(8)
command so that it doesn't do "sync()" for the forced case,
seem to make forced dismounts work for the experimental NFS
client.

MFC after: 2 weeks


220752 17-Apr-2011 rmacklem

Get rid of the "nfscl: consider increasing kern.ipc.maxsockbuf"
message that was generated when doing experimental NFS client
mounts. I put that message in because the krpc would hang with
the default size for mounts that used large rsize/wsize values.
Since the bug that caused these hangs was fixed by r213756,
I think the message is no longer needed.

MFC after: 2 weeks


220751 17-Apr-2011 rmacklem

Fix up some of the sysctls for the experimental NFS client so
that they use the same names as the regular client. Also add
string descriptions for them.

MFC after: 2 weeks


220739 17-Apr-2011 rmacklem

Change some defaults in the experimental NFS client to be the
same as the regular NFS client for NFSv3. The main one is making
use of a reserved port# the default. Also, set the retry limit
for TCP the same and fix the code so that it doesn't disable
readdirplus for NFSv4.

MFC after: 2 weeks


220735 17-Apr-2011 rmacklem

Fix readdirplus in the experimental NFS client so that it
skips over ".." to avoid a LOR race with nfs_lookup(). This
fix is analogous to r138256 in the regular NFS client.

MFC after: 2 weeks


220732 16-Apr-2011 rmacklem

Add a lktype flags argument to nfscl_nget() and ncl_nget() in the
experimental NFS client so that its nfs_lookup() function can use
cn_lkflags in a manner analogous to the regular NFS client.

MFC after: 2 weeks


220731 16-Apr-2011 rmacklem

Add mutex locking on the nfs node in ncl_inactive() for the
experimental NFS client.

MFC after: 2 weeks


220683 15-Apr-2011 rmacklem

Change the experimental NFS client so that it creates nfsiod
threads in the same manner as the regular NFS client after
r214026 was committed. This resolves the LORs fixed by r214026
and its predecessors for the regular client.

Reviewed by: jhb
MFC after: 2 weeks


220648 14-Apr-2011 rmacklem

Fix the experimental NFSv4 server so that it uses VOP_PATHCONF()
to determine if a file system supports NFSv4 ACLs. Since
VOP_PATHCONF() must be called with a locked vnode, the function
is called before nfsvno_fillattr() and the result is passed in
as an extra argument.

MFC after: 2 weeks


220645 14-Apr-2011 rmacklem

Modify the experimental NFSv4 server so that it handles
crossing of server mount points properly. The functions
nfsvno_fillattr() and nfsv4_fillattr() were modified to
take the extra arguments that are the mount point, a flag
to indicate that it is a file system root and the mounted
on fileno. The mount point argument needs to be busy when
nfsvno_fillattr() is called, since the vp argument is not
locked.

Reviewed by: kib
MFC after: 2 weeks


220611 13-Apr-2011 rmacklem

Add VOP_PATHCONF() support to the experimental NFS client
so that it can, along with other things, report whether or
not NFS4 ACLs are supported.

MFC after: 2 weeks


220610 13-Apr-2011 rmacklem

Fix the experimental NFSv4 client so that it recognizes server
mount point crossings correctly. It was testing the wrong flag.
Also, try harder to make sure that the fsid is different than
the one assigned to the client mount point, by hashing the
server's fsid (just to create a different value deterministically)
when it is the same.

MFC after: 2 weeks


220546 11-Apr-2011 rmacklem

Vrele ni_startdir in the experimental NFS server for the case
of NFSv2 getting an error return from VOP_MKNOD(). Without this
patch, the server file system remains busy after an NFSv2
VOP_MKNOD() fails.

MFC after: 2 weeks


220530 10-Apr-2011 rmacklem

Add some cleanup code to the module unload operation for
the experimental NFS server, so that it doesn't leak memory
when unloaded. However, unloading the NFSv4 server is not
recommended, since all NFSv4 state will be lost by the unload
and clients will have to recover the state after a server
reload/restart as if the server crashed/rebooted.

MFC after: 2 weeks


220507 10-Apr-2011 rmacklem

Add a VOP_UNLOCK() for the directory, when that is not what
VOP_LOOKUP() returned. This fixes a bug in the experimental
NFS server for the case where VFS_VGET() fails returning EOPNOTSUPP
in the ReaddirPlus RPC, forcing the use of VOP_LOOKUP() instead.

MFC after: 2 weeks


220506 09-Apr-2011 kib

Linuxolator calls VOP_READDIR with ncookies pointer. Implement a
workaround for fdescfs to not panic when ncookies is not NULL, similar
to the one committed as r152254, but simpler, due to fdescfs_readdir()
not calling vfs_read_dirent().

PR: kern/156177
MFC after: 1 week


220400 06-Apr-2011 trasz

Add RACCT_NOFILE accounting.

Sponsored by: The FreeBSD Foundation
Reviewed by: kib (earlier version)


220152 30-Mar-2011 zack

This patch fixes the Experimental NFS client to properly deal with 32 bit or 64
bit fileid's in NFSv2 and NFSv3. Without this fix, invalid casting (and sign
extension) was creating problems for any fileid greater than 2^31.

We discovered this because we have test clusters with more than 2 billion
allocated files and 64-bit ino_t's (and friend structures).

Reviewed by: rmacklem
Approved by: zml (mentor)
MFC after: 2 weeks
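
As a rough userland illustration of the sign-extension problem described
in r220152 (a sketch only, not the committed kernel code; the function
names are made up for this example), widening a 32-bit fileid through a
signed type corrupts any value with the top bit set:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Buggy widening: routing the value through a signed 32-bit type
 * sign-extends any fileid with bit 31 set. (The uint32_t -> int32_t
 * conversion is implementation-defined for such values; common
 * compilers wrap it to a negative number.)
 */
static uint64_t
widen_fileid_signed(uint32_t wire_fileid)
{
	int32_t tmp = (int32_t)wire_fileid;

	return ((uint64_t)(int64_t)tmp);
}

/* Correct widening: stay unsigned so the upper 32 bits remain zero. */
static uint64_t
widen_fileid_unsigned(uint32_t wire_fileid)
{
	return ((uint64_t)wire_fileid);
}
```

Small fileids survive either path, which is presumably why the problem
only showed up on clusters with very large file counts.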


220014 25-Mar-2011 kib

Report EBUSY instead of EROFS for an attempt to delete or rename the
root directory of msdosfs mount. The VFS code would handle deletion
case itself too, assuming VV_ROOT flag is not lost. The msdosfs_rename()
should also note attempt to rename root via doscheckpath() or different
mount point check leading to EXDEV. Nonetheless, keep the checks for now.

The change is inspired by NetBSD change referenced in PR, but return
EBUSY like kern_unlinkat() does.

PR: kern/152079
MFC after: 1 week


219968 24-Mar-2011 jhb

Fix some locking nits with the p_state field of struct proc:
- Hold the proc lock while changing the state from PRS_NEW to PRS_NORMAL
in fork to honor the locking requirements. While here, expand the scope
of the PROC_LOCK() on the new process (p2) to avoid some LORs. Previously
the code was locking the new child process (p2) after it had locked the
parent process (p1). However, when locking two processes, the safe order
is to lock the child first, then the parent.
- Fix various places that were checking p_state against PRS_NEW without
having the process locked to use PROC_LOCK(). Every place was already
locking the process, just after the PRS_NEW check.
- Remove or reduce the use of PROC_SLOCK() for places that were checking
p_state against PRS_NEW. The PROC_LOCK() alone is sufficient for reading
the current state.
- Reorder fill_kinfo_proc() slightly so it only acquires PROC_SLOCK() once.

MFC after: 1 week


219028 25-Feb-2011 netchild

Add some FEATURE macros for various features (AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/
PMC/SYSV/...).

No FreeBSD version bump, the userland application to query the features will
be committed last and can serve as an indication of the availability if
needed.

Sponsored by: Google Summer of Code 2010
Submitted by: kibab
Reviewed by: arch@ (parts by rwatson, trasz, jhb)
X-MFC after: to be determined in last commit with code from this project


219012 24-Feb-2011 jhb

Use ffs() to locate free bits in the inode and block bitmaps rather than
loops with bit shifts.
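
The idea in r219012 can be sketched in userland as follows. This is an
illustration under stated assumptions, not the ext2fs code: it assumes a
set bit means "in use", works on a single bitmap byte, and the helper
names are hypothetical. Only ffs() itself is a real library call.

```c
#include <assert.h>
#include <strings.h>	/* ffs() */

/*
 * Loop version: test each bit in turn with shifts.
 * Returns the index of the first clear (free) bit, or -1 if none.
 */
static int
first_free_loop(unsigned char map)
{
	for (int bit = 0; bit < 8; bit++)
		if ((map & (1u << bit)) == 0)
			return (bit);
	return (-1);
}

/*
 * ffs() version: invert the byte so free bits become set bits.
 * ffs() returns the 1-based index of the least significant set bit,
 * or 0 when no bit is set, so subtracting 1 yields the same result
 * as the loop, including -1 for a full byte.
 */
static int
first_free_ffs(unsigned char map)
{
	return (ffs(~map & 0xff) - 1);
}
```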


218965 23-Feb-2011 brucec

Fix typos - remove duplicate "is".

PR: docs/154934
Submitted by: Eitan Adler <lists at eitanadler.com>
MFC after: 3 days


218949 22-Feb-2011 alc

Eliminate two dubious attempts at optimizing the implementation of a
file's last accessed, modified, and changed times:

TMPFS_NODE_ACCESSED and TMPFS_NODE_CHANGED should be set unconditionally
in tmpfs_remove() without regard to the number of hard links to the file.
Otherwise, after the last directory entry for a file has been removed, a
process that still has the file open could read stale values for the last
accessed and changed times with fstat(2).

Similarly, tmpfs_close() should update the time-related fields even if all
directory entries for a file have been removed. In this case, the effect
is that the time-related fields will have values that are later than
expected. They will correspond to the time at which fstat(2) is called.

In collaboration with: kib
MFC after: 1 week


218909 21-Feb-2011 brucec

Fix typos - remove duplicate "the".

PR: bin/154928
Submitted by: Eitan Adler <lists at eitanadler.com>
MFC after: 3 days


218863 19-Feb-2011 alc

tmpfs_remove() isn't modifying the file's data, so it shouldn't set
TMPFS_NODE_MODIFIED on the node.

PR: 152488
Submitted by: Anton Yuzhaninov
Reviewed by: kib
MFC after: 1 week


218757 16-Feb-2011 bz

Mfp4 CH=177274,177280,177284-177285,177297,177324-177325

VNET socket push back:
try to minimize the number of places where we have to switch vnets
and narrow down the time we stay switched. Add assertions to the
socket code to catch possibly unset vnets as seen in r204147.

While this reduces the number of vnet recursions in some places like
NFS, POSIX local sockets and some netgraph, .. recursions are
impossible to fix.

The current expectations are documented at the beginning of
uipc_socket.c along with the other information there.

Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
Reviewed by: jhb
Tested by: zec

Tested by: Mikolaj Golub (to.my.trociny gmail.com)
MFC after: 2 weeks


218681 14-Feb-2011 alc

Further simplify tmpfs_reg_resize(). Also, update its comments, including
style fixes.


218640 13-Feb-2011 alc

Eliminate tn_reg.tn_aobj_pages. Instead, correctly maintain the vm
object's size field. Previously, that field was always zero, even
when the object tn_reg.tn_aobj contained numerous pages.

Apply style fixes to tmpfs_reg_resize().

In collaboration with: kib


218438 08-Feb-2011 jhb

After reading a bitmap block for i-nodes or blocks, recheck the count of
free i-nodes or blocks to handle a race where another thread might have
allocated the last i-node or block while we were waiting for the buffer.

Tested by: dougb


218345 05-Feb-2011 alc

Unless "cnt" exceeds MAX_COMMIT_COUNT, nfsrv_commit() and nfsvno_fsync() are
incorrectly calling vm_object_page_clean(). They are passing the length of
the range rather than the ending offset of the range.

Perform the OFF_TO_IDX() conversion in vm_object_page_clean() rather than the
callers.

Reviewed by: kib
MFC after: 3 weeks
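
The length-versus-end-offset mistake described in r218345 can be shown
with a small userland sketch. Everything here is hypothetical
(pages_in_range() and the 4 KiB page size are assumptions for the
example; this is not vm_object_page_clean()): the callee expects byte
offsets [start, end) and converts them to page indices itself.

```c
#include <assert.h>
#include <stdint.h>

#define	SKETCH_PAGE_SHIFT	12	/* assumes 4 KiB pages */

/*
 * Hypothetical callee: takes byte offsets [start, end), performs the
 * offset-to-page-index conversion internally, and returns the number
 * of pages the range touches.
 */
static uint64_t
pages_in_range(uint64_t start, uint64_t end)
{
	uint64_t first, last;

	assert(start <= end);
	first = start >> SKETCH_PAGE_SHIFT;
	last = (end + (1u << SKETCH_PAGE_SHIFT) - 1) >> SKETCH_PAGE_SHIFT;
	return (last - first);
}
```

With off = 4096 and cnt = 16384, the correct call is
pages_in_range(off, off + cnt); passing cnt as the second argument
covers a shorter range and misses the tail of the request.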


218273 04-Feb-2011 jhb

Collapse duplicate definitions of EXT2_SB().

Submitted by: Pedro F. Giffuni giffunip at yahoo


218190 02-Feb-2011 jhb

Fix build with DIAGNOSTIC enabled.

Pointy hat to: jhb


218176 01-Feb-2011 jhb

Some cosmetic fixes and remove a duplicate constant.

Submitted by: Pedro F. Giffuni giffunip at yahoo


218175 01-Feb-2011 jhb

- Set the next_alloc fields for an i-node after allocating a new block
so that future allocations start with most recently allocated block
rather than the beginning of the filesystem.
- Fix ext2_alloccg() to properly scan for 8 block chunks that are not
aligned on 8-bit boundaries. Previously this was causing new blocks
to be allocated in a highly fragmented fashion (block 0 of a file at
lbn N, block 1 at lbn N + 8, block 2 at lbn N + 16, etc.).
- Cosmetic tweaks to the currently-disabled fancy realloc sysctls.

PR: kern/153584
Discussed with: bde
Tested by: Pedro F. Giffuni giffunip at yahoo, Zheng Liu (lz)


217922 27-Jan-2011 gnn

Quick fix to a comment.


217896 26-Jan-2011 dchagin

Add macro to test the sv_flags of any process. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures.

MFC after: 1 month


217703 21-Jan-2011 jhb

- Move special inode constants to ext2_dinode.h and rename them to match
NetBSD.
- Add a constant for the HASJOURNAL compat flag.

PR: kern/153584
Submitted by: Pedro F. Giffuni giffunip at yahoo


217702 21-Jan-2011 jhb

Restore support for the 'async' and 'sync' mount options lost when
switching to nmount(2). While here, sort the options.

PR: kern/153584
Submitted by: Pedro F. Giffuni giffunip at yahoo
MFC after: 1 week


217633 20-Jan-2011 kib

In tmpfs_readdir(), normalize handling of the directory entries that
either overflow the supplied buffer, or cause uiomove to fail.
Do not advance cached de when directory entry was not copied out.
Do not return EOF when no entries could be copied because the first
entry is too large for the supplied buffer; signal EINVAL instead.

Reported by: Beat Gätzi <beat chruetertee ch>
MFC after: 1 week
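
The readdir behaviour described in r217633 can be sketched as below.
This is a simplified userland model, not tmpfs_readdir(): entries are
represented only by their sizes, and sketch_readdir() and its
parameters are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Copy out as many entries as fit in bufsize bytes, starting at *next.
 * The cursor *next only advances past entries that were copied.
 * Returns 0 on success, or EINVAL when not even the first pending
 * entry fits (instead of silently looking like end-of-directory).
 */
static int
sketch_readdir(const size_t *entry_sizes, size_t nentries, size_t *next,
    size_t bufsize)
{
	size_t copied = 0;

	assert(entry_sizes != NULL && next != NULL);
	while (*next < nentries) {
		size_t need = entry_sizes[*next];

		if (need > bufsize) {
			if (copied == 0)
				return (EINVAL);
			break;	/* partial result; cursor stays put */
		}
		bufsize -= need;
		copied++;
		(*next)++;
	}
	return (0);
}
```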


217594 19-Jan-2011 jhb

Fix build with KDB defined.

Pointy hat to: jhb
Submitted by: jkim


217585 19-Jan-2011 jhb

Whitespace and style fixes.


217584 19-Jan-2011 jhb

Move calculation of 'bmask' earlier to match its current location in
ufs_lookup().


217582 19-Jan-2011 jhb

Merge 118969 from UFS:
Eliminate the i_devvp field from the incore inodes, we can get the same
value from ip->i_ump->um_devvp.

Submitted by: Pedro F. Giffuni giffunip at yahoo
MFC after: 1 week


217535 18-Jan-2011 rmacklem

Fix the experimental NFSv4 server so that it uses VOP_ACCESSX()
to check for VREAD_ACL instead of VOP_ACCESS().

MFC after: 3 days


217432 14-Jan-2011 rmacklem

Modify the experimental NFSv4 server so that it posts a SIGUSR2
signal to the master nfsd daemon whenever the stable restart
file has been modified. This will allow the master nfsd daemon
to maintain an up to date backup copy of the file. This is
enabled via the nfssvc() syscall, so that older nfsd daemons
will not be signaled.

Reviewed by: jhb
MFC after: 1 week


217336 12-Jan-2011 zack

In the experimental NFS server, when converting an open-owner to a lock-owner,
start at sequence id 1 instead of 0, to match up with both Solaris and Linux.

Reviewed by: rmacklem
Approved by: zml (mentor)


217335 12-Jan-2011 zack

Clean up the experimental NFS server replay cache when the module is unloaded.

Reviewed by: rmacklem
Approved by: zml (mentor)


217176 09-Jan-2011 rmacklem

Modify readdirplus in the experimental NFS server in a
manner analogous to r216633 for the regular server. This
change busies the file system so that VFS_VGET() is
guaranteed to be using the correct mount point even
during a forced dismount attempt. Since nfsd_fhtovp() is
not called immediately before readdirplus, the patch is
actually a clone of pjd@'s nfs_serv.c.4.patch instead of
the one committed in r216633.

Reviewed by: kib
MFC after: 10 days


217066 06-Jan-2011 rmacklem

Delete the NFS_STARTWRITE() and NFS_ENDWRITE() macros that
obscured vn_start_write() and vn_finished_write() for the
old OpenBSD port, since most uses have been replaced by the
correct calls.

MFC after: 12 days


217063 06-Jan-2011 rmacklem

Since the VFS_LOCK_GIANT() code in the experimental NFS
server is broken and the major file systems are now all
mpsafe, modify the server so that it will only export
mpsafe file systems. This was discussed on freebsd-fs@
and removes a fair bit of crufty code.

MFC after: 12 days


217023 05-Jan-2011 rmacklem

Modify the experimental NFS server so that it calls
vn_start_write() with a non-NULL vp. That way it will
find the correct mount point mp and use that mp for the
subsequent vn_finished_write() call. Also, it should fail
without crashing if the mount point is being forced dismounted
because vn_start_write() will set the mp NULL via VOP_GETWRITEMOUNT().

Reviewed by: kib
MFC after: 12 days


217017 05-Jan-2011 rmacklem

Fix the experimental NFS server to use vfs_busyfs() instead
of vfs_getvfs() so that the mount point is busied for the
VFS_FHTOVP() call. This is analogous to r185432 for the
regular NFS server.

Reviewed by: kib
MFC after: 12 days


216931 03-Jan-2011 rmacklem

Fix the nlm so that it no longer depends on the regular
nfs client and, as such, can be loaded for the experimental
nfs client without the regular client.

Reviewed by: jhb
MFC after: 2 weeks


216898 03-Jan-2011 rmacklem

Fix the experimental NFS server so that it doesn't leak
a reference count on the directory when creating device
special files.

MFC after: 2 weeks


216897 03-Jan-2011 rmacklem

Modify the experimental NFSv4 server so that the lookup
ops return a locked vnode. This ensures that the associated mount
point will always be valid for the code that follows the operation.
Also add a couple of additional checks
for non-error to the other functions that create file objects.

MFC after: 2 weeks


216894 02-Jan-2011 rmacklem

Delete some cruft from the experimental NFS server that was
only used by the OpenBSD port for its pseudo-fs.

MFC after: 2 weeks


216893 02-Jan-2011 rmacklem

Add checks for VI_DOOMED and vn_lock() failures to the
experimental NFS server, to handle the case where an
exported file system is forced dismounted while an RPC
is in progress. Further commits will fix the cases where
a mount point is used when the associated vnode isn't locked.

Reviewed by: kib
MFC after: 2 weeks


216875 01-Jan-2011 rmacklem

Add support for shared vnode locks for the Read operation
in the experimental NFSv4 server.

Reviewed by: kib
MFC after: 2 weeks


216784 28-Dec-2010 rmacklem

Delete the nfsvno_localconflict() function in the experimental
NFS server since it is no longer used and is broken.

MFC after: 2 weeks


216700 25-Dec-2010 rmacklem

Modify the experimental NFS server so that it uses LK_SHARED
for RPC operations when it can. Since VFS_FHTOVP() currently
always gets an exclusively locked vnode and is usually called
at the beginning of each RPC, the RPCs for a given vnode will
still be serialized. As such, passing a lock type argument to
VFS_FHTOVP() would be preferable to doing the vn_lock() with
LK_DOWNGRADE after the VFS_FHTOVP() call.

Reviewed by: kib
MFC after: 2 weeks


216693 24-Dec-2010 rmacklem

Add an argument to nfsvno_getattr() in the experimental
NFS server, so that it can avoid calling VOP_ISLOCKED()
when the vnode is known to be locked. This will allow
LK_SHARED to be used for these cases, which happen to
be all the cases that can use LK_SHARED. This does not
fix any bug, but it reduces the number of calls to
VOP_ISLOCKED() and prepares the code so that it can be
switched to using LK_SHARED in a future patch.

Reviewed by: kib
MFC after: 2 weeks


216692 24-Dec-2010 rmacklem

Simplify vnode locking in the experimental NFS server's
readdir functions. In particular, get rid of two bogus
VOP_ISLOCKED() calls. Removing the VOP_ISLOCKED() calls
is the only actual bug fixed by this patch.

Reviewed by: kib
MFC after: 2 weeks


216691 24-Dec-2010 rmacklem

Since VOP_READDIR() for ZFS does not return monotonically
increasing directory offset cookies, disable the UFS related
loop that skips over directory entries at the beginning of
the block for the experimental NFS server. This loop is
required for UFS since it always returns directory entries
starting at the beginning of the block that
the requested directory offset is in. In discussion with pjd@
and mckusick@ it seems that this behaviour of UFS should maybe
change, with this fix being an interim patch until then.
This patch only fixes the experimental server, since pjd@ is
working on a patch for the regular server.

Discussed with: pjd, mckusick
MFC after: 5 days


216510 17-Dec-2010 rmacklem

Fix two vnode locking problems in nfsd_recalldelegation() in the
experimental NFSv4 server. The first was a bogus use of VOP_ISLOCKED()
in a KASSERT() and the second was the need to lock the vnode for the
nfsrv_checkremove() call. Also, delete a "__unused" that was bogus,
since the argument is used.

Reviewed by: zack.kirsch at isilon.com
MFC after: 2 weeks


216462 15-Dec-2010 jh

Don't allow user created symbolic links to cover other entries marked
with DE_USER. If a devfs rule hid such an entry, it was possible to
create an infinite number of symbolic links with the same name.

Reviewed by: kib


216461 15-Dec-2010 jh

- Assert that dm_lock is exclusively held in devfs_rules_apply() and
in devfs_vmkdir() while adding the entry to de_list of the parent.
- Apply devfs rules to newly created directories and symbolic links.

PR: kern/125034
Submitted by: Mateusz Guzik (original version)


216391 12-Dec-2010 jh

Handle the special ruleset 0 in devfs_ruleset_use(). An attempt to set the
current ruleset to 0 with command "devfs ruleset 0" triggered a KASSERT
in devfs_ruleset_create().

PR: kern/125030
Submitted by: Mateusz Guzik


216330 09-Dec-2010 rmacklem

Disable attempts to establish a callback connection from the
experimental NFSv4 server to a NFSv4 client when delegations are not
being issued, even if the client advertises a callback path.
This avoids a problem where a Linux client advertises a
callback path that doesn't work, due to a firewall, and then
times out an Open attempt before the FreeBSD server gives up
its callback connection attempt. (Suggested by
drb at karlov.mff.cuni.cz to fix the Linux client problem that
he reported on the fs-stable mailing list.)
The server should probably have
a 1sec timeout on callback connection attempts when there are
no delegations issued to the client, but that patch will require
changes to the krpc and this serves as a work around until then.

Tested by: drb at karlov.mff.cuni.cz
MFC after: 5 days


216128 02-Dec-2010 trasz

Replace pointer to "struct uidinfo" with pointer to "struct ucred"
in "struct vm_object". This is required to make it possible to account
for per-jail swap usage.

Reviewed by: kib@
Tested by: pho@
Sponsored by: FreeBSD Foundation


216120 02-Dec-2010 kib

For non-stopped threads, td_frame pointer is undefined. As a
consequence, fill_regs() and fill_fpregs() access random data, usually
on the thread kernel stack. Most often the td_frame points to the
previous frame saved by last kernel entry sequence, but this is not
guaranteed.

For /proc/<pid>/{regs,fpregs} read access, require the thread to be in
stopped state. Otherwise, return EBUSY as is done for write case.

Reported and tested by: pho
Approved by: des (procfs maintainer)
MFC after: 1 week


215548 19-Nov-2010 kib

Remove prtactive variable and related printf()s in the vop_inactive
and vop_reclaim() methods. They seem to be unused, and the reported
situation is normal for the forced unmount.

MFC after: 1 week
X-MFC-note: keep prtactive symbol in vfs_subr.c


215052 09-Nov-2010 jhb

Remove unused includes of <sys/mutex.h> and <machine/mutex.h>.


214513 29-Oct-2010 rmacklem

Modify nfs_open() in the experimental NFS client to be compatible
with the regular NFS client. Also, fix a couple of mutex lock issues.

MFC after: 1 week


214511 29-Oct-2010 rmacklem

Add a call for nfsrpc_close() to ncl_reclaim() in the experimental
NFSv4 client, since the call in ncl_inactive() might be missed
because VOP_INACTIVE() is not guaranteed to be called before
VOP_RECLAIM().

MFC after: 1 week


214406 26-Oct-2010 rmacklem

Add a flag to the experimental NFSv4 client to indicate when
delegations are being returned for reasons other than a Recall.
Also, re-organize nfscl_recalldeleg() slightly, so that it leaves
clearing NMODIFIED to the ncl_flush() call and invalidates the
attribute cache after flushing. It is hoped that these changes
might fix the problem others have seen when using the NFSv4
client with delegations enabled, since I can't reliably reproduce
the problem. These changes only affect the client when doing NFSv4
mounts with delegations enabled.

MFC after: 10 days


214255 23-Oct-2010 rmacklem

Modify the experimental NFSv4 server's file handle hash function
to use the generic hash32_buf() function. Although adding the
bytes seemed sufficient for UFS and ZFS, since most of the bytes
are the same for file handles on the same volume, this might not
be sufficient for other file systems. Use of a generic function
also seems preferable to one specific to NFSv4.

Suggested by: gleb.kurtsou at gmail.com
MFC after: 10 days
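
The weakness of a byte-sum hash (the approach introduced in r214224 and
replaced in r214255) is easy to demonstrate in userland. The sketch
below is illustrative only: hash_sum() and hash_mul33() are hypothetical
names, and the multiply-by-33 step with seed 5381 is an assumption about
the general shape of a hash32_buf()-style function rather than a copy of
the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Additive hash: byte order is ignored, so permutations collide. */
static uint32_t
hash_sum(const unsigned char *p, size_t len)
{
	uint32_t h = 0;

	assert(p != NULL);
	while (len-- > 0)
		h += *p++;
	return (h);
}

/*
 * Multiplicative hash (h = h * 33 + c). Each byte's contribution
 * depends on its position, so reordered inputs hash differently.
 */
static uint32_t
hash_mul33(const unsigned char *p, size_t len)
{
	uint32_t h = 5381;	/* seed assumed for this sketch */

	assert(p != NULL);
	while (len-- > 0)
		h = h * 33 + *p++;
	return (h);
}
```

File handles on the same volume tend to share most of their bytes, so a
position-sensitive hash spreads them across buckets more reliably than a
plain sum.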


214224 22-Oct-2010 rmacklem

Modify the file handle hash function in the experimental NFS
server so that it will work better for non-UFS file systems.
The new function simply sums the bytes of the fh_fid field
of fhandle_t.

MFC after: 10 days


214149 21-Oct-2010 rmacklem

Modify the experimental NFS server in a manner analogous to
r214049 for the regular NFS server, so that it will not do
a VOP_LOOKUP() of ".." when at the root of a file system
when performing a ReaddirPlus RPC.

MFC after: 10 days


214053 19-Oct-2010 rmacklem

Fix the type of the 3rd argument for nm_getinfo so that it works
for architectures like sparc64.

Suggested by: kib
MFC after: 2 weeks


214048 19-Oct-2010 rmacklem

Modify the NFS clients and the NLM so that the NLM can be used
by both clients. Since the NLM uses various fields of the
nfsmount structure, those fields were extracted and put in a
separate nfs_mountcommon structure stored in sys/nfs/nfs_mountcommon.h.
This structure also has a function pointer for a function that
extracts the required information from the mount point and nfs vnode
for that particular client, for information stored differently by the
clients.

Reviewed by: jhb
MFC after: 2 weeks


214001 18-Oct-2010 kevlo

Fix a possible race where the directory dirent is moved to the location
that was used by ".." entry.
This change seems to fix a panic during an attempt to access msdosfs
data over nfs.

Reviewed by: kib
MFC after: 1 week


213771 13-Oct-2010 rpaulo

Ignore the return value of DE_INTERNALIZE().


213735 12-Oct-2010 avg

tmpfs + sendfile: do not produce partially valid pages for vnode's tail

See r213730 for details of analogous change in ZFS.

MFC after: 3 days


213725 12-Oct-2010 jh

Format prototypes to follow style(9) more closely.

Discussed with: kib, phk


213712 11-Oct-2010 rmacklem

Try and make the nfsrv_localunlock() function in the experimental
NFSv4 server more readable. Mostly changes to comments, but a
case of >= is changed to >, since == can never happen. Also, I've
added a couple of KASSERT()s and a slight optimization, since
once the "else if" case happens, subsequent locks in the list can't
have any effect. None of these changes fixes any known bug.

MFC after: 2 weeks


213664 10-Oct-2010 kib

The r184588 changed the layout of struct export_args, causing an ABI
breakage for old mount(2) syscall, since most struct <filesystem>_args
embed export_args. The mount(2) is supposed to provide ABI
compatibility for pre-nmount mount(8) binaries, so restore ABI to
pre-r184588.

Requested and reviewed by: bde
MFC after: 2 weeks


213543 08-Oct-2010 kib

Add a comment describing the reason for calling cache_purge(fvp).

Requested by: danfe
MFC after: 6 days


213508 07-Oct-2010 kib

The msdosfs lookup is case insensitive. Several aliases may be inserted for
a single directory entry. As a consequence, the name cache purge done by
lookup for fvp when DELETE op for namei is specified might not be enough to
expunge all namecache entries that were installed for this direntry.

Explicitly call cache_purge(fvp) when msdosfs_rename() succeeded.

PR: kern/93634
MFC after: 1 week


213363 02-Oct-2010 alc

M_USE_RESERVE has been deprecated for a decade. Eliminate any uses that
have no run-time effect.


213221 27-Sep-2010 jh

Add a new function devfs_dev_exists() to be able to find out if a
specific devfs path already exists.

The function will be used from kern_conf.c to detect duplicate device
registrations. Callers must hold the devmtx mutex.

Reviewed by: kib


213215 27-Sep-2010 jh

Add reference counting for devfs paths containing user created symbolic
links. The reference counting is needed to be able to determine if a
specific devfs path exists. For true device file paths we can traverse
the cdevp_list but a separate directory list is needed for user created
symbolic links.

Add a new directory entry flag DE_USER to mark entries which should
unreference their parent directory on deletion.

A new function to traverse cdevp_list and the directory list will be
introduced in a separate commit.

Idea from: kib
Reviewed by: kib


212966 21-Sep-2010 jh

Modify devfs_fqpn() for future use in devfs path reference counting
code:

- Accept devfs_mount and devfs_dirent as the arguments instead of a
vnode. This generalizes the function so that it can be used from
contexts where vnode references are not available.
- Accept NULL cnp argument. No '/' will be appended, if a NULL cnp is
provided.
- Make the function global and add its prototype to devfs.h.

Reviewed by: kib


212834 19-Sep-2010 rmacklem

Fix nfsrv_freeallnfslocks() in the experimental NFSv4 server so that
it frees local locks correctly upon close. In order for
nfsrv_localunlock() to work correctly, the lock can no longer be in
the lockowner's stateid list. As such, nfsrv_freenfslock() has to
be called before nfsrv_localunlock(), to get rid of the lock structure
on the lockowner's stateid list. This only affected operation when
local locks (vfs.newnfs.enable_locallocks=1) are enabled, which is
not the default at this time.

MFC after: 1 week


212833 19-Sep-2010 rmacklem

Fix the experimental NFSv4 server so that it performs local VOP_ADVLOCK()
unlock operations correctly. It was passing in F_SETLK instead of
F_UNLCK as the operation for the unlock case. This only affected
operation when local locking (vfs.newnfs.enable_locallocks=1) was enabled.

MFC after: 1 week


212826 18-Sep-2010 jh

- For consistency, remove "." and ".." entries from de_dlist before
calling devfs_delete() (and thus possibly dropping dm_lock) in
devfs_rmdir_empty().
- Assert that we don't return doomed entries from devfs_find(). [1]

Suggested by: kib [1]
Reviewed by: kib


212660 15-Sep-2010 jh

Remove empty devfs directories automatically.

devfs_delete() now recursively removes empty parent directories unless
the DEVFS_DEL_NORECURSE flag is specified. devfs_delete() can't be
called anymore with a parent directory vnode lock held because the
possible parent directory deletion needs to lock the vnode. Thus we
unlock the parent directory vnode in devfs_remove() before calling
devfs_delete().

Call devfs_populate_vp() from devfs_symlink() and devfs_vptocnp() as now
directories can get removed.

Add a check for DE_DOOMED flag to devfs_populate_vp() because
devfs_delete() drops dm_lock before the VI_DOOMED vnode flag gets set.
This ensures that devfs_populate_vp() returns an error for directories
which are in progress of deletion.

Reviewed by: kib
Discussed on: freebsd-current (mostly silence)


212650 15-Sep-2010 avg

tmpfs, zfs + sendfile: mark page bits as valid after populating it with data

Otherwise, adding insult to injury, in addition to double-caching of data
we would always copy the data into a vnode's vm object page from backend.
This is specific to sendfile case only (VOP_READ with UIO_NOCOPY).

PR: kern/141305
Reported by: Wiktor Niesiobedzki <bsd@vink.pl>
Reviewed by: alc
Tested by: tools/regression/sockets/sendfile
MFC after: 2 weeks


212443 10-Sep-2010 rmacklem

This patch applies one of the two fixes suggested by
zack.kirsch at isilon.com for a race between nfsrv_freeopen()
and nfsrv_getlockfile() in the experimental NFS server that
he found during testing. Although nfsrv_freeopen() holds a
sleep lock on the lock file structure when called with
cansleep != 0, nfsrv_getlockfile() could still search the
list, once it acquired the NFSLOCKSTATE() mutex. I believe
that acquiring the mutex in nfsrv_freeopen() fixes the race.

MFC after: 2 weeks


212439 10-Sep-2010 rmacklem

Fix the NFSVNO_CMPFH() macro in the experimental NFS server so
that it works correctly for ZFS file handles. It is possible to
have two ZFS file handles that differ only in the bytes in the
fid_reserved field of the generic "struct fid" and comparing the
bytes in fid_data didn't catch this case. This patch changes the
macro to compare all bytes of "struct fid".

Tested by: gull at gull.us
MFC after: 2 weeks
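
The comparison problem fixed in r212439 can be modelled in userland as
follows. The structure is a hypothetical stand-in for the generic
"struct fid" (field names follow the commit message; the 16-byte data
size is an assumption), and the two helpers are made-up names, not the
NFSVNO_CMPFH() macro itself.

```c
#include <assert.h>
#include <string.h>

#define	SKETCH_FIDSZ	16	/* assumed size of the opaque data */

/* Hypothetical mirror of a generic file identifier. */
struct sketch_fid {
	unsigned short	fid_len;
	unsigned short	fid_reserved;	/* some file systems store data here */
	char		fid_data[SKETCH_FIDSZ];
};

/* Compares only the opaque data; misses differences in fid_reserved. */
static int
fid_equal_data_only(const struct sketch_fid *a, const struct sketch_fid *b)
{
	assert(a != NULL && b != NULL);
	return (memcmp(a->fid_data, b->fid_data, sizeof(a->fid_data)) == 0);
}

/*
 * Compares every byte of the structure. This layout has no padding;
 * a whole-struct memcmp() is only meaningful under that condition.
 */
static int
fid_equal_all(const struct sketch_fid *a, const struct sketch_fid *b)
{
	assert(a != NULL && b != NULL);
	return (memcmp(a, b, sizeof(*a)) == 0);
}
```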


212362 09-Sep-2010 rmacklem

Fix the experimental NFS client so that it doesn't panic when
NFSv2,3 byte range locking is attempted. A fix that allows the
nlm_advlock() to work with both clients is in progress, but
may take a while. As such, I am doing this commit so that
the kernel doesn't panic in the meantime.

Submitted by: jh
MFC after: 2 weeks


212305 07-Sep-2010 ivoras

Avoid "Entry can disappear before we lock fdvp" panic.

PR: 150143
Submitted by: Gleb Kurtsou <gk at FreeBSD.org>
Pretty sure it won't blow up: mckusick
MFC after: 2 weeks


212293 07-Sep-2010 jhb

Store the full timestamp when caching timestamps of files and
directories for purposes of validating name cache entries. This
closes races where two updates to a file or directory within the same
second could result in stale entries in the name cache. While here,
remove the 'n_expiry' field as it is no longer used.

Reviewed by: rmacklem
MFC after: 1 week
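
The race closed by r212293 comes down to how much of the timestamp is
compared. A minimal userland sketch (the helper names are hypothetical;
only struct timespec is a real type) shows two updates within the same
second looking identical when only tv_sec is cached:

```c
#include <assert.h>
#include <time.h>	/* struct timespec */

/* Seconds-only check: cannot tell apart two updates in one second. */
static int
unchanged_seconds_only(const struct timespec *cached,
    const struct timespec *current)
{
	assert(cached != NULL && current != NULL);
	return (cached->tv_sec == current->tv_sec);
}

/* Full check: compares both the seconds and nanoseconds fields. */
static int
unchanged_full(const struct timespec *cached,
    const struct timespec *current)
{
	assert(cached != NULL && current != NULL);
	return (cached->tv_sec == current->tv_sec &&
	    cached->tv_nsec == current->tv_nsec);
}
```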


212221 05-Sep-2010 daichi

Allow unionfs to use a file system that does not support whiteouts
as the upper layer. Until now, unionfs refused to use such a file
system as the upper layer. With this change, a file system without
whiteout support (tmpfs in particular) can serve as the upper
layer. This is very useful for combining tmpfs as the upper layer
with a read-only file system as the lower layer.

By definition, without whiteout support from the file system
backing the upper layer, there is no way that delete and rename
operations on lower layer objects can be done. EOPNOTSUPP is
returned for these kinds of operations as generated by VOP_WHITEOUT()
along with any others which would make modifications to the
lower layer, such as chmod(1).

This change is suggested by ed.

Submitted by: ed


212217 05-Sep-2010 rmacklem

Change the code in ncl_bioread() in the experimental NFS
client to return an error when rabp is not set, so it
behaves the same way as the regular NFS client for this
case. It does not affect NFSv4, since nfs_getcacheblk()
only fails for "intr" mounts and NFSv4 can't use the
"intr" mount option.

MFC after: 2 weeks


212216 05-Sep-2010 rmacklem

Disable use of the NLM in the experimental NFS client, since
it will crash the kernel because it uses the nfsmount and
nfsnode structures of the regular NFS client.

MFC after: 2 weeks


212079 01-Sep-2010 lulf

- Remove duplicate comment.

PR: kern/148820
Submitted by: pluknet <pluknet - at - gmail.com>


212043 31-Aug-2010 rmacklem

Add a null_remove() function to nullfs, so that the v_usecount
of the lower level vnode is incremented to greater than 1 when
the upper level vnode's v_usecount is greater than one. This
is necessary for the NFS clients, so that they will do a silly
rename of the file instead of actually removing it when the
file is still in use. It is "racy", since the v_usecount is
incremented in many places in the kernel with
minimal synchronization, but an extraneous silly rename is
preferred to not doing a silly rename when it is required.
The only other file systems that currently check the value
of v_usecount in their VOP_REMOVE() functions are nwfs and
smbfs. These file systems choose to fail a remove when the
v_usecount is greater than 1 and I believe will function
more correctly with this patch, as well.

Tested by: to.my.trociny at gmail.com
Submitted by: to.my.trociny at gmail.com (earlier version)
Reviewed by: kib
MFC after: 2 weeks


211953 28-Aug-2010 rmacklem

Add acquisition of a reference count on nfsv4root_lock to the
nfsd_recalldelegation() function, since this function is called
by nfsd threads when they are handling NFSv2 or NFSv3 RPCs, where
no reference count would have been acquired.

MFC after: 2 weeks


211951 28-Aug-2010 rmacklem

The timer routine in the experimental NFS server did not acquire
the correct mutex when checking nfsv4root_lock. Although this
could be fixed by adding mutex lock/unlock calls, zack.kirsch at
isilon.com suggested a better fix that uses a non-blocking
acquisition of a reference count on nfsv4root_lock. This fix
allows the weird NFSLOCKSTATE(); NFSUNLOCKSTATE(); synchronization
to be deleted. This patch applies this fix.

Tested by: zack.kirsch at isilon.com
MFC after: 2 weeks


211847 26-Aug-2010 jh

Set de_dir for user created symbolic links. This will be needed to be
able to resolve their parent directories.


211826 25-Aug-2010 trasz

Revert r210194, adding a comment explaining why calls to chgproccnt()
in unionfs are actually needed. I have a better fix in trasz_hrl p4 branch,
but now is not a good moment to commit it.

Reported by: Alex Kozlov


211816 25-Aug-2010 jh

Call devfs_populate_vp() from devfs_getattr(). It was possible that
fstat(2) returned stale information through an open file descriptor.


211628 22-Aug-2010 jh

Introduce and use devfs_populate_vp() to unlock a vnode before calling
devfs_populate(). This is a prerequisite for the automatic removal of
empty directories which will be committed in the future.

Reviewed by: kib (previous version)


211598 22-Aug-2010 ed

Add support for whiteouts on tmpfs.

Right now unionfs only allows filesystems to be mounted on top of
another if it supports whiteouts. Even though I have sent a patch to
daichi@ to let unionfs work without it, we'd better also add support for
whiteouts to tmpfs.

This patch implements .vop_whiteout and makes necessary changes to
lookup() and readdir() to take them into account. We must also make sure
that when adding or removing a file, we honour the componentname's
DOWHITEOUT and ISWHITEOUT, to prevent duplicate filenames.

MFC after: 1 month


211531 20-Aug-2010 jhb

Add dedicated routines to toggle lockmgr flags such as LK_NOSHARE and
LK_CANRECURSE after a lock is created. Use them to implement macros that
otherwise manipulated the flags directly. Assert that the associated
lockmgr lock is exclusively locked by the current thread when manipulating
these flags to ensure the flag updates are safe. This last change required
some minor shuffling in a few filesystems to exclusively lock a brand new
vnode slightly earlier.

Reviewed by: kib
MFC after: 3 days


211513 19-Aug-2010 jh

Call dev_rel() in error paths.

Reported by: kib
Reviewed by: kib
MFC after: 2 weeks


211226 12-Aug-2010 jh

Allow user created symbolic links to cover device files and directories
if the device file appears during or after the link creation.

User created symbolic links are now inserted at the head of the
directory entry list after the "." and ".." entries. A new directory
entry flag DE_COVERED indicates that an entry is covered by a symbolic
link.

PR: kern/114057
Reviewed by: kib
Idea from: kib
Discussed on: freebsd-current (mostly silence)


210997 07-Aug-2010 rwatson

Properly bounds check ioctl/pioctl data arguments for Coda:

1. Use unsigned rather than signed lengths
2. Bound messages to/from Venus to VC_MAXMSGSIZE
3. Bound messages to/from general user processes to VC_MAXDATASIZE
4. Update comment regarding data limits for pioctl

Without (1) and (3), it may be possible for unprivileged user processes to
read sensitive portions of kernel memory. This issue is only present if
the Coda kernel module is loaded and venus (the userspace Coda daemon) is
running and has /coda mounted.
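
Points (1) and (3) can be illustrated with a small userland sketch.
Everything here is hypothetical (EX_MAXDATASIZE, struct ex_msg and
both functions); it is not the Coda code, only an example of why a
signed length defeats an upper-bound check.

```c
#include <assert.h>
#include <string.h>

#define EX_MAXDATASIZE 8192u	/* hypothetical limit, for illustration */

struct ex_msg {
	char		data[EX_MAXDATASIZE];
	unsigned int	len;
};

/* Flawed check: a negative signed length passes the upper-bound test. */
static int
ex_len_ok_signed(int len)
{
	return (len <= (int)EX_MAXDATASIZE);
}

/* Bounded copy using an unsigned length; returns 0 on success, -1 if too big. */
static int
ex_fill(struct ex_msg *m, const char *src, unsigned int len)
{
	assert(m != NULL && src != NULL);
	if (len > EX_MAXDATASIZE)
		return (-1);
	memcpy(m->data, src, len);
	m->len = len;
	return (0);
}
```

A length of -1 is accepted by the signed check, but the same bit
pattern is rejected by ex_fill() because it is seen as a very large
unsigned value.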

As Coda is considered experimental and production use is warned against in
the coda(4) man page, and because Coda must be explicitly configured for a
configuration to be vulnerable, we won't be issuing a security advisory.
However, if you are using Coda, then you are advised to apply these fixes.

Reported by: Dan J. Rosenberg <drosenberg at vsecurity.com>
Obtained from: NetBSD (Christos Zoulas)
Security: Kernel memory disclosure; no advisory as feature experimental
MFC after: 3 days


210925 06-Aug-2010 kib

Enable shared lookups and extended shared ops for devfs.

In collaboration with: pho
MFC after: 1 month


210923 06-Aug-2010 kib

Add new make_dev_p(9) flag MAKEDEV_ETERNAL to inform devfs that created
cdev will never be destroyed. Propagate the flag to devfs vnodes as
VV_ETERNALDEV. Use the flags to avoid acquiring devmtx and taking a
thread reference on such nodes.

In collaboration with: pho
MFC after: 1 month


210921 06-Aug-2010 kib

Enable shared locks for the devfs vnodes. Honor the locking mode
requested by lookup(). This should be a nop at the moment.

In collaboration with: pho
MFC after: 1 month


210918 06-Aug-2010 kib

Initialize VV_ISTTY vnode flag on the devfs vnode creation instead of
doing it on each open.

In collaboration with: pho
MFC after: 1 month


210786 03-Aug-2010 rmacklem

Modify the return value for nfscl_mustflush() from boolean_t,
which I mistakenly thought was correct w.r.t. style(9), back
to int and add the checks for != 0. This is just a stylistic
modification.

MFC after: 1 week


210455 24-Jul-2010 rmacklem

Move sys/nfsclient/nfs_lock.c into sys/nfs and build it as a separate
module that can be used by both the regular and experimental nfs
clients. This fixes the problem reported by jh@ where /dev/nfslock
would be registered twice when both nfs clients were used.
I also defined the size of the lm_fh field to be the correct value,
as it should be the maximum size of an NFSv3 file handle.

Reviewed by: jh
MFC after: 2 weeks


210268 19-Jul-2010 rmacklem

For the experimental NFSv4 server's dumplocks operation, add the
MPSAFE flag to cn_flags so that it doesn't panic. The panics weren't
seen since nfsdumpstate(8) is broken for the "-l" case, so this
was never done. I'll do a separate commit to fix nfsdumpstate(8).

Submitted by: zack.kirsch at isilon.com
MFC after: 2 weeks


210227 18-Jul-2010 rmacklem

Add a call to nfscl_mustflush() in nfs_close() of the experimental
NFSv4 client, so that attributes are not acquired from the server
when a delegation for the file is held. This can reduce the number
of Getattr Ops significantly.

MFC after: 2 weeks


210213 18-Jul-2010 trasz

Fix build.

Submitted by: Andreas Tobler <andreast-list at fgznet.ch>


210201 18-Jul-2010 rmacklem

Change the nfscl_mustflush() function in the experimental NFSv4
client to return a boolean_t in order to make it more compatible
with style(9).

MFC after: 2 weeks


210194 17-Jul-2010 trasz

Remove updating process count by unionfs. It serves no purpose, unionfs just
needs root credentials for a moment.


210178 16-Jul-2010 rmacklem

Patch the experimental NFSv4 server so that it acquires a reference
count on nfsv4rootfs_lock when dumping state, since these functions
are not called by nfsd threads. Without this reference count, it
is possible for an nfsd thread to acquire an exclusive lock on
nfsv4rootfs_lock while the dump is in progress and then change the
lists, potentially causing a crash.

Reported by: zack.kirsch at isilon.com
MFC after: 2 weeks


210172 16-Jul-2010 jhb

Revert the previous commit. The race is not applicable to the lockmgr
implementation in 8.0 and later as its flags field does not hold dynamic
state such as waiters flags, but is only modified in lockinit() aside
from VN_LOCK_*().

Discussed with: attilio


210171 16-Jul-2010 jhb

When the MNTK_EXTENDED_SHARED mount option was added, some filesystems were
changed to defer the setting of VN_LOCK_ASHARE() (which clears LK_NOSHARE
in the vnode lock's flags) until after they had determined if the vnode was
a FIFO. This occurs after the vnode has been inserted into a VFS hash or some
similar table, so it is possible for another thread to find this vnode via
vget() on an i-node number and block on the vnode lock. If the lockmgr
interlock (vnode interlock for vnode locks) is not held when clearing the
LK_NOSHARE flag, then the lk_flags field can be clobbered. As a result
the thread blocked on the vnode lock may never get woken up. Fix this by
holding the vnode interlock while modifying the lock flags in this case.

MFC after: 3 days


210154 16-Jul-2010 rmacklem

Delete comments related to soft clock interrupts that don't apply
to the FreeBSD port of the experimental NFSv4 server.

Submitted by: zack.kirsch at isilon.com
MFC after: 2 weeks


210136 15-Jul-2010 jhb

Retire the NFS access cache timestamp structure. It was used in VOP_OPEN()
to avoid sending multiple ACCESS/GETATTR RPCs during a single open()
between VOP_LOOKUP() and VOP_OPEN(). Now we always send the RPC in
VOP_LOOKUP() and not VOP_OPEN() in the cases that multiple RPCs could be
sent.

MFC after: 2 weeks


210135 15-Jul-2010 jhb

Merge 208603, 209946, and 209948 to the new NFS client:
Move attribute cache flushes from VOP_OPEN() to VOP_LOOKUP() to provide
more graceful recovery for stale filehandles and eliminate the need for
conditionally clearing the attribute cache in the !NMODIFIED case in
VOP_OPEN().

Reviewed by: rmacklem
MFC after: 2 weeks


210102 15-Jul-2010 rmacklem

This patch fixes a bug in the experimental NFSv4 server where it
released a reference count on nfsv4rootfs_lock erroneously when
administrative revocation of state was done.

Submitted by: zack.kirsch at isilon.com
MFC after: 2 weeks


210034 13-Jul-2010 rmacklem

For the experimental NFSv4 client, make sure that attributes that
predate the issue of a delegation are not cached once the delegation
is held. This is necessary, since cached attributes remain valid
while the delegation is held.

MFC after: 2 weeks


210032 13-Jul-2010 rmacklem

For the experimental NFSv4 client, do not use cached attributes
that were invalidated, even when a delegation for the file is held.

MFC after: 2 weeks


210030 13-Jul-2010 rmacklem

Fix a bogus comment that mentions lru lists that don't exist.

Reported by: zack.kirsch at isilon.com
MFC after: 2 weeks


209425 22-Jun-2010 avg

udf_vnops: cosmetic followup to r208671 - better looking code

Suggested by: jhb
MFC after: 3 days


209320 18-Jun-2010 alc

Eliminate unnecessary page queues locking.


209226 16-Jun-2010 alc

Eliminate unnecessary page queues locking.


209191 15-Jun-2010 rmacklem

Add MODULE_DEPEND() macros to the experimental NFS client and
server so that the modules will load when kernels are built with
none of the NFS* configuration options specified. I believe this
resolves the problems reported by PR kern/144458 and the email on
freebsd-stable@ posted by Dmitry Pryanishnikov on June 13.

Tested by: kib
PR: kern/144458
Reviewed by: kib
MFC after: 1 week


209120 13-Jun-2010 kib

In NFS clients, instead of inconsistently using #ifdef
DIAGNOSTIC and #ifndef DIAGNOSTIC for debug assertions, prefer
KASSERT(). Also change one #ifdef DIAGNOSTIC in the new nfs server.

Submitted by: Mikolaj Golub <to.my.trociny gmail com>
MFC after: 2 weeks


209062 11-Jun-2010 avg

fix a few cases where a string is passed via format argument instead of
via %s

Most of the cases looked harmless, but this is done for the sake of
correctness. In one case it even allowed to drop an intermediate buffer.
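
A minimal sketch of the pattern being fixed, using snprintf(3) in
userland rather than the kernel routines touched by the commit. The
function name ex_format_msg() is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Copy msg into dst as literal text.  The unsafe variant would be
 * snprintf(dst, dstlen, msg), where any '%' in msg is interpreted
 * as a conversion specification.
 */
static int
ex_format_msg(char *dst, size_t dstlen, const char *msg)
{
	assert(dst != NULL && msg != NULL);
	return (snprintf(dst, dstlen, "%s", msg));
}
```

Passing the string through "%s" keeps a message such as "100%s done"
intact instead of treating "%s" as a request for a further argument.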

Found by: clang
MFC after: 2 weeks


208951 09-Jun-2010 jh

Add a new function devfs_parent_dirent() for resolving devfs parent
directory entry. Use the new function in devfs_fqpn(), devfs_lookupx()
and devfs_vptocnp() instead of manually resolving the parent entry.

Reviewed by: kib


208717 01-Jun-2010 jh

Don't try to call cdevsw d_close() method when devfs_close() is called
because of insmntque1() failure.

Found with: stress2
Suggested and reviewed by: kib


208671 31-May-2010 avg

udf_readlink: fix malloc call with uninitialized size parameter

Found by: clang static analyzer
MFC after: 4 days


208254 18-May-2010 rmacklem

Allow the experimental NFSv4 client to use cached attributes
when a write delegation is held. Also, add a missing
mtx_unlock() call for the ACL debugging code.

MFC after: 5 days


208234 18-May-2010 rmacklem

Add a sanity check for a negative args.fhsize to the experimental
NFS client.

MFC after: 5 days


208128 16-May-2010 kib

Disable bypass for the vop_advlockpurge(). The vop is called after
vop_revoke(), the v_data is already destroyed.

Reported and tested by: ed


207848 10-May-2010 kib

The thread_unsuspend() requires both process mutex and process spinlock
locked. Postpone the process unlock till the thread_unsuspend() is called.

Approved by: des (procfs maintainer)
MFC after: 1 week


207847 10-May-2010 kib

For detach procfs ctl command, also clear P_STOPPED_TRACE process stop
flag, and for each thread, TDB_SUSPEND debug flag, same as it is done by
exit1() for orphaned debuggee.

Approved by: des (procfs maintainer)
MFC after: 1 week


207785 08-May-2010 rmacklem

Fix typos in macros.

PR: kern/146375
Submitted by: simon AT comsys.ntu-kpi.kiev.ua
MFC after: 1 week


207764 08-May-2010 rmacklem

Patch the experimental NFS client so that it works for NFSv2
by adding the necessary mapping from NFSv3 procedure numbers
to NFSv2 procedure numbers when doing NFSv2 RPCs.

MFC after: 1 week


207746 07-May-2010 alc

Push down the page queues lock into vm_page_activate().


207729 06-May-2010 kib

Add MAKEDEV_NOWAIT flag to make_dev_credf(9), to create a device node
in a no-sleep context. If resource allocation cannot be done without
sleep, make_dev_credf() fails and returns NULL.

Reviewed by: jh
MFC after: 2 weeks


207728 06-May-2010 alc

Eliminate page queues locking around most calls to vm_page_free().


207719 06-May-2010 trasz

Style fixes and removal of unneeded variable.

Submitted by: bde@


207669 05-May-2010 alc

Acquire the page lock around all remaining calls to vm_page_free() on
managed pages that didn't already have that lock held. (Freeing an
unmanaged page, such as the various pmaps use, doesn't require the page
lock.)

This allows a change in vm_page_remove()'s locking requirements. It now
expects the page lock to be held instead of the page queues lock.
Consequently, the page queues lock is no longer required at all by callers
to vm_page_rename().

Discussed with: kib


207662 05-May-2010 trasz

Move checking against RLIMIT_FSIZE into one place, vn_rlimit_fsize().

Reviewed by: kib


207644 05-May-2010 alc

Push down the acquisition of the page queues lock into vm_page_unwire().

Update the comment describing which lock should be held on entry to
vm_page_wire().

Reviewed by: kib


207584 03-May-2010 kib

Lock the page around vm_page_activate() and vm_page_deactivate() calls
where it was missed. The wrapped fragments now protect wire_count with
page lock.

Reviewed by: alc


207573 03-May-2010 alc

Acquire the page lock around vm_page_unwire() and vm_page_wire().

Reviewed by: kib


207530 02-May-2010 alc

It makes no sense for vm_page_sleep_if_busy()'s helper, vm_page_sleep(),
to unconditionally set PG_REFERENCED on a page before sleeping. In many
cases, it's perfectly ok for the page to disappear, i.e., be reclaimed by
the page daemon, before the caller to vm_page_sleep() is reawakened.
Instead, we now explicitly set PG_REFERENCED in those cases where having
the page persist until the caller is awakened is clearly desirable. Note,
however, that setting PG_REFERENCED on the page is still only a hint,
and not a guarantee that the page should persist.


207350 28-Apr-2010 rmacklem

For the experimental NFS client, it should always flush dirty
buffers before closing the NFSv4 opens, as the comment states.
This patch deletes the call to nfscl_mustflush() which would
return 0 for the case where a delegation still exists, which
was incorrect and could cause crashes during recovery from
an expired lease.

MFC after: 1 week


207349 28-Apr-2010 rmacklem

Delete a diagnostic statement that is no longer useful from
the experimental NFS client.

MFC after: 1 week


207170 24-Apr-2010 rmacklem

An NFSv4 server will reply NFSERR_GRACE for non-recovery RPCs
during the grace period after startup. This grace period must
be at least the lease duration, which is typically 1-2 minutes.
It seems prudent for the experimental NFS client to wait a few
seconds before retrying such an RPC, so that the server isn't
flooded with non-recovery RPCs during recovery. This patch adds
an argument to nfs_catnap() to implement a 5 second delay
for this case.

MFC after: 1 week


207082 22-Apr-2010 rmacklem

When the experimental NFS client is handling an NFSv4 server reboot
with delegations enabled, the recovery could fail if the renew
thread is trying to return a delegation, since it will not do the
recovery. This patch fixes the above by having nfscl_recalldeleg()
fail with the I/O operations returning EIO, so that they will be
attempted later. Most of the patch consists of adding an argument
to various functions to indicate the delegation recall case where
this needs to be done.

MFC after: 1 week


206894 20-Apr-2010 kib

The cache_enter(9) function shall not be called for doomed dvp.
Assert this.

In the reported panic, vdestroy() fired the assertion "vp has namecache
for ..", because pseudofs may end up doing cache_enter() with reclaimed
dvp, after dotdot lookup temporarily unlocked dvp.
Similar problem exists in ufs_lookup() for "." lookup, when vnode
lock needs to be upgraded.

Verify that dvp is not reclaimed before calling cache_enter().

Reported and tested by: pho
Reviewed by: kan
MFC after: 2 weeks


206880 20-Apr-2010 rmacklem

For the experimental NFS client doing an NFSv4 mount,
set the NFSCLFLAGS_RECVRINPROG while doing recovery from an expired
lease in a manner similar to r206818 for server reboot recovery.
This will prevent the function that acquires stateids for I/O
operations from acquiring out of date stateids during recovery.
Also, fix up mutex locking on the nfsc_flags field.

MFC after: 1 week


206818 18-Apr-2010 rmacklem

Avoid extraneous recovery cycles in the experimental NFS client
when an NFSv4 server reboots, by doing two things.
1 - Make the function that acquires a stateid for I/O operations
block until recovery is complete, so that it doesn't acquire
out of date stateids.
2 - Only allow a recovery once every 1/2 of a lease duration, since
the NFSv4 server must provide a recovery grace period of at
least a lease duration. This should avoid recoveries caused
by an out of date stateid that was acquired for an I/O op.
just before a recovery cycle started.

MFC after: 1 week


206698 16-Apr-2010 jh

Revert r206560. The change doesn't work correctly in all cases with
multiple devfs mounts.


206690 15-Apr-2010 rmacklem

Add mutex lock calls to 2 cases in the experimental NFS client's
renew thread where they were missing.

MFC after: 1 week


206688 15-Apr-2010 rmacklem

The experimental NFS client was not filling in recovery credentials
for opens done locally in the client when a delegation for the file
was held. This could cause the client to crash in crsetgroups() when
recovering from a server crash/reboot. This patch fills in the
recovery credentials for this case, in order to avoid the client crash.
Also, add KASSERT()s to the credential copy functions, to catch any
other cases where the credentials aren't filled in correctly.

MFC after: 1 week


206560 13-Apr-2010 jh

- Ignore and report duplicate and empty device names in devfs_populate_loop()
instead of causing erratic behavior. Currently make_dev(9) can't fail, so
there is no way to report an error to make_dev(9) callers.
- Disallow using "." and ".." in device path names. It didn't work previously
but now it is reported rather than panicking.
- Treat multiple sequential slashes as single in device path names.

Discussed with: pjd


206361 07-Apr-2010 joel

Switch to our preferred 2-clause BSD license.

Approved by: bp


206236 06-Apr-2010 rmacklem

Harden the experimental NFS server a little, by adding range
checks on the length of the client's open/lock owner name. Also,
add free()'s for one case where they were missing and would
have caused a leak if NFSERR_BADXDR had been replied. Probably
never happens, but the leak is now plugged, just in case.

MFC after: 2 weeks


206210 05-Apr-2010 rwatson

Synchronize Coda kernel module definitions in our coda.h to Coda 6's
coda.h:

- CodaFid typedef -> struct CodaFid throughout.
- Use unsigned int instead of unsigned long for venus_dirent and other
cosmetic fixes.
- Introduce cuid_t and cgid_t and use instead of uid_t and gid_t in RPCs.
- Synchronize comments and macros.
- Use u_int32_t instead of unsigned long for coda_out_hdr.

With these changes, a 64-bit Coda kernel module now works with
coda6_client, whereas previous userspace and kernel versions of RPCs
differed sufficiently to prevent using the file system. This has been
verified only with casual testing, but /coda is now usable for at least
basic operations on amd64.

MFC after: 1 week


206206 05-Apr-2010 rwatson

Correct definition of CIOC_KERNEL_VERSION Coda ioctl() for systems
where sizeof(int) != sizeof(sizeof(int)), or the ioctl will return
EINVAL.

MFC after: 3 days


206170 04-Apr-2010 rmacklem

Harden the experimental NFS server a little, by adding extra checks
in the readdir functions for non-positive byte count arguments.
For the negative case, set it to the maximum allowable, since it
was actually a large positive value (unsigned) on the wire.
Also, fix up the readdir function comment a bit.
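
The negative-count handling can be sketched as follows. This is an
assumption-laden illustration: EX_MAXCOUNT and ex_fix_count() are
hypothetical, and rejecting a zero count is a choice made for this
sketch only, since the text above does not say how zero is treated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_MAXCOUNT 65536	/* hypothetical maximum byte count */

/* Normalize a byte count read from the wire; returns 0 or -1. */
static int
ex_fix_count(int32_t cnt, int32_t *out)
{
	assert(out != NULL);
	if (cnt < 0)
		cnt = EX_MAXCOUNT;	/* was a large unsigned value on the wire */
	else if (cnt == 0)
		return (-1);		/* assumption: treat zero as an error */
	if (cnt > EX_MAXCOUNT)
		cnt = EX_MAXCOUNT;
	*out = cnt;
	return (0);
}
```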

Suggested by: dillon AT apollo.backplane.com
MFC after: 2 weeks


206098 02-Apr-2010 avg

mountmsdosfs: reject too high value of bytes per cluster

Bytes per cluster are calculated as bytes per sector times sectors per
cluster. Too high value can overflow an internal variable with type
that can hold only values in valid range. Trying to use a wider type
results in an attempt to read more than MAXBSIZE at once, a panic.
Unfortunately, it is FreeBSD newfs_msdos that produces filesystems
with invalid parameters for certain types of media.
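
A userland sketch of the kind of check described: compute the product
in a type wide enough to hold it, then reject values above the limit.
EX_MAXBSIZE and ex_bytes_per_cluster() are hypothetical names, and the
limit value is an assumption for the example, not taken from the
commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_MAXBSIZE 65536u	/* assumed limit, for illustration only */

/* Returns 0 and stores the cluster size, or -1 if it is out of range. */
static int
ex_bytes_per_cluster(uint16_t bytes_per_sec, uint8_t sec_per_clust,
    uint32_t *out)
{
	uint32_t bpc;

	assert(out != NULL);
	/* Multiply in 32 bits so the product cannot wrap. */
	bpc = (uint32_t)bytes_per_sec * sec_per_clust;
	if (bpc == 0 || bpc > EX_MAXBSIZE)
		return (-1);
	*out = bpc;
	return (0);
}
```

For example, 512-byte sectors with 8 sectors per cluster are accepted
(4096 bytes), while 4096-byte sectors with 128 sectors per cluster
(524288 bytes) are rejected.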

Reported by: Fabian Keil <freebsd-listen@fabiankeil.de>,
Paul B. Mahol <onemda@gmail.com>
Discussed with: bde, kib
MFC after: 1 week
X-ToDo: fix newfs_msdos


206093 02-Apr-2010 kib

Add function vop_rename_fail(9) that performs needed cleanup for locks
and references of the VOP_RENAME(9) arguments. Use vop_rename_fail()
in deadfs_rename().

Tested by: Mikolaj Golub
MFC after: 1 week


206063 02-Apr-2010 rmacklem

For the experimental NFS server, add a call to free the lookup
path buffer for one case where it was missing when doing mkdir.
This could have conceivably resulted in a leak of a buffer, but
a leak was never observed during testing, so I suspect it would
have occurred rarely, if ever, in practice.

MFC after: 2 weeks


206061 02-Apr-2010 rmacklem

Add SAVENAME to the cn_flags for all cases in the experimental
NFS server for the CREATE cn_nameiop where SAVESTART isn't set.
I was not aware that this needed to be done by the caller until
recently.

Tested by: lampa AT fit.vutbr.cz (link case)
Submitted by: lampa AT fit.vutbr.cz (link case)
MFC after: 2 weeks


205941 30-Mar-2010 rmacklem

This patch should fix handling of byte range locks locally
on the server for the experimental nfs server. When enabled
by setting vfs.newnfs.locallocks_enable to non-zero, the
experimental nfs server will now acquire byte range locks
on the file on behalf of NFSv4 clients, such that lock
conflicts between the NFSv4 clients and processes running
locally on the server, will be recognized and handled correctly.

MFC after: 2 weeks


205663 26-Mar-2010 rmacklem

Patch the experimental NFS server in a manner analogous to r205661
for the regular NFS server, to ensure that ESTALE is
returned to the client for all errors returned by VFS_FHTOVP().

MFC after: 2 weeks


205572 24-Mar-2010 rmacklem

Fix the experimental NFS subsystem so that it uses the correct
preprocessor macro name for not requiring strict data alignment.

Suggested by: marius
MFC after: 2 weeks


205223 16-Mar-2010 jkim

Fix a long standing regression of readdir(3) in fdescfs(5) introduced
in r1.48. We were stopping at the first null pointer when multiple file
descriptors were opened and one in the middle was closed. This restores
traditional behaviour of fdescfs.
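
The behaviour can be sketched with a plain array standing in for the
descriptor table. ex_list_open() is a hypothetical helper, not the
fdescfs code; it shows skipping a closed slot rather than stopping at
it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Record the indices of all non-NULL slots in table[0..n-1] into out[]
 * and return how many were found.  out[] must have room for n entries.
 */
static size_t
ex_list_open(void *const *table, size_t n, size_t *out)
{
	size_t i, found = 0;

	assert(table != NULL && out != NULL);
	for (i = 0; i < n; i++) {
		if (table[i] == NULL)
			continue;	/* a "break" here would hide later slots */
		out[found++] = i;
	}
	return (found);
}
```

With slots 0, 2 and 3 open and slot 1 closed, all three open slots are
reported; stopping at the first NULL would report only slot 0.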

MFC after: 3 days


205014 11-Mar-2010 nwhitehorn

Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

Reviewed by: kib, jhb


205010 11-Mar-2010 rwatson

Update nfsrv_getsocksndseq() for changes in TCP internals since FreeBSD 6.x:

- so_pcb is now guaranteed to be non-NULL and valid if a valid socket
reference is held.

- Need to check INP_TIMEWAIT and INP_DROPPED before assuming inp_ppcb is a
tcpcb, as it might be a tcptw or NULL otherwise.

- tp can never be NULL by the end of the function, so only check
TCPS_ESTABLISHED before extracting tcpcb fields.

The NFS server arguably incorporates too many assumptions about TCP
internals, but fixing that is left for another day.

MFC after: 1 week
Reviewed by: bz
Reviewed and tested by: rmacklem
Sponsored by: Juniper Networks


204675 03-Mar-2010 kib

When returning error from msdosfs_lookup(), make sure that *vpp is NULL.
lookup() KASSERTs this condition.

Reported and tested by: pho
MFC after: 3 weeks


204589 02-Mar-2010 kib

Do not leak vnode lock when msdosfs mount is updated and specified
device is different from the device used for the original mount.

Note that update_mp does not need devvp locked, and pmp->pm_devvp cannot
be freed in the meantime.

Reported and tested by: pho
MFC after: 3 weeks


204576 02-Mar-2010 kib

Only destroy pm_fatlock on error if it was initialized.

MFC after: 3 weeks


204475 28-Feb-2010 kib

Mark msdosfs as mpsafe.

Tested by: pho
MFC after: 3 weeks


204474 28-Feb-2010 kib

Fix the race between dotdot lookup and forced unmount, by using
msdosfs-specific variant of vn_vget_ino(), msdosfs_deget_dotdot().

As was done for UFS, relookup the dotdot denode after the call to
msdosfs_deget_dotdot(), because vnode lock is dropped and directory
might be moved.

Tested by: pho
MFC after: 3 weeks


204473 28-Feb-2010 kib

Use pm_fatlock to protect per-filesystem rb tree used to allocate fileno
on the large FAT volumes. Previously, a single global mutex was used.

Tested by: pho
MFC after: 3 weeks


204472 28-Feb-2010 kib

Add assertions for FAT bitmap state.

Tested by: pho
MFC after: 3 weeks


204471 28-Feb-2010 kib

Use pm_fatlock to protect fat bitmap.

Tested by: pho
MFC after: 3 weeks


204470 28-Feb-2010 kib

Add per-mountpoint lockmgr lock for msdosfs. It is intended to be used
as fat bitmap lock and to replace global mutex protecting fileno rbtree.

Tested by: pho
MFC after: 3 weeks


204469 28-Feb-2010 kib

In msdosfs deget(), properly handle the case when the vnode is found in hash.

Tested by: pho
MFC after: 3 weeks


204468 28-Feb-2010 kib

In msdosfs_inactive(), reclaim the vnodes both for SLOT_DELETED and
SLOT_EMPTY deName[0] values. Besides conforming to FAT specification, it
also clears the issue where vfs_hash_insert found the vnode in hash, and
newly allocated vnode is vput()ed. There, deName[0] == 0, and vnode is
not reclaimed, indefinitely kept on mountlist.

Tested by: pho
MFC after: 3 weeks


204467 28-Feb-2010 kib

Remove seemingly unneeded unlock/relock of the dvp in msdosfs_rmdir,
causing LOR.

Reported and tested by: pho
MFC after: 3 weeks


204466 28-Feb-2010 kib

Assert that the msdosfs vnode is (e)locked in several places.
The plan is to use vnode lock to protect denode and fat cache,
and having separate lock for block use map.

Change the check and return on impossible condition into KASSERT().

Tested by: pho
MFC after: 3 weeks


204465 28-Feb-2010 kib

Remove unused global statistic about fat cache usage.

Tested by: pho
MFC after: 3 weeks


204111 20-Feb-2010 uqs

Fix common misspelling of hierarchy

Pointed out by: bf1783 at gmail
Approved by: np (cxgb), kientzle (tar, etc.), philip (mentor)


203866 14-Feb-2010 kib

An invalid filesystem might cause the bp never to be read.

Noted by: Pedro F. Giffuni <giffunip tutopia com>
Obtained from: NetBSD
MFC after: 1 week


203849 14-Feb-2010 rmacklem

Change the default value for vfs.newnfs.enable_locallocks to 0 for
the experimental NFS server, since local locking is known to be
broken and the patch to fix it is still a work in progress.

MFC after: 5 days


203848 14-Feb-2010 rmacklem

This fixes the experimental NFS server so that it won't crash in the
caching code for IPv6 by fixing a typo that used the incorrect variable.
It also fixes the indentation of the statement above it.

Reported by: simon AT comsys.ntu-kpi.kiev.ua
MFC after: 5 days


203828 13-Feb-2010 kib

Fix function name in the comment in the second location too.

Submitted by: ed
MFC after: 1 week


203827 13-Feb-2010 kib

- Add idempotency guards so the structures can be used in other utilities.
- Update bpb structs with reserved fields.
- In direntry struct join deName with deExtension. Although a
fix was attempted in the past, these fields were being overflowed.
Now this is consistent with the spec, and we can now share the
WinChksum code with NetBSD.

Submitted by: Pedro F. Giffuni <giffunip tutopia com>
Mostly obtained from: NetBSD
Reviewed by: bde
MFC after: 2 weeks


203826 13-Feb-2010 kib

Use M_ZERO instead of calling bzero().
Fix function name in the comment.

MFC after: 1 week


203822 13-Feb-2010 kib

Remove unused macros.

MFC after: 1 week


203303 31-Jan-2010 rmacklem

Patch the experimental NFS client so that there is a timeout
for negative name cache entries in a manner analogous to
r202767 for the regular NFS client. Also, make the code in
nfs_lookup() compatible with that of the regular client
and replace the sysctl variable that enabled negative name
caching with the mount point option.

MFC after: 2 weeks


203292 31-Jan-2010 ed

Properly use dev_refl()/dev_rel() in kern.devname.

While there, perform some clean-up fixes. Update some stale comments on
struct cdev * instead of dev_t and devfs_random(). Also add some missing
whitespace.

MFC after: 1 week


203164 29-Jan-2010 jh

Add "maxfilesize" mount option for tmpfs to allow specifying the
maximum file size limit. Default is UINT64_MAX when the option is
not specified. It was useless to set the limit to the total amount of
memory and swap in the system.

Use tmpfs_mem_info() rather than get_swpgtotal() in tmpfs_mount() to
check if there is enough memory available.

Remove now unused get_swpgtotal().

Reviewed by: Gleb Kurtsou
Approved by: trasz (mentor)


203119 28-Jan-2010 rmacklem

Patch the experimental NFS client in a manner analogous to
r203072 for the regular NFS client. Also, delete two fields
of struct nfsmount that are not used by the FreeBSD port of
the client.

MFC after: 2 weeks


203086 27-Jan-2010 trasz

Don't touch v_interlock; use VI_* macros instead.


202903 23-Jan-2010 marius

On LP64 struct ifid is 64-bit aligned while struct fid is 32-bit aligned
so on architectures with strict alignment requirements we can't just simply
cast the latter to the former but need to copy it bytewise instead.

PR: 143010
MFC after: 3 days
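
A minimal sketch of the bytewise-copy idea described in r202903 above. The
struct layouts and the names demo_fid, demo_ifid and demo_fid_to_ifid are
hypothetical stand-ins chosen for illustration; they are not the kernel's
struct fid and struct ifid definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical generic handle: its alignment is weaker than 64 bits. */
struct demo_fid {
	uint16_t len;
	uint16_t pad;
	char     data[16];
};

/* Hypothetical filesystem-specific view: the uint64_t member raises the
 * alignment requirement to 64 bits on LP64 platforms. */
struct demo_ifid {
	uint16_t len;
	uint16_t pad;
	uint32_t gen;
	uint64_t ino;
};

/*
 * Copy the bytes into a properly aligned local object instead of casting
 * the demo_fid pointer to a demo_ifid pointer, which could fault on
 * strict-alignment architectures.
 */
static struct demo_ifid
demo_fid_to_ifid(const struct demo_fid *fhp)
{
	struct demo_ifid ifh;

	assert(sizeof(ifh) <= sizeof(*fhp));
	memcpy(&ifh, fhp, sizeof(ifh));
	return (ifh);
}
```

The copy costs a few bytes of stack but keeps every access to the 64-bit
member naturally aligned.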


202783 22-Jan-2010 jh

Truncate read request rather than returning EIO if the request is
larger than MAXPHYS + 1. This fixes a problem with cat(1) when it
uses a large I/O buffer.

Reported by: Fernando Apesteguía
Suggested by: jilles
Reviewed by: des
Approved by: trasz (mentor)


202708 20-Jan-2010 jh

- Change the type of nodes_max to u_int and use "%u" format string to
convert its value. [1]
- Set default tm_nodes_max to min(pages + 3, UINT32_MAX). It's more
reasonable than the old four nodes per page (with page size 4096) because
non-empty regular files always use at least one page. This fixes possible
overflow in the calculation. [2]
- Don't allow more than tm_nodes_max nodes allocated in tmpfs_alloc_node().

PR: kern/138367
Suggested by: bde [1], Gleb Kurtsou [2]
Approved by: trasz (mentor)
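
The default described in r202708 above, min(pages + 3, UINT32_MAX), can be
sketched as follows. The function name demo_default_nodes_max and the
64-bit page count parameter are assumptions made for illustration, not the
tmpfs code itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return min(pages + 3, UINT32_MAX) without letting the addition wrap.
 * The comparison is done before adding, so a huge page count saturates
 * at UINT32_MAX instead of overflowing.
 */
static uint32_t
demo_default_nodes_max(uint64_t pages)
{
	if (pages > (uint64_t)UINT32_MAX - 3)
		return (UINT32_MAX);
	return ((uint32_t)(pages + 3));
}
```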


202584 18-Jan-2010 lulf

Revert parts of r202283:
- Return EOPNOTSUPP before EROFS to be consistent with other filesystems.
- Fix setting of the nodump flag for users without PRIV_VFS_SYSFLAGS privilege.

Submitted by: jh@


202283 14-Jan-2010 lulf

Bring in the ext2fs work done by Aditya Sarawgi during and after Google Summer
of Code 2009:

- BSDL block and inode allocation policies for ext2fs. This involves the use
of FFS1-style block and inode allocation for ext2fs. Preallocation was removed
since it was GPL'd.
- Make ext2fs MPSAFE by introducing locks to per-mount datastructures.
- Fixes for kern/122047 PR.
- Various small bugfixes.
- Move out of gnu/ directory.

Sponsored by: Google Inc.
Submitted by: Aditya Sarawgi <sarawgi.aditya AT SPAMFREE gmail DOT com>


202187 13-Jan-2010 jh

- Fix some style bugs in tmpfs_mount(). [1]
- Remove a stale comment about tmpfs_mem_info() 'total' argument.

Reported by: bde [1]


201954 09-Jan-2010 brooks

Update the comment on printing group membership to reflect the fact
that each group the process is a member of is printed rather than an
entry for each group the user could be a member of.

MFC after: 3 days


201798 08-Jan-2010 trasz

Remove unused smbfs_smb_qpathinfo().


201773 08-Jan-2010 jh

- Change the type of size_max to u_quad_t because its value is converted
with vfs_scanopt(9) using the "%qu" format string.
- Limit the maximum value of size_max to (SIZE_MAX - PAGE_SIZE) to
prevent overflow in howmany() macro.

PR: kern/141194
Approved by: trasz (mentor)
MFC after: 2 weeks
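
To illustrate why r201773 above caps size_max at (SIZE_MAX - PAGE_SIZE):
a round-up division of the howmany() shape adds (y - 1) to its first
argument, so a value close to SIZE_MAX wraps around. The sketch below
assumes a 4096-byte page; demo_howmany, DEMO_PAGE_SIZE and
demo_size_to_pages are illustrative names, not the tmpfs code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE		((size_t)4096)	/* assumed page size */
/* Same shape as the classic howmany() round-up division macro. */
#define demo_howmany(x, y)	(((x) + ((y) - 1)) / (y))

/*
 * Clamp the requested size to SIZE_MAX - DEMO_PAGE_SIZE before rounding
 * up to pages, so the "+ (y - 1)" inside demo_howmany() cannot wrap.
 */
static size_t
demo_size_to_pages(uint64_t size_max)
{
	const uint64_t limit = (uint64_t)SIZE_MAX - DEMO_PAGE_SIZE;
	size_t clamped;

	clamped = (size_t)(size_max > limit ? limit : size_max);
	return (demo_howmany(clamped, DEMO_PAGE_SIZE));
}
```

Without the clamp, demo_howmany((size_t)SIZE_MAX, DEMO_PAGE_SIZE) wraps
and evaluates to 0 pages, which is the kind of overflow the commit guards
against.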


201442 03-Jan-2010 rmacklem

The test for "same client" for the experimental nfs server over NFSv4
was broken w.r.t. byte range lock conflicts when it was the same client
and the request used the open_to_lock_owner4 case, since lckstp->ls_clp
was not set. This patch fixes it by using "clp" instead of "lckstp->ls_clp".

MFC after: 2 weeks


201439 03-Jan-2010 rmacklem

Fix three related problems in the experimental nfs client when
checking for conflicts w.r.t. byte range locks for NFSv4.
1 - Return 0 instead of EACCES when a conflict is found, for F_GETLK.
2 - Check for "same file" when checking for a conflict.
3 - Don't check for a conflict for the F_UNLCK case.


201345 31-Dec-2009 rmacklem

Fix the experimental NFS client so that it can create Unix
domain sockets on an NFSv4 mount point. It was generating
incorrect XDR in the request for this case.

Tested by: infofarmer
MFC after: 2 weeks


201029 26-Dec-2009 rmacklem

When porting the experimental nfs subsystem to the FreeBSD8 krpc,
I added 3 functions that were already in the experimental client
under different names. This patch deletes the functions in the
experimental client and renames the calls to use the other set.
(This is just removal of duplicated code and does not fix any bug.)

MFC after: 2 weeks


200999 25-Dec-2009 rmacklem

Modify the experimental server so that it uses VOP_ACCESSX().
This is necessary in order to enable NFSv4 ACL support. The
argument to nfsvno_accchk() was changed to an accmode_t and
the function nfsrv_aclaccess() was no longer needed and,
therefore, deleted.

Reviewed by: trasz
MFC after: 2 weeks


200732 19-Dec-2009 ed

Let access overriding to TTYs depend on the cdev_priv, not the vnode.

Basically this commit changes two things, which improves access to TTYs
in exceptional conditions. Basically the problem was that when you ran
jexec(8) to attach to a jail, you couldn't use /dev/tty (well, also the
node of the actual TTY, e.g. /dev/pts/X). This is very inconvenient if
you want to attach to screens quickly, use ssh(1), etc.

The fixes:

- Cache the cdev_priv of the controlling TTY in struct session. Change
devfs_access() to compare against the cdev_priv instead of the vnode.
This allows you to bypass UNIX permissions, even across different
mounts of devfs.

- Extend devfs_prison_check() to unconditionally expose the device node
of the controlling TTY, even if normal prison nesting rules normally
don't allow this. This actually allows you to interact with this
device node.

To be honest, I'm not really happy with this solution. We now have to
store three pointers to a controlling TTY (s_ttyp, s_ttyvp, s_ttydp).
In an ideal world, we should just get rid of the latter two and only use
s_ttyp, but this makes certain pieces of code very impractical (e.g.
devfs, kern_exit.c).

Reported by: Many people


200287 08-Dec-2009 delphij

Allow using IPv6 in nfsrvd_sentcache() callback.

PR: kern/141289
Submitted by: Petr Lampa <lampa fit vutbr cz>
Approved by: rmacklem
MFC after: 1 week


200214 07-Dec-2009 guido

Fix ntfs such that it understands media with a non-512-byte sector size:
1. Fixups are always done on 512-byte chunks (instead of sectors). This
is kind of stupid.
2. Convert between NTFS block numbers (the block size equals the media
sector size) and the bread() and getblk() blocknr (which are 512-byte
sized).

NB: this change should not affect ntfs for 512-byte sector sizes.
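
The block number conversion in point 2 of r200214 above might look like
the sketch below. The helper name demo_ntfs_to_devblk and the
DEMO_DEV_BSIZE constant are hypothetical; the only facts taken from the
commit message are that NTFS block numbers count media sectors and that
the buffer cache block numbers count 512-byte units.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_DEV_BSIZE	512u	/* unit of the buffer cache block numbers */

/*
 * Convert a block number counted in media sectors into a block number
 * counted in 512-byte units.  For 512-byte sectors this is the identity,
 * which matches the note that such media should be unaffected.
 */
static uint64_t
demo_ntfs_to_devblk(uint64_t ntfs_blkno, uint32_t sector_size)
{
	assert(sector_size >= DEMO_DEV_BSIZE);
	assert(sector_size % DEMO_DEV_BSIZE == 0);
	return (ntfs_blkno * (sector_size / DEMO_DEV_BSIZE));
}
```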


200069 03-Dec-2009 trasz

Remove unneeded ifdefs.

Reviewed by: rmacklem


200041 02-Dec-2009 trasz

Don't use ap->a_td->td_ucred when we were passed ap->a_cred.


199715 23-Nov-2009 rmacklem

Modify the experimental nfs server so that it falls back to
using VOP_LOOKUP() when VFS_VGET() returns EOPNOTSUPP in the
ReaddirPlus RPC. This patch is based upon one by pjd@ for the
regular nfs server which has not yet been committed. It is needed
when a ZFS volume is exported and ReaddirPlus (which almost
always happens for NFSv4) is performed by a client. The patch
also simplifies vnode lock handling somewhat.

MFC after: 2 weeks


199616 20-Nov-2009 rmacklem

Patch the experimental NFS server in a manner analogous to
r197525, so that the creation verifier is handled correctly
in va_atime for 64bit architectures. There were two problems.
One was that the code incorrectly assumed that
sizeof (struct timespec) == 8 and the other was that the tv_sec
field needs to be assigned from a signed 32bit integer, so that
sign extension occurs on 64bit architectures. This is required
for correct operation when exporting ZFS volumes.

Reviewed by: pjd
MFC after: 2 weeks
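
A rough sketch of the two points made in r199616 above: the 8-byte
verifier is unpacked as two 32-bit words rather than by assuming that
sizeof(struct timespec) is 8, and the seconds word goes through a signed
32-bit integer so that it is sign-extended when time_t is 64 bits wide.
The function name demo_verf_to_atime and the byte-array representation of
the verifier are assumptions for illustration, not the server code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/*
 * Unpack an 8-byte verifier into a timespec.  Each half is first read
 * into an int32_t, so a negative seconds word stays negative after the
 * conversion to a wider time_t (sign extension) instead of turning into
 * a large positive value.
 */
static struct timespec
demo_verf_to_atime(const unsigned char verf[8])
{
	struct timespec ts;
	int32_t sec, nsec;

	memcpy(&sec, verf, sizeof(sec));
	memcpy(&nsec, verf + sizeof(sec), sizeof(nsec));
	ts.tv_sec = sec;
	ts.tv_nsec = nsec;
	return (ts);
}
```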


199189 11-Nov-2009 jh

Create verifier used by FreeBSD NFS client is suboptimal because the
first part of a verifier is set to the first IP address from
V_in_ifaddrhead list. This address is typically the loopback address
making the first part of the verifier practically non-unique. The second
part of the verifier is initialized to zero making its initial value
non-unique too.

This commit changes the strategy for create verifier initialization:
just initialize it to a random value. Also move verifier handling into
its own function and use a mutex to protect the variable.

This change is a candidate for porting to sys/nfsclient.

Reviewed by: jhb, rmacklem
Approved by: trasz (mentor)


199007 06-Nov-2009 attilio

- Improve comments about locking of the "struct fifoinfo" which is a bit
unclear.
- Fix a memory leak [0]

[0] Diagnosed by: Dorr H. Clark <dclark at engr dot scu dot edu>
MFC: 1 week


198494 26-Oct-2009 alc

There is no need to "busy" a page when the object is locked for the duration
of the operation.


198448 24-Oct-2009 ru

Spell DIAGNOSTIC correctly.


198291 20-Oct-2009 jh

Unloading of the nfscl module is unsupported because newnfslock doesn't
support unloading. It's not trivial to implement newnfslock unloading so
for now just admit that unloading is unsupported and refuse to attempt
unload in all nfscl module event handlers.

Reviewed by: rmacklem
Approved by: trasz (mentor)


198290 20-Oct-2009 jh

Fix ordering of nfscl_modevent() and ncl_uninit(). nfscl_modevent() must
be called after ncl_uninit() when unloading the nfscl module because
ncl_uninit() uses ncl_iod_mutex which is destroyed in nfscl_modevent().

Reviewed by: rmacklem
Approved by: trasz (mentor)


198289 20-Oct-2009 jh

Fix comment typos.

Reviewed by: rmacklem
Approved by: trasz (mentor)


197953 11-Oct-2009 delphij

Add locking around access to parent node, and bail out when the parent
node is already freed rather than panicking the system.

PR: kern/122038
Submitted by: gk
Tested by: pho
MFC after: 1 week


197850 07-Oct-2009 delphij

Add a special workaround to handle UIO_NOCOPY case. This fixes data
corruption observed when sendfile() is being used.

PR: kern/127213
Submitted by: gk
MFC after: 2 weeks


197740 04-Oct-2009 delphij

Fix a bug that causes the fsx test case of mmap'ed page being out of sync
of read/write, inspired by ZFS's counterpart.

PR: kern/139312
Submitted by: gk@
MFC after: 1 week


197680 01-Oct-2009 trasz

Provide default implementation for VOP_ACCESS(9), so that filesystems which
want to provide VOP_ACCESSX(9) don't have to implement both. Note that
this commit makes implementation of either of these two mandatory.

Reviewed by: kib


197650 30-Sep-2009 trasz

Fix typo in the comment.


197428 23-Sep-2009 kib

Add per-process osrel node to the procfs, to allow read and set p_osrel
value for the process.

Approved by: des (procfs maintainer)
MFC after: 3 weeks


197134 12-Sep-2009 rwatson

Use C99 initialization for struct filterops.

Obtained from: Mac OS X
Sponsored by: Apple Inc.
MFC after: 3 weeks


197048 09-Sep-2009 rmacklem

Add LK_NOWITNESS to the vn_lock() calls done on newly created nfs
vnodes, since these nodes are not linked into the mount queue and,
as such, the vn_lock() cannot cause a deadlock so LORs are harmless.

Suggested by: kib
Approved by: kib (mentor)
MFC after: 3 days


196970 08-Sep-2009 phk

Revert previous commit and add myself to the list of people who should
know better than to commit with a cat in the area.


196969 08-Sep-2009 phk

Add necessary include.


196921 07-Sep-2009 kib

If a race is detected, pfs_vncache_alloc() may reclaim a vnode that had
never been inserted into the pfs_vncache list. Since pfs_vncache_free()
does not anticipate this case, it decrements pfs_vncache_entries
unconditionally; if the vnode was not in the list, pfs_vncache_entries
will no longer reflect the actual number of list entries. This may cause
size of the cache to exceed the configured maximum. It may also trigger
a panic during module unload or system shutdown.

Do not decrement pfs_vncache_entries for the vnode that was not in the
list.

Submitted by: tegge
Reviewed by: des
MFC after: 1 week


196920 07-Sep-2009 kib

insmntque_stddtr() clears vp->v_data and resets vp->v_op to
dead_vnodeops before calling vgone(). Revert r189706 and corresponding
part of the r186560.

Noted and reviewed by: tegge
Approved by: des (pseudofs part)
MFC after: 3 days


196689 31-Aug-2009 kib

Remove spurious pfs_unlock().

PR: kern/137310
Reviewed by: des
MFC after: 3 days


196556 25-Aug-2009 jilles

Fix poll() on half-closed sockets, while retaining POLLHUP for fifos.

This reverts part of r196460, so that sockets only return POLLHUP if both
directions are closed/error. Fifos get POLLHUP by closing the unused
direction immediately after creating the sockets.

The tools/regression/poll/*poll.c tests now pass except for two other things:
- if POLLHUP is returned, POLLIN is always returned as well instead of only
when there is data left in the buffer to be read
- fifo old/new reader distinction does not work the way POSIX specs it

Reviewed by: kib, bde


196503 24-Aug-2009 zec

Fix NFS panics with options VIMAGE kernels by appropriately setting curvnet
context inside the RPC code.

Temporarily set td's cred to mount's cred before calling socreate() via
__rpc_nconf2socket().

Submitted by: rmacklem (in part)
Reviewed by: rmacklem, rwatson
Discussed with: dfr, bz
Approved by: re (rwatson), julian (mentor)
MFC after: 3 days


196332 17-Aug-2009 rmacklem

Apply the same patch as r196205 for nfs_upgrade_lock() and
nfs_downgrade_lock() to the experimental nfs client.

Approved by: re (kensmith), kib (mentor)


196019 01-Aug-2009 rwatson

Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by: bz
Approved by: re (vimage blanket)


195995 31-Jul-2009 jhb

Fix some LORs between vnode locks and filedescriptor table locks.
- Don't grab the filedesc lock just to read fd_cmask.
- Drop vnode locks earlier when mounting the root filesystem and before
sanitizing stdin/out/err file descriptors during execve().

Submitted by: kib
Approved by: re (rwatson)
MFC after: 1 week


195943 29-Jul-2009 rmacklem

Fix the experimental nfs client so that it only calls ncl_vinvalbuf()
for NFSv2 and not NFSv4 when nfscl_mustflush() returns 0. Since
nfscl_mustflush() only returns 0 when there is a valid write delegation
issued to the client, it only affects the case of an NFSv4 mount with
callbacks/delegations enabled.

Approved by: re (kensmith), kib (mentor)


195840 24-Jul-2009 jhb

Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses. The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by: alc
Approved by: re (kensmith)
MFC after: 2 weeks


195825 22-Jul-2009 rmacklem

When vfs.newnfs.callback_addr is set to an IPv4 address, the
experimental NFSv4 client might try and use it as an IPv6 address,
breaking callbacks. The fix simply initializes the isinet6 variable
for this case.

Approved by: re (kensmith), kib (mentor)


195821 22-Jul-2009 rmacklem

Add changes to the experimental nfs client to use the PBDRY flag for
msleep(9) when a vnode lock or similar may be held. The changes are
just a clone of the changes applied to the regular nfs client by
r195703.

Approved by: re (kensmith), kib (mentor)


195819 22-Jul-2009 rmacklem

When using an NFSv4 mount in the experimental nfs client with delegations
being issued from the server, there was a case where an Open issued locally
based on the delegation would be released before the associated vnode
became inactive. If the delegation was recalled after the open was released,
an Open against the server would not have been acquired and subsequent I/O
operations would need to use the special stateid of all zeros. This patch
fixes that case.

Approved by: re (kensmith), kib (mentor)


195762 19-Jul-2009 rmacklem

Fix two bugs in the experimental nfs client:
- When the root vnode was acquired during mounting, mnt_stat.f_iosize was
still set to 0, so getnewvnode() would set bo_bsize == 0. This would
confuse getblk(), so that it always returned the first block causing
the problem when the root directory of the mount point was greater
than one block in size. It was fixed by setting mnt_stat.f_iosize to
NFS_DIRBLKSIZ before calling ncl_nget() to acquire the root vnode.
- NFSMNT_INT was being set temporarily while the initial connect to a
server was being done. This erroneously configured the krpc for
interruptible RPCs, which caused problems because signals weren't
being masked off as they would have been for interruptible mounts.
This code was deleted to fix the problem. Since mount_nfs does an
NFS null RPC before the mount system call, connections to the server
should work ok.

Tested by: swell dot k at gmail dot com
Approved by: re (kensmith), kib (mentor)


195704 14-Jul-2009 rmacklem

Fix the experimental nfs client so that it does not cause a
"share->excl" panic when doing a lookup of dotdot at the root
of a server's file system. The patch avoids calling vn_lock()
for that case, since nfscl_nget() has already acquired a lock
for the vnode.

Approved by: re (kensmith), kib (mentor)


195699 14-Jul-2009 rwatson

Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing them in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)


195642 12-Jul-2009 rmacklem

Add calls to the experimental nfs client for the case of an "intr" mount,
so that signals that aren't supposed to terminate RPCs in progress are
masked off during the RPC.

Approved by: re (kensmith), kib (mentor)


195641 12-Jul-2009 rmacklem

Fix the handling of dotdot in lookup for the experimental nfs client
in a manner analogous to the change in r195294 for the regular nfs client.

Approved by: re (kensmith), kib (mentor)


195510 09-Jul-2009 rmacklem

Since the nfscl_getclose() function both decremented open counts and,
optionally, created a separate list of NFSv4 opens to be closed, it
was possible for the associated OpenOwner to be free'd before the Open
was closed. The problem was that the Open was taken off the OpenOwner
list before the Close RPC was done and OpenOwners can be free'd once the
list is empty. This patch separates out the case of doing the Close RPC
into a separate function called nfscl_doclose() and simplifies nfsrpc_doclose()
so that it closes a single open instead of a list of them. This avoids
removing the Open from the OpenOwner list before doing the Close RPC.

Approved by: re (kensmith), kib (mentor)


195423 07-Jul-2009 kib

Fix poll(2) and select(2) for named pipes to return "ready for read"
when all writers, observed by reader, exited. Use writer generation
counter for fifo, and store the snapshot of the fifo generation in the
f_seqcount field of struct file, that is otherwise unused for fifos.
Set FreeBSD-undocumented POLLINIGNEOF flag only when file f_seqcount is
equal to fifo's fi_wgen, and revert r89376.

Fix POLLINIGNEOF for sockets and pipes, and return POLLHUP for them.
Note that the patch does not fix not returning POLLHUP for fifos.

PR: kern/94772
Submitted by: bde (original version)
Reviewed by: rwatson, jilles
Approved by: re (kensmith)
MFC after: 6 weeks (might be)


195294 02-Jul-2009 kib

In vn_vget_ino() and its inline equivalents, mnt_ref() the mount point
around the sequence that drops the vnode lock and then busies the mount point.
Not having vlocked node or direct reference to the mp allows for the
forced unmount to proceed, making mp unmounted or reused.

Tested by: pho
Reviewed by: jeff
Approved by: re (kensmith)
MFC after: 2 weeks


194990 25-Jun-2009 kib

Change the type of uio_resid member of struct uio from int to ssize_t.
Note that this does not actually enable full-range i/o requests for
64-bit architectures, and is done now to update KBI only.

Tested by: pho
Reviewed by: jhb, bde (as part of the review of the bigger patch)


194951 25-Jun-2009 rwatson

Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the
in_ifaddrhead and INADDR_HASH address lists.

Previously, these lists were used unsynchronized as they were effectively
never changed in steady state, but we've seen increasing reports of
writer-writer races on very busy VPN servers as core count has gone up
(and similar configurations where address lists change frequently and
concurrently).

For the time being, use rwlocks rather than rmlocks in order to take
advantage of their better lock debugging support. As a result, we don't
enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion
is complete and a performance analysis has been done. This means that one
class of reader-writer races still exists.

MFC after: 6 weeks
Reviewed by: bz


194766 23-Jun-2009 kib

Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.

The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.

The per-entry granularity of accounting makes the charge process fair
for processes that change uid during lifetime, and decrements charge
for proper uid when region is unmapped.

The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
that is used to charge appropriate uid when allocation is performed by
kernel, e.g. md(4).

Several syscalls, among them is fork(2), may now return ENOMEM when
global or per-uid limits are enforced.

In collaboration with: pho
Reviewed by: alc
Approved by: re (kensmith)


194601 21-Jun-2009 kib

Add explicit struct ucred * argument for VOP_VPTOCNP, to be used by
vn_open_cred in default implementation. Valid struct ucred is needed for
audit and MAC, and curthread credentials may be wrong.

This further requires modifying the interface of vn_fullpath(9), but it
is out of scope of this change.

Reviewed by: rwatson


194576 21-Jun-2009 rdivacky

In non-debugging mode make this define (void)0 instead of nothing. This
helps to catch bugs like the below with clang.

if (cond); <--- note the trailing ;
something();

Approved by: ed (mentor)
Discussed on: current@
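
A small sketch of the technique in r194576 above. With an empty
definition, every use of the macro after an if or else leaves an empty
statement behind, which looks just like the stray-semicolon bug shown in
the message; defining it as (void)0 keeps each use a real expression
statement, presumably so that the compiler's empty-body warning only
fires on genuine mistakes. DEMO_DEBUG, DEMO_TRACE and
demo_count_positive are hypothetical names.

```c
#include <assert.h>

#ifdef DEMO_DEBUG
#include <stdio.h>
#define DEMO_TRACE(msg)	printf("%s\n", (msg))
#else
/* Non-debugging build: expand to a no-op expression, not to nothing. */
#define DEMO_TRACE(msg)	((void)0)
#endif

/* Count the positive entries of v[0..n-1]. */
static int
demo_count_positive(const int *v, int n)
{
	int i, count = 0;

	assert(n >= 0);
	for (i = 0; i < n; i++) {
		if (v[i] > 0)
			count++;
		else
			DEMO_TRACE("skipped non-positive value");
	}
	return (count);
}
```

In the non-debugging build the else branch becomes "(void)0;", which is
still a complete statement and needs no special casing by callers.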


194541 20-Jun-2009 rmacklem

Replace RPCAUTH_UNIXGIDS with NFS_MAXGRPS so that nfscbd.c will build.

Approved by: kib (mentor)


194532 20-Jun-2009 ed

Improve nested jail awareness of devfs by handling credentials.

Now that we start to use credentials on character devices more often
(because of MPSAFE TTY), move the prison-checks that are in place in the
TTY code into devfs.

Instead of strictly comparing the prisons, use the more common
prison_check() function to compare credentials. This means that
pseudo-terminals are only visible in devfs by processes within the same
jail and parent jails.

Even though regular users in parent jails can now interact with
pseudo-terminals from child jails, this seems to be the right approach.
These processes are also capable of interacting with the jailed
processes anyway, through signals for example.

Reviewed by: kib, rwatson (older version)


194523 20-Jun-2009 rmacklem

Change the size of the nfsc_groups[] array in the experimental nfs
client to RPCAUTH_UNIXGIDS + 1 (17), since that is what can go on
the wire for AUTH_SYS authentication.

Reviewed by: brooks
Approved by: kib (mentor)


194498 19-Jun-2009 brooks

Rework the credential code to support larger values of NGROUPS and
NGROUPS_MAX, eliminate ABI dependencies on them, and raise them to 1024
and 1023 respectively. (Previously they were equal, but under a close
reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it
is the number of supplemental groups, not total number of groups.)

The bulk of the change consists of converting the struct ucred member
cr_groups from a static array to a pointer. Do the equivalent in
kinfo_proc.

Introduce new interfaces crcopysafe() and crsetgroups() for duplicating
a process credential before modifying it and for setting group lists
respectively. Both interfaces take care of the details of allocating
the groups array. crsetgroups() takes care of truncating the group list
to the current maximum (NGROUPS) if necessary. In the future,
crsetgroups() may be responsible for ensuring invariants such as sorting
the supplemental groups to allow groupmember() to be implemented as a
binary search.

Because we can not change struct xucred without breaking application
ABIs, we leave it alone and introduce a new XU_NGROUPS value which is
always 16 and is to be used or NGRPS as appropriate for things such as
NFS which need to use no more than 16 groups. When feasible, truncate
the group list rather than generating an error.

Minor changes:
- Reduce the number of hand rolled versions of groupmember().
- Do not assign to both cr_gid and cr_groups[0].
- Modify ipfw to cache ucreds instead of part of their contents since
they are immutable once referenced by more than one entity.

Submitted by: Isilon Systems (initial implementation)
X-MFC after: never
PR: bin/113398 kern/133867


194425 18-Jun-2009 alc

Fix some of the style errors in *getpages().


194408 17-Jun-2009 rmacklem

Add the SVC_RELEASE(xprt), as required by r194407.

Approved by: kib (mentor)


194368 17-Jun-2009 bz

Add explicit includes for jail.h to the files that need them and
remove the "hidden" one from vimage.h.


194363 17-Jun-2009 rmacklem

Fix handling of ".." in nfs_lookup() for the forced dismount case
by cribbing the change made to the regular nfs client in r194358.

Approved by: kib (mentor)


194357 17-Jun-2009 bz

Add the explicit include of vimage.h to another five .c files still
missing it.

Remove the "hidden" kernel only include of vimage.h from ip_var.h added
with the very first Vimage commit r181803 to avoid further kernel poisoning.


194292 16-Jun-2009 rmacklem

Remove the "int *" typecast for the aresid argument to vn_rdwr()
and change the type of the argument from size_t to int. This
should avoid issues on 64bit architectures.

Suggested by: kib
Approved by: kib (mentor)


194124 13-Jun-2009 alc

Eliminate unnecessary variables.


194118 13-Jun-2009 jamie

Rename the host-related prison fields to be the same as the host.*
parameters they represent, and the variables they replaced, instead of
abbreviated versions of them.

Approved by: bz (mentor)


194117 13-Jun-2009 jamie

Use getcredhostuuid instead of accessing the prison directly.

Approved by: bz (mentor)


194078 12-Jun-2009 jhb

Update the inline version of vn_vget_ino() for ".." lookups to match the
recentish changes to vn_vget_ino().

MFC after: 1 week


193955 10-Jun-2009 rmacklem

This commit is analogous to r193952, but for the experimental nfs
subsystem. Add a test for VI_DOOMED just after ncl_upgrade_vnlock() in
ncl_bioread_check_cons(). This is required since it is possible
for the vnode to be vgonel()'d while in ncl_upgrade_vnlock() when
a forced dismount is in progress. Also, move the check for VI_DOOMED
in ncl_vinvalbuf() down to after ncl_upgrade_vnlock() and replace the
out of date comment for it.

Approved by: kib (mentor)


193930 10-Jun-2009 kib

For cd9660_ioctl, check for recycled vnode after locking it.

Noted by: Jaakko Heinonen <jh saunalahti fi>
MFC after: 2 weeks


193924 10-Jun-2009 kib

Fix r193923 by noting that type of a_fp is struct file *, not int.
It was assumed that r193923 was trivial change that cannot be done
wrong.

MFC after: 2 weeks


193923 10-Jun-2009 kib

s/a_fdidx/a_fp/ for VOP_OPEN comments that inline struct vop_open_args
definition.

Discussed with: bde
MFC after: 2 weeks


193922 10-Jun-2009 kib

Remove unused VOP_IOCTL and VOP_KQFILTER implementations for fifofs.

MFC after: 2 weeks


193919 10-Jun-2009 kib

VOP_IOCTL takes unlocked vnode as an argument. Due to this, v_data may
be NULL or dereferenced memory may become free at arbitrary moment.

Lock the vnode in cd9660, devfs and pseudofs implementation of VOP_IOCTL
to prevent reclaim; check whether the vnode was already reclaimed after
the lock is granted.

Reported by: georg at dts su
Reviewed by: des (pseudofs)
MFC after: 2 weeks


193837 09-Jun-2009 rmacklem

Since vn_lock() with the LK_RETRY flag never returns an error
for FreeBSD-CURRENT, the code that checked for and returned the
error was broken. Change it to check for VI_DOOMED set after
vn_lock() and return an error for that case. I believe this
should only happen for forced dismounts.

Approved by: kib (mentor)


193735 08-Jun-2009 rmacklem

Fix nfscl_getcl() so that it doesn't crash when it is called to
do an NFSv4 Close operation with the cred argument NULL. Also,
clarify what NULL arguments mean in the function's comment.

Approved by: kib (mentor)


193571 06-Jun-2009 rwatson

Use #ifdef APPLE_MAC instead of #ifdef MAC to conditionalize Apple-specific
behavior for unicode support in UDF so as not to conflict with the MAC
Framework.

Note that Apple's XNU kernel also uses #ifdef MAC for the MAC Framework.

Suggested by: pjd
MFC after: 3 days


193556 06-Jun-2009 des

Drop Giant.

MFC after: 1 week


193511 05-Jun-2009 rwatson

Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with: pjd


193507 05-Jun-2009 rwatson

Don't check MAC in the NFS server ACL set path, right now we aren't
enforcing MAC for NFS clients.


193433 04-Jun-2009 rwatson

Re-add opt_mac.h include, which is required in order for MNT_MULTILABEL
to be set properly on devfs. Otherwise, it isn't possible to set labels
on /dev nodes.

Reported by: Sergio Rodriguez <sergiorr at yahoo.com>
MFC after: 3 days


193187 31-May-2009 alc

nfs_write() can use the recently introduced vfs_bio_set_valid() instead of
vfs_bio_set_validclean(), thereby avoiding the page queues lock.

Garbage collect vfs_bio_set_validclean(). Nothing uses it any longer.


193176 31-May-2009 kib

Unlock the pseudofs vnode before calling fill method for pfs_readlink().
The fill code may need to lock another vnode, e.g. procfs file
implementation.

Reviewed by: des
Tested by: pho
MFC after: 2 weeks


193175 31-May-2009 kib

Implement the bypass routine for VOP_VPTOCNP in nullfs.
Among other things, this makes procfs <pid>/file work for executables
started from nullfs mount.

Tested by: pho
PR: 94269, 104938


193173 31-May-2009 kib

Do not drop vnode interlock in null_checkvp(). null_lock() verifies that
v_data is not-null before calling NULLVPTOLOWERVP(), and dropping the
interlock allows for reclaim to clean v_data and free the memory.

While there, remove unneeded semicolons and convert the infinite loops
to panics. I would like to remove null_checkvp() altogether, or leave
it as a trivial stub, but not now.

Reported and tested by: pho


193172 31-May-2009 kib

Lock the real null vnode lock before substitution of vp->v_vnlock.
This should not really matter for correctness, since vp->v_lock is
not locked before the call, and null_lock() holds the interlock,
but makes the control flow for reclaim more clear.

Tested by: pho


193162 31-May-2009 zec

Unbreak options VIMAGE kernel builds.

Approved by: julian (mentor)


193125 30-May-2009 rmacklem

Add a check for v_type == VREG to the recently modified code that
does NFSv4 Closes in the experimental client's VOP_INACTIVE().
I also replaced a bunch of ap->a_vp with a local copy of vp,
because I thought that made it more readable.

Approved by: kib (mentor)


193092 30-May-2009 trasz

Add VOP_ACCESSX, which can be used to query for newly added V*
permissions, such as VWRITE_ACL. For filesystems that don't
implement it, there is a default implementation, which works
as a wrapper around VOP_ACCESS.

Reviewed by: rwatson@


193066 29-May-2009 jamie

Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex. Jails may
have their own host information, or they may inherit it from the
parent/system. The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL. The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by: bz (mentor)


192986 28-May-2009 alc

Make *getpages()s' assertion on the state of each page's dirty bits
stricter.


192973 28-May-2009 des

Use a temporary variable to avoid a duplicate strlen().

Submitted by: kib
MFC after: 1 week


192928 27-May-2009 rmacklem

Fix handling of NFSv4 Close operations in ncl_inactive(). Only
do them for NFSv4 and flush writes to the server before doing
the Close(s), as required. Also, use the a_td argument instead of
curthread.

Approved by: kib (mentor)


192917 27-May-2009 alc

Eliminate redundant setting of a page's valid bits and pointless clearing
of the same page's dirty bits.


192898 27-May-2009 rmacklem

Add a function to the experimental nfs subsystem that tests to see
if a local file system supports NFSv4 ACLs. This allows the
NFSHASNFS4ACL() macro to be correctly implemented. The NFSv4 ACL
support should now work when the server exports a ZFS volume.

Approved by: kib (mentor)


192895 27-May-2009 jamie

Add hierarchical jails. A jail may further virtualize its environment
by creating a child jail, which is visible to that jail and to any
parent jails. Child jails may be restricted more than their parents,
but never less. Jail names reflect this hierarchy, being MIB-style
dot-separated strings.

Every thread now points to a jail, the default being prison0, which
contains information about the physical system. Prison0's root
directory is the same as rootvnode; its hostname is the same as the
global hostname, and its securelevel replaces the global securelevel.
Note that the variable "securelevel" has actually gone away, which
should not cause any problems for code that properly uses
securelevel_gt() and securelevel_ge().

Some jail-related permissions that were kept in global variables and
set via sysctls are now per-jail settings. The sysctls still exist for
backward compatibility, used only by the now-deprecated jail(2) system
call.

Approved by: bz (mentor)


192861 26-May-2009 rmacklem

Fix the experimental nfs subsystem so that it builds with the
current NFSv4 ACLs, as defined in sys/acl.h. It still needs a
way to test a mount point for NFSv4 ACL support before it will
work. Until then, the NFSHASNFS4ACL() macro just always returns 0.

Approved by: kib (mentor)


192818 26-May-2009 trasz

Adapt to the new ACL #define names.

Reviewed by: rmacklem@


192782 26-May-2009 rmacklem

Add two sysctl variables to the experimental nfs server, so
that the range of versions of NFS handled by the server can
be limited. The nfsd daemon must be restarted after these
sysctl variables are changed, in order for the change to take
effect.

Approved by: kib (mentor)


192781 26-May-2009 rmacklem

Fix the handling of NFSv4 Illegal Operation number to conform
to RFC3530 (the operation number in the reply must be set to
the value for OP_ILLEGAL). Also cleaned up some indentation.

Approved by: kib (mentor)


192780 26-May-2009 rmacklem

Fix the experimental nfs server's interface to the new krpc so
that it handles the case of a non-exported NFSv4 root correctly.
Also, delete handling for the case where nd_repstat is already
set in nfs_proc(), since that no longer happens.

Approved by: kib (mentor)


192707 25-May-2009 rmacklem

Add NFSv4 root export checks to the DelegPurge, Renew and
ReleaseLockOwner operations analogous to what is already
in place for SetClientID and SetClientIDConfirm. These are
the five NFSv4 operations that do not use file handle(s),
so the checks are done using the NFSv4 root export entries
in /etc/exports.

Approved by: kib (mentor)


192705 25-May-2009 rmacklem

Temporarily #undef NFS4_ACL_EXTATTR_NAME, so that the
experimental nfs subsystem will build while the NFSv4 ACL
support is going into the kernel.

Approved by: kib (mentor)


192695 24-May-2009 rmacklem

Crib the realign function out of nfs_krpc.c and add a call
to it for the client side reply. Hopefully this fixes the
problem with using the new krpc for arm for the experimental
nfs client.

Approved by: kib (mentor)


192693 24-May-2009 rmacklem

Fix the experimental NFSv4 server so that it handles the case
where a client is not allowed NFSv4 access correctly. This
restriction is specified in the "V4: ..." line(s) in
/etc/exports.

Approved by: kib (mentor)


192675 24-May-2009 rmacklem

Fix the experimental nfsv4 client so that it works for the
case of a kerberized mount without a host based principal
name. This will only work for mounts being done by a user
other than root. Support for a host based principal name
will not work until proposed changes to the rpcsec_gss part
of the krpc are committed. It now builds for "options KGSSAPI".

Approved by: kib (mentor)


192657 23-May-2009 alc

Eliminate the unnecessary clearing of a page's dirty bits from
nwfs_getpages().


192616 23-May-2009 rmacklem

Fix the rpc_gss_secfind() call in nfs_commonkrpc.c so that
the code will build when "options KGSSAPI" is specified
without requiring the proposed changes that add host based
initiator principal support. It will not handle the case where
the client uses a host based initiator principal until those
changes are committed. The code that uses those changes is
#ifdef'd notyet until the krpc rpcsec_changes are committed.

Approved by: kib (mentor)


192613 22-May-2009 rmacklem

Change the sysctl_base argument to svcpool_create() to NULL for
client side callbacks so that leaf names are not re-used,
since they are already being used by the server.

Approved by: kib (mentor)


192601 22-May-2009 rmacklem

Fix the name of the module common to the client and server
in the experimental nfs subsystem to the correct one for
the MODULE_DEPEND() macro.

Approved by: kib (mentor)


192596 22-May-2009 rmacklem

Change the printf of r192595 to identify the function,
as requested by Sam.

Approved by: kib (mentor)


192591 22-May-2009 rmacklem

Modified the printf message of r192590 to remove the
possible DOS attack, as suggested by Sam.

Approved by: kib (mentor)


192589 22-May-2009 rmacklem

Change the comment at the beginning of the function to reflect the
change from panic() to printf() done by r192588.


192588 22-May-2009 rmacklem

Change the reboot panic that would have occurred if clientid
numbers wrapped around to a printf() warning of a possible
DOS attack, in the experimental nfsv4 server.

Approved by: kib (mentor)


192585 22-May-2009 rmacklem

Modify the mount handling code in the experimental nfs client to
use the newer nmount() style arguments, as is used by mount_nfs.c.
This prepares the kernel code for the use of a mount_nfs.c with
changes for the experimental client integrated into it.

Approved by: kib (mentor)


192582 22-May-2009 rmacklem

Change the code in the experimental nfs client to avoid flushing
writes upon close when a write delegation is held by the client.
This should be safe to do, now that nfsv4 Close operations are
delayed until ncl_inactive() is called for the vnode.

Approved by: kib (mentor)


192581 22-May-2009 rmacklem

Fix the comment in sys/fs/nfs/nfs.h to correctly reflect the
current use of the R_xxx flags. This changed when the
NFS_LEGACYRPC code was removed from the subsystem.

Approved by: kib (mentor)


192578 22-May-2009 rwatson

Remove the unmaintained University of Michigan NFSv4 client from 8.x
prior to 8.0-RELEASE. Rick Macklem's new and more feature-rich NFSv234
client and server are replacing it.

Discussed with: rmacklem


192574 22-May-2009 rmacklem

Fix the experimental nfs server so that it depends on the nlm,
since it now calls nlm_acquire_next_sysid().

Approved by: kib (mentor)


192539 21-May-2009 rmacklem

Fix the comment at line 3711 to be consistent with the change
applied for r192537.

Approved by: kib (mentor)


192503 21-May-2009 rmacklem

Modify sys/fs/nfsserver/nfs_nfsdport.c to use nlm_acquire_next_sysid()
to set the l_sysid for locks correctly.

Approved by: kib (mentor)


192463 20-May-2009 rmacklem

Although it should never happen, all the nfsv4 server can do
when it runs out of clientids is reboot. I had replaced cpu_reboot()
with printf(), since cpu_reboot() doesn't exist for sparc64.
This change replaces the printf() with panic(), so the reboot
would occur for this highly unlikely occurrence.

Approved by: kib (mentor)


192337 18-May-2009 rmacklem

Change the experimental NFSv4 client so that it does not do
the NFSv4 Close operations until ncl_inactive(). This is
necessary so that the Open StateIDs are available for doing
I/O on mmap'd files after VOP_CLOSE(). I also changed some
indentation for the nfscl_getclose() function.

Approved by: kib (mentor)


192256 17-May-2009 rmacklem

Fix the acquisition of local locks via VOP_ADVLOCK() by the
experimental nfsv4 server. It was setting the a_id argument
to a fixed value, but that wasn't sufficient for FreeBSD8.
Instead, set l_pid and l_sysid to 0 plus set the F_REMOTE
flag to indicate that these fields are used to check for
same lock owner. Since, for NFSv4, a lockowner is a ClientID plus
a name of up to 1024 bytes, it can't be put in l_sysid easily.
I also renamed the p variable to td, since it's a thread ptr.

Approved by: kib (mentor)


192255 17-May-2009 rmacklem

Added a SYSCTL to sys/fs/nfsserver/nfs_nfsdport.c so that the value of
nfsrv_dolocallocks can be changed via sysctl. I also added some non-empty
descriptor strings and reformatted some overly long lines.

Approved by: kib (mentor)


192245 17-May-2009 alc

Merge r191964: Eliminate a case of unnecessary page queues locking.


192231 16-May-2009 rmacklem

Changed sys/fs/nfs_clbio.c in the same way Alan Cox changed
sys/nfsclient/nfs_bio.c for r192134, so that the sources stay
in sync.

Approved by: kib (mentor)


192181 16-May-2009 rmacklem

Fixed the Null callback RPCs so that they work with the new krpc. This
required two changes: setting the program and version numbers before
connect and fixing the handling of the Null Rpc case in newnfs_request().

Approved by: kib (mentor)


192152 15-May-2009 rmacklem

Move the nfsstat structure and proc/op number definitions on the
experimental nfs subsystem from sys/fs/nfs/nfs.h and sys/fs/nfs/nfsproto.h
to sys/fs/nfs/nfsport.h and rename nfsstat to ext_nfsstat. This was done
so that src/usr.bin/nfsstat.c could use it alongside the regular nfs
include files and struct nfsstat.

Approved by: kib (mentor)


192151 15-May-2009 kib

Devfs replaces file ops vector with devfs-specific one in devfs_open(),
before the struct file is fully initialized in vn_open(), in particular,
fp->f_vnode is NULL. Other thread calling file operation before f_vnode
is set results in NULL pointer dereference in devvn_refthread().

Initialize f_vnode before calling d_fdopen() cdevsw method, that might
set file ops too.

Reported and tested by: Chris Timmons <cwt networks cwu edu>
(RELENG_7 version)
MFC after: 3 days


192145 15-May-2009 rmacklem

Modify the diskless booting code in sys/fs/nfsclient to be compatible
with what is in sys/nfsclient, so that it will at least build now.

Approved by: kib (mentor)


192134 15-May-2009 alc

Eliminate unnecessary clearing of the page's dirty mask from various
getpages functions.

Eliminate a stale comment.


192121 14-May-2009 rmacklem

Apply changes to the experimental nfs server so that it uses the security
flavors as exported in FreeBSD-CURRENT. This allows it to use a
slightly modified mountd.c instead of a different utility.

Approved by: kib (mentor)


192115 14-May-2009 rmacklem

Change the file names in the comments in sys/fs/nfs/nfs_var.h so
that they are the names used in FreeBSD-CURRENT. Also shuffled a
few entries around, so that they are under the correct comment.

Approved by: kib (mentor)


192065 13-May-2009 rmacklem

Apply a one line change to nfs_clbio.c (which is largely a copy
of sys/nfsclient/nfs_bio.c) to track the change recently committed
by alc for nfs_bio.c.

Approved by: kib (mentor)


192017 12-May-2009 rmacklem

Modify the experimental nfs server to use the new nfsd_nfsd_args
structure for nfsd. Includes a change that clarifies the use of
an empty principal name string to indicate AUTH_SYS only.

Approved by: kib (mentor)


192013 12-May-2009 kib

Report all fdescfs vnodes as VCHR for stat(2). Fake the unique
major/minor numbers of the devices.

Pretending that the vnodes are character devices prevents file tree
walkers from descending into the directories opened by current process.
Also, not doing stat on the filedescriptors prevents the recursive entry
into the VFS.

Requested by: kientzle
Discussed with: Jilles Tjoelker <jilles stack nl>


192012 12-May-2009 kib

Return controlled EINVAL when the fdescfs lookup routine is given string
representing too large integer, instead of overflowing and possibly
returning a random but valid vnode.

Noted by: Jilles Tjoelker <jilles stack nl>
MFC after: 3 days
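
The overflow check described above can be sketched in user-space C.
This is a minimal illustration, not the fdescfs lookup code; the helper
name parse_fd_name() and its signature are assumptions made for the
example. It rejects a decimal string before the accumulated value could
exceed INT_MAX, rather than letting it wrap to some other valid number.

```c
#include <errno.h>
#include <limits.h>

/*
 * Hypothetical helper (not the fdescfs code): parse a decimal name into
 * a non-negative int.  Returns 0 on success, or EINVAL for an empty
 * string, a non-digit character, or a value that does not fit in int.
 */
static int
parse_fd_name(const char *name, int *fdp)
{
	int fd = 0;

	if (*name == '\0')
		return (EINVAL);
	for (; *name != '\0'; name++) {
		int digit;

		if (*name < '0' || *name > '9')
			return (EINVAL);
		digit = *name - '0';
		/* Reject before fd * 10 + digit could exceed INT_MAX. */
		if (fd > (INT_MAX - digit) / 10)
			return (EINVAL);
		fd = fd * 10 + digit;
	}
	*fdp = fd;
	return (0);
}
```

The bound is tested before the multiplication, so the signed
arithmetic itself never overflows.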


192010 12-May-2009 alc

Eliminate gratuitous clearing of the page's dirty mask.


192000 11-May-2009 rmacklem

Change the name of the nfs server addsock structure from nfsd_args
to nfsd_addsock_args, so that it is consistent with the one in
sys/nfsserver/nfs.h.

Approved by: kib (mentor)


191998 11-May-2009 rmacklem

Modify nfsvno_fhtovp() to ensure that it always sets the credp
argument. Returning without credp set could result in a caller
doing crfree() on garbage.

Reviewed by: kan
Approved by: kib (mentor)


191990 11-May-2009 attilio

Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS. Now all the VFS_* functions and related parts no longer take the
context, since it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

VFS KPI is heavily changed by this commit, so third-party modules need
to be recompiled. Bump __FreeBSD_version in order to signal such a
situation.


191964 10-May-2009 alc

Eliminate stale comments.

Eliminate a case of unnecessary page queues locking.


191940 09-May-2009 kan

Do not embed struct ucred into larger netcred parent structures.

Credential might need to hang around longer than its parent and be used
outside of mnt_explock scope controlling netcred lifetime. Use separate
reference-counted ucred allocated separately instead.

While there, extend mnt_explock coverage in vfs_stdexpcheck and clean-up
some unused declarations in new NFS code.

Reported by: John Hickey
PR: kern/133439
Reviewed by: dfr, kib
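
The lifetime problem described above can be sketched in user-space C.
This is a toy model, not the kernel's struct ucred or netcred code; every
name here (struct cred, struct export_ent, cred_alloc(), cred_hold(),
cred_rele()) is hypothetical, and the counter is not thread-safe. The
point is that the parent holds a pointer to a separately allocated,
counted credential, so a holder can outlive the parent.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a credential; not struct ucred. */
struct cred {
	unsigned int refs;	/* number of outstanding references */
	unsigned int uid;
};

/* Hypothetical parent: keeps a pointer, not an embedded struct. */
struct export_ent {
	struct cred *cred;
};

/* Allocate a credential holding one reference for the caller. */
static struct cred *
cred_alloc(unsigned int uid)
{
	struct cred *cr = calloc(1, sizeof(*cr));

	if (cr != NULL) {
		cr->refs = 1;
		cr->uid = uid;
	}
	return (cr);
}

/* Take an extra reference so the credential can outlive its parent. */
static struct cred *
cred_hold(struct cred *cr)
{
	cr->refs++;
	return (cr);
}

/* Drop a reference; returns 1 if this call freed the credential. */
static int
cred_rele(struct cred *cr)
{
	if (--cr->refs > 0)
		return (0);
	free(cr);
	return (1);
}
```

With an embedded struct, freeing the parent would free the credential
with it; with a counted pointer, the last holder does the free.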


191783 04-May-2009 rmacklem

Add the experimental nfs subtree to the kernel, that includes
support for NFSv4 as well as NFSv2 and 3.
It lives in 3 subdirs under sys/fs:
nfs - functions that are common to the client and server
nfsclient - a mutation of sys/nfsclient that call generic functions
to do RPCs and handle state. As such, it retains the
buffer cache handling characteristics and vnode semantics that
are found in sys/nfsclient, for the most part.
nfsserver - the server. It includes a DRC designed specifically for
NFSv4, that is used instead of the generic DRC in sys/rpc.
The build glue will be checked in later, so at this point, it
consists of 3 new subdirs that should not affect kernel building.

Approved by: kib (mentor)


190888 10-Apr-2009 rwatson

Remove VOP_LEASE and supporting functions. This hasn't been used since
the removal of NQNFS, but was left in in case it was required for NFSv4.
Since our new NFSv4 client and server can't use it for their
requirements, GC the old mechanism, as well as other unused lease-
related code and interfaces.

Due to its impact on kernel programming and binary interfaces, this
change should not be MFC'd.

Proposed by: jeff
Reviewed by: jeff
Discussed with: rmacklem, zach loafman @ isilon


190839 08-Apr-2009 des

Remove spurious locking in pfs_write().

Reported by: Andrew Brampton <me@bramp.net>
MFC after: 1 week


190806 07-Apr-2009 des

Fix an inverted KASSERT. Add similar assertions in other similar places.

Reported by: Andrew Brampton <me@bramp.net>
MFC after: 1 week


189961 18-Mar-2009 pho

Do not use null_bypass for VOP_ISLOCKED, directly call default
implementation. null_bypass cannot work for the !nullfs-vnodes, in
particular, for VBAD vnodes.

In collaboration with: kib


189758 13-Mar-2009 attilio

Remove the null_islocked() overloaded vop because the standard one does
the same.


189696 11-Mar-2009 jhb

Add a new internal mount flag (MNTK_EXTENDED_SHARED) to indicate that a
filesystem supports additional operations using shared vnode locks.
Currently this is used to enable shared locks for open() and close() of
read-only file descriptors.
- When an ISOPEN namei() request is performed with LOCKSHARED, use a
shared vnode lock for the leaf vnode only if the mount point has the
extended shared flag set.
- Set LOCKSHARED in vn_open_cred() for requests that specify O_RDONLY but
not O_CREAT.
- Use a shared vnode lock around VOP_CLOSE() if the file was opened with
O_RDONLY and the mountpoint has the extended shared flag set.
- Adjust md(4) to upgrade the vnode lock on the vnode it gets back from
vn_open() since it now may only have a shared vnode lock.
- Don't enable shared vnode locks on FIFO vnodes in ZFS and UFS since
FIFO's require exclusive vnode locks for their open() and close()
routines. (My recent MPSAFE patches for UDF and cd9660 already included
this change.)
- Enable extended shared operations on UFS, cd9660, and UDF.

Submitted by: ups
Reviewed by: pjd (ZFS bits)
MFC after: 1 month


189693 11-Mar-2009 kib

Enable advisory file locking for devfs vnodes.

Reported by: Timothy Redaelli <timothy redaelli eu>
MFC after: 1 week


189622 10-Mar-2009 kib

Do not use bypass for vop_vptocnp() from nullfs, call standard
implementation instead. The bypass does not assume that returned vnode
is only held.

Reported by: Paul B. Mahol <onemda gmail com>, pluknet <pluknet gmail com>
Reviewed by: jhb
Tested by: pho, pluknet <pluknet gmail com>


189450 06-Mar-2009 kib

Extract the no_poll() and vop_nopoll() code into the common routine
poll_no_poll().
Return a poll_no_poll() result from devfs_poll_f() when
filedescriptor does not reference the live cdev, instead of ENXIO.

Noted and tested by: hps
MFC after: 1 week


189364 04-Mar-2009 avg

udf: use truly unique directory cookie

'off' is an offset within current block, so there is a good chance
it can be non-unique, so use complete offset.

Submitted by: bde
Approved by: jhb


189363 04-Mar-2009 avg

udf_strategy: remove redundant comment

We fail mapping for any udf_bmap_internal error and there can be
different reasons for it, so no need to (over-)emphasize files with
data in fentry.

Submitted by: bde
Approved by: jhb


189302 03-Mar-2009 avg

udf_readdir: do not advance offset if entry can not be uio-ed

Previously readdir missed some directory entries because there was
no space for them in the current uio, but the directory stream offset
was advanced nevertheless.
jhb discovered the issue and provided a test case.

Reviewed by: bde
Approved by: jhb (mentor)
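
The rule described above (advance the stream offset only for entries that
were actually copied out) can be sketched in user-space C. This is a toy
model, not udf_readdir(); emit_entries() and its arguments are
assumptions for the example, with a plain array of names standing in for
the directory and a byte buffer standing in for the uio.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy NUL-terminated names into buf, starting at
 * entry index *offp.  Returns the number of bytes used.  *offp is
 * advanced only past entries that were fully copied, so an entry that
 * did not fit is returned by the next call instead of being skipped.
 */
static size_t
emit_entries(const char *const *names, size_t nnames, size_t *offp,
    char *buf, size_t buflen)
{
	size_t used = 0;
	size_t off = *offp;

	while (off < nnames) {
		size_t len = strlen(names[off]) + 1;

		if (len > buflen - used)
			break;	/* no room: leave off at this entry */
		memcpy(buf + used, names[off], len);
		used += len;
		off++;		/* advance only after a successful copy */
	}
	*offp = off;
	return (used);
}
```

A caller keeps calling with the saved offset until it reaches the entry
count, which mirrors how userland keeps reading a directory until EOF.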


189282 02-Mar-2009 kib

Use the p_sysent->sv_flags flag SV_ILP32 to detect 32bit process
executing on 64bit kernel. This eliminates the direct comparisons
of p_sysent with &ia32_freebsd_sysvec, that were left intact after
r185169.


189120 27-Feb-2009 jhb

- Hold a reference on the cdev a filesystem is mounted from in the mount.
- Remove the cdev pointers from the denode and instead use the mountpoint's
reference to call dev2udev() in getattr().

Reviewed by: kib, julian


189111 27-Feb-2009 avg

udf_readatoffset: return correct size and data pointer for data in fentry

This should help correct reading of directories with data located
in fentry.

Submitted by: bde
Approved by: jhb (mentor)


189082 26-Feb-2009 avg

udf_readatoffset: read through directory vnode, do not read > MAXBSIZE

Currently bread()-ing through device vnode with
(1) VMIO enabled,
(2) bo_bsize != DEV_BSIZE
(3) more than 1 block
results in data being incorrectly cached.
So instead a more common approach of using a vnode belonging to fs is now
employed.
Also, prevent attempt to bread more than MAXBSIZE bytes because of
adjustments made to account for offset that doesn't start on block
boundary.
Add expanded comments to explain the calculations.
Also drop unused inline function while here.

PR: kern/120967
PR: kern/129084

Reviewed by: scottl, kib
Approved by: jhb (mentor)


189070 26-Feb-2009 avg

udf: add read-ahead support modeled after cd9660

Reviewed by: scottl
Approved by: jhb (mentor)


189069 26-Feb-2009 avg

udf_map: return proper error code instead of leaking an internal one

Incidentally this also allows for small files with data embedded into
fentry to be mmap-ed.

Approved by: jhb (mentor)


189068 26-Feb-2009 avg

udf_read: correctly read data from files with data embedded into fentry,

... as opposed to files with data in extents.
Some UDF authoring tools produce this type of file for sufficiently small
data files.

Approved by: jhb (mentor)


189067 26-Feb-2009 avg

udf_strategy: tiny optimization of logic, calculations; extra diagnostics

Use bit-shift instead of division/multiplication.
Act on error as soon as it is detected.
Report attempt to read data embedded in file entry via regular way.
While there, fix lblktosize macro and make use of it.

No functionality should change as a result.

Approved by: jhb (mentor)
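
The shift-based conversions mentioned above can be sketched as follows.
This is an illustration only: struct blkgeom and the three helpers are
hypothetical names, not the UDF driver's structures or its lblktosize
macro. It assumes the block size is a power of two, so that a shift by
log2(block size) is equivalent to the multiplication or division.

```c
#include <stdint.h>

/* Hypothetical per-mount geometry; not the driver's mount structure. */
struct blkgeom {
	int bshift;		/* log2 of the block size */
	uint32_t bsize;		/* block size in bytes, a power of two */
};

/* Logical block number to byte offset: lblk * bsize. */
static inline uint64_t
blk_to_size(const struct blkgeom *g, uint64_t lblk)
{
	return (lblk << g->bshift);
}

/* Byte offset to logical block number: off / bsize. */
static inline uint64_t
size_to_blk(const struct blkgeom *g, uint64_t off)
{
	return (off >> g->bshift);
}

/* Byte offset within its block: off % bsize. */
static inline uint32_t
blk_off(const struct blkgeom *g, uint64_t off)
{
	return ((uint32_t)(off & (g->bsize - 1)));
}
```

For a 2048-byte block, bshift is 11, so block 3 starts at byte 6144.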


188956 23-Feb-2009 trasz

Right now, when trying to unmount a device that's already gone,
msdosfs_unmount() and ffs_unmount() exit early after getting ENXIO.
However, dounmount() treats ENXIO as a success and proceeds with
unmounting. In effect, the filesystem gets unmounted without closing
GEOM provider etc.

Reviewed by: kib
Approved by: rwatson (mentor)
Tested by: dho
Sponsored by: FreeBSD Foundation


188929 22-Feb-2009 alc

Use uiomove_fromphys() instead of the combination of sf_buf and uiomove().

This is not only shorter; it also eliminates unnecessary thread pinning on
architectures that implement a direct map.

MFC after: 3 weeks


188921 22-Feb-2009 alc

Simplify the unwiring and activation of pages.

MFC after: 1 week


188816 19-Feb-2009 avg

style nit in r188815

Pointed out by: jhb, rpaulo
Approved by: jhb (mentor)


188815 19-Feb-2009 avg

fs/udf: fix incorrect error return (-1) when reading a large dir

Not enough space in user-land buffer is not an error, userland
will read further until eof is reached. So instead of propagating
-1 to caller we convert it to zero/success.

cd9660 code works exactly the same way.

PR: kern/78987
Reviewed by: jhb (mentor)
Approved by: jhb (mentor)


188677 16-Feb-2009 des

Fix a logic bug that caused the pfs_attr method to be called only for
PFS_PROCDEP nodes.

Submitted by: Andrew Brampton <brampton@gmail.com>
MFC after: 2 weeks


188588 13-Feb-2009 jhb

Use shared vnode locks when invoking VOP_READDIR().

MFC after: 1 month


188502 11-Feb-2009 jhb

- Consolidate error handling in the cd9660 and udf mount routines.
- Always read the character device pointer while the associated devfs vnode
is locked. Also, use dev_ref() to obtain a new reference on the vnode for
the mountpoint. This reference is released on unmount. This mirrors the
earlier fix to FFS.

Reviewed by: kib


188407 09-Feb-2009 jhb

Mark udf(4) MPSAFE and add support for shared vnode locks during pathname
lookups:
- Honor the caller's locking flags in udf_root() and udf_vget().
- Set VV_ROOT for the root vnode in udf_vget() instead of only doing it in
udf_root().
- Honor the requested locking flags during pathname lookups in udf_lookup().
- Release the buffer holding the directory data before looking up the vnode
for a given file to avoid a LOR between the "udf" vnode locks and
"bufwait".
- Use vn_vget_ino() to handle ".." lookups.
- Special case "." lookups instead of calling udf_vget(). We have to do
extra checking for the vnode lock for "." lookups.


188406 09-Feb-2009 jhb

Use the same style as the rest of the file for the optional data string
after each path component rather than a GCC-ism.


188318 08-Feb-2009 kib

Look up the directory entry for the tmpfs node that is deleted by
both node pointer and name component. This does the right thing for
hardlinks to the same node in the same directory.

Submitted by: Yoshihiro Ota <ota j email ne jp>
PR: kern/131356
MFC after: 2 weeks


188251 06-Feb-2009 jhb

Add rudimentary support for symbolic links on UDF. Links are stored as a
sequence of pathname components. We walk the list building a string in
the caller's passed in buffer. Currently this only handles path names
in CS8 (character set 8) as that is what mkisofs generates for UDF images.

MFC after: 1 month


188245 06-Feb-2009 jhb

Add support for fifos to UDF:
- Add a separate set of vnode operations that inherits from the fifo ops
and use it for fifo nodes.
- Add a VOP_SETATTR() method that allows setting the size (by silently
ignoring the requests) of fifos. This is to allow O_TRUNC opens of
fifo devices (e.g. I/O redirection in shells using ">").
- Add a VOP_PRINT() handler while I'm here.


188244 06-Feb-2009 jhb

Tweak the output of VOP_PRINT/vn_printf() some.
- Align the fifo output in fifo_print() with other vn_printf() output.
- Remove the leading space from lockmgr_printinfo() so its output lines up
in vn_printf().
- lockmgr_printinfo() now ends with a newline, so remove an extra newline
from vn_printf().


187960 31-Jan-2009 bz

After r186194 the *fs_strategy() functions always return 0.
So we are no longer interested in the error returned from
the *fs_doio() functions. With that we can remove the
error variable as its value is unused now.

Submitted by: Christoph Mallon christoph.mallon@gmx.de


187959 31-Jan-2009 bz

Remove unused local variables.

Submitted by: Christoph Mallon christoph.mallon@gmx.de
Reviewed by: kib
MFC after: 2 weeks


187864 28-Jan-2009 ed

Mark most often used sysctl's as MPSAFE.

After running a `make buildkernel', I noticed most of the Giant locks in
sysctl are only caused by a very small amount of sysctl's:

- sysctl.name2oid. This one is locked by SYSCTL_LOCK, just like
sysctl.oidfmt.

- kern.ident, kern.osrelease, kern.version, etc. These are just constant
strings.

- kern.arandom, used by the stack protector. It is already protected by
arc4_mtx.

I also saw the following sysctl's show up. Not as often as the ones
above, but still quite often:

- security.jail.jailed. Also mark security.jail.list as MPSAFE. They
don't need locking or already use allprison_lock.

- kern.devname, used by devname(3), ttyname(3), etc.

This seems to reduce Giant locking inside sysctl by ~75% in my primitive
test setup.


187840 28-Jan-2009 imp

Use the correct field name for the size of the sierra_id. While this
is the same size as id, and is unlikely to change, it seems better to
use the correct field here. There's no difference in the generated
code.


187838 28-Jan-2009 jhb

Mark cd9660 MPSAFE and add support for using shared vnode locks during
pathname lookups.
- Remove 'i_offset' and 'i_ino' from the ISO node structure and replace
them with local variables in the lookup routine instead.
- Cache a copy of 'i_diroff' for use during a lookup in a local variable.
- Save a copy of the found directory entry in a malloc'd buffer after a
successfull lookup before getting the vnode. This allows us to release
the buffer holding the directory block before calling vget() which
otherwise resulted in a LOR between "bufwait" and the vnode lock.
- Use an inlined version of vn_vget_ino() to handle races with ..
lookups. I had to inline the code here since cd9660 uses an internal
vget routine to save a disk I/O that would otherwise re-read the
directory block.
- Honor the requested locking flags during lookups to allow for shared
locking.
- Honor the requested locking flags passed to VFS_ROOT() and VFS_VGET()
similar to UFS.
- Don't make every ISO 9660 vnode hold a reference on the vnode of the
underlying device vnode of the mountpoint. The mountpoint already
holds a suitable reference.


187836 28-Jan-2009 jhb

Sync with ufs_vnops.c:1.245 and remove support for accessing device nodes
in ISO 9660 filesystems.


187832 28-Jan-2009 jhb

Assert an exclusive vnode lock for fifo_cleanup() and fifo_close() since
they change v_fifoinfo.

Discussed with: ups (a while ago)


187830 28-Jan-2009 ed

Last step of splitting up minor and unit numbers: remove minor().

Inside the kernel, the minor() function was responsible for obtaining
the device minor number of a character device. Because we made device
numbers dynamically allocated and independent of the unit number passed
to make_dev() a long time ago, it was actually a misnomer. If you really
want to obtain the device number, you should use dev2udev().

We already converted all the drivers to use dev2unit() to obtain the
device unit number, which is still used by a lot of drivers. I've
noticed not a single driver passes NULL to dev2unit(). Even if they
would, its behaviour would make little sense. This is why I've removed
the NULL check.

This commit removes minor(), minor2unit() and unit2minor() from the
kernel. Because there was a naming collision with uminor(), we can
rename umajor() and uminor() back to major() and minor(). This means
that the makedev(3) manual page also applies to kernel space code now.

I suspect umajor() and uminor() aren't used that often in external code,
but to make it easier for other parties to port their code, I've
increased __FreeBSD_version to 800062.


187715 26-Jan-2009 kib

The kernel may do unbalanced calls to fifo_close() for fifo vnode,
without corresponding number of fifo_open(). This causes assertion
failure in fifo_close() due to vp->v_fifoinfo being NULL for kernel
with INVARIANTS, or NULL pointer dereference otherwise. In fact, we may
ignore excess calls to fifo_close() without bad consequences.

Turn KASSERT() into the return, and print warning for now.

Tested by: pho
Reviewed by: rwatson
MFC after: 2 weeks


187199 13-Jan-2009 trasz

Turn a "panic: non-decreasing id" into an error printf. This seems
to be caused by a metadata corruption that occurs quite often after
unplugging a pendrive during write activity.

Reviewed by: scottl
Approved by: rwatson (mentor)
Sponsored by: FreeBSD Foundation


187058 11-Jan-2009 trasz

Fix msdosfs_print(), which in turn fixes "show lockedvnods" for msdosfs
vnodes.

Reviewed by: kib
Approved by: rwatson (mentor)
Sponsored by: FreeBSD Foundation


186981 09-Jan-2009 marcus

Fix a deadlock which can occur due to a pseudofs vnode not getting unlocked.

Reported by: Richard Todd <rmtodd@ichotolot.servalan.com>
Reviewed by: kib
Approved by: kib


186911 08-Jan-2009 trasz

Don't panic with "vinvalbuf: dirty bufs" when the mounted device that was
being written to goes away.

Reviewed by: kib, scottl
Approved by: rwatson (mentor)
Sponsored by: FreeBSD Foundation


186617 30-Dec-2008 marcus

Add a VOP_VPTOCNP implementation for pseudofs which covers file systems
such as procfs and linprocfs.

This implementation's locking was enhanced by kib.

Reviewed by: kib, des
Approved by: des, kib
Tested by: pho


186565 29-Dec-2008 kib

When the insmntque() in the pfs_vncache_alloc() fails, vop_reclaim calls
pfs_vncache_free() that removes pvd from the list, while it is not yet
put on the list.

Prevent the invalid removal from the list by clearing pvd_next and
pvd_prev for the newly allocated pvd, and only move pfs_vncache list
head when the pvd was at the head.

Suggested and approved by: des
MFC after: 2 weeks


186563 29-Dec-2008 kib

vm_map_lock_read() does not increment map->timestamp, so we should
compare map->timestamp with saved timestamp after map read lock is
reacquired, not with saved timestamp + 1. The only consequence of the +1
was unconditional lookup of the next map entry, though.

Tested by: pho
Approved by: des
MFC after: 2 weeks


186562 29-Dec-2008 kib

Use curproc->p_sysent->sv_flags bit SV_ILP32 for detection of the 32 bit
caller, instead of direct comparison with ia32_freebsd_sysvec.

Tested by: pho
Approved by: des
MFC after: 2 weeks


186561 29-Dec-2008 kib

Drop the pseudofs vnode lock around call to pfs_read handler. The handler
may need to lock arbitrary vnodes, causing either lock order reversal or
recursive vnode lock acquisition.

Tested by: pho
Approved by: des
MFC after: 2 weeks


186560 29-Dec-2008 kib

After the pfs_vncache_mutex is dropped, another thread may attempt to
do pfs_vncache_alloc() for the same pfs_node and pid. In this case, we
could end up with two vnodes for the pair. Recheck the cache under the
locked pfs_vncache_mutex after all sleeping operations are done [1].

This case mostly cannot happen now because pseudofs uses exclusive vnode
locking for lookup. But it does drop the vnode lock for dotdot lookups,
and Marcus' pseudofs_vptocnp implementation is vulnerable too.

Do not call free() on the struct pfs_vdata after insmntque() failure,
because vp->v_data points to the structure, and pseudofs_reclaim()
frees it by the call to pfs_vncache_free().

Tested by: pho [1]
Approved by: des
MFC after: 2 weeks


186194 16-Dec-2008 trasz

According to phk@, VOP_STRATEGY should never, _ever_, return
anything other than 0. Make it so. This fixes
"panic: VOP_STRATEGY failed bp=0xc320dd90 vp=0xc3b9f648",
encountered when writing to an orphaned filesystem. Reason
for the panic was the following assert:
KASSERT(i == 0, ("VOP_STRATEGY failed bp=%p vp=%p", bp, bp->b_vp));
at vfs_bio:bufstrategy().

Reviewed by: scottl, phk
Approved by: rwatson (mentor)
Sponsored by: FreeBSD Foundation


185984 12-Dec-2008 kib

Reference the vmspace of the process being inspected by procfs, linprocfs
and sysctl kern_proc_vmmap handlers.

Reported and tested by: pho
Reviewed by: rwatson, des
MFC after: 1 week


185980 12-Dec-2008 kib

Do not leak devfs_de_interlock on error.

Another pointy hat for my collection.


185959 12-Dec-2008 marcus

Implement VOP_VPTOCNP for devfs. Directory and character device vnodes are
properly translated to their component names.

Reviewed by: arch
Approved by: kib


185958 12-Dec-2008 marcus

Add a simple VOP_VPTOCNP implementation for deadfs which returns EBADF.

Reviewed by: arch
Approved by: kib


185864 10-Dec-2008 kib

Relock user map earlier, to have the lock held when break leaves the
loop earlier due to sbuf error.

Pointy hat to: me
Submitted by: dchagin


185766 08-Dec-2008 kib

Make two style changes to create new commit and document proper commit
message for r185765.

Noted by: rdivacky
Requested by: des

Commit message for r185765 should be:
In procfs map handler, and in linprocfs maps handler, do not call
vn_fullpath() while having the vm map locked. This is done in anticipation
of the vop_vptocnp commit, which will make vn_fullpath() sometimes
acquire the vnode lock.

Also, in linprocfs, maps handler already acquires vnode lock.

No objections from: des
MFC after: 2 weeks


185765 08-Dec-2008 kib

Change the linprocfs <pid>/maps and procfs <pid>/map handlers to use
sbuf instead of doing uiomove. This allows for reads from non-zero
offsets to work.

The patch is a forward port of des@'s original, adapted to the current
code by dchagin@ and me.

Reviewed by: des (linprocfs part)
PR: kern/101453
MFC after: 1 week


185361 27-Nov-2008 kientzle

The timezone byte is a signed value, treat it as such.
Otherwise, time zone information for time zones west of
GMT gets discarded.
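
The sign handling can be sketched as follows. This is a minimal
illustration, not the committed code: it assumes the ISO 9660 convention
in which the byte counts 15-minute intervals east of GMT, and the
function name tz_byte_to_seconds() is hypothetical.

```c
/*
 * Convert a raw time zone byte to an offset in seconds.
 * Assumption: the byte is a two's-complement count of 15-minute
 * intervals east of GMT, so values of 0x80 and above are negative
 * (west of GMT).
 */
int
tz_byte_to_seconds(unsigned char raw)
{
	int quarters = raw;

	/* Sign-extend by hand; reading the byte as unsigned loses the sign. */
	if (quarters >= 128)
		quarters -= 256;
	return (quarters * 15 * 60);
}
```

With this reading, a byte of 0xEC means 20 quarter-hours west of GMT
instead of an out-of-range positive offset.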

PR: kern/128934
Submitted by: J.R. Oldroyd
MFC after: 4 days


185335 26-Nov-2008 kib

In null_lookup(), do the needed cleanup instead of panicking saying
the cleanup is needed.

Reported by: kris, pho
Tested by: pho
MFC after: 2 weeks


185334 26-Nov-2008 lulf

- Support IEEE_P1282 and IEEE_1282 tags in the rock ridge extensions record.

PR: kern/128942
Submitted by: "J.R. Oldroyd" <fbsd - at - opal.com>


185284 25-Nov-2008 daichi

Simplify the mode_t check (suggested by trasz).
Semantically, trasz's code is better than the previous version.

Submitted by: trasz
Reviewed by: Masanori OZAWA <ozawa@ongs.co.jp>


185283 25-Nov-2008 daichi

Fixes Unionfs socket issue reported as kern/118346.

PR: 118346
Submitted by: Masanori OZAWA <ozawa@ongs.co.jp>
Discussed at: devsummit Strassburg, EuroBSDCon2008
Discussed with: rwatson, gnn, hrs
MFC after: 2 weeks


185071 18-Nov-2008 jhb

- Fix a typo in a comment.
- Whitespace fix.
- Remove #if 0'd BSD 4.x code for flushing busy buffers from a mountpoint
during an unmount. FreeBSD uses vflush() for this.


185070 18-Nov-2008 jhb

When looking up the vnode for the device to mount the filesystem on,
ask NDINIT to return a locked vnode instead of letting it drop the
lock and return a referenced vnode and then relock the vnode a few
lines down. This matches the behavior of other filesystem mount routines.


185069 18-Nov-2008 jhb

Remove copy/paste code from UFS to handle sparse blocks. While Rock
Ridge does support sparse files, the cd9660 code does not currently
support them.


185068 18-Nov-2008 jhb

Remove unused i_flags field and IN_ACCESS flag from cd9660 in-memory
i-nodes. cd9660 doesn't support access times.


184652 04-Nov-2008 jhb

Remove unnecessary locking around vn_fullpath(). The vnode lock for the
vnode in question does not need to be held. All the data structures used
during the name lookup are protected by the global name cache lock.
Instead, the caller merely needs to ensure a reference is held on the
vnode (such as vhold()) to keep it from being freed.

In the case of procfs' <pid>/file entry, grab the process lock while we
gain a new reference (via vhold()) on p_textvp to fully close races with
execve(2).

For the kern.proc.vmmap sysctl handler, use a shared vnode lock around
the call to VOP_GETATTR() rather than an exclusive lock.

MFC after: 1 month


184650 04-Nov-2008 jhb

Don't pass WANTPARENT to the pathname lookup of the mount point for a
unionfs mount just so we can immediately drop the reference on the parent
directory vnode without using it.


184595 03-Nov-2008 trasz

Fix a few missed accmode changes in coda.

Approved by: rwatson (mentor)


184588 03-Nov-2008 dfr

Implement support for RPCSEC_GSS authentication to both the NFS client
and server. This replaces the RPC implementation of the NFS client and
server with the newer RPC implementation originally developed
(actually ported from the userland sunrpc code) to support the NFS
Lock Manager. I have tested this code extensively and I believe it is
stable and that performance is at least equal to the legacy RPC
implementation.

The NFS code currently contains support for both the new RPC
implementation and the older legacy implementation inherited from the
original NFS codebase. The default is to use the new implementation -
add the NFS_LEGACYRPC option to fall back to the old code. When I
merge this support back to RELENG_7, I will probably change this so
that users have to 'opt in' to get the new code.

To use RPCSEC_GSS on either client or server, you must build a kernel
which includes the KGSSAPI option and the crypto device. On the
userland side, you must build at least a new libc, mountd, mount_nfs
and gssd. You must install new versions of /etc/rc.d/gssd and
/etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf.

As long as gssd is running, you should be able to mount an NFS
filesystem from a server that requires RPCSEC_GSS authentication. The
mount itself can happen without any kerberos credentials but all
access to the filesystem will be denied unless the accessing user has
a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There
is currently no support for situations where the ticket file is in a
different place, such as when the user logged in via SSH and has
delegated credentials from that login. This restriction is also
present in Solaris and Linux. In theory, we could improve this in
future, possibly using Brooks Davis' implementation of variant
symlinks.

Supporting RPCSEC_GSS on a server is nearly as simple. You must create
service creds for the server in the form 'nfs/<fqdn>@<REALM>' and
install them in /etc/krb5.keytab. The standard heimdal utility ktutil
makes this fairly easy. After the service creds have been created, you
can add a '-sec=krb5' option to /etc/exports and restart both mountd
and nfsd.

The only other difference an administrator should notice is that nfsd
doesn't fork to create service threads any more. In normal operation,
there will be two nfsd processes, one in userland waiting for TCP
connections and one in the kernel handling requests. The latter
process will create as many kthreads as required - these should be
visible via 'top -H'. The code has some support for varying the number
of service threads according to load but initially at least, nfsd uses
a fixed number of threads according to the value supplied to its '-n'
option.

Sponsored by: Isilon Systems
MFC after: 1 month


184572 02-Nov-2008 rwatson

Catch up with netsmb locking: explicit thread arguments no longer required.


184557 02-Nov-2008 trasz

Remove the call to getinoquota() from ntfs_access. How did it get there?!

Approved by: rwatson (mentor)


184413 28-Oct-2008 trasz

Introduce accmode_t. This is required for NFSv4 ACLs - it will be necessary
to add more V* constants, and the variables changed by this patch were often
being assigned to mode_t variables, which is 16 bit.
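
The truncation problem can be illustrated with a short sketch. The type
names and the flag value below are hypothetical stand-ins chosen for the
example; they are not the constants added by the patch.

```c
#include <stdint.h>

typedef uint16_t narrow_mode_t;	/* stands in for a 16-bit mode_t */
typedef int wide_accmode_t;	/* stands in for a wider access-mode type */

/* Hypothetical access flag that needs more than 16 bits. */
#define	HYPOTHETICAL_VFLAG	0x20000

/* Returns 1 if the flag is still set after being stored in 16 bits. */
int
flag_survives_narrow(int flag)
{
	narrow_mode_t m = (narrow_mode_t)flag;

	return ((m & flag) != 0);
}

/* Returns 1 if the flag is still set after being stored in the wide type. */
int
flag_survives_wide(int flag)
{
	wide_accmode_t m = flag;

	return ((m & flag) != 0);
}
```

A flag above bit 15 silently disappears in the narrow type, which is why
a dedicated wider type is needed before more V* constants can be added.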

Approved by: rwatson (mentor)


184214 23-Oct-2008 des

Fix a number of style issues in the MALLOC / FREE commit. I've tried to
be careful not to fix anything that was already broken; the NFSv4 code is
particularly bad in this respect.


184205 23-Oct-2008 des

Retire the MALLOC and FREE macros. They are an abomination unto style(9).

MFC after: 3 months


183806 12-Oct-2008 rwatson

The locking in portalfs's socket connect code is no less correct than
identical code in connect(2), so remove XXX that it might be incorrect.

MFC after: 3 days


183754 10-Oct-2008 attilio

Remove the useless struct thread argument from the bufobj interface.
In particular, the KPI of the following functions is modified:
- bufobj_invalbuf()
- bufsync()

and BO_SYNC() "virtual method" of the buffer objects set.
Main consumers of bufobj functions are affected by this change too and,
in particular, functions which changed their KPI are:
- vinvalbuf()
- g_vfs_close()

Due to the KPI breakage, __FreeBSD_version will be bumped in a later
commit.

As a side note, please consider the 'curthread' argument passed to
VOP_SYNC() (in bufsync()) as temporary; it will be axed ASAP.

Reviewed by: kib
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


183649 06-Oct-2008 rwatson

Use soconnect2() rather than directly invoking uipc_connect2() to
interconnect two UNIX domain sockets.

MFC after: 3 days


183600 04-Oct-2008 kib

Change the linprocfs <pid>/maps and procfs <pid>/map handlers to use
sbuf instead of doing uiomove. This allows for reads from non-zero
offsets to work.

The patch is a forward port of des@'s original, adapted to the current
code by dchagin@ and me.

Reviewed by: des (linprocfs part)
PR: kern/101453
MFC after: 1 week


183578 03-Oct-2008 trasz

Fix Vflags abuse in fdescfs. There should be no functional changes.

Approved by: rwatson (mentor)


183577 03-Oct-2008 trasz

Fix Vflags abuse in cd9660. There should be no functional changes.

Approved by: rwatson (mentor)


183550 02-Oct-2008 zec

Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparison between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by: julian, bz, brooks, zec
Reviewed by: julian, bz, brooks, kris, rwatson, ...
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


183383 26-Sep-2008 kib

Save previous content of the td_fpop before storing the current
filedescriptor into it. Make sure that td_fpop is NULL when calling
d_mmap from dev_pager_getpages().

The change guards against the td_fpop field being non-NULL with private
state for another device, and against sudden clearing of td_fpop. This
could occur when either a driver method calls another driver through
the filedescriptor operation, or a page fault happens while a driver is
writing to memory backed by another driver.

Noted by: rwatson
Tested by: rnoland
MFC after: 3 days


183381 26-Sep-2008 ed

Remove unit2minor() use from kernel code.

When I changed kern_conf.c three months ago I made device unit numbers
equal to (unneeded) device minor numbers. We used to require
bitshifting, because there were eight bits in the middle that were
reserved for a device major number. Not very long after I turned
dev2unit(), minor(), unit2minor() and minor2unit() into macros.
The unit2minor() and minor2unit() macros were no-ops.

We'd better not remove these four macros from the kernel, because there
is a lot of (external) code that may still depend on them. For now it's
harmless to remove all invocations of unit2minor() and minor2unit().

Reviewed by: kib


183299 23-Sep-2008 obrien

The kernel implemented 'memcmp' is an alias for 'bcmp'. However, memcmp
and bcmp are not the same thing. 'man bcmp' states that the return is
"non-zero" if the two byte strings are not identical, whereas
'man memcmp' states that the return is the "difference between the
first two differing bytes (treated as unsigned char values)" if the
two byte strings are not identical.

So provide a proper memcmp(9), but it is a C implementation not a tuned
assembly implementation. Therefore bcmp(9) should be preferred over memcmp(9).
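
The difference in return-value contracts can be sketched in portable C.
The two functions below are illustrative only and use hypothetical names;
they are not the kernel implementations.

```c
#include <stddef.h>

/* bcmp-style result: only zero versus non-zero is meaningful. */
int
sketch_bcmp(const void *b1, const void *b2, size_t len)
{
	const unsigned char *p1 = b1, *p2 = b2;

	while (len-- > 0)
		if (*p1++ != *p2++)
			return (1);
	return (0);
}

/* memcmp-style result: difference of the first two differing bytes. */
int
sketch_memcmp(const void *b1, const void *b2, size_t len)
{
	const unsigned char *p1 = b1, *p2 = b2;

	for (; len > 0; len--, p1++, p2++)
		if (*p1 != *p2)
			return (*p1 - *p2);
	return (0);
}
```

Code that sorts or orders by the sign of the result needs the memcmp
contract; code that only tests for equality works with either.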


183230 21-Sep-2008 ed

Already initialize the vfs timestamps inside the cdev upon allocation.

In the MPSAFE TTY branch I noticed the vfs timestamps inside devfs were
allocated with 0, where the getattr() routine bumps the timestamps to
boottime if the value is below 3600. The reason why it has been designed
like this, is because timestamps during boot are likely to be invalid.

This means that device nodes that are created on demand (posix_openpt())
have timestamps with a value of boottime, which is not what we want.
Solve this by calling vfs_timestamp() inside devfs_alloc().
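
The getattr() behaviour described above can be modelled in a few lines.
This is a simplified sketch with a hypothetical function name, and the
exact boundary test in devfs may differ.

```c
#include <time.h>

/*
 * Model of the described getattr() fix-up: a stored timestamp below
 * 3600 seconds is treated as invalid and reported as boot time.
 */
time_t
reported_timestamp(time_t stored, time_t boottime)
{
	if (stored < 3600)
		return (boottime);
	return (stored);
}
```

A node whose timestamps start at 0 therefore always reports boot time,
even if it was created hours later, while a node stamped at allocation
reports its real creation time.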

Discussed with: kib


183215 20-Sep-2008 kib

fdescfs, devfs, mqueuefs, nfs, portalfs, pseudofs, tmpfs and xfs
initialize the vattr structure in VOP_GETATTR() with VATTR_NULL(),
vattr_null() or by zeroing it. Remove these to allow preinitialization
of fields work in vn_stat(). This is needed to get birthtime initialized
correctly.

Submitted by: Jaakko Heinonen <jh saunalahti fi>
Discussed on: freebsd-fs
MFC after: 1 month


183214 20-Sep-2008 kib

Initialize va_rdev to NODEV instead of 0 or VNOVAL in VOP_GETATTR().
NODEV is more appropriate when va_rdev doesn't have a meaningful value.

Submitted by: Jaakko Heinonen <jh saunalahti fi>
Suggested by: bde
Discussed on: freebsd-fs
MFC after: 1 month


183212 20-Sep-2008 kib

Initialize va_flags and va_filerev properly in VOP_GETATTR(). Don't
initialize va_vaflags and va_spare because they are not part of the
VOP_GETATTR() API. Also don't initialize birthtime to ctime or zero.

Submitted by: Jaakko Heinonen <jh saunalahti fi>
Reviewed by: bde
Discussed on: freebsd-fs
MFC after: 1 month


182943 11-Sep-2008 ed

Fix two small typos in comments in the nullfs vnops code.

Submitted by: Jille Timmermans <jille quis cx>


182739 03-Sep-2008 delphij

Reflect license change of NetBSD code.

Obtained from: NetBSD
MFC after: 3 days


182600 01-Sep-2008 kib

In rev. 1.17 (r33548) of msdosfs_fat.c, relative cluster numbers were
replaced by file relative sector numbers as the buffer block number when
zero-padding a file during extension. Revert the change, it causes wrong
blocks filled with zeroes on seeking beyond end of file.

PR: kern/47628
Submitted by: tegge
MFC after: 3 days


182371 28-Aug-2008 attilio

Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally useless.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


181905 20-Aug-2008 ed

Integrate the new MPSAFE TTY layer to the FreeBSD operating system.

The last half year I've been working on a replacement TTY layer for the
FreeBSD kernel. The new TTY layer was designed to improve the following:

- Improved driver model:

The old TTY layer has a driver model that is not abstract enough to
make it friendly to use. A good example is the output path, where the
device drivers directly access the output buffers. This means that an
in-kernel PPP implementation must always convert network buffers into
TTY buffers.

If a PPP implementation would be built on top of the new TTY layer
(still needs a hooks layer, though), it would allow the PPP
implementation to directly hand the data to the TTY driver.

- Improved hotplugging:

With the old TTY layer, it isn't entirely safe to destroy TTY's from
the system. This implementation has a two-step destructing design,
where the driver first abandons the TTY. After all threads have left
the TTY, the TTY layer calls a routine in the driver, which can be
used to free resources (unit numbers, etc).

The pts(4) driver also implements this feature, which means
posix_openpt() will now return PTY's that are created on the fly.

- Improved performance:

One of the major improvements is the per-TTY mutex, which is expected
to improve scalability when compared to the old Giant locking.
Another change is the unbuffered copying to userspace, which is both
used on TTY device nodes and PTY masters.

Upgrading should be quite straightforward. Unlike previous versions,
existing kernel configuration files do not need to be changed, except
when they reference device drivers that are listed in UPDATING.

Obtained from: //depot/projects/mpsafetty/...
Approved by: philip (ex-mentor)
Discussed: on the lists, at BSDCan, at the DevSummit
Sponsored by: Snow B.V., the Netherlands
dcons(4) fixed by: kan


181803 17-Aug-2008 bz

Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch


181635 12-Aug-2008 kib

Remove unnecessary locking around pointer fetch.

Requested by: jhb


180291 05-Jul-2008 rwatson

Introduce a new lock, hostname_mtx, and use it to synchronize access
to global hostname and domainname variables. Where necessary, copy
to or from a stack-local buffer before performing copyin() or
copyout(). A few uses, such as in cd9660 and daemon_saver, remain
under-synchronized and will require further updates.

Correct a bug in which a failed copyin() of domainname would leave
domainname potentially corrupted.

MFC after: 3 weeks


180252 04-Jul-2008 kib

The uniqdosname() function takes char[12] as its third argument.

Found by: -fstack-protector
Reported by: dougb
Tested by: dougb, Rainer Hurling <rhurlin gwdg de>
MFC after: 3 days


180139 01-Jul-2008 rwatson

Remove unused 'td' arguments from smbfs_hash_lock() and
smbfs_hash_unlock().

MFC after: 3 days


179926 22-Jun-2008 gonzo

Get pointer to devfs_ruleset struct after garbage collection has been
performed. Otherwise if ruleset is used by given mountpoint and is empty
it's freed by devfs_ruleset_reap and pointer becomes bogus.

Submitted by: Mateusz Guzik <mjguzik@gmail.com>
PR: kern/124853


179828 16-Jun-2008 kib

Struct cdev is always the member of the struct cdev_priv. When devfs
needed to promote cdev to cdev_priv, the si_priv pointer was followed.

Use member2struct() to calculate address of the wrapping cdev_priv.
Rename si_priv to __si_reserved.
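
The idea of computing the wrapper's address from an embedded member can
be sketched with offsetof(). The structures and the helper below are
hypothetical stand-ins for struct cdev and struct cdev_priv, and the
helper is not the member2struct() macro itself.

```c
#include <stddef.h>

/* Hypothetical stand-in for the embedded structure. */
struct inner_dev {
	int	unit;
};

/* Hypothetical stand-in for the wrapping private structure. */
struct outer_priv {
	int			flags;
	struct inner_dev	dev;	/* always embedded, never standalone */
};

/* Recover the wrapper by subtracting the member's offset. */
struct outer_priv *
inner_to_outer(struct inner_dev *d)
{
	return ((struct outer_priv *)(void *)
	    ((char *)d - offsetof(struct outer_priv, dev)));
}
```

Because the inner structure is always a member of the outer one, no
back pointer needs to be stored or kept consistent.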

Tested by: pho
Reviewed by: ed
MFC after: 2 weeks


179808 15-Jun-2008 kib

Do not redo the vnode tear-down work already done by insmntque() when
vnode cannot be put on the vnode list for mount.

Reported and tested by: marck
Guilty party: me
MFC after: 3 days


179726 11-Jun-2008 ed

Don't enforce unique device minor number policy anymore.

Except for the case where we use the cloner library (clone_create() and
friends), there is no reason to enforce a unique device minor number
policy. There are various drivers in the source tree that allocate unr
pools and such to provide minor numbers, without using them themselves.

Because we still need to support unique device minor numbers for the
cloner library, introduce a new flag called D_NEEDMINOR. All cdevsw's
that are used in combination with the cloner library should be marked
with this flag to make the cloning work.

This means drivers can now freely use si_drv0 to store their own flags
and state, making it effectively the same as si_drv1 and si_drv2. We
still keep the minor() and dev2unit() routines around to make drivers
happy.

The NTFS code also used the minor number in its hash table. We should
not do this anymore. If the si_drv0 field would be changed, it would no
longer end up in the same list.

Approved by: philip (mentor)


179722 11-Jun-2008 kib

In cd9660_readdir vop, always initialize the idp->uio_off member.

The while loop that is assumed to initialize the uio_off later, may
be not entered at all, causing uninitialized value to be returned in
uio->uio_offset.

PR: 122925
Submitted by: Jaakko Heinonen <jh saunalahti fi>
MFC after: 1 week


179554 05-Jun-2008 kib

When devfs_allocv() committed to create new vnode, since de_vnode is NULL,
the dm_lock is held while the newly allocated vnode is locked. Since no
other threads may try to lock the new vnode yet, the LOR there cannot
result in the deadlock.

Shut down the witness warning to note this fact.

Tested by: pho
Prodded by: attilio


179475 01-Jun-2008 ed

Revert the changes I made to devfs_setattr() in r179457.

As discussed with Robert Watson and John Baldwin, it would be better if
PTY's are created with proper permissions, turning grantpt() into a
no-op.

Bypassing security frameworks like MAC by passing NOCRED to
VOP_SETATTR() will only make things more complex.

Approved by: philip (mentor)


179457 31-May-2008 ed

Merge back devfs changes from the mpsafetty branch.

In the mpsafetty branch, PTY's are allocated through the posix_openpt()
system call. The controller side of a PTY now uses its own file
descriptor type (just like sockets, vnodes, pipes, etc).

To remain compatible with existing FreeBSD and Linux C libraries, we can
still create PTY's by opening /dev/ptmx or /dev/ptyXX. These nodes
implement d_fdopen(). Devfs has been slightly changed here, to allow
finit() to be called from d_fdopen().

The routine grantpt() has also been moved into the kernel. This routine
is a little odd, because it needs to bypass standard UNIX permissions.
It needs to change the owner/group/mode of the slave device node, which
may often not be possible. The old implementation solved this by
spawning a setuid utility.

When VOP_SETATTR() is called with NOCRED, devfs_setattr() dereferences
ap->a_cred, causing a kernel panic. Change the de_{uid,gid,mode} code to
allow changes when a->a_cred is set to NOCRED.

Approved by: philip (mentor)


179288 24-May-2008 lulf

- Add locking to all filesystem operations in fdescfs and flag it as MPSAFE.
- Use proper synchronization primitives to protect the internal fdesc node cache
used in fdescfs.
- Properly initialize and uninitialize hash.
- Remove unused functions.

Since fdescfs might recurse on itself, adding proper locking to it needed some
tricky workarounds in some parts to make it work. For instance, a descriptor in
fdescfs could refer to an open descriptor to itself, thus forcing the thread to
recurse on vnode locks. Because of this, other race conditions also had to be
fixed.

Tested by: pho
Reviewed by: kib (mentor)
Approved by: kib (mentor)


179247 23-May-2008 kib

When vget() fails (because the vnode has been reclaimed), there is no
sense to loop trying to vget() the vnode again.

PR: 122977
Submitted by: Arthur Hartwig <arthur.hartwig nokia com>
Tested by: pho
Reviewed by: jhb
MFC after: 1 week


179175 21-May-2008 kib

Implement the per-open file data for the cdev.

The patch does not change the cdevsw KBI. Management of the data is
provided by the functions
int devfs_set_cdevpriv(void *priv, cdevpriv_dtr_t dtr);
int devfs_get_cdevpriv(void **datap);
void devfs_clear_cdevpriv(void);
All of the functions are supposed to be called from the cdevsw method
contexts.

- devfs_set_cdevpriv assigns the priv as private data for the file
descriptor which is used to initiate the currently performed driver
operation. dtr is the function that will be called when either the
last reference to the file goes away, the device is destroyed or
devfs_clear_cdevpriv is called.
- devfs_get_cdevpriv is the obvious accessor.
- devfs_clear_cdevpriv allows clearing the private data for the still
open file.

The implementation keeps the driver-supplied pointers in the struct
cdev_privdata, which is referenced both from the struct file and struct
cdev, and cannot outlive either of the referrers.
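
The set/get/clear life cycle can be modelled in userland as follows. This
is only a sketch of the semantics described above: struct open_file, the
file_*_priv() functions and the error codes chosen here are hypothetical
and are not the kernel implementation.

```c
#include <errno.h>
#include <stddef.h>

typedef void (*priv_dtr_t)(void *);

/* Hypothetical stand-in for the per-open state kept for one file. */
struct open_file {
	int		 has_priv;
	void		*priv;
	priv_dtr_t	 dtr;
};

/* Attach private data; refuse if the file already carries some. */
int
file_set_priv(struct open_file *fp, void *priv, priv_dtr_t dtr)
{
	if (fp->has_priv)
		return (EBUSY);
	fp->priv = priv;
	fp->dtr = dtr;
	fp->has_priv = 1;
	return (0);
}

/* Fetch the private data previously attached to this open file. */
int
file_get_priv(struct open_file *fp, void **datap)
{
	if (!fp->has_priv)
		return (ENOENT);
	*datap = fp->priv;
	return (0);
}

/* Detach the data and run the destructor exactly once. */
void
file_clear_priv(struct open_file *fp)
{
	priv_dtr_t dtr = fp->dtr;
	void *priv = fp->priv;

	if (!fp->has_priv)
		return;
	fp->has_priv = 0;
	fp->priv = NULL;
	fp->dtr = NULL;
	if (dtr != NULL)
		dtr(priv);
}

/* Example destructor: counts how many times it has run. */
void
count_dtr(void *arg)
{
	(*(int *)arg)++;
}
```

In this model the destructor runs when the data is cleared; the text
above lists two further triggers (last file reference dropped, device
destroyed) that the sketch does not cover.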

Man pages will be provided after the KPI stabilizes.

Reviewed by: jhb
Useful suggestions from: jeff, antoine
Debugging help and tested by: pho
MFC after: 1 month


179060 16-May-2008 markus

Fix and speed up timestamp calculations, roughly based on the patch in
the mentioned PR:

- bounds check time->month as it is used as an array index
- fix usage of time->month as array index (month is 1-12)
- fix calculation based on time->day (day is 1-31)
- fix the speedup code as it doesn't calculate correct timestamps before
the year 2000 and reduce the number of calculations in the year-by-year code
- speed up month calculations by replacing the array content with cumulative
values
- add microseconds calculation
- fix an endian problem
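
Several of the points above (bounds-checking the month, 1-based month
and day fields, a cumulative month table) can be sketched together. This
is an illustrative day-count routine with hypothetical names, restricted
to years from 1970 on; it is not the committed code.

```c
/* Days before the start of each month in a non-leap year. */
static const int cumulative_days[12] = {
	0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334
};

static int
is_leap_year(int year)
{
	return ((year % 4 == 0 && year % 100 != 0) || year % 400 == 0);
}

/*
 * Days from 1970-01-01 to the given date, or -1 if a field is out of
 * range. Assumption: year >= 1970; month is 1-12 and day is 1-31.
 */
long
days_since_epoch(int year, int month, int day)
{
	long days = 0;
	int y;

	/* month indexes an array, so it must be range-checked first. */
	if (year < 1970 || month < 1 || month > 12 || day < 1 || day > 31)
		return (-1);
	for (y = 1970; y < year; y++)
		days += is_leap_year(y) ? 366 : 365;
	days += cumulative_days[month - 1];	/* month is 1-based */
	if (month > 2 && is_leap_year(year))
		days++;
	days += day - 1;			/* day is 1-based */
	return (days);
}
```

The cumulative table replaces a per-month summation loop with a single
lookup, which is the kind of speedup the list describes.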

PR: kern/97786
Submitted by: Andriy Gapon <avg@topspin.kiev.ua>
Reviewed by: scottl (earlier version)
Approved by: emax (mentor)
MFC after: 1 week


179030 15-May-2008 attilio

lockinit() can't accept LK_EXCLUSIVE as an initialization flag, so just
drop it.

Reported by: Josh Carroll <josh dot carroll at gmail dot com>
Submitted by: jhb


178834 07-May-2008 jhb

Don't explicitly drop Giant around d_open/d_fdopen/d_close for MPSAFE
drivers. Since devfs is already marked MPSAFE it shouldn't be held
anyway.

MFC after: 2 weeks
Discussed with: phk


178822 07-May-2008 daichi

- change function name from *_vdir to *_vnode because
VSOCK has been added as cache target. Now they process
not only VDIR but also VSOCK.
- fixed panic issue caused by cache incorrect free process
by "umount -f"

Submitted by: Masanori OZAWA <ozawa@ongs.co.jp>
MFC after: 1 week


178491 25-Apr-2008 daichi

o Fixed multi thread access issue reported by Alexander V. Chernikov
(admin@su29.net)
fixed: kern/109950

PR: kern/109950
Submitted by: Alexander V. Chernikov (admin@su29.net)
Reviewed by: Masanori OZAWA (ozawa@ongs.co.jp)
MFC after: 1 week


178485 25-Apr-2008 daichi

o Improved unix socket connection issue
fixed: kern/118346

PR: kern/118346
Submitted by: Masanori OZAWA (ozawa@ongs.co.jp)
MFC after: 1 week


178484 25-Apr-2008 daichi

o Fixed rename panic issue

Submitted by: Masanori OZAWA (ozawa@ongs.co.jp)
MFC after: 1 week


178483 25-Apr-2008 daichi

o Fixed inaccessible issue especially including devfs on unionfs case.
fixed also: kern/117829

PR: kern/117829
Submitted by: Masanori OZAWA (ozawa@ongs.co.jp)
MFC after: 1 week


178478 25-Apr-2008 daichi

o Added system hang-up process when VOP_READDIR of unionfs_nodeget()
returns not end of the file status on debug mode (DIAGNOSTIC defined)
kernel.

Submitted by: Masanori OZAWA (ozawa@ongs.co.jp)
MFC after: 1 week


178243 16-Apr-2008 kib

Move the head of byte-level advisory lock list from the
filesystem-specific vnode data to the struct vnode. Provide the
default implementation for the vop_advlock and vop_advlockasync.
Purge the locks on the vnode reclaim by using the lf_purgelocks().
The default implementation is augmented for the nfs and smbfs.
In the nfs_advlock, push the Giant inside the nfs_dolock.

Before the change, the vop_advlock and vop_advlockasync have taken the
unlocked vnode and dereferenced the fs-private inode data, racing with
the vnode reclamation due to forced unmount. Now, the vop_getattr
under the shared vnode lock is used to obtain the inode size, and
later, in the lf_advlockasync, after locking the vnode interlock, the
VI_DOOMED flag is checked to prevent an operation on the doomed vnode.

The implementation of the lf_purgelocks() is submitted by dfr.

Reported by: kris
Tested by: kris, pho
Discussed with: jeff, dfr
MFC after: 2 weeks


178195 14-Apr-2008 dfr

When calling lf_advlock to unlock a record, make sure that ap->a_fl->l_type
is F_UNLCK otherwise we trigger a LOCKF_DEBUG panic.

MFC after: 3 days


177957 06-Apr-2008 attilio

Optimize lockmgr in order to get rid of the pool mutex interlock, of the
state transitioning flags and of msleep(9) calls.
Use, instead, an algorithm very similar to what sx(9) and rwlock(9)
already do and direct accesses to the sleepqueue(9) primitive.

In order to avoid writer starvation a mechanism very similar to what
rwlock(9) uses now is implemented, with the corresponding per-thread
shared lockmgrs counter.

This patch also adds 2 new functions to lockmgr KPI: lockmgr_rw() and
lockmgr_args_rw(). These two are like the 2 "normal" versions, but they
both accept a rwlock as interlock. In order to realize this, the general
lockmgr manager function "__lockmgr_args()" has been implemented through
the generic lock layer. It supports all the blocking primitives, but
currently only these 2 mappers live.

The patch drops the support for WITNESS for now, but it will probably be
added soon. Also, there is a small race in the draining code which is
also present in the current CVS stock implementation: if some sharers,
once they wake up, are in the runqueue they can contend for the lock with
the exclusive drainer. This is hard to fix, but the now committed
code mitigates this issue a lot better than the (past) CVS version.
In addition, the KA_HELD and KA_UNHELD assertions have been made mute
because they are dangerous and will soon no longer be
supported.

In order to avoid namespace pollution, stack.h is split into two
parts: one which includes only the "struct stack" definition (_stack.h)
and one defining the KPI. In this way, newly added _lockmgr.h can
just include _stack.h.

The kernel ABI is heavily changed by this commit (the now committed
version of "struct lock" is a lot smaller than the previous one) and
the KPI is broken by the lockmgr_rw() / lockmgr_args_rw() introduction,
so manpages and __FreeBSD_version will be updated accordingly.

Tested by: kris, pho, jeff, danger
Reviewed by: jeff
Sponsored by: Google, Summer of Code program 2007


177910 04-Apr-2008 kib

The temporary workaround for the call to the vget() without lock type in
the fdesc_allocvp(). The caller of the fdesc_allocvp() expects that the
returned vnode is not reclaimed. Do lock the vnode exclusive and drop
the lock after.

Reported by: pho
Reviewed by: jeff


177785 31-Mar-2008 kib

Add the support for the AT_FDCWD and fd-relative name lookups to the
namei(9).

Based on the submission by rdivacky,
sponsored by Google Summer of Code 2007
Reviewed by: rwatson, rdivacky
Tested by: pho


177725 29-Mar-2008 jeff

- Simplify null_hashget() and null_hashins() by using vref() rather
than a complex series of steps involving vget() without a lock type
to emulate the same thing.


177633 26-Mar-2008 dfr

Add the new kernel-mode NFS Lock Manager. To use it instead of the
user-mode lock manager, build a kernel with the NFSLOCKD option and
add '-k' to 'rpc_lockd_flags' in rc.conf.

Highlights include:

* Thread-safe kernel RPC client - many threads can use the same RPC
client handle safely with replies being de-multiplexed at the socket
upcall (typically driven directly by the NIC interrupt) and handed
off to whichever thread matches the reply. For UDP sockets, many RPC
clients can share the same socket. This allows the use of a single
privileged UDP port number to talk to an arbitrary number of remote
hosts.

* Single-threaded kernel RPC server. Adding support for multi-threaded
server would be relatively straightforward and would follow
approximately the Solaris KPI. A single thread should be sufficient
for the NLM since it should rarely block in normal operation.

* Kernel mode NLM server supporting cancel requests and granted
callbacks. I've tested the NLM server reasonably extensively - it
passes both my own tests and the NFS Connectathon locking tests
running on Solaris, Mac OS X and Ubuntu Linux.

* Userland NLM client supported. While the NLM server doesn't have
support for the local NFS client's locking needs, it does have to
field async replies and granted callbacks from remote NLMs that the
local client has contacted. We relay these replies to the userland
rpc.lockd over a local domain RPC socket.

* Robust deadlock detection for the local lock manager. In particular
it will detect deadlocks caused by a lock request that covers more
than one blocking request. As required by the NLM protocol, all
deadlock detection happens synchronously - a user is guaranteed that
if a lock request isn't rejected immediately, the lock will
eventually be granted. The old system allowed for a 'deferred
deadlock' condition where a blocked lock request could wake up and
find that some other deadlock-causing lock owner had beaten them to
the lock.

* Since both local and remote locks are managed by the same kernel
locking code, local and remote processes can safely use file locks
for mutual exclusion. Local processes have no fairness advantage
compared to remote processes when contending to lock a region that
has just been unlocked - the local lock manager enforces a strict
first-come first-served model for both local and remote lockers.

Sponsored by: Isilon Systems
PR: 95247 107555 115524 116679
MFC after: 2 weeks


177493 22-Mar-2008 jeff

- Complete part of the unfinished bufobj work by consistently using
BO_LOCK/UNLOCK/MTX when manipulating the bufobj.
- Create a new lock in the bufobj to lock bufobj fields independently.
This leaves the vnode interlock as an 'identity' lock while the bufobj
is an io lock. The bufobj lock is ordered before the vnode interlock
and also before the mnt ilock.
- Exploit this new lock order to simplify softdep_check_suspend().
- A few sync related functions are marked with a new XXX to note that
we may not properly interlock against a non-zero bv_cnt when
attempting to sync all vnodes on a mountlist. I do not believe this
race is important. If I'm wrong this will make these locations easier
to find.

Reviewed by: kib (earlier diff)
Tested by: kris, pho (earlier diff)


177458 20-Mar-2008 kib

Do not dereference cdev->si_cdevsw, use the dev_refthread() to properly
obtain the reference. In particular, this fixes the panic reported in
the PR. Remove the comments stating that this needs to be done.

PR: kern/119422
MFC after: 1 week


177091 12-Mar-2008 jeff

Remove kernel support for M:N threading.

While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.


176745 02-Mar-2008 rwatson

Replace lockmgr lock protecting nwfs vnode hash table with an sx lock.

MFC after: 1 month


176744 02-Mar-2008 rwatson

Replace lockmgr lock protecting smbfs node hash table with sx lock.

MFC after: 1 month


176708 01-Mar-2008 attilio

- Handle buffer lock waiters count directly in the buffer cache instead
of relying on the lockmgr support [1]:
* bump the waiters only if the interlock is held
* let brelvp() return the waiters count
* rely on brelvp() instead of BUF_LOCKWAITERS() in order to check
for the waiters number
- Remove a namespace pollution introduced recently with lockmgr.h
including lock.h by including lock.h directly in the consumers and
making it mandatory for using lockmgr.
- Modify flags accepted by lockinit():
* introduce LK_NOPROFILE which disables lock profiling for the
specified lockmgr
* introduce LK_QUIET which disables ktr tracing for the specified
lockmgr [2]
* disallow LK_SLEEPFAIL and LK_NOWAIT to be passed there so that it
can only be used on a per-instance basis
- Remove BUF_LOCKWAITERS() and lockwaiters() as they are no longer
used

This patch breaks the KPI, so __FreeBSD_version will be bumped and manpages
updated by further commits. Additionally, the 'struct buf' changes also
disturb the ABI.

[2] Really, currently there is no ktr tracing in the lockmgr, but it
will be added soon.

[1] Submitted by: kib
Tested by: pho, Andrea Barberio <insomniac at slackware dot it>


176583 26-Feb-2008 kib

Rename fdescfs vnode from "fdesc" to "fdescfs" to avoid name collision
of the vnode lock with the fdesc_mtx mutex. Having different kinds of
locks with the same name confuses witness.


176578 26-Feb-2008 rwatson

Add "Make MPSAFE" to the Coda todo list.

MFC after: 3 days


176559 25-Feb-2008 attilio

Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is
always curthread.

As KPI gets broken by this patch, manpages and __FreeBSD_version will be
updated by further commits.

Tested by: Andrea Barberio <insomniac at slackware dot it>


176519 24-Feb-2008 attilio

Introduce some functions in the vnode locks namespace and in the ffs
namespace in order to handle lockmgr fields in a controlled way instead
of spreading bogus stubs all around:
- VN_LOCK_AREC() allows lock recursion for a specified vnode
- VN_LOCK_ASHARE() allows lock sharing for a specified vnode

In FFS land:
- BUF_AREC() allows lock recursion for a specified buffer lock
- BUF_NOREC() disallows recursion for a specified buffer lock

Side note: union_subr.c::unionfs_node_update() is the only other function
directly handling lockmgr fields. As this is not simple to fix, it has
been left behind as "sole" exception.


176431 21-Feb-2008 marcel

Don't check the bpbSecPerTrack and bpbHeads fields of the BPB.
They are typically 0 on new ia64 systems. Since we don't use
either field, there's no harm in not checking.


176363 17-Feb-2008 rwatson

Remove custom queue macros in Coda, replacing them with queue(9) tailq
macros. The only semantic change was the need to add a vc_opened field
to struct vcomm since we can no longer use the request queue returning
to an uninitialized state to hold whether or not the device is open.

MFC after: 1 month


176362 17-Feb-2008 rwatson

Remove namecache performance-tuning todo for Coda: we now use the FreeBSD
name cache.

MFC after: 1 month


176309 15-Feb-2008 rwatson

The possibly interruptible msleep in coda_call() means well, but is
fundamentally fairly confused about how signals work and when it is
appropriate for upcalls to be interrupted. In particular, we should
be exempting certain upcalls from interruption, we should not always
eventually time out sleeping on a upcall, and we should not be
interrupting the sleep for certain signals that we currently are
(including SIGINFO). This code needs to be reworked in the style of
NFS interruptible mounts.

MFC after: 1 month


176308 15-Feb-2008 rwatson

Spell replys as replies.

MFC after: 1 month


176307 15-Feb-2008 rwatson

Reorder and clean up make_coda_node(), annotate weaknesses in the
implementation.

MFC after: 1 month


176263 14-Feb-2008 rwatson

Remove debugging code under OLD_DIAGNOSTIC; this is all >10 years old and
hasn't been used in that time.

MFC after: 1 month


176262 14-Feb-2008 rwatson

In Coda, flush the attribute cache for a cnode when its fid is
changed, as its synthesized inode number may have changed and we
want stat(2) to pick up the new inode number.

MFC after: 1 month


176248 13-Feb-2008 rwatson

Update cache flushing behavior in light of recent namecache and
access cache improvements:

- Flush just access control state on CODA_PURGEUSER, not the full
namecache for /coda.

- When replacing a fid on a cnode as a result of, e.g.,
reintegration after offline operation, we no longer need to
purge the namecache entries associated with its vnode.

MFC after: 1 month


176238 13-Feb-2008 rwatson

Implement a rudimentary access cache for the Coda kernel module,
modeled on the access cache found in NFS, smbfs, and the Linux coda
module. This is a positive access cache of a single entry per file,
tracking recently granted rights, but unlike NFS and smbfs,
supporting explicit invalidation by the distributed file system.

For each cnode, maintain a C_ACCCACHE flag indicating the validity
of the cache, and a cached uid and mode tracking recently granted
positive access control decisions.

Prefer the cache to venus_access() in VOP_ACCESS() if it is valid,
and when we must fall back to venus_access(), update the cache.

Allow Venus to clear the access cache, either the whole cache on
CODA_FLUSH, or just entries for a specific uid on CODA_PURGEUSER.
Unlike the Coda module on Linux, we don't flush all entries on a
user purge using a generation number, we instead walk present
cnodes and clear only entries for the specific user, meaning it is
somewhat more expensive but won't hit all users.

Since the Coda module is aggressive about not keeping around
unopened cnodes, the utility of the cache is somewhat limited for
files, but works well for directories. We should make Coda less
aggressive about GCing cnodes in VOP_INACTIVE() in order to improve
the effectiveness of in-kernel caching of attributes and access
rights.

MFC after: 1 month
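
Illustration for r176238: a minimal userland model of the single-entry
positive access cache described above. The struct layout, the C_ACCCACHE
value, the function names, and the overwrite-on-grant policy are all
assumptions made for the sketch; the real cnode fields are not shown in
this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define C_ACCCACHE	0x1	/* cache entry is valid (illustrative value) */

/* Hypothetical stand-in for the per-file state kept in a cnode. */
struct cnode_model {
	int	flags;
	uid_t	cached_uid;	/* user the cached rights belong to */
	int	cached_mode;	/* rights most recently granted to that user */
};

/* True only if the cache proves uid was already granted every bit in mode. */
static bool
acc_cache_hit(const struct cnode_model *cp, uid_t uid, int mode)
{
	assert(cp != NULL);
	return ((cp->flags & C_ACCCACHE) != 0 && cp->cached_uid == uid &&
	    (cp->cached_mode & mode) == mode);
}

/* Record a positive decision after falling back to the slow path. */
static void
acc_cache_grant(struct cnode_model *cp, uid_t uid, int mode)
{
	assert(cp != NULL);
	cp->cached_uid = uid;
	cp->cached_mode = mode;
	cp->flags |= C_ACCCACHE;
}

/* Purge-by-user: walk all nodes and clear only entries for this uid. */
static void
acc_cache_purge_user(struct cnode_model *nodes, size_t n, uid_t uid)
{
	for (size_t i = 0; i < n; i++) {
		if ((nodes[i].flags & C_ACCCACHE) != 0 &&
		    nodes[i].cached_uid == uid)
			nodes[i].flags &= ~C_ACCCACHE;
	}
}
```

The walk in acc_cache_purge_user() is linear in the number of nodes, which
is the cost the commit message trades for not invalidating other users'
entries.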


176234 13-Feb-2008 rwatson

Remove now-unused Coda namecache.

MFC after: 1 month


176233 13-Feb-2008 rwatson

Rather than having the Coda module use its own namecache, use the global
VFS namecache, as is done by the Coda module on Linux. Unlike the Coda
namecache, the global VFS namecache isn't tagged by credential, so use
more conservative flushing behavior (for now) when CODA_PURGEUSER is
issued by Venus.

This improves overall integration with the FreeBSD VFS, including
allowing __getcwd() to work better, procfs/procstat monitoring, and so
on. This improves shell behavior in many cases, and improves ".."
handling. It may lead to some slowdown until we've implemented a
specific access cache, which should net improve performance, but in the
mean time, lookup access control now always goes to Venus, whereas
previously it didn't.

MFC after: 1 month


176232 13-Feb-2008 attilio

Fix a lock leak in the ntfs locking scheme:
When ntfs_ntput() reaches 0 in the refcount the inode lockmgr is not
released and directly destroyed. Fix this by unlocking the lockmgr() even
in the case of zero-refcount.

Reported by: dougb, yar, Scot Hetzel <swhetzel at gmail dot com>
Submitted by: yar


176156 11-Feb-2008 rwatson

Clean up coda_pathconf() slightly while debugging a problem there.

MFC after: 1 month


176139 10-Feb-2008 rwatson

Since we're now actively maintaining the Coda module in the FreeBSD source
tree, restyle everything but coda.h (which is more explicitly shared
across systems) into a closer approximation to style(9).

Remove a few more unused function prototypes.

Add or clarify some comments.

MFC after: 1 month


176131 09-Feb-2008 rwatson

Various further non-functional cleanups to coda:

- Rename print_vattr to coda_print_vattr and make static, rename
print_cred to coda_print_cred.
- Remove unused coda_vop_nop.
- Add XXX comment because coda_readdir forwards to the cache vnode's
readdir rather than venus_readdir, and annotate venus_readdir as
unused.
- Rename vc_nb_* to vc_*.
- Use d_open_t, d_close_t, d_read_t, d_write_t, d_ioctl_t and d_poll_t
for prototyping vc_* as that is the intent, don't use our own
definitions.
- Rename coda_nb_statfs to coda_statfs, rename NB_SFS_SIZ to
CODA_SFS_SIZ.
- Replace one more OBE reference to NetBSD with a reference to FreeBSD.
- Tidy up a little vertical whitespace here and there.
- Annotate coda_nc_zapvnode as unused.
- Remove unused vcodattach.
- Annotate VM_INTR as unused.
- Annotate that coda_fhtovp is unused and doesn't match the FreeBSD
prototype, so isn't hooked up to vfs_fhtovp. If we want NFS export of
Coda to work someday, this needs to be fixed.
- Remove unused getNewVnode.
- Remove unused coda_vget, coda_init, coda_quotactl prototypes.

MFC after: 1 month


176130 09-Feb-2008 rwatson

No reason not to maintain stats on statfs in Coda, as it's done for
other VFS operations, so uncomment the existing statistics gathering.

MFC after: 1 month


176129 09-Feb-2008 rwatson

Remove unused devtomp(), which exploited UFS-specific knowledge to find
the mountpoint for a specific device. This was implemented incorrectly,
a bad idea in a fundamental sense, and also never used, so presumably
a long-idle debugging function.

MFC after: 1 month


176127 09-Feb-2008 rwatson

Since Coda is effectively a stacked file system, use VOP_EOPNOTSUPP
for vop_bmap; delete the existing stub that returned either EINVAL
or EOPNOTSUPP, and had unreachable calls to VOP_BMAP on the cache
vnode.

MFC after: 1 month


176122 09-Feb-2008 rwatson

Lock cache vnode when VOP_FSYNC() is called on a Coda vnode.

MFC after: 1 month


176121 09-Feb-2008 rwatson

Make all calls to vn_lock() in Coda, including recently added ones,
use LK_RETRY, since failure is undesirable (and not handled).

MFC after: 1 month
Pointed out by: kib


176120 08-Feb-2008 rwatson

The Coda module was originally ported to NetBSD from Mach by rvb, and
then later to FreeBSD. Update various NetBSD-related comments: in some
cases delete them because they don't apply, in others update to say
FreeBSD as they still apply but in FreeBSD (and might for that matter
no longer apply on NetBSD), and flag one case where I'm not sure
whether it applies.

MFC after: 1 month


176118 08-Feb-2008 rwatson

Before invoking vnode operations on cache vnodes, acquire the vnode
locks of those vnodes. Probably, Coda should do the same lock sharing/
pass-through that is done for nullfs, but in the mean time this ensures
that locks are adequately held to prevent corruption of data structures
in the cache file system.

Assuming most operations came from the top layer of Coda and weren't
performed directly on the cache vnodes, in practice this corruption was
relatively unlikely as the Coda vnode locks were ensuring exclusive
access for most consumers.

This causes WITNESS to squeal like a pig immediately when Coda is used,
rather than waiting until file close; I noticed these problems because
of the lack of said squealing.

MFC after: 1 month


176117 08-Feb-2008 rwatson

Remove undefined code excluded by #if 1 #else, which previously protected
vget() calls using inode numbers to query the root of /coda, which is not
needed since we now cache the root vnode with the mountpoint.

MFC after: 1 month


176116 08-Feb-2008 attilio

Convert all explicit instances of VOP_ISLOCKED(arg, NULL) into
VOP_ISLOCKED(arg, curthread). Now, VOP_ISLOCKED() and lockstatus() should
only accept curthread as argument; this will lead to axing the additional
argument from both functions, making the code cleaner.

Reviewed by: jeff, kib


175679 26-Jan-2008 rwatson

Remove Giant acquisition around soreceive() and sosend() in fifofs. The
bug that caused us to reintroduce it is believed to be fixed, and Kris
says he no longer sees problems with fifofs in highly parallel builds.
If this works out, we'll MFC it for 7.1.

MFC after: 3 months
Pointed out by: kris


175635 24-Jan-2008 attilio

Cleanup lockmgr interface and exported KPI:
- Remove the "thread" argument from the lockmgr() function as it is
always curthread now
- Axe lockcount() function as it is no longer used
- Axe LOCKMGR_ASSERT() as it is really bogus and not currently used.
Hopefully this will soon be replaced by something suitable.
- Remove the prototype for dumplockinfo() as the function is no longer
present

Additionally:
- Introduce a KASSERT() in lockstatus() in order to let it accept only
curthread or NULL as they should only be passed
- Do a little bit of style(9) cleanup on lockmgr.h

The KPI is heavily broken by this change, so manpages and
FreeBSD_version will be modified accordingly by further commits.

Tested by: matteo


175545 21-Jan-2008 rwatson

Put "coda_rdwr: Internally Opening" printf generated by in-kernel writes
to files, such as ktrace output, under CODA_VERBOSE. Otherwise, each
such call to VOP_WRITE() results in a kernel printf.

MFC after: 3 days
Obtained from: NetBSD


175544 21-Jan-2008 rwatson

Replace references to VOP_LOCK() w/o LK_RETRY to vn_lock() with LK_RETRY,
avoiding extra error handling, or in some cases, missing error handling.

MFC after: 3 days
Discussed with: kib


175498 19-Jan-2008 rwatson

Remove unused oldhash definition from Coda namecache.

MFC after: 3 days


175482 19-Jan-2008 rwatson

Improve default vnode operation handling for Coda:

- Don't specify vnode operations for mknod, lease, and advlock--let them
fall through to vop_default.

- Implement vop_default with &default_vnodeops, rather than with VOP_PANIC,
so that unimplemented vnode operations are handled in more sensible ways
than panicking, such as EOPNOTSUPP on ACL queries generated by bsdtar,
or mknod.

MFC after: 3 days


175481 19-Jan-2008 rwatson

Rework coda_statfs(): no longer need to zero the statfs structure or
fill out all fields, just fill out the ones the file system knows
about. Among other things, this causes the output of "mount" and
"df" to make quite a bit more sense as /dev/cfs0 is specified as the
mountfrom name.

MFC after: 3 days


175479 19-Jan-2008 rwatson

Zero mi_rootvp and coda_ctlvp immediately after calling vrele() on the
vnodes during coda_unmount() in order to detect errant use of them
after the vnode references may no longer be valid.

No need to clear the VV_ROOT flag on mi_rootvp (especially after
the vnode reference is no longer valid) as this isn't done on other
file systems.

MFC after: 3 days


175478 19-Jan-2008 rwatson

Don't acquire an additional vnode reference to a vnode when it is opened
and then release it when it is closed: we rely on the caller to keep the
vnode around with a valid reference. This avoids vrele() destroying the
vnode vop_close() is being called from during a call to vop_close(), and
a crash due to lockmgr recursing the vnode lock when a Coda unmount
occurs.

MFC after: 3 days


175476 19-Jan-2008 rwatson

Don't declare functions as extern.

Move all extern variable definitions to associated .h files, move some
extern variable definitions between include files to place them more
appropriately.

MFC after: 3 days


175475 19-Jan-2008 rwatson

Use VOP_NULL rather than VOP_PANIC for Coda's vop_print routine, so as
to avoid panicking in DDB show lockedvnods.

MFC after: 3 days


175474 19-Jan-2008 rwatson

Lock the new directory vnode returned by coda_mkdir(), as this is required
by FreeBSD's vnode locking protocol.

MFC after: 3 days


175473 19-Jan-2008 rwatson

Borrow the VM object associated with an underlying cache vnode with the
Coda vnode derived from it, in the style of nullfs. This allows files
in the Coda file system to be memory-mapped, such as with execve(2) or
mmap(2).

MFC after: 3 days
Reported by: Rune <u+openafsdev-sr55 at chalmers dot se>


175436 18-Jan-2008 kib

udf_vget() shall vgone() the vnode when the file_entry cannot be allocated
or read from the volume. Otherwise, half-constructed vnode could be found
later and cause panic when accessed.

PR: 118322
MFC after: 1 week


175294 13-Jan-2008 attilio

VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjunction with a 'thread' argument which is always curthread.
Remove the useless extra argument and explicitly pass curthread to lower
layer functions, when necessary.

The KPI is broken by this change, which should affect several ports, so
a version bump and manpage update will be committed later.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>


175202 10-Jan-2008 attilio

vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This change makes the code cleaner and, in
particular, removes an annoying dependency, helping the next lockmgr()
cleanup. The KPI, obviously, changes.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, it is worth saying that the next commits will address
a similar cleanup of the VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>


175166 08-Jan-2008 attilio

Remove explicit calling of lockmgr() with the NULL argument.
Now, the lockmgr() function can only be called passing curthread and the
KASSERT() is updated accordingly.

In order to support on-the-fly owner switching, the new function
lockmgr_disown() has been introduced and gets used in BUF_KERNPROC().
The KPI thus changes and the FreeBSD version will be bumped soon.
Unlike the previous code, we assume the idle thread cannot try to
acquire the lockmgr as it cannot sleep, so drop the related check[1]
in BUF_KERNPROC().

Tested by: kris

[1] kib asked for a KASSERT in the lockmgr_disown() about this
condition, but after thinking about it, as this is a well known general
rule, I found it not really necessary.


175151 08-Jan-2008 jhb

Lock the vnode interlock while reading v_usecount to update si_usecount
in a cdev in devfs_reclaim().

MFC after: 3 days
Reviewed by: jeff (a while ago)


175140 07-Jan-2008 jhb

Make ftruncate a 'struct file' operation rather than a vnode operation.
This makes it possible to support ftruncate() on non-vnode file types in
the future.
- 'struct fileops' grows a 'fo_truncate' method to handle an ftruncate() on
a given file descriptor.
- ftruncate() moves to kern/sys_generic.c and now just fetches a file
object and invokes fo_truncate().
- The vnode-specific portions of ftruncate() move to vn_truncate() in
vfs_vnops.c which implements fo_truncate() for vnode file types.
- Non-vnode file types return EINVAL in their fo_truncate() method.

Submitted by: rwatson
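
Illustration for r175140: a minimal userland model of dispatching
ftruncate() through a per-file-type operations vector. The *_model names
and the reduced struct layouts are hypothetical; only the fo_truncate
method name and the EINVAL result for non-vnode types come from the commit
message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

struct file_model;

/* Reduced stand-in for 'struct fileops': one method only. */
struct fileops_model {
	int	(*fo_truncate)(struct file_model *fp, off_t length);
};

/* Reduced stand-in for 'struct file'. */
struct file_model {
	const struct fileops_model *f_ops;
	off_t	f_size;
};

/* Vnode-backed files: the type-specific work lives here. */
static int
vn_truncate_model(struct file_model *fp, off_t length)
{
	fp->f_size = length;
	return (0);
}

/* Non-vnode file types reject the operation. */
static int
invalid_truncate_model(struct file_model *fp, off_t length)
{
	(void)fp;
	(void)length;
	return (EINVAL);
}

static const struct fileops_model vnops_model = {
	.fo_truncate = vn_truncate_model
};
static const struct fileops_model socketops_model = {
	.fo_truncate = invalid_truncate_model
};

/* The generic entry point only fetches the file and calls its method. */
static int
ftruncate_model(struct file_model *fp, off_t length)
{
	assert(fp != NULL && fp->f_ops != NULL);
	return (fp->f_ops->fo_truncate(fp, length));
}
```

Supporting truncation on a new file type then only requires a different
fo_truncate entry in that type's vector.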


175137 07-Jan-2008 attilio

g_vfs_close() wants the sx topology lock held while executing, so just
add correct locking to the operation of unmounting.
This will prevent debugging kernels from panicking if mounting a
non-hpfs partition (I'm not sure if this can be a problem with a
successful mounting operation though).

MFC: 3 days


174988 30-Dec-2007 jeff

Remove explicit locking of struct file.
- Introduce a finit() which is used to initialize the fields of struct file
in such a way that the ops vector is only valid after the data, type,
and flags are valid.
- Protect f_flag and f_count with atomic operations.
- Remove the global list of all files and associated accounting.
- Rewrite the unp garbage collection such that it no longer requires
the global list of all files and instead uses a list of all unp sockets.
- Mark sockets in the accept queue so we don't incorrectly gc them.

Tested by: kris, pho
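
Illustration for r174988: a minimal C11 sketch of the publication order
described for finit(), where the ops vector becomes visible only after
data, type and flags are set. It uses <stdatomic.h> instead of the kernel's
atomic primitives, and every *_model name is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct fops_model {
	int	placeholder;	/* methods omitted in this sketch */
};

struct file_model {
	void		*f_data;
	unsigned int	 f_flag;
	short		 f_type;
	_Atomic(const struct fops_model *) f_ops;	/* published last */
};

static void
finit_model(struct file_model *fp, unsigned int flag, short type,
    void *data, const struct fops_model *ops)
{
	assert(fp != NULL);
	fp->f_data = data;
	fp->f_flag = flag;
	fp->f_type = type;
	/* Release store: the plain stores above happen-before this one. */
	atomic_store_explicit(&fp->f_ops, ops, memory_order_release);
}

/* Returns NULL until the file has been fully initialized. */
static const struct fops_model *
file_ops_if_ready(struct file_model *fp)
{
	assert(fp != NULL);
	return (atomic_load_explicit(&fp->f_ops, memory_order_acquire));
}
```

A reader that observes a non-NULL ops pointer through the acquire load can
then read f_data, f_flag and f_type without taking a lock.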


174951 28-Dec-2007 attilio

Trim out the now unused option LK_EXCLUPGRADE from the lockmgr namespace.
This option just adds complexity and the new implementation will no longer
support it, so axing it now that it is unused is probably the
best idea.

FreeBSD version is bumped in order to reflect the KPI breakage introduced
by this patch.

In the ports tree, kris found that only old OSKit code uses it, but as
it is thought to work only on the 2.x kernel series, the version bump will
solve any problem.


174898 25-Dec-2007 rwatson

Add a new 'why' argument to kdb_enter(), and a set of constants to use
for that argument. This will allow DDB to detect the broad category of
reason why the debugger has been entered, which it can use for the
purposes of deciding which DDB script to run.

Assign approximate why values to all current consumers of the
kdb_enter() interface.


174538 11-Dec-2007 markus

Fix calculation of descriptor tag checksums. According to ECMA-167, Part 4,
7.2.3, bytes 0-3 and 5-15 are used to calculate the checksum of a descriptor
tag.

PR: kern/90521
Submitted by: Björn König <bkoenig@cs.tu-berlin.de>
Reviewed by: scottl
Approved by: emax (mentor)
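
Illustration for r174538: a minimal sketch of the rule quoted above, summing
bytes 0-3 and 5-15 of the 16-byte descriptor tag modulo 256 and skipping
byte 4, which holds the checksum itself. The function and macro names are
hypothetical and the tag is treated as a raw byte array rather than the
kernel's struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAG_SIZE	16	/* descriptor tag length in bytes */
#define TAG_CKSUM_OFF	4	/* byte that stores the checksum */

static uint8_t
tag_checksum(const uint8_t tag[TAG_SIZE])
{
	uint8_t sum = 0;	/* uint8_t arithmetic wraps modulo 256 */
	size_t i;

	assert(tag != NULL);
	for (i = 0; i < TAG_SIZE; i++) {
		if (i == TAG_CKSUM_OFF)
			continue;	/* never include the checksum byte */
		sum += tag[i];
	}
	return (sum);
}
```

A tag would be accepted when tag_checksum(tag) equals tag[TAG_CKSUM_OFF].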


174384 07-Dec-2007 delphij

Turn MPASS(0) into a panic with a more obvious reason why the assertion
failed.


174379 06-Dec-2007 delphij

size_max should be unsigned, as such, use size_t here.


174265 04-Dec-2007 wkoszek

Explicitly initialize 'error' to 0 (two places). This lets one build tmpfs
from the latest source tree with an older compiler, gcc3.

Reviewed by: kib@ (on freebsd-current@)
Approved by: cognet@ (mentor)


173728 18-Nov-2007 maxim

o English lesson from bde@: "iff" is not a typo, it means "if and only if".
Backout previous.


173725 18-Nov-2007 delphij

MFp4: Several fixes to tmpfs which make it survive pho@'s
stress2 suite; to quote his letter, this change:

1. It removes the tn_lookup_dirent stuff. I think this cannot be fixed,
because nothing protects vnode/tmpfs node between lookup is done, and
actual operation is performed, in the case the vnode lock is dropped.
At least, this is the case with the from vnode for rename.

For now, we do the linear lookup in the parent node. This has its own
drawbacks. Not mentioning speed (that could be fixed by using hash), the
real problem is the situation where several hardlinks exist in the dvp.
But, I think this is fixable.

2. The patch restores the VV_ROOT flag on the root vnode after it became
reclaimed and allocated again. This fixes MPASS assertion at the start
of the tmpfs_lookup() reported by many.

Submitted by: kib


173724 18-Nov-2007 delphij

MFp4: Fix several style(9) bugs.

Submitted by: des


173695 17-Nov-2007 maxim

o Mask maximum file permissions we get from mount_ntfs -m
with ACCESSPERMS. Document in mount_ntfs(8) that only the nine
low-order bits of mask are used (taken from mount_msdosfs(8)).

PR: kern/114856
Submitted by: Ighighi
MFC after: 1 month
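
Illustration for r173695: a minimal sketch of keeping only the nine
low-order permission bits of a user-supplied mask. The helper name is
hypothetical, and the fallback definition of ACCESSPERMS is only used on
systems whose <sys/stat.h> does not provide it.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>

#ifndef ACCESSPERMS
#define ACCESSPERMS	(S_IRWXU | S_IRWXG | S_IRWXO)	/* 0777 */
#endif

/* Drop setuid/setgid/sticky and file-type bits from a requested mask. */
static mode_t
clamp_mount_mode(unsigned int requested)
{
	return ((mode_t)(requested & ACCESSPERMS));
}
```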


173690 17-Nov-2007 maxim

o Fix a typo in the comment.


173590 13-Nov-2007 maxim

o Do not leak inodes hash table at module unload.

PR: kern/118017
Submitted by: Ighighi
MFC after: 1 week


173570 12-Nov-2007 delphij

Correct a stack overflow which will trigger panics when
mode= is specified, caused by incorrect format string
specified to vfs_scanopt() and subsequently vsscanf().

Pointed out by: kib
Submitted by: des


172954 25-Oct-2007 trhodes

Remove some debugging code that, while useful, doesn't belong in the committed
version. While here, expand a macro only used once.

Discussed with/oked by: bde


172930 24-Oct-2007 rwatson

Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

mac_<object>_<method/action>
mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme. Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier. Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods. Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updated as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer


172883 22-Oct-2007 delphij

Fixes to msdosfs dirtyflag related stuff:

- markvoldirty() needs to write to underlying GEOM provider. We
have to do that *before* g_access() which sets the GEOM provider
to read-only.
- Remove dirty flag before free'ing iconv related resources. The
dirty flag removal could fail, and it is hard to revert the
iconv-free after the fail.
- Mark volume as dirty if we have failed to mark it clean, to be safe.
- Other style fixes to the touched functions.


172798 19-Oct-2007 bde

Implement the async (really, delayed-write) mount option for msdosfs.

This is much simpler than for ffs since there are many fewer places
where we need to choose between a delayed write and a sync write --
just 5 in msdosfs and more than 30 in ffs.

This is more complete and correct than in ffs. Several places in ffs
are still missing the choice. ffs_update() has a layering violation
that breaks callers which want to force a sync update (mainly fsync(2)
and O_SYNC write(2)).

However, fsync(2) and O_SYNC write(2) are still more broken than in
ffs, since they are broken for default (non-sync non-async) mounts
too. Both fail to sync the FAT in all cases, and both fail to sync
the directory entry in some cases after losing a race. Async everything
is probably safer than the half-baked sync of metadata given by default
mounts.


172758 18-Oct-2007 bde

Add noclusterr and noclusterw options to the options list. I forgot these
when I implemented clustering.


172757 18-Oct-2007 bde

Fix some style bugs in the mount options list. Mainly, sort the list,
leaving space for adding missing options. Negative options are sorted
after removing their "no" prefix, and generic options are sorted before
msdosfs-specific ones.
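
The ordering rule above can be sketched as a sort key. This is an
illustration only: option_sort_key() is a hypothetical name, the option
names are samples, and the "no" stripping is deliberately naive (it would
misfire on an option whose real name starts with "no").

```python
def option_sort_key(name, generic):
    """Sort generic options first; compare names with a leading "no" removed."""
    base = name[2:] if name.startswith("no") else name
    # False sorts before True, so generic options come first.
    return (name not in generic, base)


def sort_options(options, generic):
    return sorted(options, key=lambda n: option_sort_key(n, generic))


if __name__ == "__main__":
    # Sample names chosen for illustration, not the actual kernel list.
    generic = {"async", "noatime", "noclusterr", "noclusterw"}
    opts = ["nowin95", "noclusterw", "shortnames", "async",
            "kiconv", "noatime", "longnames", "noclusterr"]
    print(sort_options(opts, generic))
```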


172741 18-Oct-2007 bde

In msdosfs_setattr(), don't do synchronous updates of the denode
(except indirectly for the size pseudo-attribute). If anything deserves
a sync update, then it is ids and immutable flags, since these are
related to security, but ffs never synced these and msdosfs doesn't
support them. (ufs_setattr() only does an update in one case where
it is least needed (for timestamps); it did pessimal sync updates for
timestamps until 1998/03/08 but was changed for unlogged reasons related
to soft updates.)

Now msdosfs calls deupdat() with waitfor == 0, which normally gives a
delayed update to disk but always gives a sync update of timestamps
in core, while for ffs everything is delayed until the syncer daemon
or other activity causes an update (except for timestamps).

This gives a large optimization mainly for things like cp -p, where
attribute adjustment could easily triple the number of physical I/O's
if it is done synchronously (but cp -p to msdosfs is not as bad as
that, since msdosfs doesn't support many attributes so null adjustments
are more common, and msdosfs doesn't support ctimes so even if cp
doesn't weed out null adjustments they don't become non-null after
clobbering the ctime).


172697 16-Oct-2007 alfred

Get rid of qaddr_t.

Requested by: bde


172644 14-Oct-2007 daichi

This change makes nullfs work correctly with the latest unionfs.

Submitted by: Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by: jeff, kensmith
Approved by: re (kensmith)
MFC after: 1 week


172643 14-Oct-2007 daichi

Added whiteout behavior option. ``-o whiteout=always'' is the default mode
(it is established practice) and ``-o whiteout=whenneeded'' is a mode
that uses less disk space, especially for resource-restricted environments
like embedded environments. (Contributed by Ed Schouten. Thanks)

Submitted by: Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by: jeff, kensmith
Approved by: re (kensmith)
MFC after: 1 week


172642 14-Oct-2007 daichi

Default copy mode has been changed from traditional-mode to transparent-mode.
Some people who reported issues have resolved them with transparent mode.
We guess it is time to change the default copy mode. The transparent-mode is
the best in most situations.

Submitted by: Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by: jeff, kensmith
Approved by: re (kensmith)
MFC after: 1 week


172641 14-Oct-2007 daichi

Fixed un-vrele issue of upper layer root vnode of unionfs.

Submitted by: Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by: jeff, kensmith
Approved by: re (kensmith)
MFC after: 1 week


172640 14-Oct-2007 daichi

Added NULL check code pointed out by Coverity. (via Stanislav
Sedov. Thanks)

Submitted by: Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by: jeff, kensmith
Approved by: re (kensmith)
MFC after: 1 week


172639 14-Oct-2007 daichi

- It has become MPSAFE.
- Fixed lock panic issue under MPSAFE.
- Fixed panic issue whenever it locks vnode with reclaim.
- Fixed lock implementations not conforming to vnode_if.src style.

Submitted by: Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by: jeff, kensmith
Approved by: re (kensmith)
MFC after: 1 week


172638 14-Oct-2007 daichi

Fixed vnode unlock/vrele calls that were missing when errors
occurred during some operations.

Submitted by: Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by: jeff, kensmith
Approved by: re (kensmith)
MFC after: 1 week


172637 14-Oct-2007 daichi

- Added support for vfs_cache on unionfs. As a result, you can use
applications that use procfs on unionfs.
- Removed unionfs internal cache mechanism because it has
vfs_cache support instead. As a result, it just simplified code of
unionfs.
- Fixed kern/111262 issue.

Submitted by: Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by: jeff, kensmith
Approved by: re (kensmith)
MFC after: 1 week


172636 14-Oct-2007 daichi

Added handling to prevent an infinite readdir loop when used with the Linux
binary compatibility feature.

Submitted by: Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by: jeff, kensmith
Approved by: re (kensmith)
MFC after: 1 week


172635 14-Oct-2007 daichi

Changed it to free unneeded memory as soon as possible.

Submitted by: Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by: jeff, kensmith
Approved by: re (kensmith)
MFC after: 1 week


172634 14-Oct-2007 daichi

Improved access permission checks.

Submitted by: Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by: jeff, kensmith
Approved by: re (kensmith)
MFC after: 1 week


172453 05-Oct-2007 jhb

Use the correct pid when checking to see whether or not the /proc/<pid>
directory itself (rather than any of its contents) is visible to the
current thread.

MFC after: 1 week
PR: kern/90063
Submitted by: john of 8192.net
Approved by: re (kensmith)


172442 04-Oct-2007 delphij

MFp4: Provide a dummy verb "export" to silence the message
shown at startup when NFS is enabled.

Reported by: rafan
Approved by: re (tmpfs blanket)


172441 04-Oct-2007 delphij

Additional work is still needed before we can claim that tmpfs
is stable enough for production usage. Warn user upon mount.

Approved by: re (tmpfs blanket)


172303 23-Sep-2007 bde

Remove some of the pessimizations involving writing the fsi sector.
All active fields in fsi are advisory/optional, so we shouldn't do
extra work to make them valid at all times, but instead we write to
the fsi too often (we still do), and we searched for a free cluster
for fsinxtfree too often.

This commit just removes the whole search and its results, so that we
write out our in-core copy of fsinxtfree instead of writing a "fixed"
copy and clobbering our in-core copy. This saves fixing 3 bugs:
- off-by-1 error for the end of the search, resulting in fsinxtfree
not actually being adjusted iff only the last cluster is free.
- missing adjustment when no clusters are free.
- off-by-many error for the start of the search. Starting the search
at 0 instead of at (the in-core copy of) fsinxtfree did more than
defeat the reasons for existence of fsinxtfree. fsinxtfree exists
mainly to avoid having to start at 0 for just the first search per
mount, but has the side effect of reducing bias towards allocating
near cluster 0. The bias would normally only be generated by the
first search per mount (if fsinxtfree is not supported), but since
we also adjusted the in-core copy of fsinxtfree here, we were doing
extra work to maximize the bias.

Approved by: re (kensmith)
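
The first bug listed above (off-by-1 at the end of the search) can be
illustrated with a toy model. The commit removes the search rather than
fixing it, so this is not the committed code; the function names, the
in_use list and CLUST_FIRST = 2 are assumptions made for the sketch.

```python
CLUST_FIRST = 2  # assumed number of the first data cluster


def find_free_buggy(in_use, maxcluster, start=CLUST_FIRST):
    """Search that stops one short: cluster `maxcluster` is never examined."""
    for cn in range(start, maxcluster):
        if not in_use[cn]:
            return cn
    return None


def find_free_inclusive(in_use, maxcluster, start=CLUST_FIRST):
    """Same search with the end of the range included."""
    for cn in range(start, maxcluster + 1):
        if not in_use[cn]:
            return cn
    return None


if __name__ == "__main__":
    # Only the last cluster (number 10) is free.
    in_use = [True] * 10 + [False]
    print(find_free_buggy(in_use, 10), find_free_inclusive(in_use, 10))
```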


172292 21-Sep-2007 rodrigc

Disable multiple ntfs mounts to the same mountpoint.
Eliminates panics due to locking issues.
Idea taken from src/sys/gnu/fs/xfs/FreeBSD/xfs_super.c.

PR: 89966, 92000, 104393
Reported by: H. Matsuo <hiroshi50000 yahoo co jp>,
Chris <m2chrischou gmail.com>,
Andrey V. Elsukov <bu7cher yandex ru>,
Jan Henrik Sylvester <me janh de>
Approved by: re (kensmith)


172207 17-Sep-2007 jeff

- Move all of the PS_ flags into either p_flag or td_flags.
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
previously the sched_lock. These bugs have existed for some time.
- Allow swapout to try each thread in a process individually and then
swapin the whole process if any of these fail. This allows us to move
most scheduler related swap flags into td_flags.
- Keep ki_sflag for backwards compat but change all in source tools to
use the new and more correct location of P_INMEM.

Reported by: pho
Reviewed by: attilio, kib
Approved by: re (kensmith)


172027 31-Aug-2007 bde

Fix races in msdosfs_lookup() and msdosfs_readdir(). These functions
can easily block in bread(), and then there was nothing to prevent the
static buffer (nambuf_{ptr,len,last_id}) being clobbered by another
thread.

The effects of the bug seem to have been limited to failed lookups and
mangled names in readdir(), since Giant locking provides enough
serialization to prevent concurrent calls to the functions that access
the buffer. They were very obvious for multiple concurrent tree walks,
especially with a small cluster size.

The bug was introduced in msdosfs_conv.c 1.34 and associated changes,
and is in all releases starting with 5.2.

The fix is to allocate the buffer as a local variable and pass around
pointers to it like "_r" functions in libc do. Stack use from this
is large but not too large. This also fixes a memory leak on module
unload.

Reviewed by: kib
Approved by: re (kensmith)


171862 16-Aug-2007 delphij

MFp4: rework tmpfs_readdir() logic in terms of correctness.

Approved by: re (tmpfs blanket)
Tested with: fstest, fsx


171852 15-Aug-2007 jhb

On 6.x this works:

% mount | grep home
/dev/ad4s1e on /home (ufs, local, noatime, soft-updates)
% mount -u -o atime /home
% mount | grep home
/dev/ad4s1e on /home (ufs, local, soft-updates)

Restore this behavior on 7.x for the following mount options:
noatime, noclusterr, noclusterw, noexec, nosuid, nosymfollow

In addition, on 7.x, the following are equivalent:
mount -u -o atime /home
mount -u -o nonoatime /home

Ideally, when we introduce new mount options, we should avoid
options starting with "no". :)

Requested by: jhb
Reported by: Karol Kwiat <karol.kwiat gmail com>, Scott Hetzel <swhetzel gmail com>
Approved by: re (bmah)
Proxy commit for: rodrigc
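
A toy model of the observable behavior described above, assuming that
"X" and "nonoX" both cancel "noX" for the listed options. It is not the
kernel's option parser; apply_update() and NEGATABLE are hypothetical
names introduced for the sketch.

```python
# The options named in the log entry above.
NEGATABLE = ("noatime", "noclusterr", "noclusterw",
             "noexec", "nosuid", "nosymfollow")


def apply_update(current, updates):
    """Return the option set after a `mount -u -o ...` style update."""
    result = set(current)
    for opt in updates:
        if opt.startswith("nono"):
            opt = opt[4:]              # "nonoatime" -> "atime"
        if "no" + opt in NEGATABLE:
            result.discard("no" + opt)  # "atime" cancels "noatime"
        else:
            result.add(opt)
    return result


if __name__ == "__main__":
    home = {"local", "noatime", "soft-updates"}
    print(sorted(apply_update(home, ["atime"])))
    print(sorted(apply_update(home, ["nonoatime"])))
```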


171802 10-Aug-2007 delphij

MFp4:
- LK_RETRY prohibits vget() and vn_lock() from returning an error.
Remove associated code. [1]
- Properly use vhold() and vdrop() instead of their unlocked
versions, we are guaranteed to have the vnode's interlock
unheld. [1]
- Fix a pseudo-infinite loop caused by 64/32-bit arithmetic
with the same way used in modern NetBSD versions. [2]
- Reorganize tmpfs_readdir to reduce duplicated code.

Submitted by: kib [1]
Obtained from: NetBSD [2]
Approved by: re (tmpfs blanket)


171799 10-Aug-2007 delphij

MFp4:

- Respect cnflag and don't lock vnode always as LK_EXCLUSIVE [1]
- Properly lock around tn_vnode to avoid NULL dereference
- Be more careful handling vnodes (*)

(*) This is a WIP
[1] by pjd via howardsu

Thanks kib@ for his valuable VFS related comments.

Tested with: fsx, fstest, tmpfs regression test set
Found by: pho's stress2 suite
Approved by: re (tmpfs blanket)


171774 07-Aug-2007 bde

In msdosfs_read() and msdosfs_write(), don't check explicitly for
(uio_offset < 0) since this can't happen. If this happens, then the
general code handles the problem safely (better than before for reading,
returning 0 (EOF) instead of the bogus errno EINVAL, and the same as
before for writing, returning EFBIG).

In msdosfs_read(), don't check for (uio_resid < 0). msdosfs_write()
already didn't check.

In msdosfs_read(), document in a comment our assumptions that the caller
passed a valid uio_offset and uio_resid. ffs checks using KASSERT(),
and that is enough sanity checking. In the same comment, partly document
there is no need to check for the EOVERFLOW case, unlike in ffs where this
case can happen at least in theory.

In msdosfs_write(), add a comment about why the checking of
(uio_resid == 0) is explicit, unlike in ffs.

In msdosfs_write(), check for impossibly large final offsets before
checking if the file size rlimit would be exceeded, so that we don't
have an overflow bug in the rlimit check and are consistent with ffs.
We now return EFBIG instead of EFBIG plus a SIGXFSZ signal if the final
offset would be impossibly large but not so large as to cause overflow.
Overflow normally gave the benign behaviour of no signal.

Approved by: re (kensmith) (blanket)


171771 07-Aug-2007 bde

Fix and update the comments about the effect of the read-only flag on writing.
They are still too verbose.

Remove nearby unreachable code for handling symlinks.

Approved by: re (kensmith) (blanket)


171759 07-Aug-2007 bde

Fix some style bugs (don't assume that off_t == int64_t; fix some comments;
remove some parentheses; fix some whitespace errors; fix only one case of
a boolean comparison of a non-boolean).

Improve an error message by quoting ".", and by not printing large positive
values as negative ones.

Approved by: re (kensmith) (blanket)


171758 07-Aug-2007 bde

Fix some style bugs (don't assume that off_t == int64_t; fix some comments;
remove some parentheses; fix only a couple of whitespace errors).

Approved by: re (kensmith) (blanket)


171757 07-Aug-2007 bde

Fix some style bugs (mainly some whitespace errors).

Approved by: re (kensmith) (blanket)


171756 07-Aug-2007 bde

Fix some style bugs (some whitespace errors only).

Approved by: re (kensmith) (blanket)


171755 07-Aug-2007 bde

Sort includes.

Remove rotted banal comment attached to includes.

Approved by: re (kensmith) (blanket)


171754 07-Aug-2007 bde

Sort includes.

Remove banal comments attached to includes.

Approved by: re (kensmith) (blanket)


171752 07-Aug-2007 bde

Sort includes.

Remove banal comments before includes. Remove rotted banal comments attached
to includes.

Approved by: re (kensmith) (blanket)


171751 07-Aug-2007 bde

Remove unused include(s).

Remove banal comments before includes.

Approved by: re (kensmith) (blanket)


171750 07-Aug-2007 bde

Remove unused include(s).

Approved by: re (kensmith) (blanket)


171749 07-Aug-2007 bde

Include <sys/mutex.h> and its prerequisite <sys/lock.h> instead of
depending on namespace pollution in <sys/buf.h> and/or <sys/vnode.h>

Approved by: re (kensmith) (blanket)


171748 07-Aug-2007 bde

Include <sys/mutex.h>'s prerequisite <sys/lock.h> instead of depending on
namespace pollution in <sys/vnode.h>.

Sort the include of <sys/mutex.h> instead of unsorting it after
<sys/vnode.h> and depending on the pollution there.

Approved by: re (kensmith) (blanket)


171747 07-Aug-2007 bde

Remove unused include(s).

Approved by: re (kensmith) (blanket)


171731 05-Aug-2007 bde

Silently fix up the estimated next free cluster number from the fsinfo
sector, instead of failing the whole mount if it is garbage. Fields
in the fsinfo sector are only advisory, so there are better sanity
checks than this, and we already silently fix up the only other advisory
field in the fsinfo (the free cluster count).

This wasn't handled quite right in rev.1.92, 1.117, or in NetBSD. 1.92
also failed the whole mount for the non-garbage magic value 0xffffffff.
1.117 fixed this well enough in practice since garbage values shouldn't
occur in practice, but left the error handling larger and more convoluted
than necessary. Now we handle the magic value as a special case of
fixing up all out of bounds values.

Also fix up the estimated next free cluster number when there is no
fsinfo sector. We were using 0, but CLUST_FIRST is safer.

Approved by: re (kensmith)
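
The fix-up described above can be sketched as a simple range clamp. This is
a model under assumptions: fixup_nxtfree() is a hypothetical name,
CLUST_FIRST is assumed to be 2, and the exact bounds used by the kernel
may differ.

```python
CLUST_FIRST = 2  # assumed number of the first valid data cluster


def fixup_nxtfree(nxtfree, maxcluster):
    """Replace any out-of-bounds advisory value with CLUST_FIRST.

    The magic value 0xffffffff is just one more out-of-bounds value here,
    so it needs no special case.
    """
    if nxtfree < CLUST_FIRST or nxtfree > maxcluster:
        return CLUST_FIRST
    return nxtfree


if __name__ == "__main__":
    print(fixup_nxtfree(0xFFFFFFFF, 1000), fixup_nxtfree(500, 1000))
```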


171711 03-Aug-2007 bde

Oops, fix the fix for the i/o size of the fsinfo block. Its log
message explained why the size is 1 sector, but the code used a
size of 1 cluster.

I/o sizes larger than necessary may cause serious coherency problems
in the buffer cache. Here I think there were only minor efficiency
problems, since a too-large fsinfo buffer could only get far enough
to overlap buffers for the same vnode (the device vnode), so mappings
are coherent at the page level although not at the buffer level, and
the former is probably enough due to our limited use of the fsinfo
buffer.

Approved by: re (kensmith)


171704 03-Aug-2007 delphij

MFp4 - Refine locking to eliminate some potential race/panics:

- Copy before testing a pointer. This closes a race window.
- Use msleep with the node interlock instead of tsleep.
- Do proper locking around access to tn_vpstate.
- Assert vnode VOP lock for dir_{atta,de}tach to capture
inconsistent locking.

Suggested by: kib
Submitted by: delphij
Reviewed by: Howard Su
Approved by: re (tmpfs blanket)


171599 26-Jul-2007 pjd

When we do open, we should lock the vnode exclusively. This fixes few races:
- fifo race, where two threads assign v_fifoinfo,
- v_writecount modifications,
- v_object modifications,
- and probably more...

Discussed with: kib, ups
Approved by: re (rwatson)


171570 24-Jul-2007 delphij

MFp4: Force 64-bit arithmetic when calculating the maximum file size.
This fixes tmpfs calculations on 32-bit systems equipped with more than
4GB swap.

Reported by: Craig Boston <craig xfoil gank org>
PR: kern/114870
Approved by: re (tmpfs blanket)
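
The class of bug fixed here can be illustrated by emulating 32-bit and
64-bit multiplication. The actual tmpfs expression is not shown in this
log, so the page-count formula, PAGE_SIZE = 4096 and the function names
below are assumptions.

```python
PAGE_SIZE = 4096  # assumed page size


def max_file_size_32(pages):
    """Emulate the product truncated to 32 bits (wraps above 4GB)."""
    return (pages * PAGE_SIZE) & 0xFFFFFFFF


def max_file_size_64(pages):
    """Emulate the same product computed in 64-bit arithmetic."""
    return (pages * PAGE_SIZE) & 0xFFFFFFFFFFFFFFFF


if __name__ == "__main__":
    pages_5gb = (5 * 2**30) // PAGE_SIZE
    print(max_file_size_32(pages_5gb), max_file_size_64(pages_5gb))
```

With 5GB worth of pages, the 32-bit result wraps around to 1GB while the
64-bit result keeps the full value.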


171551 23-Jul-2007 bde

Make using msdosfs as the root file system sort of work:

o Initialize ownerships and permissions. They were garbage (0) for
root mounts since vfs_mountroot_try() doesn't ask for them to be set
and msdosfs's old incomplete code to set them was removed. The
garbage happened to give the correct ownerships root:wheel, but it
gave permissions 000 so init could not be execed. Use the macros
for root:wheel and 0755. (The removed code gave 0:0 and 0777. 0755
is more normal and secure, though wrong for /tmp.)

o Check the readonly flag for initial (non-MNT_UPDATE) mounts in the
correct place, as in ffs. For root mounts, it is only passed in
mp->mnt_flags, since vfs_mountroot_try() only passes it as a flag
and nothing translates the flag to the "ro" option string. msdosfs
only looked for it in the string, so it gave a rw mount for root
mounts without even clearing the flag in mp->mnt_flags, so the final
state was inconsistent. Checking the flag only in mp->mnt_flags
works for initial userland mounts too. The MNT_UPDATE case is
messier.

The main point that should work but doesn't is fsck of msdosfs root
while it is mounted ro. This needs mainly MNT_RELOAD support to work.
It should be possible to run fsck -p and succeed provided the fs is
consistent, not just for msdosfs, but this fails because fsck -p always
tries to open the device rw. The hack that allows open for writing
in ffs is not implemented in msdosfs, since without MNT_RELOAD support
writing could only be harmful. So fsck must be turned off to use
msdosfs as root. This is quite dangerous, since msdosfs is still missing
actually using its fs-dirty flag internally, so it is happy to mount
dirty filesystems rw.

Unrelated changes:
- Fix missing error handling for MNT_UPDATE from rw to ro.
- Catch up with renaming msdos to msdosfs in a string.

Approved by: re (kensmith)


171550 23-Jul-2007 delphij

MFp4: When swapping is not enabled, allow creating files by taking
physical memory pages into account for tm_maxfilesize.

Reported by: Dominique Goncalves <dominique.goncalves gmail.com>
Submitted by: Howard Su
Approved by: re (tmpfs blanket)


171523 20-Jul-2007 bde

Implement vfs clustering for msdosfs.

This gives a very large speedup for small block sizes (in my tests,
about 5 times for write and 3 times for read with a block size of 512,
if clustering is possible) and a moderate speedup for the moderately
large block sizes that should be used on non-small media (4K is the
best size in most cases, and the speedup for that is about 1.3 times
for write and 1.2 times for read). mmap() should benefit from clustering
like read()/write(), but the current implementation of vm only supports
clustering (at least for getpages) if the fs block size is >= PAGE_SIZE.

msdosfs is now only slightly slower than ffs with soft updates for
writing and slightly faster for reading when both use their best block
sizes. Writing is slower for msdosfs because of more sync writes.
Reading is faster for msdosfs because indirect blocks interfere with
clustering in ffs.

The changes in msdosfs_read() and msdosfs_write() are simpler merges
of corresponding code in ffs (after fixing some style bugs in ffs).
msdosfs_bmap() needs fs-specific code. This implementation loops
calling a lower level bmap function to do the hard parts. This is a
bit inefficient, but is efficient enough since msdosfs_bmap() is only
called when there is physical i/o to do.

Approved by: re (hrs)


171522 20-Jul-2007 bde

Clean up before implementing vfs clustering for msdosfs:

In msdosfs_read(), mainly reorder the main loop to the same order as in
ffs_read().

In msdosfs_write() and extendfile(), use vfs_bio_clrbuf() instead of
clrbuf(). I think this is just a bogus optimization, but ffs always
does it and msdosfs already did it in one place, and it is what I've
tested.

In msdosfs_write(), merge good bits from a comment in ffs_write(), and
fix 1 style bug.

In the main comment for msdosfs_pcbmap(), improve wording and catch
up with 13 years of changes in the function. This comment belongs in
VOP_BMAP.9 but that doesn't exist.

In msdosfs_bmap(), return EFBIG if the requested cluster number is out
of bounds instead of blindly truncating it, and fix many style bugs.

Approved by: re (hrs)


171518 20-Jul-2007 rwatson

Make sure we release the control vnode in Coda:

We allocate coda_ctlvp when /coda is mounted, but never release it.
During the unmount this vnode was marked as UNMOUNTING and when venus
is started a second time the system would hang, possibly waiting for
the old vnode to disappear.

So now we call vrele on the control vnode when file system is unmounted
to drop the reference we got during the mount. I'm pretty sure it is
also necessary to not skip the handling in coda_inactive for the control
vnode, it seems like that is the place we actually get rid of the vnode
once the refcount has dropped to 0.

Submitted by: Jan Harkes <jaharkes at cs dot cmu dot edu>
Approved by: re (kensmith)


171489 19-Jul-2007 delphij

MFp4: Rework on tmpfs's mapped read/write procedures. This
should finally fix fsx test case.

The printf's added here would be eventually turned into
assertions.

Submitted by: Mingyan Guo (mostly)
Approved by: re (tmpfs blanket)


171416 12-Jul-2007 rwatson

Complete repo-copy and move of Coda from src/sys/coda to src/sys/fs/coda
by removing files from src/sys/coda, and updating include paths in the
new location, kernel configuration, and Makefiles. In one case add
$FreeBSD$.

Discussed with: anderson, Jan Harkes <jaharkes@cs.cmu.edu>
Approved by: re (kensmith)
Repo-copy madness: simon


171414 12-Jul-2007 rwatson

Forced commit to recognize repo-copy of Coda files from src/sys/coda to
src/sys/fs/coda.

Discussed with: anderson, Jan Harkes <jaharkes@cs.cmu.edu>
Approved by: re (kensmith)
Repo-copy madness: simon


171408 12-Jul-2007 bde

Round up the FAT block size to a multiple of the sector size so that i/o
to the FAT is possible.

Make the FAT block size less arbitrary before it is rounded up:
- for FAT12, default to 3*512 instead of to 3 sectors. The magic 3 is
the default number of 512-byte FAT sectors on a floppy drive. That
many sectors is too many if the sector size is larger.
- for !FAT12, default to PAGE_SIZE instead of to 4096. Remove
MSDOSFS_DFLTBSIZE since it only obfuscated this 4096.

For reading the BPB, use a block size of 8192 instead of 2048 so that
sector sizes up to 8192 can work. We should try several sizes, or just
try the maximum supported size (MAXBSIZE = 64K). I use 8192 because
that is enough for DVD-RW's (even 2048 is enough) and 8192 has been
tested a lot in use by ffs.

This completes fixing msdosfs for some large sector sizes (up to 8K
for read and 64K for write). Microsoft documents support for sector
sizes up to 4K in msdosfs. ffs is currently limited to 8K for both
read and write.

Approved by: re (kensmith)
Approved by: nyan (several years ago)
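
The default-then-round-up rule described above can be sketched as follows.
This is an illustration, not the kernel code: fat_block_size() is a
hypothetical name and PAGE_SIZE is assumed to be 4096.

```python
PAGE_SIZE = 4096  # assumed page size


def roundup(x, y):
    """Round x up to the next multiple of y."""
    return ((x + y - 1) // y) * y


def fat_block_size(is_fat12, sector_size, page_size=PAGE_SIZE):
    """Pick the default FAT block size, then make it a sector multiple."""
    base = 3 * 512 if is_fat12 else page_size
    return roundup(base, sector_size)


if __name__ == "__main__":
    for fat12, secsize in [(True, 512), (True, 2048), (False, 8192)]:
        print(fat12, secsize, fat_block_size(fat12, secsize))
```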


171406 12-Jul-2007 bde

Fix some bugs involving the fsinfo block (many remain unfixed). This is
part of fixing msdosfs for large sector sizes. One of the fixed bugs
was fatal for large sector sizes.

1. The fsinfo block has size 512, but it was misunderstood and declared
as having size 1024, with nothing in the second 512 bytes except a
signature at the end. The second 512 bytes actually normally (if
the file system was created by Windows) consist of a second boot
sector which is normally (in WinXP) empty except for a signature --
the normal layout is one boot sector, one fsinfo sector, another
boot sector, then these 3 sectors duplicated. However, other
layouts are valid. newfs_msdos produces a valid layout with one
boot sector, one fsinfo sector, then these 2 sectors duplicated.
The signature check for the extra part of the fsinfo was thus
normally checking the signature in either the second boot sector
or the first boot sector in the copy, and thus accidentally
succeeding. The extra signature check would just fail for weirder
layouts with 512-byte sectors, and for normal layouts with any other
sector size.

Remove the extra bytes and the extra signature check.

2. Old versions did i/o to the fsinfo block using size 1024, with the
second half only used for the extra signature check on read. This
was harmless for sector size 512, and worked accidentally for sector
size 1024. The i/o just failed for larger sector sizes.

The version being fixed did i/o to the fsinfo block using size
fsi_size(pmp) = (1024 << ((pmp)->pm_BlkPerSec >> 2)). This
expression makes no sense. It happens to work for small
sector sizes, but for sector size 32K it gives the preposterous
value of 64M and thus causes panics. A sector size of 32768 is
necessary for at least some DVD-RW's (where the minimum write size
is 32768 although the minimum read size is 2048).

Now that the size of the fsinfo block is 512, it always fits in
one sector so there is no need for a macro to express it. Just
use the sector size where the old code uses 1024.

Approved by: re (kensmith)
Approved by: nyan (several years ago for a different version of (2))
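
The arithmetic behind the old fsi_size() expression quoted above can be
checked directly. The sketch assumes that pm_BlkPerSec is the sector size
divided by a 512-byte DEV_BSIZE; old_fsi_size() is a hypothetical name for
the evaluation of that expression.

```python
DEV_BSIZE = 512  # assumed device block size


def old_fsi_size(sector_size):
    """Evaluate 1024 << (pm_BlkPerSec >> 2) for a given sector size."""
    blk_per_sec = sector_size // DEV_BSIZE  # assumed meaning of pm_BlkPerSec
    return 1024 << (blk_per_sec >> 2)


if __name__ == "__main__":
    for secsize in (512, 1024, 2048, 4096, 32768):
        print(secsize, old_fsi_size(secsize))
```

For a 32K sector this gives 1024 << 16, i.e. 64M, which matches the
"preposterous value" mentioned in the log message.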


171379 11-Jul-2007 rwatson

Fix ioctls on the control vnode: ioctls on a character device fail with
ENOTTY. Make the control vnode a regular file so that ioctls are passed
through to our kernel module.

Submitted by: Jan Harkes <jaharkes@cs.cmu.edu>
Approved by: re (kensmith)


171378 11-Jul-2007 rwatson

Avoid a panic in insmntque when we pass a NULL mount: this reenables
some previously disabled code which according to the comment caused a
problem during shutdown. But even that is still better than
triggering a kernel panic whenever venus is started.

Submitted by: Jan Harkes <jaharkes@cs.cmu.edu>
Approved by: re (kensmith)


171377 11-Jul-2007 rwatson

Replace CODA_OPEN with CODA_OPEN_BY_FD: coda_open was disabled because
we can't open container files by device/inode number pair anymore.
Replace the CODA_OPEN upcall with CODA_OPEN_BY_FD, where venus returns
an open file descriptor for the container file. We can then grab a
reference on the vnode in coda_psdev.c:vc_nb_write and use this vnode for
further accesses to the container file.

Submitted by: Jan Harkes <jaharkes@cs.cmu.edu>
Approved by: re (kensmith)


171375 11-Jul-2007 rwatson

Resolve Coda mount failing because Coda failed to match the device
operations. But we don't have to, if we find the coda_mntinfo structure
for this device in our linked list, we know the device is good.

Submitted by: Jan Harkes <jaharkes@cs.cmu.edu>
Approved by: re (kensmith)


171374 11-Jul-2007 rwatson

Avoid crash when opening Coda device: when allocating coda_mntinfo, we
need to initialize dev so that we can actually find the allocated
coda_mntinfo structure later on.

Submitted by: Jan Harkes <jaharkes@cs.cmu.edu>
Approved by: re (kensmith)


171362 11-Jul-2007 delphij

MFp4: Make use of the kernel unit number allocation facility
for tmpfs nodes.

Submitted by: Mingyan Guo <guomingyan gmail com>
Approved by: re (tmpfs blanket)


171343 10-Jul-2007 bde

Don't use almost perfectly pessimal cluster allocation. Allocation
of the first cluster in a file (and, if the allocation cannot be
continued contiguously, for subsequent clusters in a file) was randomized
in an attempt to leave space for contiguous allocation of subsequent
clusters in each file when there are multiple writers. This reduced
internal fragmentation by a few percent, but it increased external
fragmentation by up to a few thousand percent.

Use simple sequential allocation instead. Actually maintain the fsinfo
sequence index for this. The read and write of this index from/to
disk still have many non-critical bugs, but we now write an index that
has something to do with our allocations instead of being modified
garbage. If there is no fsinfo on the disk, then we maintain the index
internally and don't go near the bugs for writing it.

Allocating the first free cluster gives a layout that is almost as good
(better in some cases), but takes too much CPU if the FAT is large and
the first free cluster is not near the beginning.

The effect of this change for untar and tar of a slightly reduced copy
of /usr/src on a new file system was:

Before (msdosfs 4K-clusters):
untar: 459.57 real untar from cached file (actually a pipe)
tar: 342.50 real tar from uncached tree to /dev/zero
Before (ffs2 soft updates 4K-blocks 4K-frags)
untar: 39.18 real
tar: 29.94 real
Before (ffs2 soft updates 16K-blocks 2K-frags)
untar: 31.35 real
tar: 18.30 real

After (msdosfs 4K-clusters):
untar 54.83 real
tar 16.18 real

All of these times can be improved further.

With multiple concurrent writers or readers (especially readers), the
improvement is smaller, but I couldn't find any case where it is
negative. 342 seconds for tarring up about 342 MB on a ~47MB/S partition
is just hard to unimprove on. (This operation would take about 7.3
seconds with reasonably localized allocation and perfect read-ahead.)
However, for active file systems, 342 seconds is closer to normal than
the 16+ seconds above or the 11 seconds with other changes (best I've
measured -- won easily by msdosfs!). E.g., my active /usr/src on ffs1
is quite old and fragmented, so reading to prepare for the above
benchmark takes about 6 times longer than reading back the fresh copies
of it.

Approved by: re (kensmith)
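
The sequential allocation policy described in r171343 can be illustrated with a small user-space sketch. This is not the msdosfs code: the in-memory map and the names struct fat_sim and next_free_cluster are hypothetical, chosen only to show a search that starts at a remembered sequence index and wraps around once.

```c
#include <stddef.h>

/* Hypothetical model of a FAT: used[i] != 0 means cluster i is allocated. */
struct fat_sim {
	unsigned char *used;
	size_t nclusters;
	size_t next;	/* sequence index: where the next search starts */
};

/*
 * Allocate the first free cluster at or after the sequence index,
 * wrapping around once.  Returns the cluster number, or (size_t)-1
 * if the simulated FAT is full.
 */
size_t
next_free_cluster(struct fat_sim *f)
{
	size_t i, cn;

	for (i = 0; i < f->nclusters; i++) {
		cn = (f->next + i) % f->nclusters;
		if (!f->used[cn]) {
			f->used[cn] = 1;
			/* Remember where to continue next time. */
			f->next = (cn + 1) % f->nclusters;
			return (cn);
		}
	}
	return ((size_t)-1);
}
```

Successive calls hand out neighbouring clusters as long as they are free, which is the property the commit message credits for the reduced fragmentation.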


171308 08-Jul-2007 delphij

MFp4:
- Plug memory leak.
- Respect underlying vnode's properties rather than assuming that
the user wants root:wheel + 0755. Useful for using tmpfs(5) for
/tmp.
- Use roundup2 and howmany macros instead of rolling our own version.
- Try to fix fsx -W -R foo case.
- Instead of blindly zeroing a page, determine whether we need a pagein
in order to prevent data corruption.
- Fix several bugs reported by Coverity.

Submitted by: Mingyan Guo <guomingyan gmail com>, Howard Su, delphij
Coverity ID: CID 2550, 2551, 2552, 2557
Approved by: re (tmpfs blanket)
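
For readers unfamiliar with the roundup2 and howmany macros mentioned in r171308, the sketch below shows the arithmetic they are commonly understood to perform. The SK_ prefixed macros, the 4096-byte page size and the helper names are assumptions made for this illustration; the kernel's own definitions should be consulted before relying on it.

```c
#include <stddef.h>

#define SK_PAGE_SIZE	((size_t)4096)	/* assumed page size for the sketch */

/* Number of y-sized units needed to hold x (rounds up). */
#define SK_HOWMANY(x, y)	(((x) + ((y) - 1)) / (y))

/* Round x up to a multiple of y; y must be a power of two. */
#define SK_ROUNDUP2(x, y)	(((x) + ((y) - 1)) & ~((y) - 1))

/* Pages needed to store a file of the given size. */
size_t
pages_for(size_t bytes)
{
	return (SK_HOWMANY(bytes, SK_PAGE_SIZE));
}

/* File size rounded up to a page boundary. */
size_t
round_to_page(size_t bytes)
{
	return (SK_ROUNDUP2(bytes, SK_PAGE_SIZE));
}
```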


171181 03-Jul-2007 kib

Since rev. 1.199 of sys/kern/kern_conf.c, the thread that calls
destroy_dev() from d_close() cdev method would self-deadlock.
devfs_close() bumps the device thread reference counter, and destroy_dev()
sleeps, waiting for si_threadcount to reach zero for cdev without
d_purge method.

destroy_dev_sched() could be used instead from d_close(), to
schedule execution of destroy_dev() in another context. The
destroy_dev_sched_drain() function can be used to drain the scheduled
calls to destroy_dev_sched(). Similarly, drain_dev_clone_events() drains
the clone events to make sure no lingering devices are left after
the dev_clone event handler is deregistered.

make_dev_credf(MAKEDEV_REF) function should be used from dev_clone
event handlers instead of make_dev()/make_dev_cred() to ensure that created
device has reference counter bumped before cdev mutex is dropped inside
make_dev().

Reviewed by: tegge (early versions), njl (programming interface)
Debugging help and testing by: Peter Holm
Approved by: re (kensmith)


171087 29-Jun-2007 delphij

MFp4:

- Remove unnecessary NULL checks after M_WAITOK allocations.
- Use VOP_ACCESS instead of hand-rolled suser_cred()
calls. [1]
- Use malloc(9) KPI to allocate memory for string. The
optimization taken from NetBSD is not valid for FreeBSD
because our malloc(9) already acts that way. [2]

Requested by: rwatson [1]
Submitted by: Howard Su [2]
Approved by: re (tmpfs blanket)


171070 28-Jun-2007 delphij

Space/style cleanups after last set of commits.

Approved by: re (tmpfs blanket)


171069 28-Jun-2007 delphij

Staticify most of fifo/vn operations, they should not
be directly exposed outside.

Approved by: re (tmpfs blanket)


171068 28-Jun-2007 delphij

Use vfs_timestamp instead of nanotime when obtaining
a timestamp for use with timekeeping.

Approved by: re (tmpfs blanket)


171067 28-Jun-2007 delphij

Reorder tf_gen and tf_id in struct tmpfs_fid. This
saves 8 bytes on amd64 architecture.

Obtained from: NetBSD
Approved by: re (tmpfs blanket)


171040 26-Jun-2007 delphij

Remove two function prototypes that are no longer used.

Approved by: re (tmpfs blanket)


171038 26-Jun-2007 delphij

- Sync with NetBSD's RCSID (HEAD preferred).
- Correct a typo.

Approved by: re (tmpfs blanket)


171029 25-Jun-2007 delphij

MFp4: Several clean-ups and improvements over tmpfs:

- Remove tmpfs_zone_xxx KPI, the uma(9) wrapper, since
it does not bring any value now.
- Use |= instead of = when applying VV_ROOT flag.
- Remove tm_avariable_nodes list. Use uma to hold the
released nodes.
- init/destroy interlock mutex of node when init/fini
instead of ctor/dtor.
- Change memory computing using u_int to fix negative
value in 2G mem machine.
- Remove unnecessary bzero's
- Rely on uma logic to make file id allocation harder to
guess.
- Fix some unsigned/signed related things. Make sure
we respect -o size=xxxx
- Use wire instead of holding a page.
- Pass allocate_zero to obtain zeroed pages upon first
use.

Submitted by: Howard Su
Approved by: re (tmpfs blanket, kensmith)


171023 25-Jun-2007 rafan

- Remove UMAP filesystem. It was disconnected from build three years ago,
and it is seriously broken.

Discussed on: freebsd-arch@
Approved by: re (mux)


170922 18-Jun-2007 delphij

Use vfs_timestamp() instead of nanotime() - leave it up to
the user to decide how detailed they want
timestamps to be.


170903 18-Jun-2007 delphij

MFp4: fix two locking problems:

- Hold TMPFS_LOCK while updating tm_pages_used.
- Hold vm page while doing uiomove.

This will hopefully fix all known panics.

Submitted by: Howard Su


170808 16-Jun-2007 delphij

MFp4: Add tmpfs, an efficient memory file system.

Please note that this is currently considered an
experimental feature so there could be some rough
edges. Consult http://wiki.freebsd.org/TMPFS for
more information.

For now, connect tmpfs to build on i386 and amd64
architectures only. Please let us know if you have
success with other platforms.

This work was developed by Julio M. Merino Vidal
for NetBSD as a SoC project; Rohit Jalan ported it
from NetBSD to FreeBSD. Howard Su and Glen Leeder
have worked on it to continue this effort.

Obtained from: NetBSD via p4
Submitted by: Howard Su (with some minor changes)
Approved by: re (kensmith)


170587 12-Jun-2007 rwatson

Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths. Do, however, move those prototypes to priv.h.

Reviewed by: csjp
Obtained from: TrustedBSD Project


170577 11-Jun-2007 remko

Correct corrupt read when the read starts at a non-aligned offset.

PR: kern/77234
MFC After: 1 week
Approved by: imp (mentor)
Requested by: many many people
Submitted by: Andriy Gapon <avg at icyb dot net dot ua>


170472 09-Jun-2007 attilio

rufetch and calcru sometimes should be called atomically together.
This patch fixes places where they should be called atomically, changing
their locking requirements (both assume per-proc spinlock held) and
introducing rufetchcalc, which wraps both calls so that they are performed
atomically.

Reviewed by: jeff
Approved by: jeff (mentor)


170401 07-Jun-2007 bmah

Fix off-by-one error (introduced in r1.60) that had the effect of
disallowing a read of exactly MAXPHYS bytes.

Reviewed by: des, rdivacky
MFC after: 1 week
Sponsored by: nCircle Network Security
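
The kind of off-by-one described in r170401 can be sketched as a bounds check that uses the wrong comparison. The log does not show the actual expression that was changed, so the functions below, and the SKETCH_MAXPHYS value standing in for MAXPHYS, are purely illustrative.

```c
#include <stddef.h>

#define SKETCH_MAXPHYS	((size_t)128 * 1024)	/* hypothetical stand-in for MAXPHYS */

/* Off-by-one: rejects a request of exactly SKETCH_MAXPHYS bytes. */
int
len_ok_buggy(size_t resid)
{
	return (resid < SKETCH_MAXPHYS);
}

/* Intended bound: a request of exactly SKETCH_MAXPHYS bytes is allowed. */
int
len_ok_fixed(size_t resid)
{
	return (resid <= SKETCH_MAXPHYS);
}
```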


170307 05-Jun-2007 jeff

Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
synchronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


170292 04-Jun-2007 attilio

Do proper "locking" for missing vmmeters part.
Now, we assume no more sched_lock protection for some of them and use the
distributed loads method for vmmeter (distributed through CPUs).

Reviewed by: alc, bde
Approved by: jeff (mentor)


170188 01-Jun-2007 trhodes

Revert previous, part of NFS that I didn't know about.


170184 01-Jun-2007 trhodes

Garbage collect msdosfs_fhtovp; it appears unused and I have been using
MSDOSFS without this function and without problems for the last month.


170183 01-Jun-2007 kib

Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation
argument from being file descriptor index into the pointer to struct file:
part 2. Convert calls missed in the first big commit.

Noted by: rwatson
Pointy hat to: kib


170170 31-May-2007 attilio

Revert VMCNT_* operations introduction.
Probably, a general approach is not the best solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)


170152 31-May-2007 kib

Revert UF_OPENING workaround for CURRENT.
Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation
argument from being file descriptor index into the pointer to struct file.

Proposed and reviewed by: jhb
Reviewed by: daichi (unionfs)
Approved by: re (kensmith)


170093 29-May-2007 rwatson

Where I previously removed calls to kdb_enter(), now remove include of
kdb.h.

Pointed out by: bde


170015 27-May-2007 rwatson

Rather than entering the debugger via kdb_enter() when detecting memory
corruption under SMBUFS_NAME_DEBUG, panic() with the same error message.


170014 27-May-2007 rwatson

Rather than entering the debugger via kdb_enter() in the event the
root vnode is unexpectedly locked under NULLFS_DEBUG in nullfs and
then returning EDEADLK, panic.


169671 18-May-2007 kib

Since renaming of vop_lock to _vop_lock, pre- and post-condition
function calls are no longer generated for vop_lock.
Rename _vop_lock to vop_lock1 to satisfy tools/vnode_if.awk assumption
about vop naming conventions. This restores pre/post-condition calls.


169667 18-May-2007 jeff

- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts. This can be used to abstract away pcpu details but also changes
to use atomics for all counters now. This means sched lock is no longer
responsible for protecting counts in the switch routines.

Contributed by: Attilio Rao <attilio@FreeBSD.org>


169168 01-May-2007 des

The process lock is held when procfs_ioctl() is called. Assert that this
is so, and PHOLD the process while sleeping since msleep() will release
the lock.


168985 23-Apr-2007 des

Fix old locking bugs which were revealed when pseudofs was made MPSAFE.

Submitted by: tegge


168977 23-Apr-2007 rwatson

Rename mac*devfsdirent*() to mac*devfs*() to synchronize with SEDarwin,
where similar data structures exist to support devfs and the MAC
Framework, but are named differently.

Obtained from: TrustedBSD Project
Sponsored by: SPARTA, Inc.


168968 23-Apr-2007 alc

Add synchronization. Eliminate the acquisition and release of Giant.

Reviewed by: tegge


168884 20-Apr-2007 trhodes

In some cases, like whenever devfs file times are zero, the fix(aa) will not
be applied to dev entries. This leaves us with file times like "Jan 1 1970."
Work around this problem by replacing the tv_sec == 0 check with a
<= 3600 check. It's doubtful anyone will be booting within an hour of the
Epoch, let alone care about a few seconds worth of nonzero timestamps. It's
a hackish work around, but it does work and I have not experienced any
negatives in my testing.
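
The change of test described above can be sketched as two predicates. The function names are hypothetical, and the sketch only models the comparison on tv_sec, not the surrounding devfs code.

```c
#include <time.h>

/* Old test: only an exactly zero timestamp is treated as unset. */
int
needs_fix_old(time_t sec)
{
	return (sec == 0);
}

/* New test: anything within the first hour of the Epoch counts as unset. */
int
needs_fix_new(time_t sec)
{
	return (sec <= 3600);
}
```

A timestamp of a few seconds past the Epoch slips through the old test but is caught by the new one, which is the case the commit message describes.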

Discussed with: bde
"Ok with me": phk


168768 15-Apr-2007 des

Avoid "unused variable" warning when building without PSEUDOFS_TRACE.


168764 15-Apr-2007 des

Make pseudofs (and consequently procfs, linprocfs and linsysfs) MPSAFE.


168763 15-Apr-2007 des

Instead of stating GIANT_REQUIRED, just acquire and release Giant where
needed. This does not make a difference now, but will when procfs is
marked MPSAFE.


168759 15-Apr-2007 des

Fix the same bug as in procfs_doproc{,db}regs(): check that uio_offset is
0 upon entry, and don't reset it before returning.

MFC after: 3 weeks


168758 15-Apr-2007 des

Don't reset uio_offset to 0 before returning. Instead, refuse to service
requests where uio_offset is not 0 to begin with. This fixes a long-
standing bug where e.g. 'cat /proc/$$/regs' would loop forever.

MFC after: 3 weeks
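
The behaviour described in r168758 can be modelled in user space as follows. struct uio_sim and regs_read are hypothetical stand-ins, and returning -1 for a refused request is an assumption of the sketch; the log does not say how the kernel reports the refusal.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the parts of struct uio the sketch needs. */
struct uio_sim {
	long offset;	/* plays the role of uio_offset */
	size_t resid;	/* bytes the caller still wants */
	char *buf;	/* destination buffer */
};

/*
 * Copy a fixed-size register block to the caller.  A nonzero starting
 * offset is refused, and the offset is left advanced afterwards, so a
 * sequential reader's second call is refused instead of receiving the
 * same data again.
 */
int
regs_read(struct uio_sim *uio, const char *regs, size_t len)
{
	size_t n;

	if (uio->offset != 0)
		return (-1);
	n = len < uio->resid ? len : uio->resid;
	memcpy(uio->buf, regs, n);
	uio->offset += (long)n;
	uio->resid -= n;
	return (0);
}
```

If the offset were reset to 0 before returning, every call would succeed and return the same bytes, which matches the endless loop the commit message reports for cat.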


168720 14-Apr-2007 des

Further pseudofs improvements:

The pfs_info mutex is only needed to lock pi_unrhdr. Everything else
in struct pfs_info is modified only while Giant is held (during
vfs_init() / vfs_uninit()); add assertions to that effect.

Simplify pfs_destroy somewhat.

Remove superfluous arguments from pfs_fileno_{alloc,free}(), and the
assertions which were added in the previous commit to ensure they were
consistent.

Assert that Giant is held while the vnode cache is initialized and
destroyed. Also assert that the cache is empty when it is destroyed.

Rename the vnode cache mutex for consistency.

Fix a long-standing bug in pfs_getattr(): it would uncritically return
the node's pn_fileno as st_ino. This would result in st_ino being 0
if the node had not previously been visited by readdir(), and also in
an incorrect st_ino for process directories and any files contained
therein. Correct this by abstracting the fileno manipulations
previously done in pfs_readdir() into a new function, pfs_fileno(),
which is used by both pfs_getattr() and pfs_readdir().


168637 11-Apr-2007 des

Add a flag to struct pfs_vdata to mark the vnode as dead (e.g. process-
specific nodes when the process exits)

Move the vnode-cache-walking loop which was duplicated in pfs_exit() and
pfs_disable() into its own function, pfs_purge(), which looks for vnodes
marked as dead and / or belonging to the specified pfs_node and reclaims
them. Note that this loop is still extremely inefficient.

Add a comment in pfs_vncache_alloc() explaining why we have to purge the
vnode from the vnode cache before returning, in case anyone should be
tempted to remove the call to cache_purge().

Move the special handling for pfstype_root nodes into pfs_fileno_alloc()
and pfs_fileno_free() (the root node's fileno must always be 2). This
also fixes a bug where pfs_fileno_free() would reclaim the root node's
fileno, triggering a panic in the unr code, as that fileno was never
allocated from unr to begin with.

When destroying a pfs_node, release its fileno and purge it from the
vnode cache. I wish we could put off the call to pfs_purge() until
after the entire tree had been destroyed, but then we'd have vnodes
referencing freed pfs nodes. This probably doesn't matter while we're
still under Giant, but might become an issue later.

When destroying a pseudofs instance, destroy the tree before tearing
down the fileno allocator.

In pfs_mount(), acquire the mountpoint interlock when required.

MFC after: 3 weeks


168387 05-Apr-2007 des

Whitespace nits.


168355 04-Apr-2007 rwatson

Replace custom file descriptor array sleep lock constructed using a mutex
and flags with an sxlock. This leads to a significant and measurable
performance improvement as a result of access to shared locking for
frequent lookup operations, reduced general overhead, and reduced overhead
in the event of contention. All of these are important for threaded
applications where simultaneous access to a shared file descriptor array
occurs frequently. Kris has reported 2x-4x transaction rate improvements
on 8-core MySQL benchmarks; smaller improvements can be expected for many
workloads as a result of reduced overhead.

- Generally eliminate the distinction between "fast" and regular
acquisition of the filedesc lock; the plan is that they will now all
be fast. Change all locking instances to either shared or exclusive
locks.

- Correct a bug (pointed out by kib) in fdfree() where previously msleep()
was called without the mutex held; sx_sleep() is now always called with
the sxlock held exclusively.

- Universally hold the struct file lock over changes to struct file,
rather than the filedesc lock or no lock. Always update the f_ops
field last. A further memory barrier is required here in the future
(discussed with jhb).

- Improve locking and reference management in linux_at(), which fails to
properly acquire vnode references before using vnode pointers. Annotate
improper use of vn_fullpath(), which will be replaced at a future date.

In fcntl(), we conservatively acquire an exclusive lock, even though in
some cases a shared lock may be sufficient, which should be revisited.
The dropping of the filedesc lock in fdgrowtable() is no longer required
as the sxlock can be held over the sleep operation; we should consider
removing that (pointed out by attilio).

Tested by: kris
Discussed with: jhb, kris, attilio, jeff


167916 26-Mar-2007 kris

Annotate that this giant acquisition is dependent on tty locking.


167875 24-Mar-2007 maxim

o cd9660 code repo-copied, update a comment.


167497 13-Mar-2007 tegge

Make insmntque() externally visible and allow it to fail (e.g. during
late stages of unmount). On failure, the vnode is recycled.

Add insmntque1(), to allow for file system specific cleanup when
recycling vnode on failure.

Change getnewvnode() to no longer call insmntque(). Previously,
embryonic vnodes were put onto the list of vnodes belonging to a file
system, which is unsafe for a file system marked MPSAFE.

Change vfs_hash_insert() to no longer lock the vnode. The caller now
has that responsibility.

Change most file systems to lock the vnode and call insmntque() or
insmntque1() after a new vnode has been sufficiently setup. Handle
failed insmntque*() calls by propagating errors to callers, possibly
after some file system specific cleanup.

Approved by: re (kensmith)
Reviewed by: kib
In collaboration with: kib


167482 12-Mar-2007 des

Add a pn_destroy field to pfs_node. This field points to a destructor
function which is called from pfs_destroy() before the node is reclaimed.

Modify pfs_create_{dir,file,link}() to accept a pointer to a destructor
function in addition to the usual attr / fill / vis pointers.

This breaks both the programming and binary interfaces between pseudofs
and its consumers. It is believed that there are no pseudofs consumers
outside the source tree, so that the impact of this change is minimal.

Submitted by: Aniruddha Bohra <bohra@cs.rutgers.edu>


167158 02-Mar-2007 mpp

Change fifo_printinfo to check if the vnode v_fifoinfo pointer
is NULL and print a message to that effect to prevent a panic.


167086 27-Feb-2007 jhb

Use pause() rather than tsleep() on stack variables and function pointers.


166858 21-Feb-2007 cognet

Check that the error returned by vfs_getopts() is not ENOENT before assuming
there's actually an error.
This is just in order to unbreak ntfs on current, before a proper solution is
committed.


166826 19-Feb-2007 rwatson

Do allow PIOCSFL in jail for setguid processes; this is more consistent
with other debugging checks elsewhere. XXX comment on the fact that
p_candebug() is not being used here remains.


166774 15-Feb-2007 pjd

Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method.
This way we may support multiple structures in v_data vnode field within
one file system without using black magic.

Vnode-to-file-handle should be VOP in the first place, but was made VFS
operation to keep interface as compatible as possible with SUN's VFS.
BTW. Now Solaris also implements vnode-to-file-handle as VOP operation.

VFS_VPTOFH() was left for API backward compatibility, but is marked for
removal before 8.0-RELEASE.

Approved by: mckusick
Discussed with: many (on IRC)
Tested with: ufs, msdosfs, cd9660, nullfs and zfs


166639 11-Feb-2007 rodrigc

Forced commit and #include changes for repo copy from
sys/isofs/cd9660 to sys/fs/cd9660.

Discussed on freebsd-current.


166559 08-Feb-2007 rodrigc

Add noatime to the list of mount options that msdosfs accepts.

PR: 108896
Submitted by: Eugene Grosbein <eugen grosbein pp ru>


166558 08-Feb-2007 rodrigc

Style fixes: use ANSI C function declarations.


166548 07-Feb-2007 kib

Fix the race of dereferencing /proc/<pid>/file with execve(2) by caching
the value of p_textvp. This way, we always unlock the locked vnode.
While there, vhold() the vnode around the vn_lock().

Reported and tested by: Guy Helmer (ghelmer palisadesys com)
Approved by: des (procfs maintainer)
MFC after: 1 week


166524 06-Feb-2007 rodrigc

Eliminate some dead code which was introduced in 1.23, yet was always
commented out.


166429 02-Feb-2007 pjd

coda_vptofh is never defined nor used.


166343 30-Jan-2007 avatar

Fixing compilation bustage by removing references to opt_msdosfs.h.

This auto-generated header file no longer exists since the removal of
MSDOSFS_LARGE in sys/conf/options:1.574.


166341 30-Jan-2007 trhodes

Fix spacing from my previous commit to this file:

Noticed by: fjoe


166340 30-Jan-2007 rodrigc

Add a "-o large" mount option for msdosfs. Convert compile-time checks for
#ifdef MSDOSFS_LARGE to run-time checks to see if "-o large" was specified.

Test case provided by Oliver Fromme:
truncate -s 200G test.img
mdconfig -a -t vnode -f test.img -u 9
newfs_msdos -s 419430400 -n 1 /dev/md9 zip250
mount -t msdosfs /dev/md9 /mnt # should fail
mount -t msdosfs -o large /dev/md9 /mnt # should succeed

PR: 105964
Requested by: Oliver Fromme <olli lurza secnetix de>
Tested by: trhodes
MFC after: 2 weeks


166167 22-Jan-2007 kib

Below is slightly edited description of the LOR by Tor Egge:

--------------------------
[Deadlock] is caused by a lock order reversal in vfs_lookup(), where
[some] process is trying to lock a directory vnode (that is the parent
directory of covered vnode) while holding an exclusive vnode lock on
covering vnode.

A simplified scenario:

root fs               var fs
/        A            /    (/var)      D
/var     B            /log (/var/log)  E
vfs lock C            vfs lock         F

Within each file system, the lock order is clear: C->A->B and F->D->E

When traversing across mounts, the system can choose between two lock orders,
but everything must then follow that lock order:

L1: C->A->B
          |
          +->F->D->E

L2: F->D->E
          |
          +->C->A->B

The lookup() process for namei("/var") mixes those two lock orders:

VOP_LOOKUP() obtains B while A is held
vfs_busy() obtains a shared lock on F while A and B are held (follows L1,
violates L2)
vput() releases lock on B
VOP_UNLOCK() releases lock on A
VFS_ROOT() obtains lock on D while shared lock on F is held
vfs_unbusy() releases shared lock on F
vn_lock() obtains lock on A while D is held (violates L1, follows L2)

dounmount() follows L1 (B is locked while F is drained).

Without unmount activity, vfs_busy() will always succeed without blocking
and the deadlock isn't triggered (the system behaves as if L2 is followed).

With unmount, you can get 4 processes in a deadlock:

p1: holds D, want A (in lookup())
p2: holds shared lock on F, want D (in VFS_ROOT())
p3: holds B, want drain lock on F (in dounmount())
p4: holds A, want B (in VOP_LOOKUP())

You can have more than one instance of p2.

The reversal was introduced in revision 1.81 of src/sys/kern/vfs_lookup.c and
MFCed to revision 1.80.2.1, probably to avoid a cascade of vnode locks when nfs
servers are dead (VFS_ROOT() just hangs) spreading to the root fs root vnode.

- Tor Egge

To fix the LOR, ups@ noted that when crossing the mount point, ni_dvp
is actually not used by the callers of namei. Thus, placeholder deadfs
vnode vp_crossmp is introduced that is filled into ni_dvp.

Idea by: ups
Reviewed by: tegge, ups, jeff, rwatson (mac interaction)
Tested by: Peter Holm
MFC after: 2 weeks


166062 16-Jan-2007 trhodes

Add a 3rd entry in the cache, which keeps the end position
from just before extending a file. This has the desired effect
of keeping the write speed constant. And yes, that helps a lot:
large files are now always copied at full speed, and I have seen
improvements using benchmarks/bonnie.

Stolen from: NetBSD
Reviewed by: bde


166030 15-Jan-2007 pav

Rewrite the udf_read() routine to use a file vnode instead of the devvp vnode.
The code is modelled after cd9660, including support for simple read-ahead
courtesy of clustered read.

Fix udf_strategy to DTRT.

This change fixes sendfile(2) not to send out garbage.

Reviewed by: scottl
MFC after: 1 month


165879 07-Jan-2007 pav

Tell backing v_object the filesize right on its creation.

MFC after: 1 week


165836 06-Jan-2007 rodrigc

When performing a mount update to change a mount from read-only to read-write,
do not call markvoldirty() until the mount has been flagged as read-write.
Due to the nature of the msdosfs code, this bug only seemed to appear for
FAT-16 and FAT-32.

This fixes the testcase:
#!/bin/sh
dd if=/dev/zero bs=1m count=1 oseek=119 of=image.msdos
mdconfig -a -t vnode -f image.msdos
newfs_msdos -F 16 /dev/md0 fd120m
mount_msdosfs -o ro /dev/md0 /mnt
mount | grep md0
mount -u -o rw /dev/md0; echo $?
mount | grep md0
umount /mnt
mdconfig -d -u 0

PR: 105412
Tested by: Eugene Grosbein <eugen grosbein pp ru>


165804 05-Jan-2007 rodrigc

Simplify code in union_hashins() and union_hashget() functions. These
functions now more closely resemble similar functions in nullfs.
This also eliminates some errors.

Submitted by: daichi, Masanori OZAWA <ozawa ongs co jp>


165792 05-Jan-2007 rodrigc

Eliminate obsolete comment, now that getushort() is implemented in
terms of functions in <sys/endian.h>.


165785 05-Jan-2007 rodrigc

Eliminate ASSERT_VOP_ELOCKED panics when doing mkdir or symlink when
sysctl vfs.lookup_shared=1.

Submitted by: daichi, Masanori OZAWA <ozawa ongs co jp>


165737 02-Jan-2007 jhb

Use the vnode interlock to close a race where pfs_vncache_alloc() could
attempt to vn_lock() a destroyed vnode resulting in a hang.

MFC after: 1 week
Submitted by: ups
Reviewed by: des


165500 23-Dec-2006 pav

Call vnode_create_vobject() in VOP_OPEN. Makes mmap work on UDF filesystem.

PR: kern/92040
Approved by: scottl
MFC after: 1 week


165431 21-Dec-2006 marcel

Unbreak 64-bit little-endian systems that do require alignment.
The fix involves using le16dec(), le32dec(), le16enc() and
le32enc(). This eliminates invalid casts and duplicated logic.
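
The idea behind byte-wise decoders such as le16dec() and le32dec() can be sketched as below. The sketch_ prefixed functions are written for this illustration and are not the system's definitions; they only show why reading one byte at a time avoids both the alignment requirement and any dependence on host byte order.

```c
#include <stdint.h>

/* Decode a little-endian 16-bit value from a possibly unaligned buffer. */
uint16_t
sketch_le16dec(const void *pp)
{
	const uint8_t *p = pp;

	return ((uint16_t)(p[0] | (p[1] << 8)));
}

/* Decode a little-endian 32-bit value from a possibly unaligned buffer. */
uint32_t
sketch_le32dec(const void *pp)
{
	const uint8_t *p = pp;

	/* Widen each byte before shifting so no bit is lost or sign-extended. */
	return ((uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24));
}
```

Casting the buffer to a wider integer pointer and dereferencing it would be shorter, but that is the kind of invalid cast the commit message says was eliminated.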


165342 19-Dec-2006 rodrigc

For big-endian version of getulong() macro, cast result to u_int32_t.
This macro was written expecting a 32-bit unsigned long, and
doesn't work properly on 64-bit systems. This bug caused vn_stat()
to return incorrect values for files larger than 2gb on msdosfs filesystems
on 64-bit systems.

PR: 106703
Submitted by: Axel Gonzalez <loox e-shell net>
MFC after: 3 days


165341 19-Dec-2006 rodrigc

Fix get_ulong() macro on AMD64 (or any little-endian 64-bit platform).
This bug caused vn_stat() to fail on files larger than 2gb on msdosfs
filesystems on AMD64.

PR: 106703
Tested by: Axel Gonzalez <loox e-shell net>
MFC after: 3 days


165037 09-Dec-2006 rodrigc

Remove unused variable in unionfs_root().

Submitted by: daichi, Masanori OZAWA


165036 09-Dec-2006 rodrigc

Use vfs_mount_error() in a few places to give more descriptive mount error
messages.


165035 09-Dec-2006 rodrigc

Add locking around calls to unionfs_get_node_status()
in unionfs_ioctl() and unionfs_poll().

Submitted by: daichi, Masanori OZAWA <ozawa@ongs.co.jp>
Prompted by: kris


165034 09-Dec-2006 rodrigc

In unionfs_readdir(), prevent a possible NULL dereference.

CID: 1667
Found by: Coverity Prevent (tm)


165033 09-Dec-2006 rodrigc

In unionfs_hashrem(), use LIST_FOREACH_SAFE when iterating over
the list of nodes to free them.

CID: 1668
Found by: Coverity Prevent (tm)


165022 09-Dec-2006 rodrigc

Minor cleanup. If we are doing a mount update, and we pass in
an "export" flag indicating that we are trying to NFS export the
filesystem, and the MSDOSFS_LARGEFS flag is set on the filesystem,
then deny the mount update and export request. Otherwise,
let the full mount update proceed normally.
MSDOSFS_LARGEFS and NFS don't mix because of the way inodes are calculated
for MSDOSFS_LARGEFS.

MFC after: 3 days


165005 08-Dec-2006 kientzle

The ISO9660 spec does allow files up to 4G. Change the i_size
field to "unsigned long" so that it actually works.
Thanks to Robert Sciuk for sending me a DVD that
demonstrated ISO9660-formatted media with a file >2G.
I've now fixed this both in libarchive and in the cd9660
filesystem.

MFC after: 14 days


164936 06-Dec-2006 julian

Threading cleanup.. part 2 of several.

Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs. Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still requires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.


164855 03-Dec-2006 maxim

o Do not leave uninitialized birthtime: in MSDOSFSMNT_LONGNAME
set birthtime to FAT CTime (creation time) and in the other cases
set birthtime to -1.

o Set ctime to mtime instead of FAT CTime which has completely
different meaning.

PR: kern/106018
Submitted by: Oliver Fromme
MFC after: 1 month


164836 02-Dec-2006 rodrigc

Add missing includes for <sys/buf.h> and <sys/bio.h>.


164829 02-Dec-2006 rodrigc

Many, many thanks to Masanori OZAWA <ozawa@ongs.co.jp>
and Daichi GOTO <daichi@FreeBSD.org> for submitting this
major rewrite of unionfs. This rewrite was done to
try to solve many of the longstanding crashing and locking
issues in the existing unionfs implementation. This
implementation also adds a 'MASQUERADE mode', which allows
the user to set different user, group, and file permission
modes in the upper layer.

Submitted by: daichi, Masanori OZAWA
Reviewed by: rodrigc (modified for minor style issues)


164627 26-Nov-2006 maxim

o From the submitter: dos2unixchr will convert to lower case if
LCASE_BASE or LCASE_EXT or both are set. But dos2unixfn uses
dos2unixchr separately for the basename and the extension. So if
either LCASE_BASE or LCASE_EXT is set, dos2unixfn will convert both
the basename and extension to lowercase because it is blindly
passing in the state of both flags to dos2unixchr. The bit masks I
used ensure that only the state of LCASE_BASE gets passed to
dos2unixchr when the basename is converted, and only the state of
LCASE_EXT is passed in when the extension is converted.

PR: kern/86655
Submitted by: Micah Lieske
MFC after: 3 weeks
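
The masking described in r164627 can be sketched as follows. The flag values, the sketch_chr and sketch_fn helpers and the simplified name handling are assumptions made for the illustration; the real dos2unixfn deals with padding and character-set conversion that is omitted here.

```c
#include <ctype.h>

#define LCASE_BASE	0x08	/* flag values are assumptions for this sketch */
#define LCASE_EXT	0x10

/* Lower-case c if any bit in lower is set, otherwise return it unchanged. */
static int
sketch_chr(int c, int lower)
{
	return (lower ? tolower(c) : c);
}

/*
 * Build "base.ext" from the two parts of an 8.3 name, passing only the
 * flag bit that applies to each part.
 */
void
sketch_fn(const char *base, const char *ext, int lcase, char *out)
{
	while (*base != '\0')
		*out++ = (char)sketch_chr((unsigned char)*base++,
		    lcase & LCASE_BASE);
	if (*ext != '\0') {
		*out++ = '.';
		while (*ext != '\0')
			*out++ = (char)sketch_chr((unsigned char)*ext++,
			    lcase & LCASE_EXT);
	}
	*out = '\0';
}
```

Passing the unmasked lcase value to both loops would reproduce the reported bug: either flag alone would lower-case the whole name.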


164450 20-Nov-2006 le

Fix an integer overflow and allow access to files larger than 4GB on
NTFS.


164356 17-Nov-2006 kib

Wake up PIOCWAIT handler on the process exit in addition to the stop
events. &p->p_stype is explicitly woken up on process exit for us.

Now, truss /nonexistent exits with error instead of waiting until killed
by signal.

Reported by: Nikos Vassiliadis nvass at teledomenet gr
Reviewed by: jhb
MFC after: 1 week


164248 13-Nov-2006 kmacy

change vop_lock handling to allow tracking of callers' file and line for
acquisition of lockmgr locks

Approved by: scottl (standing in for mentor rwatson)


164033 06-Nov-2006 rwatson

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


163993 05-Nov-2006 bp

Create a bidirectional mapping of the DOS 'read only' attribute
to the 'w' flag.

PR: kern/77958
Submitted by: ghozzy gmail com
MFC after: 1 month
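
A bidirectional mapping of the kind r163993 describes might look like the sketch below. The helper names and the choice to key the reverse mapping on the owner write bit are assumptions; the log does not show how the committed code decides.

```c
#include <sys/types.h>
#include <sys/stat.h>

#define SKETCH_ATTR_RDONLY	0x01	/* DOS read-only attribute bit */

/* DOS attribute -> mode: a read-only file loses all write bits. */
mode_t
attr_to_mode(unsigned int attr, mode_t mode)
{
	if (attr & SKETCH_ATTR_RDONLY)
		mode &= ~(mode_t)(S_IWUSR | S_IWGRP | S_IWOTH);
	return (mode);
}

/* Mode -> DOS attribute: no owner write bit means read-only. */
unsigned int
mode_to_attr(mode_t mode, unsigned int attr)
{
	if (mode & S_IWUSR)
		attr &= ~(unsigned int)SKETCH_ATTR_RDONLY;
	else
		attr |= SKETCH_ATTR_RDONLY;
	return (attr);
}
```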


163709 26-Oct-2006 jb

Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by: davidxu@


163652 24-Oct-2006 phk

Ditch crummy fattime <--> timespec conversion functions


163651 24-Oct-2006 phk

Drop crummy fattime to timespec conversion routines.

Leave a XXX here for anybody able to test.


163647 24-Oct-2006 phk

Replace slightly crummy fattime<->timespec conversion functions.


163606 22-Oct-2006 rwatson

Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from: TrustedBSD Project
Sponsored by: SPARTA


163559 21-Oct-2006 trhodes

Fake the link count until we have no choice but to load data from the
MFT.

PR: 86965
Submitted by: Lowell Gilbert <lgfbsd@be-well.ilk.org>


163530 20-Oct-2006 kib

Update the access and modification times for dev while still holding
thread reference on it.

Reviewed by: tegge
Approved by: pjd (mentor)


163529 20-Oct-2006 kib

Fix the race between devfs_fp_check and devfs_reclaim. Dereference the
vnode's v_rdev and increment the dev threadcount, as well as clear it
(in devfs_reclaim) under the dev_lock().

Reviewed by: tegge
Approved by: pjd (mentor)


163481 18-Oct-2006 kib

Properly lock the vnode around vgone() calls.

Unlock the vnode in devfs_close() while calling into the driver d_close()
routine.

devfs_revoke() changes by: ups
Reviewed and bugfixes by: tegge
Tested by: mbr, Peter Holm
Approved by: pjd (mentor)
MFC after: 1 week


162970 02-Oct-2006 phk

Use utc_offset() where applicable, and hide the internals of it
as static variables.


162954 02-Oct-2006 phk

First part of a little cleanup in the calendar/timezone/RTC handling.

Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & space-efficient BCD routines.


162711 27-Sep-2006 ru

Fix our ioctl(2) implementation when the argument is "int". New
ioctls passing integer arguments should use the _IOWINT() macro.
This fixes a lot of ioctl's not working on sparc64, most notable
being keyboard/syscons ioctls.

Full ABI compatibility is provided, with the bonus of fixing the
handling of old ioctls on sparc64.

Reviewed by: bde (with contributions)
Tested by: emax, marius
MFC after: 1 week


162647 26-Sep-2006 tegge

Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag.
This eliminates a race where MNT_UPDATE flag could be lost when nmount()
raced against sync(), sync_fsync() or quotactl().


162443 19-Sep-2006 kib

Fix the bug in rev. 1.134. In devfs_allocv_drop_refs(), when not_found == 2
and drop_dm_lock is true, no unlocking shall be attempted. The lock is
already dropped and memory is freed.

Found with: Coverity Prevent(tm)
CID: 1536
Approved by: pjd (mentor)


162398 18-Sep-2006 kib

Resolve the devfs deadlock caused by LOR between devfs_mount->dm_lock and
vnode lock in devfs_allocv. Do this by temporarily dropping dm_lock around
vnode locking.

For safe operation, add hold counters for both devfs_mount and devfs_dirent,
and a DE_DOOMED flag for devfs_dirent. These facilities allow operation to
continue after dm_lock is dropped, by making sure that referenced memory does
not disappear.

Reviewed by: tegge
Tested by: kris
Approved by: kan (mentor)
PR: kern/102335
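
The hold-counter-plus-doomed-flag pattern from this entry can be sketched
in user space as follows. This is a single-threaded illustration with
hypothetical names; in the kernel, the counter and flag would be
manipulated under the appropriate lock, and the real devfs structures
carry far more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define SK_DOOMED 0x1   /* entry is logically removed; hypothetical flag */

struct sk_entry {
    int holdcnt;    /* holders that may still touch the memory */
    int flags;
};

struct sk_entry *
sk_entry_new(void)
{
    return (calloc(1, sizeof(struct sk_entry)));
}

void
sk_hold(struct sk_entry *e)
{
    e->holdcnt++;
}

/* Mark the entry dead. The memory stays valid while anyone holds it. */
void
sk_doom(struct sk_entry *e)
{
    e->flags |= SK_DOOMED;
}

/* Drop one hold; free on the last drop of a doomed entry.
 * Returns true when the memory was released. */
bool
sk_drop(struct sk_entry *e)
{
    assert(e->holdcnt > 0);
    if (--e->holdcnt == 0 && (e->flags & SK_DOOMED) != 0) {
        free(e);
        return (true);
    }
    return (false);
}
```

A caller that took a hold before releasing a lock can check the doomed flag
after reacquiring it and back out, knowing the structure was not freed in
between.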


162255 12-Sep-2006 imp

Put the osta.c license on osta.h. The license is the same.

Approved by: scottl@


161425 17-Aug-2006 imp

while (0); -> while (0) in multi-line macros
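
For readers unfamiliar with the idiom, a short illustration of why the
trailing semicolon matters. The macro and function names are made up for
this sketch and do not come from the files touched by the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Wrong: the trailing ';' ends the statement inside the macro, so the
 * caller's own ';' becomes an empty statement and orphans a following
 * 'else'. Shown only as a comment because the if/else below would not
 * compile with it:
 *
 *   #define SWAP_BAD(a, b) do { int t_ = (a); (a) = (b); (b) = t_; } while (0);
 */

/* Right: no trailing ';', the caller supplies it. */
#define SWAP_INT(a, b) do {     \
    int t_ = (a);               \
    (a) = (b);                  \
    (b) = t_;                   \
} while (0)

/* Order two ints; the macro sits safely in an unbraced if/else. */
void
sort_pair(int *lo, int *hi)
{
    assert(lo != NULL && hi != NULL);
    if (*lo > *hi)
        SWAP_INT(*lo, *hi);
    else
        (void)0;    /* nothing to do; shows the 'else' still parses */
}
```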


161125 09-Aug-2006 alc

Introduce a field to struct vm_page for storing flags that are
synchronized by the lock on the object containing the page.

Transition PG_WANTED and PG_SWAPINPROG to use the new field,
eliminating the need for holding the page queues lock when setting
or clearing these flags. Rename PG_WANTED and PG_SWAPINPROG to
VPO_WANTED and VPO_SWAPINPROG, respectively.

Eliminate the assertion that the page queues lock is held in
vm_page_io_finish().

Eliminate the acquisition and release of the page queues lock
around calls to vm_page_io_finish() in kern_sendfile() and
vfs_unbusy_pages().


160964 04-Aug-2006 yar

Commit the results of the typo hunt by Darren Pilgrim.
This change affects documentation and comments only,
no real code involved.

PR: misc/101245
Submitted by: Darren Pilgrim <darren pilgrim bitfreak org>
Tested by: md5(1)
MFC after: 1 week


160939 03-Aug-2006 delphij

When the volume is being downgraded from a read-write mode, mark
it as clean.

PR: kern/85366
Submitted by: Dan Lukes <dan at obluda dot cz>
MFC After: 2 weeks


160664 25-Jul-2006 yar

In udf_find_partmaps(), when we find a type 1 partition map, we have to
skip the actual type 1 length (6 bytes). With this change, it is now possible
to correctly spot the VAT partition map in certain discs.

Submitted by: Pedro Martelletto <pedro@ambientworks.net>
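
A sketch of walking a packed list of partition maps by their recorded
length, which is the idea behind this fix. It assumes each map begins with
a one-byte type followed by a one-byte length; the function name and the
error convention are hypothetical and this is not the code of
udf_find_partmaps().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walk a packed list of partition maps. Each map starts with a type
 * byte followed by a length byte; advance by that length, never by a
 * guessed constant. Returns the offset of the first map whose type
 * is 'want', or -1 if none is found or the data is malformed. */
long
sk_find_partmap(const uint8_t *maps, size_t total, uint8_t want)
{
    size_t off = 0;

    assert(maps != NULL);
    while (off + 2 <= total) {
        uint8_t type = maps[off];
        uint8_t len = maps[off + 1];

        if (len < 2 || off + len > total)
            return (-1);    /* malformed: would loop or overrun */
        if (type == want)
            return ((long)off);
        off += len;
    }
    return (-1);
}
```

With a 6-byte type 1 map followed by a 64-byte type 2 map, the type 2 map
is found at offset 6.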


160489 18-Jul-2006 jhb

Update comment.


160437 17-Jul-2006 jhb

Lock the smb share before doing a 'put' on it in smbfs_unmount().

Tested by: "Jiawei Ye" <leafy7382 at gmail>


160425 17-Jul-2006 phk

Remove the NDEVFSINO and NDEVFSOVERFLOW options which no longer exist in
DEVFS.

Remove the opt_devfs.h file now that it is empty.


160310 12-Jul-2006 ups

Add vnode interlocking to devfs.
This prevents race conditions that can cause pagefaults or devfs
to use arbitrary vnodes.

MFC after: 1 week


160190 08-Jul-2006 jhb

Add a kern_close() so that the ABIs can close a file descriptor w/o having
to populate a close_args struct and change some of the places that do.


160134 06-Jul-2006 rwatson

Remove unneeded mac.h include.

MFC after: 3 days


160133 06-Jul-2006 rwatson

Remove now unneeded opt_mac.h and mac.h includes.

MFC after: 3 days


160132 06-Jul-2006 rwatson

Use #include "", not #include <> for opt_foo.h.

MFC after: 3 days


159996 27-Jun-2006 netchild

Correctly calculate a buffer length. It was off by one so a read() returned
one byte less than needed.

This is a RELENG_x_y candidate, since it fixes a problem with Oracle 10.

Noticed by: Dmitry Ganenko <dima@apk-inform.com>
Testcase by: Dmitry Ganenko <dima@apk-inform.com>
Reviewed by: des
Submitted by: rdivacky
Sponsored by: Google SoC 2006
MFC after: 1 week


159939 26-Jun-2006 scottl

Fix a memory leak and a nested 'for' loop in the spare table handling.

Submitted by: Pedro Martelletto


159283 05-Jun-2006 ghelmer

Upon further review, DES prefers this change over that in revision 1.13
to resolve the directory access problem for processes with P_SUGID flag
set.

Suggested by: des


159128 01-Jun-2006 rodrigc

mount_msdosfs.c:
- remove call to getmntopts(), and just pass -o options to
nmount(). This removes some confusion as to what options
msdosfs can parse, by pushing the responsibility of option parsing
to the VFS and FS specific code in the kernel.

msdosfs_vfsops.c:
- add "force" and "sync" to msdosfs_opts. They used to be specified
in mount_msdosfs.c, so move them here. It's not clear whether these
options should be placed into global_opts in vfs_mount.c or not.

Motivated by: marcus


159117 31-May-2006 cperciva

Enable inadvertently disabled "securenet" access controls in ypserv. [1]

Correct a bug in the handling of backslash characters in smbfs which can
allow an attacker to escape from a chroot(2). [2]

Security: FreeBSD-SA-06:15.ypserv [1]
Security: FreeBSD-SA-06:16.smbfs [2]


159023 28-May-2006 rodrigc

Remove incorrect null_checkexp() routine. This
will allow the NFS server to call vfs_stdcheckexp() on the exported nullfs
filesystem, not the underlying filesystem being nullfs mounted.
If the lower filesystem was not NFS exported, then the NFS exported
null filesystem would not work.

Pointed out by: scottl
PR: kern/87906
MFC after: 1 week


159019 28-May-2006 rodrigc

Modify MNT_UPDATE behavior for nullfs so that it does not
return EOPNOTSUPP if an "export" parameter was passed in.
This should allow nullfs mounts to be NFS exported.

PR: kern/87906
MFC after: 1 week


158927 26-May-2006 rodrigc

Remove calls to vfs_export() for exporting a filesystem for NFS mounting
from individual filesystems. Call it instead in vfs_mount.c,
after we call VFS_MOUNT() for a specific filesystem.


158924 26-May-2006 rodrigc

Remove calls to vfs_export() for exporting a filesystem for NFS mounting
from individual filesystems. Call it instead in vfs_mount.c,
after we call VFS_MOUNT() for a specific filesystem.


158915 25-May-2006 ups

Call vm_object_page_clean() with the object lock held.

Submitted by: kensmith@
Reviewed by: mohans@
MFC after: 6 days


158906 25-May-2006 ups

Do not set B_NOCACHE on buffers when releasing them in flushbuflist().
If B_NOCACHE is set the pages of vm backed buffers will be invalidated.
However clean buffers can be backed by dirty VM pages so invalidating them
can lead to data loss.
Add support for flushing dirty pages in the data invalidation functions
of some network file systems.

This fixes data losses during vnode recycling (and other code paths
using invalbuf(*,V_SAVE,*,*)) for data written using an mmaped file.

Collaborative effort by: jhb@,mohans@,peter@,ps@,ups@
Reviewed by: tegge@
MFC after: 7 days


158880 24-May-2006 ghelmer

Revision 1.4 set access for all sensitive files in /proc/<PID> to mode 0
if a process's uid or gid has changed, but the /proc/<PID> directory
itself was also set to mode 0. Assuming this doesn't open any
security holes, open access to the /proc/<PID> directory for users
other than root to read or search the directory.

Reviewed by: des (back in February)
MFC after: 3 weeks


158651 16-May-2006 phk

Since DELAY() was moved, most <machine/clock.h> #includes have been
unnecessary.


158611 15-May-2006 kbyanc

Restore the ability to mount procfs and fdescfs filesystems via the
mount(2) system call:

* Add cmount hook to fdescfs and pseudofs (and, by extension, procfs and
linprocfs). This (mostly) restores the ability to mount these
filesystems using the old mount(2) system call (see below for the
rest of the fix).

* Remove not-NULL check for the data argument from the mount(2) entry
point. Per the mount(2) man page, it is up to the individual
filesystem being mounted to verify data. Or, in the case of procfs,
etc. the filesystem is free to ignore the data parameter if it does
not use it. Enforcing data to be not-NULL in the mount(2) system call
entry point prevented passing NULL to filesystems which ignored the
data pointer value. Apparently, passing NULL was common practice
in such cases, as even our own mount_std(8) used to do it in the
pre-nmount(2) world.

All userland programs in the tree were converted to nmount(2) long ago,
but I've found at least one external program which broke due to this
(presumably unintentional) mount(2) API change. One could argue that
external programs should also be converted to nmount(2), but then there
isn't much point in keeping the mount(2) interface for backward
compatibility if it isn't backward compatible.


157685 12-Apr-2006 pjd

Remove unused prototypes.


157342 31-Mar-2006 jeff

- Add a bogus vhold/vdrop around vgone() in devfs_revoke. Without this
the vnode is never recycled. It is bogus because the reference really
should be associated with the devfs dirent.


156894 19-Mar-2006 tegge

Call vn_start_write() before locking vnode.


156732 15-Mar-2006 rwatson

Add a_fdidx to comment prototype for fifo_open().

MFC after: 3 days
Submitted by: Kostik Belousov <kostikbel at gmail dot com>


156714 14-Mar-2006 rwatson

If fifo_open() is called with a negative file descriptor, return EINVAL
rather than panicking later. This can occur if the kernel calls
vn_open() on a fifo, as there will be no associated file descriptor,
and therefore the file descriptor operations cannot be modified to
point to the fifo operation set.

MFC after: 3 days
Reported by: Martin <nakal at nurfuerspam dot de>
PR: 94278


156693 13-Mar-2006 joerg

When encountering an ISO_SUSP_CFLAG_ROOT element in Rock Ridge
processing, this actually means there's a double slash recorded in the
symbolic link's path name. We used to start over from / then, which
caused link targets like ../../bsdi.1.0/include//pathnames.h to be
interpreted as /pathnames.h. This both contradicts our conventional
slash interpretation and is potentially dangerous.

The right thing to do is (obviously) to just ignore that element.

bde once pointed out that mistake when he noticed it on the
4.4BSD-Lite2 CD-ROM, and asked me for help.

Reviewed by: bde (about half a year ago)
MFC after: 3 days


156585 12-Mar-2006 jeff

- Define a null_getwritemount to get the mount-point for the lower
filesystem so that nullfs doesn't permit you to circumvent snapshots.

Discussed with: tegge
Sponsored by: Isilon Systems, Inc.


156095 28-Feb-2006 kris

Correct the vnode locking in fdescfs.

PR: kern/93905
Submitted by: Kostik Belousov <kostikbel@gmail.com>
Reviewed by: jeff
MFC After: 1 week


156062 27-Feb-2006 yar

CODA_COMPAT_5 may not be defined unconditionally in the coda5 module.
Otherwise a kernel build would break in the coda5 module if the main
kernel conf file enabled CODA_COMPAT_5, too. Redefined symbols are
strictly disallowed by -Werror.

To overcome this issue, introduce a different symbol indicating coda5
build, CODA5_MODULE, and translate it to CODA_COMPAT_5 appropriately
in /sys/coda/coda.h.

MFC after: 3 days


155922 22-Feb-2006 jhb

Close some races between procfs/ptrace and exit(2):
- Reorder the events in exit(2) slightly so that we trigger the S_EXIT
stop event earlier. After we have signalled that, we set P_WEXIT and
then wait for any processes with a hold on the vmspace via PHOLD to
release it. PHOLD now KASSERT()'s that P_WEXIT is clear when it is
invoked, and PRELE now does a wakeup if P_WEXIT is set and p_lock drops
to zero.
- Change proc_rwmem() to require that the process being read from has its
vmspace held via PHOLD by the caller and get rid of all the junk to
screw around with the vmspace reference count as we no longer need it.
- In ptrace() and pseudofs(), treat a process with P_WEXIT set as if it
doesn't exist.
- Only do one PHOLD in kern_ptrace() now, and do it earlier so it covers
FIX_SSTEP() (since on alpha at least this can end up calling proc_rwmem()
to clear an earlier single-step simulated via a breakpoint). We only
do one to avoid races. Also, by making the EINVAL error for unknown
requests be part of the default: case in the switch, the various
switch cases can now just break out to return which removes a _lot_ of
duplicated PRELE and proc unlocks, etc. Also, it fixes at least one bug
where a LWP ptrace command could return EINVAL with the proc lock still
held.
- Changed the locking for ptrace_single_step(), ptrace_set_pc(), and
ptrace_clear_single_step() to always be called with the proc lock
held (it was a mixed bag previously). Alpha and arm have to drop
the lock while they mess around with breakpoints, but other archs
avoid extra lock release/acquires in ptrace(). I did have to fix a
couple of other consumers in kern_kse and a few other places to
hold the proc lock and PHOLD.

Tested by: ps (1 mostly, but some bits of 2-4 as well)
MFC after: 1 week


155920 22-Feb-2006 jhb

Change pfs_visible() to optionally return a pointer to the process
associated with the passed in pfs_node. If it does return a pointer, it
keeps the process locked. This allows a lot of places that were calling
pfind() again right after pfs_visible() to not have to do that and avoids
races since we don't drop the proc lock just to turn around and lock it
again. This will become more important with future changes to fix races
between procfs/ptrace and exit(2). Also, removed a duplicate pfs_visible()
call in pfs_getextattr().

Reviewed by: des
MFC after: 1 week


155918 22-Feb-2006 jhb

Hold the proc lock while calling proc_sstep() since the function asserts
it and remove a PRELE() that didn't have a matching PHOLD(). The calling
code already has a PHOLD anyway.

MFC after: 1 week


155903 22-Feb-2006 jeff

- We must hold a reference to a vnode before calling vgone() otherwise
it may not be removed from the freelist.

MFC After: 1 week
Found by: kris


155899 22-Feb-2006 jeff

- spell VOP_LOCK(vp, LK_RELEASE... VOP_UNLOCK(vp,... so that asserts in
vop_lock_post do not trigger.
- Rearrange null_inactive to null_hashrem earlier so there is no chance
of finding the null node on the hash list after the locks have been
switched.
- We should never have a NULL lowervp in null_reclaim() so there is
no need to handle this situation. panic instead.

MFC After: 1 week


155898 22-Feb-2006 jeff

- Assert that the lowervp is locked in null_hashget().
- Simplify the logic dealing with recycled vnodes in null_hashget() and
null_hashins(). Since we hold the lower node locked in both cases
the null node can not be undergoing recycling unless reclaim somehow
called null_nodeget(). The logic that was in place was not safe and
was essentially dead code.

MFC After: 1 week


155896 22-Feb-2006 jeff

- Deadfs should not use the std GETWRITEMOUNT routine. Add one that always
returns NULL.

MFC After: 1 week


155508 10-Feb-2006 jhb

Correctly set MNTK_MPSAFE flag from the lower vnode's mount rather than
always turning it on along with any flags set in the lower mount.

Tested by: kris
Reviewed by: jeff
MFC after: 3 days


155423 07-Feb-2006 jeff

- No need to WANTPARENT when we're just going to vrele it in a deadlock
prone way later.

Reported by: kkenn
MFC After: 3 days


155256 03-Feb-2006 will

Make UDF endian-safe.

Submitted by: Pedro Martelletto <pedro@ambientworks.net> (via scottl)
Tested on: sparc64
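
UDF stores its on-disk integers in little-endian order, so an endian-safe
reader must not depend on host byte order. The sketch below decodes a
field byte by byte; the function name is hypothetical, and the commit
itself may well use the kernel's byte-order conversion macros instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a little-endian 32-bit field byte by byte. The result is the
 * same on little- and big-endian hosts, unlike casting the buffer to
 * uint32_t * and dereferencing it. */
uint32_t
sk_le32dec(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] | ((uint32_t)p[1] << 8) |
        ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24));
}
```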


155160 01-Feb-2006 jeff

- Reorder calls to vrele() after calls to vput() when the vrele is a
directory. vrele() may lock the passed vnode, which in these cases would
give an invalid lock order of child -> parent. These situations are
deadlock prone although do not typically deadlock because the vrele
is typically not releasing the last reference to the vnode. Users of
vrele must consider it as a call to vn_lock() and order it appropriately.

MFC After: 1 week
Sponsored by: Isilon Systems, Inc.
Tested by: kkenn


155034 30-Jan-2006 jeff

- Remove a stale comment. This function was rewritten to be SMP safe some
time ago.

Sponsored by: Isilon Systems, Inc.


154730 23-Jan-2006 trhodes

Update incorrect comments here, there should not be a call to panic()
over fs corruption.

Discussed with: alfred, phk


154692 22-Jan-2006 fjoe

Do not assume that `char direntry::deExtension[3]' starts right after
`char direntry::deName[8]' and access deExtension[] explicitly.

Found by: Coverity Prevent(tm)
CID: 350, 351, 352
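
To illustrate the point, here is a sketch using a cut-down stand-in for
the directory entry structure. Only the two field names come from the log;
the struct name, the helper, and the omission of all other fields are
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the on-disk FAT directory entry. */
struct sk_direntry {
    char deName[8];         /* base name, space padded */
    char deExtension[3];    /* extension, space padded */
};

/* Copy the 8+3 name into an 11-byte buffer, one array at a time.
 * Indexing deName[8..10] to reach the extension would read past the
 * end of deName, which is undefined behaviour even if it "works". */
void
sk_copy_83(const struct sk_direntry *de, char out[11])
{
    assert(de != NULL && out != NULL);
    memcpy(out, de->deName, sizeof(de->deName));
    memcpy(out + sizeof(de->deName), de->deExtension,
        sizeof(de->deExtension));
}
```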


154647 21-Jan-2006 rwatson

Convert last four functions in coda_vnops.c to ANSI C function
declarations. I knew I would get to fix something in Coda
eventually.

MFC after: 1 week


154487 17-Jan-2006 alfred

I ran into an nfs client panic a couple of times in a row over the
last few days. I tracked it down to the fact that nfs_reclaim()
is setting vp->v_data to NULL _before_ calling vnode_destroy_object().
After silence from the mailing list I checked further and discovered
that ufs_reclaim() is unique among FreeBSD filesystems for calling
vnode_destroy_object() early, long before tossing v_data or much
of anything else, for that matter. The rest, including NFS, appear
to be identical, as if they were just clones of one original routine.

The enclosed patch fixes all file systems in essentially the same
way, by moving the call to vnode_destroy_object() to early in the
routine (before the call to vfs_hash_remove(), if any). I have
only tested NFS, but I've now run for over eighteen hours with the
patch where I wouldn't get past four or five without it.

Submitted by: Frank Mayhar
Requested by: Mohan Srinivasan
MFC After: 1 week


154152 09-Jan-2006 tegge

Add marker vnodes to ensure that all vnodes associated with the mount point are
iterated over when using MNT_VNODE_FOREACH.

Reviewed by: truckman


154144 09-Jan-2006 maxim

o Fix typo in the define: s/MRAK_INT_GEN/MARK_INT_GEN/. The typo
was harmless because the define is not used in coda_vfsops.c.

Submitted by: Hugo Meiland


154054 05-Jan-2006 maxim

o Typo in the debug message: s/skiped/skipped.

PR: kern/91346
Submitted by: Gavin Atkinson


153986 03-Jan-2006 rwatson

When returning EIO from DEVFSIO_RADD ioctl, drop the exclusive rule
lock. Otherwise the system comes to a rather sudden and grinding
halt.

MFC after: 1 week


153706 24-Dec-2005 trhodes

Make tv_sec a time_t on all platforms but alpha. Brings us more in line with
POSIX. This also makes the struct correct if we ever implement an i386-time64
architecture. Not that we need to.

Reviewed by: imp, brooks
Approved by: njl (acpica), des (no objections, touches procfs)
Tested with: make universe


153400 14-Dec-2005 des

Eradicate caddr_t from the VFS API.


153121 05-Dec-2005 avatar

Recent nmount(2) adoption in mount_smbfs(8) did not flag the "long" option
since mount_smbfs(8) assumed long name mounting by default unless "-n long"
was explicitly specified.

Rather than supplying a "long" option in mount_smbfs(8), this commit brings
back the original behaviour by associating SMBFS_MOUNT_NO_LONG with the
"nolong" option. This should fix the broken long file names on smbfs people
observed recently.

Reported by: Vladimir Grebenschikov <vova at fbsd dot ru>
Reviewed by: phk
Tested by: Slawa Olhovchenkov <slw at zxy dot spb dot ru>


153110 05-Dec-2005 ru

Fix -Wundef warnings found when compiling i386 LINT, GENERIC and
custom kernels.


153084 04-Dec-2005 ru

Fix -Wundef from compiling the amd64 LINT.


153072 04-Dec-2005 ru

Fix -Wundef.


152678 22-Nov-2005 bp

Fix interaction with Windows 2000/XP based servers:

If the complete reply on the TRANS2_FIND_FIRST2 request fits exactly
into one response packet, then next call to TRANS2_FIND_NEXT2 will return
zero entries and server will close current transaction. To avoid
subsequent errors we should not perform FIND_CLOSE2 request.

PR: kern/78953
Submitted by: Jim Carroll


152610 19-Nov-2005 rodrigc

Properly parse the nowin95 mount option.

Tested by: Rainer Hurling <rhurlin at gwdg dot de>


152595 18-Nov-2005 rodrigc

Add "shortnames" and "longnames" mount options which are
synonyms for "shortname" and "longname" mount options. The old
(before nmount()) mount_msdosfs program accepted "shortnames" and "longnames",
but the kernel nmount() checked for "shortname" and "longname".
So, make the kernel accept "shortnames", "longnames", "shortname", "longname"
for forwards and backwards compatibility.

Discovered by: Rainer Hurling <rhurlin at gwdg dot de>
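
Accepting several spellings of one option can be sketched with a small
alias table. This is an illustration only: the table, the function name,
and the idea of mapping to a canonical name are assumptions, and the
kernel's actual option handling may simply list all four strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table: each spelling maps to one canonical option name. */
static const struct {
    const char *spelling;
    const char *canonical;
} sk_opt_aliases[] = {
    { "shortname",  "shortname" },
    { "shortnames", "shortname" },
    { "longname",   "longname"  },
    { "longnames",  "longname"  },
};

/* Return the canonical option name, or NULL if the option is unknown. */
const char *
sk_canonical_opt(const char *opt)
{
    size_t i;

    assert(opt != NULL);
    for (i = 0; i < sizeof(sk_opt_aliases) / sizeof(sk_opt_aliases[0]); i++)
        if (strcmp(opt, sk_opt_aliases[i].spelling) == 0)
            return (sk_opt_aliases[i].canonical);
    return (NULL);
}
```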


152466 16-Nov-2005 rodrigc

- Add errmsg to the list of smbfs mount options.
- Use vfs_mount_error() to propagate smbfs mount errors back to userspace.

Reviewed by: bp (smbfs maintainer)


152254 09-Nov-2005 dwhite

This is a workaround for a complicated issue involving VFS cookies and devfs.
The PR and patch have the details. The ultimate fix requires architectural
changes and clarifications to the VFS API, but this will prevent the system
from panicking when someone does "ls /dev" while running in a shell under the
linuxulator.

This issue affects HEAD and RELENG_6 only.

PR: 88249
Submitted by: "Devon H. O'Dell" <dodell@ixsystems.com>
MFC after: 3 days


151897 31-Oct-2005 rwatson

Normalize a significant number of kernel malloc type names:

- Prefer '_' to ' ', as it results in more easily parsed results in
memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion. Similar changes are required for UMA zone names.
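
The naming rules listed in this entry can be expressed as a small string
transformation. The sketch below applies three of them (space to
underscore, dropping '/', folding to lower case) in place; the function
name is hypothetical, and the commit renamed the types by hand rather
than with any such helper.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Rewrite a type name in place: ' ' -> '_', drop '/', fold to lower
 * case. Returns the length of the rewritten name. */
size_t
sk_normalize_type_name(char *name)
{
    const char *src;
    char *dst;

    assert(name != NULL);
    for (src = dst = name; *src != '\0'; src++) {
        if (*src == '/')
            continue;       /* not usable in a file name */
        if (*src == ' ')
            *dst++ = '_';
        else
            *dst++ = (char)tolower((unsigned char)*src);
    }
    *dst = '\0';
    return ((size_t)(dst - name));
}
```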


151453 18-Oct-2005 phk

Use correct criteria for determining which directory entries we can
purge right away and which we merely can hide.

Beaten into my skull by: kris


151447 18-Oct-2005 des

Implement the full range of ISO9660 number conversion routines in iso.h.

MFC after: 2 weeks


151407 17-Oct-2005 rodrigc

Unconditionally mount a CD9660 filesystem as read-only, instead of
returning EROFS if we forget to mount it as read-only.


151406 17-Oct-2005 rodrigc

Use the actual sector size of the media instead of hard-coding it to 2048.
This eliminates KASSERTs in GEOM if we accidentally mount an audio CD
as a cd9660 filesystem.


151405 17-Oct-2005 rodrigc

Unconditionally mount a UDF filesystem as read-only, instead of
returning an EROFS if we forget to mount it as read-only.


151396 17-Oct-2005 flz

- Fix typo.

Approved by: ssouhlal
MFC after: 1 week


151394 16-Oct-2005 truckman

Update nwfs_lookup() to match the current cache_lookup() API.
cache_lookup() has returned a ref'ed and locked vnode since
vfs_cache.c:1.96, dated Tue Mar 29 12:59:06 2005 UTC. This change
is similar to the change made to smbfs_lookup() in smbfs_vnops.c:1.58.

Tested by: "Antony Mawer" ant AT mawer.org
MFC after: 2 weeks


151393 16-Oct-2005 kris

Reflect mpsafety of the underlying filesystem in the nullfs image.

I benchmarked this by simultaneously extracting 4 large tarballs (basically
world images) on a 4-processor AMD64 system, in a malloc-backed md.

With this patch, system time was reduced by 43%, and wall clock time by 33%.

Submitted by: jeff
MFC after: 1 week


151392 16-Oct-2005 truckman

Apply the same fix to a potential race in the ISDOTDOT code in
cd9660_lookup() that was used to fix an actual race in ufs_lookup.c:1.78.
This is not currently a hazard, but the bug would be activated by
marking cd9660 as MPSAFE.

Requested by: bde


151349 14-Oct-2005 yar

In preparation for making the modules actually use opt_*.h files
provided in the kernel build directory, fix modules that were
failing to build this way due to not quite correct kernel option
usage. In particular:

ng_mppc.c uses two complementary options, both of which are listed
in sys/conf/files. Ideally, there should be a separate option for
including ng_mppc.c in kernel build, but now only
NETGRAPH_MPPC_ENCRYPTION is usable anyway, the other one requires
proprietary files.

nwfs and smbfs were trying to ensure they were built with proper
network components, but the check was rather questionable.

Discussed with: ru


151316 14-Oct-2005 davidxu

1. Change prototype of trapsignal and sendsig to use ksiginfo_t *. Most
changes in MD code are trivial. Before this change, trapsignal and
sendsig used discrete parameters; now they use member fields of the
ksiginfo_t structure. For sendsig, this change allows us to pass
a POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo, it is no longer needed because we now always
generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to proc structure to hold shared signals which were
blocked by all threads in the proc.

4. Add td_sigqueue to thread structure to hold all signals delivered to
thread.

5. i386 and amd64 now return POSIX standard si_code, other arches will
be fixed.

6. In this sigqueue implementation, the pending signal set is kept as
before, and an extra siginfo list holds additional siginfo_t data for
signals. Kernel code that uses psignal() still behaves as before and
will not fail even under memory pressure. The only exception is when
deleting a signal: we should call sigqueue_delete to remove the signal
from the sigqueue, not SIGDELSET. Currently no kernel code delivers a
signal with additional data, so the kernel should be as stable as
before. A ksiginfo can carry more information, for example to allow a
signal to be delivered but throw away its siginfo data if there is not
enough memory.
SIGKILL and SIGSTOP have a fast path in sigqueue_add, because they
cannot be caught or masked.
The sigqueue() syscall allows user code to queue a signal to a target
process; if resources are unavailable, EAGAIN is returned as the
specification says.
Just before a thread exits, signal queue memory is freed by
sigqueue_flush.
Currently, all signals are allowed to be queued, not only realtime signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64


151157 09-Oct-2005 rodrigc

- Do not hardcode the bsize to a sectorsize of 2048, even though
the UDF specification specifies a logical sectorsize of 2048.
Instead, get it from GEOM.
- When reading the UDF Anchor Volume Descriptor, use the logical
sectorsize of 2048 when calculating the offset to read from, but
use the actual sectorsize to determine how much to read.

- works with reading a DVD disk and a DVD disk image file via mdconfig
- correctly returns EINVAL if we try to mount_udf an audio CD, instead
of panicking inside GEOM when INVARIANTS is set
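
The split between the two sector sizes can be sketched as below: the
offset is derived from the 2048-byte logical sector size, while the read
length follows the media's real sector size. The struct, the function, and
the constant name are hypothetical; in the kernel the media sector size
would come from GEOM as the entry says.

```c
#include <assert.h>
#include <stdint.h>

#define SK_UDF_LOGICAL_SECSIZE 2048u    /* logical sector size named in the log */

struct sk_read_req {
    uint64_t offset;    /* byte offset on the media */
    uint32_t length;    /* bytes to read */
};

/* Plan a one-sector read of logical sector 'lsn' on media whose real
 * sector size is 'media_secsize'. */
struct sk_read_req
sk_plan_sector_read(uint32_t lsn, uint32_t media_secsize)
{
    struct sk_read_req req;

    assert(media_secsize != 0);
    req.offset = (uint64_t)lsn * SK_UDF_LOGICAL_SECSIZE;
    req.length = media_secsize;
    return (req);
}
```

For logical sector 256 on media with 2352-byte sectors, this yields an
offset of 524288 and a length of 2352.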


151054 07-Oct-2005 pjd

We don't need 'imp' here.


150794 01-Oct-2005 rwatson

Second attempt at a work-around for fifo-related socket panics during
make -j with high levels of parallelism: acquire Giant in fifo I/O
routines.

Discussed with: ups
MFC after: 3 days


150761 30-Sep-2005 phk

The NWFS code in RELENG_6 is broken due to a typo in
sys/fs/nwfs/nwfs_vfsops.c, introduced with the conversion to
nmount with revision 1.38. This causes mount_nwfs to fail with
the error message:

mount_nwfs: mount error: /mnt/netware: syserr = No such file or directory

This is caused by a typo on line 178, which specifies "nwfw_args"
rather than "nwfs_args".

Submitted by: Antony Mawer <gnats@mawer.org>
Fat fingers: phk
PR: 86757
MFC: 3 days


150711 29-Sep-2005 peadar

Remove checks for BOOTSIG[23] from FAT32 bootblocks.

There seems to be very little documentary evidence outside this
implementation to suggest that these checks are necessary, and more
than one camera-formatted flash disk fails the check, but mounts
successfully on most other systems.

Reviewed By: bde@


150623 27-Sep-2005 rwatson

Back out fifo_vnops.c:1.127, which introduced an sx lock around I/O on
a fifo. While this did indeed close the race, confirming suspicions
about the nature of the problem, it causes difficulties with blocking
I/O on fifos.

Discussed with: ups
Also spotted by: Peter Holm <peter at holm dot cc>


150561 26-Sep-2005 rwatson

Assert v_fifoinfo is non-NULL in fifo_close() in order to catch
non-conforming cases sooner.

MFC after: 3 days
Reported by: Peter Holm <peter at holm dot cc>


150545 25-Sep-2005 rwatson

Lock the read socket receive buffer when frobbing the sb_state flag on
that socket during open, not the write socket receive buffer. This
might explain clearing of the sb_state SB_LOCK flag seen occasionally
in soreceive() on fifos.

MFC after: 3 days
Spotted by: ups


150501 24-Sep-2005 phk

Make rule zero really magical, that way we don't have to do anything
when we mount and get zero cost if no rules are used in a mountpoint.

Add code to deref rules on unmount.

Switch from SLIST to TAILQ.

Drop SYSINIT, use SX_SYSINIT and static initializer of TAILQ instead.

Drop goto, a break will do.

Reduce double pointers to single pointers.

Combine reaping and destroying rulesets.

Avoid memory leaks in some error cases.


150486 23-Sep-2005 rwatson

For reasons of consistency (and necessity), assert an exclusive vnode
lock on the fifo vnode in fifo_open(): we rely on the vnode lock to
serialize access to v_fifoinfo.

MFC after: 3 days


150462 22-Sep-2005 rwatson

Add fi_sx, an sx lock to serialize I/O operations on the socket pair
underlying the POSIX fifo implementation. In 6.x/7.x, fifo access is
moved from the VFS layer, where it was serialized using the vnode
lock, to the file descriptor layer, where access is protected by a
reference count but not serialized. This exposed socket buffer
locking to high levels of parallelism in specific fifo workloads, such
as make -j 32, which expose as yet unresolved socket buffer bugs.

fi_sx re-adds serialization about the read and write routines,
although not paths that simply test socket buffer mbuf queue state,
such as the poll and kqueue methods. This restores the extra locking
cost previously present in some cases, but is an effective workaround
for the instability that has been experienced. This workaround should
be removed once the bug in socket buffer handling has been fixed.

Reported by: kris, jhb, Julien Gabel <jpeg at thilelli dot net>,
Peter Holm <peter at holm dot cc>, others
MFC after: 3 days


150342 19-Sep-2005 phk

Revamp DEVFS internals pretty severely [1].

Give DEVFS a proper inode called struct cdev_priv. It is important
to keep in mind that this "inode" is shared between all DEVFS
mountpoints, therefore it is protected by the global device mutex.

Link the cdev_priv's into a list, protected by the global device
mutex. Keep track of each cdev_priv's state with a flag bit and
of references from mountpoints with a dedicated usecount.

Reap the benefits of much improved kernel memory allocator and the
generally better defined device driver APIs to get rid of the tables
of pointers + serial numbers, their overflow tables, the atomics
to muck about in them and all the trouble that resulted in.

This makes RAM the only limit on how many devices we can have.

The cdev_priv is actually a super struct containing the normal cdev
as the "public" part, and therefore allocation and freeing has moved
to devfs_devs.c from kern_conf.c.

The overall responsibility is (to be) split such that kern/kern_conf.c
is the stuff that deals with drivers and struct cdev and fs/devfs
handles filesystems and struct cdev_priv and their private liaison
exposed only in devfs_int.h.

Move the inode number from cdev to cdev_priv and allocate inode
numbers properly with unr. Local dirents in the mountpoints
(directories, symlinks) allocate inodes from the same pool to
guarantee against overlaps.

Various other fields are going to migrate from cdev to cdev_priv
in the future in order to hide them. A few fields may migrate
from devfs_dirent to cdev_priv as well.

Protect the DEVFS mountpoint with an sx lock instead of lockmgr,
this lock also protects the directory tree of the mountpoint.

Give each mountpoint a unique integer index, allocated with unr.
Use it to index into an array of devfs_dirent pointers in each cdev_priv.
Initially the array points to a single element also inside cdev_priv,
but as more devfs instances are mounted, the array is extended with
malloc(9) as necessary when the filesystem populates its directory
tree.

Retire the cdev alias lists, the cdev_priv now knows about all the
relevant devfs_dirents (and their vnodes) and devfs_revoke() will
pick them up from there. We still spelunk into other mountpoints
and fondle their data without 100% good locking. It may make better
sense to vector the revoke event into the tty code and there do a
destroy_dev/make_dev on the tty's devices, but that's for further
study.

Lots of shuffling of stuff and churn of bits for no good reason[2].

XXX: There is still nothing preventing the dev_clone EVENTHANDLER
from being invoked at the same time in two devfs mountpoints. It
is not obvious what the best course of action is here.

XXX: comment out an if statement that lost its body, until I can
find out what should go there so it doesn't do damage in the meantime.

XXX: Leave in a few extra malloc types and KASSERTS to help track
down any remaining issues.

Much testing provided by: Kris
Much confusion caused by (races in): md(4)

[1] You are not supposed to understand anything past this point.

[2] This line should simplify life for the peanut gallery.


150281 18-Sep-2005 rwatson

Assert that (vp) is locked in fifo_close(), since we rely on the
exclusive vnode lock to synchronize the reference counts on struct
fifoinfo.

MFC after: 3 days


150200 15-Sep-2005 phk

Don't attempt to recurse lockmgr, it doesn't like it.


150181 15-Sep-2005 kan

Handle a race condition where NULLFS vnode can be cleaned while threads
can still be asleep waiting for lowervp lock.

Tested by: kkenn
Discussed with: ssouhlal, jeffr


150165 15-Sep-2005 rwatson

The socket pointers in fifoinfo are not permitted to be NULL, so
don't check if they are, it just confuses the fifo code more.

MFC after: 3 days


150151 15-Sep-2005 phk

Various minor polishing.


150150 15-Sep-2005 phk

Protect the devfs rule internal global lists with a sx lock, the per
mount locks are not enough. Finer granularity (x)locking could be
implemented, but I prefer to keep it simple for now.


150149 15-Sep-2005 phk

Absolve devfs_rule.c from locking responsibility and call it with
all necessary locking held.


150147 15-Sep-2005 phk

Close a race which could result in unwarranted "ruleset %d already
running" panics.

Previously, recursion through the "include" feature was prevented by
marking each ruleset as "running" when applied. This doesn't work for
the case where two DEVFS instances try to apply the same ruleset at
the same time.

Instead introduce the sysctl vfs.devfs.rule_depth (default == 1) which
limits how many levels of "include" we will traverse.

Be aware that traversal of "include" is recursive and kernel stack
size is limited.

MFC: after 3 days
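
A minimal sketch of the depth limit described in r150147, assuming a
hypothetical struct ruleset with at most one "include". None of these
names come from devfs_rule.c; the point is only that a depth counter
bounds the recursion without any shared "running" mark, so two callers
can apply the same ruleset at once and an include cycle still ends.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ruleset: an id plus an optional include of another ruleset. */
struct ruleset {
	int		 id;
	struct ruleset	*include;	/* NULL if nothing is included */
	int		 applied;	/* how many times it was applied */
};

/* Apply rs, then follow its include while levels remain. */
static int
apply_level(struct ruleset *rs, int levels_left)
{
	rs->applied++;
	if (rs->include == NULL || levels_left == 0)
		return (1);
	return (1 + apply_level(rs->include, levels_left - 1));
}

/*
 * Apply rs, traversing at most rule_depth levels of include.
 * Returns the number of rulesets applied.
 */
int
ruleset_apply(struct ruleset *rs, int rule_depth)
{
	assert(rule_depth >= 0);
	if (rs == NULL)
		return (0);
	return (apply_level(rs, rule_depth));
}
```

With rule_depth set to 1, the top-level ruleset and one included
ruleset are applied. Each extra level costs a stack frame, which is
why the limit matters on a small kernel stack.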


150096 13-Sep-2005 rwatson

Trim down now (believed to be) unused fifo_ioctl() and
fifo_kqfilter() VOP implementations, since they in theory are used
only on open file descriptors, in which case the ioctls are via
fifo_ioctl_f() and kqueue requests are via fifo_kqfilter_f().
Generate warnings if they are entered for now. These printf()
calls should become panic() calls.

Annotate and re-implement fifo_ioctl_f(): don't arbitrarily
forward ioctls to the socket layer, only forward the ones we
explicitly support for fifos. In the case of FIONREAD, don't
forward the request to the write socket on a read-write fifo, or
the read result is overwritten. Annotate a nasty case for the
undefined POSIX O_RDWR on fifos, in which failure of the second
ioctl will result in the socket pair being in an inconsistent
state.

Assert copyright as I find myself rewriting non-trivial parts of
fifofs.

MFC after: 3 days


150077 13-Sep-2005 rwatson

As a result of kqueue locking work, socket buffer locks will always
be held when entering a kqueue filter for fifos via a socket buffer
event: as such, assert the lock unconditionally rather than acquiring
it conditionally.

MFC after: 3 days


150074 13-Sep-2005 rwatson

Annotate two issues:

1) fifo_kqfilter() is not actually ever used, it likely should be GC'd.

2) fifo_kqfilter_f() doesn't implement EVFILT_VNODE, so detecting events
on the underlying vnode for a fifo no longer works (it did in 4.x).
Likely, fifo_kqfilter_f() should forward the request to the VFS using
fp->f_vnode, which would work once fifo_kqfilter() was detached from
the vnode operation vector (removing the fifo override).

Discussed with: phk


150066 12-Sep-2005 rwatson

Introduce no-op nosup fifo kqueue filter and detach routine, which are
used when a read filter is requested on a write-only fifo descriptor, or
a write filter is requested on a read-only fifo descriptor. This
permits the filters to be registered, but never raises the event, which
causes kqueue behavior for fifos to more closely match similar semantics
for poll and select, which permit testing for the condition even though
the condition will never be raised, and is consistent with POSIX's notion
that a fifo has identical semantics to a one-way IPC channel created
using pipe() on most operating systems.

The fifo regression test suite can now run to completion on HEAD without
errors.

MFC after: 3 days


150060 12-Sep-2005 rwatson

When a request is made to register a filter on a fifo that doesn't
apply to the fifo (i.e., not EVFILT_READ or EVFILT_WRITE), reject
it as EINVAL, not by returning 1 (EPERM).

MFC after: 3 days


150033 12-Sep-2005 rwatson

Remove DFLAG_SEEKABLE from fifo file descriptors: fifos are not seekable
according to POSIX, not to mention the fact that it doesn't make sense
(and hence isn't really implemented). This causes the fifo_misc
regression test to succeed.


150027 12-Sep-2005 rwatson

Only poll the fifo for read events if the fifo is attached to a readable
file descriptor. Otherwise, the read end of a fifo might return that it
is writable (which it isn't).

Only poll the fifo for write events if the fifo is attached to a writable
file descriptor. Otherwise, the write end of a fifo might return that
it is readable (which it isn't).

In the event that a file is FREAD|FWRITE (which is allowed by POSIX, but
has undefined behavior), we poll for both.

MFC after: 3 days
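
The rule in r150027 can be sketched as a mask over the requested poll
events. SK_FREAD and SK_FWRITE are hypothetical stand-ins for the
kernel's FREAD and FWRITE descriptor flags, and fifo_poll_mask() is an
illustrative helper, not a function in fifofs.

```c
#include <assert.h>
#include <poll.h>

#define	SK_FREAD	0x1	/* stand-in: descriptor open for reading */
#define	SK_FWRITE	0x2	/* stand-in: descriptor open for writing */

/*
 * Reduce the requested events to those that make sense for the way
 * the descriptor was opened. A descriptor opened for both directions
 * keeps both kinds of event.
 */
int
fifo_poll_mask(int fflags, int events)
{
	int mask = 0;

	assert((fflags & ~(SK_FREAD | SK_FWRITE)) == 0);
	if (fflags & SK_FREAD)
		mask |= events & POLLIN;
	if (fflags & SK_FWRITE)
		mask |= events & POLLOUT;
	return (mask);
}
```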


150026 12-Sep-2005 rwatson

After going to some trouble to identify only the write-related events
to poll the write socket for, the fifo polling code proceeded to poll
for the complete set of events. Use 'levents' instead of 'events' as
the argument to poll, and only poll the write socket if there is
interest in write events.

MFC after: 3 days


150025 12-Sep-2005 rwatson

When a writer opens a fifo, wake up the read socket for read, not the
write socket.

MFC after: 3 days


150024 12-Sep-2005 rwatson

Add an assertion that fifo_open() doesn't race against other threads
while sleeping to allocate fifo state: due to using the vnode lock to
serialize access to a fifo during open, it shouldn't happen (tm).

MFC after: 3 days


150023 12-Sep-2005 rwatson

Rather than reaching into the internals of the UNIX domain socket code
by calling uipc_connect2() to connect two socket endpoints to create a
fifo, call soconnect2().

MFC after: 3 days


150019 12-Sep-2005 phk

Clean up prototypes.


149991 11-Sep-2005 rodrigc

Cast bf_sysid to const char * when passing it to strncmp(), because
strncmp does not take an unsigned char *. Eliminates warning with GCC 4.0.


149990 11-Sep-2005 rodrigc

Do not declare M_NTFSMNT with extern linkage here, since
it is defined with static linkage in ntfs_vfsops.c.
Fixes compilation with GCC 4.0.


149850 07-Sep-2005 obrien

Ensure the full value is written into inode variables.

PR: 85503
Submitted by: Dmitry Pryanishnikov <dmitry@atlantis.dp.ua>


149771 03-Sep-2005 ssouhlal

Unbreak hpfs/ntfs/udf/ext2fs/reiserfs mounting.

Another pointyhat to: ssouhlal


149745 03-Sep-2005 ssouhlal

Unbreak the build.

Pointyhat to: ssouhlal


149722 02-Sep-2005 ssouhlal

Use vput() instead of vrele() in null_reclaim() since the lower vnode
is locked.

MFC after: 3 days


149720 02-Sep-2005 ssouhlal

*_mountfs() (if the filesystem mounts from a device) needs devvp to be
locked, so lock it.

Glanced at by: phk
MFC after: 3 days


149573 29-Aug-2005 phk

Add a missing dev_relthread() call.

Remove unused variable.

Spotted by: Hans Petter Selasky <hselasky@c2i.net>


149177 17-Aug-2005 phk

Handle device drivers with D_NEEDGIANT in a way which does not
penalize the 'good' drivers: Allocate a shadow cdevsw and populate
it with wrapper functions which grab Giant


149146 16-Aug-2005 phk

Collect the devfs related sysctls in one place


149144 16-Aug-2005 phk

Create a new internal .h file to communicate very private stuff
from kern_conf.c to devfs.

For now just two prototypes, more to come.


149107 15-Aug-2005 phk

Eliminate effectively unused dm_basedir field from devfs_mount.


149045 14-Aug-2005 grehan

- restore the ability to mount cd9660 filesystems as root by inverting
some of the options test, specifically the joliet and rockridge tests.
Since the root mount callchain doesn't go through cd9660_cmount, the
default mount options aren't set. Rather than having the main codepath
assume the options are there, test for the absence of the inverted
option

e.g. instead of vfs_flagopt(.. "joliet" ..), test for
!vfs_flagopt(.. "nojoliet" ..)

This works for root mount, non-root mount and future nmount cases.

- in cd9660_cmount, remove inadvertent setting of "gens" when "extatt"
was set.

Reported by: grehan, Dario Freni <saturnero at freesbie org>
Tested by: Dario Freni
Not objected to by: phk

MFC after: 3 days
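
A minimal sketch of the inverted option test from r149045.
opt_present() is a hypothetical stand-in for an option lookup such as
vfs_flagopt(); its signature is not the kernel's. Because the feature
is on unless the "no" form is given, an empty option list (the root
mount case) yields the default behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup: returns 1 if name appears in opts, else 0. */
int
opt_present(const char *const *opts, size_t nopts, const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < nopts; i++)
		if (strcmp(opts[i], name) == 0)
			return (1);
	return (0);
}

/* Joliet stays enabled unless "nojoliet" was passed explicitly. */
int
joliet_enabled(const char *const *opts, size_t nopts)
{
	return (!opt_present(opts, nopts, "nojoliet"));
}
```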


148984 12-Aug-2005 des

Eliminate an unnecessary bcopy().


148920 10-Aug-2005 obrien

Remove public declarations of variables that were forgotten when they were
made static.


148919 10-Aug-2005 obrien

Remove the need to forward declare statics by moving them around.


148868 08-Aug-2005 rwatson

Merge the dev_clone and dev_clone_cred event handlers into a single
event handler, dev_clone, which accepts a credential argument.
Implementors of the event can ignore it if they're not interested,
and most do. This avoids having multiple event handler types and
fall-back/precedence logic in devfs.

This changes the kernel API for /dev cloning, and may affect third
party packages containing cloning kernel modules.

Requested by: phk
MFC after: 3 days


148547 29-Jul-2005 kris

devfs is not yet fully MPSAFE - for example, multiple concurrent devfs(8)
processes can cause a panic when operating on rulesets.

Approved by: phk


148182 20-Jul-2005 simon

Correct devfs ruleset bypass.

Submitted by: csjp
Reviewed by: phk
Security: FreeBSD-SA-05:17.devfs
Approved by: cperciva


148089 17-Jul-2005 imura

[1] unix2doschr()
If a character cannot be converted to DOS code page,
unix2doschr() returned `0'. As a result, unix2dosfn()
was forced to return `0', so we saw a file which was
composed of these characters as `Invalid argument'.
To correct this, if a character can be converted to
Unicode, unix2doschr() now returns `1' which is a magic
number to make unix2dosfn() know that the character
must be converted to `_'.

[2] unix2dosfn()
The above-mentioned solution only works if a file
has both of Unicode name and DOS code page name.
Unicode name would not be recorded if file name
can be settled within 11 bytes (DOS short name)
and if no conversion from Unix charset to DOS code
page has occurred. Thus, FreeBSD can create a file
which has only short name, but there is no guarantee
that the short name contains always valid characters
because we leave it to people by using mount_msdosfs(8)
to select which conversion is used between DOS code
page and unix charset.
To avoid this, Unicode file name should be recorded
unless a character is an ascii character. This is
the way Windows XP does.

PR: 77074 [1]
MFC after: 1 week


147982 14-Jul-2005 rwatson

When devfs cloning takes place, provide access to the credential of the
process that caused the clone event to take place for the device driver
creating the device. This allows cloned device drivers to adapt the
device node based on security aspects of the process, such as the uid,
gid, and MAC label.

- Add a cred reference to struct cdev, so that when a device node is
instantiated as a vnode, the cloning credential can be exposed to
MAC.

- Add make_dev_cred(), a version of make_dev() that additionally
accepts the credential to stick in the struct cdev. Implement it and
make_dev() in terms of a back-end make_dev_credv().

- Add a new event handler, dev_clone_cred, which can be registered to
receive the credential instead of dev_clone, if desired.

- Modify the MAC entry point mac_create_devfs_device() to accept an
optional credential pointer (may be NULL), so that MAC policies can
inspect and act on the label or other elements of the credential
when initializing the skeleton device protections.

- Modify tty_pty.c to register clone_dev_cred and invoke make_dev_cred(),
so that the pty clone credential is exposed to the MAC Framework.

While currently primarily focussed on MAC policies, this change is also
a prerequisite for changes to allow ptys to be instantiated with the UID
of the process looking up the pty. This requires further changes to the
pty driver -- in particular, to immediately recycle pty nodes on last
close so that the credential-related state can be recreated on next
lookup.

Submitted by: Andrew Reisse <andrew.reisse@sparta.com>
Obtained from: TrustedBSD Project
Sponsored by: SPAWAR, SPARTA
MFC after: 1 week
MFC note: Merge to 6.x, but not 5.x for ABI reasons


147857 09-Jul-2005 tanimura

Regrab dvp only when ISDOTDOT.

Approved by: re (scottl)


147809 07-Jul-2005 jeff

- Since we don't hold a usecount in pfs_exit we have to get a holdcnt
prior to calling vgone() to prevent any races.

Sponsored by: Isilon Systems, Inc.
Approved by: re (vfs blanket)


147692 30-Jun-2005 peter

Jumbo-commit to enhance 32 bit application support on 64 bit kernels.
This is good enough to be able to run a RELENG_4 gdb binary against
a RELENG_4 application, along with various other tools (eg: 4.x gcore).
We use this at work.

ia32_reg.[ch]: handle the 32 bit register file format, used by ptrace,
procfs and core dumps.
procfs_*regs.c: vary the format of proc/XXX/*regs depending on the client
and target application.
procfs_map.c: Don't print a 64 bit value to 32 bit consumers, or their
sscanf fails. They expect an unsigned long.
imgact_elf.c: produce a valid 32 bit coredump for 32 bit apps.
sys_process.c: handle 32 bit consumers debugging 32 bit targets. Note
that 64 bit consumers can still debug 32 bit targets.

IA64 has got stubs for ia32_reg.c.

Known limitations: a 5.x/6.x gdb uses get/setcontext(), which isn't
implemented in the 32/64 wrapper yet. We also make a tiny patch to
gdb to pacify it over conflicting formats of ld-elf.so.1.

Approved by: re


147676 30-Jun-2005 peter

Conditionally weaken sys_generic.c rev 1.136 to allow certain dubious
ioctl numbers in backwards compatibility mode. eg: an IOC_IN ioctl with
a size of zero. Traditionally this was what you did before IOC_VOID
existed, and we had some established users of this in the tree, namely
procfs. Certain 3rd party drivers with binary userland components also
have this too.

This is necessary to have 4.x and 5.x binaries use these ioctl's. We
found this at work when trying to run 4.x binaries.

Approved by: re


146984 05-Jun-2005 imura

Avoid casting from (int *) to (size_t *) in order to fix udf_iconv on amd64.

Reviewed by: scottl
MFC after: 2 weeks
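
Why the cast in r146984 is unsafe: on amd64 an int is 4 bytes and a
size_t is 8, so a callee that stores through a size_t pointer writes
past the end of an int. The sketch below shows the usual safe pattern
with a real size_t temporary. report_length() and get_length_as_int()
are hypothetical names, not the udf_iconv interface.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical callee that reports a length through a size_t pointer. */
void
report_length(size_t *lenp)
{
	*lenp = 42;
}

/*
 * Pass the address of a genuine size_t and narrow the result
 * explicitly, instead of calling report_length((size_t *)&some_int).
 */
int
get_length_as_int(void)
{
	size_t tmp = 0;

	report_length(&tmp);
	assert(tmp <= INT_MAX);
	return ((int)tmp);
}
```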


146823 31-May-2005 rodrigc

Do not declare a struct as extern, and then implement
it as static in the same file. This is not legal C,
and GCC 4.0 will issue an error.

Reviewed by: phk
Approved by: das (mentor)


146121 11-May-2005 brueffer

Fix three typos in comments. Two of them obtained from OpenBSD.

MFC after: 3 days


146115 11-May-2005 kan

Do not dereference dvp pointer before doing a NULL check.

Noticed by: Coverity Prevent analysis tool.


145974 06-May-2005 anholt

Staticize a symbol used only in this file.

PR: kern/43613
Submitted by: Matt Emmerton, matt at gsicomp dot on dot ca


145939 06-May-2005 robert

The printf(9) `%p' conversion specifier puts an "0x" in
front of the pointer value. Therefore, remove the "0x"
from the format string.


145938 06-May-2005 robert

Fix our NTFS readdir function.

To check a directory's in-use bitmap bit by bit, we use
a pointer to an 8 bit wide unsigned value.

The index used to dereference this pointer is calculated
by shifting the bit index right 3 bits. Then we do a
logical AND with the bit# represented by the lower 3
bits of the bit index.

This is an idiomatic way of iterating through a bit map
with simple bitwise operations.

This commit fixes the bug that we only checked bits
3:0 of each 8 bit chunk, because we only used bits 1:0
of the bit index for the bit# in the current 8 bit value.
This resulted in files not being returned by getdirentries(2).

Change the type of the bit map pointer from `char *' to
`u_int8_t *'.
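
The bitmap test described in r145938 can be sketched as follows. The
function names are illustrative and not taken from the NTFS code. The
second function reproduces the fault: masking the bit index with 3
keeps only its two low bits, so bits 7:4 of each byte are never
examined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Correct: byte index is bit >> 3, bit within the byte is bit & 7. */
int
bitmap_test(const uint8_t *bmp, size_t bit)
{
	assert(bmp != NULL);
	return ((bmp[bit >> 3] & (1u << (bit & 7))) != 0);
}

/* Faulty variant: bit & 3 can only select bits 3:0 of the byte. */
int
bitmap_test_buggy(const uint8_t *bmp, size_t bit)
{
	assert(bmp != NULL);
	return ((bmp[bit >> 3] & (1u << (bit & 3))) != 0);
}
```

With a bitmap byte of 0x10, asking for bit 4 succeeds with the first
function and fails with the second, which is how in-use directory
entries went missing.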


145900 05-May-2005 takawata

Fix breakage on alpha.

Pointed out by: hrs via IRC


145872 04-May-2005 takawata

Make smbfs capable of using 16-bit character sets in filenames.

PR: 78110


145825 03-May-2005 jeff

- Set the v_object pointer after a successful VOP_OPEN(). This isn't a
perfect solution as the lower vm object can change at unpredictable times
if our lower vp happens to be on another unionfs, etc.

Submitted by: Oleg Sharoiko <os@rsu.ru>


145730 01-May-2005 jeff

- In devfs_open() and devfs_close() grab Giant if the driver sets NEEDGIANT.
We still have to DROP_GIANT and PICKUP_GIANT when NEEDGIANT is not set
because vfs is still sometimes entered with Giant held.


145714 30-Apr-2005 des

Fix an old pasto.


145698 30-Apr-2005 jeff

- Mark devfs as MNTK_MPSAFE as I believe it does not require Giant.

Sponsored by: Isilon Systems, Inc.
Agreed in principle by: phk


145586 27-Apr-2005 jeff

- Fix several locking problems in unionfs_mount so that it will come
closer to passing DEBUG_VFS_LOCKS.


145585 27-Apr-2005 jeff

- Pass the ISOPEN flag down to our lower filesystems.
- Remove an erroneous VOP lock assert.


145424 22-Apr-2005 jeff

- As this is presently the one and only place where duplicate acquires of
the vnode interlock are allowed mark it by passing MTX_DUPOK to this
lock operation only.

Sponsored by: Isilon Systems, Inc.


145174 16-Apr-2005 das

Disable negative name caching for msdosfs to work around a bug.
Since the name cache is case-sensitive and msdosfs isn't,
creating a file 'foo' won't invalidate a negative entry for 'FOO'.
There are similar problems related to 8.3 filenames.

A better solution is to override VOP_LOOKUP with a method that
canonicalizes the name, then calls vfs_cache_lookup(). Unfortunately,
it's not quite that simple because vfs_cache_lookup() will call
msdosfs_lookup() on a cache miss, and msdosfs_lookup() needs a way to
get at the original component name.


145131 16-Apr-2005 njl

Fix mbnambuf support for multi-byte characters. If a substring is larger
than WIN_CHARS bytes, we shift the suffix (previous substrings) upwards
by the amount this substring exceeds its WIN_CHARS slot. Profiling shows
this change is indistinguishable from the previous code at 95% confidence.
This bug would result in attempts to access or create files or directories
with multi-byte characters returning an error but no data loss.

Reported and tested by: avatar
MFC after: 3 days


145072 14-Apr-2005 brueffer

Correct typo.

Obtained from: OpenBSD


145006 13-Apr-2005 jeff

- Change all filesystems and vfs_cache to relock the dvp once the child is
locked in the ISDOTDOT case. See vfs_lookup.c r1.79 for details.

Sponsored by: Isilon Systems, Inc.


144904 11-Apr-2005 jeff

- Clear VI_OWEINACT before calling vget() with no lock type. We know
the node is actually already locked, and VOP_INACTIVE is not desirable
in this case.


144903 11-Apr-2005 jeff

- Honor the flags argument passed to null_root(). The filesystem below
us will decide whether or not to grab a real shared lock.


144852 10-Apr-2005 delphij

Initialize vp before using it. Failing to do this can cause instant
panic when trying to access a file on mounted smbfs.

Submitted by: takawata at jp freebsd org


144740 07-Apr-2005 phk

Give msdosfs a unique inode number which is really the byteoffset of
the directory entry.

This solves the corruption problem I believe.

Regression test script by: silby


144620 04-Apr-2005 jeff

- Fix union's assumptions about when the dvp is unlocked. It is only
unlocked in the ISDOTDOT case now, not for all !ISLASTCN lookups.


144389 31-Mar-2005 phk

Explicitly hold a reference to the cdev we have just cloned. This
closes the race where the cdev was reclaimed before it ever made it
back to devfs lookup.


144385 31-Mar-2005 phk

cdev (still) needs per instance uid/gid/mode

Add unlocked version of dev_ref()

Clean up various stuff in sys/conf.h


144384 31-Mar-2005 phk

Rename dev_ref() to dev_refl()


144366 31-Mar-2005 jeff

- LK_NOPAUSE is a nop now.

Sponsored by: Isilon Systems, Inc.


144299 29-Mar-2005 jeff

- Remove wantparent, it is no longer necessary. An assert in vfs_lookup.c
prevents any callers from doing a modifying op without
LOCKPARENT or WANTPARENT.


144298 29-Mar-2005 jeff

- Remove wantparent, it is no longer necessary. An assert in vfs_lookup.c
prevents any callers from doing a DELETE or RENAME without locking
the parent.


144297 29-Mar-2005 jeff

- cache_lookup() now locks the new vnode for us to prevent some races.
Remove redundant code.

Sponsored by: Isilon Systems, Inc.


144230 28-Mar-2005 jeff

- Correct the dprintf format in the _lookup routine.

Spotted by: pjd


144228 28-Mar-2005 jeff

- Garbage collect an unused variable.


144227 28-Mar-2005 jeff

- Don't panic if we can't lock a child in lookup, return an error instead.
- Only unlock the directory if this is a DOTDOT lookup. Previously this
code could have deadlocked if there was a DOTDOT lookup with LOCKPARENT
set and another thread was locking the other way up the tree.

Sponsored by: Isilon Systems, Inc.


144225 28-Mar-2005 jeff

- Remove unnecessary LOCKPARENT manipulation.

Sponsored by: Isilon Systems, Inc.


144215 28-Mar-2005 jeff

- nwfs_lookup() is no longer responsible for unlocking the dvp, this is
handled in vfs_lookup.c. This code was missing PDIRUNLOCK use prior
to the removal of PDIRUNLOCK in rev 1.73 of vfs_lookup.c.

Sponsored by: Isilon Systems, Inc.


144213 28-Mar-2005 jeff

- hpfs_lookup() is no longer responsible for unlocking the dvp, this is
handled in vfs_lookup.c. This code was missing PDIRUNLOCK use prior
to the removal of PDIRUNLOCK in rev 1.73 of vfs_lookup.c.

Sponsored by: Isilon Systems, Inc.


144208 28-Mar-2005 jeff

- We no longer have to bother with PDIRUNLOCK, lookup() handles it for us.

Sponsored by: Isilon Systems, Inc.


144207 28-Mar-2005 jeff

- We no longer have to bother with PDIRUNLOCK, lookup() handles it for us.
- In the ISDOTDOT case we have to unlock the dvp before locking the child,
if this fails we must relock dvp before returning an error. This was
missing before.

Sponsored by: Isilon Systems, Inc.


144206 28-Mar-2005 jeff

- We no longer have to bother with PDIRUNLOCK, lookup() handles it for us.
- Network filesystems are written with a special idiom that checks the
cache first, and may even unlock dvp before discovering that a network
round-trip is required to resolve the name. I believe dvp is prevented
from being recycled even in the forced unmount case by the shared lock
on the mount point. If not, this code should grow checks for VI_DOOMED
after it relocks dvp or it will access NULL v_data fields.

Sponsored by: Isilon Systems, Inc.


144103 25-Mar-2005 jeff

- Pass LK_EXCLUSIVE as the lock type to vget in vfs_hash_insert().


144059 24-Mar-2005 jeff

- Update vfs_root implementations to match the new prototype. None of
these filesystems will support shared locks until they are explicitly
modified to do so. Careful review must be done to ensure that this
is safe for each individual filesystem.

Sponsored by: Isilon Systems, Inc.


144058 24-Mar-2005 jeff

- Update vfs_root implementations to match the new prototype. None of
these filesystems will support shared locks until they are explicitly
modified to do so. Careful review must be done to ensure that this
is safe for each individual filesystem.

Sponsored by: Isilon Systems, Inc.


143841 19-Mar-2005 phk

Use subr_unit


143756 17-Mar-2005 phk

Also remember to set the fsid here.


143755 17-Mar-2005 phk

Forgot to replace code to set fsid in vop_getattr.


143746 17-Mar-2005 phk

Prepare for the final onslaught on devices:

Move uid/gid/mode from cdev to cdevsw.

Add kind field to use for devd(8) later.

Bump both D_VERSION and __FreeBSD_version


143744 17-Mar-2005 jeff

- Lock the clearing of v_data so it is safe to inspect it with the
interlock.

Sponsored by: Isilon Systems, Inc.


143692 16-Mar-2005 phk

Add two arguments to the vfs_hash() KPI so that filesystems which do
not have unique hashes (NFS) can also use it.


143691 16-Mar-2005 phk

Remove unused file


143686 16-Mar-2005 phk

Remove inode fields previously used for private inode hash tables.


143679 16-Mar-2005 phk

XXX: unnecessary pointer in inode.


143678 16-Mar-2005 phk

Don't store the disk cdev in all inodes.


143668 15-Mar-2005 phk

Don't hold a reference to the disk vnode for each inode.

Eliminate cdev and vnode pointer to the disk from the inodes,
the mount holds everything we need.


143667 15-Mar-2005 phk

Eliminate cdev pointer in inodes, they're not used or needed.

The cdev could have been pulled out of the mountpoint cheaper back
when it was used anyway.


143666 15-Mar-2005 phk

Don't hold a reference on the disk vnode for each inode.


143663 15-Mar-2005 phk

Improve the vfs_hash() API: vput() the unneeded vnode centrally to
avoid replicating the vput in all the filesystems.


143642 15-Mar-2005 jeff

- Assume that all lower filesystems now support proper locking. Assert
that they set v->v_vnlock. This is true for all filesystems in the
tree.
- Remove all uses of LK_THISLAYER. If the lower layer is locked, the
null layer is locked. We only use vget() to get a reference now.
null essentially does no locking. This fixes LOOKUP_SHARED with
nullfs.
- Remove the special LK_DRAIN considerations, I do not believe this is
needed now as LK_DRAIN doesn't destroy the lower vnode's lock, and
it's hardly used anymore.
- Add one well commented hack to prevent the lowervp from going away
while we're in its VOP_LOCK routine. This can only happen if we're
forcibly unmounted while some callers are waiting in the lock. In
this case the lowervp could be recycled after we drop our last ref
in null_reclaim(). Prevent this with a vhold().


143637 15-Mar-2005 phk

Disable two users of findcdev. They do the wrong thing now and will
need to be fixed. In both cases the API should be reengineered to do
something (more) sensible.


143630 15-Mar-2005 jeff

- We have to transfer lockers after resetting our vnlock pointer.

Sponsored by: Isilon Systems, Inc.


143629 15-Mar-2005 phk

Don't export major,minor, instead export tty name.


143624 15-Mar-2005 phk

Print devtoname() instead of minor().


143623 15-Mar-2005 phk

Fix typo: pointers are not boolean in style(9).


143619 15-Mar-2005 phk

Simplify the vfs_hash calling convention.


143597 14-Mar-2005 des

Hook pfs_lookup() up to vfs_cachedlookup_desc instead of vfs_lookup_desc,
as suggested by Matt's comment. Also fix some style and paranoia issues.

The entire function could benefit from review by a VFS guru.

MFC after: 6 weeks


143596 14-Mar-2005 des

Fix two long-standing bugs in pfs_readdir():

Since we used an sbuf of size resid to accumulate dirents, we would end
up returning one byte short when we had enough dirents to fill or exceed
the size of the sbuf (the last byte being lost to bogus NUL termination)
causing the next call to return EINVAL due to an unaligned offset. This
went undetected for a long time because I did most of my testing in
single-user mode, where there are rarely enough processes to fill the
4096-byte buffer ls(1) uses. The most common symptom of this bug is that
tab completion of /proc or /compat/linux/proc does not work properly when
many processes are running.

Also, a check near the top would return EINVAL if resid was smaller than
PFS_DELEN, even if it was 0, which is frequently the case and perfectly
allowable. Change the test so that it returns 0 if resid is 0.

MFC after: 2 weeks


143595 14-Mar-2005 des

If PSEUDOFS_TRACE is defined, create a sysctl knob to enable / disable
pseudofs call tracing.


143592 14-Mar-2005 des

fbsdidize.


143588 14-Mar-2005 phk

Use vfs_hash instead of home-rolled.


143577 14-Mar-2005 phk

Use vfs_hash instead of home-rolled.


143571 14-Mar-2005 phk

Use vfs_hash instead of home-rolled.

Correct locking around g_vfs_close()


143570 14-Mar-2005 phk

Use vfs_hash instead of home-rolling.


143514 13-Mar-2005 jeff

- VOP_INACTIVE should no longer drop the vnode lock.

Sponsored by: Isilon Systems, Inc.


143513 13-Mar-2005 jeff

- The VI_DOOMED flag now signals the end of a vnode's relationship with
the filesystem. Check that rather than VI_XLOCK.
- VOP_INACTIVE should no longer drop the vnode lock.
- The vnode lock is required around calls to vrecycle() and vgone().

Sponsored by: Isilon Systems, Inc.


143510 13-Mar-2005 jeff

- The VI_DOOMED flag now signals the end of a vnode's relationship with
the filesystem. Check that rather than VI_XLOCK.

Sponsored by: Isilon Systems, Inc.


143507 13-Mar-2005 jeff

- The c_lock in the coda node does not offer any features over the standard
vnode lock. Remove the c_lock and use the vn lock in its place.
- Keep the coda lock functions so that the debugging information is
preserved, but call directly to the vop_std*lock routines for the real
functionality.

Sponsored by: Isilon Systems, Inc.


143506 13-Mar-2005 jeff

- Deadfs may now use the standard vop lock, get rid of dead_lock().
- We no longer have to take the XLOCK state into consideration in any
routines.

Sponsored by: Isilon Systems, Inc.


143446 12-Mar-2005 obrien

Used unsigned version.

Submitted by: jmallett


143444 12-Mar-2005 obrien

Fix kernel build on 64-bit machines.


143436 11-Mar-2005 njl

Correct a last-minute thinko. Instead of copying the nul with the string,
nul-terminate the dp->d_name directly and only copy the string.


143435 11-Mar-2005 njl

The mbnambuf routines combine multiple substrings into a single
long filename. Each substring is indexed by the windows ID, a
sequential one-based value. The previous code was extremely slow,
doing a malloc/strcpy/free for each substring.

This code optimizes these routines with this in mind, using the ID
to index into a single array and concatenating each WIN_CHARS chunk
at once. (The last chunk is variable-length.)
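
The indexing idea can be sketched as follows. This is not the msdosfs code: namebuf, namebuf_write and namebuf_flush are hypothetical names, and the constants (13 characters per substring, 255 characters maximum) are assumptions chosen to match the filename sizes mentioned in the test description.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK_CHARS  13    /* assumed characters per substring */
#define NAME_MAXLEN  255   /* assumed maximum long filename length */

struct namebuf {
    char   buf[NAME_MAXLEN + 1];
    size_t len;            /* length of the assembled name so far */
};

void
namebuf_init(struct namebuf *nb)
{
    memset(nb->buf, 0, sizeof(nb->buf));
    nb->len = 0;
}

/*
 * Copy one substring straight into its slot.  The one-based id selects
 * the offset, so no per-substring allocation or list walk is needed.
 * Returns 0 on success, -1 if the substring does not fit.
 */
int
namebuf_write(struct namebuf *nb, const char *chunk, int id)
{
    size_t count = strlen(chunk);
    size_t off;

    if (id < 1 || count > CHUNK_CHARS)
        return (-1);
    off = (size_t)(id - 1) * CHUNK_CHARS;
    if (off + count > NAME_MAXLEN)
        return (-1);
    memcpy(nb->buf + off, chunk, count);
    if (off + count > nb->len)
        nb->len = off + count;
    return (0);
}

/* Terminate and return the assembled name. */
const char *
namebuf_flush(struct namebuf *nb)
{
    nb->buf[nb->len] = '\0';
    return (nb->buf);
}
```

Because each substring lands at (id - 1) * CHUNK_CHARS, the substrings can arrive in any order, including last-first, and only the final one may be shorter than a full chunk.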

This code has been tested as working on an FS with difficult filename
sizes (255, 13, 26, etc.) It gives a 77.1% decrease in profiled
time (total across all functions) and a 73.7% decrease in wall time.
Test was "ls -laR > /dev/null".

Per-function time savings:
mbnambuf_init: -90.7%
mbnambuf_write: -18.7%
mbnambuf_flush: -67.1%

MFC after: 1 month


143383 10-Mar-2005 phk

One more bit of the major/minor patch to make ttyname happy as well.


143381 10-Mar-2005 phk

Try to fix the mess I made of devname, with the minimal subset of the
larger minor/major patch which was posted for testing.


143303 08-Mar-2005 phk

Remove kernelside support for devfs rules filtering on major numbers.


142907 01-Mar-2005 phk

Avoid a couple of mutex operations in the process exit path for the
common case where procfs has never been mounted.

OK'ed by: des


142692 27-Feb-2005 phk

Remove debug printout of major/minor numbers, print name instead.


142255 22-Feb-2005 sam

remove dead code

Submitted by: Coverity Prevent analysis tool


142250 22-Feb-2005 phk

We may not have an actual cdev at this point.


142242 22-Feb-2005 phk

Reap more benefits from DEVFS:

List devfs_dirents rather than vnodes off their shared struct cdev, this
saves a pointer field in the vnode at the expense of a field in the
devfs_dirent. There are often 100 times more vnodes so this is a bargain.
In addition it makes it harder for people to try to do stupid things like
"finding the vnode from cdev".

Since DEVFS handles all VCHR nodes now, we can do the vnode related
cleanup in devfs_reclaim() instead of in dev_rel() and vgonel().
Similarly, we can do the struct cdev related cleanup in dev_rel()
instead of devfs_reclaim().

rename idestroy_dev() to destroy_devl() for consistency.

Add LIST_ENTRY de_alias to struct devfs_dirent.
Remove v_specnext from struct vnode.
Change si_hlist to si_alist in struct cdev.
String new devfs vnodes' devfs_dirent on si_alist when
we create them and take them off in devfs_reclaim().

Fix devfs_revoke() accordingly. Also don't clear fields
devfs_reclaim() will clear when called from vgone();

Let devfs_reclaim() call dev_rel() instead of vgonel().

Move the usecount tracking from dev_rel() to devfs_reclaim(),
and let dev_rel() take a struct cdev argument instead of vnode.

Destroy SI_CHEAPCLONE devices in dev_rel() (instead of
devfs_reclaim()) when they are no longer used. (This
should maybe happen in devfs_close() instead.)


142238 22-Feb-2005 phk

vp->v_id is a private field for the vfs namecache and it is a big mistake
that NFS ever started using it and an even bigger one that it got copied&pasted
to nwfs and smbfs.

Replace with use of vhold()/vdrop().


142235 22-Feb-2005 phk

Use vn_printf() instead of home-rolling.


142232 22-Feb-2005 phk

Make dev_ref() require the dev_lock() to be held and use it from
devfs instead of directly frobbing the si_refcount.


142152 20-Feb-2005 das

Replace the workaround for a deadlock bug in Coda with a different
workaround that does not rely on vfs_start().


142043 18-Feb-2005 rwatson

Remove basically unused root_vp pointer in udfmount.

MFC after: 1 week
Discussed with: scottl


142040 18-Feb-2005 rwatson

Conditionalize cd9660 chattiness regarding the nature of the file system
mounted (is it Joliet, RockRidge, High Sierra) based on bootverbose.
Most file systems don't generate log messages based on details of the
file system superblock, and these log messages disrupt sysinstall output
during a new install from CD. We may want to explore exposing this
status information using nmount() at some point.

MFC after: 3 days


142011 17-Feb-2005 phk

Introduce vx_wait{l}() and use it instead of home-rolled versions.


141633 10-Feb-2005 phk

Make a SYSCTL_NODE static


141623 10-Feb-2005 phk

make M_NTFSMNT and ntfs_calccfree() static


141622 10-Feb-2005 phk

Make fdesc_root static


141620 10-Feb-2005 phk

Make smbfs_debuglevel private.


141619 10-Feb-2005 phk

don't call vprint with NULL.


141618 10-Feb-2005 phk

Statize malloc types.
Don't call vprint with NULL.


141617 10-Feb-2005 phk

Statize devfs_ops_f


141616 10-Feb-2005 phk

Make a bunch of malloc types static.

Found by: src/tools/tools/kernxref


141497 08-Feb-2005 njl

Unroll the loop for calculating the 8.3 filename checksum. In testing
on my P3, microbenchmarks show the unrolled version is 78x faster. In
actual use (recursive ls), this gives an average of 9% improvement in
system time and 2% improvement in wall time.
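
For illustration, here is the loop form next to an unrolled form of the rotate-and-add checksum over the 11-byte 8.3 name. This is a sketch of the general technique, assuming the usual long-filename checksum (rotate right by one bit, then add the next byte); the function names are hypothetical and it is not the committed code.

```c
#include <stdint.h>

/* Rotate an 8-bit value right by one bit. */
#define ROR1(s)  ((uint8_t)((((s) & 1) << 7) | ((s) >> 1)))

/* Loop form: one rotate-and-add step per name byte. */
uint8_t
chksum_loop(const uint8_t *name)
{
    uint8_t s = 0;
    int i;

    for (i = 0; i < 11; i++)
        s = (uint8_t)(ROR1(s) + name[i]);
    return (s);
}

/* Unrolled form: the same eleven steps with no loop overhead. */
uint8_t
chksum_unrolled(const uint8_t *name)
{
    uint8_t s;

    s = name[0];        /* rotating the initial 0 is a no-op */
    s = (uint8_t)(ROR1(s) + name[1]);
    s = (uint8_t)(ROR1(s) + name[2]);
    s = (uint8_t)(ROR1(s) + name[3]);
    s = (uint8_t)(ROR1(s) + name[4]);
    s = (uint8_t)(ROR1(s) + name[5]);
    s = (uint8_t)(ROR1(s) + name[6]);
    s = (uint8_t)(ROR1(s) + name[7]);
    s = (uint8_t)(ROR1(s) + name[8]);
    s = (uint8_t)(ROR1(s) + name[9]);
    s = (uint8_t)(ROR1(s) + name[10]);
    return (s);
}
```

Both functions must agree for every input; the speedup quoted above is the author's measurement and is not reproduced by this sketch.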


141447 07-Feb-2005 phk

Remove vop_destroyvobject()


141442 07-Feb-2005 phk

Deimplement vop_destroyvobject()


141439 07-Feb-2005 phk

Remove vop_destroyvobject() initialization.


140965 29-Jan-2005 peadar

Unbreak a few filesystems for which vnode_create_vobject() wasn't being
called in "open", causing mmap() to fail.

Where possible, pass size of file to vnode_create_vobject() rather
than having it find it out the hard way via VOP_LOOKUP

Reviewed by: phk


140939 28-Jan-2005 phk

Make filesystems get rid of their own vnodes vnode_pager object in
VOP_RECLAIM().


140936 28-Jan-2005 phk

Remove unused argument to vrecycle()


140904 27-Jan-2005 peadar

Make NTFS at least minimally usable after bufobj and GEOM fallout.

mmap() on NTFS files was hosed, returning pages offset from the
start of the disk rather than the start of the file. (ie, "cp" of
a 1-block file would get you a copy of the boot sector, not the
data in the file.) The solution isn't ideal, but gives a functioning
filesystem.

Cached vnode lookup was also broken, resulting in vnode haemorrhage.
A lookup on the same file twice would give you two vnodes, and the
resulting cached pages.

Just recently, mmap() was broken due to a lack of a call to
vnode_create_vobject() in ntfs_open().

Discussed with: phk@


140822 25-Jan-2005 phk

Introduce and use g_vfs_close().


140783 25-Jan-2005 phk

Take VOP_GETVOBJECT() out to pasture. We use the direct pointer now.


140781 25-Jan-2005 phk

Kill VOP_CREATEVOBJECT(), it is now the responsibility of the filesystem
for a given vnode to create a vnode_pager object if one is needed.


140780 24-Jan-2005 phk

Don't implement vop_createvobject(); vop_open() and vop_close() manage
this for nullfs now.


140779 24-Jan-2005 phk

Don't call VOP_CREATEVOBJECT(), it's the responsibility of the
filesystem which owns the vnode.


140776 24-Jan-2005 phk

Add null_open() and null_close() which call null_bypass() and manage
the v_object pointer.


140768 24-Jan-2005 phk

Create a vp->v_object in VFS_FHTOVP() if we want to be exportable
with NFS.

We are moving responsibility for creating the vnode_pager object into
the filesystems which own the vnode, and this is one of the places
we have to cover.

We call vnode_create_vobject() directly because we own the vnode.

If we can get the size easily, pass it as an argument to save the
call to VOP_GETATTR() in vnode_create_vobject()


140734 24-Jan-2005 phk

Kill the VV_OBJBUF and test the v_object for NULL instead.


140732 24-Jan-2005 phk

Remove "register" keywords.


140728 24-Jan-2005 phk

Style: Remove the commented out vop_foo_args replicas.


140471 19-Jan-2005 phk

whitespace nit


140470 19-Jan-2005 phk

Remove unused coda_fbsd_getpages()


140416 18-Jan-2005 scottl

Fix an incorrect cast.

Submitted by: Andriy Gapon
MFC-after: 3 days.


140250 14-Jan-2005 scottl

NULL-terminate the . and .. directory entries. Apparently some tools ignore
d_namlen and assume that d_name is null-terminated.

Submitted by: Andriy Gapon


140249 14-Jan-2005 scottl

Replace the min() macro with a test that doesn't truncate the 64-bit values
that are used. Thanks to Bruce Evans for pointing this out.
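
The truncation hazard can be shown with a short sketch. It assumes a min() helper whose parameters are 32-bit unsigned integers; min_u32, clamp_truncating and clamp_wide are hypothetical names, not the UDF code.

```c
#include <stdint.h>

/* Assumed shape of a min() helper with 32-bit unsigned parameters. */
uint32_t
min_u32(uint32_t a, uint32_t b)
{
    return (a < b ? a : b);
}

/* Funnelling 64-bit values through the helper drops the upper 32 bits. */
int64_t
clamp_truncating(int64_t remaining, int64_t want)
{
    return (min_u32((uint32_t)remaining, (uint32_t)want));
}

/* An explicit comparison keeps the full 64-bit range. */
int64_t
clamp_wide(int64_t remaining, int64_t want)
{
    return (remaining < want ? remaining : want);
}
```

With remaining = 2^32 + 5 and want = 100, the truncating version returns 5 because only the low 32 bits of the first argument survive, while the wide version returns 100.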


140223 14-Jan-2005 phk

Eliminate unused and constant arguments to smbfs_vinvalbuf()


140222 14-Jan-2005 phk

Eliminate constant and unused arguments to nwfs_vinvalbuf()


140220 14-Jan-2005 phk

Eliminate unused and unnecessary "cred" argument from vinvalbuf()


140196 13-Jan-2005 phk

Whitespace in vop_vector{} initializations.


140181 13-Jan-2005 phk

Ditch vfs_object_create() and make the callers call VOP_CREATEVOBJECT()
directly.


140165 13-Jan-2005 phk

Change the generated VOP_ macro implementations to improve type checking
and KASSERT coverage.

After this check there is only one "nasty" cast in this code but there
is a KASSERT to protect against the wrong argument structure behind
that cast.

Un-inlining the meat of VOP_FOO() saves 35kB of text segment on a typical
kernel with no change in performance.

We also now run the checking and tracing on VOP's which have been layered
by nullfs, umapfs, deadfs or unionfs.

Add new (non-inline) VOP_FOO_AP() functions which take a "struct
foo_args" argument and do everything the VOP_FOO() macros
used to do with checks and debugging code.

Add KASSERT to VOP_FOO_AP() check for argument type being
correct.

Slim down VOP_FOO() inline functions to just stuff arguments
into the struct foo_args and call VOP_FOO_AP().

Put function pointer to VOP_FOO_AP() into vop_foo_desc structure
and make VCALL() use it instead of the current offsetof() hack.

Retire vcall() which implemented the offsetof() hack.

Make deadfs and unionfs use VOP_FOO_AP() calls instead of
VCALL(), we know which specific call we want already.

Remove unneeded arguments to VCALL() in nullfs and umapfs bypass
functions.

Remove unused vdesc_offset and VOFFSET().

Generally improve style/readability of the generated code.
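
The split between a thin VOP_FOO() wrapper and an out-of-line VOP_FOO_AP() can be sketched in userland as below. Every type here (vnode_s, vnodeop_desc_s, the v_foo slot) is a simplified, hypothetical stand-in for the generated kernel code, and assert() stands in for KASSERT.

```c
#include <assert.h>
#include <stddef.h>

struct vnode_s;

/* Simplified operation descriptor. */
struct vnodeop_desc_s {
    const char *vdesc_name;
};

/* Argument structure for the hypothetical "foo" operation. */
struct vop_foo_args {
    const struct vnodeop_desc_s *a_desc;
    struct vnode_s *a_vp;
    int a_value;
};

typedef int vop_foo_fn(struct vop_foo_args *);

struct vnode_s {
    vop_foo_fn *v_foo;  /* the filesystem's implementation */
    int last;           /* scratch field used by the example filesystem */
};

const struct vnodeop_desc_s vop_foo_desc = { "vop_foo" };

/* Out-of-line worker: all checking lives here, then dispatch. */
int
VOP_FOO_AP(struct vop_foo_args *a)
{
    assert(a->a_desc == &vop_foo_desc);     /* argument type check */
    assert(a->a_vp != NULL && a->a_vp->v_foo != NULL);
    return (a->a_vp->v_foo(a));
}

/* Thin inline wrapper: pack the arguments and call the worker. */
static inline int
VOP_FOO(struct vnode_s *vp, int value)
{
    struct vop_foo_args a;

    a.a_desc = &vop_foo_desc;
    a.a_vp = vp;
    a.a_value = value;
    return (VOP_FOO_AP(&a));
}

/* Example filesystem method: records the value it was given. */
int
examplefs_foo(struct vop_foo_args *a)
{
    a->a_vp->last = a->a_value;
    return (0);
}
```

A layered filesystem that already holds a filled-in argument structure can call VOP_FOO_AP() directly, which is why the checks run for layered calls as well.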


140105 12-Jan-2005 scottl

Use off_t when passing and calculating file offsets. While a single
extent in UDF is only 32 bits, multiple extents can exist in a file.
Also clean up some minor whitespace problems.

Submitted by: John Wehle


140104 12-Jan-2005 scottl

Don't allow reads past the end of a file.

Submitted by: John Wehle, Andriy Gapon
MFC After: 3 days


140067 11-Jan-2005 phk

Silently ignore forced argument to unmount.


140051 11-Jan-2005 phk

Wrap the bufobj operations in macros: BO_STRATEGY() and BO_WRITE()


140048 11-Jan-2005 phk

Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC().

I'm not sure why a credential was added to these in the first place, it is
not used anywhere and it doesn't make much sense:

The credentials for syncing a file (ability to write to the
file) should be checked at the system call level.

Credentials for syncing one or more filesystems ("none")
should be checked at the system call level as well.

If the filesystem implementation needs a particular credential
to carry out the syncing it would logically have to be the
cached mount credential, or a credential cached along with
any delayed write data.

Discussed with: rwatson


139984 10-Jan-2005 phk

whitespace


139896 08-Jan-2005 rwatson

Annotate that pfs_exit() always acquires and releases two mutexes for
every process exit, even if procfs isn't mounted. And one of those
mutexes is Giant. No immediate thoughts on fixing this.


139790 06-Jan-2005 imp

/* -> /*- for copyright notices, minor format tweaks as necessary


139776 06-Jan-2005 imp

/* -> /*- for copyright notices, minor format tweaks as necessary


139745 05-Jan-2005 imp

Start each of the license/copyright comments with /*-


139664 04-Jan-2005 phk

Unsupport forceful unmounts of DEVFS.

After discussing things I have decided to take the easy and
consistent 90% solution instead of aiming for the very involved 99%
solution.

If we allow forceful unmounts of DEVFS we need to decide how to handle
the devices which are in use through this filesystem at the time.

We cannot just readopt the open devices in the main /dev instance since
that would open us to security issues.

For the majority of the devices, this is relatively straightforward
as we can just pretend they got revoke(2)'ed.

Some devices get tricky: /dev/console and /dev/tty for instance
do a sort of recursive open of the real console device. Other devices
may be mmap'ed (kill the processes?).

And then there are disk devices which are mounted.

The correct thing here would be to recursively unmount the filesystems
mounted from devices from our DEVFS instance (forcefully) and if
this succeeds, complete the forceful unmount of DEVFS. But if
one of the forceful unmounts fails we cannot complete the forceful
unmount of DEVFS, but we are likely to already have severed a lot
of stuff in the process of trying.

Even attempting this would be a lot of code for a very far out
corner-case which most people would never see or get in touch with.

It's just not worth it.


139189 22-Dec-2004 phk

Be consistent about flag values passed to device drivers read/write
methods:

Read can see O_NONBLOCK and O_DIRECT.

Write can see O_NONBLOCK, O_DIRECT and O_FSYNC.

In addition O_DIRECT is shadowed as IO_DIRECT for now for backwards
compatibility.


139188 22-Dec-2004 phk

Shuffle numeric values of the IO_* flags to match the O_* flags from
fcntl.h.

This is in preparation for making the flags passed to device drivers be
consistently from fcntl.h for all entrypoints.

Today open, close and ioctl use fcntl.h flags, while read and write
use vnode.h flags.


139085 20-Dec-2004 phk

We can only ever get to vgonechrl() from a devfs vnode, so we do not
need to reassign the vp->v_op to devfs_specops, we know that is the
value already.

Make devfs_specops private to devfs.


139083 20-Dec-2004 phk

Add a couple of KASSERTS to try to diagnose a problem reported.


138841 14-Dec-2004 phk

Be a bit more assertive about vnode bypass.


138810 13-Dec-2004 ssouhlal

Exporting of NTFS filesystem broke in rev 1.70. Fix it.

Approved by: phk, grehan (mentor)


138796 13-Dec-2004 phk

Don't forget to bypass vnodes in corner cases.

Found by: kkenn and ports/shell/zsh
Thanks to: jeffr


138791 13-Dec-2004 phk

Another FNONBLOCK -> O_NONBLOCK.

Don't unconditionally set IO_UNIT to device drivers in write: nobody
checks it, and since it was always set it did not carry information anyway.


138790 13-Dec-2004 phk

Use O_NONBLOCK instead of FNONBLOCK alias.


138788 13-Dec-2004 phk

Explicit panic in vop_read/vop_write for devices


138784 13-Dec-2004 phk

Explicitly panic vop_read/vop_write on fifos.


138737 12-Dec-2004 phk

Don't deref NULL if no charset-conversion is specified.

Return correct vnode in vop_bmap()


138689 11-Dec-2004 phk

Handle MNT_UPDATE export requests first and return so we do not
interpret the rest of the msdosfs_args structure.

Detected by: marcel


138678 11-Dec-2004 phk

typo


138519 07-Dec-2004 phk

First save from editor, *then* commit.


138518 07-Dec-2004 phk

Fix exports.


138509 07-Dec-2004 phk

The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly
split the conversion of the remaining three filesystems out from the root
mounting changes, so in one go:

cd9660:
Convert to nmount.
Add omount compat shims.
Remove dedicated rootfs mounting code.
Use vfs_mountedfrom()
Rely on vfs_mount.c calling VFS_STATFS()

nfs(client):
Convert to nmount (the simple way, mount_nfs(8) is still necessary).
Add omount compat shims.
Drop COMPAT_PRELITE2 mount arg compatibility.

ffs:
Convert to nmount.
Add omount compat shims.
Remove dedicated rootfs mounting code.
Use vfs_mountedfrom()
Rely on vfs_mount.c calling VFS_STATFS()

Remove vfs_omount() method, all filesystems are now converted.

Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem
task, and they all do it now.

Change rootmounting to use DEVFS trampoline:

vfs_mount.c:
Mount devfs on /. Devfs needs no 'from' so this is clean.
symlink /dev to /. This makes it possible to lookup /dev/foo.
Mount "real" root filesystem on /.
Surgically move the devfs mountpoint from under the real root
filesystem onto /dev in the real root filesystem.

Remove now unnecessary getdiskbyname().

kern_init.c:
Don't do devfs mounting and rootvnode assignment here, it was
already handled by vfs_mount.c.

Remove now unused bdevvp(), addaliasu() and addalias(). Put the
few necessary lines in devfs where they belong. This eliminates the
second-last source of bogo vnodes, leaving only the lemming-syncer.

Remove rootdev variable, it doesn't have meaning in a global context and
was not trustworthy anyway. Correct information is provided by
statfs(/).


138495 06-Dec-2004 phk

Use vfs_mountedfrom().

Since VFS_STATFS() always calls the filesystem with mp->mnt_stat now, the
vfs_statfs method is now a no-op. Explain this in a comment.


138491 06-Dec-2004 phk

Trust vfs_mount to call VFS_STATFS() on all mounts.


138490 06-Dec-2004 phk

Convert to nmount. Add omount compat.

Unpropagate the sm_args function into the runtime part.


138489 06-Dec-2004 phk

Convert to nmount. Add omount compat.

Use vfs_mountedon(). Rely on vfs_mount.c calling VFS_STATFS().


138488 06-Dec-2004 phk

Convert to nmount. Add omount compat.

Same comment about charset conversions apply.

Use vfs_mountedfrom(). Rely on vfs_mount.c calling VFS_STATFS().


138487 06-Dec-2004 phk

Convert to nmount. Add backwards compat cmount method.

Same comment as msdosfs applies: It would be nice if we had generic option
names for charset conversions.

Use vfs_mountedfrom(). Rely on vfs_mount.c calling VFS_STATFS().


138486 06-Dec-2004 phk

Convert nwfs to nmount, but take the low road: There is no way this is
ever going to work without a dedicated mount_nwfs(8) program so simply
stick struct nwfs_args into a nmount argument and leave it at that.


138485 06-Dec-2004 kan

Fix a typo in PFS_TRACE.

PR: kern/74461
Submitted by: Craig Rodrigues <rodrigc at crodrigues.org>


138484 06-Dec-2004 phk

ufs vfs_mountedon(), rely on vfs_mount.c calling VFS_STATFS()


138483 06-Dec-2004 phk

Use vfs_mountedfrom(), rely on vfs_mount.c calling VFS_STATFS().


138481 06-Dec-2004 phk

Use vfs_mountedfrom() and rely on vfs_mount.c to call VFS_STATFS()


138478 06-Dec-2004 phk

Convert coda to nmount.


138471 06-Dec-2004 phk

Convert msdosfs to nmount.

Add a vfs_cmount() function which converts omount argument structure
to nmount arguments.

Convert vfs_omount() to vfs_mount() and parse nmount arguments.

This is 100% compatible with existing userland.

Later on, but before userland gets converted to nmount we may want
to revisit the names of the mountoptions, for instance it may make
sense to use consistent options for charset conversion etc.


138443 06-Dec-2004 phk

Fix warning


138412 05-Dec-2004 phk

VFS_STATFS(mp, ...) is mostly called with &mp->mnt_stat, but in a few cases
it isn't. Most of the implementations have grown weeds for this so they
copy some fields from mnt_stat if the passed argument isn't that.

Fix this the cleaner way: Always call the implementation on mnt_stat
and copy that in toto to the VFS_STATFS argument if different.
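
The "always call on mnt_stat, then copy" approach can be sketched as a wrapper. The structures are simplified, hypothetical stand-ins (statbuf for struct statfs, mount_s for struct mount), and examplefs_statfs is an illustrative filesystem method only.

```c
#include <string.h>

/* Simplified stand-in for struct statfs. */
struct statbuf {
    long blocks;
    long bfree;
    char fstype[16];
};

struct mount_s;
typedef int statfs_fn(struct mount_s *, struct statbuf *);

/* Simplified stand-in for struct mount. */
struct mount_s {
    struct statbuf mnt_stat;   /* cached statistics for this mount */
    statfs_fn *statfs;         /* the filesystem's implementation */
};

/*
 * Always let the filesystem fill in mp->mnt_stat, then copy the whole
 * structure to the caller's buffer if the caller passed a different one.
 */
int
vfs_statfs_wrapper(struct mount_s *mp, struct statbuf *sbp)
{
    int error;

    error = mp->statfs(mp, &mp->mnt_stat);
    if (error != 0)
        return (error);
    if (sbp != &mp->mnt_stat)
        *sbp = mp->mnt_stat;
    return (0);
}

/* Example filesystem method: it only ever sees mnt_stat. */
int
examplefs_statfs(struct mount_s *mp, struct statbuf *sbp)
{
    (void)mp;
    sbp->blocks = 1000;
    sbp->bfree = 250;
    strcpy(sbp->fstype, "examplefs");
    return (0);
}
```

With this shape the per-filesystem code no longer needs a special case for "the argument is not mnt_stat", since that case is handled once in the wrapper.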


138367 04-Dec-2004 phk

Remove embryonic rootfs mounting facility.

In the near future rootfs mounting will not require special handling
in the filesystems.


138309 02-Dec-2004 phk

Remove the de_devvp and stop VREF'ing it for every vnode we create.


138290 01-Dec-2004 phk

Back when VOP_* was introduced, we did not have new-style struct
initializations but we did have lofty goals and big ideals.

Adjust to more contemporary circumstances and gain type checking.

Replace the entire vop_t frobbing thing with properly typed
structures. The only casualty is that we cannot add a new
VOP_ method with a loadable module. History has not given
us reason to believe this would ever be feasible in the
first place.

Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc.

Give coda correct prototypes and function definitions for
all vop_()s.

Generate a bit more data from the vnode_if.src file: a
struct vop_vector and prototype typedefs for all vop methods.

Add a new vop_bypass() and make vop_default be a pointer
to another struct vop_vector.

Remove a lot of vfs_init since vop_vector is ready to use
from the compiler.

Cast various vop_mumble() to void * with uppercase name,
for instance VOP_PANIC, VOP_NULL etc.

Implement VCALL() by making vdesc_offset the offsetof() the
relevant function pointer in vop_vector. This is disgusting
but since the code is generated by a script comparatively
safe. The alternative for nullfs etc. would be much worse.

Fix up all vnode method vectors to remove casts so they
become typesafe. (The bulk of this is generated by scripts)
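
A typed method vector with a vop_default chain might look like the sketch below. The struct layout, the two-method vector and the call_open()/call_close() helpers are hypothetical simplifications; the generated kernel code is larger, and vop_bypass() is left out.

```c
#include <errno.h>
#include <stddef.h>

struct vnode_s;
typedef int vop_open_fn(struct vnode_s *);
typedef int vop_close_fn(struct vnode_s *);

/* Typed method vector: each slot has its own function pointer type. */
struct vop_vector_s {
    const struct vop_vector_s *vop_default;  /* fallback vector, or NULL */
    vop_open_fn  *vop_open;
    vop_close_fn *vop_close;
};

struct vnode_s {
    const struct vop_vector_s *v_op;
    int opens;
};

int
default_open(struct vnode_s *vp)
{
    vp->opens++;
    return (0);
}

int
default_close(struct vnode_s *vp)
{
    (void)vp;
    return (0);
}

int
examplefs_close(struct vnode_s *vp)
{
    vp->opens--;
    return (0);
}

/* Designated initializers: the compiler checks every method's type. */
const struct vop_vector_s default_ops = {
    .vop_default = NULL,
    .vop_open = default_open,
    .vop_close = default_close,
};

const struct vop_vector_s examplefs_ops = {
    .vop_default = &default_ops,    /* inherit whatever is not set here */
    .vop_close = examplefs_close,
};

/* Walk the default chain until a vector implements the method. */
int
call_open(struct vnode_s *vp)
{
    const struct vop_vector_s *vop;

    for (vop = vp->v_op; vop != NULL; vop = vop->vop_default)
        if (vop->vop_open != NULL)
            return (vop->vop_open(vp));
    return (EOPNOTSUPP);
}

int
call_close(struct vnode_s *vp)
{
    const struct vop_vector_s *vop;

    for (vop = vp->v_op; vop != NULL; vop = vop->vop_default)
        if (vop->vop_close != NULL)
            return (vop->vop_close(vp));
    return (EOPNOTSUPP);
}
```

Assigning a function with the wrong signature to a slot in this sketch is a compile-time diagnostic, which is the type checking gain the message refers to.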


138281 01-Dec-2004 cperciva

Fix unvalidated pointer dereference. This is FreeBSD-SA-04:17.procfs.


138279 01-Dec-2004 phk

hpfs_lookup() should have a vop_cachedlookup_t prototype and a corresponding
argument.


138277 01-Dec-2004 phk

Correctly prototype union_write with vop_write_t, not vop_read_t.


138270 01-Dec-2004 phk

Mechanically change prototypes for vnode operations to use the new typedefs.


138106 26-Nov-2004 phk

Ignore MNT_NODEV, it is implicit in choice of filesystem these days.


138105 26-Nov-2004 phk

Eliminate null_open() and use instead null_bypass().

Null_open() was only here to handle MNT_NODEV, but since that does
not affect any filesystems anymore, it could only have any effect
if you nullfs mounted a devfs but didn't want devices to show up.

If you need that, there are easier ways.


138075 25-Nov-2004 phk

Use system wide no-op vfs_start function.


137867 18-Nov-2004 phk

Add dropped implementation of ioctl for fifos.


137801 17-Nov-2004 phk

Make vnode bypass for fifos (read, write, poll) mandatory.


137800 17-Nov-2004 phk

Make vnode bypass for devices mandatory.


137755 15-Nov-2004 phk

Make vnode bypass the default for devices.

Can be disabled in case of problems with
vfs.devfs.fops=0
in loader.conf


137739 15-Nov-2004 phk

Add file ops to fifofs so that we can bypass vnodes (and Giant) for the
heavy-duty operations (read, write, poll/select, kqueue).

Disabled for now, enable with "vfs.fifofs.fops=1" in loader.conf.


137726 15-Nov-2004 phk

Make VOP_BMAP return a struct bufobj for the underlying storage device
instead of a vnode for it.

The vnode_pager does not and should not have any interest in what
the filesystem uses for backend.

(vfs_cluster doesn't use the backing store argument.)


137679 13-Nov-2004 phk

Integrate most of vop_revoke() into devfs_revoke() where it belongs.


137678 13-Nov-2004 phk

Add the devfs_fp_check() function which helps us get from a struct file
to a cdev and a devsw, doing all the relevant checks along the way.

Add the check to see if fp->f_vnode->v_rdev differs from our cached
fp->f_data copy of our cdev. If it does the device was revoked and
we return ENXIO.
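
The revocation check described above can be sketched with simplified structures. The types cdev_s, vnode_s and file_s and the function fp_check() are hypothetical stand-ins, and the extra NULL checks are assumptions about what "all the relevant checks" might include.

```c
#include <errno.h>
#include <stddef.h>

struct cdev_s {
    const char *name;
    const void *devsw;          /* driver switch; NULL if the driver is gone */
};

struct vnode_s {
    struct cdev_s *v_rdev;      /* device currently behind the vnode */
};

struct file_s {
    struct vnode_s *f_vnode;
    struct cdev_s  *f_data;     /* device cached when the file was opened */
};

/*
 * Get from an open file to its device.  If the vnode no longer points
 * at the device cached in the file, the device was revoked.
 */
int
fp_check(const struct file_s *fp, struct cdev_s **devp)
{
    struct cdev_s *dev = fp->f_data;

    if (fp->f_vnode == NULL || fp->f_vnode->v_rdev != dev)
        return (ENXIO);         /* revoked since open */
    if (dev == NULL || dev->devsw == NULL)
        return (ENXIO);         /* no driver to talk to (assumed check) */
    *devp = dev;
    return (0);
}
```

Callers would run this at the top of each file operation and bail out on a nonzero return before touching the driver.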


137676 13-Nov-2004 phk

VOP_REVOKE() is only ever for VCHR vnodes, so unionfs does not
need a vop_revoke() method.


137673 13-Nov-2004 phk

fifos don't need a vop_lookup, the default will do fine.


137647 13-Nov-2004 phk

Introduce an alias for FILEDESC_{UN}LOCK() with the suffix _FAST.

Use this in all the places where sleeping with the lock held is not
an issue.

The distinction will become significant once we finalize the exact
lock-type to use for this kind of case.


137488 09-Nov-2004 trhodes

Remove stale comment after previous commit.

Noticed by: pjd


137480 09-Nov-2004 phk

Detect root mount attempts on the flag, not on the NULL path.


137479 09-Nov-2004 phk

Refuse attempts to mount root filesystem


137478 09-Nov-2004 phk

Refuse attempts to mount root filesystem


137382 08-Nov-2004 phk

Add optional device vnode bypass to DEVFS.

The tunable vfs.devfs.fops controls this feature and defaults to off.

When enabled (vfs.devfs.fops=1 in loader), device vnodes opened
through a file descriptor get a special fops vector which instead
of the detour through the vnode layer goes directly to DEVFS.

Amongst other things this allows us to run Giant free read/write to
device drivers which have been weaned off D_NEEDGIANT.

Currently this means /dev/null, /dev/zero, disks, (and maybe the
random stuff ?)

On a 700MHz K7 machine this doubles the speed of
dd if=/dev/zero of=/dev/null bs=1 count=1000000

This roughly translates to shaving 2usec off each read/write syscall.

The poll/kqfilter paths need more work before they are giant free,
this work is ongoing in p4::phk_bufwork

Please test this and report any problems, LORs etc.


137308 06-Nov-2004 phk

Properly implement a default version of VOP_GETWRITEMOUNT.

Remove improper access to vop_stdgetwritemount() which should and
will instead rely on the VOP default path.


137195 04-Nov-2004 phk

Add back securelevel check for disks.

XXX: This should live in geom_dev.c but we don't have access to the
cred there.
XXX: XXX: This may not matter anymore since filesystems use geom_vfs.


137185 04-Nov-2004 phk

s/ffs/ntfs/

Fix error handling to not use VOP_CLOSE() on the disk.

Spotted by: tegge


137172 03-Nov-2004 phk

Make a more whole-hearted attempt at GEOM'ifying NTFS.

I must have been sleepy when I did the first pass.

Spotted by: tegge


137047 29-Oct-2004 phk

Don't give disks special treatment, they don't come this way anymore.


137043 29-Oct-2004 phk

Remove VOP_SPECSTRATEGY() from the system.


137041 29-Oct-2004 phk

Move NTFS to GEOM backing instead of DEVFS.

For details, please see src/sys/ufs/ffs/ffs_vfsops.c 1.250.


137040 29-Oct-2004 phk

Move HPFS to GEOM backing instead of DEVFS.

For details, please see src/sys/ufs/ffs/ffs_vfsops.c 1.250.


137038 29-Oct-2004 phk

Move CD9660 to GEOM backing instead of DEVFS.

For details, please see src/sys/ufs/ffs/ffs_vfsops.c 1.250.


137037 29-Oct-2004 phk

Move UDF to GEOM backing instead of DEVFS.

For details, please see src/sys/ufs/ffs/ffs_vfsops.c 1.250.


137036 29-Oct-2004 phk

Move MSDOSFS to GEOM backing instead of DEVFS.

For details, please see src/sys/ufs/ffs/ffs_vfsops.c 1.250.


137029 29-Oct-2004 phk

Give dev_strategy() an explicit cdev argument in preparation for removing
buf->b_dev.

Put a bio between the buf passed to dev_strategy() and the device driver
strategy routine in order to not clobber fields in the buf.

Assert copyright on vfs_bio.c and update copyright message to canonical
text. There is no legal difference between John Dyson's two-clause
abbreviated BSD license and the canonical text.


137008 28-Oct-2004 phk

Reduce the locking activity by epsilon by checking VNON condition before
releasing the mountlock.


137006 28-Oct-2004 phk

What can I say: don't allow people to mount DEVFS with option "nodev".


136991 27-Oct-2004 phk

Eliminate unnecessary KASSERTs.

Don't use bp->b_vp in VOP_STRATEGY: the vnode is passed in as an argument.


136966 26-Oct-2004 phk

Put the I/O block size in bufobj->bo_bsize.

We keep si_bsize_phys around for now as that is the simplest way to pull
the number out of disk device drivers in devfs_open(). The correct solution
would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably moot
when filesystems sit on GEOM, so don't bother for now.


136943 25-Oct-2004 phk

Lose the v_dirty* and v_clean* alias macros.

Check the count field where we just want to know the full/empty state,
rather than using TAILQ_EMPTY() or TAILQ_FIRST().


136770 22-Oct-2004 phk

Alas, poor SPECFS! -- I knew him, Horatio; A filesystem of infinite
jest, of most excellent fancy: he hath taught me lessons a thousand
times; and now, how abhorred in my imagination it is! my gorge rises
at it. Here were those hacks that I have curs'd I know not how
oft. Where be your kludges now? your workarounds? your layering
violations, that were wont to set the table on a roar?

Move the skeleton of specfs into devfs where it now belongs and
bury the rest.


136152 05-Oct-2004 jhb

Rework how we store process times in the kernel such that we always store
the raw values including for child process statistics and only compute the
system and user timevals on demand.

- Fix the various kern_wait() syscall wrappers to only pass in a rusage
pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they
don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the
times it needs rather than calling getrusage() twice with associated
stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts
for user, system, and interrupt time as well as a bintime of the total
runtime. A new p_rux field in struct proc replaces the same inline fields
from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux
field in struct proc contains the "raw" child time usage statistics.
ruadd() has been changed to handle adding the associated rusage_ext
structures as well as the values in rusage. Effectively, the values in
rusage_ext replace the ru_utime and ru_stime values in struct rusage. These
two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that
calculates appropriate timevals for user and system time as well as updating
the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a
copy of the process' p_rux structure to compute the timevals after updating
the runtime appropriately if any of the threads in that process are
currently executing. It also now only locks sched_lock internally while
doing the rux_runtime fixup. calcru() now only requires the caller to
hold the proc lock and calcru1() only requires the proc lock internally.
calcru() also no longer allows callers to ask for an interrupt timeval
since none of them actually did.
- calcru() now correctly handles threads executing on other CPUs.
- A new calccru() function computes the child system and user timevals by
calling calcru1() on p_crux. Note that this means that any code that wants
child times must now call this function rather than reading from p_cru
directly. This function also requires the proc lock.
- This finishes the locking for rusage and friends so some of the Giant locks
in exit1() and kern_wait() are now gone.
- The locking in ttyinfo() has been tweaked so that a shared lock of the
proctree lock is used to protect the process group rather than the process
group lock. By holding this lock until the end of the function we now
ensure that the process/thread that we pick to dump info about will no
longer vanish while we are trying to output its info to the console.

Submitted by: bde (mostly)
MFC after: 1 month


136146 05-Oct-2004 takawata

Minor Bug fix. Some file was not translated.


136135 05-Oct-2004 takawata

Fix unionfs problems when a directory is mounted on another directory
with a different file system. This may cause ill things
with my previous fix. Now it translates the fsid of direct children of the
mount point directory only.

Pointed out by: Uwe Doering


136060 02-Oct-2004 takawata

Fix a problem when you try to mount a directory on another directory
belonging to the same filesystem. In this problem, getcwd(3) will fail.

I found the problem two years ago and I have forgotten to merge.

http://docs.FreeBSD.org/cgi/mid.cgi?200202251435.XAA91094


136004 01-Oct-2004 das

Don't PHOLD() the target process in procfs, since this is already done
in pseudofs. Moreover, PHOLD() may block between the p_candebug()
access check and the actual operation.


135727 24-Sep-2004 phk

XXX mark two places where we do not hold a threadcount on the dev when
frobbing the cdevsw.

In both cases we examine only the cdevsw and it is a good question if we
weren't better off copying those properties into the cdev in the first
place. This question will be revisited.


135722 24-Sep-2004 phk

Hold proper thread count while frobbing drivers ioctl.


135719 24-Sep-2004 phk

Remove devsw() call missed in last commit.


135706 24-Sep-2004 phk

Use def_re[fl]thread().

Retire various old compatibility helpers.


135617 23-Sep-2004 phk

Eliminate DEV_STRATEGY() macro: call dev_strategy() directly.

Make dev_strategy() handle errors and departing devices properly.


135613 23-Sep-2004 phk

Do not use devsw() but si_devsw directly. This is still bogus but a
fair bit less so.


135600 23-Sep-2004 phk

Do not refcount the cdevsw, but rather maintain a cdev->si_threadcount
of the number of threads which are inside whatever is behind the
cdevsw for this particular cdev.

Make the device mutex visible through dev_lock() and dev_unlock().
We may want finer granularity later.

Replace spechash_mtx use with dev_lock()/dev_unlock().


135578 22-Sep-2004 phk

Pointy hat please!

Refuse VCHR not VREG.


135541 21-Sep-2004 phk

Desupport opening device nodes on CD9660 filesystems. They are
still visible, they can still be seen, but they cannot be opened.
Use DEVFS for that.


135459 19-Sep-2004 phk

The getpages VOP was a good stab at getting scatter/gather I/O without
too much kernel copying, but it is not the right way to do it, and it is
in the way for straightening out the buffer cache.

The right way is to pass the VM page array down through the struct
bio to the disk device driver and DMA directly into/out of the
physical memory. Once the VM/buf thing is sorted out it is next on
the list.

Retire most of the vnode method ffs_getpages(). It is not clear if what is
left shouldn't be in the default implementation which we now fall back to.

Retire specfs_getpages() as well, as it has no users now.


135280 15-Sep-2004 phk

Remove unused B_WRITEINPROG flag


135135 13-Sep-2004 phk

Remove the buffercache/vnode side of BIO_DELETE processing in
preparation for integration of p4::phk_bufwork. In the future,
local filesystems will talk to GEOM directly and they will consequently
be able to issue BIO_DELETE directly. Since the removal of the fla
driver, BIO_DELETE has effectively been a no-op anyway.


134945 08-Sep-2004 tjr

Reduce the size of struct defid's defid_dirclust, defid_dirofs and
(disabled) defid_gen members from u_long to u_int32_t so that alignment
requirements don't cause the structure to become larger than struct fid
on LP64 platforms. This fixes NFS exports of msdos filesystems on at
least amd64.

PR: 71173
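
The size problem described in this entry can be illustrated with a small user-space sketch. The structures below are stand-ins, not the kernel definitions: `fid_model` assumes a generic handle of two 16-bit fields plus 16 data bytes, `uint64_t` models u_long on an LP64 platform, and `fits_in_fid()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the generic file handle (layout assumed for illustration). */
struct fid_model {
	uint16_t fid_len;
	uint16_t fid_reserved;
	char     fid_data[16];
};

/* Members 64 bits wide, as u_long is on LP64: alignment adds padding. */
struct defid_wide {
	uint16_t defid_len;
	uint16_t defid_pad;
	uint64_t defid_dirclust;
	uint64_t defid_dirofs;
};

/* Members narrowed to 32 bits: no padding, same size on every platform. */
struct defid_narrow {
	uint16_t defid_len;
	uint16_t defid_pad;
	uint32_t defid_dirclust;
	uint32_t defid_dirofs;
};

/* Nonzero when a filesystem-specific handle fits inside the generic one. */
static int
fits_in_fid(size_t handle_size)
{
	return (handle_size <= sizeof(struct fid_model));
}
```

On a typical LP64 ABI the wide layout needs 24 bytes, which exceeds the 20-byte model handle, while the narrow layout needs 12.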


134942 08-Sep-2004 tjr

Merge from NetBSD:
Fix a problem in previous: we can't blindly assume that we have
wincnt entries available at the offset the file has been found. If the dos
directory entry is not preceded by appropriate number of long name
entries (happens e.g. when the filesystem is corrupted, or when
the filename complies to DOS rules and doesn't use any long name entry),
we would overwrite random directory entries.

There are still some problems, the whole thing has to be revisited and solved
right.

Submitted by: Xin LI


134941 08-Sep-2004 tjr

Merge from NetBSD:
Fix a panic that occurred when trying to traverse a corrupt msdosfs
filesystem. With this particular corruption, the code in pcbmap()
would compute an offset into an array that was way out of bounds,
so check the bounds before trying to access and return an error if
the offset would be out of bounds.

Submitted by: Xin LI


134899 07-Sep-2004 phk

Create simple function init_va_filerev() for initializing a va_filerev
field.

Replace three instances of longhaired initialization of va_filerev fields.

Added XXX comment wondering why we don't use random bits instead of
uptime of the system for this purpose.


134897 07-Sep-2004 phk

Explicitly pass vnode to smbfs_doio() function.


134896 07-Sep-2004 phk

Explicitly pass the vnode to the nw_doio() function.


134807 05-Sep-2004 tjr

Temporarily back out revision 1.77. This changed cd9660_getattr() and
cd9660_readdir() to return the address of the file's first data block as
the inode number instead of the address of the directory entry, but
neglected to update cd9660_vget_internal() for the new inode numbering
scheme.

Since the NFS server calls VFS_VGET (cd9660_vget()) with inode numbers
returned through VOP_READDIR (cd9660_readdir()) when servicing a READDIRPLUS
request, these two interfaces must agree on the numbering scheme; failure to
do so caused panics and/or bogus information about the entries to be returned
to clients using READDIRPLUS (Solaris, FreeBSD w/ mount -o rdirplus).

PR: 63446


134647 02-Sep-2004 rwatson

Back out pseudo_vnops.c:1.45, which was a workaround for pfind()
returning incompletely initialized processes. This problem was
eliminated by kern_proc.c:1.215, which causes pfind() not to
return processes in the PRS_NEW state.


134585 01-Sep-2004 brooks

General modernization of coda:
- Ditch NVCODA
- Don't use a static major
- Don't declare functions extern

Reviewed by: peter


134542 30-Aug-2004 peter

Kill count device support from config. I've changed the last few
remaining consumers to have the count passed as an option. This is
i4b, pc98/wdc, and coda.

Bump configvers.h from 500013 to 600000.

Remove heuristics that tried to parse "device ed5" as 5 units of the ed
device. This broke things like the snd_emu10k1 device, which required
quotes to make it parse right. The no-longer-needed quotes have been
removed from NOTES, GENERIC etc. eg, I've removed the quotes from:
device snd_maestro
device "snd_maestro3"
device snd_mss

I believe everything will still compile and work after this.


134374 27-Aug-2004 tjr

Remove bogus vrele() call added in previous.


134345 26-Aug-2004 tjr

Improve the robustness of MSDOSFSMNT_KICONV handling:
- Use copyinstr() to read cs_win, cs_dos, cs_local strings from the
mount argument structure instead of reading through user-space pointers(!).
- When mounting a filesystem, or updating an existing mount, only try to
update the iconv handles from the information in the mount argument
structure if the structure itself has the MSDOSFSMNT_KICONV flag set.
- Attempt to handle failure of update_mp() in the MNT_UPDATE case.


133776 15-Aug-2004 des

Release the vnode cache mutex when calling vgone(), since vgone() may
sleep. This makes pfs_exit() even less efficient than before, but on
the bright side, the vnode cache mutex no longer needs to be recursive.


133741 15-Aug-2004 jmg

Add locking to the kqueue subsystem. This also makes the kqueue subsystem
a more complete subsystem, and removes the knowledge of how things are
implemented from the drivers. Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using its filter ops.

Currently, it uses the MTX_DUPOK even though it is not always safe to
acquire duplicate locks. Witness currently doesn't support the ability
to discover if a dup lock is ok (in some cases).

Reviewed by: green, rwatson (both earlier versions)


133668 13-Aug-2004 rwatson

Commit a work-around for a more general bug involving process state:
check whether p_ucred is NULL or not in pfs_getattr() before
dereferencing the credential, and return ENOENT if there wasn't one.

This is a symptom of a larger problem, wherein pfind() can return
references to incompletely initialized processes, and we instead ought
to not return them, or check the process state before acting on the
process.

Reported by: kris
Discussed with: tjr, others


133327 08-Aug-2004 phk

use bufdone() not biodone().


133326 08-Aug-2004 phk

Use bufdone(), not biodone().


133287 07-Aug-2004 phk

Push all changes to disk before downgrading a mount from rw to ro.


132902 30-Jul-2004 phk

Put a version element in the VFS filesystem configuration structure
and refuse initializing filesystems with a wrong version. This will
aid maintenance activities on the 5-stable branch.

s/vfs_mount/vfs_omount/

s/vfs_nmount/vfs_mount/

Name our filesystems mount function consistently.

Eliminate the nameidata argument to both vfs_mount and vfs_omount.
It was originally there to save stack space. A few places abused
it to get hold of some credentials to pass around. Effectively
it is unused.

Reorganize the root filesystem selection code.


132805 28-Jul-2004 phk

Remove global variable rootdevs and rootvp, they are unused as such.

Add local rootvp variables as needed.

Remove checks for miniroots in the swap partition. We never did that
and most of the filesystems could never be used for that, but it had
still been copy&pasted all over the place.


132772 28-Jul-2004 kan

Avoid casts as lvalues.


132765 28-Jul-2004 kan

Avoid casts as lvalues.


132653 26-Jul-2004 cperciva

Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with: rwatson, scottl
Requested by: jhb


132547 22-Jul-2004 rwatson

In devfs_allocv(), rather than assigning 'td = curthread', assert that
the caller passes in a td that is curthread, and consistently pass 'td'
into vget(). Remove some bogus logic that passed in td or curthread
conditional on td being non-NULL, which seems redundant in the face of
the earlier assignment of td to curthread if td is NULL.

In devfs_symlink(), cache the passed thread in 'td' so we don't have
to keep retrieving it from the 'ap' structure, and assert that td is
curthread (since we dereference it to get thread-local td_ucred). Use
'td' in preference to curthread for later lockmgr calls, since they are
equal.


132199 15-Jul-2004 phk

Do a pass over all modules in the kernel and make them return EOPNOTSUPP
for unknown events.

A number of modules return EINVAL in this instance, and I have left
those alone for now and instead taught MOD_QUIESCE to accept this
as "didn't do anything".


132094 13-Jul-2004 phk

Another LINT compilation fix


132093 13-Jul-2004 phk

Make LINT compile


132037 12-Jul-2004 rwatson

Remove 'td = curthread' that shadows the arguments to coda_root().

Missed by: alfred


132023 12-Jul-2004 alfred

Make VFS_ROOT() and vflush() take a thread argument.
This is to allow filesystems to decide based on the passed thread
which vnode to return.
Several filesystems used curthread, they now use the passed thread.


131924 10-Jul-2004 marcel

Update for the KDB framework:
o Call kdb_enter() instead of Debugger().


131923 10-Jul-2004 marcel

Update for the KDB framework:
o Call kdb_enter() instead of Debugger().
o Make debugging code conditional upon KDB instead of DDB.


131871 09-Jul-2004 des

Accumulate directory entries in a fixed-length sbuf, and uiomove them in
one go before returning. This avoids calling uiomove() while holding
allproc_lock.

Don't adjust uio->uio_offset manually, uiomove() does that for us.

Don't drop allproc_lock before calling panic().

Suggested by: alfred


131551 04-Jul-2004 phk

When we traverse the vnodes on a mountpoint we need to look out for
our cached 'next vnode' being removed from this mountpoint. If we
find that it was recycled, we restart our traversal from the start
of the list.

Code to do that is in all local disk filesystems (and a few other
places) and looks roughly like this:

    MNT_ILOCK(mp);
  loop:
    for (vp = TAILQ_FIRST(&mp...);
         (vp = nvp) != NULL;
         nvp = TAILQ_NEXT(vp,...)) {
            if (vp->v_mount != mp)
                    goto loop;
            MNT_IUNLOCK(mp);
            ...
            MNT_ILOCK(mp);
    }
    MNT_IUNLOCK(mp);

The code which takes vnodes off a mountpoint looks like this:

    MNT_ILOCK(vp->v_mount);
    ...
    TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes);
    ...
    MNT_IUNLOCK(vp->v_mount);
    ...
    vp->v_mount = something;

(Take a moment and try to spot the locking error before you read on.)

On a SMP system, one CPU could have removed nvp from our mountlist
but not yet gotten to assign a new value to vp->v_mount while another
CPU simultaneously gets to the top of the traversal loop where it
finds that (vp->v_mount != mp) is not true despite the fact that
the vnode has indeed been removed from our mountpoint.

Fix:

Introduce the macro MNT_VNODE_FOREACH() to traverse the list of
vnodes on a mountpoint while taking into account that vnodes may
be removed from the list as we go. This saves approx 65 lines of
duplicated code.

Split the insmntque() which potentially moves a vnode from one mount
point to another into delmntque() and insmntque() which does just
what the names say.

Fix delmntque() to set vp->v_mount to NULL while holding the
mountpoint lock.
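
The idea behind the delmntque() fix can be sketched in user space. This is a minimal model, not the kernel code: it assumes a pthread mutex in place of the mountpoint interlock, and `mount_model`, `vnode_model`, `ins_model()`, `del_model()` and `count_model()` are hypothetical names. The point it shows is that the list removal and the clearing of the owner pointer happen under one hold of the same lock, so a traversal holding that lock never sees a removed entry that still claims to belong to the mount.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

struct mount_model;

struct vnode_model {
	struct mount_model *v_mount;	/* owning mount, NULL when detached */
	TAILQ_ENTRY(vnode_model) v_link;
};

struct mount_model {
	pthread_mutex_t mnt_lock;	/* protects mnt_list and each v_mount */
	TAILQ_HEAD(, vnode_model) mnt_list;
};

static void
mount_model_init(struct mount_model *mp)
{
	pthread_mutex_init(&mp->mnt_lock, NULL);
	TAILQ_INIT(&mp->mnt_list);
}

/* Attach: set the owner and link the entry under the lock. */
static void
ins_model(struct vnode_model *vp, struct mount_model *mp)
{
	pthread_mutex_lock(&mp->mnt_lock);
	vp->v_mount = mp;
	TAILQ_INSERT_TAIL(&mp->mnt_list, vp, v_link);
	pthread_mutex_unlock(&mp->mnt_lock);
}

/*
 * Detach: unlink the entry and clear the owner before dropping the lock.
 * The caller is assumed to hold a reference that keeps vp alive.
 */
static void
del_model(struct vnode_model *vp)
{
	struct mount_model *mp = vp->v_mount;

	if (mp == NULL)
		return;
	pthread_mutex_lock(&mp->mnt_lock);
	TAILQ_REMOVE(&mp->mnt_list, vp, v_link);
	vp->v_mount = NULL;
	pthread_mutex_unlock(&mp->mnt_lock);
}

/* Count the entries that are linked and owned by mp. */
static int
count_model(struct mount_model *mp)
{
	struct vnode_model *vp;
	int n = 0;

	pthread_mutex_lock(&mp->mnt_lock);
	for (vp = TAILQ_FIRST(&mp->mnt_list); vp != NULL;
	    vp = TAILQ_NEXT(vp, v_link))
		if (vp->v_mount == mp)
			n++;
	pthread_mutex_unlock(&mp->mnt_lock);
	return (n);
}
```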


131526 03-Jul-2004 phk

Remove "register" keyword and trailing white space.


131523 03-Jul-2004 tjr

By popular request, add a workaround that allows large (>128GB or so)
FAT32 filesystems to be mounted, subject to some fairly serious limitations.

This works by extending the internal pseudo-inode-numbers generated from
the file's starting cluster number to 64-bits, then creating a table
mapping these into arbitrary 32-bit inode numbers, which can fit in
struct dirent's d_fileno and struct vattr's va_fileid fields. The mappings
do not persist across unmounts or reboots, so it's not possible to export
these filesystems through NFS. The mapping table may grow to be rather
large, and may grow large enough to exhaust kernel memory on filesystems
with millions of files.

Don't enable this option unless you understand the consequences.
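
As a rough illustration of the mapping described above, the sketch below hands out small 32-bit identifiers for 64-bit keys on first use. It is an assumption-laden model, not the msdosfs implementation: `fileno_map` and `fileno_map_lookup()` are hypothetical names, the lookup is a linear scan, and the table only ever grows, which mirrors the memory concern mentioned in this entry.

```c
#include <stdint.h>
#include <stdlib.h>

struct fileno_map {
	uint64_t *keys;		/* 64-bit pseudo-inode numbers seen so far */
	size_t    count;	/* entries in use */
	size_t    cap;		/* entries allocated */
};

/*
 * Return the 32-bit number assigned to key, assigning the next free one
 * on first use. Returns 0 if the table cannot grow.
 */
static uint32_t
fileno_map_lookup(struct fileno_map *m, uint64_t key)
{
	size_t i;

	for (i = 0; i < m->count; i++)
		if (m->keys[i] == key)
			return ((uint32_t)(i + 1));
	if (m->count >= UINT32_MAX)
		return (0);
	if (m->count == m->cap) {
		size_t ncap = m->cap != 0 ? m->cap * 2 : 16;
		uint64_t *nk = realloc(m->keys, ncap * sizeof(*nk));

		if (nk == NULL)
			return (0);
		m->keys = nk;
		m->cap = ncap;
	}
	m->keys[m->count++] = key;
	return ((uint32_t)m->count);
}
```

Because the numbers depend on the order in which files are first seen, they are not stable across mounts, which is why such a scheme cannot back NFS file handles.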


131003 24-Jun-2004 rwatson

Remove spls from portal_open(). Acquire socket lock while sleeping
waiting for the socket to connect and use msleep() on the socket
mutex rather than tsleep(). Acquire socket buffer mutexes around
read-modify-write of socket buffer flags.


130994 23-Jun-2004 scottl

Make the udf_vnops side endian clean.


130986 23-Jun-2004 scottl

First half of making UDF be endian-clean. This addresses the vfsops side.


130960 23-Jun-2004 bde

Include <sys/mutex.h> and its prerequisite <sys/lock.h> instead of
depending on namespace pollution in <sys/vnode.h> for the definition
of mutex interfaces used in SOCKBUF_*LOCK().

Sorted includes.

Removed unused includes.


130952 23-Jun-2004 rwatson

Remove unlocked read annotation for sbspace(); the read is locked.


130678 18-Jun-2004 phk

Reduce a fair bit of the atomics because we are now called with a
lock from kern_conf.c and cdev's act a lot more like real objects
these days.


130665 18-Jun-2004 rwatson

Merge some additional leaf node socket buffer locking from
rwatson_netperf:

Introduce conditional locking of the socket buffer in fifofs kqueue
filters; KNOTE() will be called holding the socket buffer locks in
fifofs, but sometimes the kqueue() system call will poll using the
same entry point without holding the socket buffer lock.

Introduce conditional locking of the socket buffer in the socket
kqueue filters; KNOTE() will be called holding the socket buffer
locks in the socket code, but sometimes the kqueue() system call
will poll using the same entry points without holding the socket
buffer lock.

Simplify the logic in sodisconnect() since we no longer need spls.

NOTE: To remove conditional locking in the kqueue filters, it would
make sense to use a separate kqueue API entry into the socket/fifo
code when calling from the kqueue() system call.


130653 17-Jun-2004 rwatson

Merge additional socket buffer locking from rwatson_netperf:

- Lock down low hanging fruit use of sb_flags with socket buffer
lock.

- Lock down low hanging fruit use of so_state with socket lock.

- Lock down low hanging fruit use of so_options.

- Lock down low-hanging fruit use of sb_lowwat and sb_hiwat with
socket buffer lock.

- Annotate situations in which we unlock the socket lock and then
grab the receive socket buffer lock, which are currently actually
the same lock. Depending on how we want to play our cards, we
may want to coalesce these lock uses to reduce overhead.

- Convert a if()->panic() into a KASSERT relating to so_state in
soaccept().

- Remove a number of splnet()/splx() references.

More complex merging of socket and socket buffer locking to
follow.


130640 17-Jun-2004 phk

Second half of the dev_t cleanup.

The big lines are:
NODEV -> NULL
NOUDEV -> NODEV
udev_t -> dev_t
udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.


130585 16-Jun-2004 phk

Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.


130551 16-Jun-2004 julian

Nice is a property of a process as a whole.
I mistakenly moved it to the ksegroup when breaking up the process
structure. Put it back in the proc structure.


130513 15-Jun-2004 rwatson

Grab the socket buffer send or receive mutex when performing a
read-modify-write on the sb_state field. This commit catches only
the "easy" ones where it doesn't interact with as yet unmerged
locking.


130480 14-Jun-2004 rwatson

The socket field so_state is used to hold a variety of socket related
flags relating to several aspects of socket functionality. This change
breaks out several bits relating to send and receive operation into a
new per-socket buffer field, sb_state, in order to facilitate locking.
This is required because, in order to provide more granular locking of
sockets, different state fields have different locking properties. The
following fields are moved to sb_state:

SS_CANTRCVMORE (so_state)
SS_CANTSENDMORE (so_state)
SS_RCVATMARK (so_state)

Rename respectively to:

SBS_CANTRCVMORE (so_rcv.sb_state)
SBS_CANTSENDMORE (so_snd.sb_state)
SBS_RCVATMARK (so_rcv.sb_state)

This facilitates locking by isolating fields to be located with other
identically locked fields, and permits greater granularity in socket
locking by avoiding storing fields with different locking semantics in
the same short (avoiding locking conflicts). In the future, we may
wish to coalesce sb_state and sb_flags; for the time being I leave
them separate and there is no additional memory overhead due to the
packing/alignment of shorts in the socket buffer structure.


129911 01-Jun-2004 truckman

Add MSG_NBIO flag option to soreceive() and sosend() that causes
them to behave the same as if the SS_NBIO socket flag had been set
for this call. The SS_NBIO flag for ordinary sockets is set by
fcntl(fd, F_SETFL, O_NONBLOCK).

Pass the MSG_NBIO flag to the soreceive() and sosend() calls in
fifo_read() and fifo_write() instead of frobbing the SS_NBIO flag
on the underlying socket for each I/O operation. The O_NONBLOCK
flag is a property of the descriptor, and unlike ordinary sockets,
fifos may be referenced by multiple descriptors.


129880 30-May-2004 phk

add missing #include <sys/module.h>


129355 17-May-2004 truckman

Switch from using the vnode interlock to a private mutex in fifo_open()
to avoid lock order problems when manipulating the sockets associated
with the fifo.

Minor optimization of a couple of calls to fifo_cleanup() from
fifo_open().


128992 06-May-2004 alc

Make vm_page's PG_ZERO flag immutable between the time of the page's
allocation and deallocation. This flag's principal use is shortly after
allocation. For such cases, clearing the flag is pointless. The only
unusual use of PG_ZERO is in vfs_bio_clrbuf(). However, allocbuf() never
requests a prezeroed page. So, vfs_bio_clrbuf() never sees a prezeroed
page.

Reviewed by: tegge@


128171 12-Apr-2004 phk

Do not drop Giant around the poll method yet, we're not ready for it.


128019 07-Apr-2004 imp

Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson


127694 01-Apr-2004 pjd

Remove ps_argsopen from this check, because of two reasons:
1. This check is wrong, because it is true by default
(kern.ps_argsopen is 1 by default) (p_cansee() is not even checked).
2. Sysctl kern.ps_argsopen is going away.


127652 31-Mar-2004 rwatson

Export uipc_connect2() from uipc_usrreq.c instead of unp_connect2(),
and consume that interface in portalfs and fifofs instead. In the
new world order, unp_connect2() assumes that the unpcb mutex is
held, whereas uipc_connect2() validates that the passed sockets are
UNIX domain sockets, then grabs the mutex.

NB: the portalfs and fifofs code gets down and dirty with UNIX domain
sockets. Maybe this is a bad thing.


127603 30-Mar-2004 scottl

Catch all cases where bread() returns an error and a valid *bp, and release
the *bp.

Obtained from: DragonFlyBSD


127592 29-Mar-2004 peter

Clean up the stub fake vnode locking implementations. The main reason this
stuff was here (NFS) was fixed by Alfred in November. The only remaining
consumer of the stub functions was umapfs, which is horribly horribly
broken. It has missed out on about the last 5 years worth of maintenance
that was done on nullfs (from which umapfs is derived). It needs major
work to bring it up to date with the vnode locking protocol. umapfs really
needs to find a caretaker to bring it into the 21st century.

Functions GC'ed:
vop_noislocked, vop_nolock, vop_nounlock, vop_sharedlock.


126998 14-Mar-2004 rwatson

Don't reject FAT file systems with a number of "Heads" greater than
255; USB keychains exist that use 256 as the number of heads. This
check has also been removed in Darwin (along with most of the other
head/sector sanity checks).


126975 14-Mar-2004 green

When taking event callbacks (like process_exit) out from under Giant, those
which do not lock Giant themselves will be exposed. Unbreak pfs_exit().


126858 11-Mar-2004 phk

When I was a kid my work table was one cluttered mess and cleaning it up
was a rather overwhelming task. I soon learned that if you don't know
where you're going to store something, at least try to pile it next to
something slightly related in the hope that a pattern emerges.

Apply the same principle to the ffs/snapshot/softupdates code which has
leaked into specfs: Add yet another buf-quasi-method and call it from the
only two places I can see it can make a difference and implement the
magic in ffs_softdep.c where it belongs.

It's not pretty, but at least it's one less layer violated.


126851 11-Mar-2004 phk

Remove unused second arg to vfinddev().
Don't call addaliasu() on VBLK nodes.


126823 10-Mar-2004 phk

Don't call devsw() more than we need to, and in particular do not expose
ourselves to device removal by not checking for it the second time.

Use count_dev(dev) rather than vcount(vp)


126532 03-Mar-2004 scottl

Change __FUNCTION__ to __func__

Submitted by: Stefan Farfeleder


126425 01-Mar-2004 rwatson

Rename dup_sockaddr() to sodupsockaddr() for consistency with other
functions in kern_socket.c.

Rename the "canwait" field to "mflags" and pass M_WAITOK and M_NOWAIT
in from the caller context rather than "1" or "0".

Correct mflags pass into mac_init_socket() from previous commit to not
include M_ZERO.

Submitted by: sam


126191 24-Feb-2004 phk

Do not attempt to open NODEV


126133 23-Feb-2004 tjr

Fix comment containing vop_readdir_args contents: a_cookies is really
u_long ** not u_long *.


126132 23-Feb-2004 tjr

cookies is an array of u_long, not u_int, so MALLOC() it accordingly.
Allocating it with the wrong size could have caused corruption on
64-bit architectures.


126086 21-Feb-2004 bde

Fixed a serious off by 1 error. The cluster-in-use bitmap was overrun
by 1 u_int if the number of clusters was 1 more than a multiple of
(8 * sizeof(u_int)). The bitmap is malloced and large (often huge), so
fatal overrun probably only occurred if the number of clusters was 1
more than 1 multiple of PAGE_SIZE/8.
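
A bitmap sized in u_int words needs its word count rounded up, not truncated. The sketch below contrasts the two; it is a generic illustration of how an overrun by one word can arise, not the msdosfs code, and both helper names are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

#define	BITS_PER_WORD	(CHAR_BIT * sizeof(unsigned int))

/* Words needed to hold nbits bits, rounding up. */
static size_t
bitmap_words(size_t nbits)
{
	return ((nbits + BITS_PER_WORD - 1) / BITS_PER_WORD);
}

/* Truncating division: one word short whenever nbits is not a multiple. */
static size_t
bitmap_words_truncating(size_t nbits)
{
	return (nbits / BITS_PER_WORD);
}
```

With one bit more than a whole number of words, the rounded-up count includes the extra word and the truncated count does not.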


126082 21-Feb-2004 phk

Device megapatch 6/6:

This is what we came here for: Hang dev_t's from their cdevsw,
refcount cdevsw and dev_t and generally keep track of things a lot
better than we used to:

Hold a cdevsw reference around all entrances into the device driver,
this will be necessary to safely determine when we can unload driver
code.

Hold a dev_t reference while the device is open.

KASSERT that we do not enter the driver on a non-referenced dev_t.

Remove old D_NAG code, anonymous dev_t's are not a problem now.

When destroy_dev() is called on a referenced dev_t, move it to
dead_cdevsw's list. When the refcount drops, free it.

Check that cdevsw->d_version is correct. If not, set all methods
to the dead_*() methods to prevent entrance into driver. Print
warning on console to this effect. The device driver may still
explode if it is also incompatible with newbus, but in that case
we probably didn't get this far in the first place.


126081 21-Feb-2004 phk

Device megapatch 5/6:

Remove the unused second argument from udev2dev().

Convert all remaining users of makedev() to use udev2dev(). The
semantic difference is that udev2dev() will only locate a pre-existing
dev_t, it will not, like makedev(), create a new one.

Apart from the tiny well controlled window in D_PSEUDO drivers,
there should no longer be any "anonymous" dev_t's in the system
now, only dev_t's created with make_dev() and make_dev_alias()


126080 21-Feb-2004 phk

Device megapatch 4/6:

Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.

Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.


126019 19-Feb-2004 phk

Report the correct length for symlink entries.


125992 19-Feb-2004 tjr

Use size_t or ssize_t wherever appropriate instead of casting from int *
to size_t *, which is incorrect because they may have different widths.
This caused some subtle forms of corruption, the most frequently
reported one being that the last character of a filename was sometimes
duplicated on amd64.


125942 17-Feb-2004 trhodes

Do not place dirmask in unnamed padding. Move it to the bottom of this
list where it should have been added originally.

Prodded by: bde


125934 17-Feb-2004 tjr

If the "next free cluster" field of the FSInfo block is 0xFFFFFFFF,
it means that the correct value is unknown. Since this value is just
a hint to improve performance, initially assume that the first non-reserved
cluster is free, then correct this assumption if necessary before writing
the FSInfo block back to disk.

PR: 62826
MFC after: 2 weeks
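
The handling of the unknown hint can be sketched as a single selection. This is an illustration only: `FSINFO_UNKNOWN` and `initial_next_free()` are hypothetical names, and the first non-reserved cluster is passed in by the caller instead of being hard-coded.

```c
#include <stdint.h>

#define	FSINFO_UNKNOWN	0xFFFFFFFFu	/* "value not known" marker */

/*
 * Pick the starting point for free-cluster searches: trust the on-disk
 * hint unless it is the unknown marker, in which case start at the first
 * non-reserved cluster.
 */
static uint32_t
initial_next_free(uint32_t hint, uint32_t first_data_cluster)
{
	return (hint == FSINFO_UNKNOWN ? first_data_cluster : hint);
}
```

Since the value is only a performance hint, a wrong starting guess costs search time and is corrected before the block is written back.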


125855 15-Feb-2004 phk

White-space align a struct definition.
Move a SYSINIT to the file where it belongs.


125796 14-Feb-2004 bde

Fixed some style bugs:
- don't unlock the vnode after vinvalbuf() only to have to relock it
almost immediately.
- don't refer to devices classified by vn_isdisk() as block devices.


125739 12-Feb-2004 bde

MFffs (ffs_vfsops.c 1.227: clean up open mode bandaid). This reduces
gratuitous differences with ffs a little.


125671 10-Feb-2004 nectar

Fix a panic in pseudofs(9) that could occur when doing an I/O
operation with a large request or large offset.

Reported by: Joel Ray Holveck <joelh@piquan.org>
Submitted by: des


125637 10-Feb-2004 tjr

Fixes problems that occurred when a file was removed and a directory
created with the same name, and vice versa:
- Immediately recycle vnodes of files & directories that have been deleted
or renamed.
- When looking up an entry in the VFS name cache or smbfs's private
cache, make sure the vnode type is consistent with the type of file
the server thinks it is, and re-create the vnode if it isn't.

The alternative to this is to recycle vnodes unconditionally when their
use count drops to 0, but this would make all the caching we do
mostly useless.

PR: 62342
MFC after: 2 weeks


125454 04-Feb-2004 jhb

Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count. The plimit
structure is treated similarly to struct ucred in that it is always copy
on write, so having a reference to a structure is sufficient to read from
it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
limits from a process to keep the limit structure from changing out from
under you while reading from it.
- Various global limits that are ints are not protected by a lock since
int writes are atomic on all the archs we support and thus a lock
wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
either an rlimit, or the current or max individual limit of the specified
resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
(it didn't use the stackgap when it should have) but uses lim_rlimit()
and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits. It
also no longer uses the stackgap for accessing sysctl's for the
ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result,
ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.

Submitted by: mtm (mostly, I only did a few cleanups and catchups)
Tested on: i386
Compiled on: alpha, amd64


124804 21-Jan-2004 cperciva

Fix style(9) of my previous commit.

Noticed by: nate
Approved by: nate, rwatson (mentor)


124798 21-Jan-2004 cperciva

Allow devfs path rules to work on directories. Without this fix,
devfs rule add path fd unhide
is a no-op, while it should unhide the fd subdirectory.

Approved by: phk, rwatson (mentor)
PR: kern/60897


124728 19-Jan-2004 kan

Spell magic '16' number as IO_SEQSHIFT.


124600 16-Jan-2004 green

Do not allow operations which cause known file-system corruption.


124599 16-Jan-2004 green

Remove a warning.


124593 16-Jan-2004 green

Fix an upper-vnode leak created in revision 1.52. When an upper-layer
file has been removed, it should be purged from the cache, but it need
not be removed from the directory stack causing corruption; instead,
it will simply be removed once the last references and holds on it
are dropped at the end of the unlink/rmdir system calls, and the
normal !UN_CACHED VOP_INACTIVE() handler for unionfs finishes it off.

This is easily reproduced by repeated "echo >file; rm file" on a
unionfs mount. Strangely, "echo -n >file; rm file" didn't make
it happen.


124434 12-Jan-2004 tjr

Fix an inverted test for NOPEN in the unused function smb_smb_flush().


124404 11-Jan-2004 truckman

Don't try to unlock the directory vnode in null_lookup() if the lock is
shared with the underlying file system and the lookup in the underlying
file system did the unlock for us.


124326 10-Jan-2004 tjr

Restore closing of SMB find handle in smbfs_close().


124219 07-Jan-2004 rwatson

Lock p->p_textvp before calling vn_fullpath() on it. Note the
potential lock order concern due to the vnode lock held
simultaneously by the caller into procfs.

Reported by: kuriyama
Approved by: des


124115 04-Jan-2004 tjr

In smbfs_inactive(), only invalidate the node's attribute cache if we
had to send a file close request to the server.


124090 03-Jan-2004 tjr

Pass ACL, extended attribute and MAC vnode ops down the vnode stack.


124081 02-Jan-2004 phk

Improve on POLA by populating DEVFS before doing devfs(8) rule ioctls.

PR: 60687
Spotted by: Colin Percival <cperciva@daemonology.net>


123967 29-Dec-2003 bde

Fixed some (most) style bugs in rev.1.33. Mainly 4-char indentation
(msdosfs uses normal 8-char indentation almost everywhere else),
too-long lines, and minor English usage errors. The verbose formal
comment before the new function is still abnormal.


123964 29-Dec-2003 bde

Fixed some minor style bugs in rev.1.144. All related to msdosfs_advlock()
(mainly unsorting). There were no changes related to the dirty flag
here. The reference NetBSD implementation put msdosfs_advlock() in a
different place. This commit only moves its declarations and changes
some of the function body to be like the NetBSD version.


123963 29-Dec-2003 bde

Fixed style bugs in rev.1.112. The bugs started with obscure magic
numbers in comments (Apple PR numbers?) and didn't improve.


123932 28-Dec-2003 bde

v_vxproc was a bogus name for a thread (pointer).


123873 26-Dec-2003 trhodes

Make msdosfs support the dirty flag in FAT16 and FAT32.
Enable lockf support.

PR: 55861
Submitted by: Jun Su <junsu@m-net.arbornet.org> (original version)
Reviewed by: make universe


123724 22-Dec-2003 tjr

Make oldsize in smbfs_getattr() 64 bits wide instead of 32 to avoid
truncation when files are larger than 4GB.


123559 16-Dec-2003 tjr

Avoid sign extension when casting signed characters to unsigned wide
characters in ntfs_u28(). This fixes the conversion of filenames containing
single-byte characters with the high bit set.
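
The sign-extension problem described above can be illustrated with a minimal sketch. This is not the ntfs_u28() code; widen_byte() and widen_name() are hypothetical names, and the sketch only assumes that plain char may be signed on the target platform.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical helper (not the ntfs_u28() code): widen one byte of a
 * single-byte-charset name to a wide character without sign extension.
 */
static wchar_t
widen_byte(char c)
{
	/* Go through unsigned char first so 0x80..0xff stay 0x80..0xff. */
	return ((wchar_t)(unsigned char)c);
}

/* Widen n bytes from src into dst; dst must have room for n entries. */
static void
widen_name(const char *src, wchar_t *dst, size_t n)
{
	size_t i;

	assert(src != NULL && dst != NULL);
	for (i = 0; i < n; i++)
		dst[i] = widen_byte(src[i]);
}
```

Casting a signed char directly to a wider type would turn a byte such as 0xe9 into a negative value first; the intermediate unsigned char cast avoids that.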


123293 08-Dec-2003 fjoe

Make msdosfs long filenames matching case insensitive again.

PR: 59765
Submitted by: Ryuichiro Imura <imura@ryu16.org>


123248 07-Dec-2003 des

Constify, and add an API function to find a named node in a directory.


123247 07-Dec-2003 des

Minor whitespace and style issues.


123245 07-Dec-2003 des

Remove useless SMP check code.


123215 07-Dec-2003 scottl

Re-arrange and consolidate some random debugging stuff


122893 19-Nov-2003 kan

Fix vnode locking in fdesc_setattr. Lock vnode before invoking
VOP_SETATTR on it.

Approved by: re@ (rwatson)


122772 16-Nov-2003 truckman

Use "fip->fi_readers == 0 && fip->fi_writers == 0" as the condition for
disposing fifo resources in fifo_cleanup() instead of using
"vp->v_usecount == 1". There may be other references to the vnode, for
instance by nullfs, at the time fifo_open() or fifo_close() is called,
which could cause a resource leak.

Don't bother grabbing the vnode interlock in fifo_cleanup() since it no
longer accesses v_usecount.
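
A minimal userspace sketch of the disposal condition described above, assuming a simplified state structure. struct fifo_state and fifo_state_cleanup() are hypothetical stand-ins, not the kernel's fifoinfo or fifo_cleanup(); the point is only that the decision depends on the fifo's own reader and writer counts rather than on an external reference count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-fifo state; only the counters matter. */
struct fifo_state {
	long	readers;	/* open read ends, like fi_readers */
	long	writers;	/* open write ends, like fi_writers */
	void	*resources;	/* buffers owned by the fifo, or NULL */
};

/*
 * Release the resources once no reader and no writer remains.
 * Returns true if the resources were freed by this call.
 */
static bool
fifo_state_cleanup(struct fifo_state *fs)
{
	assert(fs != NULL);
	if (fs->readers == 0 && fs->writers == 0 && fs->resources != NULL) {
		free(fs->resources);
		fs->resources = NULL;
		return (true);
	}
	return (false);
}
```

Because the test never looks at how many other holders reference the object, an extra reference (for instance from a stacking layer) cannot keep the resources alive after the last reader and writer are gone.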


122652 14-Nov-2003 das

- A sanity check in unionfs verifies that lookups of '.' return the
vnode of the parent. However, this check should not be performed if
the lookup failed. This change should fix "union_lookup returning
. not same as startdir" panics people were seeing. The bug was
introduced by an incomplete import of a NetBSD delta in rev 1.38.
- Move the aforementioned check out from DIAGNOSTIC. Performance
is the least of our unionfs worries.
- Minor reorganization.

PR: 53004
MFC after: 1 week


122608 13-Nov-2003 phk

Initialize b_iooffset correctly.


122552 12-Nov-2003 phk

Don't mess around with spare fields of public structures.


122551 12-Nov-2003 phk

Don't mess about with spare fields in public structures.


122524 12-Nov-2003 rwatson

Modify the MAC Framework so that instead of embedding a (struct label)
in various kernel objects to represent security data, we embed a
(struct label *) pointer, which now references labels allocated using
a UMA zone (mac_label.c). This allows the size and shape of struct
label to be varied without changing the size and shape of these kernel
objects, which become part of the frozen ABI with 5-STABLE. This opens
the door for boot-time selection of the number of label slots, and hence
changes to the bound on the number of simultaneous labeled policies
at boot-time instead of compile-time. This also makes it easier to
embed label references in new objects as required for locking/caching
with fine-grained network stack locking, such as inpcb structures.

This change also moves us further in the direction of hiding the
structure of kernel objects from MAC policy modules, not to mention
dramatically reducing the number of '&' symbols appearing in both the
MAC Framework and MAC policy modules, and improving readability.

While this results in minimal performance change with MAC enabled, it
will observably shrink the size of a number of critical kernel data
structures for the !MAC case, and should have a small (but measurable)
performance benefit (i.e., struct vnode, struct socket) due to memory
conservation and reduced cost of zeroing memory.

NOTE: Users of MAC must recompile their kernel and all MAC modules as a
result of this change. Because this is an API change, third party
MAC modules will also need to be updated to make less use of the '&'
symbol.

Suggestions from: bmilekic
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
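
The embedded-versus-pointer trade-off described above can be sketched as follows. All names here are hypothetical, the label layout is invented for illustration, and calloc() stands in for the UMA zone allocation mentioned in the message.

```c
#include <assert.h>
#include <stdlib.h>

#define	LABEL_SLOTS	4	/* hypothetical slot count */

/* Hypothetical label; the real struct label layout is not shown here. */
struct label {
	long	slots[LABEL_SLOTS];
};

/* Before: the label is embedded, so its size is part of the object. */
struct object_embedded {
	int		id;
	struct label	label;
};

/* After: only a pointer is embedded; the label lives elsewhere. */
struct object_pointer {
	int		id;
	struct label	*label;
};

/* Allocate a zeroed label for an object; returns 0 on success. */
static int
object_label_init(struct object_pointer *obj)
{
	assert(obj != NULL);
	obj->label = calloc(1, sizeof(*obj->label));
	return (obj->label != NULL ? 0 : -1);
}

/* Free the label and clear the pointer. */
static void
object_label_destroy(struct object_pointer *obj)
{
	assert(obj != NULL);
	free(obj->label);
	obj->label = NULL;
}
```

With the pointer form, changing LABEL_SLOTS alters sizeof(struct label) but leaves sizeof(struct object_pointer) unchanged, which is the ABI property the message is after.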


122444 10-Nov-2003 truckman

If fifo_open() is interrupted, fifo_close() may not get called, causing
a resource leak. Move the resource deallocation code from fifo_close()
to a new function, fifo_cleanup(), and call fifo_cleanup() from
fifo_close() and the appropriate places in fifo_open().

Tested by: Lukas Ertl
Pointy hat to: truckman


122352 09-Nov-2003 tanimura

- Implement selwakeuppri() which allows raising the priority of a
thread being woken up. The thread woken up can run at a priority as
high as after tsleep().

- Replace selwakeup()s with selwakeuppri()s and pass appropriate
priorities.

- Add cv_broadcastpri() which raises the priority of the broadcast
threads. Used by selwakeuppri() if collision occurs.

Not objected in: -arch, -current


122102 05-Nov-2003 scottl

Add hooks for translating directories entries using the iconv methods.

Submitted by: imura@ryu16.org


122101 05-Nov-2003 scottl

Add udf_UncompressUnicodeByte() for processing cs0 strings in a way that the
iconv methods can handle

Submitted by: imura@ryu16.org


122091 05-Nov-2003 kan

Remove mntvnode_mtx and replace it with per-mountpoint mutex.
Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to
operate on this mutex transparently.

Eventually new mutex will be protecting more fields in
struct mount, not only vnode list.

Discussed with: jeff


121874 02-Nov-2003 kan

Take care not to call vput if thread used in corresponding vget
wasn't curthread, i.e. when we receive a thread pointer to use
as a function argument. Use VOP_UNLOCK/vrele in these cases.

The only case where td != curthread known at the moment is
boot() calling sync with thread0 pointer.

This fixes the panic on shutdown people have reported.


121859 01-Nov-2003 kan

Remove now unused variable.


121847 01-Nov-2003 kan

Temporarily undo parts of the struct mount locking commit by jeff.
It is unsafe to hold a mutex across vput/vrele calls.

This will be redone when a better locking strategy is agreed upon.

Discussed with: jeff


121842 01-Nov-2003 kan

Do not bother walking mount point vnode list just to calculate
the number of vnodes. Use precomputed mp->mnt_nvnodelistsize
value instead.


121281 20-Oct-2003 phk

Remember to check the DE_WHITEOUT flag in the case where a cloned
device is hidden by a devfs(8) rule.

Spotted by: Adam Nowacki <ptnowak@bsk.vectranet.pl>


121270 20-Oct-2003 phk

When a driver successfully created a device on demand, we can directly
pick up the DEVFS inode number from the dev_t and find our directory
entry from that, we don't need to scan the directory to find it.

This also solves an issue with on-demand devices in subdirectories.

Submitted by: cognet


121247 19-Oct-2003 mux

Remove debug printf().


121223 18-Oct-2003 phk

Initialize b_iooffset before calling strategy


121205 18-Oct-2003 phk

DuH!

bp->b_iooffset (the spot on the disk), not bp->b_offset (the offset in
the file)


121198 18-Oct-2003 phk

Initialize b_offset before calling VOP_SPECSTRATEGY()


121196 18-Oct-2003 phk

Initialize b_offset before calling VOP_STRATEGY/VOP_SPECSTRATEGY.

Remove various comments of KASSERTS and comments about B_PHYS which
does not apply anymore.


121190 18-Oct-2003 phk

Convert some if(bla) panic("foo") to KASSERTS to improve grep-ability.


121121 15-Oct-2003 phk

Introduce a new optional member function for cdevsw, fdopen() which
passes the fdidx from VOP_OPEN down.

This is for all I know the final API for this functionality, but
the locking semantics for messing with the filedescriptor from
the device driver are not settled at this time.


120794 05-Oct-2003 bde

Include <sys/mutex.h>. Don't depend on namespace pollution in <sys/vnode.h>.

Fixed a nearby style bug. The include of vcoda.h used angle brackets and
was not used.


120785 05-Oct-2003 jeff

- Check the XLOCK prior to inspecting v_data.


120784 05-Oct-2003 jeff

- Check XLOCK prior to accessing v_data.


120778 05-Oct-2003 jeff

- Don't cache_purge() in cd9660_reclaim. vclean() does it for us so
this is redundant.


120775 05-Oct-2003 jeff

- Don't cache_purge() in *_reclaim routines. vclean() does it for us so
this is redundant.


120770 04-Oct-2003 alc

Synchronize access to a vm page's valid field using the containing
vm object's lock.


120735 04-Oct-2003 jeff

- Make proper use of the mntvnode_mtx. We do not need the loop label
because we do not drop the mntvnode_mtx. If this code had ever executed
and hit the loop condition it would have spun forever.


120733 04-Oct-2003 jeff

- Acquire the vnode interlock prior to dropping the mntvnode_mtx. This does
not eliminate races where the vnode could be reclaimed and end up with
a NULL v_data pointer but Giant is protecting us from that at the moment.


120731 04-Oct-2003 alc

Synchronize access to a page's valid field by using the lock from its
containing object.


120730 04-Oct-2003 jeff

- Remove the backtrace() call from the *_vinvalbuf() functions. Thanks to a
stack trace supplied by phk, I now understand what's going on here. The
check for VI_XLOCK stops us from calling vinvalbuf once the vnode has been
partially torn down in vclean(). It is not clear that this would cause
a problem. Document this in nfs_bio.c, which is where the other two
filesystems copied this code from.


120665 02-Oct-2003 nectar

Introduce a uiomove_frombuf helper routine that handles computing and
validating the offset within a given memory buffer before handing the
real work off to uiomove(9).

Use uiomove_frombuf in procfs to correct several issues with
integer arithmetic that could result in underflows/overflows. As a
side-effect, the code is significantly simplified.

Add additional sanity checks when computing a memory allocation size
in pfs_read.

Submitted by: rwatson (original uiomove_frombuf -- bugs are mine :-)
Reported by: Joost Pol <joost@pine.nl> (integer underflows/overflows)
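
The offset validation described above can be sketched in userspace. This is not the kernel's uiomove_frombuf(); copy_frombuf() and its parameters are hypothetical, and memcpy() stands in for uiomove(9). The sketch only shows the idea of checking the offset against the buffer length before doing any arithmetic with it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace analogue of an "offset into a buffer" copy.
 * Copies up to resid bytes of buf[0..buflen) starting at offset into dst.
 * Stores the number of bytes copied in *copied and returns 0, or returns
 * EINVAL for a negative offset.
 */
static int
copy_frombuf(const void *buf, size_t buflen, long long offset,
    void *dst, size_t resid, size_t *copied)
{
	size_t avail;

	assert(buf != NULL && dst != NULL && copied != NULL);
	*copied = 0;
	if (offset < 0)
		return (EINVAL);
	/* Reading at or past the end is not an error; nothing is copied. */
	if ((unsigned long long)offset >= buflen)
		return (0);
	/* offset < buflen here, so the subtraction cannot underflow. */
	avail = buflen - (size_t)offset;
	if (resid > avail)
		resid = avail;
	memcpy(dst, (const char *)buf + (size_t)offset, resid);
	*copied = resid;
	return (0);
}
```

Centralizing the checks in one helper means each caller no longer repeats its own offset and length arithmetic, which is where underflows and overflows tend to creep in.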


120583 29-Sep-2003 rwatson

Add a new column to the procfs map to hold the name of the mapped
file for vnode mappings. Note that this uses vn_fullpath() and may
be somewhat unreliable, although not too unreliable for shared
libraries. For non-vnode mappings, just print "-" for the field.

Obtained from: TrustedBSD Projects
Sponsored by: DARPA, AFRL, Network Associates Laboratories


120511 27-Sep-2003 phk

forgot to remove static declaration of fdesc_poll()


120509 27-Sep-2003 phk

fdesc_poll() called seltrue() to do the default thing, this is pointlessly
wrong when we have a default in vop_nopoll() which does the right thing.


120498 27-Sep-2003 bde

Fixed some style bugs in previous commit. Mainly, forward-declare
struct msdosfsmount so that this file has the same prerequisites as
it used to. The new prerequisite was a meta-style bug. It required
many style bugs (unsorted includes ...) elsewhere.

Formatted prototypes in KNF. Resisted urge to sort all the prototypes,
to minimise differences with NetBSD. (NetBSD has reformatted the
prototypes but has not sorted them and still uses __P(()).)


120492 26-Sep-2003 fjoe

- Support for multibyte charsets in LIBICONV.
- CD9660_ICONV, NTFS_ICONV and MSDOSFS_ICONV kernel options
(with corresponding modules).
- kiconv(3) for loadable charset conversion tables support.

Submitted by: Ryuichiro Imura <imura@ryu16.org>


120471 26-Sep-2003 tjr

Allow the [, ], and = characters in non-8.3 filenames since they
are allowed by Windows (ref: MS KB article 120138).

XXX From my reading of the CIFS specification, it's not clear that
clients need to validate filenames at all.

PR: 57123
Submitted by: Paul Coucher
MFC after: 1 month


120264 19-Sep-2003 jeff

- Remove interlock protection around VI_XLOCK. The interlock is not
sufficient to guarantee that this race is not hit. The XLOCK will likely
have to be redesigned due to the way reference counting and mutexes work
in FreeBSD. We currently can not be guaranteed that xlock was not set
and cleared while we were blocked on the interlock while waiting to check
for XLOCK. This would lead us to reference a vnode which was not the
vnode we requested.
- Add a backtrace() call inside of INVARIANTS in the hopes of finding out if
this condition is ever hit. It should not, since we should be retaining
a reference to the vnode in these cases. The reference would be sufficient
to block recycling.


120011 13-Sep-2003 tjr

Move an overly verbose message under #ifdef CODA_VERBOSE.


119942 10-Sep-2003 tjr

Move an annoying printf() call that gets triggered every time an
operation is interrupted (with ^C or ^Z) under CODA_VERBOSE.


119832 07-Sep-2003 tjr

Add support for the Coda 6.x venus<->kernel interface. This extends
FIDs to be 128-bits wide and adds support for realms.

Add a new CODA_COMPAT_5 option, which requests support for the old
Coda 5.x interface instead of the new one.

Create a new coda5.ko module that supports the 5.x interface, and make
the existing coda.ko module use the new 6.x interface. These modules
cannot both be loaded at the same time.

Obtained from: Jan Harkes & the coda-6.0.2 distribution,
NetBSD (drochner) (CODA_COMPAT_5 option).


119514 28-Aug-2003 marcel

The valid field in struct vm_page can be of type unsigned long when
32K pages are selected. In spec_getpages() change the printf format
specifier and add an explicit cast so that we always print the field
as a long type.


119318 22-Aug-2003 alc

Use the requested page's object field instead of the vnode's. In some
cases, the vnode's object field is not initialized leading to a NULL
pointer dereference when the object is locked.

Tested by: rwatson


119122 19-Aug-2003 des

Add pfs_visible() checks to pfs_getattr() and pfs_getextattr(). This
also fixes pfs_access() since it relies on VOP_GETATTR() which will call
pfs_getattr(). This prevents jailed processes from discovering the
existence, start time and ownership of processes outside the jail.

PR: kern/48156


119091 18-Aug-2003 jhb

Spell the name of the lock right in addition to getting the type right.

Submitted by: Kim Culhan <kimc@w8hd.org>


119089 18-Aug-2003 jhb

The allproc lock is a sx lock, not a mutex, so fix the assertion. This
asserts that the sx lock is held, but does not specify if the lock is held
shared or exclusive, thus either type of lock satisfies the assertion.


119069 18-Aug-2003 des

Rework pfs_iterate() a bit to eliminate a bug related to process
directories. Previously, pfs_iterate() would return -1 when it
reached the end of the process list while processing a process
directory node, even if the parent directory contained further nodes
(which is the case for the linprocfs root directory, where the process
directory node is actually first in the list). With this patch,
pfs_iterate() will continue to traverse the parent directory's node
list after exhausting the process list (as was the intention all
along). The code should hopefully be easier to read as well.

While I'm here, have pfs_iterate() assert that the allproc lock is
held.


119055 17-Aug-2003 phk

Do not call VOP_BMAP() on our own vnodes.

It is particularly silly when all it does is a minor piece of math.


118907 14-Aug-2003 rwatson

Add p_candebug() check to access a process map file in procfs; limit
access to map information for processes that you wouldn't otherwise
have debug rights on.

Tested by: bms


118837 12-Aug-2003 trhodes

Add a '-M mask' option so that users can have different
masks for files and directories. This should make some
of the Midnight Commander users happy.

Remove an extra ')' in the manual page.

PR: 35699
Submitted by: Eugene Grosbein <eugen@grosbein.pp.ru> (original version)
Tested by: simon


118607 07-Aug-2003 jhb

Consistently use the BSD u_int and u_short instead of the SYSV uint and
ushort. In most of these files, there was a mixture of both styles and
this change just makes them self-consistent.

Requested by: bde (kern_ktrace.c)


118520 06-Aug-2003 phk

Don't drop giant around ->d_strategy(), too much code explodes.


118463 05-Aug-2003 phk

Only drop Giant around the drivers ->d_strategy() if the buffer is not
marked to prevent this.


118047 26-Jul-2003 phk

Add a "int fd" argument to VOP_OPEN() which in the future will
contain the filedescriptor number on opens from userland.

The index is used rather than a "struct file *" since it conveys a bit
more information, which may be useful to, in particular, fdescfs and /dev/fd/*

For now pass -1 all over the place.


118035 26-Jul-2003 tjr

Revise and improve ntfs_subr.c 1.30: read only a single cluster at a time
in ntfs_writentvattr_plain and ntfs_readntvattr_plain, and purge the boot
block from the buffer cache if it isn't exactly one cluster long. These two
changes work around the same buffer cache bug that ntfs_subr.c 1.30 tried
to, but in a different way. This may decrease throughput by reading smaller
amounts of data from the disk at a time, but may increase it by avoiding
bogus writes of clean buffers.
Problem (re)reported by Karel J. Bosschaart on -current.


117949 24-Jul-2003 peter

size_t != int. Make this compile on 64 bit platforms (eg: amd64).
Also, "u_short value; if (value > 0xffff)" can never be true.
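
The second point above, that a range check on an already-narrowed 16-bit value can never fire, can be shown with a small sketch. store_len16() is a hypothetical helper, not code from the changed file; it performs the check on the wide value before narrowing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: store a length into a 16-bit field only if it fits.
 * The range check is done on the wide value, before narrowing; checking
 * the uint16_t afterwards would always pass.
 */
static bool
store_len16(size_t len, uint16_t *out)
{
	assert(out != NULL);
	if (len > UINT16_MAX)
		return (false);
	*out = (uint16_t)len;
	return (true);
}
```

Keeping the length in a size_t until the check has passed also avoids the size_t/int mismatch that breaks builds on 64-bit platforms.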


117200 03-Jul-2003 trhodes

If bread() returns a zero-length buffer, as can happen after a
failed write, return an error instead of looping forever.

PR: 37035
Submitted by: das


117018 29-Jun-2003 tjr

XXX Copy workaround from UFS: open device for write access even if
the user requests a read-only mount. This is necessary because we
don't do the VOP_OPEN again if they upgrade a read-only mount to
read-write.

Fixes lockup when creating files on msdosfs mounts that have been
mounted read-only then upgraded to read-write. The exact cause of
the lockup is not known, but it is likely to be the kernel getting
stuck in an infinite loop trying to write dirty buffers to a device
without write permission.

Reported/tested by andreas, discussed with phk.


116917 27-Jun-2003 trhodes

Fix a bug where a truncate operation involving truncate() or ftruncate() on
an MSDOSFS file system either failed, silently corrupted the file, or
sometimes corrupted the neighboring file.

PR: 53695
Submitted by: Ariff Abdullah <skywizard@MyBSD.org.my> (original version)
MFC: 3 days


116796 24-Jun-2003 jmg

change dev_t to struct cdev * to match ufs. This fixes fstat for cd9660
and msdosfs.

Reviewed by: bde


116678 22-Jun-2003 phk

Add a f_vnode field to struct file.

Several of the subtypes have an associated vnode which is used for
stuff like the f*() functions.

By giving the vnode a separate field, a number of checks for the specific
subtype can be replaced simply with a check for f_vnode != NULL, and
we can later free f_data up to subtype specific use.

At this point in time, f_data still points to the vnode, so any code I
might have overlooked will still work.


116639 20-Jun-2003 jmg

fix grammar in comment


116620 20-Jun-2003 tjr

Merge from NetBSD src/sys/ntfs/ntfs_subr.c 1.5 & 1.30 (jdolecek):
- Avoid calling bread() with different sizes on the same blkno.
Although the buffer cache is designed to handle differing size
buffers, it erroneously tries to write the incorrectly-sized buffer
buffer back to disk before reading the correctly-sized one, even
when it's not dirty. This behaviour caused a panic for read-only
NTFS mounts when INVARIANTS was enabled ("bundirty: buffer x still
on queue y"), reported by NAKAJI Hiroyuki.
- Fix a bug in the code handling holes: a variable was incremented
instead of decremented, which could cause an infinite loop.


116583 19-Jun-2003 alc

Lock the vm object when freeing a vm page.


116561 19-Jun-2003 alc

Lock the vm object when freeing a vm page.


116560 19-Jun-2003 alc

Lock the vm object when freeing a vm page.


116486 17-Jun-2003 tjr

Send the close request to the SMB server in smbfs_inactive(), instead of
smbfs_close(). This fixes paging to and from mmap()'d regions of smbfs
files after the descriptor has been closed, and makes thttpd, GNU ld,
and perhaps more things work that depend on being able to do this.

PR: 48291


116472 17-Jun-2003 tjr

Set f_mntfromname[] to "fdescfs" instead of "fdesc" for consistency
with other synthetic filesystems, which have f_mntfromname the same
as f_fstypename. Noticed by Sean Kelly on -current.


116469 17-Jun-2003 tjr

MFp4: Fix two bugs causing possible deadlocks or panics, and one nit:
- Emulate lock draining (LK_DRAIN) in null_lock() to avoid deadlocks
when the vnode is being recycled.
- Don't allow null_nodeget() to return a nullfs vnode from the wrong
mount when multiple nullfs's are mounted. It's unclear why these checks
were removed in null_subr.c 1.35, but they are definitely necessary.
Without the checks, trying to unmount a nullfs mount will erroneously
return EBUSY, and forcibly unmounting with -f will cause a panic.
- Bump LOG2_SIZEVNODE up to 8, since vnodes are >256 bytes now. The old
value (7) didn't cause any problems, but made the hash algorithm
suboptimal.

These changes fix nullfs enough that a parallel buildworld succeeds.

Submitted by: tegge (partially; LK_DRAIN)
Tested by: kris


116447 16-Jun-2003 truckman

Partially back out rev 1.87 by nuking fifo_inactive() and moving the
resource deallocation back to fifo_close(). This eliminates any
stale data that might be stuck in the socket buffers after all the
readers and writers have closed the fifo.

Tested by: Thorsten Schroeder <ths@katjusha.de>


116418 15-Jun-2003 phk

In specfs::vop_specstrategy(), assert that the vnode and buffer agree about
the device.


116416 15-Jun-2003 phk

I have not had any reports of trouble for a long time, so remove the
gentle versions of the vop_strategy()/vop_specstrategy() mismatch methods
and use vop_panic() instead.


116414 15-Jun-2003 phk

Take 2: Remove _both_ KASSERTS.


116413 15-Jun-2003 phk

Duh! I misread my handwritten notes: We do _not_ want to assert that
vp == bp->b_vp in specfs, that was the entire point of VOP_SPECSTRATEGY().


116412 15-Jun-2003 phk

Add the same KASSERT to all VOP_STRATEGY and VOP_SPECSTRATEGY implementations
to check that the buffer points to the correct vnode.


116410 15-Jun-2003 phk

Remove in toto coda_strategy which incorrectly implemented vop_panic();


116366 15-Jun-2003 das

Fix some style problems, some of which are old, some new, and some
inherited from UFS.

Requested by: bde, njl


116361 15-Jun-2003 davidxu

Rename P_THREADED to P_SA. P_SA means a process is using scheduler
activations.


116358 14-Jun-2003 das

If someone tries to mount a union filesystem with another unionfs as
the upper layer, fail gracefully instead of panicking.

MFC after: 3 days


116357 14-Jun-2003 das

Introduce malloc types M_UNDCACHE and M_UNPATH for important
unionfs-related data structures to aid in debugging memory leaks.
Use NULL and NULLVP instead of 0 as appropriate.

MFC after: 3 days


116356 14-Jun-2003 das

Factor out the process of freeing ``directory caches'', which unionfs
directory vnodes use to refer to their constituent vnodes, into
union_dircache_free(). Also s/union_dircache/union_dircache_get/ and
tweak the structure of union_dircache_r().

MFC after: 3 days


116338 14-Jun-2003 tjr

Don't follow smbnode n_parent pointer when NREFPARENT flag is not set
in smb_fphelp(): the parent vnode may have already been recycled
since we don't hold a reference to it. Fixes a panic when rebooting
with mdconfig -t vnode devices referring to vnodes on a smbfs mount.


116290 13-Jun-2003 das

Plug a serious memory leak. The -STABLE equivalent of this patch has
been tested extensively, but -CURRENT testing has been hampered by a
number of panics that also occur without the patch. Since the
destabilizing changes between 4.X and 5.X are external to unionfs,
I believe this patch applies equally well to both.

Thanks to scrappy for assistance testing these and other changes.

MFC after: 4 days


116281 13-Jun-2003 truckman

Clean up the fifo_open() implementation:

Restructure the error handling portion of the resource allocation
code to eliminate duplicated code.

Test for the O_NONBLOCK && fi_readers == 0 case before incrementing
fi_writers and modifying the socket flag to avoid having to
undo these operations in this error case.

Restructure and simplify the code that handles blocking opens.

There should be no change to functionality.


116271 12-Jun-2003 phk

Initialize struct vfsops C99-sparsely.

Submitted by: hmp
Reviewed by: phk
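
"C99-sparsely" refers to designated initializers, where only the members that matter are named. A minimal sketch, assuming a made-up operations table: struct fs_ops and the demo_* functions are hypothetical and do not reflect the real struct vfsops layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operations table, loosely shaped like a vfsops vector. */
struct fs_ops {
	int	(*mount)(const char *path);
	int	(*unmount)(const char *path);
	int	(*sync)(void);
	int	(*statfs)(const char *path);
};

static int
demo_mount(const char *path)
{
	assert(path != NULL);
	return (0);
}

static int
demo_sync(void)
{
	return (0);
}

/* Only the named members are set; the rest are implicitly NULL. */
static const struct fs_ops demo_ops = {
	.mount =	demo_mount,
	.sync =		demo_sync,
};
```

Because members are matched by name rather than position, adding or reordering members in the structure does not silently shift every initializer.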


116181 11-Jun-2003 obrien

Use __FBSDID().


116173 10-Jun-2003 obrien

Use __FBSDID().


115609 01-Jun-2003 truckman

Don't unlock the parent directory vnode twice if the ISDOTDOT flag
is set.


115602 01-Jun-2003 truckman

Fix up locking problems in fifo_open() and fifo_close():

Sleep on the vnode interlock while waiting for another
caller to increment fi_readers or fi_writers. Hold the
vnode interlock while incrementing fi_readers or fi_writers
to prevent a wakeup from being missed.

Only access fi_readers and fi_writers while holding the vnode
lock. Previously fifo_close() decremented their values without
holding a lock.

Move resource deallocation from fifo_close() to fifo_inactive(),
which allows the VOP_CLOSE() call in the error return path in
fifo_open() to be removed. Fifo_open() was calling VOP_CLOSE()
with the vnode lock held, in violation of the current vnode locking
API. Also the way fifo_close() used vrefcnt() to decide whether
to deallocate resources was bogus according to comments in the
vrefcnt() implementation.

Reviewed by: bde


115549 31-May-2003 phk

Remove unused variable(s).

Found by: FlexeLint


115542 31-May-2003 phk

Remove unused variable(s).

Found by: FlexeLint


115511 31-May-2003 phk

Remove unused variable.

Found by: FlexeLint


115486 31-May-2003 phk

Use temporary variable to avoid double expansion of macro with side effects.

Found by: FlexeLint
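
The class of bug fixed above can be sketched with a made-up macro. SQUARE(), next_value() and square_next() are hypothetical and unrelated to the actual macro in the changed file; the sketch assumes only that the macro expands its argument more than once.

```c
#include <assert.h>

/* Hypothetical macro that expands its argument twice. */
#define	SQUARE(x)	((x) * (x))

static int calls;	/* counts how often next_value() runs */

/* A function with a side effect: each call bumps the counter. */
static int
next_value(void)
{
	return (++calls);
}

/* Evaluate the side-effecting expression once, then hand it to the macro. */
static int
square_next(void)
{
	int tmp;

	tmp = next_value();
	assert(tmp > 0);
	return (SQUARE(tmp));
}
```

Writing SQUARE(next_value()) instead would call next_value() twice per use, so the counter would advance by two and the result would be the product of two different values.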


115485 31-May-2003 phk

Remove unused variable.

Found by: FlexeLint


114734 05-May-2003 rwatson

Clean up proc locking in procfs: make sure the proc lock is held before
entering sys_process.c debugging primitives, or we violate assertions.
Also, be more careful about releasing the process lock around calls
to uiomove() which may sleep waiting for paging machinations or
related notions. We may want to defer the uiomove() in at least
one case, but jhb will look into that at a later date.

Reported by: Philippe Charnier <charnier@xp11.frmug.org>
Reviewed by: jhb


114653 04-May-2003 scottl

Eliminate the separate malloc type for the sparing table.


114652 04-May-2003 scottl

Add a missing __inline. Strange that gcc never complained about it.
Implement udf_readlblks() in terms of RDSECTOR.


114651 04-May-2003 scottl

Correctly calculate the size of the extent that should be read in
udf_readatoffset(). This should fix problems with reading udf filesystems
created with mkisofs.


114632 04-May-2003 scottl

Implement the node cache as a hash table.


114434 01-May-2003 des

Instead of recording the Unix time in a process when it starts, record the
uptime. Where necessary, convert it back to Unix time by adding boottime
to it. This fixes a potential problem in the accounting code, which would
compute the elapsed time incorrectly if the Unix time was stepped during
the lifetime of the process.
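
The arithmetic behind this change can be sketched as follows. struct proc_times and both functions are hypothetical, and times are reduced to whole seconds for brevity; the kernel's own fields and time types are not reproduced here.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical record: start time kept as seconds since boot. */
struct proc_times {
	time_t	start_uptime;
};

/* Elapsed run time; unaffected by steps of the wall clock. */
static time_t
proc_elapsed(const struct proc_times *pt, time_t now_uptime)
{
	assert(pt != NULL && now_uptime >= pt->start_uptime);
	return (now_uptime - pt->start_uptime);
}

/* Wall-clock start time, derived on demand from the boot time. */
static time_t
proc_start_walltime(const struct proc_times *pt, time_t boottime)
{
	assert(pt != NULL);
	return (boottime + pt->start_uptime);
}
```

Both operands of the elapsed-time subtraction come from the same monotonic uptime clock, so stepping the Unix time in between cannot distort the difference.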


114216 29-Apr-2003 kan

Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on: standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>


113979 24-Apr-2003 jhb

Fail to mount a device if the bytes per sector in the BPB is less than
DEV_BSIZE or if the number of FAT sectors is zero.


113867 22-Apr-2003 jhb

- Always call faultin() in _PHOLD() if PS_INMEM is clear. This closes a
race where a thread could assume that a process was swapped in by
PHOLD() when it actually wasn't fully swapped in yet.
- In faultin(), always msleep() if PS_SWAPPINGIN is set instead of doing
this check after bumping p_lock in the PS_INMEM == 0 case. Also,
sched_lock is only needed for setting and clearing swapping PS_*
flags and the swap thread inhibitor.
- Don't set and clear the thread swap inhibitor in the same loops as the
pmap_swapin/out_thread() since we have to do it under sched_lock.
Instead, mimic the treatment of the PS_INMEM flag and use separate loops
to set the inhibitors when clearing PS_INMEM and clear the inhibitors
when setting PS_INMEM.
- swapout() now returns with the proc lock held as it holds the lock
while adjusting the swapping-related PS_* flags so that the proc lock
can be used to test those flags.
- Only use the proc lock to check the swapping-related PS_* flags in
several places.
- faultin() no longer requires sched_lock to be held by callers.
- Rename PS_SWAPPING to PS_SWAPPINGOUT to be less ambiguous now that we
have PS_SWAPPINGIN.


113620 17-Apr-2003 jhb

- Use a local variable to close a minor race when determining if the wmesg
printed out needs a prefix such as when a thread is blocked on a lock.
- Use another local variable to close another race for the td_wmesg and
td_wchan members of struct thread.


113619 17-Apr-2003 jhb

Protect p_flag with the proc lock. The sched_lock is not needed to turn
off P_STOPPED_SIG in p_flag.


113618 17-Apr-2003 jhb

- P_SHOULDSTOP just needs proc lock now, so don't acquire sched_lock unless
it is needed.
- Add a proc lock assertion.


113617 17-Apr-2003 jhb

Add a proc lock assertion and move another assertion up to the top of the
function.


113310 10-Apr-2003 imp

It appears that msdosfs_init() is called multiple times. This happens
on my system where I preload msdosfs and have it in my kernel.
There's likely another bug that's causing msdosfs_init() to be called
multiple times, but this makes that harmless.


112934 01-Apr-2003 jeff

- smb_td_intr takes a thread as an argument not a proc.


112933 01-Apr-2003 jeff

- smb_proc_intr is now spelled smb_td_intr.

Noticed by: phk
Pointy hat to: jeffr


112916 01-Apr-2003 tjr

Specify the M_WAITOK flag explicitly in the MALLOC call to silence a
runtime warning ("Bad malloc flags: 0").


112915 01-Apr-2003 tjr

Give the M_WAITOK flag explicitly to the MALLOC call to silence a runtime
warning ("Bad malloc flags: 0").


112888 31-Mar-2003 jeff

- Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with
a follow on commit to kern_sig.c
- signotify() now operates on a thread since unmasked pending signals are
stored in the thread.
- PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.


112706 27-Mar-2003 tjr

Deregister the dev_clone event handler we registered - don't touch the
handlers installed by other devices.


112564 24-Mar-2003 jhb

Replace the at_fork, at_exec, and at_exit functions with the slightly more
flexible process_fork, process_exec, and process_exit eventhandlers. This
reduces code duplication and also means that I don't have to go duplicate
the eventhandler locking three more times for each of at_fork, at_exec, and
at_exit.

Reviewed by: phk, jake, almost complete silence on arch@


112529 24-Mar-2003 bde

Better fix for the problem addressed by rev.1.79: don't loop in
fifo_open() waiting for another reader or writer if one arrived and
departed while we were waiting (or a little earlier).

Rev.1.79 broke blocking opens of fifos by making them time out after 1
second. This was bad for at least apsfilter.

Tested by: "Simon 'corecode' Schubert" <corecode@corecode.ath.cx>,
Alexander Leidinger <Alexander@leidinger.net>,
phk
MFC after: 4 weeks


112317 16-Mar-2003 tjr

Make udf_allocv() return an unlocked vnode instead of a locked one
to avoid a "locking against myself" panic when udf_hashins() tries
to lock it again. Lock the vnode in udf_hashins() before adding it to
the hash bucket.


112183 13-Mar-2003 jeff

- Add a lock for protecting against msleep(bp, ...) wakeup(bp) races.
- Create a new function bdone() which sets B_DONE and calls wakeup(bp). This
is suitable for use as b_iodone for buf consumers who are not going
through the buf cache.
- Create a new function bwait() which waits for the buf to be done at a set
priority and with a specific wmesg.
- Replace several cases where the above functionality was implemented
without locking with the new functions.


112119 11-Mar-2003 kan

Rename vfs_stdsync function to vfs_stdnosync which matches more
closely what function is really doing. Update all existing consumers
to use the new name.

Introduce a new vfs_stdsync function, which iterates over a mount
point's vnodes and calls FSYNC on each one of them in turn.

Make nwfs and smbfs use this new function instead of rolling their
own identical sync implementations.

Reviewed by: jeff


111960 07-Mar-2003 tjr

Set f_fstypename in coda_nb_statfs().


111945 06-Mar-2003 tjr

Add a temporary workaround for a deadlock in Coda venus 5.3.19 that
occurs when mounting the filesystem. The problem is that venus issues
the mount() syscall, which calls vfs_mount(), which calls coda_root()
which attempts to communicate with venus.


111944 06-Mar-2003 tjr

Remove fragments of support for the FreeBSD 3.x and 4.x branches.


111931 05-Mar-2003 tjr

VOP_PATHCONF returns a register_t, not an int. Noticed by phk.


111908 05-Mar-2003 tjr

Add prototype for coda_pathconf() that I missed in the previous commit.


111903 05-Mar-2003 tjr

Add a minimal implementation of VOP_PATHCONF to silence warning
messages from ls(1).


111902 05-Mar-2003 tjr

Handle the case where a_uio->uio_td == NULL properly in coda_readlink().
This happens when called from lookup().


111856 04-Mar-2003 jeff

- Add a new 'flags' parameter to getblk().
- Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT
flag to the initial BUF_LOCK(). This will eventually be used in cases
where we want to use a buffer only if it is not currently in use.
- Convert all consumers of the getblk() api to use this extra parameter.

Reviewed by: arch
Not objected to by: mckusick


111841 03-Mar-2003 njl

Finish cleanup of vprint() which was begun with changing v_tag to a string.
Remove extraneous uses of vop_null, instead deferring to the default op.
Rename vnode type "vfs" to the more descriptive "syncer".
Fix formatting for various filesystems that use vop_print.


111821 03-Mar-2003 phk

Make nokqfilter() return the correct return value.

Ditch the D_KQFILTER flag which was used to prevent calling NULL pointers.


111815 03-Mar-2003 phk

Gigacommit to improve device-driver source compatibility between
branches:

Initialize struct cdevsw using C99 sparse initialization and remove
all initializations to default values.

This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.

Approved by: re(scottl)
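
The C99 "sparse" (designated) initialization described in this commit can be sketched as follows. This is only an illustration: `struct demo_devsw`, `demo_open` and `demo_sw` are hypothetical names, not the real `struct cdevsw` or any driver in the tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver switch table; NOT the real struct cdevsw. */
typedef int demo_open_t(int unit);
typedef int demo_close_t(int unit);

struct demo_devsw {
	const char	*d_name;
	demo_open_t	*d_open;
	demo_close_t	*d_close;
	int		 d_flags;
};

static int
demo_open(int unit)
{
	return (unit >= 0 ? 0 : -1);
}

/*
 * Only the non-default members are named. Members that are left out
 * (d_close, d_flags) are zero-initialized, and the initializer keeps
 * working if the members of the struct are reordered.
 */
static struct demo_devsw demo_sw = {
	.d_open = demo_open,
	.d_name = "demo",
};
```

Because each member is named, the initializer does not depend on declaration order, which is presumably why compiling with the fields in reverse order was a meaningful test.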


111769 02-Mar-2003 des

Get rid of caddr_t.


111748 02-Mar-2003 des

More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9).


111742 02-Mar-2003 des

Clean up whitespace, s/register //, refrain from strong urge to ANSIfy.


111741 02-Mar-2003 des

uiomove-related caddr_t -> void * (just the low-hanging fruit)


111738 02-Mar-2003 des

wakeup(9) and msleep(9) take void * arguments, not caddr_t.


111730 02-Mar-2003 phk

NODEVFS cleanup:

Replace devfs_{create,destroy} hooks with direct function calls.


111611 27-Feb-2003 tjr

Copy some VM changes from smbfs_putpages() to nwfs_putpages(): lock
page queues, use vm_page_undirty().


111603 27-Feb-2003 tjr

Fix vnode corruption bug when trying to rename files across filesystems.
Similar to the bug fixed in smbfs_vnops.c rev 1.33.


111601 27-Feb-2003 tjr

Sync nwfs_access() with smbfs_access(): use vaccess() instead of checking
permissions ourself, fixes problem with VAPPEND.


111597 27-Feb-2003 tjr

Catch up with recent netncp changes: ncp_chkintr() takes a thread, not
a proc, as its second argument.


111585 27-Feb-2003 julian

Change the process flags P_KSES to be P_THREADED.
This is just a cosmetic change but I've been meaning to do it for about a year.


111573 26-Feb-2003 phk

msg


111127 19-Feb-2003 tjr

Do not call smbfs_attr_cacheremove() in the EXDEV case in smbfs_rename().
One of the vnodes is on different mount and is possibly on a different
kind of filesystem; treating it as an smbfs vnode then writing to it
will probably corrupt it.

PR: 48381
MFC after: 1 month


111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


110700 11-Feb-2003 phk

Use the SI_CANDELETE flag on the dev_t rather than the D_CANFREE flag
on the cdevsw to determine ability to handle the BIO_DELETE request.


110584 09-Feb-2003 jeff

- Cleanup unlocked accesses to buf flags by introducing a new b_vflag member
that is protected by the vnode lock.
- Move B_SCANNED into b_vflags and call it BV_SCANNED.
- Create a vop_stdfsync() modeled after spec's sync.
- Replace spec_fsync, msdos_fsync, and hpfs_fsync with the stdfsync and some
fs specific processing. This gives all of these filesystems proper
behavior wrt MNT_WAIT/NOWAIT and the use of the B_SCANNED flag.
- Annotate the locking in buf.h


110533 08-Feb-2003 tjr

Revert removal of vnode and VFS stubs; bp asserts that they are needed.


110501 07-Feb-2003 tjr

Garbage-collect stub VFS ops, use the defaults instead.


110500 07-Feb-2003 tjr

Garbage-collect stub vnode ops, use the defaults instead.


110314 04-Feb-2003 tjr

Add missing permission checks to the smbfs VOP_SETATTR vnode op for the
case where the caller requests to change access or modification times.

MFC after: 3 days


110299 03-Feb-2003 phk

Split the global timezone structure into two integer fields to
prevent the compiler from optimizing assignments into byte-copy
operations which might make access to the individual fields non-atomic.

Use the individual fields throughout, and don't bother locking them with
Giant: it is no longer needed.

Inspired by: tjr


110272 03-Feb-2003 tjr

Use vaccess() instead of rolling our own access checks. This fixes a bug
where requests to open a file in append mode were always denied, and
will also be useful when capabilities and auditing are implemented.


110063 29-Jan-2003 phk

NODEVFS cleanup: remove #ifdefs.


110043 29-Jan-2003 tjr

Escape the backslash in badchars so that smbfs_pathcheck() correctly
rejects pathnames with backslashes in them (and to avoid a syntax error).

Found by: FlexeLint
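
A minimal sketch of the escaping issue, assuming a made-up character set: `demo_badchars` and `demo_pathcheck` are hypothetical and do not reproduce the actual smbfs_pathcheck() code.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical set of rejected characters. Inside a C string literal a
 * backslash must be written as "\\"; a lone backslash before the closing
 * quote would escape the quote instead of naming the character.
 */
static const char demo_badchars[] = "*/:<>;?\\";

/* Return 1 if name contains none of the rejected characters, else 0. */
static int
demo_pathcheck(const char *name)
{
	return (strpbrk(name, demo_badchars) == NULL);
}
```

With the backslash escaped, a name such as `dir\file` is rejected while `file.txt` is accepted.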


109969 28-Jan-2003 tjr

Do not allow a cached vnode to be shared among multiple mounts of the same
kind of pseudofs-based filesystem. Fixes (at least) one problem where
when procfs is mounted multiple times, trying to unmount one will often
cause the wrong one to get unmounted, and another problem where mounting
one procfs on top of another caused the kernel to lock up.

Reviewed by: des


109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


109608 21-Jan-2003 rwatson

GC an unused reference to vop_refreshlabel_desc; reference to
opt_mac.h was removed previously so it was never compiled in.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


109526 19-Jan-2003 phk

Originally when DEVFS was added, a global variable "devfs_present"
was used to control code which was conditional on DEVFS' presence
since this avoided the need for large-scale source pollution with
#include "opt_geom.h"

Now that we approach making DEVFS standard, replace these tests
with an #ifdef to facilitate mechanical removal once DEVFS becomes
non-optional.

No functional change by this commit.


109450 18-Jan-2003 tjr

Fake up a struct componentname to pass to VOP_WHITEOUT instead of passing
NULL. union_whiteout() expects the componentname argument to be non-NULL.
Fixes a NULL dereference panic when an existing union mount becomes the
upper layer of a new union mount.


109202 13-Jan-2003 phk

Even if the permissions deny it, a process should be allowed to
access its controlling terminal.

In essence, history dictates that any process is allowed to open
/dev/tty for RW, irrespective of credential, because by definition
it is its own controlling terminal.

Before DEVFS we relied on a hacky half-device thing (kern/tty_tty.c)
which did the magic deep down at device level, which at best was
disgusting from an architectural point of view.

My first shot at this was to use the cloning mechanism to simply
give people the right tty when they ask for /dev/tty, that's why
you get this, slightly counter intuitive result:

syv# ls -l /dev/tty `tty`
crw--w---- 1 u1 tty 5, 0 Jan 13 22:14 /dev/tty
crw--w---- 1 u1 tty 5, 0 Jan 13 22:14 /dev/ttyp0

Trouble is, when user u1 su(1)'s to user u2, he cannot open
/dev/ttyp0 anymore because he doesn't have permission to do so.

The above fix allows him to do that.

The interesting side effect is that one was previously only able
to access the controlling tty by indirection:
date > /dev/tty
but not by name:
date > `tty`

This is now possible, and that feels a lot more like DTRT.

PR: 46635
MFC candidate: could be.


109153 13-Jan-2003 dillon

Bow to the whining masses and change a union back into void *. Retain
removal of unnecessary casts and throw in some minor cleanups to see if
anyone complains, just for the hell of it.


109123 12-Jan-2003 dillon

Change struct file f_data to un_data, a union of the correct struct
pointer types, and remove a huge number of casts from code using it.

Change struct xfile xf_data to xun_data (ABI is still compatible).

If we need to add a #define for f_data and xf_data we can, but I don't
think it will be necessary. There are no operational changes in this
commit.


109090 11-Jan-2003 dd

Add symlink support to devfs_rule_matchpath(). This allows the user
to unhide symlinks as well as hide them.


108716 05-Jan-2003 phk

Don't override the vop_lock, vop_unlock and vop_isunlocked methods.

Previously all filesystems which relied on specfs to do devices
would have private overrides for vop_std*, so the vop_no* overrides
here had no effect. I overlooked the transitive nature of the vop
vectors when I removed the vop_std* in those filesystems.

Removing the override here restores device node locking to its
previous modus operandi.

Spotted by: bde


108707 05-Jan-2003 phk

Don't take the detour over VOP_STRATEGY from spec_getpages, call our
own strategy directly.


108706 05-Jan-2003 phk

Split out the vnode and buf arguments to the internal strategy worker
routine instead of doing evil casts.


108692 05-Jan-2003 tjr

Repair vnode locking in portal_lookup(). Specifically, lock the file
vnode, and unlock the parent directory vnode if LOCKPARENT is not set.

Obtained from: NetBSD (rev. 1.34)


108686 04-Jan-2003 phk

Temporarily introduce a new VOP_SPECSTRATEGY operation while I try
to sort out disk-io from file-io in the vm/buffer/filesystem space.

The intent is to sort VOP_STRATEGY calls into those which operate
on "real" vnodes and those which operate on VCHR vnodes. For
the latter kind, the call will be changed to VOP_SPECSTRATEGY,
possibly conditionally for those places where dual-use happens.

Add a default VOP_SPECSTRATEGY method which will call the normal
VOP_STRATEGY. First time it is called it will print debugging
information. This will only happen if a normal vnode is passed
to VOP_SPECSTRATEGY by mistake.

Add a real VOP_SPECSTRATEGY in specfs, which does what VOP_STRATEGY
does on a VCHR vnode today.

Add a new VOP_STRATEGY method in specfs to catch instances where
the conversion to VOP_SPECSTRATEGY has not yet happened. Handle
the request just like we always did, but first time called print
debugging information.

Apart from up to two instances of console messages per boot, this amounts
to a glorified no-op commit.

If you get any of the messages on your console I would very much
like a copy of them mailed to phk@freebsd.org


108681 04-Jan-2003 phk

resort vnode ops list


108658 04-Jan-2003 phk

Replace spec_bmap() with vop_panic: We should never BMAP a device backed
vnode, only filesystem backed vnodes.


108648 04-Jan-2003 phk

Since Jeffr made the std* functions the default in rev 1.63 of
kern/vfs_defaults.c it is wrong for the individual filesystems to use
the std* functions as that prevents override of the default.

Found by: src/tools/tools/vop_table


108589 03-Jan-2003 phk

Convert calls to BUF_STRATEGY to VOP_STRATEGY calls. This is a no-op since
all BUF_STRATEGY did in the first place was call VOP_STRATEGY.


108586 03-Jan-2003 phk

Remove unused second argument from DEV_STRATEGY().


108470 30-Dec-2002 schweikh

Fix typos, mostly s/ an / a / where appropriate and a few s/an/and/
Add FreeBSD Id tag where missing.


108387 29-Dec-2002 phk

There is some sort of race/deadlock which I have not identified
here. It manifests itself by sendmail hanging in "fifoow" during
boot on a diskless machine with sendmail disabled.

Giving the sleep a 1sec timeout breaks the deadlock, but does not solve
the underlying problem.

XXX comment applied.


108357 28-Dec-2002 dillon

Abstract-out the constants for the sequential heuristic.

No operational changes.

MFC after: 1 day


108341 28-Dec-2002 rwatson

Trim left-over and unused vop_refreshlabel() bits from devfs.

Reported by: bde


107890 15-Dec-2002 tjr

Remove redundant check for negative or zero v_usecount; vrele() already
checks that.


107842 13-Dec-2002 tjr

Keep trying to flush the vnode list for the mount while some are still
busy and we are making progress towards making them not busy. This is
needed because smbfs vnodes reference their parent directory but may
appear after their parent in the mount's vnode list; one pass over the
list is not sufficient in this case.

This stops attempts to unmount idle smbfs mounts failing with EBUSY.


107822 13-Dec-2002 tjr

Fix build with SMB_VNODE_DEBUG defined; use td_proc->p_pid instead of
the nonexistent td_pid.


107821 13-Dec-2002 tjr

Store a reference to the parent directory's vnode in struct smbnode,
not to the parent's smbnode, which may be freed during the lifetime
of the child if the mount is forcibly unmounted. umount -f should now
work properly (ie. not panic) on smbfs mounts.


107698 09-Dec-2002 rwatson

Remove dm_root entry from struct devfs_mount. It's never set, and is
unused. Replace it with a dm_mount back-pointer to the struct mount
that the devfs_mount is associated with. Export that pointer to MAC
Framework entry points, where all current policies don't use the
pointer. This permits the SEBSD port of SELinux's FLASK/TE to compile
out-of-the-box on 5.0-CURRENT with full file system labeling support.

Approved by: re (murray)
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


106696 09-Nov-2002 alfred

Fix instances of macros with improperly parenthesized arguments.

Verified by: md5
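
For readers unfamiliar with this class of bug, here is a small sketch of what an unparenthesized macro argument can do. The macros are hypothetical and are not taken from the commit.

```c
#include <assert.h>

/* Hypothetical macros for illustration; not taken from the commit. */
#define	DOUBLE_BAD(x)	(x * 2)		/* argument not parenthesized */
#define	DOUBLE_GOOD(x)	((x) * 2)	/* argument parenthesized */

static int
double_bad(int a, int b)
{
	/* Expands to (a + b * 2): only b is doubled. */
	return (DOUBLE_BAD(a + b));
}

static int
double_good(int a, int b)
{
	/* Expands to ((a + b) * 2), as intended. */
	return (DOUBLE_GOOD(a + b));
}
```

When every existing caller passes a simple expression, the two forms generate the same code, so a fix of this kind can leave the compiled objects unchanged.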


106595 07-Nov-2002 jhb

Cast a pointer to a uintptr_t to quiet a warning.


106594 07-Nov-2002 jhb

Third argument to copyinstr() is a pointer to a size_t, not a pointer to a
u_int.


106402 04-Nov-2002 mckusick

Add debug.doslowdown to enable/disable niced slowdown on I/O. Default
to off until locking interference issues get sorted out.

Sponsored by: DARPA & NAI Labs.


106355 02-Nov-2002 peter

Unbreak MNT_UPDATE when running with cd as root. Detect mountroot by
checking for "path == NULL" (like ffs) rather than MNT_ROOT. Otherwise
when you try and do an update or mountd does an NFS export, the remount
fails because the code tries to mount a fresh rootfs and gets an EBUSY.
The same bug is in 4.x (which is where I found it).

Sanity check by: mux


106298 01-Nov-2002 phk

Put a KASSERT in specfs::strategy() to check that the incoming buffer
has a valid b_iocmd. Valid is any one of BIO_{READ,WRITE,DELETE}.

I have seen at least one case where the bio_cmd field was zero once the
request made it into GEOM. Putting the KASSERT here allows us to spot
the culprit in the backtrace.


106110 29-Oct-2002 semenu

Fix winChkName() to match when the last slot contains nothing but the
terminating zero (it was treated as length mismatch). The mtools create
such slots if the name len is a multiple of 13 (max number of unicode
chars fitting in directory slot).

MFC after: 1 week


105998 26-Oct-2002 mux

In VOP_LOOKUP, don't deny DELETE and RENAME operations
when ISLASTCN is not set. The actual file which is being
looked up may live in a different filesystem.


105988 26-Oct-2002 rwatson

Slightly change the semantics of vnode labels for MAC: rather than
"refreshing" the label on the vnode before use, just get the label
right from inception. For single-label file systems, set the label
in the generic VFS getnewvnode() code; for multi-label file systems,
leave the labeling up to the file system. With UFS1/2, this means
reading the extended attribute during vfs_vget() as the inode is
pulled off disk, rather than hitting the extended attributes
frequently during operations later, improving performance. This
also corrects semantics for shared vnode locks, which were not
previously present in the system. This changes the cache
coherency properties WRT out-of-band access to label data, but in
an acceptable form. With UFS1, there is a small race condition
during automatic extended attribute start -- this is not present
with UFS2, and occurs because EAs aren't available at vnode
inception. We'll introduce a work around for this shortly.

Approved by: re
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


105902 25-Oct-2002 mckusick

Within ufs, the ffs_sync and ffs_fsync functions did not always
check for and/or report I/O errors. The result is that a VFS_SYNC
or VOP_FSYNC called with MNT_WAIT could loop infinitely on ufs in
the presence of a hard error writing a disk sector or in a filesystem
full condition. This patch ensures that I/O errors will always be
checked and returned. This patch also ensures that every call to
VFS_SYNC or VOP_FSYNC with MNT_WAIT set checks for and takes
appropriate action when an error is returned.

Sponsored by: DARPA & NAI Labs.


105667 22-Oct-2002 mckusick

This checkin reimplements the io-request priority hack in a way
that works in the new threaded kernel. It was commented out of
the disksort routine earlier this year for the reasons given in
kern/subr_disklabel.c (which is where this code used to reside
before it moved to kern/subr_disk.c):

----------------------------
revision 1.65
date: 2002/04/22 06:53:20; author: phk; state: Exp; lines: +5 -0
Comment out Kirks io-request priority hack until we can do this in a
civilized way which doesn't cause grief.

The problem is that it is not generally safe to cast a "struct bio
*" to a "struct buf *". Things like ccd, vinum, ata-raid and GEOM
constructs bio's which are not entrails of a struct buf.

Also, curthread may or may not have anything to do with the I/O request
at hand.

The correct solution can either be to tag struct bio's with a
priority derived from the requesting threads nice and have disksort
act on this field, this wouldn't address the "silly-seek syndrome"
where two equal processes bang the diskheads from one edge to the
other of the disk repeatedly.

Alternatively, and probably better: a sleep should be introduced
either at the time the I/O is requested or at the time it is completed
where we can be sure to sleep in the right thread.

The sleep also needs to be in constant timeunits, 1/hz can be practically
any sub-second size, at high HZ the current code practically doesn't
do anything.
----------------------------

As suggested in this comment, it is no longer located in the disk sort
routine, but rather now resides in spec_strategy where the disk operations
are being queued by the thread that is associated with the process that
is really requesting the I/O. At that point, the disk queues are not
visible, so the I/O for positively niced processes is always slowed
down whether or not there is other activity on the disk.

On the issue of scaling HZ, I believe that the current scheme is
better than using a fixed quantum of time. As machines and I/O
subsystems get faster, the resolution on the clock also rises.
So, ten years from now we will be slowing things down for shorter
periods of time, but the proportional effect on the system will
be about the same as it is today. So, I view this as a feature
rather than a drawback. Hence this patch sticks with using HZ.

Sponsored by: DARPA & NAI Labs.
Reviewed by: Poul-Henning Kamp <phk@critter.freebsd.dk>


105655 21-Oct-2002 jhb

Grrr, s/PBP/BPB/ here as well.

Noticed by: peter


105645 21-Oct-2002 jhb

Spell the BPB member of the 7.10 bootsector as bsBPB rather than bsPBP to
be like all the other bootsectors. Apple has done the same it seems.


105585 20-Oct-2002 rwatson

Missed a case of _POSIX_MAC_PRESENT -> _PC_MAC_PRESENT rename.

Pointed out by: phk


105561 20-Oct-2002 phk

'&' not used for pointers to functions.

Spotted by: FlexeLint


105560 20-Oct-2002 phk

Remove even more '&' from pointers to functions.

Spotted by: FlexeLint


105488 19-Oct-2002 kan

umap_sync is empty and is identical to vfs_stdsync. Remove it and
use generic function instead.

Approved by: obrien


105487 19-Oct-2002 kan

style(9)

Approved by: obrien


105212 16-Oct-2002 phk

Fix comments and one resulting code confusion about the type of the
"command" argument to VOP_IOCTL.

Spotted by: FlexeLint.


105211 16-Oct-2002 phk

Be consistent about functions being static.

Spotted by: FlexeLint


105210 16-Oct-2002 phk

A better solution to avoiding variable sized structs in DEVFS.


105209 16-Oct-2002 phk

#include "opt_devfs.h" to protect against variable sized structures.

Spotted by: FlexeLint


105165 15-Oct-2002 phk

Plug an infrequent (I think) memory leak.

Spotted by: FlexeLint


105077 14-Oct-2002 mckusick

Regularize the vop_stdlock'ing protocol across all the filesystems
that use it. Specifically, vop_stdlock uses the lock pointed to by
vp->v_vnlock. By default, getnewvnode sets up vp->v_vnlock to
reference vp->v_lock. Filesystems that wish to use the default
do not need to allocate a lock at the front of their node structure
(as some still did) or do a lockinit. They can simply start using
vn_lock/VOP_UNLOCK. Filesystems that wish to manage their own locks,
but still use the vop_stdlock functions (such as nullfs) can simply
replace vp->v_vnlock with a pointer to the lock that they wish to
have used for the vnode. Such filesystems are responsible for
setting the vp->v_vnlock back to the default in their vop_reclaim
routine (e.g., vp->v_vnlock = &vp->v_lock).

In theory, this set of changes cleans up the existing filesystem
lock interface and should have no function change to the existing
locking scheme.

Sponsored by: DARPA & NAI Labs.


105051 13-Oct-2002 mux

- Remove a useless initialization for 'ronly', if it hadn't been
there, we would have noticed that 'ronly' was uninitialized :-).
- Kill a nearby 'register' keyword.


105050 13-Oct-2002 phk

Pass flags to VOP_CLOSE() corresponding to what was passed to VOP_OPEN().

Submitted by: "Peter Edwards" <pmedwards@eircom.net>


104908 11-Oct-2002 mike

Change iov_base's type from `char *' to the standard `void *'. All
uses of iov_base which assume its type is `char *' (in order to do
pointer arithmetic) have been updated to cast iov_base to `char *'.
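
A minimal userland sketch of the cast this change requires, assuming the POSIX `struct iovec` from `<sys/uio.h>`. `iov_advance` is a hypothetical helper name, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Advance an iovec by n bytes. iov_base is a void *, so it is cast to
 * char * before doing pointer arithmetic. iov_advance is a hypothetical
 * helper, not a kernel function.
 */
static void
iov_advance(struct iovec *iov, size_t n)
{
	/* Never step past the end of the buffer. */
	if (n > iov->iov_len)
		n = iov->iov_len;
	iov->iov_base = (char *)iov->iov_base + n;
	iov->iov_len -= n;
}
```

Arithmetic on `void *` is not defined by standard C, so the explicit cast keeps the byte-wise semantics the old `char *` type provided implicitly.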


104653 08-Oct-2002 dd

Treat the pathptrn field as a real pattern with the aid of fnmatch().
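
As a rough illustration of pattern matching of this kind, the sketch below uses the userland fnmatch(3) from `<fnmatch.h>`. The helper `path_matches` and the patterns in the usage note are made up for illustration and do not reflect the devfs rule code itself.

```c
#include <assert.h>
#include <fnmatch.h>

/*
 * Return 1 if devname matches the shell-style pattern, 0 otherwise.
 * path_matches is a hypothetical helper; with flags set to 0, '*' and
 * '?' may also match a slash.
 */
static int
path_matches(const char *pattern, const char *devname)
{
	return (fnmatch(pattern, devname, 0) == 0);
}
```

For example, a pattern such as `da*` would match `da0s1` but not `cd0`, whereas an exact string comparison against `da*` would match neither.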


104566 06-Oct-2002 mux

Yet another 64 bits warning fix: s/u_int/size_t/.


104565 06-Oct-2002 mux

Fix a warning on 64 bits platforms: copyinstr() takes
a size_t *, not an u_int *.


104564 06-Oct-2002 mux

Fix a warning on 64 bits platforms: copystr() takes a size_t *,
not an int *.
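
The warning fixed here comes from passing a pointer to a narrower integer where a `size_t *` is expected. The sketch below is a hypothetical userland stand-in modeled on the general shape of copystr(9); `demo_copystr` is not the kernel routine.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical userland stand-in modeled on the shape of copystr(9):
 * copy a NUL-terminated string of at most len bytes and report the
 * number of bytes copied (including the NUL) through a size_t pointer.
 */
static int
demo_copystr(const char *from, char *to, size_t len, size_t *done)
{
	size_t i;

	for (i = 0; i < len; i++) {
		to[i] = from[i];
		if (from[i] == '\0') {
			if (done != NULL)
				*done = i + 1;
			return (0);
		}
	}
	/* Ran out of room before seeing the terminating NUL. */
	if (done != NULL)
		*done = len;
	return (ENAMETOOLONG);
}
```

On platforms where `size_t` is 64 bits and `int` is 32 bits, the callee stores a full `size_t` through the pointer, so the caller's variable must be declared as `size_t` rather than `int` or `u_int`.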


104533 05-Oct-2002 rwatson

Integrate a devfs/MAC fix from the MAC tree: avoid a race condition during
devfs VOP symlink creation by introducing a new entry point to determine
the label of the devfs_dirent prior to allocation of a vnode for the
symlink.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


104508 05-Oct-2002 phk

Plug memory leaks detected by FlexeLint.


104306 01-Oct-2002 jmallett

Back out kernel support for reliable signal queues.

Requested by: rwatson, phk, and many others


104278 01-Oct-2002 phk

Move the vop-vector declaration into devfs_vnops.c where it belongs.


104264 01-Oct-2002 jmallett

When working with sigset_t's, and needing to perform masking operations based
on a process's pending signals, use the signal queue flattener,
ksiginfo_to_sigset_t, on the process, and on a local sigset_t, and then work
with that as needed.


104233 30-Sep-2002 jmallett

First half of implementation of ksiginfo, signal queues, and such. This
gets signals operating based on a TailQ, and is good enough to run X11,
GNOME, and do job control. There are some intricate parts which could be
more refined to match the sigset_t versions, but those require further
evaluation of directions in which our signal system can expand and contract
to fit our needs.

After this has been in the tree for a while, I will make in kernel API
changes, most notably to trapsignal(9) and sendsig(9), to use ksiginfo
more robustly, such that we can actually pass information with our
(queued) signals to the userland. That will also result in using a
struct ksiginfo pointer, rather than a signal number, in a lot of
kern_sig.c, to refer to an individual pending signal queue member, but
right now there is no defined behaviour for such.

CODAFS is unfinished in this regard because the logic is unclear in
some places.

Sponsored by: New Gold Technology
Reviewed by: bde, tjr, jake [an older version, logic similar]


104113 28-Sep-2002 phk

s/struct dev_t */dev_t */


104099 28-Sep-2002 phk

Fix mis-indent.


104094 28-Sep-2002 phk

Be consistent about "static" functions: if the function is marked
static in its prototype, mark it static at the definition too.

Inspired by: FlexeLint warning #512


104089 28-Sep-2002 phk

I misplaced a local variable yesterday.


104048 27-Sep-2002 phk

Add a D_NOGIANT flag which can be set in a struct cdevsw to indicate
that a particular device driver is not Giant-challenged.

SPECFS will DROP_GIANT() ... PICKUP_GIANT() around calls to the
driver in question.

Notice that the interrupt path is not affected by this!

This does _NOT_ work for drivers accessed through cdevsw->d_strategy()
ie drivers for disk(-like), some tapes, maybe others.


104043 27-Sep-2002 phk

Rename struct specinfo to the more appropriate struct cdev.

Agreed on: jake, rwatson, jhb


104012 26-Sep-2002 phk

I hate it when patch gives me .rej files.

Can't we make the pre-commit check refuse if there are .rej files in
the directory ?


104007 26-Sep-2002 phk

Return ENOTTY on unhandled ioctls.


104005 26-Sep-2002 phk

Return ENOTTY on unrecognized ioctls.


104004 26-Sep-2002 phk

Return ENOTTY on incorrect ioctls.


104003 26-Sep-2002 phk

Return ENOTTY when we don't recognize an ioctl.


103989 26-Sep-2002 njl

Fix these warns where sizeof(int) != sizeof(void *)
/h/des/src/sys/coda/coda_venus.c: In function `venus_ioctl':
/h/des/src/sys/coda/coda_venus.c:277: warning: cast from pointer to integer of
different size
/h/des/src/sys/coda/coda_venus.c:292: warning: cast from pointer to integer of
different size
/h/des/src/sys/coda/coda_venus.c: In function `venus_readlink':
/h/des/src/sys/coda/coda_venus.c:380: warning: cast from pointer to integer of
different size
/h/des/src/sys/coda/coda_venus.c: In function `venus_readdir':
/h/des/src/sys/coda/coda_venus.c:637: warning: cast from pointer to integer of
different size

Submitted by: des-alpha-tinderbox


103983 26-Sep-2002 jeff

- Fix a botch in previous commit; oldvp should not be unconditionally
assigned.


103979 25-Sep-2002 semenu

Fix the problem introduced by vop_stdbmap() usage. The NTFS does not
implement worthful VOP_BMAP() handler, so it expects the blkno not to be
changed by VOP_BMAP(). Otherwise, it'll have to find some tricky way to
determine if bp was VOP_BMAP()ed or not in VOP_STRATEGY().

PR: kern/42139


103942 25-Sep-2002 jeff

- Use vrefcnt() instead of v_usecount.


103937 25-Sep-2002 jeff

- Use vrefcnt() instead of directly accessing v_usecount.


103936 25-Sep-2002 jeff

- Use vrefcnt() where it is safe to do so instead of doing direct and
unlocked accesses to v_usecount.
- Lock access to the buf lists in the various sync routines. interlock
locking could be avoided almost entirely in leaf filesystems if the
fsync function had a generic helper.


103935 25-Sep-2002 jeff

- Lock access to the buf lists in spec_sync()
- Fixup interlock locking in spec_close()


103934 25-Sep-2002 jeff

- Hold the vp lock while accessing v_vflags.


103870 23-Sep-2002 alfred

use __packed.


103804 22-Sep-2002 iedowse

Attempt to fix the error reported by the alpha tinderbox. A pointer
was being cast to an integer as part of a hash function, so just
add an intptr_t cast to silence the warning.


103796 22-Sep-2002 truckman

Fix misspellings, capitalization, and punctuation in comments. Minor
comment phrasing and style changes.


103767 21-Sep-2002 jake

Use the fields in the sysentvec and in the vm map header in place of the
constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS.
This is mainly so that they can be variable even for the native abi, based
on different machine types. Get stack protections from the sysentvec too.
This makes it trivial to map the stack non-executable for certain abis, on
machines that support it.


103636 19-Sep-2002 truckman

VOP_FSYNC() requires that its vnode argument be locked, which nfs_link()
wasn't doing. Rather than just lock and unlock the vnode around the call
to VOP_FSYNC(), implement rwatson's suggestion to lock the file vnode
in kern_link() before calling VOP_LINK(), since the other filesystems
also locked the file vnode right away in their link methods. Remove the
locking and unlocking from the leaf filesystem link methods.

Reviewed by: rwatson, bde (except for the unionfs_link() changes)


103559 18-Sep-2002 njl

Remove any VOP_PRINT that redundantly prints the tag.
Move lockmgr_printinfo() into vprint() for everyone's benefit.

Suggested by: bde


103537 18-Sep-2002 bp

Always open file in the DENYNONE mode and let the server decide what is
good for this file.
This should allow read only access to file which is already opened on server.


103533 18-Sep-2002 bp

Implement additional SMB calls to allow proper update of file size as some
file servers fail to do it in the right way.

New NFLUSHWIRE flag marks pending flush request(s).

NB: not all cases covered by this commit.

Obtained from: Darwin


103314 14-Sep-2002 njl

Remove all use of vnode->v_tag, replacing with appropriate substitutes.
v_tag is now const char * and should only be used for debugging.

Additionally:
1. All users of VT_NFS now check vfsconf->vf_type VFCF_NETWORK
2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which
is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP.

Suggested by: phk
Reviewed by: bde, rwatson (earlier version)


103216 11-Sep-2002 julian

Completely redo thread states.

Reviewed by: davidxu@freebsd.org


102950 05-Sep-2002 davidxu

s/SGNL/SIG/
s/SNGL/SINGLE/
s/SNGLE/SINGLE/

Fix the abbreviations for the P_STOPPED_* etc. flags; in the original code
they were inconsistent and difficult to distinguish.

Approved by: julian (mentor)


102821 01-Sep-2002 iedowse

Add a missing #include <sys/lockmgr.h>.


102412 25-Aug-2002 charnier

Replace various spelling with FALLTHROUGH which is lint()able


102392 25-Aug-2002 bde

Fixed printf format errors and style bugs in rev.1.92. This is the version
that should have been committed in rev.1.93.


102391 25-Aug-2002 bde

Oops, the previous commit wasn't the version that I meant to commit (it
does some extra things which are probably harmless). Back it out.


102385 25-Aug-2002 bde

Fixed printf format errors and style bugs in previous commit.


102314 23-Aug-2002 scottl

Remove stddef.h from the header list

Prodded by: peter


102295 22-Aug-2002 trhodes

Fix a bug where large msdos partitions were not handled correctly, and fix
a few fsck_msdosfs related 'issues'

PR: 28536, 30168
Submitted by: Jiangyi Liu <jyliu@163.net> && NetBSD
Approved by: rwatson (mentor)


102170 20-Aug-2002 scottl

Remove the possibility of a race condition when reading the . and ..
entries.


102169 20-Aug-2002 scottl

Don't abuse the stack when translating names.


102160 20-Aug-2002 rwatson

Handle one more case of a fifofs filetmp: set filetmp.f_cred to
ap->a_cred, and pass in ap->a_td->td_ucred as the active_cred to
soo_poll().

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


102003 17-Aug-2002 rwatson

In continuation of early fileop credential changes, modify fo_ioctl() to
accept an 'active_cred' argument reflecting the credential of the thread
initiating the ioctl operation.

- Change fo_ioctl() to accept active_cred; change consumers of the
fo_ioctl() interface to generally pass active_cred from td->td_ucred.
- In fifofs, initialize filetmp.f_cred to ap->a_cred so that the
invocations of soo_ioctl() are provided access to the calling f_cred.
Pass ap->a_td->td_ucred as the active_cred, but note that this is
required because we don't yet distinguish file_cred and active_cred
in invoking VOP's.
- Update kqueue_ioctl() for its new argument.
- Update pipe_ioctl() for its new argument, pass active_cred rather
than td_ucred to MAC for authorization.
- Update soo_ioctl() for its new argument.
- Update vn_ioctl() for its new argument, use active_cred rather than
td->td_ucred to authorize VOP_IOCTL() and the associated VOP_GETATTR().

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101983 16-Aug-2002 rwatson

Make similar changes to fo_stat() and fo_poll() as made earlier to
fo_read() and fo_write(): explicitly use the cred argument to fo_poll()
as "active_cred" using the passed file descriptor's f_cred reference
to provide access to the file credential. Add an active_cred
argument to fo_stat() so that implementers have access to the active
credential as well as the file credential. Generally modify callers
of fo_stat() to pass in td->td_ucred rather than fp->f_cred, which
was redundantly provided via the fp argument. This set of modifications
also permits threads to perform these operations on behalf of another
thread without modifying their credential.

Trickle this change down into fo_stat/poll() implementations:

- badfo_poll(), badfo_stat(): modify/add arguments.
- kqueue_poll(), kqueue_stat(): modify arguments.
- pipe_poll(), pipe_stat(): modify/add arguments, pass active_cred to
MAC checks rather than td->td_ucred.
- soo_poll(), soo_stat(): modify/add arguments, pass fp->f_cred rather
than cred to pru_sopoll() to maintain current semantics.
- sopoll(): modify arguments.
- vn_poll(), vn_statfile(): modify/add arguments, pass new arguments
to vn_stat(). Pass active_cred to MAC and fp->f_cred to VOP_POLL()
to maintain current semantics.
- vn_close(): rename cred to file_cred to reflect reality while I'm here.
- vn_stat(): Add active_cred and file_cred arguments to vn_stat()
and consumers so that this distinction is maintained at the VFS
as well as 'struct file' layer. Pass active_cred instead of
td->td_ucred to MAC and to VOP_GETATTR() to maintain current semantics.

- fifofs: modify the creation of a "filetemp" so that the file
credential is properly initialized and can be used in the socket
code if desired. Pass ap->a_td->td_ucred as the active
credential to soo_poll(). If we teach the vnop interface about
the distinction between file and active credentials, we would use
the active credential here.

Note that current inconsistent passing of active_cred vs. file_cred to
VOP's is maintained. It's not clear why GETATTR would be authorized
using active_cred while POLL would be authorized using file_cred at
the file system level.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101967 16-Aug-2002 trhodes

When a cluster entry for ``.'' is set to 0, msdosfs fails to handle it
correctly.

PR: 24393
Submitted by: semenu
Approved by: rwatson (mentor)
MFC after: 1 week


101901 15-Aug-2002 jake

Fixed 64bit big endian bugs relating to abuse of ioctl argument passing.
This makes truss work on sparc64.


101895 15-Aug-2002 scottl

Clean up comments that are no longer relevant.


101890 15-Aug-2002 scottl

Factor out some ugly code that's shared by udf_readdir and udf_lookup.
Significantly de-obfuscate udf_lookup

Inspired By: tes@sgi.com


101777 13-Aug-2002 phk

Introduce typedefs for the member functions of struct vfsops and employ
these in the main filesystems. This does not change the resulting code
but makes the source a little bit more grepable.

Sponsored by: DARPA and NAI Labs.


101404 05-Aug-2002 pb

Fix typo in vnode flags causing deadlock in msdosfs_fsync().

Reviewed by: jeff


101330 04-Aug-2002 mike

Fix typo in the last revision.

Noticed by: i386 tinderbox


101317 04-Aug-2002 scottl

Simplify the handling of a fragmented file_id descriptor. Also
de-obfuscate the file_char flags.


101308 04-Aug-2002 jeff

- Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
with VOP calls is needed.
- v_iflag is protected by interlock and is used for dealing with vnode
management issues. These flags include X/O LOCK, FREE, DOOMED, etc.
- All accesses to v_iflag and v_vflag have either been locked or marked with
mp_fixme's.
- Many ASSERT_VOP_LOCKED calls have been added where the locking was not
clear.
- Many functions in vfs_subr.c were restructured to provide for stronger
locking.

Idea stolen from: BSD/OS


101202 02-Aug-2002 scottl

Calculate the correct physical block number for files that are
embedded into their file_entry descriptor. This is more for
correctness, since these files cannot be bmap'ed/mmap'ed anyways.
Enforce this restriction.

Submitted by: tes@sgi.com


101201 02-Aug-2002 scottl

Check for deleted files in udf_lookup(), not just udf_readdir().

Submitted by: tes@sgi.com


101200 02-Aug-2002 alc

o Lock page queue accesses in nwfs and smbfs.
o Assert that the page queues lock is held in vm_page_deactivate().


101195 02-Aug-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Teach devfs how to respond to pathconf() _POSIX_MAC_PRESENT queries,
allowing it to indicate to user processes that individual vnode labels
are available.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101191 01-Aug-2002 rwatson

Hook up devfs_pathconf() for specfs devfs nodes, not just regular
devfs nodes.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101132 01-Aug-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Modify procfs so that (when mounted multilabel) it exports process MAC
labels as the vnode labels of procfs vnodes associated with processes.

Approved by: des
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101130 01-Aug-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Modify pseudofs so that it can support synthetic file systems with
the multilabel flag set. In particular, implement vop_refreshlabel()
as pn_refreshlabel(). Implement pfs_refreshlabel() to invoke this,
and have it fall back to the mount label if the file system does
not implement pn_refreshlabel() for the node. Otherwise, permit
the file system to determine how the service is provided.

Approved by: des
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101069 31-Jul-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument devfs to support per-dirent MAC labels. In particular,
invoke MAC framework when devfs directory entries are instantiated
due to make_dev() and related calls, and invoke the MAC framework
when vnodes are instantiated from these directory entries. Implement
vop_setlabel() for devfs, which pushes the label update into the
devfs directory entry for semi-persistent store. This permits the MAC
framework to assign labels to devices and directories as they are
instantiated, and export access control information via devfs vnodes.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101002 31-Jul-2002 semenu

Fix a problem with sendfile() syscall by always doing I/O via bread() in
ntfs_read(). This guarantees that requested cache pages will be valid if
UIO_NOCOPY is specified.

PR: bin/34072, bin/36189
MFC after: 1 week


100994 30-Jul-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Label devfs directory entries, permitting labels to be maintained
on device nodes in devfs instances persistently despite vnode
recycling.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


100884 29-Jul-2002 julian

Create a new thread state to describe threads that would be ready to run
except for the fact that they are presently swapped out. Also add a process
flag to indicate that the process has started the struggle to swap
back in. This will be needed for the case where multiple threads
start the swapin action top a collision. Also add code to stop
a process from being swapped out if one of the threads in this
process is actually off running on another CPU.. that might hurt...

Submitted by: Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>


100804 28-Jul-2002 dd

Correct misindentation of DRA_UID.


100793 28-Jul-2002 dd

Unimplement panic(8) by making sure that we don't recurse into a
ruleset. If we do, that means there's a ruleset loop (10 includes 20
include 30 includes 10), which will quickly cause a double fault due
to stack overflow (since "include" is implemented by recursion).
(Previously, we only checked that X didn't include X.)
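
The recursion guard described above can be modelled in user space. The
following is a minimal sketch, not the devfs code: struct ruleset, its
"running" flag, the table sizes and ruleset_apply() are all hypothetical names
chosen for illustration. The idea is to mark a ruleset while it is being
applied and to refuse to enter it again while that mark is set.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RULESETS 64   /* hypothetical table size */
#define MAX_INCLUDES 4    /* hypothetical per-ruleset include limit */

/* Hypothetical model of a ruleset: just the rulesets it includes. */
struct ruleset {
	int includes[MAX_INCLUDES];	/* indices of included rulesets */
	int nincludes;
	int running;			/* nonzero while being applied */
};

/*
 * Apply ruleset `id`, recursing into its includes.  Returns 0 on success
 * or -1 if an include loop is detected, instead of recursing until the
 * stack overflows.
 */
static int
ruleset_apply(struct ruleset *table, int id)
{
	struct ruleset *rs;
	int i, error;

	assert(id >= 0 && id < MAX_RULESETS);
	rs = &table[id];
	if (rs->running)
		return (-1);	/* already on the recursion path: a loop */
	rs->running = 1;
	error = 0;
	for (i = 0; i < rs->nincludes && error == 0; i++)
		error = ruleset_apply(table, rs->includes[i]);
	rs->running = 0;
	return (error);
}
```

With rulesets 10, 20 and 30 including each other in a cycle, a call such as
ruleset_apply(table, 10) returns -1 in this model; a check that only compares
a ruleset against itself would miss that case.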


100738 27-Jul-2002 jeff

- Explicitly state that specfs does not support locking by using
vop_no{lock,unlock,islocked}. This should be the only vnode opv that does
so.


100737 27-Jul-2002 alc

o Lock page queue accesses by vm_page_activate() and vm_page_deactivate().


100206 17-Jul-2002 dd

Introduce the DEVFS "rule" subsystem. DEVFS rules permit the
administrator to define certain properties of new devfs nodes before
they become visible to the userland. Both static (e.g., /dev/speaker)
and dynamic (e.g., /dev/bpf*, some removable devices) nodes are
supported. Each DEVFS mount may have a different ruleset assigned to
it, permitting different policies to be implemented for things like
jails.

Approved by: phk


100164 16-Jul-2002 markm

Unbreak LINT; sort the includes so that functions are explicitly
declared. Remove duplicate includes.


99689 09-Jul-2002 jeff

- Change all LK_SHARE locks to LK_EXCLUSIVE. Shared locks aren't quite safe
yet
- Use vop_std{lock,unlock,islocked}.


99566 08-Jul-2002 jeff

Lock down pseudofs:
- Initialize lock structure in vncache_alloc
- Return locked vnodes from vncache_alloc
- Setup vnode op vectors to use default lock, unlock, and islocked
- Implement simple locking scheme required for lookup


99072 29-Jun-2002 julian

Part 1 of KSE-III

The ability to schedule multiple threads per process
(on one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)

NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..


98266 15-Jun-2002 mux

nmount'ify unionfs further by using separate options instead
of passing a flags mount options. This removes the include of
sys/fs/unionfs/union.h in mount_unionfs as it should be.

Reviewed by: phk


98265 15-Jun-2002 mux

Convert UDF to nmount.

Reviewed by: scottl


98183 13-Jun-2002 semenu

Fix a race during null node creation between relookuping the hash and
adding vnode to hash. The fix is to use atomic hash-lookup-and-add-if-
not-found operation. The odd thing is that this race can't happen
actually because the lowervp vnode is locked exclusively now during the
whole process of null node creation. This should be thought of as a step
toward shared lookups.

Also remove vp->v_mount checks when looking for a match in the hash,
as they are a vestige.

Also add comments and cosmetic changes.
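
As an illustration of the atomic hash-lookup-and-add-if-not-found idea, here
is a minimal user-space sketch. It is not the nullfs code: struct node,
hash_get_or_insert(), the bucket count and the C11 atomic_flag spin lock
(standing in for a kernel mutex) are all assumptions made for the example.
The point is that the lookup and the insert happen under a single lock hold.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 16	/* hypothetical table size */

/* Hypothetical stand-in for a null node keyed by its lower vnode. */
struct node {
	const void *lower;	/* key: the lower-layer object */
	struct node *next;	/* hash chain */
};

static struct node *buckets[NBUCKETS];
static atomic_flag hash_lock = ATOMIC_FLAG_INIT;  /* stand-in for a mutex */

static size_t
hash_key(const void *lower)
{
	return (((uintptr_t)lower >> 4) % NBUCKETS);
}

/*
 * Return the node already hashed for candidate->lower, or insert
 * candidate and return it.  Because lookup and insert share one lock
 * hold, two callers racing on the same key cannot both insert.
 */
static struct node *
hash_get_or_insert(struct node *candidate)
{
	struct node *n, **head;

	assert(candidate != NULL);
	head = &buckets[hash_key(candidate->lower)];
	while (atomic_flag_test_and_set(&hash_lock))
		;	/* spin until the lock is free */
	for (n = *head; n != NULL; n = n->next)
		if (n->lower == candidate->lower)
			break;
	if (n == NULL) {
		candidate->next = *head;
		*head = candidate;
		n = candidate;
	}
	atomic_flag_clear(&hash_lock);
	return (n);
}
```

A caller that gets back a node other than its candidate lost the race (or the
key was already present) and would discard the candidate it had prepared.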


98177 13-Jun-2002 semenu

Change null_hashlock into null_hashmtx, because there is no need for
lockmgr and this helps to vget() vnode from hash without a race.

Reviewed by: bp
MFC after: 2 weeks


98176 13-Jun-2002 semenu

Fix the "error" path (when dropping not fully initialized vnode).
Also move hash operations out of null_vnops.c and explicitly initialize
v_lock in null_node_alloc (to set wmesg).

Reviewed by: bp
MFC after: 2 weeks


98175 13-Jun-2002 semenu

Fix wrong locking in null_inactive and null_reclaim. This makes nullfs
work reasonably well again.

Reviewed by: mckusick, bp


97940 06-Jun-2002 des

Gratuitous whitespace cleanup.


97702 01-Jun-2002 semenu

Make devfs honour the PDIRUNLOCK flag.

Reviewed by: jeff
MFC after: 1 week


97658 31-May-2002 tanimura

Back out my last commit of locking down a socket, it conflicts with hsu's work.

Requested by: hsu


97195 24-May-2002 mux

Convert unionfs to nmount.


97192 24-May-2002 mux

Fix comments.


97186 23-May-2002 mux

Convert nullfs to nmount.


97094 22-May-2002 bde

Quick fix for non-unique inode numbers for hard links. We use the
byte offset of the directory entry for the inode number for all types
of files except directories, although this breaks hard links for
non-directories even if it doesn't cause overflow. Just ignore this
broken inode number for stat() and readdir() and return a less broken
one (the block offset of the file), so that applications normally can't
see the brokenness.

This leaves at least the following brokenness:
- extra inodes, vnodes and caching for hard links.
- various overflow bugs. cd9660 supports 64-bit block numbers, but we
silently ignore the top 32 bits in isonum_733() and then drop another
10 bits for our broken inode numbers. We may also have sign extension
bugs from storing 32-bit extents in ints and longs even if ints are
32-bits. These bugs affect DVDs. mkisofs apparently limits them
by writing directory entries first.

Inode numbers were broken mainly in 4.4BSD-Lite2. FreeBSD-1.1.5 seems
to have a correct implementation modulo the overflow bugs. We need
to look up directory entries from inodes for symlinks only. FreeBSD-1.1.5
use separate fields (iso_parent_extent, iso_parent) to point to the
directory entry. 4.4BSD-Lite doesn't have these, and abuses i_ino to
point to the directory entry. Correct pointers are impossible for
hard links, but symlinks can't be hard links.


97072 21-May-2002 semenu

Fix null_lock() not unlocking vp->v_interlock if LK_THISLAYER.

Reviewed by: bp@FreeBSD.org
MFC after: 1 week


97035 21-May-2002 tanimura

Lock the writer socket across sorwakeup(fip->fi_writesock).

Spotted by: peter


96972 20-May-2002 tanimura

Lock down a socket, milestone 1.

o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
socket buffer. The mutex in the receive buffer also protects the data
in struct socket.

o Determine the lock strategy for each member in struct socket.

o Lock down the following members:

- so_count
- so_options
- so_linger
- so_state

o Remove *_locked() socket APIs. Make the following socket APIs
touching the members above now require a locked socket:

- sodisconnect()
- soisconnected()
- soisconnecting()
- soisdisconnected()
- soisdisconnecting()
- sofree()
- soref()
- sorele()
- sorwakeup()
- sotryfree()
- sowakeup()
- sowwakeup()

Reviewed by: alfred


96886 19-May-2002 jhb

Change p_can{debug,see,sched,signal}()'s first argument to be a thread
pointer instead of a proc pointer and require the process pointed to
by the second argument to be locked. We now use the thread ucred reference
for the credential checks in p_can*() as a result. p_canfoo() should now
no longer need Giant.


96847 18-May-2002 phk

Remove a check of blocknumbers/offsets which will be pointless with
64 bit daddr_t.

Sponsored by: DARPA & NAI Labs.


96755 16-May-2002 trhodes

More s/file system/filesystem/g


96750 16-May-2002 mux

In VOP_LOOKUP, don't assume that the final pathname component
will be in the same filesystem as the one where the current
component is.

Approved by: scottl


96572 14-May-2002 phk

Make daddr_t and u_daddr_t 64bits wide.
Retire daddr64_t and use daddr_t instead.

Sponsored by: DARPA & NAI Labs.


96356 10-May-2002 mux

Fix several bugs in devfs_lookupx(). When we check the nameiop to
make sure it's a correct operation for devfs, do it only in the
ISLASTCN case. If we don't, we are assuming that the final file will
be in devfs, which is not true if another partition is mounted on top
of devfs or with special filenames (like /dev/net/../../foo).

Reviewed by: phk


96009 04-May-2002 jeff

Include systm.h for panic(9) so that DEBUG_ALL_VFS_LOCKS compiles.


95994 03-May-2002 phk

HPFS picks up the vop_stdgetpages and vop_stdputpages member functions
via the default entry and the default vop vector.


95984 03-May-2002 des

s/pfs_badop/vop_eopnotsupp/

Submitted by: phk


95954 02-May-2002 mux

Convert devfs to nmount.

Reviewed by: phk


95953 02-May-2002 mux

Convert the pseudofs framework to nmount (thus procfs and linprocfs).

Reviewed by: des (some time ago), phk


95952 02-May-2002 mux

Convert fdescfs to nmount.

Reviewed by: phk


95951 02-May-2002 scottl

Don't reference vop_std* since they are already implicitly
referenced through the VOP_DEFAULT vector

Submitted by: phk


95944 02-May-2002 phk

Use vop_panic() instead of rolling our own.


95913 02-May-2002 scottl

In udf_bmap(), return the physical block number, not the logical
block number. This fixes things like cp (ouch!) which use mmap.


95767 30-Apr-2002 scottl

Fix udf_read(). Honor the uio_resid when determining the size of
the block to read and copy out. This removes the hack in
udf_readatoffset() for only reading one block at a time. WooHoo!
Remove a redundant test for fragmented fids in both udf_readdir()
and udf_lookup(). Add comment to both as to why the test is
written the way it is. Add a few more safety checks for brelse().

Thanks to Timothy Shimmin <tes@boing.melbourne.sgi.com> for pointing
out these problems.


95759 30-Apr-2002 tanimura

Revert the change of #includes in sys/filedesc.h and sys/socketvar.h.

Requested by: bde

Since locking sigio_lock is usually followed by calling pgsigio(),
move the declaration of sigio_lock and the definitions of SIGIO_*() to
sys/signalvar.h.

While I am here, sort include files alphabetically, where possible.


95750 29-Apr-2002 rwatson

Use vnode locking with devfs; permit VFS locking assertions to make
sense for devfs vnodes, and reduce/remove potential races in the devfs
code.

Submitted by: iedowse
Approved by: phk


95480 26-Apr-2002 bp

UIO_NOCOPY is not supported for now, so refuse the read operation if this flag
is set. The full emulation of bio is on its way...


95315 23-Apr-2002 bp

Track nfs's getpages() changes:

Properly count v_vnodepgsin.
Do not reread page if is already valid.
Properly handle partially filled pages.


95314 23-Apr-2002 bp

Get rid of extra #ifdefs.


95212 21-Apr-2002 bde

Don't attempt to declare M_DEVFS when MALLOC_DECLARE is not defined.
This fixes warnings that should be errors in fstat.

Reminded by: alpha tinderbox

Fixed some style bugs (ones near BOF and EOF; there are many more).


95210 21-Apr-2002 bde

Include <sys/systm.h> for (at least) the definition of atomic functions
which are sometimes used by the macros in <sys/mutex.h>; don't depend
on not-quite-necessary namespace pollution in <sys/mutex.h>.


95094 20-Apr-2002 marcel

Don't put a line break in string literals. GCC 3.1 complains and GCC
3.2 drops the ball.


95090 20-Apr-2002 rwatson

Spelling fix for comment.


94995 18-Apr-2002 alfred

Cleanup of logic, flow and comments.

Submitted by: bde


94861 16-Apr-2002 jhb

Lock proctree_lock instead of pgrpsess_lock.


94795 15-Apr-2002 asmodai

Sync with UDF p4 tree: Use POSIX integer types instead of BSD types.


94663 14-Apr-2002 scottl

Actually add the UDF files!


94637 14-Apr-2002 jhb

Remove stale XXX comment.


94624 13-Apr-2002 jhb

- Change procfs_control()'s first argument to be a thread pointer instead
of a process pointer.
- Move the p_candebug() at the start of procfs_control() a bit to make
locking feasible. We still perform the access check before doing
anything, we just now perform it after acquiring locks.
- Don't lock the sched_lock for TRACE_WAIT_P() and when checking to see if
p_stat is SSTOP. We lock the process while setting p_stat to SSTOP
so locking the process is sufficient to do a read to see if p_stat is
SSTOP or not.


94623 13-Apr-2002 jhb

Lock the target process for p_candebug().


94622 13-Apr-2002 jhb

Lock the target process in procfs_doproc*regs() for p_candebug and while
reading/writing the registers.


94620 13-Apr-2002 jhb

- p_cansee() needs the target process locked.
- We need the proc lock held for more of procfs_doprocstatus().


94602 13-Apr-2002 bp

Check write permissions before creating anything.

PR: kern/27883
MFC after: 1 week


94177 08-Apr-2002 phk

Remove 3 instances of vm_zone.h inclusion.


94167 08-Apr-2002 jeff

Change the vm_zone calls over to uma calls. Remove the reference to the
vm_zone header.


93886 05-Apr-2002 bde

Fixed assorted bugs in setting of timestamps in devfs_setattr().

Setting of timestamps on devices had no effect visible to userland
because timestamps for devices were set in places that are never used.
This broke:
- update of file change time after a change of an attribute
- setting of file access and modification times.

The VA_UTIMES_NULL case did not work. Revs 1.31-1.32 were supposed to
fix this by copying correct bits from ufs, but had little or no effect
because the old checks were not removed.


93883 05-Apr-2002 bde

Fixed a very old bug in setting timestamps using utimes(2) on msdosfs
files. We didn't clear the update marks when we set the times, so
some of the settings were sometimes clobbered with the current time a
little later. This caused cp -p even by root to almost always fail
to preserve any times despite not reporting any errors in attempting
to preserve them.

Don't forget to set the archive attribute when we set the read-only
attribute. We should only set the archive attribute if we actually
change something, but we mostly don't bother avoiding setting it
elsewhere, so don't bother here yet.

MFC after: 1 week


93818 04-Apr-2002 jhb

Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on: i386, alpha, sparc64


93793 04-Apr-2002 bde

Moved signal handling and rescheduling from userret() to ast() so that
they aren't in the usual path of execution for syscalls and traps.
The main complication for this is that we have to set flags to control
ast() everywhere that changes the signal mask.

Avoid locking in userret() in most of the remaining cases.

Submitted by: luoqi (first part only, long ago, reorganized by me)
Reminded by: dillon


93593 01-Apr-2002 jhb

Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.
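
The shape of the two-function API described above can be sketched in user
space as follows. This is only a model under stated assumptions: struct
model_ucred, struct model_thread, the cr_jailed field and the PRISON_ROOT
value are hypothetical, and the rule that PRISON_ROOT lets a jailed root
credential pass is an assumption about the flag's meaning rather than
something this log states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define PRISON_ROOT 0x1		/* flag value is an assumption */

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_ucred {
	unsigned cr_uid;	/* effective user id */
	int cr_jailed;		/* nonzero if the credential is jailed */
};

struct model_thread {
	struct model_ucred *td_ucred;	/* per-thread credential reference */
};

/* Check a credential directly; flag may carry PRISON_ROOT. */
static int
suser_cred(const struct model_ucred *cred, int flag)
{
	assert(cred != NULL);
	if (cred->cr_uid != 0)
		return (EPERM);
	if (cred->cr_jailed && !(flag & PRISON_ROOT))
		return (EPERM);
	return (0);
}

/* Check the calling thread; its td_ucred must be valid. */
static int
suser(const struct model_thread *td)
{
	assert(td != NULL && td->td_ucred != NULL);
	return (suser_cred(td->td_ucred, 0));
}
```

In this model suser() is a thin wrapper that passes the thread's credential
with no flags, so callers that need PRISON_ROOT call suser_cred() directly.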

Discussed on: smp@


93430 30-Mar-2002 bde

In ffs_mountffs(), set mnt_iosize_max to si_iosize_max unconditionally
provided the latter is nonzero. At this point, the former is a fairly
arbitrary default value (DFLTPHYS), so changing it to any reasonable
value specified by the device driver is safe. Using the maximum of
these limits broke ffs clustered i/o for devices whose si_iosize_max
is < DFLTPHYS. Using the minimum would break device drivers' ability
to increase the active limit from DFLTPHYS up to MAXPHYS.

Copied the code for this and the associated (unnecessary?) fixup of
mnt_iosize_max to all other filesystems that use clustering (ext2fs and
msdosfs). It was completely missing.
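
The selection rule can be written out as a small user-space sketch. The
function name pick_iosize_max() and the numeric values of DFLTPHYS and MAXPHYS
are assumptions for illustration, and the final clamp to MAXPHYS is a guess at
what the "fixup" mentioned above refers to.

```c
#include <assert.h>

#define DFLTPHYS (64 * 1024)	/* assumed default transfer limit */
#define MAXPHYS  (128 * 1024)	/* assumed hard upper bound */

/*
 * Choose the mount's maximum I/O size: take the driver's limit whenever
 * it is nonzero (neither the max nor the min of the two), then clamp.
 */
static int
pick_iosize_max(int mnt_iosize_max, int si_iosize_max)
{
	assert(mnt_iosize_max > 0 && si_iosize_max >= 0);
	if (si_iosize_max != 0)
		mnt_iosize_max = si_iosize_max;
	if (mnt_iosize_max > MAXPHYS)	/* assumed fixup */
		mnt_iosize_max = MAXPHYS;
	return (mnt_iosize_max);
}
```

Starting from DFLTPHYS, a driver limit of 32 KB lowers the result to 32 KB and
a driver limit of MAXPHYS raises it to MAXPHYS, which is the behaviour that
neither taking the maximum nor taking the minimum would give in both cases.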

PR: 36309
MFC-after: 1 week


93393 29-Mar-2002 alfred

Protect proc struct (p_args and p_comm) when doing procfs IO that pulls
data from it.

Submitted by: Jonathan Mini <mini@haikugeek.com>


93075 24-Mar-2002 bde

Fixed some style bugs in the removal of __P(()). Continuation lines
were not outdented to preserve non-KNF lining up of code with parentheses.
Switch to KNF formatting in some cases.


93012 23-Mar-2002 bde

Fixed some style bugs in the removal of __P(()). Continuation lines
were not outdented to preserve non-KNF lining up of code with parentheses.
Switch to KNF formatting.


92785 20-Mar-2002 jeff

Remove references to vm_zone.h and switch over to the new uma API.


92765 20-Mar-2002 alfred

Remove __P.


92755 20-Mar-2002 alfred

Remove __P.


92727 19-Mar-2002 alfred

Remove __P.


92540 18-Mar-2002 mckusick

Cannot release vnode underlying the nullfs vnode in null_inactive
as it leaves the nullfs vnode allocated, but with no identity. The
effect is that a null mount can slowly accumulate all the vnodes
in the system, reclaiming them only when it is unmounted. Thus
the null_inactive state instead accelerates the release of the
null vnode by calling vrecycle which will in turn call the
null_reclaim operator. The null_reclaim routine then does the
freeing actions previously (incorrectly) done in null_inactive.


92462 17-Mar-2002 mckusick

Add a flags parameter to VFS_VGET to pass through the desired
locking flags when acquiring a vnode. The immediate purpose is
to allow polling lock requests (LK_NOWAIT) needed by soft updates
to avoid deadlock when enlisting other processes to help with
the background cleanup. For the future it will allow the use of
shared locks for read access to vnodes. This change touches a
lot of files as it affects most filesystems within the system.
It has been well tested on FFS, loopback, and CD-ROM filesystems, but
only lightly on the others, so if you find a problem there, please
let me (mckusick@mckusick.com) know.


92363 15-Mar-2002 mckusick

Introduce the new 64-bit size disk block, daddr64_t. Change
the bio and buffer structures to have daddr64_t bio_pblkno,
b_blkno, and b_lblkno fields which allows access to disks
larger than a Terabyte in size. This change also requires
that the VOP_BMAP vnode operation accept and return daddr64_t
blocks. This delta should not affect system operation in
any way. It merely sets up the necessary interfaces to allow
the development of disk drivers that work with these larger
disk block addresses. It also allows for the development of
UFS2 which will use 64-bit block addresses.


92270 14-Mar-2002 maxim

Be consistent with UFS in the way devfs_setattr() checks credentials
for chmod(2), chown(2) and utimes(2) with respect to jail(2).

Reviewed by: rwatson, ru
Not objected by: phk
Approved by: ru


91683 05-Mar-2002 phk

If in strategy we find that we have no devsw on the device anymore we
are probably talking about some disk-device which went away, so
return ENXIO instead of panicking.


91406 27-Feb-2002 jhb

Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.


91181 23-Feb-2002 tmm

Fix LINT breakage by adding a missing include.


91140 23-Feb-2002 tanimura

Lock struct pgrp, session and sigio.

New locks are:

- pgrpsess_lock which locks the whole pgrps and sessions,
- pg_mtx which protects the pgrp members, and
- s_mtx which protects the session members.

Please refer to sys/proc.h for the coverage of these locks.

Changes on the pgrp/session interface:

- pgfind() needs the pgrpsess_lock held.

- The caller of enterpgrp() is responsible to allocate a new pgrp and
session.

- Call enterthispgrp() in order to enter an existing pgrp.

- pgsignal() requires a pgrp lock held.

Reviewed by: jhb, alfred
Tested on: cvsup.jp.FreeBSD.org
(which is a quad-CPU machine running -current)


90873 18-Feb-2002 des

Paranoia: if the process is setugid, set all sensitive files mode 0.


90785 17-Feb-2002 phk

Don't even think about using v_id for magic tricks, v_id is giving
us enough trouble as it is for SMPng.


90717 16-Feb-2002 bde

Fixed the following style bugs:
- clobbering of jsp's $Id$ by FreeBSD's old $Id$.
- long lines in recent KSE changes (procfs_ctl.c).
- other style bugs in KSE changes (most related to a shadowed variable
in procfs_status.c -- the td in the outer scope is obfuscated by
PFS_FILL_ARGS).

Approved by: des


90716 16-Feb-2002 bde

Fixed the following style bugs:
- clobbering of jsp's $Id$ by FreeBSD's old $Id$.
- lost Berkeley id in procfs_dbregs.c
- long lines in recent KSE changes.
- various gratuitous differences between procfs_*regs.c.


90715 16-Feb-2002 bde

Fixed missing PHOLD()/PRELE().

Obtained from: procfs_dbregs.c
Approved by: des


90489 10-Feb-2002 phk

Various nit-picking, mostly of style(9) character.

Obtained from: ~bde/sys.dif.gz


90448 10-Feb-2002 rwatson

Part I: Update extended attribute API and ABI:

o Modify the system call syntax for extattr_{get,set}_{fd,file}() so
as not to use the scatter gather API (which appeared not to be used
by any consumers, and be less portable), rather, accepts 'data'
and 'nbytes' in the style of other simple read/write interfaces.
This changes the API and ABI.

o Modify system call semantics so that extattr_get_{fd,file}() return
a size_t. When performing a read, the number of bytes read will
be returned, unless the data pointer is NULL, in which case the
number of bytes of data are returned. This changes the API only.

o Modify the VOP_GETEXTATTR() vnode operation to accept a *size_t
argument so as to return the size, if desirable. If set to NULL,
the size will not be returned.

o Update various filesystems (pseudofs, ufs) to DTRT.

These changes should make extended attributes more useful and more
portable. More commits to rebuild the system call files, as well
as update userland utilities to follow.
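
The "NULL data pointer returns the size" convention described above can be
illustrated with a small user-space sketch. attr_get() and its parameters are
hypothetical and are not the extattr system call interface; the sketch only
models the return-value rule.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical attribute reader.  If data is NULL, return the full size
 * of the stored value without copying; otherwise copy at most nbytes of
 * it into data and return the number of bytes copied.
 */
static size_t
attr_get(const char *value, size_t value_len, void *data, size_t nbytes)
{
	size_t n;

	assert(value != NULL);
	if (data == NULL)
		return (value_len);
	n = nbytes < value_len ? nbytes : value_len;
	memcpy(data, value, n);
	return (n);
}
```

A caller would typically call once with a NULL data pointer to learn the size,
allocate a buffer of that size, and then call again to read the value.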

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


90361 07-Feb-2002 julian

Pre-KSE/M3 commit.
This is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embedded there
but remove all the code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,


90206 04-Feb-2002 rwatson

Change EPERM to EOPNOTSUPP when failing pseudofs_setattr() arbitrarily.

Quoth the alfred: The latter would be better.


90205 04-Feb-2002 rwatson

Return EPERM instead of 0 in the un-implemented pseudofs_setattr().
Conceivably, it should even return EOPNOTSUPP.


89376 14-Jan-2002 alfred

Fix select on fifos.

Backout revision 1.56 and 1.57 of fifo_vnops.c.

Introduce a new poll op "POLLINIGNEOF" that can be used to ignore
EOF on a fifo, POLLIN/POLLRDNORM is converted to POLLINIGNEOF within
the FIFO implementation to effect the correct behavior.

This should allow one to view a fifo pretty much as a data source
rather than worry about connections coming and going.

Reviewed by: bde


89372 14-Jan-2002 semenu

Commit a known fix for hpfs to use vop_defaultop plug instead of wrong
hpfs_bypass() routine.

MFC after: 1 day


89325 14-Jan-2002 alfred

don't initialize the mutex in the temporary struct file, the soo_*
functions just grab f_data and don't muck with anything else so this
should be ok.

this fixes a panic with invariants where it thinks we've doubly initialized
the filetmp mutex even though all we've done is neglect to bzero it.


89319 14-Jan-2002 alfred

Replace ffind_* with fget calls.

Make fget MPsafe.

Make fgetvp and fgetsock use the fget subsystem to reduce code bloat.

Push giant down in fpathconf().


89317 13-Jan-2002 alfred

remove unused socket pointer


89316 13-Jan-2002 alfred

Include sys/_lock.h and sys/_mutex.h to reduce namespace pollution.

Requested by: jhb


89306 13-Jan-2002 alfred

SMP Lock struct file, filedesc and the global file list.

Seigo Tanimura (tanimura) posted the initial delta.

I've polished it quite a bit reducing the need for locking and
adapting it for KSE.

Locks:

1 mutex in each filedesc
protects all the fields.
protects "struct file" initialization, while a struct file
is being changed from &badfileops -> &pipeops or something
the filedesc should be locked.

1 mutex in each struct file
protects the refcount fields.
doesn't protect anything else.
the flags used for garbage collection have been moved to
f_gcflag which was the FILLER short, this doesn't need
locking because the garbage collection is a single threaded
container.
could likely be made to use a pool mutex.

1 sx lock for the global filelist.

struct file * fhold(struct file *fp);
/* increments reference count on a file */

struct file * fhold_locked(struct file *fp);
/* like fhold but expects file to be locked */

struct file * ffind_hold(struct thread *, int fd);
/* finds the struct file in thread, adds one reference and
returns it unlocked */

struct file * ffind_lock(struct thread *, int fd);
/* ffind_hold, but returns file locked */

I still have to smp-safe the fget cruft, I'll get to that asap.


89118 09-Jan-2002 msmith

Add a new sysinit SI_SUB_DEVFS. Devfs hooks into the kernel at SI_ORDER_FIRST,
and devices can be created anytime after that.

Print a warning if an attempt is made to create a device too early.


89107 09-Jan-2002 msmith

Use a sysinit to initialise the devfs hooks in kern_conf.c rather than common
variables.

Reviewed by: phk (in principle)


89090 08-Jan-2002 msmith

Staticise the coda vfsop pointer.


89071 08-Jan-2002 msmith

Staticise pfs_vncache, it's not used anywhere else.

Reviewed by: des


88868 04-Jan-2002 tanimura

Do not dereference null.

Reviewed by: des


88739 31-Dec-2001 rwatson

o Make the credential used by socreate() an explicit argument to
socreate(), rather than getting it implicitly from the thread
argument.

o Make NFS cache the credential provided at mount-time, and use
the cached credential (nfsmount->nm_cred) when making calls to
socreate() on initially connecting, or reconnecting the socket.

This fixes bugs involving NFS over TCP and ipfw uid/gid rules, as well
as bugs involving NFS and mandatory access control implementations.

Reviewed by: freebsd-arch


88318 20-Dec-2001 dillon

Fix a BUF_TIMELOCK race against BUF_LOCK and fix a deadlock in vget()
against VM_WAIT in the pageout code. Both fixes involve adjusting
the lockmgr's timeout capability so locks obtained with timeouts do not
interfere with locks obtained without a timeout.

Hopefully MFC: before the 4.5 release


88279 20-Dec-2001 bp

Previous commit was intended to silence a warning, not to change codepath.


88263 20-Dec-2001 sheldonh

Silence harmless "smbfs_closel: Negative opencount" messages at
unmount time.

Thanks to iedowse for the background information.

Submitted by: bp


88234 19-Dec-2001 dillon

Pseudofs was leaking VFS cache entries badly due to its cache and use of
the wrong VOP descriptor. This misuse caused VFS-cached vnodes to be
re-cached, resulting in the leak. This commit is an interim fix until DES
has a chance to rework the code involved.


87798 13-Dec-2001 sheldonh

Add module dependency on libmchain.

With this change, mounting an smb share (using mount_smb, which is not
yet included in the tree) without any of smbfs, libiconv or libmchain
compiled into the kernel or loaded works.


87725 12-Dec-2001 alfred

Fix select on named pipes without a reader.

PR: kern/19871
MFC after: 1 month


87670 11-Dec-2001 green

Add VOP_GETEXTATTR(9) passthrough support to pseudofs.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


87669 11-Dec-2001 des

Remove an obsolete prototype for procfs_kmemaccess().

Submitted by: rwatson


87599 10-Dec-2001 obrien

Update to C99, s/__FUNCTION__/__func__/,
also don't use ANSI string concatenation.


87542 09-Dec-2001 des

Fix various bugs in the debugging code and reenable it.


87541 09-Dec-2001 des

Fix an incorrect PFS_TRACE. Also, use __func__ instead of __FUNCTION__.


87538 08-Dec-2001 des

Fix a KSEfication brain-o in procfs_doprocfile(): return the path of the target process,
not the calling process. While we're here, also unstaticize procfs_doprocfile() and
procfs_docurproc() so linprocfs can call them directly instead of duplicating them.

Submitted by: Dominic Mitchell <dom@semantico.com>


87321 04-Dec-2001 des

Pseudofsize procfs(5).


87275 03-Dec-2001 rwatson

o Introduce pr_mtx into struct prison, providing protection for the
mutable contents of struct prison (hostname, securelevel, refcount,
pr_linux, ...)
o Generally introduce mtx_lock()/mtx_unlock() calls throughout kern/
so as to enforce these protections, in particular, in kern_mib.c
protecting sysctl access to the hostname and securelevel, as well as
kern_prot.c access to the securelevel for access control purposes.
o Rewrite linux emulator abstractions for accessing per-jail linux
mib entries (osname, osrelease, osversion) so that they don't return
a pointer to the text in the struct linux_prison, rather, a copy
to an array passed into the calls. Likewise, update linprocfs to
use these primitives.
o Update in_pcb.c to always use prison_getip() rather than directly
accessing struct prison.

Reviewed by: jhb


87194 02-Dec-2001 bp

Catch up with KSE changes.

Submitted by: Max Khon <fjoe@iclub.nsu.ru>


87068 28-Nov-2001 jhb

Fix indentation after removing GEMDOS support. Whitespace changes only.


87067 28-Nov-2001 jhb

Use suser_td() instead of explicitly checking cr_uid against 0.

PR: kern/21809
Submitted by: <mbendiks@eunet.no>
Reviewed by: rwatson


87061 28-Nov-2001 jhb

Axe more unused GEMDOS code that was #ifdef atari.

PR: kern/21809
Submitted by: <mbendiks@eunet.no>


87007 27-Nov-2001 jhb

Remove GEMDOS support from msdosfs. I don't think anyone is going to
port FreeBSD to Atari machines any time soon.


86969 27-Nov-2001 des

Add support for a last-close handler.
Revert the module version bumps; they're quite pointless as long as the
only pseudofs consumer is linprocfs, which is in the tree.


86941 27-Nov-2001 ken

Fix mounting root from a ISO9660 filesystem on a SCSI CDROM.

The problem was that the ISO9660 code wasn't opening the device prior to
issuing ioctl calls. In particular, the device must be open before
iso_get_ssector() is called in iso_mountroot().

If the device isn't opened first, the disk layer blows up due to an
uninitialized variable.

The solution was to open the device, call iso_get_ssector() and then close
it again.

The ATAPI CDROM driver doesn't have this problem because it doesn't use the
disk layer, and evidently doesn't mind if someone issues an ioctl without
first issuing an open call.

Thanks to phk for pointing me at the source of this problem.

Tested by: dirk
MFC after: 1 week


86931 27-Nov-2001 jhb

Replace 'p' with 'td' as appropriate.


86930 27-Nov-2001 jhb

GC compat macros HASHINIT, VOP__LOCK, VOP__UNLOCK, VGET, and VN_LOCK.


86929 27-Nov-2001 jhb

Expand LOCKMGR() compat macro.


86928 26-Nov-2001 jhb

GC some KSE compatibility macros that were somehow still here.


86927 26-Nov-2001 jhb

GC non-FreeBSD code that didn't work anyways.


86892 25-Nov-2001 dd

Address two minor issues: implement the _PC_NAME_MAX and _PC_PATH_MAX
pathconf() variables for directories, and set st_size and st_blocks
(of struct stat) for directories as appropriate. Note that st_size is
always set to DEV_BSIZE, since the size of the directories is not
currently kept.

Reviewed by: phk, bde


86872 24-Nov-2001 dillon

convert holdsock() to fget(). Add XXX reminder for future socket locking.


86481 17-Nov-2001 peter

Missing KSE s/curproc/curthread/


86185 08-Nov-2001 alfred

Switch behavior of fifos to more closely match what goes on in other OSes.
Basically FIFOs become a real pain to abuse as a rendezvous point without
this change because you can't really select(2) on them because they always
return ready even though there is no writer (to signal EOF).

Obtained from: BSD/os


86165 07-Nov-2001 peter

Fix printf format bugs introduced in rev 1.34 for printing times.
quad_t cannot be printed with %lld on 64 bit systems.

Dont waste cpu to round user and system times up to long long, it is
highly improbable that a process will have accumulated 68 years of
user or system cpu time (not wall clock time) before a reboot or
process restart.
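
The portability problem noted in this entry is that `%lld` expects a `long long`, while a 64-bit typedef may be a plain `long` on 64-bit platforms. One common way to sidestep the mismatch, shown here as a sketch and not necessarily the fix used in this commit, is to cast to `intmax_t` and print with the C99 `%jd` conversion. `format_ticks()` is a hypothetical helper.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit count without assuming which basic type backs it.
 * The cast to intmax_t matches the %jd conversion on every platform.
 * Returns the number of characters snprintf() would have written.
 */
int
format_ticks(char *buf, size_t len, int64_t ticks)
{
	return (snprintf(buf, len, "%jd", (intmax_t)ticks));
}
```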


86136 06-Nov-2001 green

Correctly unlock the target process if /proc/$foo/mem is open()ed by
another process which cannot p_candebug() it. The bug was introduced
in rev. 1.100.

Approved by: des


86056 04-Nov-2001 dillon

Fix the fix. BIO_ERROR must be set in b_ioflags, not b_flags


86040 04-Nov-2001 phk

Fix "echo > /dev/null" for non-root users which broke in previous commit.


86037 04-Nov-2001 dillon

Add mnt_reservedvnlist so we can MFC to 4.x, in order to make all mount
structure changes now rather than piecemeal later on. mnt_nvnodelist
currently holds all the vnodes under the mount point. This will eventually
be split into a 'dirty' and 'clean' list. This way we only break kld's once
rather than twice. nvnodelist will eventually turn into the dirty list
and should remain compatible with the klds.


86009 04-Nov-2001 phk

B_ERROR is BIO_ERROR on -current.

Now it compiles, I don't know if it works.


86003 04-Nov-2001 dillon

Fix a bug in CD9660 when vmiodirenable is turned on. CD9660 was assuming
that a buffer's b_blkno would be valid. This is true when vmiodirenable
is turned off because the B_MALLOC'd buffer's data is invalidated when
the buffer is destroyed. But when vmiodirenable is turned on a buffer
can be reconstituted from its VMIO backing store. The reconstituted buffer
will have no knowledge of the physical block translation and the result is
serious directory corruption of the CDROM.

The solution is to fix cd9660_blkatoff() to always BMAP the buffer if
b_lblkno == b_blkno.

MFC after: 0 days


85980 03-Nov-2001 phk

Use vfs_timestamp() instead of getnanotime().

Add magic stuff copied from ufs_setattr().

Instructed by: bde


85979 03-Nov-2001 phk

Use vfs_timestamp() instead of getnanotime() directly.
Fix some modes on directories and symlinks.

Instructed by: bde


85940 03-Nov-2001 des

Reduce the number of #include dependencies by declaring some of the structs
used in pseudofs.h as opaque structs.


85644 28-Oct-2001 dillon

Adjust printfs to be time_t agnostic.


85561 26-Oct-2001 des

Add VOP_IOCTL support, and fix a bug that would cause a panic if a file or
symlink lacked a filler function.


85339 23-Oct-2001 dillon

Change the vnode list under the mount point from a LIST to a TAILQ
in preparation for an implementation of limiting code for kern.maxvnodes.

MFC after: 3 days


85320 22-Oct-2001 des

No, you may not /* FALLTHROUGH */. Not only will you return an incorrect
result, but you'd corrupt the kernel malloc() arena if it weren't for a
small but life-saving optimization in ioctl().

MFC after: 1 week


85297 21-Oct-2001 des

Move procfs_* from procfs_machdep.c into sys_process.c, and rename them to
proc_* in the process; procfs_machdep.c is no longer needed.

Run-tested on i386, build-tested on Alpha, untested on other platforms.


85208 20-Oct-2001 jhb

Assert that a ucred is unshared before we remap its ids.


85180 19-Oct-2001 des

Argh! I updated the version number in the MODULE_DEPEND() thingamagook but
not in the actual MODULE_VERSION(). Pass me the pointy hat.


85128 19-Oct-2001 des

Switch to dynamic rather than static initialization.
This makes it possible (in theory) for nodes to be added and / or removed
from pseudofs filesystems at runtime.


84874 13-Oct-2001 bde

Fixed bitrot in a banal comment by removing the comment.


84873 13-Oct-2001 bde

Backed out vestiges of the quick fixes for the transient breakage of
<sys/mount.h> in rev.1.106 of the latter (don't include <sys/socket.h>
just to work around bugs in <sys/mount.h>).


84827 11-Oct-2001 jhb

Change the kernel's ucred API as follows:
- crhold() returns a reference to the ucred whose refcount it bumps.
- crcopy() now simply copies the credentials from one credential to
another and has no return value.
- a new crshared() primitive is added which returns true if a ucred's
refcount is > 1 and false (0) otherwise.
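
The three primitives listed in this entry can be illustrated with a small user-space model. This is only a sketch of the described contract: `struct cred` and the `cred_*()` functions are hypothetical stand-ins, not the kernel's `struct ucred` API, and no locking is shown.

```c
#include <stdlib.h>

/* Hypothetical reference-counted credential; not the kernel's struct ucred. */
struct cred {
	unsigned refs;
	unsigned uid;
};

/* Allocate a credential holding a single reference (NULL on failure). */
struct cred *
cred_new(unsigned uid)
{
	struct cred *c = malloc(sizeof(*c));

	if (c != NULL) {
		c->refs = 1;
		c->uid = uid;
	}
	return (c);
}

/* Like the described crhold(): bump the refcount and return the reference. */
struct cred *
cred_hold(struct cred *c)
{
	c->refs++;
	return (c);
}

/* Like the described crshared(): true if more than one reference exists. */
int
cred_shared(const struct cred *c)
{
	return (c->refs > 1);
}

/* Like the described crcopy(): copy the contents only, return nothing. */
void
cred_copy(struct cred *dst, const struct cred *src)
{
	dst->uid = src->uid;
}

/* Drop one reference and free the credential when the last one goes away. */
void
cred_free(struct cred *c)
{
	if (--c->refs == 0)
		free(c);
}
```

Returning the reference from the hold function lets a caller write `new->owner = cred_hold(old)` in a single expression.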


84811 11-Oct-2001 jhb

Add missing includes of sys/lock.h.


84637 07-Oct-2001 des

Dissociate ptrace from procfs.

Until now, the ptrace syscall was implemented as a wrapper that called
various functions in procfs depending on which ptrace operation was
requested. Most of these functions were themselves wrappers around
procfs_{read,write}_{,db,fp}regs(), with only some extra error checks,
which weren't necessary in the ptrace case anyway.

This commit moves procfs_rwmem() from procfs_mem.c into sys_process.c
(renaming it to proc_rwmem() in the process), and implements ptrace()
directly in terms of procfs_{read,write}_{,db,fp}regs() instead of
having it fake up a struct uio and then call procfs_do{,db,fp}regs().

It also moves the prototypes for procfs_{read,write}_{,db,fp}regs()
and proc_rwmem() from proc.h to ptrace.h, and marks all procfs files
except procfs_machdep.c as "optional procfs" instead of "standard".


84634 07-Oct-2001 des

Remove some useless preprocessor paranoia.


84633 07-Oct-2001 des

In procfs_readdir(), when the directory being read was a process directory,
the target process was being held locked during the uiomove() call. If the
process calling readdir() was the same as the target process (for instance
'ls /proc/curproc/'), and uiomove() caused a page fault, the result would
be a proc lock recursion. I have no idea how long this has been broken -
possibly ever since pfind() was changed to lock the process it returns.

Also replace the one and only call to procfs_findtextvp() with a direct
test of td->td_proc->p_textvp.


84386 02-Oct-2001 des

Add a PFS_DISABLED flag; pfs_visible() automatically returns 0 if it is set
on the node in question. Also add two API functions for setting and clearing
this flag; setting it also reclaims all vnodes associated with the node.


84383 02-Oct-2001 des

Only print "XXX (un)registered" message if bootverbose.


84247 01-Oct-2001 des

[the previous commit to pseudofs_vncache.c got the wrong log message]

YA pseudofs megacommit, part 2:

- Merge the pfs_vnode and pfs_vdata structures, and make the vnode cache
a doubly-linked list. This eliminates the need to walk the list in
pfs_vncache_free().

- Add an exit callout which revokes vnodes associated with the process
that just exited. Since it needs to lock the cache when it does this,
pfs_vncache_mutex needs MTX_RECURSE.


84246 01-Oct-2001 des

YA pseudofs megacommit, part 1:

- Add a third callback to the pfs_node structure. This one simply returns
non-zero if the specified requesting process is allowed to access the
specified node for the specified target process. This is used in
addition to the usual permission checks, e.g. when certain files don't
make sense for certain (system) processes.

- Make sure that pfs_lookup() and pfs_readdir() don't yap about files
which aren't pfs_visible(). Also check pfs_visible() before performing
reads and writes, to prevent the kind of races reported in SA-00:77 and
SA-01:55 (fork a child, open /proc/child/ctl, have that child fork a
setuid binary, and assume control of it).

- Add some more trace points.


84187 30-Sep-2001 des

pseudofs.h:

- Rearrange the flag constants a little to simplify specifying and testing
for readability and writeability.

pseudofs_vnops.c:

- Track the aforementioned change.

- Add checks to pfs_open() to prevent opening read-only files for writing
or vice versa (pfs_{read,write} would block the actual reads and writes,
but it's still a bug to allow the open() to succeed). Also, return
EOPNOTSUPP if the caller attempts to lock the file.

- Add more trace points.


84156 30-Sep-2001 phk

The behaviour of whiteout'ing symlinks was too confusing, instead
remove them when asked to.


84098 29-Sep-2001 des

Pseudofs take 2:

- Remove hardcoded uid, gid, mode from struct pfs_node; make pfs_getattr()
smart enough to get it right most of the time, and allow for callbacks
to handle the remaining cases. Rework the definition macros to match.

- Add lots of (conditional) debugging output.

- Fix a long-standing bug inherited from procfs: don't pretend to be a
read-only file system. Instead, return EOPNOTSUPP for operations we
truly can't support and allow others to fail silently. In particular,
pfs_lookup() now treats CREATE as LOOKUP. This may need more work.

- In pfs_lookup(), if the parent node is process-dependent, check that
the process in question still exists.

- Implement pfs_open() - its only current function is to check that the
process opening the file can see the process it belongs to.

- Finish adding support for writeable nodes.

- Bump module version number.

- Introduce lots of new bugs.


84082 28-Sep-2001 des

The previous commit introduced some references to "curproc" which should have
been references to "curthread". Correct this.


83978 26-Sep-2001 rwatson

o Modify generic specfs device open access control checks to use
securelevel_ge() instead of direct securelevel variable checks.

Obtained from: TrustedBSD Project


83949 26-Sep-2001 fenner

Fix (typo? pasteo?): panic("ffs_mountroot..." -> panic("ntfs_mountroot...")


83927 25-Sep-2001 des

Clean up my source tree to avoid getting hit too badly by the next KSE or
whatever mega-commit. This goes some way towards adding support for
writeable files (needed by procfs).


83920 25-Sep-2001 mike

A process name may contain whitespace and unprintable characters,
so convert those characters to octal notation. Also convert
backslashes to octal notation to avoid confusion.

Reviewed by: des
MFC after: 1 week
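
The escaping rule described in this entry can be sketched in user space as follows. This is an illustration of the idea, not the code from the commit; `escape_name()` and its return convention are assumptions made for the example.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Copy src to dst, writing whitespace, unprintable characters and
 * backslashes as \ooo octal escapes.  Returns the length of the result,
 * or -1 if dst (of size dstlen) is too small.
 */
int
escape_name(const char *src, char *dst, size_t dstlen)
{
	size_t used = 0;

	if (dstlen == 0)
		return (-1);
	for (; *src != '\0'; src++) {
		unsigned char c = (unsigned char)*src;
		/* isgraph() is false for spaces and control characters. */
		int plain = isgraph(c) && c != '\\';
		size_t need = plain ? 1 : 4;

		/* Keep room for this character plus the terminating NUL. */
		if (used + need + 1 > dstlen)
			return (-1);
		if (plain)
			dst[used] = (char)c;
		else
			snprintf(dst + used, 5, "\\%03o", (unsigned)c);
		used += need;
	}
	dst[used] = '\0';
	return ((int)used);
}
```

Escaping the backslash itself keeps the output unambiguous: a literal backslash in a name can never be mistaken for the start of an escape.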


83804 21-Sep-2001 jhb

Use the passed in thread to selrecord() instead of curthread.


83635 18-Sep-2001 rwatson

o Remove redundant securelevel/pid1 check in procfs_rw() -- this
protection is enforced at the individual method layer using
p_candebug().

Obtained from: TrustedBSD Project


83417 13-Sep-2001 julian

fix typo
pointed out by: jhb


83384 12-Sep-2001 jhb

Restore these files to being portable:
- Use some simple #define's at the top of the files for proc -> thread
changes instead of having lots of needless #ifdef's in the code.
- Don't try to use struct thread in !FreeBSD code.
- Don't use a few struct lwp's in some of the NetBSD code since it isn't
in their HEAD.
The new diff relative to before KSE is now significantly smaller and easier
to maintain.


83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


83291 10-Sep-2001 kris

Fix some signed/unsigned integer confusion, and add bounds checking of
arguments to some functions.

Obtained from: NetBSD
Reviewed by: peter
MFC after: 2 weeks


83229 08-Sep-2001 semenu

Stole unicode translation table from mount_msdos. Add kernel code
to support this translation.

MFC after: 2 weeks


83227 08-Sep-2001 semenu

Fix opening particular file's attributes (as described in man page).
This is useful for debug purposes.

MFC after: 2 weeks


83226 08-Sep-2001 semenu

Reference devvp on ntnode creation and dereference on removal. Previous
code led to page faults because i_devvp went zero after VOP_RECLAIM, but
ntnode was reused (not reclaimed).

MFC after: 2 weeks


83225 08-Sep-2001 semenu

Fix errors and warnings when compiling with NTFS_DEBUG > 1

MFC after: 2 weeks


82517 29-Aug-2001 ache

smbfs_advlock: simplify overflow checks (copy from kern_lockf.c)
minor formatting issues to minimize differences


82347 26-Aug-2001 ache

Cosmetique & style fixes from bde


82270 24-Aug-2001 ache

Copy from kern_lockf.c: remove extra check


82210 23-Aug-2001 ache

Copy yet one check for SEEK_END overflow


82203 23-Aug-2001 ache

Copy my newly introduced l_len<0 'oops' fix from kern_lockf.c


82201 23-Aug-2001 ache

Copy POSIX l_len<0 handling from kern_lockf.c
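
For reference, POSIX specifies that a negative `l_len` locks the bytes from `l_start + l_len` through `l_start - 1`. The sketch below shows that range computation in isolation. It is not the kern_lockf.c code; `lock_range()` is a hypothetical helper, and using -1 as the "to end of file" marker is a convention of this example.

```c
#include <stdint.h>

/*
 * Resolve an absolute start offset and a length to an inclusive byte
 * range, following the POSIX fcntl() rules:
 *   len > 0:  [start, start + len - 1]
 *   len < 0:  [start + len, start - 1]
 *   len == 0: from start to the end of the file (*last is set to -1)
 * Returns 0 on success, -1 if the range would be negative or overflow.
 */
int
lock_range(int64_t start, int64_t len, int64_t *first, int64_t *last)
{
	if (start < 0)
		return (-1);
	if (len < 0) {
		/* start >= 0 and len < 0, so the sum cannot overflow. */
		if (start + len < 0)
			return (-1);
		*first = start + len;
		*last = start - 1;
	} else if (len == 0) {
		*first = start;
		*last = -1;
	} else {
		/* Reject ranges whose last byte would exceed INT64_MAX. */
		if (len - 1 > INT64_MAX - start)
			return (-1);
		*first = start;
		*last = start + len - 1;
	}
	return (0);
}
```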


82196 23-Aug-2001 ache

Cosmetique: correct English in comments
non-cosmetique: add missing break; - original code was broken here


82190 23-Aug-2001 ache

Move <machine/*> after <sys/*>

Pointed by: bde


82175 23-Aug-2001 ache

adv. lock:
copy EOVERFLOW handling code from main variant
fix type of 'size' arg


82039 21-Aug-2001 bp

Use proper endian conversion.

Obtained from: Mac OS X
MFC after: 1 week


82038 21-Aug-2001 bp

Return proper length of _PC_NAME_MAX value if long names support is enabled.

Obtained from: Mac OS X
MFC after: 1 week


81620 14-Aug-2001 phk

linux ls fails on DEVFS /dev because linux_getdents fails because
linux_getdents uses VOP_READDIR( ..., &ncookies, &cookies ) instead of
VOP_READDIR( ..., NULL, NULL ) because it seems to need the offsets for
linux_dirent and sizeof(dirent) != sizeof(linux_dirent)...

PR: 29467
Submitted by: Michael Reifenberger <root@nihil.plaut.de>
Reviewed by: phk


81112 03-Aug-2001 rwatson

Remove dangling prototype for the now defunct procfs_kmemaccess()
call.

Obtained from: TrustedBSD Project


81109 03-Aug-2001 rwatson

Collapse a Pmem case in with the other debugging files case for procfs,
as there are now no "unusual" protection properties to Pmem that differ
from the other files. While I'm at it, introduce proc locking for
the other files, which was previously present only in the Pmem case.

Obtained from: TrustedBSD Project


81108 03-Aug-2001 rwatson

Remove read permission for group on the /proc/*/mem file, since kmem
no longer requires access.

Reviewed by: tmm
Obtained from: TrustedBSD Project


81107 03-Aug-2001 rwatson

Prior to support for almost all ps activity via sysctl, ps used procfs,
and so special-casing was introduced to provide extra procfs privilege
to the kmem group. With the advent of non-setgid kmem ps, this code
is no longer required, and in fact, is potentially harmful as it
allocates privilege to a gid that is increasingly less meaningful.
Knowledge of specific gid's in kernel is also generally bad precedent,
as the kernel security policy doesn't distinguish gid's specifically,
only uid 0.

This commit removes reference to kmem in procfs, both in terms of
access control decisions, and the applying of gid kmem to the
/proc/*/mem file, simplifying the associated code considerably.
Processes are still permitted to access the mem file based on
the debugging policy, so ps -e still works fine for normal
processes and use.

Reviewed by: tmm
Obtained from: TrustedBSD Project


79996 19-Jul-2001 assar

remove support for creating files and directories from msdosfs_mknod


79872 18-Jul-2001 jhb

Grab the process lock around psignal().

Noticed by: tanimura


79335 05-Jul-2001 rwatson

o Replace calls to p_can(..., P_CAN_xxx) with calls to p_canxxx().
The p_can(...) construct was a premature (and, it turns out,
awkward) abstraction. The individual calls to p_canxxx() better
reflect differences between the inter-process authorization checks,
such as differing checks based on the type of signal. This has
a side effect of improving code readability.
o Replace direct credential authorization checks in ktrace() with
invocation of p_candebug(), while maintaining the special case
check of KTR_ROOT. This allows ktrace() to "play more nicely"
with new mandatory access control schemes, as well as making its
authorization checks consistent with other "debugging class"
checks.
o Eliminate "privused" construct for p_can*() calls which allowed the
caller to determine if privilege was required for successful
evaluation of the access control check. This primitive is currently
unused, and as such, serves only to complicate the API.

Approved by: ({procfs,linprocfs} changes) des
Obtained from: TrustedBSD Project


79245 04-Jul-2001 jhb

- Update the vmmeter statistics for vnode pageins and pageouts in
getpages/putpages.
- Use vm_page_undirty() instead of messing with pages' dirty fields
directly.


79224 04-Jul-2001 dillon

With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage). Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.


78907 28-Jun-2001 jhb

Fix a mntvnode and vnode interlock reversal.


78906 28-Jun-2001 jhb

Protect the mnt_vnode list with the mntvnode lock.


78274 15-Jun-2001 des

#if 0 out pfs_null() to silence the warning about it not being referenced.


78244 15-Jun-2001 peter

Fix warning: 568: warning: `portal_badop' defined but not used


78242 15-Jun-2001 peter

Fix warning (exposed NetBSD code):
94: warning: `ntfs_bmap' declared `static' but never defined


78241 15-Jun-2001 peter

Fix warnings (mostly harmless, due to struct bio being embedded in buf):
738: warning: passing arg 1 of `biodone' from incompatible pointer type
745: warning: passing arg 1 of `biodone' from incompatible pointer type


78240 15-Jun-2001 peter

Fix warning: 552: warning: `fdesc_badop' defined but not used


78229 15-Jun-2001 peter

Warning fix: coda_fbsd.c:113: warning: unused variable `ret'


78205 14-Jun-2001 bp

Coda does not call vop_defaultop(), so add necessary calls for VM objects.

Submitted by: Greg Troxel <gdt@ir.bbn.com>
MFC after: 2 days


78179 13-Jun-2001 mjacob

the last argument to copyinstr is of type size_t, not u_int


78161 13-Jun-2001 peter

With this commit, I hereby pronounce gensetdefs past its use-by date.

Replace the a.out emulation of 'struct linker_set' with something
a little more flexible. <sys/linker_set.h> now provides macros for
accessing elements and completely hides the implementation.

The linker_set.h macros have been on the back burner in various
forms since 1998 and have ideas and code from Mike Smith (SET_FOREACH()),
John Polstra (ELF clue) and myself (cleaned up API and the conversion
of the rest of the kernel to use it).

The macros declare a strongly typed set. They return elements with the
type that you declare the set with, rather than a generic void *.

For ELF, we use the magic ld symbols (__start_<setname> and
__stop_<setname>). Thanks to Richard Henderson <rth@redhat.com> for the
trick about how to force ld to provide them for kld's.

For a.out, we use the old linker_set struct.

NOTE: the item lists are no longer null terminated. This is why
the code impact is high in certain areas.

The runtime linker has a new method to find the linker set
boundaries depending on which backend format is in use.

linker sets are still module/kld unfriendly and should never be used
for anything that may be modular one day.

Reviewed by: eivind


78073 11-Jun-2001 des

For some reason, though the module builds just fine without <sys/lock.h>,
LINT fails to build without it.


78018 10-Jun-2001 des

Bail out if the fill function failed.


78017 10-Jun-2001 des

Whoops, some of my test code snuck in here.


78003 10-Jun-2001 des

Argh. Fix braino in previous commit.


78001 10-Jun-2001 des

Add a 'flags' argument to the PFS_PROCDIR macro.


77998 10-Jun-2001 des

Add support for process-dependent directories. This means that save for
the lack of a man page, pseudofs is mostly complete now.


77967 10-Jun-2001 des

Blah, not my day. This file needs <sys/mutex.h> now.


77966 10-Jun-2001 des

Remember to unlock the process pfind() returns.


77965 10-Jun-2001 des

Add missing #include of <sys/mutex.h>.


77964 10-Jun-2001 des

Catch up with the change in sbuf_new's prototype.


77821 06-Jun-2001 jlemon

The kq write filter was hooked up to the wrong socket, and thus was
not behaving correctly. Fix by attaching to the correct socket.

Also call so{rw}wakeup in addition to the fifo wakeup, so that any
kqfilters attached to the socket buffer get poked.


77799 06-Jun-2001 tanimura

Lock VM Giant prior to locking a vm map.

Spotted by: Daniel Rock <D.Rock@t-online.de>
Tested by: David Wolfskill <david@catwhisker.org>,
Sean Eric Fagan <sef@kithrup.com>


77784 05-Jun-2001 shafeeq

Now works again and as a module and with devfs.
Used the bpf & tun drivers as examples as to what is necessary for devfs.


77589 01-Jun-2001 brian

Support /dev/tun cloning. Ansify if_tun.c while I'm there.

Only tun0 -> tun32767 may now be opened as struct ifnet's if_unit
is a short.

It's now possible to open /dev/tun and get a handle back for an available
tun device (use devname to find out what you got).

The implementation uses rman by popular demand (and against my judgement)
to track opened devices and uses the new dev_depends() to ensure that
all make_dev()d devices go away before the module is unloaded.

Reviewed by: phk


77577 01-Jun-2001 ru

- VFS_SET(msdos) -> VFS_SET(msdosfs)
- msdos.ko -> msdosfs.ko
- mount_msdos(8) -> mount_msdosfs(8)
- "msdos" -> "msdosfs" compatibility glue in mount(8)


77243 26-May-2001 phk

Don't copy the trailing zero in readlink, it confuses namei().

PR: 27656


77223 26-May-2001 ru

- sys/n[tw]fs moved to sys/fs/n[tw]fs
- /usr/include/n[tw]fs moved to /usr/include/fs/n[tw]fs


77215 26-May-2001 phk

Create a general facility for making dev_t's depend on another
dev_t. The dev_depends(dev_t, dev_t) function is for tying them
to each other.

When destroy_dev() is called on a dev_t, all dev_t's depending
on it will also be destroyed (depth first order).

Rewrite the make_dev_alias() to use this dependency facility.

kern/subr_disk.c:
Make the disk mini-layer use dependencies to make sure all
relevant dev_t's are removed when the disk disappears.

Make the disk mini-layer precreate some magic sub devices
which the disk/slice/label code expects to be there.

kern/subr_disklabel.c:
Remove some now unneeded variables.

kern/subr_diskmbr.c:
Remove some ancient, commented out code.

kern/subr_diskslice.c:
Minor cleanup. Use name from dev_t instead of dsname()


77183 25-May-2001 rwatson

o Merge contents of struct pcred into struct ucred. Specifically, add the
real uid, saved uid, real gid, and saved gid to ucred, as well as the
pcred->pc_uidinfo, which was associated with the real uid, only rename
it to cr_ruidinfo so as not to conflict with cr_uidinfo, which
corresponds to the effective uid.
o Remove p_cred from struct proc; add p_ucred to struct proc, replacing
original macro that pointed p->p_ucred to p->p_cred->pc_ucred.
o Universally update code so that it makes use of ucred instead of pcred,
p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo,
cr_{r,sv}{u,g}id instead of p_*, etc.
o Remove pcred0 and its initialization from init_main.c; initialize
cr_ruidinfo there.
o Restructure many credential modification chunks to always crdup while
we figure out locking and optimizations; generally speaking, this
means moving to a structure like this:
newcred = crdup(oldcred);
...
p->p_ucred = newcred;
crfree(oldcred);
It's not race-free, but better than nothing. There are also races
in sys_process.c, all inter-process authorization, fork, exec, and
exit.
o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid;
remove comments indicating that the old arrangement was a problem.
o Restructure exec1() a little to use newcred/oldcred arrangement, and
use improved uid management primitives.
o Clean up exit1() so as to do less work in credential cleanup due to
pcred removal.
o Clean up fork1() so as to do less work in credential cleanup and
allocation.
o Clean up ktrcanset() to take into account changes, and move to using
suser_xxx() instead of performing a direct uid==0 comparison.
o Improve commenting in various kern_prot.c credential modification
calls to better document current behavior. In a couple of places,
current behavior is a little questionable and we need to check
POSIX.1 to make sure it's "right". More commenting work still
remains to be done.
o Update credential management calls, such as crfree(), to take into
account new ruidinfo reference.
o Modify or add the following uid and gid helper routines:
change_euid()
change_egid()
change_ruid()
change_rgid()
change_svuid()
change_svgid()
In each case, the call now acts on a credential not a process, and as
such no longer requires more complicated process locking/etc. They
now assume the caller will do any necessary allocation of an
exclusive credential reference. Each is commented to document its
reference requirements.
o CANSIGIO() is simplified to require only credentials, not processes
and pcreds.
o Remove lots of (p_pcred==NULL) checks.
o Add an XXX to authorization code in nfs_lock.c, since it's
questionable, and needs to be considered carefully.
o Simplify posix4 authorization code to require only credentials, not
processes and pcreds. Note that this authorization, as well as
CANSIGIO(), needs to be updated to use the p_cansignal() and
p_cansched() centralized authorization routines, as they currently
do not take into account some desirable restrictions that are handled
by the centralized routines, as well as being inconsistent with other
similar authorization instances.
o Update libkvm to take these changes into account.

Obtained from: TrustedBSD Project
Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit
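
The newcred/oldcred sequence in the message can be illustrated with a small userland model. This is a sketch under stated assumptions: struct model_cred and the model_* functions are hypothetical simplifications (no locking, only three uid fields) and are not the kernel's ucred API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified credential; the real struct ucred has more fields. */
struct model_cred {
	int refs;
	int euid, ruid, svuid;
};

struct model_proc {
	struct model_cred *cred;
};

/* Take an additional reference on a shared credential. */
struct model_cred *
model_crhold(struct model_cred *cr)
{
	cr->refs++;
	return (cr);
}

/* Drop a reference, freeing the credential when the last one goes. */
void
model_crfree(struct model_cred *cr)
{
	assert(cr->refs > 0);
	if (--cr->refs == 0)
		free(cr);
}

/* Return a private copy holding exactly one reference. */
struct model_cred *
model_crdup(const struct model_cred *old)
{
	struct model_cred *cr = malloc(sizeof(*cr));

	if (cr == NULL)
		abort();
	*cr = *old;
	cr->refs = 1;
	return (cr);
}

/* Acts on a credential, not a process; the caller must own it exclusively. */
void
model_change_euid(struct model_cred *cr, int euid)
{
	assert(cr->refs == 1);
	cr->euid = euid;
}

/* The duplicate, modify, swap, free sequence from the commit message. */
void
model_seteuid(struct model_proc *p, int euid)
{
	struct model_cred *oldcred = p->cred;
	struct model_cred *newcred = model_crdup(oldcred);

	model_change_euid(newcred, euid);
	p->cred = newcred;
	model_crfree(oldcred);
}
```

Because the process always swaps in a fresh copy, any other holder of the old credential keeps seeing the old uid values. The model has no locking, so it shares the race the message admits to.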


77162 25-May-2001 ru

- sys/msdosfs moved to sys/fs/msdosfs
- msdos.ko renamed to msdosfs.ko
- /usr/include/msdosfs moved to /usr/include/fs/msdosfs


77133 24-May-2001 ru

Actually rename FDESC, PORTAL, UMAP and UNION file systems.

OK'ed by: bp


77131 24-May-2001 ru

mount_umap(8) -> mount_umapfs(8).


77130 24-May-2001 ru

mount_null(8) -> mount_nullfs(8).


77084 23-May-2001 jhb

Don't acquire/release Giant around some of the places that need it in
spec_getpages(). Instead, assert that Giant is held by the caller.


77050 23-May-2001 phk

Change the way deletes are managed in DEVFS.

This fixes a number of warnings relating to removed cloned devices.

It also makes it possible to recreate deleted devices with
mknod(2). The major/minor arguments are ignored.


77031 23-May-2001 ru

- FDESC, FIFO, NULL, PORTAL, PROC, UMAP and UNION file
systems were repo-copied from sys/miscfs to sys/fs.

- Renamed the following file systems and their modules:
fdesc -> fdescfs, portal -> portalfs, union -> unionfs.

- Renamed corresponding kernel options:
FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS.

- Install header files for the above file systems.

- Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland
Makefiles.


76945 21-May-2001 jhb

Sort includes from previous commit.


76827 19-May-2001 alfred

Introduce a global lock for the vm subsystem (vm_mtx).

vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb


76797 18-May-2001 bp

Currently there is no way to tell if write operation invoked via
vn_start_write() on the given vnode will be successful. VOP_LEASE() may
help to solve this problem, but its return value is ignored nearly everywhere.
For now just assume that the missing upper layer on write means insufficient
access rights (which is correct for most cases).


76718 17-May-2001 bp

VOP getwritemount() can be invoked on vnodes with VFREE flag set (used in
snapshots code). At this point upper vp may not exist.


76716 17-May-2001 bp

Use vop_*vobject() VOPs to get reference to VM object from upper or lower fs.


76715 17-May-2001 bp

Do not leave an extra reference on vnode.

PR: kern/27250
Submitted by: "Vladimir B. Grebenschikov" <vova@express.ru>
MFC after: 2 weeks


76688 16-May-2001 iedowse

Change the second argument of vflush() to an integer that specifies
the number of references on the filesystem root vnode to be both
expected and released. Many filesystems hold an extra reference on
the filesystem root vnode, which must be accounted for when
determining if the filesystem is busy and then released if it isn't
busy. The old `skipvp' approach required individual filesystem
xxx_unmount functions to re-implement much of vflush()'s logic to
deal with the root vnode.

All 9 filesystems that hold an extra reference on the root vnode
got the logic wrong in the case of forced unmounts, so `umount -f'
would always fail if there were any extra root vnode references.
Fix this issue centrally in vflush(), now that we can.

This commit also fixes a vnode reference leak in devfs, which could
result in idle devfs filesystems that refuse to unmount.

Reviewed by: phk, bp
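
The effect of the new integer argument can be sketched with a userland model. Everything here is hypothetical (struct model_mount, model_vflush, the two counters); it only illustrates the idea that the filesystem's own root references are expected during the busy check and released on success, and it models a forced unmount as simply discarding all other users.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of a mount: only the counters this sketch needs. */
struct model_mount {
	int root_usecount;	/* references on the root vnode */
	int other_busy;		/* in-use vnodes other than the root */
};

/*
 * `rootrefs` is the number of root-vnode references the filesystem itself
 * holds. They are expected during the busy check and released on success.
 * Returns 0 on success or EBUSY if the filesystem is still in use.
 */
int
model_vflush(struct model_mount *mp, int rootrefs, int force)
{
	assert(rootrefs >= 0 && mp->root_usecount >= rootrefs);
	if (!force &&
	    (mp->other_busy > 0 || mp->root_usecount > rootrefs))
		return (EBUSY);
	if (force) {
		/* Model forcible revocation of every remaining user. */
		mp->other_busy = 0;
		mp->root_usecount = rootrefs;
	}
	mp->root_usecount -= rootrefs;	/* release the filesystem's own refs */
	return (0);
}
```

In this model a filesystem holding one extra root reference passes rootrefs = 1: an idle mount flushes cleanly, a mount with a second root user returns EBUSY, and the forced case succeeds without each filesystem repeating the logic.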


76571 14-May-2001 phk

After a successful poll of the cloning functions, match on the
returned dev_t rather than the original name.

This allows cloning from one name to another which is useful for
/dev/tty and later for the pty's.


76554 13-May-2001 phk

Convert DEVFS from an "opt-in" to an "opt-out" option.

If for some reason DEVFS is undesired, the "NODEVFS" option is
needed now.

Pending any significant issues, DEVFS will be made mandatory in
-current on July 1st so that we can start reaping the full
benefits of having it.


76491 11-May-2001 jhb

GC prototype for procfs_bmap() missed during a previous commit.


76320 06-May-2001 phk

Remove unneeded devfs_badop()

Noticed by: rwatson


76236 03-May-2001 bp

Convert vnode_pager_freepage() to vm_free_page().

Forgotten by: alfred


76167 01-May-2001 phk

Implement vop_std{get|put}pages() and add them to the default vop[].

Un-copy&paste all the VOP_{GET|PUT}PAGES() functions which do nothing but
the default.


76166 01-May-2001 markm

Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h from kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by: bde (with reservations)


76160 30-Apr-2001 phk

Uncut&paste some bogus use of VOP_BMAP in cd9660::VOP_STRATEGY.
XXX mark some stuff which looks like further cut&paste junk.


76159 30-Apr-2001 phk

Uncut&paste some bogus use of VOP_BMAP in hpfs::VOP_STRATEGY.

At the same time, eliminate uninitialized use of a vnode
pointer. Interesting GCC didn't spot this.


76146 30-Apr-2001 bde

Backed out previous commit. It caused massive filesystem corruption,
not to mention a compile-time warning about the critical function
becoming unused, by replacing spec_bmap() with vop_stdbmap().

ntfs seems to have the same bug.

The factor for converting specfs block numbers to physical block
numbers is 1, but vop_stdbmap() uses the bogus factor
btodb(ap->a_vp->v_mount->mnt_stat.f_iosize), which is 16 for ffs with
the default block size of 8K. This factor is bogus even for vop_stdbmap()
-- the correct factor is related to the filesystem blocksize which is not
necessarily the same as the optimal i/o size. vop_stdbmap() was apparently
cloned from nfs where these sizes happen to be the same.

There may also be a problem with a_vp->v_mount being null. spec_bmap()
still checks for this, but I think the checks in specfs are dead code
which used to support block devices.
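
The factor of 16 quoted above is plain arithmetic on block sizes. A minimal sketch, assuming 512-byte device blocks; MODEL_DEV_BSHIFT, model_btodb and model_bmap are hypothetical names that only mimic the shape of the calculation and are not the kernel's definitions:

```c
#include <assert.h>

/* Assumed value: 512-byte device blocks, i.e. a shift of 9. */
#define MODEL_DEV_BSHIFT	9

/* Convert a byte count to a count of device blocks. */
long
model_btodb(long bytes)
{
	assert(bytes >= 0);
	return (bytes >> MODEL_DEV_BSHIFT);
}

/* Physical block number obtained by scaling logical block `bn`. */
long
model_bmap(long bn, long factor)
{
	assert(bn >= 0 && factor > 0);
	return (bn * factor);
}
```

An 8K I/O size gives 8192 / 512 = 16. With the factor of 1 that the message says specfs needs, logical block 100 stays block 100; with the factor of 16 it becomes block 1600, so in this model the I/O lands on the wrong sectors.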


76131 29-Apr-2001 phk

Add a vop_stdbmap(), and make it part of the default vop vector.

Make 7 filesystems which don't really know about VOP_BMAP rely
on the default vector, rather than more or less complete local
vop_nopbmap() implementations.


76117 29-Apr-2001 grog

Revert consequences of changes to mount.h, part 2.

Requested by: bde


75934 25-Apr-2001 phk

Move the netexport structure from the fs-specific mountstructure
to struct mount.

This makes the "struct netexport *" parameter to the vfs_export
and vfs_checkexport interface unneeded.

Consequently, all non-stacking filesystems can use
vfs_stdcheckexp().

At the same time, make it a pointer to a struct netexport
in struct mount, so that we can remove the bogus AF_MAX
and #include <net/radix.h> from <sys/mount.h>


75893 24-Apr-2001 jhb

Change the pfind() and zpfind() functions to lock the process that they
find before releasing the allproc lock and returning.

Reviewed by: -smp, dfr, jake


75877 23-Apr-2001 mjacob

fix it so it compiles again


75874 23-Apr-2001 mjacob

add this ridiculous include foo so it will compile again


75858 23-Apr-2001 grog

Correct #includes to work with fixed sys/mount.h.


75856 23-Apr-2001 grog

Correct #includes to work with fixed sys/mount.h.


75692 19-Apr-2001 alfred

vnode_pager_freepage() is really vm_page_free() in disguise,
nuke vnode_pager_freepage() and replace all calls to it with vm_page_free()


75580 17-Apr-2001 phk

This patch removes the VOP_BWRITE() vector.

VOP_BWRITE() was a hack which made it possible for NFS client
side to use struct buf with non-bio backing.

This patch takes a more general approach and adds a bp->b_op
vector where more methods can be added.

The success of this patch depends on bp->b_op being initialized
in all relevant places for some value of "relevant" which is not
easy to determine. For now the buffers have grown a b_magic
element which will make such issues a tiny bit easier to debug.


75478 13-Apr-2001 bp

Move VT_SMBFS definition to the proper place. Undefine VI_LOCK/VI_UNLOCK.


75374 10-Apr-2001 bp

Import kernel part of SMB/CIFS requester.
Add smbfs(CIFS) filesystem.

Userland part will be in the ports tree for a while.

Obtained from: smbfs-1.3.7-dev package.


75295 07-Apr-2001 des

Let pseudofs into the warmth of the FreeBSD CVS repo.

It's not finished yet (I still have to find a way to implement process-
dependent nodes without consuming too much memory, and the permission
system needs tightening up), but it's becoming hard to work on without
a repo (I've accidentally almost nuked it once already), and it works
(except for the lack of process-dependent nodes, that is).

I was supposed to commit this a week ago, but timed out waiting for jkh
to reply to some questions I had. Pass him a spoonful of bad karma :)


74996 29-Mar-2001 jhb

- Various style fixes.
- Fix a silly bug so that we return the actual error code if a procfs
attach fails rather than always returning 0.

Reported by: bde


74927 28-Mar-2001 jhb

Convert the allproc and proctree locks from lockmgr locks to sx locks.


74914 28-Mar-2001 jhb

Catch up to header include changes:
- <sys/mutex.h> now requires <sys/systm.h>
- <sys/mutex.h> and <sys/sx.h> now require <sys/lock.h>


74810 26-Mar-2001 phk

Send the remains (such as I have located) of "block major numbers" to
the bit-bucket.


74637 22-Mar-2001 bp

Add dependency on libmchain module.

Spotted by: Andrzej Tobola <san@iem.pw.edu.pl>


74273 15-Mar-2001 rwatson

o Change the API and ABI of the Extended Attribute kernel interfaces to
introduce a new argument, "namespace", rather than relying on a first-
character namespace indicator. This is in line with more recent
thinking on EA interfaces on various mailing lists, including the
posix1e, Linux acl-devel, and trustedbsd-discuss forums. Two namespaces
are defined by default, EXTATTR_NAMESPACE_SYSTEM and
EXTATTR_NAMESPACE_USER, where the primary distinction lies in the
access control model: user EAs are accessible based on the normal
MAC and DAC file/directory protections, and system attributes are
limited to kernel-originated or appropriately privileged userland
requests.

o These API changes occur at several levels: the namespace argument is
introduced in the extattr_{get,set}_file() system call interfaces,
at the vnode operation level in the vop_{get,set}extattr() interfaces,
and in the UFS extended attribute implementation. Changes are also
introduced in the VFS extattrctl() interface (system call, VFS,
and UFS implementation), where the arguments are modified to include
a namespace field, as well as modified to avoid direct access to
userspace variables from below the VFS layer (in the style of recent
changes to mount by adrian@FreeBSD.org). This required some cleanup
and bug fixing regarding VFS locks and the VFS interface, as a vnode
pointer may now be optionally submitted to the VFS_EXTATTRCTL()
call. Updated documentation for the VFS interface will be committed
shortly.

o In the near future, the auto-starting feature will be updated to
search two sub-directories to the ".attribute" directory in appropriate
file systems: "user" and "system" to locate attributes intended for
those namespaces, as the single filename is no longer sufficient
to indicate what namespace the attribute is intended for. Until this
is committed, all attributes auto-started by UFS will be placed in
the EXTATTR_NAMESPACE_SYSTEM namespace.

o The default POSIX.1e attribute names for ACLs and Capabilities have
been updated to no longer include the '$' in their filename. As such,
if you're using these features, you'll need to rename the attribute
backing files to the same names without '$' symbols in front.

o Note that these changes will require changes in userland, which will
be committed shortly. These include modifications to the extended
attribute utilities, as well as to libutil for new namespace
string conversion routines. Once the matching userland changes are
committed, a buildworld is recommended to update all the necessary
include files and verify that the kernel and userland environments
are in sync. Note: If you do not use extended attributes (most people
won't), upgrading is not imperative although since the system call
API has changed, the new userland extended attribute code will no longer
compile with old include files.

o Couple of minor cleanups while I'm there: make more code compilation
conditional on FFS_EXTATTR, which should recover a bit of space on
kernels running without EA's, as well as update copyright dates.

Obtained from: TrustedBSD Project
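
The access-control distinction between the two namespaces can be sketched as a small predicate. The enum values and model_ea_may_access are hypothetical and do not reproduce the real EXTATTR_NAMESPACE_* constants or any kernel function; `file_access_ok` stands for the outcome of the ordinary MAC and DAC checks on the file.

```c
#include <assert.h>

/* Hypothetical constants; the real EXTATTR_NAMESPACE_* values may differ. */
enum model_ea_namespace {
	MODEL_EA_USER,
	MODEL_EA_SYSTEM
};

/*
 * Returns 1 if the request may touch the attribute, 0 otherwise.
 * User attributes follow the normal file protections; system attributes
 * require a kernel-originated or privileged request.
 */
int
model_ea_may_access(enum model_ea_namespace ns, int from_kernel,
    int privileged, int file_access_ok)
{
	assert(ns == MODEL_EA_USER || ns == MODEL_EA_SYSTEM);
	switch (ns) {
	case MODEL_EA_USER:
		return (file_access_ok != 0);
	case MODEL_EA_SYSTEM:
		return (from_kernel || privileged);
	}
	return (0);
}
```

Passing the namespace as its own argument lets a check like this switch on it directly instead of inspecting the first character of the attribute name.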


74105 11-Mar-2001 sobomax

Add missed MODULE_VERSION() call, so loading of unicode conversion routine
works properly.

Clue beaten in by: des


74099 11-Mar-2001 bp

Do not kill vnodes after rename. This can cause deadlocks in the deadfs.

Noticed by: Matthew N. Dodd <winter@jurai.net>


74096 11-Mar-2001 bp

Add a mount time option which slightly relaxes checks for valid Joliet
extensions.

PR: kern/23315
Reviewed by: adrian


74064 10-Mar-2001 bp

Slightly reorganize allocation of new vnode. Use bit NVOLUME to detect
vnodes which represent volumes (before it was done via strcmp()).
Turn n_refparent into bit in the n_flag field.


74062 10-Mar-2001 bp

Synch with changes in the NCP requester.


73942 07-Mar-2001 mckusick

Fixes to track snapshot copy-on-write checking in the specinfo
structure rather than assuming that the device vnode would reside
in the FFS filesystem (which is obviously a broken assumption with
the device filesystem).


73929 07-Mar-2001 jhb

Grab the process lock while calling psignal and before calling psignal.


73920 07-Mar-2001 jhb

Proc locking identical to that of linprocfs' vnops except that we hold the
proc lock while calling psignal.


73919 07-Mar-2001 jhb

Protect read to p_pptr with proc lock rather than proctree lock.


73918 07-Mar-2001 jhb

Proc locking. Lock around psignal() and also ensure both an exclusive
proctree lock and the process lock are held when updating p_pptr and
p_oppid. When we are just reading p_pptr we only need the proc lock and
not a proctree lock as well.


73906 07-Mar-2001 jhb

Protect p_flag with the proc lock.


73871 06-Mar-2001 bp

A name of the file can change while its id stays the same. So, we have
to update it as well.

Remove unused function.


73383 03-Mar-2001 dfr

Remove the copyinstr call which was trying to copy the pathname in from
user space. It has already been copied in and mp->mnt_stat.f_mntonname has
already been initialised by the caller.

This fixes a panic on the alpha caused by the fact that the variable
'size' wasn't initialised because the call to copyinstr() bailed out with
an EFAULT error.


73286 01-Mar-2001 adrian

Reviewed by: jlemon

An initial tidyup of the mount() syscall and VFS mount code.

This code replaces the earlier work done by jlemon in an attempt to
make linux_mount() work.

* the guts of the mount work has been moved into vfs_mount().

* move `type', `path' and `flags' from being userland variables into being
kernel variables in vfs_mount(). `data' remains a pointer into
userspace.

* Attempt to verify the `type' and `path' strings passed to vfs_mount()
aren't too long.

* rework mount() and linux_mount() to take the userland parameters
(besides data, as mentioned) and pass kernel variables to vfs_mount().
(linux_mount() already did this, I've just tidied it up a little more.)

* remove the copyin*() stuff for `path'. `data' still requires copyin*()
since it's a pointer into userland.

* set `mount->mnt_stat.f_mntonname' in vfs_mount() rather than in each
filesystem. This variable is generally initialised with `path', and
each filesystem can override it if they want to.

* NOTE: f_mntonname is initialised with "/" in the case of a root mount.


72933 23-Feb-2001 alfred

Display the Joliet Extension 'level' in the log message.

PR: kern/24998


72786 21-Feb-2001 rwatson

o Move per-process jail pointer (p->pr_prison) to inside of the subject
credential structure, ucred (cr->cr_prison).
o Allow jail inheritance to be a function of credential inheritance.
o Abstract prison structure reference counting behind pr_hold() and
pr_free(), invoked by the similarly named credential reference
management functions, removing this code from per-ABI fork/exit code.
o Modify various jail() functions to use struct ucred arguments instead
of struct proc arguments.
o Introduce jailed() function to determine if a credential is jailed,
rather than directly checking pointers all over the place.
o Convert PRISON_CHECK() macro to prison_check() function.
o Move jail() function prototypes to jail.h.
o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the
flag in the process flags field itself.
o Eliminate that "const" qualifier from suser/p_can/etc to reflect
mutex use.

Notes:

o Some further cleanup of the linux/jail code is still required.
o It's now possible to consider resolving some of the process vs
credential based permission checking confusion in the socket code.
o Mutex protection of struct prison is still not present, and is
required to protect the reference count plus some fields in the
structure.

Reviewed by: freebsd-arch
Obtained from: TrustedBSD Project


72637 18-Feb-2001 phk

Remove a debug printf.


72521 15-Feb-2001 jlemon

Extend kqueue down to the device layer.

Backwards compatible approach suggested by: peter


72435 13-Feb-2001 sobomax

Add a hook for loading of a Unicode -> char conversion routine as a kld at a
run-time. This is temporary solution until proper kernel Unicode interfaces
are in place and as such was purposely designed to be as tiny as possible
(3 lines of the code not counting comments). The port with conversion routines
for the most popular single-byte languages will be added later today

Reviewed by: bp, "Michael C . Wu" <keichii@iteration.net>
Approved by: bp


72200 09-Feb-2001 bmilekic

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarly, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)


72091 06-Feb-2001 asmodai

Fix typo: seperate -> separate.

Seperate does not exist in the English language.


72012 04-Feb-2001 phk

Another round of the <sys/queue.h> FOREACH transmogriffer.

Created with: sed(1)
Reviewed by: md5(1)


71999 04-Feb-2001 phk

Mechanical change to use <sys/queue.h> macro API instead of
fondling implementation details.

Created with: sed(1)
Reviewed by: md5(1)


71998 04-Feb-2001 phk

Use <sys/queue.h> macro API.


71993 04-Feb-2001 phk

Remove a DIAGNOSTIC check which belongs in <sys/queue.h> if anyplace at all.


71945 02-Feb-2001 phk

At the point in time where most devices are created, we don't know what
time it is because boottime is not yet initialized. Finagle the relevant
fields when we get the chance.


71936 02-Feb-2001 phk

Only superuser can create symlinks.
Give symlinks mode 755 by default to avoid triggering alert eyes.
(the mode isn't used on symlinks)


71858 31-Jan-2001 peter

Zap last remaining references to (and a use of) simple_locks.


71829 30-Jan-2001 phk

Add a BUF_KERNPROC() in the BIO_DELETE path.

This seems to fix the problem which md(4) backed filesystems exposed.


71822 30-Jan-2001 phk

Fix two minor nits.

Existences revealed, but no details offered by: bp


71777 29-Jan-2001 dillon

This patch reestablishes the spec_fsync() guarantee that synchronous
fsyncs, which typically occur during unmounting, will drain all dirty
buffers even if it takes multiple passes to do so. The guarantee was
mangled by the last patch which solved a problem due to -current disabling
interrupts while holding giant (which caused an infinite spin loop waiting for
I/O to complete). -stable does not have either patch, but has a similar
bug in the original spec_fsync() code which is triggered by a bug in the
softupdates umount code, a fix for which will be committed to -current
as soon as Kirk stamps it. Then both solutions will be MFC'd to -stable.

-stable currently suffers from a combination of the softupdates bug and
a small window of opportunity in the original spec_fsync() code, and -stable
also suffers from the spin-loop bug but since interrupts are enabled the
spin resolves itself in a few milliseconds.


71699 27-Jan-2001 jhb

Back out proc locking to protect p_ucred for obtaining additional
references along with the actual obtaining of additional references.


71576 24-Jan-2001 jasone

Convert all simplelocks to mutexes and remove the simplelock implementations.


71569 24-Jan-2001 jhb

- Catch up to proc flag changes.


71509 24-Jan-2001 jhb

The lock being destroyed was misnamed, not unused. Add the lockdestroy()
back in but with the proper name so that this compiles.

Submitted by: jasone


71496 24-Jan-2001 jhb

Proc locking to protect p_ucred while we obtain additional references.


71482 23-Jan-2001 jhb

- Remove unused header include.
- Use queue macros.


71481 23-Jan-2001 jhb

Proc locking to protect p_ucred while we obtain an additional reference.


71480 23-Jan-2001 jhb

- FreeBSD doesn't have an abortop vnop as far as I can tell, so #ifdef
references to the hpf op out.
- Remove a lockdestroy() on a non-existent variable.


71138 17-Jan-2001 peter

Fix breakage uncovered by LINT - don't refer to undefined variables in
KASSERT()


70833 09-Jan-2001 wollman

Delete unused #include <sys/select.h>.


70829 09-Jan-2001 wollman

Don't compile a dead variable declaration.


70536 31-Dec-2000 phk

Use macro API to <sys/queue.h>


70528 30-Dec-2000 dillon

Fix a lockup problem that occurs with 'cvs update'. specfs's fsync can
get into the same sort of infinite loop that ffs's fsync used to get
into, probably due to background bitmap writes. The solution is
the same.


70374 26-Dec-2000 dillon

This implements a better launder limiting solution. There was a solution
in 4.2-REL which I ripped out in -stable and -current when implementing the
low-memory handling solution. However, maxlaunder turns out to be the saving
grace in certain very heavily loaded systems (e.g. newsreader box). The new
algorithm limits the number of pages laundered in the first pageout daemon
pass. If that is not sufficient then successive passes will be run without any
limit.

Write I/O is now pipelined using two sysctls, vfs.lorunningspace and
vfs.hirunningspace. This prevents excessive buffered writes in the
disk queues which cause long (multi-second) delays for reads. It leads
to more stable (less jerky) and generally faster I/O streaming to disk
by allowing required read ops (e.g. for indirect blocks and such) to occur
without interrupting the write stream, among other things.

NOTE: eventually, filesystem write I/O pipelining needs to be done on a
per-device basis. At the moment it is globalized.
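
One way to picture pipelining between a low and a high watermark is a hysteresis throttle. The message does not spell out the exact policy, so the following is an assumption-laden sketch: struct model_wpipe and the model_* functions are hypothetical, the fields merely stand in for the roles of vfs.lorunningspace and vfs.hirunningspace, and the model is global, as the NOTE above says the real accounting currently is.

```c
#include <assert.h>

/* Hypothetical model of the two watermarks; values are set by the caller. */
struct model_wpipe {
	long lo, hi;		/* low and high watermarks, in bytes */
	long running;		/* bytes of write I/O currently in flight */
	int throttled;		/* nonzero while new writers must wait */
};

/* A writer wants to start `bytes` of I/O; returns 1 if it may proceed. */
int
model_write_start(struct model_wpipe *w, long bytes)
{
	assert(w->lo <= w->hi && bytes >= 0);
	if (w->running > w->hi)
		w->throttled = 1;
	if (w->throttled)
		return (0);
	w->running += bytes;
	return (1);
}

/* `bytes` of I/O completed; unthrottle once we drain to the low mark. */
void
model_write_done(struct model_wpipe *w, long bytes)
{
	assert(bytes >= 0 && bytes <= w->running);
	w->running -= bytes;
	if (w->running <= w->lo)
		w->throttled = 0;
}
```

The gap between the two marks is what keeps the queue from oscillating: writers stop once in-flight bytes exceed the high mark and resume only after the queue has drained to the low mark.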


70317 23-Dec-2000 jake

Protect proc.p_pptr and proc.p_children/p_sibling with the
proctree_lock.

linprocfs not locked pending response from informal maintainer.

Reviewed by: jhb, -smp@


70038 15-Dec-2000 jhb

When p_ucred is passed to the venus daemon, first grab the proc lock to
protect the p_ucred pointer, obtain a separate reference to the ucred,
release the lock, and then pass in the new ucred reference.


69958 13-Dec-2000 rwatson

o Tighten restrictions on use of /proc/pid/ctl and move access checks
in ctl to using centralized p_can() inter-process access control
interface.

Reviewed by: sef


69947 13-Dec-2000 jake

- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead
of explicit calls to lockmgr. Also provides macros for the flags
passed to specify shared, exclusive or release which map to the
lockmgr flags. This is so that the use of lockmgr can be easily
replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.


69798 09-Dec-2000 des

Add a module version (so that linprocfs can properly depend on procfs)


69781 08-Dec-2000 dwmalone

Convert more malloc+bzero to malloc+M_ZERO.

Submitted by: josh@zipperup.org
Submitted by: Robert Drehmel <robd@gmx.net>


69767 08-Dec-2000 phk

staticize.


69652 06-Dec-2000 jhb

Protect accesses to member of struct proc with the proc lock.


69507 02-Dec-2000 jhb

Protect p_stat with the sched_lock.

Reviewed by: jake


69149 25-Nov-2000 jlemon

Update to reflect the disappearance of getsock().

Found by: LINT


68870 18-Nov-2000 bp

Use vop_defaultop() instead of ntfs_bypass().

PR: kern/22756


68708 14-Nov-2000 mckusick

Missed conversion of CIRCLEQ => TAILQ for mount list.


68505 08-Nov-2000 eivind

More paranoia against overflows


68295 04-Nov-2000 bp

v_interlock is a mutex now, not simple lock.


68259 02-Nov-2000 phk

Take VBLK devices further out of their misery.

This should fix the panic I introduced in my previous commit on this topic.


68199 01-Nov-2000 eivind

Fix overflow from jail hostname.

Bug found by: Esa Etelavuori <eetelavu@cc.hut.fi>


68186 01-Nov-2000 eivind

Give vop_mmap an untimely death. The opportunity to give it a timely
death timed out in 1996.


67895 29-Oct-2000 dwmalone

Make malloc use M_ZERO in some more locations.
Don't check for a null pointer if malloc called with M_WAITOK.

Submitted by: josh@zipperup.org
Submitted by: Robert Drehmel <robd@gmx.net>
Approved by: bp


67893 29-Oct-2000 phk

Move suser() and suser_xxx() prototypes and a related #define from
<sys/proc.h> to <sys/systm.h>.

Correctly document the #includes needed in the manpage.

Add one now needed #include of <sys/systm.h>.
Remove the consequent 48 unused #includes of <sys/proc.h>.


67885 29-Oct-2000 phk

Weaken a bogus dependency on <sys/proc.h> in <sys/buf.h> by #ifdef'ing
the offending inline function (BUF_KERNPROC) on it being #included
already.

I'm not sure BUF_KERNPROC() is even the right thing to do or in the
right place or implemented the right way (inline vs normal function).

Remove consequently unneeded #includes of <sys/proc.h>


67882 29-Oct-2000 phk

Remove unneeded #include <sys/proc.h> lines.


67441 22-Oct-2000 bp

Rev 1.41 was committed from wrong diff, now do it right.


67439 22-Oct-2000 bp

Release and unlock vnode if resource deadlock detected.


67438 22-Oct-2000 bp

Update stale comment.

PR: kern/21805


67437 22-Oct-2000 bp

Remove de_lock field from denode structure and make msdosfs PDIRUNLOCK aware.


67145 15-Oct-2000 bp

Fix nullfs breakage caused by incomplete migration of v_interlock from
simple_lock to mutex.

Reset LK_INTERLOCK flag when interlock released manually.


66894 09-Oct-2000 chris

o Move from Alfred Perlstein's "exclusion" technique of handling special
file types to requiring all file types to properly implement fo_stat.
This makes any new file type additions much easier as this code no
longer has to be modified to accommodate it.

o Instead of using curproc in fdesc_allocvp, pass a `struct proc' pointer as
a new fifth parameter.


66886 09-Oct-2000 eivind

Blow away the v_specmountpoint define, replacing it with what it was
defined as (rdev->si_mountpoint)


66877 09-Oct-2000 phk

Don't hold an extra reference to vnodes. Devfs vnodes are sufficiently
cheap to setup that it doesn't really matter that we recycle device
vnodes at kleenex speed.

Implement first cut try at killing cloned devices when they are
not needed anymore. For now only the bpf driver is involved in
this experiment. Cloned devices can set the SI_CHEAPCLONE flag
which allows us to destroy_dev() it when the vcount() drops to zero
and the vnode is reclaimed. For now it's a requirement that the
driver doesn't keep persistent state from close to (re)open.

Some whitespace changes.


66701 05-Oct-2000 alfred

return correct type for process directory entries, DT_DIR not DT_REG


66673 05-Oct-2000 bde

Forward-declare struct mbuf so that this file is less self-insufficient
-- don't depend on garbage in <sys/mount.h>. mbufs aren't actually
used here either. They should have been completely removed from filesystem
interfaces when they were removed from the interfaces to convert between
file handles and vnodes.


66615 04-Oct-2000 jasone

Convert lockmgr locks from using simple locks to using mutexes.

Add lockdestroy() and appropriate invocations, which corresponds to
lockinit() and must be called to clean up after a lockmgr lock is no
longer needed.


66571 03-Oct-2000 bp

Make cd9660 filesystem PDIRUNLOCK aware. Now it can be used in vnode stacks
and nullfs mounts.

Remove now unnecessary i_lock field from the iso_node structure.


66570 03-Oct-2000 bp

Prevent dereference of NULL pointer when null_lock() and null_unlock()
called and there is no underlying vnode.


66540 02-Oct-2000 bp

Protect hash data with lock manager instead of home grown one.

Replace shared lock on vnode with exclusive one. It shouldn't impact
performance as NCP protocol doesn't support outstanding requests.

Do not hold simple lock on vnode for long period of time.

Add functionality to the nwfs_print() routine.


66539 02-Oct-2000 bp

Get rid of the legacy __P() macro. Remove 'register' keywords.


66524 02-Oct-2000 peter

PDIRUNLOCK now exists on FreeBSD. Remove the (now incorrect) redefinition.


66356 25-Sep-2000 bp

Fix vnode locking bugs in the nullfs.
Add correct support for v_object management, so mmap() operation should
work properly.
Add support for extattrctl() routine (submitted by semenu).

At this point nullfs can be considered as functional and much more stable.
In fact, it should behave as a "hard" "symlink" to underlying filesystem.

Reviewed in general by: mckusick, dillon
Parts of logic obtained from: NetBSD


66028 18-Sep-2000 phk

Ignore attempts to set flags to zero. This quenches a syslog warning
from login(1).


65920 16-Sep-2000 phk

Add canonical checks to devfs_setattr().


65788 12-Sep-2000 jhb

Use size_t instead of u_int for 4th argument to copyinstr().


65557 07-Sep-2000 jasone

Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The
alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
preempted (i386 only).

Partially contributed by: BSDi (BSD/OS)
Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh


65515 06-Sep-2000 phk

Add refcounts to the "global" DEVFS inode slots, this allows us
to recycle inodes after a destroy_dev() but not until all mounts
have picked up the change.

Add support for an overflow table for DEVFS inodes. The static
table defaults to 1024 inodes, if that fills, an overflow table
of 32k inodes is allocated. Both numbers can be changed at
compile time, the size of the overflow table also with the
sysctl vfs.devfs.noverflow.

Use atomic instructions to barrier between make_dev()/destroy_dev()
and the mounts.

Add lockmgr() locking of directories for operations accessing or
modifying the directory TAILQs.

Various nitpicking here and there.


65467 05-Sep-2000 bp

Various cleanups towards making nullfs functional (it is still broken
at this point):

Replace all '#ifdef DEBUG' with '#ifdef NULLFS_DEBUG' and add NULLFSDEBUG
macro.

Protect nullfs hash table with lockmgr.

Use proper order of operations when freeing mnt_data.

Return correct fsid in the null_getattr().

Add null_open() function to catch MNT_NODEV (obtained from NetBSD).

Add null_rename() to catch cross-fs rename operations (submitted by
Ustimenko Semen <semen@iclub.nsu.ru>)

Remove duplicate $FreeBSD$ tags.


65464 05-Sep-2000 bp

Get rid of the __P() macros.

Encouraged by: peter


65447 04-Sep-2000 phk

Off by one error.

Submitted by: des


65445 04-Sep-2000 des

Remove a comment that has been not only obsolete but patently wrong for the
last 31 revisions (almost three years).


65374 02-Sep-2000 phk

Avoid the modules madness I inadvertently introduced by making the
cloning infrastructure standard in kern_conf. Modules are now
the same with or without devfs support.

If you need to detect if devfs is present, in modules or elsewhere,
check the integer variable "devfs_present".

This happily removes an ugly hack from kern/vfs_conf.c.

This forces a rename of the eventhandler and the standard clone
helper function.

Include <sys/eventhandler.h> in <sys/conf.h>: it's a helper #include
like <sys/queue.h>

Remove all #includes of opt_devfs.h they no longer matter.


65339 01-Sep-2000 rwatson

o Simplify if/then clause equating ESRCH with ENOENT when hiding a process

Submitted by: des


65331 01-Sep-2000 rwatson

o Make procfs use vaccess() for procfs_access() DAC and super-user checks,
rather than implementing its own {uid,gid,other} checks against vnode
mode. Similar change to linprocfs currently under review.

Obtained from: TrustedBSD Project


65237 30-Aug-2000 rwatson

o Centralize inter-process access control, introducing:

int p_can(p1, p2, operation, privused)

which allows specification of subject process, object process,
inter-process operation, and an optional call-by-reference privused
flag, allowing the caller to determine if privilege was required
for the call to succeed. This allows jail, kern.ps_showallprocs and
regular credential-based interaction checks to occur in one block of
code. Possible operations are P_CAN_SEE, P_CAN_SCHED, P_CAN_KILL,
and P_CAN_DEBUG. p_can currently breaks out as a wrapper to a
series of static function checks in kern_prot, which should not
be invoked directly.

o Commented out capabilities entries are included for some checks.

o Update most inter-process authorization to make use of p_can() instead
of manual checks, PRISON_CHECK(), P_TRESPASS(), and
kern.ps_showallprocs.

o Modify suser{,_xxx} to use const arguments, as it no longer modifies
process flags due to the disabling of ASU.

o Modify some checks/errors in procfs so that ENOENT is returned instead
of ESRCH, further improving concealment of processes that should not
be visible to other processes. Also introduce new access checks to
improve hiding of processes for procfs_lookup(), procfs_getattr(),
procfs_readdir(). Correct a bug reported by bp concerning not
handling the CREATE case in procfs_lookup(). Remove volatile flag in
procfs that caused apparently spurious qualifier warnings (approved by
bde).

o Add comment noting that ktrace() has not been updated, as its access
control checks are different from ptrace(), whereas they should
probably be the same. Further discussion should happen on this topic.

Reviewed by: bde, green, phk, freebsd-security, others
Approved by: bde
Obtained from: TrustedBSD Project


65200 29-Aug-2000 rwatson

o Restructure vaccess() so as to check for DAC permission to modify the
object before falling back on privilege. Make vaccess() accept an
additional optional argument, privused, to determine whether
privilege was required for vaccess() to return 0. Add commented
out capability checks for reference. Rename some variables to make
it more clear which modes/uids/etc are associated with the object,
and which with the access mode.
o Update file system use of vaccess() to pass NULL as the optional
privused argument. Once additional patches are applied, suser()
will no longer set ASU, so privused will permit passing of
privilege information up the stack to the caller.

Reviewed by: bde, green, phk, -security, others
Obtained from: TrustedBSD Project
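
The ordering described in this entry (discretionary mode bits first, privilege
only as a fallback, with an optional privused out-parameter) can be sketched in
userland C. This is a minimal illustration, not the kernel's vaccess(): the
names sk_cred and sk_vaccess, the 3-bit rwx request encoding, and the
"uid 0 is privileged" rule are assumptions made for the example, and
supplementary groups are ignored.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for a credential; not the kernel's struct ucred. */
struct sk_cred {
	uid_t uid;
	gid_t gid;
};

/*
 * acc_mode is a 3-bit rwx request (4 = read, 2 = write, 1 = execute).
 * Returns 0 if access is allowed, EACCES otherwise. privused may be NULL.
 */
int
sk_vaccess(mode_t file_mode, uid_t file_uid, gid_t file_gid,
    mode_t acc_mode, const struct sk_cred *cred, int *privused)
{
	mode_t granted;

	if (privused != NULL)
		*privused = 0;

	/* Pick the owner, group, or other permission triplet. */
	if (cred->uid == file_uid)
		granted = (file_mode >> 6) & 7;
	else if (cred->gid == file_gid)
		granted = (file_mode >> 3) & 7;
	else
		granted = file_mode & 7;

	/* DAC first: if the mode bits suffice, no privilege is consumed. */
	if ((granted & acc_mode) == acc_mode)
		return (0);

	/* Only now fall back on privilege, and report that it was needed. */
	if (cred->uid == 0) {
		if (privused != NULL)
			*privused = 1;
		return (0);
	}
	return (EACCES);
}
```

A caller that does not care whether privilege was needed passes NULL for
privused, which matches the file system callers mentioned in the entry above.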


65132 27-Aug-2000 phk

Reorder vop's alphabetically.
Smarter use of devfs_allocv() (from bp@)
Introduce devfs_find()
".." fixes to devfs_lookup (from bp@)


65118 26-Aug-2000 phk

Minor cleanups to devfs_readdir();
Add devfs_read() for directories. (inspired by bp@)


65075 25-Aug-2000 bde

Quick fix for msdosfs_write() on alphas and other machines with either
longs larger than 32 bits or strict alignment requirements.

pm_fatmask had type u_long, but it must have a type that has precisely
32 bits and this type must be no smaller than int, so that ~pmp->pm_fatmask
has no bits above the 31st set. Otherwise, comparisons between (cn
| ~pmp->pm_fatmask) and magic 32-bit "cluster" numbers always fail.
The correct fix is to use the C99 type uint_least32_t and mask with
0xffffffff. The quick fix is to use u_int32_t and assume that ints
have

msdosfs metadata is riddled with unaligned fields, and on alphas,
unaligned_fixup() apparently has problems fixing up the unaligned
accesses caused by this. The quick fix is to not comment out the
NetBSD code that sort of handles this, and define UNALIGNED_ACCESS on
i386's so that the code doesn't change on i386's. The correct fix
would define UNALIGNED_ACCESS in a central machine-dependent header
and maybe add some extra cases to unaligned_fixup(). UNALIGNED_ACCESS
is also tested in isofs.

Submitted by: parts by Mark Abene <phiber@radicalmedia.com>
PR: 19086
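
The mask-width problem described in this entry can be reproduced in a few lines
of userland C. This is only a sketch: the SK_ constants and function names are
made up for illustration and are not the msdosfs definitions, and the 32-bit
variant assumes that int is 32 bits wide, which is the condition the commit
message itself relies on.

```c
#include <stdint.h>

#define SK_FAT12_MASK	0x00000fffU	/* illustrative FAT12 mask */
#define SK_CLUST_EOF	0xffffffffU	/* illustrative 32-bit "magic" value */

/* Mask held in a 64-bit type, as u_long is on alpha. */
int
sk_is_eof_wide(uint64_t fatmask, uint32_t cn)
{
	/* ~fatmask sets bits 32..63, so the result never equals a 32-bit value. */
	return ((cn | ~fatmask) == SK_CLUST_EOF);
}

/* Mask held in exactly 32 bits (assuming int is 32 bits wide). */
int
sk_is_eof_32(uint32_t fatmask, uint32_t cn)
{
	/* ~fatmask stays within 32 bits, so the comparison can succeed. */
	return ((cn | ~fatmask) == SK_CLUST_EOF);
}
```

With the same inputs, the wide version rejects an end-of-file cluster number
that the 32-bit version accepts, which is the failure mode the quick fix
addresses.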


65051 24-Aug-2000 phk

Fix panic when removing open device (found by bp@)
Implement subdirs.
Build the full "devicename" for cloning functions.
Fix panic when deleted device goes away.
Collapse devfs_dir and devfs_dirent structures.
Add proper cloning to the /dev/fd* "device-"driver.
Fix a bug in make_dev_alias() handling which made aliases appear
multiple times.
Use devfs_clone to implement getdiskbyname()
Make specfs maintain the stat(2) timestamps per dev_t


64895 21-Aug-2000 phk

Fix devfs_access() bug on directories.

Remove unused #includes.

Bug spotted by: markm


64880 20-Aug-2000 phk

Remove all traces of Julian's DEVFS (incl. from kern/subr_diskslice.c)

Remove old DEVFS support fields from dev_t.

Make uid, gid & mode members of dev_t and set them in make_dev().

Use correct uid, gid & mode in make_dev in disk minilayer.

Add support for registering alias names for a dev_t using the
new function make_dev_alias(). These will show up as symlinks
in DEVFS.

Use makedev() rather than make_dev() for MFSs magic devices to prevent
DEVFS from noticing this abuse.

Add a field for DEVFS inode number in dev_t.

Add new DEVFS in fs/devfs.

Add devfs cloning to:
disk minilayer (ie: ad(4), sd(4), cd(4) etc etc)
md(4), tun(4), bpf(4), fd(4)

If DEVFS is enabled, add the -d flag to /sbin/init's args to make it mount devfs.

Add commented out DEVFS to GENERIC


64865 20-Aug-2000 phk

Centralize the canonical vop_access user/group/other check in vaccess().

Discussed with: bde


64819 18-Aug-2000 phk

Introduce vop_stdinactive() and make it the default if no vop_inactive
is declared.

Sort and prune a few vop_op[].


63962 28-Jul-2000 sheldonh

Rename the loadable nullfs kernel module: null -> nullfs


63788 24-Jul-2000 mckusick

This patch corrects the first round of panics and hangs reported
with the new snapshot code.

Update addaliasu to correctly implement the semantics of the old
checkalias function. When a device vnode first comes into existence,
check to see if an anonymous vnode for the same device was created
at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than
creating a new vnode for the device. This corrects a problem which
caused the kernel to panic when taking a snapshot of the root
filesystem.

Change the calling convention of vn_write_suspend_wait() to be the
same as vn_start_write().

Split out softdep_flushworklist() from softdep_flushfiles() so that
it can be used to clear the work queue when suspending filesystem
operations.

Access to buffers becomes recursive so that snapshots can recursively
traverse their indirect blocks using ffs_copyonwrite() when checking
for the need for copy on write when flushing one of their own indirect
blocks. This eliminates a deadlock between the syncer daemon and a
process taking a snapshot.

Ensure that softdep_process_worklist() can never block because of a
snapshot being taken. This eliminates a problem with buffer starvation.

Cleanup change in ffs_sync() which did not synchronously wait when
MNT_WAIT was specified. The result was an unclean filesystem panic
when doing forcible unmount with heavy filesystem I/O in progress.

Return a zero'ed block when reading a block that was not in use at
the time that a snapshot was taken. Normally, these blocks should
never be read. However, the readahead code will occasionally read
them which can cause unexpected behavior.

Clean up the debugging code that ensures that no blocks be written
on a filesystem while it is suspended. Snapshots must explicitly
label the blocks that they are writing during the suspension so that
they do not cause a `write on suspended filesystem' panic.

Reorganize ffs_copyonwrite() to eliminate a deadlock and also to
prevent a race condition that would permit the same block to be
copied twice. This change eliminates an unexpected soft updates
inconsistency in fsck caused by the double allocation.

Use bqrelse rather than brelse for buffers that will be needed
soon again by the snapshot code. This improves snapshot performance.


63141 14-Jul-2000 dwmalone

Certain error conditions cause msdosfs_rename() to decrement the
vnode reference count on 'fdvp' more times than it should.

PR: 17347
Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
Approved by: bde


62976 11-Jul-2000 mckusick

Add snapshots to the fast filesystem. Most of the changes support
the gating of system calls that cause modifications to the underlying
filesystem. The gating can be enabled by any filesystem that needs
to consistently suspend operations by adding the vop_stdgetwritemount
to their set of vnops. Once gating is enabled, the function
vfs_write_suspend stops all new write operations to a filesystem,
allows any filesystem modifying system calls already in progress
to complete, then sync's the filesystem to disk and returns. The
function vfs_write_resume allows the suspended write operations to
begin again. Gating is not added by default for all filesystems as
for SMP systems it adds two extra locks to such critical kernel
paths as the write system call. Thus, gating should only be added
as needed.

Details on the use and current status of snapshots in FFS can be
found in /sys/ufs/ffs/README.snapshot, so for brevity and timeliness
they are not included here. Unless and until you create a snapshot file,
these changes should have no effect on your system (famous last words).


62573 04-Jul-2000 phk

Previous commit changing SYSCTL_HANDLER_ARGS violated KNF.

Pointed out by: bde


62472 03-Jul-2000 phk

Pull the rug out from under block mode devices. They return ENXIO on open(2) now.


62454 03-Jul-2000 phk

Style police catches up with rev 1.26 of src/sys/sys/sysctl.h:

Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grok our
sources:

-sysctl_vm_zone SYSCTL_HANDLER_ARGS
+sysctl_vm_zone (SYSCTL_HANDLER_ARGS)


62228 29-Jun-2000 bp

Fix memory leakage on module unload.

Spotted by: fixed INVARIANTS code


62227 29-Jun-2000 bp

Fix memory leakage on module unload.

Spotted by: fixed INVARIANTS code


62219 28-Jun-2000 chris

fdesc_getattr:
Don't fake any file types, just set vap->va_type to IFTOVT(stb.st_mode).
If something does not report its mode, vap->va_type is set to VNON
accordingly.


62184 27-Jun-2000 alfred

by changing the logic here we can support dynamic additions of new
filetypes.

Reviewed by: green


62182 27-Jun-2000 alfred

if there are leading zeros fail the lookup

Pointed out by: Alexander Viro <viro@math.psu.edu>
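
The log does not say which lookup routine this entry changed, so the following
is only a generic sketch of the idea: when directory entries are named after
decimal numbers, a name such as "07" must not resolve to the same entry as "7".
The function name sk_parse_numeric_name and its return convention are
hypothetical. The parser is bounded by an explicit length rather than a
terminating NUL.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Parse a decimal component name of namelen bytes into *valp.
 * Returns 0 on success, -1 if the name is not a canonical number.
 */
int
sk_parse_numeric_name(const char *name, size_t namelen, unsigned int *valp)
{
	unsigned int val = 0;
	size_t i;

	if (namelen == 0)
		return (-1);
	/* "0" is fine, but "00" or "07" would alias another entry. */
	if (namelen > 1 && name[0] == '0')
		return (-1);
	for (i = 0; i < namelen; i++) {
		unsigned int digit;

		if (name[i] < '0' || name[i] > '9')
			return (-1);
		digit = (unsigned int)(name[i] - '0');
		/* Refuse values that would overflow unsigned int. */
		if (val > (UINT_MAX - digit) / 10)
			return (-1);
		val = val * 10 + digit;
	}
	*valp = val;
	return (0);
}
```

Rejecting the non-canonical spellings keeps a one-to-one mapping between names
and entries, so a lookup never succeeds for a name that readdir would not list.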


62048 25-Jun-2000 bp

Remove obsolete comment.

Submitted by: Marius Bendiksen <mbendiks@eunet.no>


61884 20-Jun-2000 chris

Rename the `VRXEC' macro used to clear read and exec bits to `FDRX' so
as not to impede upon VFS namespace.


61724 16-Jun-2000 phk

Virtualizes & untangles the bioops operations vector.

Ref: Message-ID: <18317.961014572@critter.freebsd.dk> To: current@


61716 15-Jun-2000 chris

Remove unused include <sys/socketvar.h>.


61712 15-Jun-2000 chris

Replace vattr_null() with VATTR_NULL() and do not explicitly set vattr
fields to VNOVAL afterwards.


61572 12-Jun-2000 jmb

before this commit, specfs reported disk partitions
using decimal major and minor numbers. "ls -l" reports
disk partitions using decimal major numbers and hex
minor numbers.

make specfs use decimal major numbers and hex minor numbers,
just like "ls -l"


61315 06-Jun-2000 chris

Instead of completely disallowing VOP_SETATTR, just do it where there is
an underlying vnode.

Suggested by: bde


61173 02-Jun-2000 chris

Update the comment for fdesc_setattr to reflect that we no longer
actually setattr() on underlying vnodes.


61172 02-Jun-2000 chris

- Do not allow VOP_SETATTR to modify underlying vnodes at all. This caused
problems when fetch(1) was passed `-o -'. The rationale of this change
is that applications attempting to change underlying vnodes for /dev/fd
nodes are improperly written and the use of this interface should not
ever have been encouraged. Proper alternatives are fchmod, fchown and
others.

PR: 18952

- Remove stale, unused fdescnode->fd_link structure member.


60938 26-May-2000 jake

Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by: msmith and others


60833 23-May-2000 jake

Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by: phk
Reviewed by: phk
Approved by: mdodd


60406 11-May-2000 chris

Adapt fdesc to be mounted on /dev/fd and remove fd, stdin, stdout and
stderr nodes. More specific items of this patch:
o Removed support for symbolic links, and the need for
fdesc_readlink().
o Put all the code from fdesc_attr() into fdesc_getattr() and removed
fdesc_attr(). This also made it easier to properly give all nodes
unique inode numbers.
o The removal of all non-fd nodes allowed the removal of the fdesc_read(),
fdesc_write(), and fdesc_ioctl() nodes, since we no longer have nodes
that get special handling.
o Correct the component name validity-checking in fdesc_lookup(). It
previously detected the end of the string by checking for a terminating
NUL, now it uses cnp->cn_namelen.
o Handle kqueue files as FIFOs. This is probably the closest file type
to represent this type of file there is, and it is unfortunately not
very representative of a kqueue. Creation time is not supported by
kqueue, so ctime, mtime and atime are all set to the current time when
getattr() was called.
o Also set st_[mca]time to the current time since there's no data in
socket structures that can be used to fill this in (FIFOs).
o Simplify fdesc_readdir() since it only has to report the numbered
fd nodes. Add `.' and `..' directory links as well.
o Remove read bits from directories as they tend to confuse programs
like tar(1).

Reviewed by: phk
Discussed with: bde (earlier on, not quite review)


60281 09-May-2000 phk

Change the "bdev-whiner" to whine when open is attempted and extend
the deadline a month.


60041 05-May-2000 phk

Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bde's teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.h> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by: peter


59914 03-May-2000 phk

Remove 42 unneeded #include <sys/ioccom.h>.

ioccom.h defines only implementation detail, and should therefore
only be included from the #include which defines the ioctl tags,
in other words: never include it from *.c


59874 01-May-2000 peter

Add $FreeBSD$


59794 30-Apr-2000 phk

Remove unneeded #include <vm/vm_zone.h>

Generated by: src/tools/tools/kerninclude


59760 29-Apr-2000 phk

Remove unneeded #include <sys/kernel.h>


59755 29-Apr-2000 peter

nwfs depends on ncp


59652 26-Apr-2000 green

Move procfs_fullpath() to vfs_cache.c, with a rename to textvp_fullpath().
There's no excuse to have code in synthetic filestores that allows direct
references to the textvp anymore.

Feature requested by: msmith
Feature agreed to by: warner
Move requested by: phk
Move agreed to by: bde


59522 22-Apr-2000 green

Quiet an unused variable warning by commenting out a variable declaration
that goes with a commented out statement.


59482 22-Apr-2000 green

There's no reason to make "file" 0500 rather than 0555.


59481 22-Apr-2000 green

Welcome back our old friend from procfs, "file"!


59391 19-Apr-2000 phk

Remove ~25 unneeded #include <sys/conf.h>
Remove ~60 unneeded #include <sys/malloc.h>


59368 18-Apr-2000 phk

Remove unneeded <sys/buf.h> includes.

Due to some interesting cpp tricks in lockmgr, the LINT kernel shrinks
by 924 bytes.


59288 16-Apr-2000 jlemon

Introduce kqueue() and kevent(), a kernel event notification facility.


59249 15-Apr-2000 phk

Complete the bio/buf divorce for all code below devfs::strategy

Exceptions:
Vinum untouched. This means that it cannot be compiled.
Greg Lehey is on the case.

CCD not converted yet, casts to struct buf (still safe)

atapi-cd casts to struct buf to examine B_PHYS


59241 15-Apr-2000 rwatson

Introduce extended attribute support for FFS, allowing arbitrary
(name, value) pairs to be associated with inodes. This support is
used for ACLs, MAC labels, and Capabilities in the TrustedBSD
security extensions, which are currently under development.

In this implementation, attributes are backed to data vnodes in the
style of the quota support in FFS. Support for FFS extended
attributes may be enabled using the FFS_EXTATTR kernel option
(disabled by default). Userland utilities and man pages will be
committed in the next batch. VFS interfaces and man pages have
been in the repo since 4.0-RELEASE and are unchanged.

o ufs/ufs/extattr.h: UFS-specific extattr defines
o ufs/ufs/ufs_extattr.c: bulk of support routines
o ufs/{ufs,ffs,mfs}/*.[ch]: hooks and extattr.h includes
o contrib/softupdates/ffs_softdep.c: extattr.h includes
o conf/options, conf/files, i386/conf/LINT: added FFS_EXTATTR

o coda/coda_vfsops.c: XXX required extattr.h due to ufsmount.h
(This should not be the case, and will be fixed in a future commit)

Currently attributes are not supported in MFS. This will be fixed.

Reviewed by: adrian, bp, freebsd-fs, other unthanked souls
Obtained from: TrustedBSD Project


59034 05-Apr-2000 bp

Try to obtain timezone offset from an environment of mount program.
This helps in cases where CMOS clock set to UTC time.


58934 02-Apr-2000 phk

Move B_ERROR flag to b_ioflags and call it BIO_ERROR.

(Much of this done by script)

Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.

Move b_pblkno and b_iodone_chain to struct bio while we transition, they
will be obsoleted once bio structs chain/stack.

Add bio_queue field for struct bio aware disksort.

Address a lot of stylistic issues brought up by bde.


58706 27-Mar-2000 dillon

Commit the buffer cache cleanup patch to 4.x and 5.x. This patch fixes a
fragmentation problem due to geteblk() reserving too much space for the
buffer and imposes a larger granularity (16K) on KVA reservations for
the buffer cache to avoid fragmentation issues. The buffer cache size
calculations have been redone to simplify them (fewer defines, better
comments, less chance of running out of KVA).

The geteblk() fix solves a performance problem that DG was able to reproduce.

This patch does not completely fix the KVA fragmentation problems, but
it goes a long way.

Mostly Reviewed by: bde and others
Approved by: jkh


58349 20-Mar-2000 phk

Rename the existing BUF_STRATEGY() to DEV_STRATEGY()

substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)

substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)

This patch is machine generated except for the ccd.c and buf.h parts.


58345 20-Mar-2000 phk

Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new
field in struct buf: b_iocmd. The b_iocmd is enforced to have
exactly one bit set.

B_WRITE was bogusly defined as zero giving rise to obvious coding
mistakes.

Also eliminate the redundant struct buf flag B_CALL, it can just
as efficiently be done by comparing b_iodone to NULL.

Should you get a panic or drop into the debugger, complaining about
"b_iocmd", don't continue. It is likely to write on your disk
where it should have been reading.

This change is a step in the direction towards a stackable BIO capability.

A lot of this patch was machine generated (Thanks to style(9) compliance!)

Vinum users: Greg has not had time to test this yet, be careful.
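
Two points from this entry lend themselves to a short illustration: a flag
defined as zero can never be detected with a bitwise AND, and a command field
that must carry exactly one bit can be validated cheaply. The constants below
are illustrative values chosen for the sketch, not copied from the kernel
headers, and the function names are hypothetical.

```c
/* Illustrative values only; not taken from the kernel headers. */
#define SK_OLD_B_READ	0x00100000
#define SK_OLD_B_WRITE	0x00000000	/* "write" was the absence of B_READ */

#define SK_CMD_READ	0x01
#define SK_CMD_WRITE	0x02
#define SK_CMD_DELETE	0x04

/* The obvious-looking test that a zero-valued flag silently breaks. */
int
sk_old_flag_test_is_write(long b_flags)
{
	return ((b_flags & SK_OLD_B_WRITE) != 0);	/* always 0 */
}

/* A command word is valid only if exactly one bit is set. */
int
sk_iocmd_valid(unsigned int b_iocmd)
{
	return (b_iocmd != 0 && (b_iocmd & (b_iocmd - 1)) == 0);
}
```

Giving every command its own non-zero bit makes both mistakes detectable: a
test for "write" can succeed, and a word with zero or several bits set fails
validation.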


58132 16-Mar-2000 phk

Eliminate the undocumented, experimental, non-delivering and highly
dangerous MAX_PERF option.


56674 27-Jan-2000 nyan

Supported non-512 bytes/sector format.

PR: misc/12992
Submitted by: chi@bd.mbn.or.jp (Chiharu Shibata) and
Dmitrij Tejblum <tejblum@arc.hq.cti.ru>
Reviewed by: Dmitrij Tejblum <tejblum@arc.hq.cti.ru>


56272 19-Jan-2000 rwatson

Fix bde'isms in acl/extattr syscall interface, renaming syscalls to
prettier (?) names, adding some const's around here, et al.

Reviewed by: bde


56034 15-Jan-2000 bp

Check if module was compiled without SMP support and running on
an SMP system.


56033 15-Jan-2000 bp

Add VT_NWFS tag.


55991 14-Jan-2000 bde

Forward declare some structs so that this header is more self-sufficient.


55989 14-Jan-2000 bde

Use MALLOC_DECLARE when it is #defined, not when a (wrong) test of
__FreeBSD_version succeeds.


55765 10-Jan-2000 phk

remove check now done in vn_isdisk().


55756 10-Jan-2000 phk

Give vn_isdisk() a second argument where it can return a suitable errno.

Suggested by: bde


55594 08-Jan-2000 bp

Treat negative uio_offset value as eof (idea by: bde).
Prevent overflows by casting uio_offset to uoff_t.
Return correct error number if directory entry is broken.

Reviewed by: bde


55311 02-Jan-2000 phk

Return ENXIO if there is no device.


55308 02-Jan-2000 bp

Fix the mess with signed/unsigned longs and ints (inspired by bde).
Fix potential bug with directory reading.
Explicitly limit file size to 4GB (msdos can't handle larger files).
Slightly reorganize msdosfs_read() to reduce number of 'if's.


55206 29-Dec-1999 peter

Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistent with the other
BSDs, which made this change quite some time ago. More commits to come.


55190 28-Dec-1999 bp

Avoid writing garbage if uiomove fails.


55189 28-Dec-1999 bp

Fix an overflow in the msdosfs_read() function which was exposed on files
larger than 2GB.

PR: 15639
Submitted by: Tim Kientzle <kientzle@acm.org>
Reviewed by: phk


55188 28-Dec-1999 bp

It is possible that number of sectors specified in the BPB
will exceed FAT capacity. This will lead to kernel panic while other
systems just limit number of clusters.

PR: 4381, 15136
Reviewed by: phk


55153 27-Dec-1999 peter

Fix typo "," vs ";"

PR: 15696
Submitted by: Takashi Okumura <taka@cs.pitt.edu>


54932 21-Dec-1999 chris

Fix a typo that was doing something kind of silly, and that is initializing
the creation time for files to the uninitialized value:

vap->va_ctime = vap->va_ctime;

Changed to what was intended, assigning it to the modification time (thus
making all three values of access time, modification time and creation time
the same thing).

Reviewed by: grog


54908 20-Dec-1999 eivind

Include vm/vm_extern.h to get at prototypes


54803 19-Dec-1999 rwatson

Second pass commit to introduce new ACL and Extended Attribute system
calls, vnops, vfsops, both in /kern, and to individual file systems that
require a vfsop_ array entry.

Reviewed by: eivind


54655 15-Dec-1999 eivind

Introduce NDFREE (and remove VOP_ABORTOP)


54519 12-Dec-1999 peter

Fix pointer problem for the Alpha


54479 12-Dec-1999 bp

Bump local version number to 1.3.4.


54444 11-Dec-1999 eivind

Lock reporting and assertion changes.
* lockstatus() and VOP_ISLOCKED() gets a new process argument and a new
return value: LK_EXCLOTHER, when the lock is held exclusively by another
process.
* The ASSERT_VOP_(UN)LOCKED family is extended to use what this gives them
* Extend the vnode_if.src format to allow more exact specification than
locked/unlocked.

This commit should not do any semantic changes unless you are using
DEBUG_VFS_LOCKS.

Discussed with: grog, mch, peter, phk
Reviewed by: peter


54424 11-Dec-1999 peter

Don't simulate a pseudo address-space beyond VM_MAXUSER_ADDRESS that
maps onto the upages. We used to use this extensively, particularly
for ps and gdb. Both of these have been "fixed". ps gets the p_stats
via eproc along with all the other stats, and gdb uses the regs, fpregs
etc files.

Once upon a time the UPAGES were mapped here, but that changed back
in January '96. This essentially kills my revisions 1.16 and 1.17.
The 2-page "hole" above the stack can be reclaimed now.


54371 09-Dec-1999 semenu

First version of HPFS stuff.


54292 08-Dec-1999 phk

Remove unused #includes.

Obtained from: http://bogon.freebsd.dk/include


54272 07-Dec-1999 sos

Commit the kernel part of our DVD support. Nothing much to say really,
it's just a number of new ioctl's, the rest is done in userland.


54095 03-Dec-1999 semenu

Merged NetBSD version, as they have done improvements:
1. ntfs_read*attr*() functions now accept
uio structure to eliminate one data copying.
2. found and removed deadlock caused
by 6 concurrent ls -lR.
3. started implementation of normal
Unicode<->unix recoding.

Obtained from: NetBSD


53975 01-Dec-1999 mckusick

Collect read and write counts for filesystems. This new code
drops the counting in bwrite and puts it all in spec_strategy.
I did some tests and verified that the counts collected for writes
in spec_strategy is identical to the counts that we previously
collected in bwrite. We now also get read counts (async reads
come from requests for read-ahead blocks). Note that you need
to compile a new version of mount to get the read counts printed
out. The old mount binary is completely compatible, the only
reason to install a new mount is to get the read counts printed.

Submitted by: Craig A Soules <soules+@andrew.cmu.edu>
Reviewed by: Kirk McKusick <mckusick@mckusick.com>


53773 27-Nov-1999 bp

Remove abuse of struct nameidata.

Pointed by: Eivind Eklund


53709 26-Nov-1999 phk

Add a sysctl to control if argv is disclosed to the world:
kern.ps_argsopen
It defaults to 1 which means that all users can see all argvs in ps(1).

Reviewed by: Warner


53518 21-Nov-1999 phk

Introduce the new function
p_trespass(struct proc *p1, struct proc *p2)
which returns zero or an errno depending on the legality of p1 trespassing
on p2.

Replace kern_sig.c:CANSIGNAL() with call to p_trespass() and one
extra signal related check.

Replace procfs.h:CHECKIO() macros with calls to p_trespass().

Only show command lines to processes which can trespass on the target
process.


53509 21-Nov-1999 bp

Remove race condition under SMP.

Noted by: Denis Kalinin <denis@mail.rbc.ru>


53503 21-Nov-1999 phk

s/p_cred->pc_ucred/p_ucred/g


53467 20-Nov-1999 sef

A process should be able to examine itself.


53452 20-Nov-1999 phk

struct mountlist and struct mount.mnt_list have no business being
a CIRCLEQ. Change them to TAILQ_HEAD and TAILQ_ENTRY respectively.

This removes ugly mp != (void*)&mountlist comparisons.
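
A small userland sketch of why the TAILQ form reads better: a tail queue ends
in NULL, so a traversal needs no comparison against the list head. The
toy_mount type is hypothetical; only the <sys/queue.h> macros are real.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct mount. */
struct toy_mount {
	int id;
	TAILQ_ENTRY(toy_mount) mnt_list;
};
TAILQ_HEAD(toy_mountlist, toy_mount);

/* Walk the list; the loop terminates on NULL, not on the head's address. */
static int
toy_count_mounts(struct toy_mountlist *head)
{
	struct toy_mount *mp;
	int n = 0;

	assert(head != NULL);
	for (mp = TAILQ_FIRST(head); mp != NULL; mp = TAILQ_NEXT(mp, mnt_list))
		n++;
	return (n);
}
```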

Requested by: phk
Submitted by: Jake Burkholder jake@checker.org
PR: 14967


53364 18-Nov-1999 peter

Fix an unused variable warning.


53359 18-Nov-1999 peter

Fix a warning.


53301 17-Nov-1999 phk

Make proc/*/cmdline use the cached argv if available.

Submitted by: Paul Saab <paul@mu.org>
Reviewed by: phk


53300 17-Nov-1999 phk

The function `procfs_getattr()' in procfs doesn't set the value of
vap->va_fsid, so we cannot get valid information about procfs.

Submitted by: SAWADA Mizuki miz@pa.aix.or.jp
Reviewed by: phk
PR: 1654


53131 13-Nov-1999 eivind

Remove WILLRELE from VOP_SYMLINK

Note: Previous commit to these files (except coda_vnops and devfs_vnops)
that claimed to remove WILLRELE from VOP_RENAME actually removed it from
VOP_MKNOD.


53101 12-Nov-1999 eivind

Remove WILLRELE from VOP_RENAME


53059 09-Nov-1999 phk

Next step in the device cleanup process.

Correctly lock vnodes when calling VOP_OPEN() from filesystem mount code.

Unify spec_open() for bdev and cdev cases.

Remove the disabled bdev specific read/write code.


53045 09-Nov-1999 alc

Passing "0" or "FALSE" as the fourth argument to vm_fault is wrong. It
should be "VM_FAULT_NORMAL".


53017 08-Nov-1999 phk

remove a confusing and stale comment.


53016 08-Nov-1999 phk

Oops, a bit too hasty there.


53010 08-Nov-1999 phk

Various cleanups.


52990 08-Nov-1999 sef

Explain why Warner is right, and I am wrong, in the removing of the
file object. Also explain some possible directions to re-implement it --
I'm not sure it should be, given the minimal application use. (Other
than having the debugger automatically access the symbols for a process,
the main use I'd found was with some minor accounting ability, but _that_
depends on it being in the filesystem space; an ioctl access method would
be useless in that case.)

This is a code-less change; only a comment has been added.


52988 08-Nov-1999 peter

Update for fileops.fo_stat() addition. Note, this would panic if
it saw a DTYPE_PIPE. This isn't quite right but should stop a crash.


52971 07-Nov-1999 phk

Use vop_panic() instead of spec_badop().


52967 07-Nov-1999 phk

Remove the iskmemdev() function. Make it the responsibility of the mem.c
drivers to enforce the securelevel checks.


52961 07-Nov-1999 sef

Make an incredibly stupid change because Warner threatened to do it and
continue doing it despite objections by me (the principal author).

Note that this doesn't fix the real problem -- the real problem is generally
bad setup by ignorant users, and education is the right way to fix it.

So while this doesn't actually solve the problem mentioned in the complaint
(since it's still possible to do it via other methods, although they mostly
involve a bit more complicity), and there are better methods to do this,
nobody was willing or able to provide me with a real world example that
couldn't be worked around using the existing permissions and group
mechanism. And therefore, security by removing features is the method of
the day.

I only had three applications that used it, in any event. One of them would
have made debugging easier, but I still haven't finished it, and won't
now, so it doesn't really matter.


52814 02-Nov-1999 archie

Change structure field named 'toupper' to 'to_upper' to avoid conflict
with the macro of the same name. Same thing for 'tolower'.


52782 01-Nov-1999 msmith

Newline-terminate the complaint message about not being able to find
the root vnode pointer.


52728 01-Nov-1999 phk

Remove specfs::vop_lookup() There is no code path which can call it.


52719 31-Oct-1999 bp

Bump version number to sync with ncplib 1.3.3


52635 29-Oct-1999 phk

useracc() the prequel:

Merge the contents (less some trivial bordering on the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.


52399 20-Oct-1999 dillon

A tentative agreement has been reached in regards to a procedure
to remove 'b'lock devices. The agreement is, essentially, that
block devices will be collapsed into character devices as a first
step (though I don't particularly agree), and raw device names 'rxxx'
will become simply 'xxx' in devfs in the second step (i.e. no 'rxxx'
names will exist). The renaming will not affect the original /dev
and the expectation is that devfs will eventually (but not immediately)
become the standard way to access devices in the system.

If it is determined that a reimplementation of block device access
characteristics is beneficial, a number of alternatives will
be possible that do not involve resurrecting the 'b'lock device class.
For example, an ioctl() that might be made on an open character device
descriptor or a generic buffered overlay device.

This commit removes the blockdev disablement sysctl which does not
apply to the solution that was reached.


52385 18-Oct-1999 phk

Change the default for the vfs.bdev_buffered sysctl to zero.

This means that access to block devices nodes will act the
same as char device nodes for disk-like devices.

If you encounter problems after this, where programs accessing
disks directly fail to operate, please use the following command
to revert to previous behaviour:

sysctl -w vfs.bdev_buffered=1

And verify that this was indeed the cause of your trouble.

See the mail-archives of the arch@FreeBSD.org list for background.


52230 14-Oct-1999 bp

Under some conditions a vnode can reference itself.


52229 14-Oct-1999 bp

Isolate old constant NCP_VOLNAME_LEN.


52152 12-Oct-1999 bp

Remove unnecessary includes.


52137 11-Oct-1999 phk

remove unused #includes


52034 08-Oct-1999 phk

Add a couple of strategic KASSERTs


52032 08-Oct-1999 phk

Add back sysctl vfs.enable_userblk_io


51983 07-Oct-1999 bp

Put back cn_namelen initialization. Removed by phk in rev 1.2.


51929 04-Oct-1999 phk

Warn once per driver about dev_t's not registered with make_dev().


51926 04-Oct-1999 phk

Move the buffered read/write code out of spec_{read|write} and into
two new functions spec_buf{read|write}.

Add sysctl vfs.bdev_buffered which defaults to 1 == true. This
sysctl can be used to experimentally turn buffered behaviour for
bdevs off. It should not be changed while any block devices are
open. Remove the misplaced sysctl vfs.enable_userblk_io.

No other changes in behaviour.


51906 03-Oct-1999 phk

Before we start to mess with the VFS name-cache clean things up a little bit:
Isolate the namecache in its own file, and give it a dedicated malloc type.


51852 02-Oct-1999 bp

Import kernel part of ncplib: netncp and nwfs

Reviewed by: msmith, peter
Obtained from: ncplib


51808 30-Sep-1999 phk

Remove the D_NOCLUSTER[RW] options which were added because vn had
problems. Now that Matt has fixed vn, this can go. The vn driver
should have used d_maxio (now si_iosize_max) anyway.


51797 29-Sep-1999 phk

Remove v_maxio from struct vnode.

Replace it with mnt_iosize_max in struct mount.

Nits from: bde


51791 29-Sep-1999 marcel

sigset_t change (part 2 of 5)
-----------------------------

The core of the signalling code has been rewritten to operate
on the new sigset_t. No methodological changes have been made.
Most references to a sigset_t object are through macros (see
signalvar.h) to create a level of abstraction and to provide
a basis for further improvements.

The NSIG constant has not been changed to reflect the maximum
number of signals possible. The reason is that it breaks
programs (especially shells) which assume that all signals
have a non-null name in sys_signame. See src/bin/sh/trap.c
for an example. Instead _SIG_MAXSIG has been introduced to
hold the maximum signal possible with the new sigset_t.

struct sigprop has been moved from signalvar.h to kern_sig.c
because a) it is only used there, and b) access must be done
through function sigprop(). The latter because the table doesn't
hold properties for all signals, but only for the first NSIG
signals.

signal.h has been reorganized to make reading easier and to
add the new and/or modified structures. The "old" structures
are moved to signalvar.h to prevent namespace pollution.

Especially the coda filesystem suffers from the change, because
it contained lines like (p->p_sigmask == SIGIO), which is easy
to do for integral types, but not for compound types.

NOTE: kdump (and port linux_kdump) must be recompiled.

Thanks to Garrett Wollman and Daniel Eischen for pressing the
importance of changing sigreturn as well.


51747 28-Sep-1999 dillon

Make sure file after VOP_OPEN is VMIO'd when transferring control from
a lower layer to an upper layer. I'm not sure how necessary this is
for reading.

Fix bug in union_lookup() (note: there are probably still several bugs
in union_lookup()). This one set lerror as a side effect without
setting lowervp, causing copyup code further on down to crash on a null
lowervp pointer. Changed the side effect to use a temporary variable
instead.


51688 26-Sep-1999 dillon

This is a major fixup of unionfs. At least 30 serious bugs have been
fixed (many due to changing semantics in other parts of the kernel and not
the original author's fault), including one critical one: unionfs could
cause UFS corruption in the fronting store due to calling VOP_OPEN for
writing without turning on vmio for the UFS vnode.

Most of the bugs were related to semantics changes in VOP calls, lock
ordering problems (causing deadlocks), improper handling of a read-only
backing store (such as an NFS mount), improper referencing and locking
of vnodes, not using real struct locks for vnode locking, not using
recursive locks when accessing the fronting store, and things like that.

New functionality has been added: unionfs now has mmap() support, but
only partially tested, and rename has been enhanced considerably.

There are still some things that unionfs cannot do. You cannot
rename a directory without confusing unionfs, and there are issues
with softlinks, hardlinks, and special files. unionfs mostly doesn't
understand them (and never did).

There are probably still panic situations, but hopefully nowhere near
as many as before this commit.

The unionfs in this commit has been tested overlayed on /usr/src
(backing /usr/src being a read-only NFS mount, fronting /usr/src being
a local filesystem). kernel builds have been tested, buildworld is
undergoing testing. More testing is necessary.


51662 25-Sep-1999 phk

Remove a warning check which was too general.


51658 25-Sep-1999 phk

Remove five now unused fields from struct cdevsw. They should never
have been there in the first place. A GENERIC kernel shrinks almost 1k.

Add a slightly different safetybelt under nostop for tty drivers.

Add some missing FreeBSD tags


51654 25-Sep-1999 phk

This patch clears the way for removing a number of tty related
fields in struct cdevsw:

d_stop moved to struct tty.
d_reset already unused.
d_devtotty linkage now provided by dev_t->si_tty.

These fields will be removed from struct cdevsw together with
d_params and d_maxio Real Soon Now.

The changes in this patch consist of:

initialize dev->si_tty in *_open()
initialize tty->t_stop
remove devtotty functions
rename ttpoll to ttypoll
a few adjustments to these changes in the generic code
a bump of __FreeBSD_version
add a couple of FreeBSD tags


51558 22-Sep-1999 phk

Kill the cdevsw->d_maxio field.

d_maxio is replaced by the dev->si_iosize_max field which the driver
should set in all calls to cdevsw->d_open if it has a better
idea than the system wide default.

The field is a generic dev_t field (ie: not disk specific) so that
tapes and other devices can use physio as well.


51486 20-Sep-1999 dillon

More removals of vnode->v_lastr, replaced by preexisting seqcount
heuristic to detect sequential operation.

VM-related forced clustering code removed from ufs in preparation for a
commit to vm/vm_fault.c that does it more generally.

Reviewed by: David Greenman <dg@root.com>, Alan Cox <alc@cs.rice.edu>


51485 20-Sep-1999 dillon

Fix handling of a device EOF that occurs in the middle of a block. The
transfer size calculation was incorrect, resulting in the last read being
potentially larger than the actual extent of the device.
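
A sketch of the kind of clamping this implies, under the assumption that a
read proceeds one block at a time. The function and parameter names are
hypothetical; this is not the spec_read() code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of bytes to transfer for one step of a blockwise read:
 * limited by the rest of the current block, by what the caller still
 * wants (resid), and by the bytes remaining before the device's end.
 */
static uint64_t
toy_clamp_xfer(uint64_t offset, uint64_t resid, uint64_t bsize,
    uint64_t devsize)
{
	uint64_t n;

	assert(bsize > 0);
	if (offset >= devsize)
		return (0);			/* at or past EOF */
	n = bsize - (offset % bsize);		/* rest of this block */
	if (n > resid)
		n = resid;			/* caller wants less */
	if (n > devsize - offset)
		n = devsize - offset;		/* device ends mid-block */
	return (n);
}
```

With a 1000-byte device and 512-byte blocks, a read starting at offset 512
is limited to the 488 bytes that actually exist.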

EOF and write handling has not yet been fixed.

Reviewed by: Tor.Egge@fast.no


51479 20-Sep-1999 phk

Step one of replacing devsw->d_maxio with si_bsize_max.

Rename dev->si_bsize_max to si_iosize_max and set it in spec_open
if the device didn't.

Set vp->v_maxio from dev->si_bsize_max in spec_open rather than
in ufs_bmap.c


51345 17-Sep-1999 dillon

Add vfs.enable_userblk_io sysctl to control whether user reads and writes
to buffered block devices are allowed. The default is to be backwards
compatible, i.e. reads and writes are allowed.

The idea is for a larger crowd to start running with this disabled and
see what problems, if any, crop up, and then to change the default to
off and see if any problems crop up in the next 6 months prior to
potentially removing support entirely. There are still a few people,
Julian and myself included, who believe the buffered block device
access from usermode to be useful.

Remove use of vnode->v_lastr from buffered block device I/O in
preparation for removal of vnode->v_lastr field, replacing it with
the already existing seqcount metric to detect sequential operation.

Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>


51138 11-Sep-1999 alfred

Separate the export check in VFS_FHTOVP, exports are now checked via
VFS_CHECKEXP.

Add fh(open|stat|statfs) syscalls to allow userland to query filesystems
based on (network) filehandle.

Obtained from: NetBSD


51111 09-Sep-1999 julian

Changes to centralise the default blocksize behaviour.
More likely to follow.

Submitted by: phk@freebsd.org


51068 07-Sep-1999 alfred

All unimplemented VFS ops now have entries in kern/vfs_default.c that return
reasonable defaults.

This avoids confusing and ugly casting to eopnotsupp or making dummy functions.
Bogus casting of filesystem sysctls to eopnotsupp() have been removed.

This should make *_vfsops.c more readable and reduce bloat.
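
A minimal sketch of the idea, with hypothetical names throughout (toy_vfsops,
toy_stdquotactl and so on are not the kernel's identifiers): a filesystem's
ops table names a shared default function for anything it does not
implement, so no slot needs a cast or a private dummy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical two-slot operations table. */
struct toy_vfsops {
	int (*vfs_sync)(void);
	int (*vfs_quotactl)(void);
};

/* Shared defaults, playing the role of the entries in vfs_default.c. */
static int toy_stdsync(void)     { return (0); }
static int toy_stdquotactl(void) { return (EOPNOTSUPP); }

/* A filesystem that implements sync itself and takes the quotactl default. */
static int toy_myfs_synced;
static int toy_myfs_sync(void)   { toy_myfs_synced++; return (0); }

static const struct toy_vfsops toy_myfs_ops = {
	.vfs_sync = toy_myfs_sync,
	.vfs_quotactl = toy_stdquotactl,
};

/* Every slot is filled, so the caller never special-cases a missing op. */
static int
toy_vfs_quotactl(const struct toy_vfsops *ops)
{
	assert(ops != NULL && ops->vfs_quotactl != NULL);
	return (ops->vfs_quotactl());
}
```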

Reviewed by: msmith, eivind
Approved by: phk
Tested by: Jeroen Ruigrok/Asmodai <asmodai@wxs.nl>


50890 04-Sep-1999 bde

Get rid of the NULLFS_DIAGNOSTIC option. This option was as useful as
the other XXXFS_DIAGNOSTIC options (not very) and mostly controlled
tracing of normal operation. Use `#ifdef DEBUG' for non-diagnostics
and `#ifdef DIAGNOSTIC' for diagnostics.


50888 04-Sep-1999 bde

Fixed the previous change. Some more code controlled by UMAPFS_DIAGNOSTIC
is actually for diagnostics; control it with DIAGNOSTIC and not DDB.


50839 03-Sep-1999 julian

Print out the device name when there is an uninitialised IO size or IO error
in spec_getpages().

Submitted by: phk suggested the idea.


50835 03-Sep-1999 julian

Add a catchall to set default blocksize values for disk like devices.

Submitted by: phk@freebsd.org


50830 03-Sep-1999 julian

Revert a bunch of controversial changes by PHK. After
a quick think and discussion among various people some form of some of
these changes will probably be recommitted.

The reversion was requested by dg while discussions proceed.
PHK has indicated that he can live with this, and it has been agreed
that some form of some of these changes may return shortly after further
discussion.


50752 01-Sep-1999 phk

Fix the sense of the vn_isdisk() check.


50715 31-Aug-1999 phk

Set the buffersize for non BSDFFS labeled partitions to
max(dev->si_bsize_phys, BLKDEV_IOSIZE).

Requested by: davidg


50714 31-Aug-1999 phk

Make buffered access to bdevs from userland controllable with
a sysctl vfs.bdev_access.


50623 30-Aug-1999 phk

Make bdev userland access work like cdev userland access unless
the highly non-recommended option ALLOW_BDEV_ACCESS is used.

(bdev access is evil because you don't get write errors reported.)

Kill si_bsize_best before it kills Matt :-)

Use the specfs routines rather than having cloned copies in devfs.


50616 30-Aug-1999 bde

Converted the silly SAFTEY option into a new-style option by renaming it to
DIAGNOSTIC.

Fixed an English style bug in the panic messages controlled by SAFETY.


50554 29-Aug-1999 bde

Changed old-style option UNION_DIAGNOSTIC to DEBUG and fixed printf
format errors exposed by this. It has nothing to do with diagnostics
since it does little more than control tracing of normal operation.
Actual diagnostics for the union file system are still controlled by
the DIAGNOSTIC option.


50553 29-Aug-1999 bde

Changed old-style options UMAPFS_DIAGNOSTIC and UMAP_DIAGNOSTIC to DEBUG
or DDB and fixed printf format errors exposed by this. The options had
little to do with diagnostics; they mostly controlled tracing of normal
operation.


50523 28-Aug-1999 phk

Fix various trivial warnings from LINT


50477 28-Aug-1999 peter

$Id$ -> $FreeBSD$


50405 26-Aug-1999 phk

Simplify the handling of VCHR and VBLK vnodes using the new dev_t:

Make the alias list a SLIST.

Drop the "fast recycling" optimization of vnodes (including
the returning of a preexisting but stale vnode from checkalias).
It doesn't buy us anything now that we don't hardlimit
vnodes anymore.

Rename checkalias2() and checkalias() to addalias() and
addaliasu() - which takes dev_t and udev_t arg respectively.

Make the revoke syscalls use vcount() instead of VALIASED.

Remove VALIASED flag, we don't need it now and it is faster
to traverse the much shorter lists than to maintain the
flag.

vfs_mountedon() can check the dev_t directly, all the vnodes
point to the same one.

Print the devicename in specfs/vprint().

Remove a couple of stale LFS vnode flags.

Remove unimplemented/unused LK_DRAINED.


50347 25-Aug-1999 phk

Introduce vn_isdisk(struct vnode *vp) function, and use it to test for diskness.


50327 25-Aug-1999 julian

Fix comment to match reality..
vop_strategy gets a vnode argument these days.


50256 23-Aug-1999 bde

Initialise fsids with (user) device numbers again. Bitrot when dev_t's
were changed to pointers was obscured by casting dev_t's to longs.
fsids haven't even been comprised of longs since the Lite2 merge.


50254 23-Aug-1999 phk

Convert DEVFS hooks in (most) drivers to make_dev().

Diskslice/label code not yet handled.

Vinum, i4b, alpha, pc98 not dealt with (left to respective Maintainers)

Add the correct hook for devfs to kern_conf.c

The net result of this exercise is that a lot fewer files depend on DEVFS,
and devtoname() gets more sensible output in many cases.

A few drivers had minor additional cleanups performed relating to cdevsw
registration.

A few drivers don't register a cdevsw{} anymore, but only use make_dev().


50061 19-Aug-1999 marcel

Let processes retrieve their argv through procfs. Revert to the original
behaviour in all other cases.

Submitted by: Andrew Gordon <arg@arg1.demon.co.uk>


49945 17-Aug-1999 alc

Add the (inline) function vm_page_undirty for clearing the dirty bitmask
of a vm_page.

Use it.

Submitted by: dillon


49771 14-Aug-1999 phk

Spring cleaning around strategy and disklabels/slices:

Introduce BUF_STRATEGY(struct buf *, int flag) macro, and use it throughout.
please see comment in sys/conf.h about the flag argument.

Remove strategy argument from all the diskslice/label/bad144
implementations, it should be found from the dev_t.

Remove bogus and unused strategy1 routines.

Remove open/close arguments from dssize(). Pick them up from dev_t.

Remove unused and unfinished setgeom support from diskslice/label/bad144 code.


49695 13-Aug-1999 phk

Add support for device drivers which want to track all open/close
operations. This allows a device driver better insight into
what is going on than the current:

proc1: open /dev/foo R/O
devsw->open( R/O, proc1 )
proc2: open /dev/foo R/W
devsw->open( R/W, proc2 )
proc2: close
/* nothing, but device is
really only R/O open */
proc1: close
devsw->close( R/O, proc1 )
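
As a rough sketch of what a driver could do once it sees every open and
close, the counters below follow the scenario above. The softc layout, flag
values and function names are hypothetical, not an existing driver API.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_FREAD	0x1
#define TOY_FWRITE	0x2

/* Hypothetical per-device state kept by a driver. */
struct toy_softc {
	int readers;	/* opens that asked for read access */
	int writers;	/* opens that asked for write access */
};

/* Called for every open, not only the first one. */
static void
toy_track_open(struct toy_softc *sc, int flags)
{
	assert(sc != NULL);
	if (flags & TOY_FREAD)
		sc->readers++;
	if (flags & TOY_FWRITE)
		sc->writers++;
}

/* Called for every close, not only the last one. */
static void
toy_track_close(struct toy_softc *sc, int flags)
{
	assert(sc != NULL);
	if (flags & TOY_FREAD) {
		assert(sc->readers > 0);
		sc->readers--;
	}
	if (flags & TOY_FWRITE) {
		assert(sc->writers > 0);
		sc->writers--;
	}
}
```

After proc2 closes its R/W descriptor, writers drops to zero while readers
stays at one, so the driver knows the device is now open read-only.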


49687 13-Aug-1999 phk

Don't examine vp->v_tag (see comment in vnode.h)


49681 13-Aug-1999 phk

Remove spec_getattr(), which as far as I can tell can never be called from the current code-paths, and if it were, would panic on any unmounted bdev.


49679 13-Aug-1999 phk

The bdevsw() and cdevsw() are now identical, so kill the former.


49678 13-Aug-1999 phk

s/v_specinfo/v_rdev/


49535 08-Aug-1999 phk

Decommission miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>,
a few lines into <sys/vnode.h>.

Add a few fields to struct specinfo, paving the way for the fun part.


49525 08-Aug-1999 bde

Fixed printf format errors (%qu -> %llu; the arg was already unsigned long
long to hide problems on alphas).


49524 08-Aug-1999 bde

Fixed all printf format errors reported by gcc -Wformat on i386's:
- %q -> %ll; don't assume that the promotion of off_t is quad_t; only
assume that off_t's are representable as long longs.
- printing of dev_t's was completely broken.

Fixed nearby printf format errors not reported by gcc -Wformat on i386's:
- printing of ino_t's and pointers was sloppy.
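
The off_t rule above can be sketched in portable C as follows; the helper
name is hypothetical. The value is cast to long long and printed with %lld,
which relies only on off_t being representable as a long long.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Format a file offset into buf. Returns what snprintf() returns:
 * the number of characters the full output needs, excluding the NUL.
 */
static int
toy_format_offset(char *buf, size_t len, off_t off)
{
	assert(buf != NULL && len > 0);
	return (snprintf(buf, len, "%lld", (long long)off));
}
```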


49383 02-Aug-1999 rvb

The dev returned here is what is found in the st_dev field.
This should not be further translated ... hence the 0.


49075 25-Jul-1999 bde

Don't set DE_ACCESS for unsuccessful reads.
Translated from: a similar fix in ufs_readwrite.c rev.1.61.

Don't forget to set DE_ACCESS for short reads.

Check for invalid (negative) offsets before checking for reads of
0 bytes, as in ufs, although checking for invalid offsets at all
is probably a bug.


48960 21-Jul-1999 phk

Remove the RCS "Log" and all the verbiage it has generated.


48936 20-Jul-1999 phk

Now a dev_t is a pointer to struct specinfo which is shared by all specdev
vnodes referencing this device.

Details:
cdevsw->d_parms has been removed, the specinfo is available
now (== dev_t) and the driver should modify it directly
when applicable, and the only driver doing so, does so:
vn.c. I am not sure the logic in checking for "<" was right
before, and it looks even less so now.

An initial pool of 50 struct specinfo are depleted during
early boot, after that malloc had better work. It is
likely that fewer than 50 would do.

Hashing is done from udev_t to dev_t with a prime number
remainder hash, experiments show no better hash available
for decent cost (MD5 is only marginally better) The prime
number used should not be close to a power of two, we use
83 for now.
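
A userland sketch of a prime-remainder hash with 83 buckets, as described.
The table layout, the chaining and every toy_* name are assumptions made for
illustration; only the modulus 83 comes from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_DEVT_HASH	83	/* prime, not close to a power of two */

/* Hypothetical stand-in for struct specinfo. */
struct toy_dev {
	uint32_t udev;		/* the major|minor number being hashed */
	struct toy_dev *next;	/* chain of entries in the same bucket */
};

static struct toy_dev *toy_hash[TOY_DEVT_HASH];

static unsigned
toy_hashfn(uint32_t udev)
{
	return (udev % TOY_DEVT_HASH);
}

/* Add an entry at the head of its bucket's chain. */
static void
toy_insert(struct toy_dev *d)
{
	unsigned h;

	assert(d != NULL);
	h = toy_hashfn(d->udev);
	d->next = toy_hash[h];
	toy_hash[h] = d;
}

/* Find the entry for udev, or NULL if none has been inserted. */
static struct toy_dev *
toy_lookup(uint32_t udev)
{
	struct toy_dev *d;

	for (d = toy_hash[toy_hashfn(udev)]; d != NULL; d = d->next)
		if (d->udev == udev)
			return (d);
	return (NULL);
}
```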

Add new checkalias2() to get around the loss of info from
dev2udev() in bdevvp();

The aliased vnodes are hung on a list straight off the dev_t,
and speclisth[SPECSZ] is unused. The sharing of struct
specinfo means that the v_specnext moves into the vnode
which grows by 4 bytes.

Don't use a VBLK dev_t which doesn't make sense in MFS, now
we hang a dummy cdevsw on B/Cmaj 253 so that things look sane.

Storage overhead from all of this is O(50k).

Bump __FreeBSD_version to 400009

The next step will add the stuff needed so device-drivers can start to
hang things from struct specinfo


48926 20-Jul-1999 phk

Don't access the device with vp->v_specinfo->si_rdev, use vp->v_rdev.


48859 17-Jul-1999 phk

I have not one single time remembered the name of this function correctly
so obviously I gave it the wrong name. s/umakedev/makeudev/g


48719 09-Jul-1999 phk

Allow jailed processes to open non-process vnodes like the root of the fs.


48715 09-Jul-1999 peter

Use %q rather than rolling a custom routine.


48692 09-Jul-1999 jlemon

Support for i386 hardware breakpoints.

Submitted by: Brian Dean <brdean@unx.sas.com>


48691 09-Jul-1999 jlemon

Implement support for hardware debug registers on the i386.

Submitted by: Brian Dean <brdean@unx.sas.com>


48468 02-Jul-1999 phk

Make sure that stat(2) and friends always return a valid st_dev field.

Pseudo-FS need not fill in the va_fsid anymore, the syscall code
will use the first half of the fsid, which now looks like a udev_t
with major 255.


48425 01-Jul-1999 peter

move <sys/systm.h> before <sys/buf.h>


48225 26-Jun-1999 mckusick

Convert buffer locking from using the B_BUSY and B_WANTED flags to using
lockmgr locks. This commit should be functionally equivalent to the old
semantics. That is, all buffer locking is done with LK_EXCLUSIVE
requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will
be done in future commits.


47964 16-Jun-1999 mckusick

Add a vnode argument to VOP_BWRITE to get rid of the last vnode
operator special case. Delete special case code from vnode_if.sh,
vnode_if.src, umap_vnops.c, and null_vnops.c.


47897 13-Jun-1999 phk

Eliminate the bogus procfs private almost struct dirent structure.

Spotted by: Lars Hamren
Reviewed by: bde


47686 01-Jun-1999 dt

Remove an unused variable.


47640 31-May-1999 phk

Simplify cdevsw registration.

The cdevsw_add() function now finds the major number(s) in the
struct cdevsw passed to it. cdevsw_add_generic() is no longer
needed, cdevsw_add() does the same thing.

cdevsw_add() will print a message if the d_maj field looks bogus.

Remove nblkdev and nchrdev variables. Most places they were used
bogusly. Instead check a dev_t for validity by seeing if devsw()
or bdevsw() returns NULL.

Move bdevsw() and devsw() functions to kern/kern_conf.c

Bump __FreeBSD_version to 400006

This commit removes:
72 bogus makedev() calls
26 bogus SYSINIT functions

if_xe.c bogusly accessed cdevsw[], author/maintainer please fix.

I4b and vinum not changed. Patches emailed to authors. LINT
probably broken until they catch up.


47625 30-May-1999 phk

This commit should be an extensive NO-OP:

Reformat and initialize correctly all "struct cdevsw".

Initialize the d_maj and d_bmaj fields.

The d_reset field was not removed, although it is never used.

I used a program to do most of this, so all the files now use the
same consistent format. Please keep it that way.

Vinum and i4b not modified, patches emailed to respective authors.


47407 22-May-1999 dt

Don't call calcru() on a swapped-out process. calcru() accesses p_stats, which
is in the U-area.


47060 12-May-1999 semenu

Driver is now ported to NetBSD.

Submitted by: Christos Zoulas <christos@zoulas.com>


47028 11-May-1999 phk

Divorce "dev_t" from the "major|minor" bitmap, which is now called
udev_t in the kernel but still called dev_t in userland.

Provide functions to manipulate both types:
major() umajor()
minor() uminor()
makedev() umakedev()
dev2udev() udev2dev()

For now they're functions, they will become in-line functions
after one of the next two steps in this process.

Return major/minor/makedev to macro-hood for userland.

Register a name in cdevsw[] for the "filedescriptor" driver.

In the kernel the udev_t appears in places where we have the
major/minor number combination, (ie: a potential device: we
may not have the driver nor the device), like in inodes, vattr,
cdevsw registration and so on, whereas the dev_t appears where
we carry around a reference to an actual device.

In the future the cdevsw and the aliased-from vnode will be hung
directly from the dev_t, along with up to two softc pointers for
the device driver and a few housekeeping bits. This will essentially
replace the current "alias" check code (same buck, bigger bang).

A little stunt has been provided to try to catch places where the
wrong type is being used (dev_t vs udev_t), if you see something
not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if
it makes a difference. If it does, please try to track it down
(many hands make light work) or at least try to reproduce it
as simply as possible, and describe how to do that.

Without DEVT_FASCIST I believe this patch is a no-op.

Stylistic/posixoid comments about the userland view of the <sys/*.h>
files welcome now, from userland they now contain the end result.

Next planned step: make all dev_t's refer to the same devsw[] which
means convert BLK's to CHR's at the perimeter of the vnodes and
other places where they enter the game (bootdev, mknod, sysctl).


46795 09-May-1999 phk

remove cast from dev_t to dev_t.


46676 08-May-1999 phk

I got tired of seeing all the cdevsw[major(foo)] all over the place.

Made a new (inline) function devsw(dev_t dev) and substituted it.

Changed the BDEV variant to this format as well: bdevsw(dev_t dev)

DEVFS will eventually benefit from this change too.


46669 08-May-1999 dcs

The lowercasing of Joliet filenames was not a feature.


46635 07-May-1999 phk

Continue where Julian left off in July 1998:

Virtualize bdevsw[] from cdevsw. bdevsw() is now an (inline)
function.

Join CDEV_MODULE and BDEV_MODULE to DEV_MODULE (please pay attention
to the order of the cmaj/bmaj arguments!)

Join CDEV_DRIVER_MODULE and BDEV_DRIVER_MODULE to DEV_DRIVER_MODULE
(ditto!)

(Next step will be to convert all bdev dev_t's to cdev dev_t's
before they get to do any damage^H^H^H^H^H^Hwork in the kernel.)


46593 06-May-1999 peter

One too many vfsops..


46580 06-May-1999 phk

remove b_proc from struct buf, it's (now) unused.

Reviewed by: dillon, bde


46568 06-May-1999 peter

Add sufficient braces to keep egcs happy about potentially ambiguous
if/else nesting.


46389 04-May-1999 phk

Make the type and map files claim 0 bytes size. Tar doesn't get confused
now, but doesn't store any data either.

I wonder if we shouldn't claim to be fifos instead...


46388 04-May-1999 phk

Add even more () to CHECKIO which by now feels positively LISPish.

Submitted by: bde
Reviewed by: phk


46201 30-Apr-1999 phk

Add a new "file" to procfs: "rlimit" which shows the resource limits for
the process.

PR: 11342
Submitted by: Adrian Chadd adrian@freebsd.org
Reviewed by: phk


46155 28-Apr-1999 phk

This Implements the mumbled about "Jail" feature.

This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.

For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".

Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.

Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.

It generally does what one would expect, but setting up a jail
still takes a little knowledge.

A few notes:

I have no scripts for setting up a jail, don't ask me for them.

The IP number should be an alias on one of the interfaces.

mount a /proc in each jail, it will make ps more useable.

/proc/<pid>/status tells the hostname of the prison for
jailed processes.

Quotas are only sensible if you have a mountpoint per prison.

There are no provisions for stopping resource-hogging.

Some "#ifdef INET" and similar may be missing (send patches!)

If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!

Tools, comments, patches & documentation most welcome.

Have fun...

Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/


46116 27-Apr-1999 phk

Change suser_xxx() to suser() where it applies.


46112 27-Apr-1999 phk

Suser() simplification:

1:
s/suser/suser_xxx/

2:
Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.
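
The substitution in step 3 can be tried outside the kernel tree. The
following is only a sketch, assuming a Python translation of the sed
expression; the helper name simplify_suser is hypothetical and not part of
the commit.

```python
import re

# Python rendering of the sed expression from step 3. The backreference \1
# in the pattern requires the same identifier in both arguments, so calls
# that mix two different pointers are left alone.
SUSER_CALL = re.compile(r"suser_xxx\(([a-zA-Z0-9_]*)->p_ucred, &\1->p_acflag\)")


def simplify_suser(line: str) -> str:
    """Rewrite suser_xxx(p->p_ucred, &p->p_acflag) as suser(p)."""
    return SUSER_CALL.sub(r"suser(\1)", line)
```

With this, "error = suser_xxx(p->p_ucred, &p->p_acflag);" becomes
"error = suser(p);", while a call such as suser_xxx(cred, NULL) is left
untouched, matching the remark that the remaining suser_xxx() calls are
dealt with later.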


45879 20-Apr-1999 semenu

Removed annoying messages during boot, added some checks
before mounting (should help avoid mounting extended partitions :-).
Fixed problem with hanging while unmounting busy fs.

And (the most important) added some locks to prevent
simultaneous access to kernel structures!


45773 18-Apr-1999 dcs

Add support for Joliet extensions to the iso9660 fs. The related PR
cannot yet be closed, though.

I hope I got all credits right, and that the multiple submitted by lines
do not break anyone's scripts...

PR: kern/5038, kern/5567
Submitted by: Keith Jang <keith@email.gcn.net.tw>
Submitted by: Joachim Kuebart <joki@kuebart.stuttgart.netsurf.de>
Submitted by: Byung Yang <byung@wam.umd.edu>
Submitted by: Motomichi Matsuzaki <mzaki@e-mail.ne.jp>


45653 13-Apr-1999 semenu

Removed DIAGNOSTIC option redefinition.

Submitted by: Eivind Eklund <eivind@FreeBSD.org>


45347 05-Apr-1999 julian

Catch a case spotted by Tor where files mmapped could leave garbage in the
unallocated parts of the last page when the file ended on a frag
but not a page boundary.
Delimited by tags PRE_MATT_MMAP_EOF and POST_MATT_MMAP_EOF,
in files alpha/alpha/pmap.c i386/i386/pmap.c nfs/nfs_bio.c vm/pmap.h
vm/vm_page.c vm/vm_page.h vm/vnode_pager.c miscfs/specfs/spec_vnops.c
ufs/ufs/ufs_readwrite.c kern/vfs_bio.c

Submitted by: Matt Dillon <dillon@freebsd.org>
Reviewed by: Alan Cox <alc@freebsd.org>


45098 28-Mar-1999 dt

Back out half of 1.32: don't print a message on every failed mount attempt.
It is too chatty and hardly useful. 2 messages in somewhat usual cases are
left for now.


44693 12-Mar-1999 imp

Don't allow anyone except root to mount file systems that map uids.
This can have bad security implications, but the impact on FreeBSD
systems is minimal because this fs isn't in the default kernels and it
is unknown if it even works.

Submitted by: Manuel Bouyer <bouyer@antioche.eu.org> and
Artur Grabowski <art@stacken.kth.se>


44329 28-Feb-1999 peter

This code got moved as a result of confusion between union mounts and
unionfs. Julian has already revived the union mount part of this move
in vfs_syscalls.c rev 1.119, but forgot to take it out of here.


44247 25-Feb-1999 dillon

Reviewed by: Julian Elischer <julian@whistle.com>

Add d_parms() to {c,b}devsw[]. If non-NULL this function points to
a device routine that will properly fill in the specinfo structure.
vfs_subr.c's checkalias() supplies appropriate defaults. This change
should be fully backwards compatible with existing devices.


44146 19-Feb-1999 luoqi

Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This
is the preparation step for moving pmap storage out of vmspace proper.

Reviewed by: Alan Cox <alc@cs.rice.edu>
Matthew Dillon <dillon@apollo.backplane.com>


44142 19-Feb-1999 semenu

Added limited write ability. Now we can use some kind
of files for swap holders. See mount_ntfs.8 for details.


43748 07-Feb-1999 dillon

Remove MAP_ENTRY_IS_A_MAP 'share' maps. These maps were once used to
attempt to optimize forks but were essentially given-up on due to
problems and replaced with an explicit dup of the vm_map_entry structure.
Prior to the removal, they were entirely unused.


43634 05-Feb-1999 jdp

Correct a format mismatch on 64-bit architectures. This should
fix the erroneous values in the procfs "map" file on the Alpha.


43552 03-Feb-1999 semenu

First version.
Reviewed by: David O'Brien <obrien@NUXI.com>


43461 31-Jan-1999 bde

Don't comment out dead code; remove it.


43427 30-Jan-1999 phk

Use suser() to determine super-user-ness.
Don't pretend we can mount RW.

Reviewed by: bde


43382 29-Jan-1999 bde

Removed a bogus cast to c_caddr_t. This is part of terminating
c_caddr_t with extreme prejudice. Here we want to convert from
`const char *' to `const char *'. Casting through c_caddr_t is
not the way to do this. The original cast to caddr_t was apparently
to break warnings about const mismatches in other versions of BSD
(in 4.4BSDLite2, the conversion is from `const char *path' to
plain caddr_t).


43311 28-Jan-1999 dillon

Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile


43309 27-Jan-1999 dillon

Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile.

This commit includes significant work to properly handle const arguments
for the DDB symbol routines.


43305 27-Jan-1999 dillon

Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile


43301 27-Jan-1999 dillon

Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile


43295 27-Jan-1999 dillon

Fix warnings preparing for -Wall -Wcast-qual

Also disable one usb module in LINT due to fatal compilation errors,
temporary.


42957 21-Jan-1999 dillon

This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
fixes, several VM optimizations, and some additional revamping of the
VM code. The specific bug fixes will be documented with additional
forced commits. This commit is somewhat rough in regards to code
cleanup issues.

Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>


42900 20-Jan-1999 eivind

Add 'options DEBUG_LOCKS', which stores extra information in struct
lock, and add some macros and function parameters to make sure that
the information gets to the point where it can be put in the lock
structure.

While I'm here, add DEBUG_VFS_LOCKS to LINT.


42770 17-Jan-1999 peter

Missed a stray LKM #ifdef


42768 17-Jan-1999 peter

Mountroot could conceivably make sense to a KLD though, in the preload
case. I'm not sure the autoconf code is up to it though...


42763 17-Jan-1999 peter

Clean up the KLD/LKM goop a bit.


42568 12-Jan-1999 eivind

Remove declarations for undefined functions and a couple of unused
enotsupp implementations.


42374 07-Jan-1999 bde

Don't pass unused timestamp args to UFS_UPDATE() or waste
time initializing them. This almost finishes centralizing (in-core)
timestamp updates in ufs_itimes().


42315 05-Jan-1999 eivind

Remove the 'waslocked' parameter to vfs_object_create().


42301 05-Jan-1999 peter

A partial implementation of the procfs cmdline pseudo-file. This
is enough to satisfy things like StarOffice. This is a hack, but doing
it properly would be a LOT of work, and would require extensive grovelling
around in the user address space to find the argv[].

Obtained from: Mostly from Andrzej Bialecki <abial@nask.pl>.


42252 02-Jan-1999 dt

Now empty DOS filesystems default to long file names. Non-empty filesystems
without traces of Win95 default to short file names, as before.


42249 02-Jan-1999 dt

Ensure that deHighClust in direntry is always initialized.

Noticed by: Carl Mascott <cmascott@world.std.com>

Don't write access time of a file more than once per day. (Its precision is
1 day anyway). Don't try to write access and creation time in nonwin95 case.

Suggested by: bde (long time ago).
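
The once-per-day rule for access times can be illustrated as follows. This
is a minimal sketch, assuming the stored access date is kept as a day
number and ignoring time zone handling; the function name and its
arguments are hypothetical, not taken from the msdosfs code.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def should_write_access_date(stored_day: int, now: int) -> bool:
    """Return True only when `now` falls on a different day than the stored one.

    `stored_day` is the day number already recorded in the directory entry;
    `now` is the current time in seconds. Since the on-disk precision is one
    day, a second write on the same day would store the same value again.
    """
    return now // SECONDS_PER_DAY != stored_day
```

A caller would skip marking the directory entry dirty whenever this
returns False, which is what saves the repeated writes.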


42248 02-Jan-1999 bde

Ifdefed conditionally used simplock variables.


42227 01-Jan-1999 bde

Made this compile if UMAPFS_DIAGNOSTIC is defined. This has been broken
since before rev.1.1, so UMAPFS_DIAGNOSTIC should not be trusted.
UMAPFS_DIAGNOSTIC is commented out in LINT to hide various bugs.


41836 16-Dec-1998 eivind

Fix possible NULL-pointer deref in error case (same as DEVFS).


41761 14-Dec-1998 dillon

Cleanup uninitialized-possibly-used (but really not) warnings


41591 07-Dec-1998 archie

The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static
and local variables, goto labels, and functions declared but not defined.


41570 07-Dec-1998 eivind

'\0' is the most ugly NULL pointer constant I've ever seen in real code.


41560 06-Dec-1998 jkh

MFC: loosen compare even though bde doesn't like it.


41514 04-Dec-1998 archie

Examine all occurrences of sprintf(), strcat(), and str[n]cpy()
for possible buffer overflow problems. Replaced most sprintf()'s
with snprintf(); for others cases, added terminating NUL bytes where
appropriate, replaced constants like "16" with sizeof(), etc.

These changes include several bug fixes, but most changes are for
maintainability's sake. Any instance where it wasn't "immediately
obvious" that a buffer overflow could not occur was made safer.

Reviewed by: Bruce Evans <bde@zeta.org.au>
Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by: Mike Spengler <mks@networkcs.com>


41504 04-Dec-1998 rvb

Don't print diagnostic anymore


41416 29-Nov-1998 dt

Honor MNT_NOATIME.

PR: 8383
Submitted by: Carl Mascott <cmascott@world.std.com>


41287 22-Nov-1998 bde

Return ENOTTY instead of EBADF for ioctls on dead vnodes. This fixes
tcsetpgrp() on controlling terminals that are no longer associated
with the session of the calling process, not to mention ioctl.2.


41275 21-Nov-1998 dt

Support NT VFAT lower case flags.

PR: 8383
(Mostly) Submitted by: Carl Mascott <cmascott@world.std.com>


41202 16-Nov-1998 rvb

A few bug fixes for Robert Watson


41173 15-Nov-1998 bde

Finished updating module event handlers to be compatible with
modeventhand_t.


41095 11-Nov-1998 rvb

coda_lookup now passes up an extra flag. But old veni will
be ok; new veni will check /dev/cfs0 to make sure that a new
kernel is running.
Also, a bug in vc_nb_close iff CODA_SIGNAL's were seen has been
fixed.


41059 10-Nov-1998 peter

add #include <sys/kernel.h> where it's needed by MALLOC_DEFINE()


41031 09-Nov-1998 peter

"fix" a warning that has been bugging me for ages. Eliminate a couple
of temporary variables since they are only used once and their types
were the cause of the warnings.


40857 03-Nov-1998 peter

Support KLD. We register and unregister two modules. "coda" (the vfs)
via VFS_SET(), and "codadev" for the cdevsw entry. From kldstat -v:
3 1 0xf02c5000 115d8 coda.ko
Contains modules:
Id Name
2 codadev
3 coda


40852 03-Nov-1998 peter

Change the #ifdef UNION code into a callable hook. Arrange to have this
set up when unionfs is present, either statically or as a kld module.


40790 31-Oct-1998 peter

Use TAILQ macros for clean/dirty block list processing. Set b_xflags
rather than abusing the list next pointer with a magic number.


40717 29-Oct-1998 peter

Use vtruncbuf() rather than vinvalbuf() when shortening files.


40708 28-Oct-1998 rvb

Change the way unmounting happens to guarantee that the
client programs are allowed to finish up (coda_call is
forced to complete) and release their locks. Thus there
is a reasonable chance that the vflush implicit in the
unmount will not get hung on held locks.


40706 28-Oct-1998 rvb

Venus must be passed O_CREAT flag on VOP_OPEN iff this is
a creat so that we will allow a mode 444 file to be
written into. Sync with the latest coda.h and deal with
collateral damage.


40700 28-Oct-1998 dg

Added a second argument, "activate" to the vm_page_unwire() call so that
the caller can select either inactive or active queue to put the page on.


40660 26-Oct-1998 bde

Removed redundant bitrotted checks for major numbers instead of updating
them.


40651 25-Oct-1998 bde

Don't follow null bdevsw pointers. The `major(dev) < nblkdev' test rotted
when bdevsw[] became sparse. We still depend on magic to avoid having to
check that (v_rdev) device numbers in vnodes are not NODEV.


40648 25-Oct-1998 phk

Nitpicking and dusting performed on a train. Removes trivial warnings
about unused variables, labels and other lint.


39778 29-Sep-1998 rvb

Fixes for lkm:
1. use VFS_LKM vs ACTUALLY_LKM_NOT_KERNEL
2. don't pass -DCODA to lkm build


39728 28-Sep-1998 rvb

Cleanup and fix THE bug


39651 25-Sep-1998 rvb

Don't lose this file


39650 25-Sep-1998 rvb

Put "stray" printouts under DIAGNOSTIC. Make everything build
with DEBUG on. Add support for lkm. (The macros don't work
for me; for a good chuckle look at the end of coda_fbsd.c.)


39187 14-Sep-1998 sos

Remove the SLICE code.
This clearly needs a lot more thought, and we don't need this to hunt
us down in 3.0-RELEASE.


39129 13-Sep-1998 dt

Remove unused variable.

Pointed out by: bde


39128 13-Sep-1998 dt

Fix a bug related to renaming in root directory. This bug reported by
Cejka Rudolf <cejkar@dcse.fee.vutbr.cz> on freebsd-current in Message-Id
<199807141023.MAA09803@kazi.dcse.fee.vutbr.cz>.

Reviewed by: bde


39126 13-Sep-1998 rvb

Finish conversion of cfs -> coda


39111 12-Sep-1998 phk

various nits that didn't make it through the brucefilter.


39085 11-Sep-1998 rvb

All the references to cfs, in symbols, structs, and strings
have been changed to coda. (Same for CFS.)


38909 07-Sep-1998 bde

Removed statically configured mount type numbers (MOUNT_*) and all
references to them.

The change a couple of days ago to ignore these numbers in statically
configured vfsconf structs was slightly premature because the cd9660,
cfs, devfs, ext2fs, nfs vfs's still used MOUNT_* instead of the number
in their vfsconf struct.


38903 07-Sep-1998 guido

Fix problem reported on bugtraq: check permission of device mounted
for non-root users. Fortunately, the default for vfs.usermount is 0.
Tested by: "Jan B. Koum" <jkb@best.com>


38884 06-Sep-1998 rvb

Clean LINT


38862 05-Sep-1998 phk

Add a new vnode op, VOP_FREEBLKS(), which filesystems can use to inform
device drivers about sectors no longer in use.

Device-drivers receive the call through d_strategy, if they have
D_CANFREE in d_flags.

This allows flash based devices to erase the sectors and avoid
pointlessly carrying them around in compactions.

Reviewed by: Kirk McKusick, bde
Sponsored by: M-Systems (www.m-sys.com)


38799 04-Sep-1998 dfr

Cosmetic changes to the PAGE_XXX macros to make them consistent with
the other objects in vm.


38759 02-Sep-1998 rvb

Pass2 complete


38625 29-Aug-1998 rvb

Very Preliminary Coda


38545 25-Aug-1998 phk

sort the prototypes


38529 24-Aug-1998 phk

Last commit managed to get mangled somehow.


38525 24-Aug-1998 phk

Remove the last remaining evidence of B_TAPE.
Reclaim 3 unused bits in b_flags


38489 23-Aug-1998 bde

Enabled Lite2 fix for reading from dead ttys.


38408 17-Aug-1998 bde

Removed unused includes.


38354 16-Aug-1998 bde

Use [u]intptr_t instead of [u_]long for casts between pointers and
integers. Don't forget to cast to (void *) as well.


37977 30-Jul-1998 bde

Fixed printf format errors.


37898 27-Jul-1998 alex

Style fixes and a bug fix: don't remove the exit handler if unmount
fails.

Submitted by: bde


37877 27-Jul-1998 alex

A better solution to the rm_at_exit problem: Register the exit function
during first mount. Unregister the exit function at last unmount.

Concept by: sef
Reviewed by: sef
Implemented by: alex
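
The register-on-first-mount, unregister-on-last-unmount scheme amounts to
reference counting. The sketch below is an illustration only: the class
and attribute names are hypothetical, and the boolean flag stands in for
the kernel's at_exit handler registration. It also reflects the later fix
(r37898) that a failed unmount must not remove the handler.

```python
class ExitHookManager:
    """Track mounts and keep an exit hook registered while any mount exists."""

    def __init__(self) -> None:
        self.mounts = 0
        self.registered = False

    def mount(self) -> None:
        # First mount: register the exit function.
        if self.mounts == 0:
            self.registered = True
        self.mounts += 1

    def unmount(self, succeeded: bool) -> None:
        # A failed unmount leaves the filesystem mounted, so the handler
        # and the mount count must stay as they are.
        if not succeeded:
            return
        self.mounts -= 1
        # Last unmount: unregister the exit function.
        if self.mounts == 0:
            self.registered = False
```

Tying registration to the mount count means the handler is never left
installed after the last instance of the filesystem goes away.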


37864 25-Jul-1998 alex

Override the default VFS LKM dispatch functions so that a module
unload function can be provided (this is necessary to unregister
the at_exit handler).


37653 15-Jul-1998 bde

Cast pointers to [u]intptr_t instead of to [unsigned] long.


37649 15-Jul-1998 bde

Cast pointers to uintptr_t/intptr_t instead of to u_long/long,
respectively. Most of the longs should probably have been
u_longs, but this change is just to prevent warnings about
casts between pointers and integers of different sizes, not
to fix poorly chosen types.


37555 11-Jul-1998 bde

Fixed printf format errors.


37465 07-Jul-1998 bde

Quick fix for type mismatches which were fatal if longs aren't 32
bits. We used a private, wrong, version of `struct dirent' to help
break getdirentries(), and we use a silly check that the size of this
struct is a power of 2 to help break mount() if getdirentries() would
not work. This fix just changes the struct to match `struct dirent'
(except for the name length).


37389 04-Jul-1998 julian

There is no such thing any more as "struct bdevsw".

There is only cdevsw (which should be renamed in a later edit to deventry
or something). cdevsw contains the union of what were in both bdevsw and
cdevsw entries. The bdevsw[] table still exists and is a second pointer
to the cdevsw entry of the device. Its major is in d_bmaj rather than
d_maj. some cleanup still to happen (e.g. dsopen now gets two pointers
to the same cdevsw struct instead of one to a bdevsw and one to a cdevsw).

rawread()/rawwrite() went away as part of this though it's not strictly
the same patch, just that it involves all the same lines in the drivers.

cdroms no longer have write() entries (they did have rawwrite (?)).
tapes no longer have support for bdev operations.

Reviewed by: Eivind Eklund and Mike Smith
Changes suggested by eivind.


37384 04-Jul-1998 julian

VOP_STRATEGY grows an (struct vnode *) argument
as the value in b_vp is often not really what you want.
(and needs to be frobbed). more cleanups will follow this.
Reviewed by: Bruce Evans <bde@freebsd.org>


37154 25-Jun-1998 dt

Remove "not hungly" panics. Cookies are now used by the linux and ibcs2
emulators. The emulators assume that a filesystem may just ignore cookies, and
handle this case correctly. So we just ignore cookies.

Also sync *_readdir "prototypes" with reality.


36969 14-Jun-1998 bde

Avoid a 64-bit division in procfs_readdir(). Fixed related overflows.
Check args using the same expression as in fdesc and kernfs. The check
was actually already correct, modulo overflow. It could be tightened
up to either allow huge (aligned) offsets, treating them as EOF, or
disallow all offsets beyond EOF.

Didn't fix invalid address calculation &foo[i] where i may be out of
bounds.

Didn't fix shooting of foot using a private unportable dirent struct.


36963 14-Jun-1998 bde

Avoid a 64-bit division in fdesc_readdir(). Fixed related overflows
and missing arg checking.

Panic instead of returning bogus error codes or forgetting to check
all cases if fdesc_readdir() gets called for a non-directory. This
can't happen.


36873 10-Jun-1998 dfr

Make these files compile.


36864 10-Jun-1998 alex

ENOPNOTSUPP --> EOPNOTSUPP

PR: 6906
Submitted by: Steven G. Kargl <kargl@troutmask.apl.washington.edu>


36858 10-Jun-1998 dt

Back out previous change. This behavior is at least completely
"susv2"-compliant.


36851 10-Jun-1998 dt

Also return EOPNOTSUPP rather than EINVAL for not supported owner and group
changes.


36840 10-Jun-1998 peter

Don't silently accept attempts to change flags where they are not
supported.


36839 10-Jun-1998 peter

Return EOPNOTSUPP rather than EINVAL for flags that are not supported.


36811 09-Jun-1998 dt

Fix typo in a comment.


36735 07-Jun-1998 dfr

This commit fixes various 64bit portability problems required for
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.


36275 21-May-1998 dyson

Make flushing dirty pages work correctly on filesystems that
unexpectedly do not complete writes even with sync I/O requests.
This should help the behavior of mmaped files when using
softupdates (and perhaps in other circumstances also.)


36168 19-May-1998 tegge

Disallow reading the current kernel stack. Only the user structure and
the current registers should be accessible.
Reviewed by: David Greenman <dg@root.com>


36154 18-May-1998 dt

Fix priority bug in previous commit.

Submitted by: bde


36133 17-May-1998 dt

Fix support for pre-Win95 filesystems: Make it possible to lookup just
created short file name. Don't insert "generation numbers".


36130 17-May-1998 dt

Remove bogus LK_RETRY.

Submitted by: bde


36123 17-May-1998 bde

Don't forget to clean up after an error reading the directory entry
in deget().


36122 17-May-1998 bde

Removed vestiges of pre-Lite2 locking.


36119 17-May-1998 phk

s/nanoruntime/nanouptime/g
s/microruntime/microuptime/g

Reviewed by: bde


36117 17-May-1998 sos

Cleanup after Garret, include unpch.h to get at various macros..


35871 09-May-1998 dt

Fix off-by-one error in previous commit.

This caused the following commands:
mkdir z
cd z
touch A B
mv B A
to corrupt the '..' entry in 'z'.

Reported by: bde


35823 07-May-1998 msmith

In the words of the submitter:

---------
Make callers of namei() responsible for releasing references or locks
instead of having the underlying filesystems do it. This eliminates
redundancy in all terminal filesystems and makes it possible for stacked
transport layers such as umapfs or nullfs to operate correctly.

Quality testing was done with testvn, and lat_fs from the lmbench suite.

Some NFS client testing courtesy of Patrik Kudo.

vop_mknod and vop_symlink still release the returned vpp. vop_rename
still releases 4 vnode arguments before it returns. These remaining cases
will be corrected in the next set of patches.
---------

Submitted by: Michael Hancock <michaelh@cet.co.jp>


35769 06-May-1998 msmith

As described by the submitter:

Reverse the VFS_VRELE patch. Reference counting of vnodes does not need
to be done per-fs. I noticed this while fixing vfs layering violations.
Doing reference counting in generic code is also the preference cited by
John Heidemann in recent discussions with him.

The implementation of alternative vnode management per-fs is still a valid
requirement for some filesystems but will be revisited sometime later,
most likely using a different framework.

Submitted by: Michael Hancock <michaelh@cet.co.jp>


35511 29-Apr-1998 dt

Use DFLTBSIZE instead of MAXBSIZE for pm_fatblksize.

In msdosfs_sync: spelling fix, formatting changes; fix MNT_LAZY (sync
modified denodes, don't sync device)

Mostly submitted by (and with hints from): bde

Increase limit for maximum disk size: as far as I can see previous limit was
gratuitously too low.


35497 29-Apr-1998 dyson

Tighten up management of memory and swap space during map allocation,
deallocation cycles. This should provide a measurable improvement
on swap and memory allocation on loaded systems. It is unlikely a
complete solution. Also, provide more map info with procfs.
Chuck Cranor spurred on this improvement.


35360 20-Apr-1998 julian

The 'mountroot' option is obviously pointless for an LKM
so allow LKM compilation to succeed by making it go away for that case.
Saves needing to include opt_devfs.h which an LKM cannot rely on anyhow.


35323 20-Apr-1998 julian

Make the devfs SLICE option a standard type option.
(hopefully it will go away eventually anyhow)


35319 19-Apr-1998 julian

Add changes and code to implement a functional DEVFS.
This code will be turned on with the TWO options
DEVFS and SLICE. (see LINT)
Two labels PRE_DEVFS_SLICE and POST_DEVFS_SLICE will delineate these changes.

/dev will be automatically mounted by init (thanks phk)
on bootup. See /sys/dev/slice/slice.4 for more info.
All code should act the same without these options enabled.

Mike Smith, Poul Henning Kamp, Soeren, and a few dozen others

This code does not support the following:
bad144 handling.
Persistence. (My head is still hurting from the last time we discussed this)
ATAPI floppies are not handled by the SLICE code yet.

When this code is running, all major numbers are arbitrary and COULD
be dynamically assigned. (this is not done, for POLA only)
Minor numbers for disk slices ARE arbitrary and dynamically assigned.


35256 17-Apr-1998 des

Seventy-odd "its" / "it's" typos in comments fixed as per kern/6108.


35210 15-Apr-1998 bde

Support compiling with `gcc -ansi'.


35202 15-Apr-1998 dt

Add a missing LK_RETRY.
Noticed by: Bruce (almost 2 months ago)

Remove a debugging printf.


35063 06-Apr-1998 phk

Use random() rather than homegrown stuff.


35046 05-Apr-1998 ache

Print explanation diagnostics when mount is impossible
Submitted by: Dmitrij Tejblum <dima@tejblum.dnttm.rssi.ru>


35029 04-Apr-1998 phk

Time changes mark 2:

* Figure out UTC relative to boottime. Four new functions provide
time relative to boottime.

* move "runtime" into struct proc. This helps fix the calcru()
problem in SMP.

* kill mono_time.

* add timespec{add|sub|cmp} macros to time.h. (XXX: These may change!)

* nanosleep, select & poll takes long sleeps one day at a time

Reviewed by: bde
Tested by: ache and others
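
The arithmetic behind timespec add/sub/cmp helpers is carry and borrow on
the nanosecond field. The sketch below shows that arithmetic on
(seconds, nanoseconds) tuples; it is not the macros' actual interface
(which the log itself marks as subject to change), and the function names
are hypothetical.

```python
NSEC_PER_SEC = 1_000_000_000


def timespec_add(a, b):
    """Add two normalized (sec, nsec) pairs, carrying into seconds."""
    sec = a[0] + b[0]
    nsec = a[1] + b[1]
    if nsec >= NSEC_PER_SEC:
        sec += 1
        nsec -= NSEC_PER_SEC
    return (sec, nsec)


def timespec_sub(a, b):
    """Subtract b from a, borrowing from seconds when nsec goes negative."""
    sec = a[0] - b[0]
    nsec = a[1] - b[1]
    if nsec < 0:
        sec -= 1
        nsec += NSEC_PER_SEC
    return (sec, nsec)


def timespec_cmp(a, b):
    """Return -1, 0 or 1; valid for normalized pairs (0 <= nsec < 1e9)."""
    return (a > b) - (a < b)
```

Both inputs are assumed to be normalized, so a single carry or borrow is
enough and the results stay normalized as well.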


34920 28-Mar-1998 ache

Fix dead hang writing to FAT
Submitted by: Dmitrij Tejblum <dima@tejblum.dnttm.rssi.ru>


34901 26-Mar-1998 phk

Add two new functions, get{micro|nano}time.

They are atomic, but return in essence what is in the "time" variable.
gettime() is now a macro front for getmicrotime().

Various patches to use the two new functions instead of the various
hacks used in their absence.

Some punctuation and grammar patches from Bruce.

A couple of XXX comments.


34698 20-Mar-1998 kato

Deleted 1024bytes/sector floppy code for PC-98 arch. The
1024bytes/sector code has not worked for long time and it should be
re-implemented.


34642 17-Mar-1998 kato

If lowervp is NULLVP, vap was clobbered.

Submitted by: Naofumi Honda <honda@Kururu.math.sci.hokudai.ac.jp>
Obtained from: NetBSD/pc98


34266 08-Mar-1998 julian

Reviewed by: dyson@freebsd.org (John Dyson), dg@root.com (David Greenman)
Submitted by: Kirk McKusick (mcKusick@mckusick.com)
Obtained from: Whistle development tree


34249 08-Mar-1998 dyson

Initialize b_resid, and also print out better diagnostics on I/O
errors. This will allow for better tracking of user error reports.


34206 07-Mar-1998 dyson

This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code. These
problems have manifested themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances. Most of
the problems have involved mmap consistency, and as a side-effect broke
the vfs.ioopt code. This code might have been committed separately, but
almost everything is interrelated.

1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
are fully valid.
2) Rather than deactivating erroneously read initial (header) pages in
kern_exec, we now free them.
3) Fix the rundown of non-VMIO buffers that are in an inconsistent
(missing vp) state.
4) Fix the disassociation of pages from buffers in brelse. The previous
code had rotted and was faulty in a couple of important circumstances.
5) Remove a gratuitous buffer wakeup in vfs_vmio_release.
6) Remove a crufty and currently unused cluster mechanism for VBLK
files in vfs_bio_awrite. When the code is functional, I'll add back
a cleaner version.
7) The page busy count wakeups associated with the buffer cache usage were
incorrectly cleaned up in a previous commit by me. Revert to the
original, correct version, but with a cleaner implementation.
8) The cluster read code now tries to keep data associated with buffers
more aggressively (without breaking the heuristics) when it is presumed
that the read data (buffers) will be soon needed.
9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The
delay loop waiting is not useful for filesystem locks, due to the
length of the time intervals.
10) Correct and clean-up spec_getpages.
11) Implement a fully functional nfs_getpages, nfs_putpages.
12) Fix nfs_write so that modifications are coherent with the NFS data on
the server disk (at least as well as NFS seems to allow.)
13) Properly support MS_INVALIDATE on NFS.
14) Properly pass down MS_INVALIDATE to lower levels of the VM code from
vm_map_clean.
15) Better support the notion of pages being busy but valid, so that
fewer in-transit waits occur. (use p->busy more for pageouts instead
of PG_BUSY.) Since the page is fully valid, it is still usable for
reads.
16) It is possible (in error) for cached pages to be busy. Make the
page allocation code handle that case correctly. (It should probably
be a printf or panic, but I want the system to handle coding errors
robustly. I'll probably add a printf.)
17) Correct the design and usage of vm_page_sleep. It didn't handle
consistency problems very well, so make the design a little less
lofty. After vm_page_sleep, if it ever blocked, it is still important
to relookup the page (if the object generation count changed), and
verify its status (always.)
18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20) Fix vm_pager_put_pages and its descendants to support an int flag
instead of a boolean, so that we can pass down the invalidate bit.


34096 06-Mar-1998 msmith

Trivial filesystem getpages/putpages implementations, set the second.
These should be considered the first steps in a work-in-progress.
Submitted by: Terry Lambert <terry@freebsd.org>


34023 04-Mar-1998 dyson

Fix certain kinds of block device operations. For example, tunefs on
a block device shouldn't crash the system anymore.


34002 03-Mar-1998 msmith

Patch to the last commit; attempt to unspam stuff from NetBSD.
Submitted by: Dmitrij Tejblum <dima@tejblum.dnttm.rssi.ru>


33964 01-Mar-1998 msmith

The intent is to get rid of WILLRELE in vnode_if.src by making
a complement to all ops that return a vpp, VFS_VRELE. This is
initially only for file systems that implement the following ops
that do a WILLRELE:

vop_create, vop_whiteout, vop_mknod, vop_remove, vop_link,
vop_rename, vop_mkdir, vop_rmdir, vop_symlink

This is initial DNA that doesn't do anything yet. VFS_VRELE is
implemented but not called.

A default vfs_vrele was created for fs implementations that use the
standard vnode management routines.

VFS_VRELE implementations were made for the following file systems:

Standard (vfs_vrele)
ffs mfs nfs msdosfs devfs ext2fs

Custom
union umapfs

Just EOPNOTSUPP
fdesc procfs kernfs portal cd9660

These implementations may change as VOP changes are implemented.

In the next phase, in the vop implementations calls to vrele and the vrele
part of vput will be moved to the top layer vfs_vnops and made visible
to all layers. vput will be replaced by unlock in these cases. Unlocking
will still be done in the per fs layer but the refcount decrement will be
triggered at the top because it doesn't hurt to hold a vnode reference a
little longer. This will have minimal impact on the structure of the
existing code.

This will only be done for vnode arguments that are released by the various
fs vop implementations.

Wider use of VFS_VRELE will likely require restructuring of the code.

Reviewed by: phk, dyson, terry et al.
Submitted by: Michael Hancock <michaelh@cet.co.jp>


33959 01-Mar-1998 msmith

Fix mmap() on msdosfs. In the words of the submitter:

|In the process of evaluating the getpages/putpages issues I discovered
|that mmap on MSDOSFS does not work. This is because I blindly merged
|NetBSD changes in msdosfs_bmap and msdosfs_strategy. Apparently, their
|blocksize is always DEV_BSIZE (even in files), while in FreeBSD
|blocksize in files is v_mount->mnt_stat.f_iosize (i.e. clustersize in
|MSDOSFS case). The patch is below.

Submitted by: Dmitrij Tejblum <dima@tejblum.dnttm.rssi.ru>


33872 27-Feb-1998 msmith

Fix a problem with the conversion of Unix filenames into the VFAT
namespace.
Submitted by: Dmitrij Tejblum <dima@tejblum.dnttm.rssi.ru>


33848 26-Feb-1998 msmith

Fixes for some bugs in the VFAT/FAT32 support:

- 'mv longnamedfile1 longnamedfile2' would cause longnamedfile2 to lose its
long name.
- Long names have trailing spaces/dots stripped for lookup as well as
assignment.
- A lockup when the msdosfs was accessed from within the Linux emulator is fixed.
- A bug whereby long filenames were recognised by Microsoft operating systems but
not FreeBSD is fixed.

Submitted by: Dmitrij Tejblum <dima@tejblum.dnttm.rssi.ru>
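
The second item, stripping trailing spaces and dots for lookup as well as
assignment, means both sides of a comparison are canonicalized the same
way. A minimal sketch, with hypothetical function names and without any of
the other VFAT matching rules:

```python
def canonical_long_name(name: str) -> str:
    """Drop trailing spaces and dots from a long file name.

    Caveat: special entries such as "." and ".." would need to be handled
    before calling this, since they consist only of dots.
    """
    return name.rstrip(" .")


def same_long_name(stored: str, wanted: str) -> bool:
    """Compare a stored name with a looked-up name after canonicalizing both."""
    return canonical_long_name(stored) == canonical_long_name(wanted)
```

Applying the same function on both paths is what keeps a name created as
"notes.txt " findable later as "notes.txt".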


33844 26-Feb-1998 kato

Deleted KLOCK-hack.


33791 24-Feb-1998 ache

Back out "always view in lowercase" part
Return to previous variant "comparing in lowercase" in winChkName


33768 23-Feb-1998 ache

Implement loadable DOS<->local conversion tables for DOS names
Always create DOS name in uppercase
Always view DOS name in lowercase


33765 23-Feb-1998 kato

Fix signatures of NEC's DOS formats.

Submitted by: Takahashi Yoshihiro <nyan@wyvern.cc.kogakuin.ac.jp>


33762 23-Feb-1998 ache

Oops, add missing bcopy of upper->lower table


33760 23-Feb-1998 ache

Implement loadable upper->lower local conversion table


33751 22-Feb-1998 ache

Reduce new arguments number added in my changes


33750 22-Feb-1998 ache

Add Unicode support to winChkName, now lookup works!


33747 22-Feb-1998 ache

Implement loadable local<->unicode file names conversion
Note: it produces correct names only for Win95; DOS names are still
incorrect and need similar work
mount_msdos support coming soon


33745 22-Feb-1998 ache

Replace all unknown Unicode characters with '?' in win->unix mapping


33744 22-Feb-1998 ache

Add initial support to map 0x4XX Unicode Cyrillic range names:
only win->unix part is implemented at this time with 256-byte
table defaulted to KOI8-R (will be loadable in future).
Since back mapping is not supported yet, you'll get "No such file or directory"
on each Cyrillic name with 'ls -l'; only 'echo *' works at this moment.
Teach current code to understand Unicode a bit.


33676 20-Feb-1998 bde

Removed unused #includes.


33548 18-Feb-1998 jkh

Update MSDOSFS code using NetBSD's msdosfs as a guide to support
FAT32 partitions. Unfortunately, we looked around here at
Walnut Creek CDROM for any newer FAT32-supporting versions
of Win95 and we were unsuccessful; only the older stuff here.
So this is untested beyond simply making sure it compiles and
someone with access to an actual FAT32 fs will have
to let us know how well it actually works.
Submitted by: Dmitrij Tejblum <dima@tejblum.dnttm.rssi.ru>
Obtained from: NetBSD


33215 10-Feb-1998 kato

Deleted unused variable.


33211 10-Feb-1998 kato

Undo UN_KLOCK hack except union_allocvp(). Now, vput() doesn't lock
the vnode.


33181 09-Feb-1998 eivind

Staticize.


33146 07-Feb-1998 kato

Fixed pagefault when cred == NOCRED.

PR: 5632


33145 07-Feb-1998 kato

Fixed number of entries in gid-mapfile.

PR: 5640


33134 06-Feb-1998 eivind

Back out DIAGNOSTIC changes.


33129 06-Feb-1998 kato

Workaround for DIAGNOSTIC kernel's panic in union_lookup().
Union_removed_upper() clobbers cache when file is removed.
Upper vp will be removed by union_reclaim().


33109 05-Feb-1998 dyson

1) Start using a cleaner and more consistent page allocator instead
of the various ad-hoc schemes.
2) When bringing in UPAGES, the pmap code needs to do another vm_page_lookup.
3) When appropriate, set the PG_A or PG_M bits a-priori to both avoid some
processor errata, and to minimize redundant processor updating of page
tables.
4) Modify pmap_protect so that it can only remove permissions (as it
originally supported.) The additional capability is not needed.
5) Streamline read-only to read-write page mappings.
6) For pmap_copy_page, don't enable write mapping for source page.
7) Correct and clean-up pmap_incore.
8) Cluster initial kern_exec pagein.
9) Removal of some minor lint from kern_malloc.
10) Correct some ioopt code.
11) Remove some dead code from the MI swapout routine.
12) Correct vm_object_deallocate (to remove backing_object ref.)
13) Fix dead object handling, that had problems under heavy memory load.
14) Add minor vm_page_lookup improvements.
15) Some pages are not in objects, and make sure that the vm_page.c can
properly support such pages.
16) Add some more page deficit handling.
17) Some minor code readability improvements.


33108 04-Feb-1998 eivind

Turn DIAGNOSTIC into a new-style option.


33054 03-Feb-1998 bde

Forward declare some structs so that this file is more self-sufficient.


33052 03-Feb-1998 bde

Forward declare some structs so that this file is more self-sufficient.

Don't declare kernel objects or functions unless KERNEL is defined.


33037 03-Feb-1998 kato

Declare the variable `i' when UMAP_DIAGNOSTIC is defined.


32929 31-Jan-1998 eivind

Make the debug options new-style.

This also zaps a DPT option from lint; it wasn't referenced from
anywhere.


32760 25-Jan-1998 kato

Fixed typo in comment.


32702 22-Jan-1998 dyson

VM level code cleanups.

1) Start using TSM.
Struct procs continue to point to upages structure, after being freed.
Struct vmspace continues to point to pte object and kva space for kstack.
u_map is now superfluous.
2) vm_map's don't need to be reference counted. They always exist either
in the kernel or in a vmspace. The vmspaces are managed by reference
counts.
3) Remove the "wired" vm_map nonsense.
4) No need to keep a cache of kernel stack kva's.
5) Get rid of strange looking ++var, and change to var++.
6) Change more data structures to use our "zone" allocator. Added
struct proc, struct vmspace and struct vnode. This saves a significant
amount of kva space and physical memory. Additionally, this enables
TSM for the zone managed memory.
7) Keep ioopt disabled for now.
8) Remove the now bogus "single use" map concept.
9) Use generation counts or id's for data structures residing in TSM, where
it allows us to avoid unneeded restart overhead during traversals, where
blocking might occur.
10) Account better for memory deficits, so the pageout daemon will be able
to make enough memory available (experimental.)
11) Fix some vnode locking problems. (From Tor, I think.)
12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
(experimental.)
13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c
code. Use generation counts, get rid of unneeded collapse operations,
and clean up the cluster code.
14) Make vm_zone more suitable for TSM.

This commit is partially as a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)

This is not the infamous, final cleanup of the vnode stuff, but a necessary
step. Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)


32689 22-Jan-1998 kato

Delete unused code in union_fsync().


32642 20-Jan-1998 kato

- Move SETKLOCK and CLEARKLOCK macros into union.h.
- Set UN_ULOCK in union_lock() when UN_KLOCK is set. Caller expects
that vnode is locked correctly, and may call another function which
expects locked vnode and may unlock the vnode.
- Do not assume the behavior of internal functions in FreeBSD's
vfs_subr.c is the same as in 4.4BSD-Lite2. Vnode may be locked in
vget() even though flag is zero. (Locked vnode is, of course,
unlocked before returning from vget.)


32599 18-Jan-1998 kato

Workaround for locking violation while recycling vnode which union fs
used in freelist.


32598 18-Jan-1998 kato

Improve and revise fixes for locking violation.

Obtained from: NetBSD/pc98


32286 06-Jan-1998 dyson

Make our v_usecount vnode reference count work identically to the
original BSD code. The association between the vnode and the vm_object
no longer includes reference counts. The major difference is that
vm_object's are no longer freed gratuitously from the vnode, and so
once an object is created for the vnode, it will last as long as the
vnode does.

When a vnode object reference count is incremented, then the underlying
vnode reference count is incremented also. The two "objects" are now
more intimately related, and so the interactions are now much less
complex.
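
The coupling between the two reference counts can be pictured with a small user-space model. This is only a sketch under stated assumptions: the rc_* names and structures are hypothetical stand-ins, not the kernel's vm_object or vnode definitions, and the matching decrement on release is an assumption made for symmetry rather than something stated in this log entry.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a vnode and its backing VM object. */
struct rc_vnode {
	int v_usecount;
};

struct rc_object {
	int ref_count;
	struct rc_vnode *vp;	/* NULL if the object is not vnode-backed */
};

/* Taking an object reference also takes a reference on the vnode. */
void
rc_object_reference(struct rc_object *obj)
{
	obj->ref_count++;
	if (obj->vp != NULL)
		obj->vp->v_usecount++;
}

/* Assumed counterpart: dropping an object reference drops the vnode's too. */
void
rc_object_deallocate(struct rc_object *obj)
{
	obj->ref_count--;
	if (obj->vp != NULL)
		obj->vp->v_usecount--;
}

/* Two references and one release on a vnode that starts with one user. */
int
rc_demo(void)
{
	struct rc_vnode vn = { 1 };
	struct rc_object obj = { 0, &vn };

	rc_object_reference(&obj);
	rc_object_reference(&obj);
	rc_object_deallocate(&obj);
	return vn.v_usecount;
}
```

In this model the vnode count always equals its own direct users plus the object's reference count, which is the property that makes the interaction simpler to reason about.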

Vnodes are now normally placed onto the free queue with an object still
attached. The rundown of the object happens at vnode rundown time, and
happens with exactly the same filesystem semantics of the original VFS
code. There is absolutely no need for vnode_pager_uncache and other
travesties like that anymore.

A side-effect of these changes is that SMP locking should be much simpler,
the I/O copyin/copyout optimizations work, NFS should be more ponderable,
and further work on layered filesystems should be less frustrating, because
of the totally coherent management of the vnode objects and vnodes.

Please be careful with your system while running this code, but I would
greatly appreciate feedback as soon as reasonably possible.


32285 06-Jan-1998 sef

Use CHECKIO in procfs_ioctl() to ensure that any changes in UID/GID result
in the expected failure.


32150 01-Jan-1998 bde

Fixed missing initialization of mp->mnt_stat. At least vm depends on
at least mp->mnt_stat.f_iosize being nonzero.

PR: 5212


32120 30-Dec-1997 bde

Fixed a missing/misplaced/misstyled prototype.


32071 29-Dec-1997 dyson

Lots of improvements, including restructuring the caching and management
of vnodes and objects. There are some metadata performance improvements
that come along with this. There are also a few prototypes added when
the need is noticed. Changes include:

1) Cleaning up vref, vget.
2) Removal of the object cache.
3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore.
4) Correct some missing LK_RETRY's in vn_lock.
5) Correct the page range in the code for msync.

Be gentle, and please give me feedback asap.


32011 27-Dec-1997 bde

Unspammed nested include of <vm/vm_zone.h>.


31929 21-Dec-1997 joerg

Properly clean out the SI_MOUNTEDON flag iff the mount attempt fails
half the way down. Otherwise, further attempts to mount the device
will be rejected with BUSY.

IMHO, this flag can completely go away for cd9660. There's no reason
you need to prevent CDs from being mounted multiple times, and in case
of multisession CDs it can even make sense to mount two different
sessions at the same time (to different mount points, otherwise it
would be pointless ;).


31891 20-Dec-1997 sef

Clear the p_stops field on change of user/group id, unless the correct
flag is set in the p_pfsflags field. This, essentially, prevents an SUID
program from hanging after being traced. (E.g., "truss /usr/bin/rlogin" would
fail, but leave rlogin in a stopevent state.) Yet another case where procctl
is (hopefully ;)) no longer needed in the general case.

Reviewed by: bde (thanks bruce :))


31860 19-Dec-1997 bde

Set the sender's low watermark to match the maximum size for atomic
writes that we advertise (PIPE_BUF = 512).
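
From userland, the advertised atomic-write limit can be queried with fpathconf(2). The sketch below is illustrative only: the function name pipe_atomic_write_limit() is hypothetical, and the value returned depends on the system it runs on (POSIX only guarantees that it is at least 512).

```c
#include <unistd.h>

/*
 * Return the atomic-write limit (PIPE_BUF) reported for a freshly
 * created pipe, or -1 if the pipe cannot be created or queried.
 */
long
pipe_atomic_write_limit(void)
{
	int fds[2];
	long limit;

	if (pipe(fds) == -1)
		return -1;
	limit = fpathconf(fds[1], _PC_PIPE_BUF);
	close(fds[0]);
	close(fds[1]);
	return limit;
}
```

A writer that keeps each write at or below this limit can rely on the data not being interleaved with writes from other processes.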


31727 15-Dec-1997 wollman

Add support for poll(2) on files. vop_nopoll() now returns POLLNVAL
if one of the new poll types is requested; hopefully this will not break
any existing code. (This is done so that programs have a dependable
way of determining whether a filesystem supports the extended poll types
or not.)

The new poll types added are:

POLLWRITE - file contents may have been modified
POLLNLINK - file was linked, unlinked, or renamed
POLLATTRIB - file's attributes may have been changed
POLLEXTEND - file was extended

Note that the internal operation of poll() means that it is impossible
for two processes to reliably poll for the same event (this could
be fixed but may not be worth it), so it is not possible to rewrite
`tail -f' to use poll at this time.


31701 13-Dec-1997 bde

Fixed EOF handling.

1. SS_CANTRCVMORE was initially set on the wrong socket, so reads
when there has never been a writer on the socket did not return 0.
Note that such reads are only possible if the fifo was opened in
(O_RDONLY | O_NONBLOCK) mode.

2. SS_CANTSENDMORE was initially set on the wrong socket, but this
was harmless because the wrong socket is never sent from and there
is no need to set the flag initially on the right socket (since open
in (O_WRONLY | O_NONBLOCK) mode fails if there is no reader...).

3. SS_CANTRCVMORE was cleared when read() returns. This broke the
case where read() returns 0 - subsequent reads are supposed to
return 0 until a writer appears. There is no need to clear the
flag when read() returns, since it is cleared correctly when a
writer appears.
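
The behaviour described in points 1 and 3 can be checked from userland. The following is a rough sketch, not a regression test from the tree: the function name and the /tmp path are made up for the example. It opens a FIFO in (O_RDONLY | O_NONBLOCK) mode with no writer and reads twice, expecting 0 (EOF) both times.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a FIFO, open it read-only and non-blocking while no writer
 * exists, and read from it twice.  Returns 0 if both reads report EOF,
 * -1 otherwise (including setup failures).
 */
int
fifo_eof_without_writer(void)
{
	char path[64];
	char buf[16];
	ssize_t first, second;
	int fd, ok = -1;

	/* Hypothetical scratch location; any writable directory would do. */
	snprintf(path, sizeof(path), "/tmp/fifo-eof-%ld", (long)getpid());
	(void)unlink(path);
	if (mkfifo(path, 0600) == -1)
		return -1;

	fd = open(path, O_RDONLY | O_NONBLOCK);
	if (fd != -1) {
		first = read(fd, buf, sizeof(buf));
		second = read(fd, buf, sizeof(buf));	/* must still be EOF */
		if (first == 0 && second == 0)
			ok = 0;
		close(fd);
	}
	unlink(path);
	return ok;
}
```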


31700 13-Dec-1997 bde

Restored fifo_pathconf() from rev.1.32. vop_stdpathconf() is too
general to be of much use. Using it here weakened the _PC_MAX_CANON,
_PC_MAX_INPUT and _PC_VDISABLE cases.

fifo_pathconf() is not quite correct either. _PC_CHOWN_RESTRICTED
and _PC_LINK_MAX should be handled by the host file system. For
directories, the host file system should let us handle _PC_PIPE_BUF.


31691 13-Dec-1997 sef

Change the ioctls for procfs around a bit; in particular, wherever possible,
change from

ioctl(fd, PIOC<foo>, &i);

to

ioctl(fd, PIOC<foo>, i);

This is going from the _IOW to _IO ioctl macro. The kernel, procctl, and
truss must be in synch for it all to work (not doing so will get errors about
inappropriate ioctl's, fortunately). Hopefully I didn't forget anything :).
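
The reason everything must be rebuilt together is that the request number itself changes when a command moves from _IOW to _IO. A minimal sketch follows; the command names, group letter and number are hypothetical and are not the real PIOC* definitions.

```c
#include <sys/ioctl.h>

/* Hypothetical commands; 'x' and 1 are made up for illustration. */
#define EX_SETFLAGS_BYREF	_IOW('x', 1, int)	/* ioctl(fd, cmd, &i) */
#define EX_SETFLAGS_BYVAL	_IO('x', 1)		/* ioctl(fd, cmd, i)  */

/*
 * The two macros encode direction and argument size differently, so
 * the resulting request numbers are not interchangeable.
 */
int
ex_encodings_differ(void)
{
	return EX_SETFLAGS_BYREF != EX_SETFLAGS_BYVAL;
}
```

A program built against one definition sends a request number that a kernel built against the other does not recognise, which is where the "inappropriate ioctl" errors come from.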


31674 12-Dec-1997 sef

Fix a problem with procfs_exit() that resulted in missing some procfs
nodes; this also apparently caused a panic in some circumstances.
Also, since procfs_exit() is getting rid of the nodes when a process
exits, don't bother checking for the process' existence in procfs_inactive().


31640 09-Dec-1997 sef

Code to prevent a panic caused by procfs_exit(). Note that I don't know
what the root cause is -- but, sometimes, a procfs vnode in pfshead is
apparently corrupt (or a UFS vnode instead). Without this patch, I can
get it to panic by doing (in csh)

while (1)
ps auxwww
end

and it will panic when the PID's wrap. With it, it does not panic.
Yes -- I know that this is NOT the right way to fix it. But I haven't
been able to get it to panic yet (which confuses me). I am going to
be looking into the vgone() code now, as that may be a part of it.


31636 08-Dec-1997 sef

A couple of fixes from bruce: first of all, psignal is a void (stupid
me; unfortunately, also makes it hard to check for errors); second, I had
managed to forget a change to PIOCSFL (it should be _IOW, not _IOR) I had
in my local copy, and Bruce called me on it.

Submitted by: bde


31618 08-Dec-1997 sef

Use at_exit() to invoke procfs_exit() instead of calling it directly.
Note that an unload facility should be used to call rm_at_exit() (if
procfs is being loaded as an LKM and is subsequently removed), but it
was non-obvious how to do this in the VFS framework.

Reviewed by: Julian Elischer


31595 07-Dec-1997 sef

Clear the stop events and wake up the process on the last close of the
procfs/mem file. While this doesn't prevent an unkillable process, it
means that a broken truss program won't do it accidentally now (well,
there's a small window of opportunity). Note that this requires the
change to truss I am about to commit.


31564 06-Dec-1997 sef

Changes to allow event-based process monitoring and control.


31561 05-Dec-1997 bde

Don't include <sys/lock.h> in headers when only `struct simplelock' is
required. Fixed everything that depended on the pollution.


31273 18-Nov-1997 phk

Staticize.


31271 18-Nov-1997 phk

Staticize a few things.


31174 14-Nov-1997 tegge

Don't try to obtain an exclusive lock on the vm map, since a deadlock might
occur if the process owning the map is wiring pages.


31132 12-Nov-1997 julian

Reviewed by: various.

Ever since I first saw the way the mount flags were used I've hated the
fact that modes, and events, internal and exported, and short-term
and long term flags are all thrown together. Finally it's annoyed me enough..
This patch to the entire FreeBSD tree adds a second mount flag word
to the mount struct. It is not exported to userspace. I have moved
some of the non-exported flags over to this word. This means that we now
have 8 free bits in the mount flags. There are another two that might
well move over, but which I'm not sure about.
The only user visible change would have been in pstat -v, except
that davidg has disabled it anyhow.
I'd still like to move the state flags and the 'command' flags
apart from each other.. e.g. MNT_FORCE really doesn't have the
same semantics as MNT_RDONLY, but that's left for another day.


31016 07-Nov-1997 phk

Remove a bunch of variables which were unused both in GENERIC and LINT.

Found by: -Wunused


30994 06-Nov-1997 phk

Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warnings, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need
to be recompiled.


30785 27-Oct-1997 bde

KNFize rev.1.31.


30784 27-Oct-1997 bde

Use unique sleep message strings.


30782 27-Oct-1997 bde

Use bread() instead of cluster_read() for reading the last block
in a file. There was a (harmless, I think) off-by-1 error. This
was fixed in ufs long ago (rev.1.21 of ufs_readwrite.c) but not
in cd9660.

cd9660_read() has stagnated in many other ways. It is closer to
the Net/2 ufs_read() (which it was cloned from) than ufs_read()
itself is.


30780 27-Oct-1997 bde

Removed unused #includes. The need for most of them went away with
recent changes (docluster* and vfs improvements).


30743 26-Oct-1997 phk

VFS interior redecoration.

Rename vn_default_error to vop_defaultop all over the place.
Move vn_bwrite from vfs_bio.c to vfs_default.c and call it vop_stdbwrite.
Use vop_null instead of nullop.
Move vop_nopoll from vfs_subr.c to vfs_default.c
Move vop_sharedlock from vfs_subr.c to vfs_default.c
Move vop_nolock from vfs_subr.c to vfs_default.c
Move vop_nounlock from vfs_subr.c to vfs_default.c
Move vop_noislocked from vfs_subr.c to vfs_default.c
Use vop_ebadf instead of *_ebadf.
Add vop_defaultop for getpages on master vnode in MFS.


30637 21-Oct-1997 roberto

Fix the same leak as in nullfs. Now the lowervp is properly marked inactive.

Reviewed by: phk


30636 21-Oct-1997 roberto

Fix the file leak bug. The lower layer wasn't informed the vnode was inactive
and kept a reference, preventing the blocks from being reclaimed.

Changed the comment in null_inactive to reflect the current situation.

Reviewed by: phk


30513 17-Oct-1997 phk

Make a set of VOP standard lock, unlock & islocked VOP operators, which
depend on the lock being located at vp->v_data. Saves 3x3 identical
vop procs, more as the other filesystems become lock aware.


30496 16-Oct-1997 phk

VFS clean up "hekto commit"

1. Add defaults for more VOPs
VOP_LOCK vop_nolock
VOP_ISLOCKED vop_noislocked
VOP_UNLOCK vop_nounlock
and remove direct reference in filesystems.

2. Rename the nfsv2 vnop tables to improve sorting order.


30492 16-Oct-1997 phk

Another VFS cleanup "kilo commit"

1. Remove VOP_UPDATE, it is (also) an UFS/{FFS,LFS,EXT2FS,MFS}
interface function, and now lives in the ufsmount structure.

2. Remove VOP_SEEK, it was unused.

3. Add more default vops:

VOP_ADVLOCK vop_einval
VOP_CLOSE vop_null
VOP_FSYNC vop_null
VOP_IOCTL vop_enotty
VOP_MMAP vop_einval
VOP_OPEN vop_null
VOP_PATHCONF vop_einval
VOP_READLINK vop_einval
VOP_REALLOCBLKS vop_eopnotsupp

And remove identical functionality from filesystems

4. Add vop_stdpathconf, which returns the canonical stuff. Use
it in the filesystems. (XXX: It's probably wrong that specfs
and fifofs sets this vop, shouldn't it come from the "host"
filesystem, for instance ufs or cd9660 ?)

5. Try to make system wide VOP functions have vop_* names.

6. Initialize the um_* vectors in LFS.

(Recompile your LKMS!!!)
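
The idea of per-operation defaults can be modelled in a few lines of user-space C. This is a sketch only, with hypothetical ex_* names rather than the kernel's vnodeopv machinery: a filesystem lists just the operations it implements, and every other slot is filled from a default table. The open, ioctl and mmap defaults mirror the list above; the default chosen for read is an assumption made for the example.

```c
#include <errno.h>
#include <stddef.h>

enum ex_op { EX_OP_OPEN, EX_OP_IOCTL, EX_OP_MMAP, EX_OP_READ, EX_OP_COUNT };

typedef int ex_op_fn(void *arg);

/* One entry of a filesystem's sparse operations table. */
struct ex_opdesc {
	enum ex_op op;
	ex_op_fn *fn;
};

int ex_null(void *arg) { (void)arg; return 0; }
int ex_einval(void *arg) { (void)arg; return EINVAL; }
int ex_enotty(void *arg) { (void)arg; return ENOTTY; }
int ex_eopnotsupp(void *arg) { (void)arg; return EOPNOTSUPP; }

/* Default action per operation; the EX_OP_READ entry is an assumption. */
static ex_op_fn *const ex_defaults[EX_OP_COUNT] = {
	[EX_OP_OPEN] = ex_null,
	[EX_OP_IOCTL] = ex_enotty,
	[EX_OP_MMAP] = ex_einval,
	[EX_OP_READ] = ex_eopnotsupp,
};

/* Fill a dense vector with defaults, then apply the filesystem's entries. */
void
ex_build_vector(const struct ex_opdesc *tbl, size_t n, ex_op_fn **vec)
{
	size_t i;

	for (i = 0; i < (size_t)EX_OP_COUNT; i++)
		vec[i] = ex_defaults[i];
	for (i = 0; i < n; i++)
		vec[tbl[i].op] = tbl[i].fn;
}

/* A toy filesystem that implements only read. */
int ex_demo_read(void *arg) { (void)arg; return 42; }

int
ex_demo_call(enum ex_op op)
{
	static const struct ex_opdesc tbl[] = {
		{ EX_OP_READ, ex_demo_read },
	};
	ex_op_fn *vec[EX_OP_COUNT];

	ex_build_vector(tbl, sizeof(tbl) / sizeof(tbl[0]), vec);
	return vec[op](NULL);
}
```

In this model, ex_demo_call(EX_OP_READ) reaches the filesystem's own function, while the other three operations fall through to the defaults without the filesystem mentioning them.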


30474 16-Oct-1997 phk

VFS mega cleanup commit (x/N)

1. Add new file "sys/kern/vfs_default.c" where default actions for
VOPs go. Implement proper defaults for ABORTOP, BWRITE, LEASE,
POLL, REVOKE and STRATEGY. Various stuff spread over the entire
tree belongs here.

2. Change VOP_BLKATOFF to a normal function in cd9660.

3. Kill VOP_BLKATOFF, VOP_TRUNCATE, VOP_VFREE, VOP_VALLOC. These
are private interface functions between UFS and the underlying
storage manager layer (FFS/LFS/MFS/EXT2FS). The functions now
live in struct ufsmount instead.

4. Remove a kludge of VOP_ functions in all filesystems, that did
nothing but obscure the simplicity and break the expandability.
If a filesystem doesn't implement VOP_FOO, it shouldn't have an
entry for it in its vnops table. The system will try to DTRT
if it is not implemented. There is still some cruft left, but
the bulk of it is done.

5. Fix another VCALL in vfs_cache.c (thanks Bruce!)


30439 15-Oct-1997 phk

vnops megacommit

1. Use the default function to access all the specfs operations.
2. Use the default function to access all the fifofs operations.
3. Use the default function to access all the ufs operations.
4. Fix VCALL usage in vfs_cache.c
5. Use VOCALL to access specfs functions in devfs_vnops.c
6. Staticize most of the spec and fifofs vnops functions.
7. Make UFS panic if it lacks bits of the underlying storage handling.


30434 15-Oct-1997 phk

Hmm, realign the vnops into two columns.


30431 15-Oct-1997 phk

Stylistic overhaul of vnops tables.
1. Remove comment stating the blatantly obvious.
2. Align in two columns.
3. Sort all but the default element alphabetically.
4. Remove XXX comments pointing out entries not needed.


30354 12-Oct-1997 phk

Last major round (Unless Bruce thinks of something :-) of malloc changes.

Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.

A couple of finer points by: bde


30309 11-Oct-1997 phk

Distribute and staticize a lot of the malloc M_* types.

Substantial input from: bde


29888 27-Sep-1997 kato

Clustered read and write are switched at mount-option level.

1. Clustered I/O is switched by the MNT_NOCLUSTERR and MNT_NOCLUSTERW
bits of the mnt_flag. The sysctl variables, vfs.foo.doclusterread
and vfs.foo.doclusterwrite are deleted. Only mount option can
control clustered I/O from userland.
2. When foofs_mount mounts block device, foofs_mount checks D_NOCLUSTERR
and D_NOCLUSTERW bits of the d_flags member in the block device switch
table. If D_NOCLUSTERR / D_NOCLUSTERW are set, MNT_NOCLUSTERR /
MNT_NOCLUSTERW bits will be set. In this case, MNT_NOCLUSTERR and
MNT_NOCLUSTERW cannot be cleared from userland.
3. Vnode driver disables both clustered read and write.
4. Union filesystem disables clustered write.

Reviewed by: bde
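
Point 2 above amounts to propagating "no clustering" bits from the device to the mount. A minimal sketch of that step follows; the EX_* flag values and the function name are hypothetical and do not match the real D_* or MNT_* constants.

```c
/* Hypothetical flag values, chosen only for this example. */
#define EX_D_NOCLUSTERR		0x1u	/* device: no clustered reads  */
#define EX_D_NOCLUSTERW		0x2u	/* device: no clustered writes */
#define EX_MNT_NOCLUSTERR	0x100u	/* mount: no clustered reads   */
#define EX_MNT_NOCLUSTERW	0x200u	/* mount: no clustered writes  */

/*
 * Return the mount flags after applying the device's restrictions.
 * Bits already set in mnt_flag are preserved; the device can only
 * add restrictions, never remove them.
 */
unsigned int
ex_apply_device_cluster_flags(unsigned int mnt_flag, unsigned int d_flags)
{
	if (d_flags & EX_D_NOCLUSTERR)
		mnt_flag |= EX_MNT_NOCLUSTERR;
	if (d_flags & EX_D_NOCLUSTERW)
		mnt_flag |= EX_MNT_NOCLUSTERW;
	return mnt_flag;
}
```

Because the bits are derived from the device at mount time, a mount option asking for clustering on such a device has nothing to clear, which matches the statement that userland cannot turn them off in this case.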


29653 21-Sep-1997 dyson

Change the M_NAMEI allocations to use the zone allocator. This change
plus the previous changes to use the zone allocator decrease the usage
of malloc by half. The Zone allocator will be upgradeable to be able
to use per CPU-pools, and has more intelligent usage of SPLs. Additionally,
it has reasonable stats gathering capabilities, while making most calls
inline.


29584 18-Sep-1997 phk

Executing binaries on a nullfs (or nullfs-based) filesystem results in
a trap.
PR: 3104
Reviewed by: phk
Submitted by: Dan Walters hannibal@cyberstation.net


29362 14-Sep-1997 peter

Convert select -> poll.
Delete 'always succeed' select/poll handlers, replaced with generic call.
Flag missing vnode op table entries.


29286 10-Sep-1997 phk

Fix a typo in a comment and remove some checks now done centrally.


29285 10-Sep-1997 phk

This stuff is now done centrally.


29208 07-Sep-1997 bde

Removed yet more vestiges of config-time swap configuration and/or
cleaned up nearby cruft.


29180 07-Sep-1997 bde

Staticized.


29179 07-Sep-1997 bde

Some staticized variables were still declared to be extern.


29084 04-Sep-1997 kato

Support read-only mount.


29041 02-Sep-1997 bde

Removed unused #includes.


28844 28-Aug-1997 kato

Include "opt_ddb.h" only when NULLFS_DIAGNOSTIC is defined.


28832 27-Aug-1997 kato

Fixed NULLFS_DIAGNOSTIC stuff.


28787 26-Aug-1997 phk

Uncut&paste cache_lookup().

This unifies several copies of what were, in theory, identical 50 lines of code.

The filesystems have a new method: vop_cachedlookup, which is the
meat of the lookup, and use vfs_cache_lookup() for their vop_lookup
method. vfs_cache_lookup() will check the namecache and pass on
to the vop_cachedlookup method in case of a miss.

It's still the task of the individual filesystems to populate the
namecache with cache_enter().

Filesystems that do not use the namecache will just provide the
vop_lookup method as usual.
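
The split between the generic cache check and the filesystem's own lookup can be sketched in user space as follows. All ex_* names are hypothetical, the "cache" is a tiny fixed array rather than the real namecache, and a vnode is reduced to an integer; the point is only the control flow: check the cache first, call the filesystem's cachedlookup on a miss, and leave cache population to the filesystem.

```c
#include <stddef.h>
#include <string.h>

#define EX_CACHE_SLOTS	8
#define EX_NAME_MAX	32

struct ex_fs;
typedef int ex_cachedlookup_fn(struct ex_fs *fs, const char *name, int *nodep);

struct ex_cache_entry {
	char name[EX_NAME_MAX];
	int node;
	int valid;
};

struct ex_fs {
	ex_cachedlookup_fn *cachedlookup;	/* slow path, supplied by the fs */
	struct ex_cache_entry cache[EX_CACHE_SLOTS];
	size_t next;				/* next slot to overwrite */
};

/* Called by the filesystem itself to record a successful lookup. */
void
ex_cache_enter(struct ex_fs *fs, const char *name, int node)
{
	struct ex_cache_entry *e = &fs->cache[fs->next];

	fs->next = (fs->next + 1) % EX_CACHE_SLOTS;
	strncpy(e->name, name, EX_NAME_MAX - 1);
	e->name[EX_NAME_MAX - 1] = '\0';
	e->node = node;
	e->valid = 1;
}

/* Generic lookup: answer from the cache, or pass the miss to the fs. */
int
ex_cache_lookup(struct ex_fs *fs, const char *name, int *nodep)
{
	size_t i;

	for (i = 0; i < EX_CACHE_SLOTS; i++) {
		if (fs->cache[i].valid && strcmp(fs->cache[i].name, name) == 0) {
			*nodep = fs->cache[i].node;
			return 0;
		}
	}
	return fs->cachedlookup(fs, name, nodep);
}

/* Toy filesystem: counts slow-path calls and populates the cache. */
static int ex_demo_slow_calls;

int
ex_demo_cachedlookup(struct ex_fs *fs, const char *name, int *nodep)
{
	ex_demo_slow_calls++;
	*nodep = (int)strlen(name);	/* stand-in for a real directory scan */
	ex_cache_enter(fs, name, *nodep);
	return 0;
}

/* Look the same name up twice; return how often the slow path ran. */
int
ex_demo(void)
{
	struct ex_fs fs = { .cachedlookup = ex_demo_cachedlookup };
	int node = -1;

	ex_demo_slow_calls = 0;
	(void)ex_cache_lookup(&fs, "a.txt", &node);
	(void)ex_cache_lookup(&fs, "a.txt", &node);
	return ex_demo_slow_calls;
}
```

The second lookup is served from the cache, so the filesystem's slow path runs only once in this model.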


28774 26-Aug-1997 dyson

Back out some incorrect changes that were worse than the original bug.


28716 25-Aug-1997 kato

Added a sysctl arg, vfs.cd9660.doclusterread. Deleted debug and
!FreeBSD code around cluster read stuff.


28558 22-Aug-1997 dyson

This is a trial improvement for the vnode reference count while on the vnode
free list problem. Also, the vnode age flag is no longer used by the
vnode pager. (It is actually incorrect to use then.) Constructive
feedback welcome -- just be kind.


28270 16-Aug-1997 wollman

Fix all areas of the system (or at least all those in LINT) to avoid storing
socket addresses in mbufs. (Socket buffers are the one exception.) A number
of kernel APIs needed to get fixed in order to make this happen. Also,
fix three protocol families which kept PCBs in mbufs to now malloc them
instead. Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.


28233 15-Aug-1997 kato

Added DIAGNOSTIC routine to test inconsistency of vnode when cnp
points `.'.

Obtained from: NetBSD


28232 15-Aug-1997 kato

Deleted unused code which adjusts the UN_UNLOCK flag.


28189 14-Aug-1997 kato

If the user doesn't have read permission, union_copyup should not copy
a file to upper layer.

Reviewed by: Naofumi Honda <honda@Kururu.math.sci.hokudai.ac.jp>


28188 14-Aug-1997 kato

Backed out part of previous change. The example of -b mount in
manpage works again.


28101 12-Aug-1997 kato

Fixed vnode corruption by undefined case in union_lookup(). When
uerror == 0 && lerror == EACCES, lowervp == NULLVP and union_allocvp
doesn't find existing union node and new union node is created.

Since it is difficult to cover all the cases, union_lookup always
returns when union_lookup1() returns EACCES.

Submitted by: Naofumi Honda <honda@Kururu.math.sci.hokudai.ac.jp>
Obtained from: NetBSD/pc98


28089 12-Aug-1997 sef

Check permissions for fp regs as well as normal regs.


28086 12-Aug-1997 sef

Fix procfs security hole -- check permissions on meaningful I/Os (namely,
reading/writing of mem and regs). Also have to check for the requesting
process being group KMEM -- this is a bit of a hack, but ps et al need it.

Reviewed by: davidg


27845 02-Aug-1997 bde

Removed unused #includes.


26964 26-Jun-1997 alex

More comment cleanup.


26963 26-Jun-1997 alex

Typo police.


26962 26-Jun-1997 alex

Style fix my previous commit.


26769 21-Jun-1997 alex

Block all write operations to /proc/1/* when securelevel > 0.
The additional check in procfs_ctl.c could be backed out, but
I'm leaving it in for good measure.

Reviewed by: Theo de Raadt <deraadt@OpenBSD.org>


26271 29-May-1997 tegge

Don't remove the controlling tty from the session if the vnode is being
cleaned. This should help for PR kern/3581.


26111 25-May-1997 peter

Fix some warnings (missing prototypes, wrong "generic" args etc)
umapfs uses one of nullfs's functions...


25877 17-May-1997 phk

Remove redundant check for vp == dvp (done in VFS before calling).


25535 07-May-1997 kato

1. Added cast and parentheses in block size calculation in
union_statfs().
2. staticized union vops.

Submitted by: Doug Rabson <dfr@nlsystems.com>


25531 07-May-1997 joerg

Hide the kernel-only stuff inside #ifdef KERNEL.
XXX should be #ifdef _KERNEL
XXX^2 the !KERNEL part should probably be moved out into a publicly
visible header file anyway.


25461 04-May-1997 joerg

Oops. The function cd9660_mountroot() is gone, but i've committed an
even more bogus prototype for it in my previous commit.


25460 04-May-1997 joerg

This mega-commit brings the following:

. It makes cd9660 root f/s working again.
. It makes CD9660 a new-style option.
. It adds support to mount an ISO9660 multi-session CD-ROM as the root
filesystem (the last session actually, but that's what is expected
behaviour).

Sigh. The CDIOREADTOCENTRYS did a copyout() of its own, and thus has
been unusable for me for this work. Too bad it didn't simply stuff
the max 100 entries into the struct ioc_read_toc_entry, but relied on
a user supplied data buffer instead. :-( I now had to reinvent the
wheel, and created a CDIOREADTOCENTRY ioctl command that can be used
in a kernel context.

While doing this, i noticed the following bogosities in existing CD-ROM
drivers:

wcd: This driver is likely to be totally bogus when someone tries
two succeeding CDIOREADTOCENTRYS (or now CDIOREADTOCENTRY)
commands with requesting MSF format, since it apparently
operates on an internal table.

scd: This driver apparently returns just a single TOC entry only for
the CDIOREADTOCENTRYS command.

I have only been able to test the CDIOREADTOCENTRY command with the
cd(4) driver. I hereby request the respective maintainers of the
other CD-ROM drivers to verify my code for their driver. When it
comes to merging this CD-ROM multisession stuff into RELENG_2_2 i will
only consider drivers where i've got a confirmation that it actually
works.


25397 03-May-1997 kato

Fixed panic message in union_lock(): union_link --> union_lock.


25379 02-May-1997 kato

Access correct union mount point in union_access. Old vnode is saved
in savedvp variable and it is used for the argument of
MOUNTTOUNIONMOUNT(). I didn't realize ap->a_vp is modified before
MOUNTTOUNIONMOUNT(), so the change by revision 1.22 is incorrect.


25358 01-May-1997 sos

Remove the dependency on DEV_BSIZE, now specfs works on != 512byte
sector devices given that the fs uses a blocksize of at least a physical
sector size.


25287 29-Apr-1997 joerg

For multi-session CD-ROMs, we have to account for previous sessions as
well in volume_space_size. Otherwise, NFS exports won't work.


25285 29-Apr-1997 joerg

Add support for ISO9660 multi-session CD-ROMs. This is just nothing
but searching the directory on something else than the default
location.

NB: this comprises an interface change to the mount_cd9660(8)
utility (commit will follow). You need to rebuild both.

I've got similar patches for RELENG_2_2, should i commit them too?


25261 29-Apr-1997 kato

Revised fix for locking violation when unionfs calls vput with
UN_KLOCK flag.

When UN_KLOCK is set, VOP_UNLOCK should keep uppervp locked and clear
UN_ULOCK flag. To do this, when UN_KLOCK is set, (1) union_unlock
clears UN_ULOCK and does not clear UN_KLOCK, (2) union_lock() does not
access uppervp and does not clear UN_KLOCK, and (3) callers of
vput/VOP_UNLOCK should clear UN_KLOCK. For example, vput becomes:

SETKLOCK(union_node);
vput(vnode);
CLEARKLOCK(union_node);

where SETKLOCK macro sets UN_KLOCK and CLEARKLOCK macro clears
UN_KLOCK.


25207 27-Apr-1997 alex

Removed bogon from previous commit: doubly included sys/systm.h.


25200 27-Apr-1997 alex

Prevent debugger attachment to init when securelevel > 0.

Noticed by: Brian Buchanan <brian@wasteland.calbbs.com>


25192 27-Apr-1997 kato

Undo 1.29.


25167 26-Apr-1997 kato

Do nothing instead of adjusting un_flags when (uppervp is locked) &&
(UN_ULOCK is not set) in union_lock. This condition may indicate
race. DIAGNOSTIC kernel still panic here.


25160 26-Apr-1997 kato

Do not clear UN_ULOCK in certain case.

Our vput calls vm_object_deallocate() --> vm_object_terminate(). The
vm_object_terminate() calls vn_lock(), since UN_LOCKED has been
already cleared in union_unlock(). Then, union_lock locks upper vnode
when UN_ULOCK is not set. The upper vnode is not unlocked when
UN_KLOCK is set in union_unlock(), thus, union_lock tries to lock
locked vnode and we get panic.


25079 21-Apr-1997 kato

Dirty change in union_lock(). Sometimes the upper vnode is locked without
the UN_ULOCK flag. This shows a locking violation but I couldn't find the
reason UN_ULOCK is not set or the upper vnode is not unlocked. I added
code that detects this case and adjusts un_flags. A DIAGNOSTIC kernel
doesn't adjust un_flags, but just panics here to help debugging by kernel
hackers.


25070 21-Apr-1997 kato

Replace VOP_LOCK with vn_lock.


25055 20-Apr-1997 dyson

Fix both a problem with accessing backing objects, and also release
the process map on nonexistent pages.
PR: kern/3327
Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no>


25016 19-Apr-1997 kato

Avoid `lock against myself' panic caused by the following operation:

# mount -t union (or null) dir1 dir2
# mount -t union (or null) dir2 dir1

The function namei in union_mount calls union_root. The upper vnode
has already been locked and vn_lock in union_root causes the above panic.

Add printf's included in `#ifdef DIAGNOSTIC' for EDEADLK cases.


24988 17-Apr-1997 kato

Fix `locking against myself' panic by multi nullfs mount of same
directory pair.


24987 17-Apr-1997 kato

Use NULLVP instead of NULL.


24985 16-Apr-1997 kato

Do not set the uppervp to NULLVP in union_removed_upper. If lowervp
is NULLVP, union node will have neither uppervp nor lowervp. This
causes page fault trap.

The union_removed_upper just removes the union node from the cache and
doesn't set uppervp to NULLVP. Since the union node is removed from the
cache, it will not be referenced.

The code that removes the union node from the cache was copied from
union_inactive.


24974 16-Apr-1997 kato

Undo previous commit to avoid panic, and fix the order of the arguments
of VOP_LINK(). The reason for the strange behavior was the wrong argument
order, that is, the operation

# ln foo bar

in a union fs tried to do

# ln bar foo

in ufs layer.

Now we can make a link in a union fs.


24963 15-Apr-1997 kato

Quick-hack to avoid `lock against myself' panic. It is not the real
fix!

The ufs_link() assumes that the vnode is not locked and tries to lock it
in certain cases. Because union_link calls VOP_LINK after locking the vnode,
vn_lock in ufs_link causes the above panic.

Currently, I don't know the real fix for a locking violation in
union_link, but I think it is important to avoid panic.

A vnode is unlocked before calling VOP_LINK and is locked after it if
the vnode is not union fs. Even though the panic went away, the process
that accesses the union fs in which the link was made will hang.

The hang can be easily reproduced by the following operations:

mount -t union a b
cd b
ln foo bar
ls


24948 15-Apr-1997 bde

Removed more traces of ISODEVMAP.


24934 14-Apr-1997 phk

Remove all traces of undocumented feature ISODEVMAP.


24921 14-Apr-1997 kato

Fix `lockmgr: locking against myself' panic by multi union mount of
same directory pair.

If we do:
mount -t union a b
mount -t union a b
then, (1) namei tries to lock fs which has been already locked by
first union mount and (2) union_root() tries to lock locked fs. To
avoid first deadlock condition, unlock vnode if lowerrootvp is union
node, and to avoid second case, union_mount returns EDEADLK when multi
union mount is detected.


24918 14-Apr-1997 kato

Fix locking violation when accessing `..'.
Obtained from: NetBSD


24875 13-Apr-1997 kato

Access correct union mount point in union_access.


24858 13-Apr-1997 phk

The function union_fsync tries to lock overlaying vnode object when
dolock is not set (that is, targetvp == overlaying vnode object).
Current code use FIXUP macro to do this, and never unlocks overlaying
vnode object in union_fsync. So, the vnode object will be locked
twice and never unlocked.

PR: 3271
Submitted by: kato


24857 13-Apr-1997 phk

The path name buffer, cn->cn_pnbuf, is FREEed by VOP_MKDIR when
relookup() in union_relookup() succeeds. However, if relookup()
returns a non-zero value, that is, relookup fails, VOP_MKDIR is never
called (cf. union_mkshadow). Thus, the pathname buffer is never FREEed.

Reviewed by: phk
Submitted by: kato
PR: 3262


24856 13-Apr-1997 phk

Though malloc allocates only cn.cn_namelen bytes for cn.cn_pnbuf in
union_vn_create(), the following bcopy copies cn.cn_namelen + 1 bytes to
cn.cn_pnbuf.

PR: 3255
Reviewed by: phk
Submitted by: kato
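
The allocation/copy mismatch described above is a classic off-by-one. A
minimal user-space sketch of the corrected pattern follows; copy_name() is
a hypothetical helper, not the union_vn_create() code, and it uses
malloc/memcpy rather than the kernel allocator and bcopy.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Duplicate the first `namelen` bytes of `name` as a NUL-terminated
 * string. The buffer is sized for namelen + 1 bytes so that the
 * terminator has somewhere to go; allocating only namelen bytes and
 * then writing namelen + 1 bytes would overrun the buffer by one.
 */
static char *copy_name(const char *name, size_t namelen)
{
    char *buf = malloc(namelen + 1);

    if (buf == NULL)
        return NULL;
    memcpy(buf, name, namelen);
    buf[namelen] = '\0';
    return buf;
}
```

The caller owns the returned buffer and releases it with free().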


24788 10-Apr-1997 bde

Get the declaration of `struct dirent' from <sys/dirent.h>, not from
<sys/dir.h>, and use the new macro GENERIC_DIRSIZ() instead of DIRSIZ().

Removed unused #includes.


24787 10-Apr-1997 bde

Get the declaration of `struct dirent' from <sys/dirent.h>, not from
<sys/dir.h>.

Removed unused #include.

Fixed type and order of struct members in pseudo-declaration of `struct
vop_readdir_args'.


24785 10-Apr-1997 bde

Removed unused or apparently-unused #includes, especially of the
deprecated header <sys/dir.h>.


24666 06-Apr-1997 dyson

Fix the gdb executable modify problem. Thanks to the detective work
by Alan Cox <alc@cs.rice.edu>, and his description of the problem.

The bug was primarily in procfs_mem, but the mistake likely happened
due to the lack of vm system support for the operation. I added
better support for selective marking of page dirty flags so that
vm_map_pageable(wiring) will not cause this problem again.

The code in procfs_mem is now less bogus (but maybe still a little
so.)


24205 24-Mar-1997 bde

Don't include <sys/ioctl.h> in the kernel. Stage 3: include
<sys/filio.h> instead of <sys/ioctl.h> in non-network non-tty files.


24203 24-Mar-1997 bde

Don't include <sys/ioctl.h> in the kernel. Stage 1: don't include
it when it is not used. In most cases, the reasons for including it
went away when the special ioctl headers became self-sufficient.


24131 23-Mar-1997 bde

Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined.
Fixed everything that depended on getting fcntl.h stuff from the wrong
place. Most things don't depend on file.h stuff at all.


23997 18-Mar-1997 peter

Restore the lost MNT_LOCAL flag twiddle. Lite2 has a different mechanism
of setting it (compiled into vfs_conf.c), but we have a dynamic system
in place. This could probably be better done via a runtime configure
flag in the VFS_SET() VFS declaration, perhaps VFCF_LOCAL, and have the
VFS code propagate this down into MNT_LOCAL at mount time. The other FS's
would need to be updated, having UFS and MSDOSFS filesystems without
MNT_LOCAL breaks a few things.. the man page rebuild scans for local
filesystems and currently fails, I suspect that other tools like find
and tar with their "local filesystem only" modes might be affected.


23527 08-Mar-1997 bde

Use the common nchstats struct instead of a private one for ncs_2passes
and ncs_pass2. The public one is already used for other cd9660 statistics
and the private one was effectively invisible.


23526 08-Mar-1997 bde

Fixed missing initialisation of vp->v_type for types Pfile and Pmem
in procfs_allocvp(). This fixes at least stat() of /proc/*/mem.

stat() of /proc/*/file already worked. I think procfs_allocvp() isn't
actually called for type Pfile.


23351 03-Mar-1997 bde

Don't export kernel interfaces to applications. msdosfs_mount probably
didn't compile before this change.

Added idempotency ifdef.


23134 26-Feb-1997 bde

Updated msdosfs to use Lite2 vfs configuration and Lite2 locking. It
should now work as (un)well as before the Lite2 merge.


23077 24-Feb-1997 bde

Fixed procfs's locking vops. They were missed in the Lite2 merge,
partly because the #define's for them were moved to a different
file. At least the null VOP_LOCK() no longer works, since vclean()
expects VOP_LOCK( ..., LK_DRAIN | LK_INTERLOCK, ...) to clear the
interlock. This probably only matters when simple_lock() is not
null, i.e., when there are multiple CPUs or SIMPLELOCK_DEBUG is
defined.


22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


22620 13-Feb-1997 bde

Killed more FIFO ifdefs. All gone now.


22618 13-Feb-1997 bde

Removed bogus B_AGE policy again (see rev 1.4).

Removed FIFO ifdef again (see rev.1.8). This also fixes vfs initialization
since the VNODEOP_SET() was inside the ifdef.


22607 12-Feb-1997 mpp

Eliminate the last of the compile warnings in this module by
correctly casting the arguments to all of the null_bypass() calls.


22605 12-Feb-1997 mpp

Restored #include of <sys/kernel.h> so that this compiles without
warnings again.


22601 12-Feb-1997 mpp

Make this compile without warnings after the Lite2 merge:

- *fs_init routines now take a "struct vfsconf * vfsp" pointer
as an argument.
- Use the correct type for cookies.
- Update function prototypes.

Submitted by: bde


22600 12-Feb-1997 mpp

Restored #include of <sys/kernel.h> so that this compiles
without warnings again.

Submitted by: bde


22597 12-Feb-1997 mpp

Make this compile again after the Lite2 merge.
Also add missing function prototypes.


22596 12-Feb-1997 mpp

Add missing function prototypes.


22595 12-Feb-1997 bde

Added parameter names to prototypes that were added in the last commit to
match nearby style.


22594 12-Feb-1997 bde

Restored #include of <sys/kernel.h> so that this compiles again.


22593 12-Feb-1997 bde

Declare function args in order in recently K&Rised function headers.


22582 12-Feb-1997 mpp

Add function prototypes for the new Lite2 unionfs functions.


22579 12-Feb-1997 mpp

Add function prototypes for most of the new Lite2 functions.
Also made a few of the miscfs routines static to be
consistent. Some modules simply required some additional
#includes to remove -Wall warnings.


22567 11-Feb-1997 bde

Restored one line of "High Sierra" changes from rev.1.8.

The Lite2 changes in cd9660 are scary. I probably missed some
other lossage in this file.


22566 11-Feb-1997 bde

Restored one line of "High Sierra" changes from rev.1.6 which was
blown away by the previous commit.

Not restored: trailing whitespace changes from rev.1.7.
Not restored: -Wall cleanup from rev.1.5.


22565 11-Feb-1997 bde

Removed High Sierra task from TODO list. Joerg did it years ago and
other items were removed from the list when they were done in the
Lite2 merge. The Lite2 merge just broke the High Sierra changes.


22521 10-Feb-1997 dyson

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


21754 16-Jan-1997 dyson

Change the map entry flags from bitfields to bitmasks. Allows
for some code simplification.


21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


21002 29-Dec-1996 dyson

This commit is the embodiment of some VFS read clustering improvements.
Firstly, now our read-ahead clustering is on a file descriptor basis and not
on a per-vnode basis. This will allow multiple processes reading the
same file to take advantage of read-ahead clustering. Secondly, there
previously was a problem with large reads still using the ramp-up
algorithm. Of course, that was bogus, and now we read the entire
"chunk" off of the disk in one operation. The read-ahead clustering
algorithm should use less CPU than the previous also (I hope :-)).

NOTE THAT LKMS MUST BE REBUILT!!!


20910 25-Dec-1996 bde

Don't synchronously update the directory entry at the end of every
successful write. Only do it for the IO_SYNC case (like ufs). On
one of my systems, this speeds up `iozone 24 512' from 32K/sec
(1/128 as fast as ufs) to 2.8MB/sec (7/10 as fast as ufs).

Obtained from: partly from NetBSD


20691 19-Dec-1996 bde

Fixed lseek() on named pipes. It always succeeded but should always fail.
Broke locking on named pipes in the same way as locking on non-vnodes
(wrong errno). This will be fixed later.

The fix involves negative logic. Named pipes are now distinguished from
other types of files with vnodes, and there is additional code to handle
vnodes and named pipes in the same way only where that makes sense (not
for lseek, locking or TIOCSCTTY).


20687 19-Dec-1996 bde

Fixed errno for unsupported advisory locks. The errno is now EINVAL
for fcntl() and EOPNOTSUPP for flock(). POSIX specifies the weaker EINVAL
errno and the man page agrees.

Not fixed:
deadfs: always returns wrong EBADF
devfs, msdosfs: always return sometimes-wrong EINVAL
cd9660, fdesc, kernfs, portal: always return sometimes-wrong EOPNOTSUPP
procfs: always returns wrong EIO
mfs: panic?!
nfs: fudged

NetBSD uses a generic file system genfs to return the sometimes-wrong
EOPNOTSUPP more consistently :-)(.

Found by: NIST-PCTS


20138 04-Dec-1996 bde

Fixed an off by 1 error in unix2dostime(). The first day of each month
was converted to the last day of the previous month. This bug was
introduced in the optimizations in rev.1.4.
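
To illustrate where such an off-by-one hides, here is a small sketch that
converts a day count into a packed DOS date. It is not the unix2dostime()
code: days_to_dosdate() and its input convention (days since 1980-01-01,
valid through 2107) are assumptions made for the example. The DOS date
layout used is day in bits 0-4, month in bits 5-8, and years since 1980
in bits 9-15.

```c
#include <stdint.h>

static int is_leap(unsigned year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Convert days since 1980-01-01 into a packed DOS date. */
static uint16_t days_to_dosdate(unsigned days)
{
    static const unsigned mdays[12] =
        { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
    unsigned year = 1980;
    unsigned month = 0; /* 0-based while searching */

    /* Peel off whole years. */
    for (;;) {
        unsigned ylen = is_leap(year) ? 366u : 365u;
        if (days < ylen)
            break;
        days -= ylen;
        year++;
    }
    /* Peel off whole months; `days` ends up 0-based within the month. */
    for (;;) {
        unsigned mlen = mdays[month] + ((month == 1 && is_leap(year)) ? 1u : 0u);
        if (days < mlen)
            break;
        days -= mlen;
        month++;
    }
    /*
     * DOS months and days are 1-based. Forgetting the "+ 1" on the day
     * would turn the first of a month into day 0, which is the kind of
     * error the commit message describes.
     */
    return (uint16_t)(((year - 1980) << 9) | ((month + 1) << 5) | (days + 1));
}
```

The month boundaries (for example day 59 and day 60 of the leap year 1980)
are the inputs worth checking first.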


19261 30-Oct-1996 dyson

Fix a potential deadlock from the previous commit.


19260 30-Oct-1996 dyson

Fix the /proc/???/map file so that it is possible to read an arbitrarily
large process map. Another commit will follow to fix a problem just found
during this one... Sorry!!! :-(.


19141 24-Oct-1996 dyson

Fix setting breakpoints in shared regions.


19067 20-Oct-1996 alex

Fix signed/unsigned comparison warnings.

Reviewed by: bde


18775 06-Oct-1996 dyson

Substitution of a long divide by a shift. Other cosmetic improvements.
Submitted by: bde


18640 02-Oct-1996 dyson

MSDOS FS used to allocate a buffer before extending the VM object. In
certain error conditions, it is possible for pages to be left allocated
in the object beyond its end. It is generally bad practice to allocate
pages beyond the end of an object.


18413 20-Sep-1996 nate

Whoops, I should've used the LINT config file. More ts -> tv changes
for timespec structure.


18412 20-Sep-1996 nate

Whoops, I should've used the LINT config file. More ts -> tv changes
for timespec structure.


18397 19-Sep-1996 nate

In sys/time.h, struct timespec is defined as:

/*
* Structure defined by POSIX.4 to be like a timeval.
*/
struct timespec {
time_t ts_sec; /* seconds */
long ts_nsec; /* and nanoseconds */
};

The correct names of the fields are tv_sec and tv_nsec.

Reminded by: James Drobina <jdrobina@infinet.com>
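
With the corrected field names, code that fills in a struct timespec looks
like the sketch below. The helper timeval_to_timespec() is a hypothetical
name introduced for illustration; only the tv_sec/tv_nsec and
tv_sec/tv_usec member names come from the standard headers.

```c
#include <time.h>
#include <sys/time.h>

/* Convert a timeval (microseconds) to a timespec (nanoseconds). */
static struct timespec timeval_to_timespec(struct timeval tv)
{
    struct timespec ts;

    ts.tv_sec = tv.tv_sec;                 /* seconds carry over unchanged */
    ts.tv_nsec = (long)tv.tv_usec * 1000L; /* microseconds to nanoseconds */
    return ts;
}
```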


18020 03-Sep-1996 bde

Eliminated nested include of <sys/unistd.h> in <sys/file.h> in the kernel.
Include it directly in the few places where it is used.

Reduced some #includes of <sys/file.h> to #includes of <sys/fcntl.h> or
nothing.


17974 31-Aug-1996 bde

Fixed the easy cases of const poisoning in the kernel. Cosmetic.


17761 21-Aug-1996 dyson

Even though this looks like it, this is not a complex code change.
The interface into the "VMIO" system has changed to be more consistent
and robust. Essentially, it is now no longer necessary to call vn_open
to get merged VM/Buffer cache operation, and exceptional conditions
such as merged operation of VBLK devices is simpler and more correct.

This code corrects a potentially large set of problems including the
problems with ktrace output and loaded systems, file create/deletes,
etc.

Most of the changes to NFS are cosmetic and name changes, eliminating
a layer of subroutine calls. The direct calls to vput/vrele have
been re-instituted for better cross platform compatibility.

Reviewed by: davidg


17314 28-Jul-1996 ache

bzero the reserved field in the directory entry; junk here causes a
scandisk error under Win95


17306 27-Jul-1996 dyson

Modify slightly the output from the map file in /proc. Now the
executable bit is shown.


17303 27-Jul-1996 dyson

Under certain circumstances, reading the /proc/*/map file can
crash the system. Nonexistent objects were not handled correctly.


17296 27-Jul-1996 dyson

Remove a totally unneeded (and as of the last VM commit, incorrect) call
to pmap_clear_modify.


16901 02-Jul-1996 dyson

Implement locking for pfs nodes, when at the leaf. Concurrent access
to information from a single process causes hangs. Specifically, this
fixes problems (hangs) with concurrent ps commands, when the system is under
heavy memory load.
Reviewed by: davidg


16889 02-Jul-1996 dyson

Fix a serious problem, with a window where an object lock is needed,
but not there. The extent of the object lock is expanded to be over the
range that it is needed. Additionally, clean up the code so that it conforms
to better coding style.


16476 18-Jun-1996 dyson

Add procfs_type.c to the repository.


16474 18-Jun-1996 dyson

Clean-up the new VM map procfs code, and also add support for executable
format file "etype". It contains a description of the binary type for
a process.


16468 17-Jun-1996 dyson

This file is the "meat" of the process address space capability. If you
would like other things added, just ask!!! It might be pretty easy to add.


16467 17-Jun-1996 dyson

Add a feature to procfs to allow display of the process address map
with multiple entries as follows:

start address, end address, resident pages in range, private pages
in range, RW/RO, COW or not, (vnode/device/swap/default).


16363 14-Jun-1996 asami

The Great PC98 Merge.

All new code is "#ifdef PC98"ed so this should make no difference to
PC/AT (and its clones) users.

Ok'd by: core
Submitted by: FreeBSD(98) development team


16322 12-Jun-1996 gpalmer

Clean up -Wunused warnings.

Reviewed by: bde


16312 12-Jun-1996 dg

Moved the fsnode MALLOC to before the call to getnewvnode() so that the
process won't possibly block before filling in the fsnode pointer (v_data)
which might be dereferenced during a sync since the vnode is put on the
mnt_vnodelist by getnewvnode.

Pointed out by Matt Day <mday@artisoft.com>


16311 12-Jun-1996 dg

Moved the fsnode MALLOC to before the call to getnewvnode() so that the
process won't possibly block before filling in the fsnode pointer (v_data)
which might be dereferenced during a sync since the vnode is put on the
mnt_vnodelist by getnewvnode.


16308 11-Jun-1996 dyson

Properly lock the vm space when accessing the memory in a process. This
fix could solve some "interesting" problems that could happen during
process rundown.


15538 02-May-1996 phk

First pass at cleaning up macros relating to pages, clusters and all that.


15055 05-Apr-1996 ache

Fix adjkerntz expression priority.
Make filetimes the same as DOS times for UTC cmos clock.


15053 05-Apr-1996 ache

Don't adjust file times for UTC clock to have the same timestamps
for DOS/FreeBSD.


15033 03-Apr-1996 gpalmer

add a `Warning:' to the message saying that the root directory is not a
multiple of the clustersize in length to try and reduce the number
of questions we get on the subject.


14693 19-Mar-1996 dyson

Fix reference count problems when unmounting filesystems that are
backed by a VMIO device. We mark the underlying object
non-persistent, and account for the reference count that the VM system
maintains for the special device close. This should fix the removable
device problem.


14625 14-Mar-1996 joerg

Provide a better handling of partially corrupted directory entries.

Submitted by: bde


14553 11-Mar-1996 peter

Import 4.4BSD-Lite2 onto the vendor branch, note that in the kernel, all
files are off the vendor branch, so this should not change anything.

A "U" marker generally means that the file was not changed in between
the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally
means that there was a change.
[note, new file: cd9660_mount.h]


14532 11-Mar-1996 hsu

For Lite2: proc LIST changes.
Reviewed by: davidg & bde


14434 09-Mar-1996 dyson

Make sure that the zero flag is cleared upon completion of paging I/O.


14093 13-Feb-1996 wollman

Kill XNS.
While we're at it, fix socreate() to take a process argument. (This
was supposed to get committed days ago...)


13838 02-Feb-1996 wosch

add ruid and rgid to file 'status'


13765 30-Jan-1996 mpp

Fix a bunch of spelling errors in the comment fields of
a bunch of system include files.


13627 25-Jan-1996 peter

This time, really make the procfs work when reading stuff from the UPAGES.

This is a really ugly bandaid on the problem, but it works well enough for
'ps -u' to start working again. The problem was caused by the user
address space shrinking by a little bit and the UPAGES being "cast off" to
become a separate entity rather than being at the top of the process's
vmspace. That optimization was part of John's most recent VM speedups.

Now, rather than decoding the VM space, it merely ensures the pages are
in core and accesses them the same way the ptrace(PT_READ_U..) code does,
ie: off the p->p_addr pointer.


13608 24-Jan-1996 peter

Major fixes for procfs..

Implement a "variable" directory structure. Files that do not make
sense for the given process do not "appear" and cannot be opened.
For example, "system" processes do not have "file", "regs" or "fpregs",
because they do not have a user area.

"attempt" to fill in the user area of a given process when it is being
accessed via /proc/pid/mem (the user struct is just after
VM_MAXUSER_ADDRESS in the process address space.)

Don't do IO to the U area while it's swapped, hold it in place if possible.

Lock off access to the "ctl" file if it's done a setuid like the other
pseudo-files in there.


13490 19-Jan-1996 dyson

Eliminated many redundant vm_map_lookup operations for vm_mmap.
Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish
overhead for merged cache.
Efficiency improvement for vfs_cluster. It used to do a lot of redundant
calls to cluster_rbuild.
Correct the ordering for vrele of .text and release of credentials.
Use the selective tlb update for 486/586/P6.
Numerous fixes to the size of objects allocated for files. Additionally,
fixes in the various pagers.
Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs.
Fixes in the swap pager for exhausted resources. The pageout code
will not as readily thrash.
Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into
page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE),
thereby improving efficiency of several routines.
Eliminate even more unnecessary vm_page_protect operations.
Significantly speed up process forks.
Make vm_object_page_clean more efficient, thereby eliminating the pause
that happens every 30 seconds.
Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the
case of filesystems mounted async.
Fix a panic with busy pages when write clustering is done for non-VMIO
buffers.


13260 05-Jan-1996 wollman

Convert QUOTA to new-style option.


13160 01-Jan-1996 phk

I have some problem here, which shows up in the ahc0 driver. It isn't where
it originates, so I catch it here and fail.
This may expose the same bug on other disk controllers (both scsi & ide).


12904 17-Dec-1995 bde

Fixed 1TB filesize changes. Some pindexes had bogus names and types
but worked because vm_pindex_t is indistinguishable from vm_offset_t.


12820 14-Dec-1995 phk

Another mega commit to staticize things.


12813 13-Dec-1995 julian

devsw tables are now arrays of POINTERS to struct [cb]devsw
seems to work here just fine though I can't check every file
that changed due to limited h/w, however I've checked enough to be pretty
happy with the code..

WARNING... struct lkm[mumble] has changed
so it might be an idea to recompile any lkm related programs


12771 11-Dec-1995 phk

Back out this one, must have screwed up somewhere :-(


12769 11-Dec-1995 phk

Staticize.


12767 11-Dec-1995 dyson

Changes to support 1Tb filesizes. Pages are now named by an
(object,index) pair instead of (object,offset) pair.
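
The idea of naming a page by index rather than by byte offset can be
sketched as follows. The names and the constants (a 32-bit index type and
4 KB pages) are assumptions chosen for illustration, not the kernel's
definitions.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12            /* assumed 4 KB pages */
typedef uint32_t toy_pindex_t;       /* assumed 32-bit page index */

/* Page index of the page containing byte offset `off`. */
static toy_pindex_t toy_off_to_idx(uint64_t off)
{
    return (toy_pindex_t)(off >> TOY_PAGE_SHIFT);
}

/* Byte offset of the start of page `idx`, computed in 64 bits. */
static uint64_t toy_idx_to_off(toy_pindex_t idx)
{
    return (uint64_t)idx << TOY_PAGE_SHIFT;
}
```

Under these assumptions a 32-bit index can name pages far beyond the 4 GB
that a 32-bit byte offset can reach, which is why switching the key from
offset to index lifts the file size limit without widening the key.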


12675 08-Dec-1995 julian

Pass 3 of the great devsw changes
most devsw referenced functions are now static, as they are
in the same file as their devsw structure. I've also added DEVFS
support for nearly every device in the system, however
many of the devices have 'incorrect' names under DEVFS
because I couldn't quickly work out the correct naming conventions.
(but devfs won't be coming on line for a month or so anyhow so that doesn't
matter)

If you "OWN" a device which would normally have an entry in /dev
then search for the devfs_add_devsw() entries and munge to make them right..
check out similar devices to see what I might have done in them if you
can't see what's going on..
for a laugh compare conf.c conf.h before and after... :)
I have not done DEVFS entries for any DISKSLICE devices yet as that will be
a much more complicated job.. (pass 5 :)

pass 4 will be to make the devsw tables of type (cdevsw * )
rather than (cdevsw)
seems to work here..
complaints to the usual places.. :)


12662 07-Dec-1995 dg

Untangled the vm.h include file spaghetti.


12645 05-Dec-1995 bde

Include <vm/vm.h> or <vm/vm_page.h> explicitly to avoid breaking when
vnode_if.h doesn't include vm stuff.


12636 05-Dec-1995 bde

Restored #include of <sys/tty.h>. fdesc_vnops.c needs to know too much
about tty_tty.c's cdevswitch functions.


12597 03-Dec-1995 bde

Added prototypes.

cd9660_rrip.c:
Added lots of bogus casts to hide type errors exposed by the prototypes.
(Different structs are assumed to have a common prefix.)

cd9660_vnops.c:
Finished staticizing.


12596 03-Dec-1995 bde

Added prototypes.


12595 03-Dec-1995 bde

Added prototypes.

Removed some unnecessary #includes.


12594 03-Dec-1995 bde

null_node_find() and umap_node_find() were sometimes called without a
`struct mount *' arg. I don't know what the effects of this were.


12570 02-Dec-1995 phk

staticize.


12520 29-Nov-1995 julian

#ifdef out nearly the entire file of conf.c when JREMOD is defined
add a few safety checks in specfs because
now it's possible to get entries in [cd]devsw[] which are ALL NULL
so it's better to discover this BEFORE jumping into the d_open() entry..

more checks to come later.. this gets the code to the stage where I
can start testing it, even if I haven't caught every little error case...
I guess I'll find them quick enough..


12453 21-Nov-1995 bde

Completed function declarations and/or added prototypes.


12412 20-Nov-1995 dyson

Since FreeBSD clustering code now supports filesystems < PAGE_SIZE,
enable clustering for cd9660, thereby giving a BIG performance boost.


12373 18-Nov-1995 bde

KNFized spec_getpages_idone() and spec_getpages().

Moved misplaced #includes.

Completed function pointer declarations.


12338 16-Nov-1995 bde

Moved declarations for static functions to the correct place (not in a
header) and cleaned them up.


12337 16-Nov-1995 bde

Moved declarations for static functions to the correct place (not in a
header).

Removed stupid comments.


12336 16-Nov-1995 bde

Fixed the type of procfs_sync(). Trailing args were missing.

Fixed the type of procfs_fhtovp(). The args had little resemblance to
the correct ones.

Added prototypes.


12335 16-Nov-1995 bde

Fixed the type of portal_sync(). Trailing args were missing.

Fixed the type of portal_fhtovp(). The args had little resemblance to
the correct ones.

Added prototypes.


12333 16-Nov-1995 bde

Fixed the type of fdesc_sync(). Trailing args were missing.

Fixed the type of fdesc_fhtovp(). The args had little resemblance to
the correct ones.

Added prototypes.


12287 14-Nov-1995 phk

Get rid of hostnamelen variable.


12265 13-Nov-1995 bde

Fixed getdirentries() on nfs mounted msdosfs's. No cookies were returned
for certain common combinations of directory sizes, cluster sizes, and i/o
sizes (e.g., 4K, 4K, and 4K). The fix in rev. 1.21 was incomplete.

Reviewed by: dfr
Obtained from: partly from NetBSD


12230 12-Nov-1995 dg

Brought in the setattr call support from Lite-2 so that more correct error
returns are provided.

Obtained from: 4.4BSD-Lite2


12228 12-Nov-1995 dg

Fix isoilk hang caused by not checking for read-onlyness in several places.
The fix for this in Lite-2 is more complete, but these quick hacks of mine
are safer for now. I plan to integrate the additional Lite-2 stuff at some
later time. Should completely fix PR810.


12203 11-Nov-1995 bde

Removed unused function dead_nullop().

Converted incomplete function declarations to prototypes.


12158 09-Nov-1995 bde

Introduced a type `vop_t' for vnode operation functions and used
it 1138 times (:-() in casts and a few more times in declarations.
This change is null for the i386.

The type has to be `typedef int vop_t(void *)' and not `typedef
int vop_t()' because `gcc -Wstrict-prototypes' warns about the
latter. Since vnode op functions are called with args of different
(struct pointer) types, neither of these function types is any use
for type checking of the arg, so it would be preferable not to use
the complete function type, especially since using the complete
type requires adding 1138 casts to avoid compiler warnings and
another 40+ casts to reverse the function pointer conversions before
calling the functions.
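
The casting pattern described above can be sketched in isolation. Only the
typedef itself comes from the commit message; the toy_* names, the
one-entry table and the argument struct are hypothetical and exist just to
show the cast in both directions.

```c
/* The generic vnode-operation function type from the commit message. */
typedef int vop_t(void *);

/* Hypothetical argument struct for one operation. */
struct toy_open_args {
    int mode;
};

/* An operation written against its real argument type. */
static int toy_open(struct toy_open_args *ap)
{
    return ap->mode + 1;
}

/* Storing it in a generic table requires a cast to vop_t *. */
static vop_t *toy_ops[] = { (vop_t *)toy_open };

/*
 * Before calling, the pointer is converted back to the function's real
 * type, which is the "reverse" cast the commit message mentions.
 */
static int toy_call_open(struct toy_open_args *ap)
{
    int (*fn)(struct toy_open_args *) =
        (int (*)(struct toy_open_args *))toy_ops[0];

    return fn(ap);
}
```

Converting the pointer back before the call keeps the call well defined in
C, at the price of the extra casts the message complains about.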


12145 07-Nov-1995 phk

missed one static thingie.


12144 07-Nov-1995 phk

staticize private parts.


12143 07-Nov-1995 phk

Make a lot of private stuff static.
Should anybody out there wonder about this vendetta against global
variables, it is basically to make it more visible what our interfaces
in the kernel really are.
I'm almost convinced we should have a
#define PUBLIC /* public interface */
and use it in the #includes...


11977 31-Oct-1995 pst

Pad out MSDOS boot block to 512 bytes (bugfix only)
Submitted by: Andreas Haakh, ah@alman.RoBIN.de


11954 31-Oct-1995 phk

Make a lot of stuff static.


11921 29-Oct-1995 phk

Second batch of cleanup changes.
This time mostly making a lot of things static and some unused
variables here and there.


11707 23-Oct-1995 dyson

Removal of unnecessary usage of PG_COPYONWRITE.


11701 23-Oct-1995 dyson

Finalize GETPAGES layering scheme. Move the device GETPAGES
interface into specfs code. No need at this point to modify the
PUTPAGES stuff except in the layered-type (NULL/UNION) filesystems.


11644 22-Oct-1995 dg

Moved the filesystem read-only check out of the syscalls and into the
filesystem layer, as was done in lite-2. Merged in some other cosmetic
changes while I was at it. Rewrote most of msdosfs_access() to be more
like ufs_access() and to include the FS read-only check.

Obtained from: partially from 4.4BSD-lite2


11333 08-Oct-1995 swallace

Add #include <sys/sysproto.h> to get struct close_args and close
function prototype.


11297 07-Oct-1995 bde

Return EINVAL instead of panicking for rename("dir1", "dir2/..").

Fixes part of PR 760.

This bug seems to be very old.


11262 06-Oct-1995 phk

Avoid some 64bit divides.


10551 04-Sep-1995 dyson

Added VOP_GETPAGES/VOP_PUTPAGES and also the "backwards" block count
for VOP_BMAP. Updated affected filesystems...


10534 02-Sep-1995 mpp

Do not allow delete/rename lookup request to prevent
panics if a user attempts to remove/rename files in
a fdesc file system.


10533 02-Sep-1995 mpp

Correctly initialize the mount stat structure so that
fdesc file systems show up in "mount" correctly and so that
they can then be unmounted.


10531 02-Sep-1995 mpp

Change procfs_lookup to not allow delete/rename operations
to prevent panics when a user tries to remove/rename the
contents of /proc/###/*.

Obtained from: 4.4BSD-lite2


10272 25-Aug-1995 bde

Fix bogus arg (&p instead of p) in the call to VOP_ACCESS() from
msdosfs_setattr(). The bug was benign because the arg isn't used.


10093 17-Aug-1995 bde

The `cred' and `proc' args were missing for some VOP_OPEN() and VOP_CLOSE()
calls.

Found by: gcc -Wstrict-prototypes after I supplied some of the 5000+
missing prototypes. Now I have 9000+ lines of warnings and errors
about bogus conversions of function pointers.


10027 11-Aug-1995 dg

Converted mountlist to a CIRCLEQ.

Partially obtained from: 4.4BSD-Lite2


10024 11-Aug-1995 dg

Be careful not to dereference NULL credentials pointers when doing the
getattr function.


9973 06-Aug-1995 jkh

Allow a pipe to be opened read/write at one end, as is allowed in
SunOS and SCO. You can then even use the pipe as a cheap fifo stack
(yuck!). A semantic change also important (but not limited) to iBCS2
compatibility.
Submitted by: swallace


9878 03-Aug-1995 dfr

Make sure that a non-null cookie vector is returned even if there were no
valid entries in the block. Doing otherwise confuses the nfs server.


9862 02-Aug-1995 dfr

Add support for the va_filerev attribute required by NFSv3.


9842 01-Aug-1995 dg

Removed my special-case hack for VOP_LINK and fixed the problem with the
wrong vp's ops vector being used by changing the VOP_LINK's argument order.
The special-case hack doesn't go far enough and breaks the generic
bypass routine used in some non-leaf filesystems. Pointed out by Kirk
McKusick.


9759 29-Jul-1995 bde

Eliminate sloppy common-style declarations. There should be none left for
the LINT configuration.


9715 25-Jul-1995 bde

Change `extern inline' to `static inline' so that several functions
don't go away when the kernel is compiled with -O.

The functions are backed up by extern versions in cd9660_util.c,
but these versions are disabled by `#ifdef __notanymore__'. They
could have been enabled by using `#if defined(__notanymore__) ||
!defined(__OPTIMIZE__)' but then I would have had to check that
they still work. The correct way to handle all this is to replace
`extern inline' by `EXTERN_INLINE' and define `EXTERN_INLINE' as
`extern inline' in most modules and as empty in one module.


9542 16-Jul-1995 joerg

There is a small bug in the cd9660 code that prevents stating of
associated files.

Submitted by: leo@dachau.marco.de (Matthias Pfaller)
Not-obtained from: NetBSD. Instead sent directly to me by Matthias.
(Sorry, this is to prevent people from claiming i might have gotten
this from NetBSD. :)


9540 16-Jul-1995 bde

Don't include <sys/tty.h> in drivers that aren't tty drivers or in general
files that don't depend on the internals of <sys/tty.h>


9507 13-Jul-1995 dg

NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
proc or any VM system structure will have to be rebuilt!!!

Much needed overhaul of the VM system. Included in this first round of
changes:

1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
haspage, and sync operations are supported. The haspage interface now
provides information about clusterability. All pager routines now take
struct vm_object's instead of "pagers".

2) Improved data structures. In the previous paradigm, there is constant
confusion caused by pagers being both a data structure ("allocate a
pager") and a collection of routines. The idea of a pager structure has
essentially been eliminated. Objects now have types, and this type is
used to index the appropriate pager. In most cases, items in the pager
structure were duplicated in the object data structure and thus were
unnecessary. In the few cases that remained, a un_pager structure union
was created in the object to contain these items.

3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
be removed. For instance, vm_object_enter(), vm_object_lookup(),
vm_object_remove(), and the associated object hash list were some of the
things that were removed.

4) simple_lock's removed. Discussion with several people reveals that the
SMP locking primitives used in the VM system aren't likely the mechanism
that we'll be adopting. Even if it were, the locking that was in the code
was very inadequate and would have to be mostly re-done anyway. The
locking in a uni-processor kernel was a no-op but went a long way toward
making the code difficult to read and debug.

5) Places that attempted to kludge-up the fact that we don't have kernel
thread support have been fixed to reflect the reality that we are really
dealing with processes, not threads. The VM system didn't have complete
thread support, so the comments and mis-named routines were just wrong.
We now use tsleep and wakeup directly in the lock routines, for instance.

6) Where appropriate, the pagers have been improved, especially in the
pager_alloc routines. Most of the pager_allocs have been rewritten and
are now faster and easier to maintain.

7) The pagedaemon pageout clustering algorithm has been rewritten and
now tries harder to output an even number of pages before and after
the requested page. This is sort of the reverse of the ideal pagein
algorithm and should provide better overall performance.

8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
have been removed. Some other unnecessary casts have also been removed.

9) Some almost useless debugging code removed.

10) Terminology of shadow objects vs. backing objects straightened out.
The fact that the vm_object data structure essentially had this
backwards really confused things. The use of "shadow" and "backing
object" throughout the code is now internally consistent and correct
in the Mach terminology.

11) Several minor bug fixes, including one in the vm daemon that caused
0 RSS objects to not get purged as intended.

12) A "default pager" has now been created which cleans up the transition
of objects to the "swap" type. The previous checks throughout the code
for swp->pg_data != NULL were really ugly. This change also provides
the rudiments for future backing of "anonymous" memory by something
other than the swap pager (via the vnode pager, for example), and it
allows the decision about which of these pagers to use to be made
dynamically (although it will need some additional decision code to do
this, of course).

13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
object" code has been removed. MAP_COPY was undocumented and
non-standard. It was furthermore broken in several ways which caused its
behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
continue to work correctly, but via the slightly different semantics
of MAP_PRIVATE.

14) (dyson) Sharing maps have been removed. Their marginal usefulness in a
threads design can be worked around in other ways. Both #13 and #14
were done to simplify the code and improve readability and
maintainability. (As were most of these changes.)
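
The idea in item 2 (objects carry a type, and the type indexes the appropriate
pager) can be pictured with a small table of function pointers. This is a
hedged userland sketch only: the enum values, struct layouts, and function
names below are invented for illustration and are not the kernel's
definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object types; each one selects a row in the pager table. */
enum obj_type { OBJ_DEFAULT_T, OBJ_SWAP_T, OBJ_VNODE_T, OBJ_NTYPES };

struct object {
    enum obj_type type;
    size_t npages;
};

/* Hypothetical pager operations: a set of routines, not a data structure. */
struct pager_ops {
    const char *name;
    int (*haspage)(const struct object *obj, size_t pindex);
};

static int
default_haspage(const struct object *obj, size_t pindex)
{
    (void)obj;
    (void)pindex;
    return 0;   /* a "default" object has no backing store yet */
}

static int
range_haspage(const struct object *obj, size_t pindex)
{
    return pindex < obj->npages;
}

static const struct pager_ops pager_table[OBJ_NTYPES] = {
    [OBJ_DEFAULT_T] = { "default", default_haspage },
    [OBJ_SWAP_T]    = { "swap",    range_haspage },
    [OBJ_VNODE_T]   = { "vnode",   range_haspage },
};

/* Dispatch through the table using the object's type as the index. */
static int
pager_haspage(const struct object *obj, size_t pindex)
{
    assert(obj->type < OBJ_NTYPES);
    return pager_table[obj->type].haspage(obj, pindex);
}
```

With this shape there is no separate "pager" allocation to track; changing an
object's type is enough to change which routines serve it.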

TODO:

1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
this will reduce the vnode pager to a mere fraction of its current size.

2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
information provided by the new haspage pager interface. This will
substantially reduce the overhead by eliminating a large number of
VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
improved to provide both a "behind" and "ahead" indication of
contiguousness.

3) Implement the extended features of pager_haspage in swap_pager_haspage().
It currently just says 0 pages ahead/behind.

4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
via a much more general mechanism that could also be used for disk
striping of regular filesystems.

5) Do something to improve the architecture of vm_object_collapse(). The
fact that it makes calls into the swap pager and knows too much about
how the swap pager operates really bothers me. It also doesn't allow
for collapsing of non-swap pager objects ("unnamed" objects backed by
other pagers).


9435 08-Jul-1995 dg

Added missing splx() in DIAGNOSTIC code.
Suggested by enami@sys.ptg.sony.co.jp.


9354 28-Jun-1995 dg

Fixed VOP_LINK argument order botch.


9346 28-Jun-1995 dg

Killed the "probably_never" ifdef'd code.


9202 11-Jun-1995 rgrimes

Merge RELENG_2_0_5 into HEAD


8876 30-May-1995 rgrimes

Remove trailing whitespace.


8740 25-May-1995 dg

Fixed panic that resulted from mmaping files in kernfs and procfs. A
regular user could panic the machine with a simple "tail /proc/curproc/mem"
command. The problem was twofold: both kernfs and procfs didn't fill in
the mnt_stat statfs struct (which would later lead to an integer divide
fault in the vnode pager), and kernfs bogusly panicked if a bmap was
attempted.

Reviewed by: John Dyson


8624 19-May-1995 dg

NFS diskless operation was broken because swapdev_vp wasn't initialized.
These changes solve the problem in a general way by moving the
initialization out of the individual fs_mountroot's and into swaponvp().

Submitted by: Poul-Henning Kamp


8456 11-May-1995 rgrimes

Fix -Wformat warnings from LINT kernel.


8386 09-May-1995 bde

Submitted by: Mike Pritchard <pritc003@maroon.tc.umn.edu>

msdosfs_lookup() did no validation to see if the caller was allowed
to delete/rename/create files. msdosfs_setattr() did no validation
to see if the caller was allowed to change the file permissions (turn
on/off the write bit) or update the file modification time (utimes).

The routines were fixed to validate the calls just like ufs does.


7835 15-Apr-1995 dg

For P_SUGID processes, we must also change ownership of the mem file
to root so that group kmem can still get to it. *SIGH*


7833 15-Apr-1995 dg

Retain group kmem readability for P_SUGID processes.


7832 15-Apr-1995 dg

Made /proc/n/mem file group kmem and group readable. Needed to fix ps so
that it doesn't need to be setuid root.


7760 11-Apr-1995 ache

Fix link sys call
Submitted by: pritc003@maroon.tc.umn.edu


7755 11-Apr-1995 bde

Submitted by: Mike Pritchard <pritc003@maroon.tc.umn.edu>

Fix PR 303: msdosfs: moving a file into another directory causes panic.

" ... the code that does the rename already has the denode
locked when msdosfs_hashins() gets called, resulting in the panic
when the routine attempts to lock the denode again.
...
The attached patch changes the msdosfs_hashins() routine to not lock the
denode. The caller is now responsible for obtaining the lock instead
of having msdosfs_hashins() do it for them."


7754 11-Apr-1995 bde

Submitted by: Wolfgang Solfrank <ws@tools.de>

Fix off-by-1-sector error in the range checking for the end of the root
directory. It was possible for the root directory to overwrite the FAT.


7695 09-Apr-1995 dg

Changes from John Dyson and myself:

Fixed remaining known bugs in the buffer IO and VM system.

vfs_bio.c:
Fixed some race conditions and locking bugs. Improved performance
by removing some (now) unnecessary code and fixing some broken
logic.
Fixed process accounting of # of FS outputs.
Properly handle NFS interrupts (B_EINTR).

(various)
Replaced calls to clrbuf() with calls to an optimized routine
called vfs_bio_clrbuf().

(various FS sync)
Sync out modified vnode_pager backed pages.

ffs_vnops.c:
Do two passes: Sync out file data first, then indirect blocks.

vm_fault.c:
Fixed deadly embrace caused by acquiring locks in the wrong order.

vnode_pager.c:
Changed to use buffer I/O system for writing out modified pages. This
should fix the problem with the modification date previously not getting
updated. Also dramatically simplifies the code. Note that this is
going to change in the future and be implemented via VOP_PUTPAGES().

vm_object.c:
Fixed a pile of bugs related to cleaning (vnode) objects. The performance
of vm_object_page_clean() is terrible when dealing with huge objects,
but this will change when we implement a binary tree to keep the object
pages sorted.

vm_pageout.c:
Fixed broken clustering of pageouts. Fixed race conditions and other
lockup style bugs in the scanning of pages. Improved performance.


7465 29-Mar-1995 ache

Fix timestamps when using Wall CMOS clock,
optimize dos2unixtime()
Submitted by: pritc003@maroon.tc.umn.edu


7430 28-Mar-1995 bde

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) that I didn't notice when I fixed
"all" such warnings before.


7429 28-Mar-1995 phk

Readdir on a CDrom would return bogus "d_type" values, potentially confusing
everybody (incl find(1) ?). Initialize it to DT_UNKNOWN. Maybe we can
do better, but I don't have the time.


7170 19-Mar-1995 dg

Removed redundant newlines that were in some panic strings.


7161 19-Mar-1995 dg

Removed bogus, commented out, call to vnode_pager_uncache().


7095 16-Mar-1995 wollman

Add four more filesystem flags:

VFCF_NETWORK (this FS goes over the net)
VFCF_READONLY (read-write mounts do not make any sense)
VFCF_SYNTHETIC (data in this FS is not real)
VFCF_LOOPBACK (this FS aliases something else)

cd9660 is readonly; nullfs, umapfs, and union are loopback; NFS is network;
procfs, kernfs, and fdesc are synthetic.
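
As a rough picture of how such flags can be attached to filesystem types and
queried, here is a small userland sketch. The bit values, the fsconf
structure, and the fs_flags() helper are assumptions for illustration; only
the flag names and the filesystem-to-flag assignments come from the text
above.

```c
#include <assert.h>
#include <string.h>

/* Illustrative bit values only; the real ones live in the kernel headers. */
#define VFCF_NETWORK    0x01    /* this FS goes over the net */
#define VFCF_READONLY   0x02    /* read-write mounts make no sense */
#define VFCF_SYNTHETIC  0x04    /* data in this FS is not real */
#define VFCF_LOOPBACK   0x08    /* this FS aliases something else */

/* Hypothetical per-filesystem configuration record. */
struct fsconf {
    const char *name;
    int flags;
};

static const struct fsconf fs_demo[] = {
    { "cd9660", VFCF_READONLY },
    { "nullfs", VFCF_LOOPBACK },
    { "umapfs", VFCF_LOOPBACK },
    { "union",  VFCF_LOOPBACK },
    { "nfs",    VFCF_NETWORK },
    { "procfs", VFCF_SYNTHETIC },
    { "kernfs", VFCF_SYNTHETIC },
    { "fdesc",  VFCF_SYNTHETIC },
};

/* Return the flags for a named filesystem, or -1 if it is not in the table. */
static int
fs_flags(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(fs_demo) / sizeof(fs_demo[0]); i++)
        if (strcmp(fs_demo[i].name, name) == 0)
            return fs_demo[i].flags;
    return -1;
}
```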


7090 16-Mar-1995 bde

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'. Fix all the bugs found. There were no serious
ones.


6603 21-Feb-1995 bde

Obtained from: memories of 1.1.5

Fix the sign of the timezone offset again.


6569 20-Feb-1995 dg

Make sure process isn't swapped when messing with it.
Added missing newline to log() call.


6364 14-Feb-1995 phk

YFfix


6339 13-Feb-1995 phk

strategy for block and char devices are rightfully spec_strategy.
I feel like yanking all the "ISODEVMAP" stuff altogether, it looks
like a bad kludge...


6303 10-Feb-1995 bde

Use the correct block number for updating the backup copy of the FAT when
deleting a file. Deleting a large file used to scramble the backup copy.


6151 03-Feb-1995 dg

Fixed bmap run-length brokenness.
Use bmap run-length extension when doing clustered paging.

Submitted by: John Dyson


6001 29-Jan-1995 ats

Kill the comment in a comment to shut up the compiler.


5651 16-Jan-1995 joerg

Roll in my changes to make the cd9660 code understand the older
(original "High Sierra") CD format. I've already implemented this for
1.1.5.1 (and posted to -hackers), but didn't get any response to it.
Perhaps i'm the only one who has such an old CD lying around...

Everything is done empirically, but i had three of them around (from
different vendors), so there's a high probability that i've got it
right. :)


5455 09-Jan-1995 dg

These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.

The majority of the merged VM/cache work is by John Dyson.

The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.

vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.

vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.

vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.

vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.

vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.

pmap.c vm_map.c
Dynamic kernel VM size, now we don't have to pre-allocate excessive numbers of
kernel PTs.

vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.

proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.

swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.

machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.

machdep.c, kern_exec.c, vm_glue.c
Implemented a separate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.

ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.

Submitted by: John Dyson and David Greenman


5403 05-Jan-1995 dg

Initialize map start hint to vm_map_find()...not doing so will cause it
to fail if the random thing on the stack happens to be too large.

Submitted by: David Jones <dej@qpoint.torfree.net>


5312 31-Dec-1994 ache

Fix problem when attached process detached
Submitted by: Gary Jennejohn


5241 27-Dec-1994 bde

Fix panic for `cp -p' by root to an msdos file system. Improve handling
of attributes so that `cp -p' to an msdos file system can succeed under
favourable circumstances (no uid or gid changes and no nonzero flags
except SF_ARCHIVED).

msdosfs_vnops.c:
The in-core inode flags were confused with the on-disk inode flags, so
chflags() clobbered the lock flag and caused a panic.

denode.h, msdosfs_denode.c, msdosfs_vnops.c:
Support the msdosfs archive attribute (ATTR_ARCHIVE) by mapping it to
the complement of the SF_ARCHIVED flag and setting the ATTR_ARCHIVE
bit when a file's modification time is set (but not when a file's
permissions are set; this is the standard wrong DOS behaviour).
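
The complement mapping described here can be sketched in a few lines of
userland C. This is an illustration under stated assumptions, not the
msdosfs code: the helper names are hypothetical, and the SF_ARCHIVED value is
a placeholder chosen for the example.

```c
#include <assert.h>

#define ATTR_ARCHIVE    0x20            /* DOS directory-entry archive bit */
#define SF_ARCHIVED     0x00010000UL    /* placeholder value for this sketch */

/* DOS "needs archiving" set means the BSD "already archived" flag is clear. */
static unsigned long
dos_attr_to_flags(unsigned char attr)
{
    return (attr & ATTR_ARCHIVE) ? 0UL : SF_ARCHIVED;
}

/* Apply a chflags()-style request back onto the DOS attribute byte. */
static unsigned char
flags_to_dos_attr(unsigned char attr, unsigned long flags)
{
    if (flags & SF_ARCHIVED)
        return (unsigned char)(attr & ~ATTR_ARCHIVE);
    return (unsigned char)(attr | ATTR_ARCHIVE);
}

/* Setting the modification time marks the file as needing archiving. */
static unsigned char
note_mtime_set(unsigned char attr)
{
    return (unsigned char)(attr | ATTR_ARCHIVE);
}
```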

denode.h, msdosfs_denode.c:
Remove the DE_UPDAT() macro. It was only used once, and the corresponding
macro in ufs has already been removed.

denode.h:
Don't change the timestamp for directories in DE_TIMES() (be consistent
with deupdat()).

msdosfs_vnops.c:
Handle chown() better: return EPERM instead of EINVAL if there are
insufficient permissions; otherwise, allow null changes.


5083 12-Dec-1994 bde

Fix numerous timestamp bugs.

DE_UPDATE was confused with DE_MODIFIED in some places (they do have
confusing names). Handle them exactly the same as IN_UPDATE and
IN_MODIFIED. This fixes chmod() and chown() clobbering the mtime
and other bugs.

DE_MODIFIED was set but not used.

Parenthesize macro args.

DE_TIMES() now takes a timeval arg instead of a timespec arg. It was
stupid to use a macro for speed and do unused conversions to prepare
for the macro.

Restore the left shifting of the DOS seconds count by 1. It got
lost among the shifts for the bitfields, so DOS seconds counts
appeared to range from 0 to 29 seconds (step 1) instead of 0 to 58
seconds (step 2).
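
The shift matters because the DOS time word stores seconds in 2-second units
in its low 5 bits. A minimal sketch of packing and unpacking that word
follows; the function names are hypothetical and this is not the msdosfs
conversion code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * DOS time word layout: bits 15-11 hours, bits 10-5 minutes,
 * bits 4-0 seconds divided by 2.
 */
static uint16_t
dos_time_pack(unsigned hour, unsigned min, unsigned sec)
{
    assert(hour < 24 && min < 60 && sec < 60);
    return (uint16_t)((hour << 11) | (min << 5) | (sec >> 1));
}

static unsigned
dos_time_hours(uint16_t dt)
{
    return (dt >> 11) & 0x1f;
}

static unsigned
dos_time_minutes(uint16_t dt)
{
    return (dt >> 5) & 0x3f;
}

static unsigned
dos_time_seconds(uint16_t dt)
{
    /* The field counts 2-second units: shift left by 1 to get seconds. */
    return (unsigned)(dt & 0x1f) << 1;
}
```

Omitting the final left shift in dos_time_seconds() would yield values from 0
to 29 rather than 0 to 58, which is the symptom described above.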

Actually use the passed-in mtime in deupdat() as documented so that
utimes() works.

Change `extern __inline's to `static inline's so that msdosfs_fat.o
can be linked when it is compiled without -O.

Remove faking of directory mtimes to always be the current time. It's
more surprising for directory mtimes to change when you read the
directories than for them not to change when you write the directories.
This should be controlled by a mount-time option if at all.


4868 29-Nov-1994 ache

Restore mv check, cause panic without it
Submitted by: Ade Barkah


4463 14-Nov-1994 bde

Undo a previous change. <sys/disklabel.h> was broken, not these files.


4456 14-Nov-1994 bde

Remove the bogus include of <sys/dkbad.h>.


4140 04-Nov-1994 dg

From tim@cs.city.ac.uk (Tim Wilkinson):

Find enclosed a short bugfix to get the union filesystem up and running
in FreeBSD-current. We don't think we've got all the problems yet but
these fixes sort out the major ones (which mostly concern bad locking
of vnodes), no doubt we'll post others as necessary. Known problems
include the inability of the umount command (not the system call) to unmount
unions in certain circumstances (this is due to the way "realpath" works),
and the failure of direntries to always get all available files in
unioned subdirectories. We are, as they say, working on it.

Submitted by: tim@cs.city.ac.uk (Tim Wilkinson)


4057 01-Nov-1994 jkh

Fix from John Hay to avoid kernel panics when ap->a_eofflag is NULL.
I'm not sure if this is just masking another problem (like, should
ap->a_eofflag EVER be NULL?), but if it prevents a panic for now then
it may save an ALPHA customer.
Submitted by: jhay


3962 28-Oct-1994 jkh

From: fredriks@mcs.com (Lars Fredriksen)
...
It turns out that these files do not include <sys/dkbad.h> before
<sys/disklabel.h>.
Submitted by: fredriks


3935 27-Oct-1994 pst

Set the EOF flag properly.
Obtained from: netbsd-bugs mailing list


3805 23-Oct-1994 martin

Fixed panic when unmounting floppy msdos filesystems. Problem was
we weren't flushing dirty buffers. Fix stolen from ffs_fsync()


3687 18-Oct-1994 dg

Fixed bug I just introduced that would have allowed a user to clobber
his kernel stack.


3685 18-Oct-1994 dg

Allow upages to be paged in/accessed.

Submitted by: John Dyson


3498 10-Oct-1994 phk

Cosmetics. Silence gcc -Wall


3496 10-Oct-1994 phk

Cosmetics. reduce the noise from gcc -Wall.


3442 08-Oct-1994 phk

Cosmetics: added a #include and a static prototype to silence gcc.


3396 06-Oct-1994 dg

Use tsleep() rather than sleep so that 'ps' is more informative about
the wait.


3311 02-Oct-1994 phk

GCC cleanup.


3167 28-Sep-1994 dfr

Make NFS ask the filesystems for directory cookies instead of making them
itself.


3152 27-Sep-1994 phk

Added declarations, fixed bugs due to missing decls. At least one of them
could panic a system. (I know, it panicked mine!).


3106 26-Sep-1994 gpalmer

Alterations to silence gcc -Wall. Some unused variables deleted.

Reviewed by: davidg


3054 24-Sep-1994 dg

1) Added "." and ".." entries.
2) Fixed directory size to return something reasonable.
3) Disabled "file" until the code is completed.
4) Corrected directory link counts.


3034 23-Sep-1994 dg

Include <sys/kernel.h> not <kernel.h>


2979 22-Sep-1994 wollman

More loadable VFS changes:

- Make a number of filesystems work again when they are statically compiled
(blush)

- FIFOs are no longer optional; ``options FIFO'' removed from distributed
config files.


2960 21-Sep-1994 wollman

Fix a few niggling little bugs:

- set args->lkm_offset correctly so that VFS modules can be unloaded
- initialize _fs_vfsops.vfc_refcount correctly so that VFS modules can
be unloaded
- include kernel.h in a few places to get the correct definition of DATA_SET


2946 21-Sep-1994 wollman

Implemented loadable VFS modules, and made most existing filesystems
loadable. (NFS is a notable exception.)


2899 19-Sep-1994 dfr

Changed some NetBSD backwards compatibility code which was confusing mountd.


2893 19-Sep-1994 dfr

Added msdosfs.

Obtained from: NetBSD


2807 15-Sep-1994 bde

Supply prototypes for some functions that were implicitly declared and
fix the resulting warnings.


2806 15-Sep-1994 bde

Remove the unnecessary inclusion of disklabel.h in cd9660_vfsops.c so
that I don't have to worry about the latter when changing disklabel.h.

Supply prototypes for some functions that were implicitly declared and
fix the resulting warnings and errors (timevals were punned to timespecs).


2610 09-Sep-1994 dg

Relaxed panic in fdesc_setattr() to just return error.


2609 09-Sep-1994 dg

Fixed off by one error in referencing an array.

Stolen from: NetBSD


2604 09-Sep-1994 dfr

Fixed some confusion between the size of a logical block and the size of a
device block which was stopping symbolic links working.

cd9660_readdir was incorrectly casting a pointer to the d_namlen field of a
struct dirent to a (u_short*) which caused the directory entries "." and ".."
to read incorrectly.

Submitted by: dfr


2152 20-Aug-1994 dg

Implemented filesystem clean bit via:

machdep.c:
Changed printf's a little and call vfs_unmountall() if the sync was
successful.

cd9660_vfsops.c, ffs_vfsops.c, nfs_vfsops.c, lfs_vfsops.c:
Allow dismount of root FS. It is now disallowed at a higher level.

vfs_conf.c:
Removed unused rootfs global.

vfs_subr.c:
Added new routines vfs_unmountall and vfs_unmountroot. Filesystems
are now dismounted if the machine is properly rebooted.

ffs_vfsops.c:
Toggle clean bit at the appropriate places. Print warning if an
unclean FS is mounted.

ffs_vfsops.c, lfs_vfsops.c:
Fix bug in selecting proper flags for VOP_CLOSE().

vfs_syscalls.c:
Disallow dismounting root FS via umount syscall.


2142 20-Aug-1994 dg

1) cleaned up after Garrett - fixed more redundant declarations, changed
use of timeout_t -> timeout_func_t in aha1542 and aha1742 drivers.
2) fix a bug in the portalfs that was uncovered by better prototyping -
specifically, the time must be converted from timeval to timespec
before storing in va_atime.
3) fixed/added some miscellaneous prototypes
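
The timeval-to-timespec conversion mentioned in item 2 amounts to scaling
microseconds to nanoseconds. A minimal userland sketch follows; the helper
name timeval_to_timespec is an assumption for this example and is not the
routine used in portalfs.

```c
#include <assert.h>
#include <sys/time.h>
#include <time.h>

/*
 * struct timeval holds seconds and microseconds; struct timespec holds
 * seconds and nanoseconds.  Storing one where the other is expected
 * misreads the sub-second field by a factor of 1000.
 */
static struct timespec
timeval_to_timespec(const struct timeval *tv)
{
    struct timespec ts;

    assert(tv != NULL);
    ts.tv_sec = tv->tv_sec;
    ts.tv_nsec = (long)tv->tv_usec * 1000L;
    return ts;
}
```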


2112 18-Aug-1994 wollman

Fix up some sloppy coding practices:

- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.

NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.


1937 08-Aug-1994 dg

Changed B_AGE policy to work correctly in a world with relatively large
buffer caches. The old policy generally ended up caching nothing.


1817 02-Aug-1994 dg

Added $Id$


1549 25-May-1994 rgrimes

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


1541 24-May-1994 rgrimes

BSD 4.4 Lite Kernel Sources