History log of /freebsd-10.2-release/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 285830 23-Jul-2015 gjb

- Copy stable/10@285827 to releng/10.2 in preparation for 10.2-RC1
builds.
- Update newvers.sh to reflect RC1.
- Update __FreeBSD_version to reflect 10.2.
- Update default pkg(8) configuration to use the quarterly branch.[1]

Discussed with: re, portmgr [1]
Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 276899 10-Jan-2015 delphij

MFC r264392 (davide):

Fix a panic in zfs_rename().
this is due to a wrong dereference of a vnode when it's not locked and
can be (potentially) recycled. 'sdvp' cannot be locked on zfs_rename()
entry point because the VFS can't be sure that this scenario is
LOR-free (it might violate the parent->child lock acquisition rule).
Dereference 'tdvp' instead, which is already locked on entry, and access
'sdvp' fields only when it's safe, i.e. under ZFS_ENTER scope.

While at it, remove the usage of VOP_REALVP, as long as this is a NOP
on FreeBSD.


# 276648 04-Jan-2015 kib

MFC r276007:
Handle MAKEENTRY cnp flag in the VOP_CREATE().


# 276500 01-Jan-2015 kib

MFC r275897:
Set NOCACHE flag for CREATE namei() calls, do not specially handle
MAKEENTRY in VOP_LOOKUP().


# 276081 22-Dec-2014 delphij

MFC r274337,r274673,274681,r275515:

ZFS large block support. The default recordsize remains at 128KB.

A new tunable/sysctl variable, vfs.zfs.max_recordsize is added to
allow adjusting the permitted maximum record size, or
zfs_max_recordsize, with a default of 1MB. ZFS will not allow
setting recordsize greater than zfs_max_recordsize as a safety
belt, because larger recordsize means greater read and write
latency and more memory usage.

Please note that booting from datasets that have recordsize greater
than 128KB is not supported (but it's Okay to enable the feature on
the pool).

Limited safety belt is provided for mounted root filesystem but use
caution when using a larger value.

Illumos issue:
5027 zfs large block support


# 275901 18-Dec-2014 avg

MFC r275401: zfs_putpages: actually update mtime and ctime


# 273509 23-Oct-2014 delphij

MFC r272809: MFV r272803:

Illumos issue:
5175 implement dmu_read_uio_dbuf() to improve cached read performance


# 272676 07-Oct-2014 araujo

Make external NFS clients know when files have their attributes changed and
avoid cache the file's state indefinitely. The va_filerev is what is sent
to the client as the "change" attribute, the client is periodically fetching
the attributes and without this option the attribute remains as some garbage
value.

Phabric: D905
Reported by: Kevin Buhr <buhr@asaurus.net>
Reviewed by: rmacklem, delphij
Approved by: delphij
Obtained from: r272467
Sponsored by: QNAP Systems Inc.


# 272134 25-Sep-2014 delphij

MFC r271536: MFV r271518:

Correctly report hole at end of file.

When asked to find a hole, the DMU sees that there are no holes in the
object, and returns ESRCH. The ZPL interprets this as "no holes before
the end of the file", and therefore inserts the "virtual hole" at the
end of the file. Because DMU and ZPL have different ideas of where the
end of an object/file is, we will end up returning the end of file,
which is generally larger, instead of returning the end of object.

The fix is to handle the "virtual hole" in the DMU. If no hole is found,
the DMU will return a hole at the end of the file, rather than an error.

Illumos issue:
5139 SEEK_HOLE failed to report a hole at end of file

Approved by: re (gjb)


# 269061 24-Jul-2014 mav

MFC r268420:
Remove IO_SYNC flag when writing extended file attributes on ZFS.

While it is possible to create and write file, modify its permissions, etc.
without ever doing sync, it looks odd that it is required for setting
extended file attributes on ZFS. UFS does not do sync there too.

Samba uses those extended attributes to store some its data, and doing it
synchronously by many times reduces file creation performance for systems
without SLOG device.


# 269002 23-Jul-2014 delphij

MFC r268464: MFV r268452:

Explicitly mark file removal transactions as "presumed to result
in a net free of space" so they will not fail with ENOSPC.

Illumos issue: 4950 files sometimes can't be removed from a full
filesystem


# 262112 17-Feb-2014 avg

MFC r260704,260717: zfs: getnewvnode_reserve must be called outside of a
zfs transaction


# 262096 17-Feb-2014 avg

MFC r260706: zfs_deleteextattr: name buffer from namei is needed by zfs_remove


# 260786 16-Jan-2014 avg

MFC r258744-258746: zfs: add zfs_freebsd_putpages


# 260776 16-Jan-2014 avg

MFC r258720: MFV r258665: 4347 ZPL can use dmu_tx_assign(TXG_WAIT)


# 260773 16-Jan-2014 avg

MFC r258739: zfs mappedread_sf: assert that a page is never partially valid


# 260763 16-Jan-2014 avg

MFC r258632,258704: MFV r255255: 4045 zfs write throttle & i/o scheduler
performance work

Sponsored by: HybridCluster [merge]


# 258563 25-Nov-2013 avg

MFC r258353: zfs page_busy: fix the boundaries of the cleared range

This is a fix for a regression introduced in r246293.

vm_page_clear_dirty expects the range to have DEV_BSIZE aligned boundaries,
otherwise it extends them. Thus it can happen that the whole page is
marked clean while actually having some small dirty region(s).
This commit makes the range properly aligned and ensures that only
the clean data is marked as such.

It would interesting to evaluate how much benefit clearing with DEV_BSIZE
granularity produces. Perhaps instead we should clear the whole page
when it is completely overwritten and don't bother clearing any bits
if only a portion a page is written.

Reviewed by: kib
Approved by: re (gjb)


# 256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


# 255748 20-Sep-2013 davide

Fixup cross-device rename checks in ZFS. Add a check for the case
where 'fdvp' is a directory, 'tvp' is an already existing directory
and they have different mount points.

Reported by: avg, pjd
Reviewed by: pjd
Approved by: re (rodrigc)


# 254982 28-Aug-2013 delphij

Previously, both zfs_rename and zfs_link does a check on whether
the passed vnode belongs to the same mount point (v_vfsp or also
known as v_mount in FreeBSD). This check prevents the code from
proceeding further on vnodes that do not belong to ZFS, for
instance, on UFS or NULLFS.

The recent change (merged as r254585) on upstream changes the
check of v_vfsp to instead check the znode's z_zfsvfs. On Illumos
this would work because when the vnode comes from lofs, the
VOP_REALVP() would give the right vnode, this is not true on
FreeBSD where our VOP_REALVP is a no-op, and as such tdvp is
not guaranteed to be a ZFS vnode, and will later trigger a
failed assertion when verifying the vnode.

This changeset modifies our local shims (zfs_freebsd_rename and
zfs_freebsd_link) to check if v_mount matches before proceeding
further.

Reported by: many
Diagnostic work by: avg


# 254711 23-Aug-2013 avg

zfs: inline and remove zfs_vnode_lock

It didn't serve any useful purpose, but obscured file and line information
useful for debugging.

MFC after: 5 days
X-MFC with: r254445


# 254649 22-Aug-2013 kib

Remove the deprecated VM_ALLOC_RETRY flag for the vm_page_grab(9).
The flag was mandatory since r209792, where vm_page_grab(9) was
changed to only support the alloc retry semantic.

Suggested and reviewed by: alc
Sponsored by: The FreeBSD Foundation


# 254627 21-Aug-2013 ken

Expand the use of stat(2) flags to allow storing some Windows/DOS
and CIFS file attributes as BSD stat(2) flags.

This work is intended to be compatible with ZFS, the Solaris CIFS
server's interaction with ZFS, somewhat compatible with MacOS X,
and of course compatible with Windows.

The Windows attributes that are implemented were chosen based on
the attributes that ZFS already supports.

The summary of the flags is as follows:

UF_SYSTEM: Command line name: "system" or "usystem"
ZFS name: XAT_SYSTEM, ZFS_SYSTEM
Windows: FILE_ATTRIBUTE_SYSTEM

This flag means that the file is used by the
operating system. FreeBSD does not enforce any
special handling when this flag is set.

UF_SPARSE: Command line name: "sparse" or "usparse"
ZFS name: XAT_SPARSE, ZFS_SPARSE
Windows: FILE_ATTRIBUTE_SPARSE_FILE

This flag means that the file is sparse. Although
ZFS may modify this in some situations, there is
not generally any special handling for this flag.

UF_OFFLINE: Command line name: "offline" or "uoffline"
ZFS name: XAT_OFFLINE, ZFS_OFFLINE
Windows: FILE_ATTRIBUTE_OFFLINE

This flag means that the file has been moved to
offline storage. FreeBSD does not have any special
handling for this flag.

UF_REPARSE: Command line name: "reparse" or "ureparse"
ZFS name: XAT_REPARSE, ZFS_REPARSE
Windows: FILE_ATTRIBUTE_REPARSE_POINT

This flag means that the file is a Windows reparse
point. ZFS has special handling code for reparse
points, but we don't currently have the other
supporting infrastructure for them.

UF_HIDDEN: Command line name: "hidden" or "uhidden"
ZFS name: XAT_HIDDEN, ZFS_HIDDEN
Windows: FILE_ATTRIBUTE_HIDDEN

This flag means that the file may be excluded from
a directory listing if the application honors it.
FreeBSD has no special handling for this flag.

The name and bit definition for UF_HIDDEN are
identical to the definition in MacOS X.

UF_READONLY: Command line name: "urdonly", "rdonly", "readonly"
ZFS name: XAT_READONLY, ZFS_READONLY
Windows: FILE_ATTRIBUTE_READONLY

This flag means that the file may not written or
appended, but its attributes may be changed.

ZFS currently enforces this flag, but Illumos
developers have discussed disabling enforcement.

The behavior of this flag is different than MacOS X.
MacOS X uses UF_IMMUTABLE to represent the DOS
readonly permission, but that flag has a stronger
meaning than the semantics of DOS readonly permissions.

UF_ARCHIVE: Command line name: "uarch", "uarchive"
ZFS_NAME: XAT_ARCHIVE, ZFS_ARCHIVE
Windows name: FILE_ATTRIBUTE_ARCHIVE

The UF_ARCHIVED flag means that the file has changed and
needs to be archived. The meaning is same as
the Windows FILE_ATTRIBUTE_ARCHIVE attribute, and
the ZFS XAT_ARCHIVE and ZFS_ARCHIVE attribute.

msdosfs and ZFS have special handling for this flag.
i.e. they will set it when the file changes.

sys/param.h: Bump __FreeBSD_version to 1000047 for the
addition of new stat(2) flags.

chflags.1: Document the new command line flag names
(e.g. "system", "hidden") available to the
user.

ls.1: Reference chflags(1) for a list of file flags
and their meanings.

strtofflags.c: Implement the mapping between the new
command line flag names and new stat(2)
flags.

chflags.2: Document all of the new stat(2) flags, and
explain the intended behavior in a little
more detail. Explain how they map to
Windows file attributes.

Different filesystems behave differently
with respect to flags, so warn the
application developer to take care when
using them.

zfs_vnops.c: Add support for getting and setting the
UF_ARCHIVE, UF_READONLY, UF_SYSTEM, UF_HIDDEN,
UF_REPARSE, UF_OFFLINE, and UF_SPARSE flags.

All of these flags are implemented using
attributes that ZFS already supports, so
the on-disk format has not changed.

ZFS currently doesn't allow setting the
UF_REPARSE flag, and we don't really have
the other infrastructure to support reparse
points.

msdosfs_denode.c,
msdosfs_vnops.c: Add support for getting and setting
UF_HIDDEN, UF_SYSTEM and UF_READONLY
in MSDOSFS.

It supported SF_ARCHIVED, but this has been
changed to be UF_ARCHIVE, which has the same
semantics as the DOS archive attribute instead
of inverse semantics like SF_ARCHIVED.

After discussion with Bruce Evans, change
several things in the msdosfs behavior:

Use UF_READONLY to indicate whether a file
is writeable instead of file permissions, but
don't actually enforce it.

Refuse to change attributes on the root
directory, because it is special in FAT
filesystems, but allow most other attribute
changes on directories.

Don't set the archive attribute on a directory
when its modification time is updated.
Windows and DOS don't set the archive attribute
in that scenario, so we are now bug-for-bug
compatible.

smbfs_node.c,
smbfs_vnops.c: Add support for UF_HIDDEN, UF_SYSTEM,
UF_READONLY and UF_ARCHIVE in SMBFS.

This is similar to changes that Apple has
made in their version of SMBFS (as of
smb-583.8, posted on opensource.apple.com),
but not quite the same.

We map SMB_FA_READONLY to UF_READONLY,
because UF_READONLY is intended to match
the semantics of the DOS readonly flag.
The MacOS X code maps both UF_IMMUTABLE
and SF_IMMUTABLE to SMB_FA_READONLY, but
the immutable flags have stronger meaning
than the DOS readonly bit.

stat.h: Add definitions for UF_SYSTEM, UF_SPARSE,
UF_OFFLINE, UF_REPARSE, UF_ARCHIVE, UF_READONLY
and UF_HIDDEN.

The definition of UF_HIDDEN is the same as
the MacOS X definition.

Add commented-out definitions of
UF_COMPRESSED and UF_TRACKED. They are
defined in MacOS X (as of 10.8.2), but we
do not implement them (yet).

ufs_vnops.c: Add support for getting and setting
UF_ARCHIVE, UF_HIDDEN, UF_OFFLINE, UF_READONLY,
UF_REPARSE, UF_SPARSE, and UF_SYSTEM in UFS.
Alphabetize the flags that are supported.

These new flags are only stored, UFS does
not take any action if the flag is set.

Sponsored by: Spectra Logic
Reviewed by: bde (earlier version)


# 254585 20-Aug-2013 delphij

MFV r254220:

Illumos ZFS issues:
4039 zfs_rename()/zfs_link() needs stronger test for XDEV


# 254138 09-Aug-2013 attilio

The soft and hard busy mechanism rely on the vm object lock to work.
Unify the 2 concept into a real, minimal, sxlock where the shared
acquisition represent the soft busy and the exclusive acquisition
represent the hard busy.
The old VPO_WANTED mechanism becames the hard-path for this new lock
and it becomes per-page rather than per-object.
The vm_object lock becames an interlock for this functionality:
it can be held in both read or write mode.
However, if the vm_object lock is held in read mode while acquiring
or releasing the busy state, the thread owner cannot make any
assumption on the busy state unless it is also busying it.

Also:
- Add a new flag to directly shared busy pages while vm_page_alloc
and vm_page_grab are being executed. This will be very helpful
once these functions happen under a read object lock.
- Move the swapping sleep into its own per-object flag

The KPI is heavilly changed this is why the version is bumped.
It is very likely that some VM ports users will need to change
their own code.

Sponsored by: EMC / Isilon storage division
Discussed with: alc
Reviewed by: jeff, kib
Tested by: gavin, bapt (older version)
Tested by: pho, scottl


# 253953 05-Aug-2013 attilio

Revert r253939:
We cannot busy a page before doing pagefaults.
Infact, it can deadlock against vnode lock, as it tries to vget().
Other functions, right now, have an opposite lock ordering, like
vm_object_sync(), which acquires the vnode lock first and then
sleeps on the busy mechanism.

Before this patch is reinserted we need to break this ordering.

Sponsored by: EMC / Isilon storage division
Reported by: kib


# 253939 04-Aug-2013 attilio

The page hold mechanism is fast but it has couple of fallouts:
- It does not let pages respect the LRU policy
- It bloats the active/inactive queues of few pages

Try to avoid it as much as possible with the long-term target to
completely remove it.
Use the soft-busy mechanism to protect page content accesses during
short-term operations (like uiomove_fromphys()).

After this change only vm_fault_quick_hold_pages() is still using the
hold mechanism for page content access.
There is an additional complexity there as the quick path cannot
immediately access the page object to busy the page and the slow path
cannot however busy more than one page a time (to avoid deadlocks).

Fixing such primitive can bring to complete removal of the page hold
mechanism.

Sponsored by: EMC / Isilon storage division
Discussed with: alc
Reviewed by: jeff
Tested by: pho


# 253073 09-Jul-2013 avg

zfs: try to properly handle i/o errors in mappedread_sf

Unconditionally freeing a page is not good, especially if it is the page
that was wired by the caller. The checks are picked up from
kern_sendfile.

MFC after: 3 weeks


# 252337 28-Jun-2013 gavin

Don't try to re-insert an already present but invalid page.

This could happen if a thread doing a page-in loses a ZFS range lock
race to a thread writing to the same range

This fixes "panic: vm_page_alloc: pindex already allocated" in
http://docs.FreeBSD.org/cgi/mid.cgi?1372165971.96049.42.camel

Submitted by: avg
MFC after: 1 week


# 251631 11-Jun-2013 delphij

MFV r251620:

ZFS comments need cleaner, more consistent style

Illumos ZFS issues:
3741 zfs comments need cleaner, more consistent style

MFC after: 2 weeks


# 249195 06-Apr-2013 mm

MFV r248217:
Merge change from vendor to reduce diff only.
ZFS dtrace probes are not supported on FreeBSD yet.

Illumos ZFS issues:
3598 want to dtrace when errors are generated in zfs

MFC after: 3 weeks


# 248084 09-Mar-2013 attilio

Switch the vm_object mutex to be a rwlock. This will enable in the
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.

The change is mostly mechanical but few notes are reported:
* The KPI changes as follow:
- VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
- VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
- VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
- VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
(in order to avoid visibility of implementation details)
- The read-mode operations are added:
VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
sys/mutex.h in consumers directly to cater its inlining functions
using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
the compat layer because the name clash between FreeBSD and solaris
versions must be avoided.
At this purpose zfs redefines the vm_object locking functions
directly, isolating the FreeBSD components in specific compat stubs.

The KPI results heavilly broken by this commit. Thirdy part ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff
Reviewed by: pjd (ZFS specific review)
Discussed with: alc
Tested by: pho


# 247187 23-Feb-2013 mm

MFV r246653:
Import vendor change to avoid "unitialized variable" warnings.

Illumos ZFS issues:
3522 zfs module should not allow uninitialized variables

References:
https://www.illumos.org/issues/3522


# 246293 03-Feb-2013 avg

zfs: fix, improve and re-organize page_lookup and page_unlock

Now they are split into two pairs: page_hold/page_unhold for mappedread
and page_busy/page_unbusy for update_pages.

For mappedread we simply hold a page that is to be used as a source if it
is resident and valid (and not busy). This is sufficient since we are
only doing page -> user buffer copying. There is no page <-> backing
storage I/O involved.

update_pages is now better split to properly handle the putpages case
(page -> arc) and the regular write case (arc -> page).

For the latter we use complete protocol of marking an object with
paging-in-progress and marking a page with io_start (busy count).
Also, in this case we remove the write bit from all page mappings and
clear dirty bits of the pages, the former is needed to ensure that the
latter does the right thing.
Additionally we update a page if it is cached instead of just freeing it
as was done before. This needs to be verified.

A minor detail: ZFS-backed pages should always be either fully valid
or fully invalid. Assert this and use simpler API that does not deal
with sub-page blocks.

Reviewed by: kib
MFC after: 26 days


# 243807 03-Dec-2012 delphij

Use SA_ZPL_CRTIME instead of SA_ZPL_CTIME for creation time.

Submitted by: phil.stone at gmx.com
MFC after: 2 weeks


# 243763 01-Dec-2012 avg

zfs_getpages: make use of vm_page_readahead_finish

Suggested by: kib
MFC after: 5 days


# 243560 26-Nov-2012 mm

MFV r243395:

Introduce a new dataset aclmode setting "restricted" to protect ACL's
being destroyed or corrupted by a drive-by chmod.

illumos-gate 13889:a67716f16746
3254 add support in zfs for aclmode=restricted

References:
https://www.illumos.org/issues/3254

MFC after: 2 weeks


# 243524 25-Nov-2012 mm

MFV r243013 and r243267:

Import the zio nop-write improvement from Illumos. To reduce I/O,
nop-write omits overwriting data if the checksum (cryptographically
secure) of new data matches the checksum of existing data.
It also saves space if snapshots are in use.

It currently works only on datasets with enabled compression, disabled
deduplication and sha256 checksums.

IllumOS 13887:196932ec9e6a and 13888:7204b3392a58
3236 zio nop-write

References:
https://www.illumos.org/issues/3236

MFC after: 2 weeks


# 243521 25-Nov-2012 avg

zfs_freebsd_reclaim: remove a stray variable

... which leaked from a subsequent local change.
Unfortunately I noticed that only after commit.

MFC after: 5 weeks
X-MFC with: r243520


# 243520 25-Nov-2012 avg

zfs: overhaul zfs-vfs glue for vnode life-cycle management

* There is no need for the delayed destruction of znodes via taskqueue,
now that we do not need to fear recursion from getnewvnode into
zfs_inactive and zfs_freebsd_reclaim, thus making znode/vnode state
machine a bit simpler.

* More complete porting of zfs_inactive from Solaris VFS model to FreeBSD
vop_inactive and vop_reclaim model. All destructive actions are done
in zfs_freebsd_reclaim.
This allows to simplify zfs_zget logic.

* Allow zfs_zget to return a doomed vnode if the current thread already
has an exclusive lock on the vnode.

* Clean up Solaris-isms like bailing out of reclaim/inactive on certain
values of v_usecount (aka v_count) or directly messing with this counter.

* Do not clear z_vnode while znode is still accessible.
z_vnode should be cleared only after zfs_znode_dmu_fini.
Otherwise zfs_zget may get an effectively half-deconstructed znode.
This allows to simplify zfs_zget logic further.

The above changes fix at least two known/reported problems:

o An indefinite wait in the following code path:
vgone -> VOP_RECLAIM -> zfs_freebsd_reclaim -> vnode_destroy_vobject ->
put_pages -> zfs_write -> zil_commit -> zfs_zget
This happened because vgone marks a vnode as VI_DOOMED before calling
VOP_RECLAIM, but zfs_zget would not return a doomed vnode under any
circumstances.
The fix in this change is not complete as it won't fix a deadlock between
two threads doing VOP_RECLAIM where one thread is in zil_commit trying to
zfs_zget a znode/vnode being reclaimed by the other thread, which would be
blocked trying to enter zil_commit. This type of deadlock has not been
reported as of now.

o An indefinite wait in the unmount path caused by a znode "falling through
the cracks" in inactive+reclaim. This would happen if the znode is unlinked
while its vnode is still active.

To Do: pass locking flags parameter to zfs_zget, so that the zfs-vfs
glue code doesn't have to re-lock a vnode but could ask for proper locking
from the very start. This would also allow for the higher level code to
obtain a doomed vnode when it is expected/requested. Or to avoid blocking
when it is not allowed (see zil_commit example above).

ffs_vgetf seems like a good source of inspiration.

Tested by: Willem Jan Withagen <wjw@digiware.nl>
MFC after: 6 weeks


# 243518 25-Nov-2012 avg

add zfs_bmap to aid vnode_pager_haspage

... otherwise zfs_getpages would mostly be called with one page at a time.

It is expected that ZFS VOP_BMAP is only called from vnode_pager_haspage.
Since ZFS files can have variable block sizes and also because we don't
really know if any given blocks are consecutive, we can not really report
any additional blocks behind or ahead of a given block. Since physical
block numbers do not make sense for ZFS, we do not do any real translation
and thus pass back blk = lblk. The net effect is that vnode_pager_haspage
knows that the block exists and that the pages backed by the block can be
accessed. vnode_pager_haspage may be wrong about the exact count of the
pages backed by the block, because of a variable block size, which
vnode_pager_haspage doesn't really know - it only knows max block size in
a filesystem. So pages from multiple blocks can be passed to zfs_getpages,
but that is expected and correctly handled.

vnode_pager should not call zfs_bmap for any other reason, because ZFS
implements VOP_PUTPAGES and thus vnode_pager_generic_getpages is not used.

vfs_cluster code vfs_bio code should not be called for ZFS, because ZFS does
not use buffer cache layer.

Also, ZFS does not use vn_bmap_seekhole, it has its prviate mechanism for
working with holes.

The above list should cover all the current calls to VOP_BMAP.

Reviewed by: kib
MFC after: 6 weeks


# 243517 25-Nov-2012 avg

zfs_getpages: optimize for large block sizes

MFC after: 6 weeks


# 243270 19-Nov-2012 avg

zfs_remove: assert that delete_now case is never true on FreeBSD

That case is specific to Solaris VFS and it would violate pretty
fundamental contracts of FreeBSD VFS.

Discussed with: pjd
MFC after: 12 days


# 243268 19-Nov-2012 avg

zfs_remove: set VV_NOSYNC flag if a node is unlinked

Suggested by: kib
MFC after: 12 days


# 242566 04-Nov-2012 avg

zfs_freebsd_close: call zfs_close with count=1 instead of count=0

Otherwise we may be leaking z_sync_cnt, which may lead to unnecessary
ZIL sync-ing.

MFC after: 12 days


# 241896 22-Oct-2012 kib

Remove the support for using non-mpsafe filesystem modules.

In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by: attilio
Tested by: pho


# 240829 22-Sep-2012 pjd

As in r226967, r226987 and r232401 changes to UFS and TMPFS remove cache
entries associated with the source and the target of rename().

MFC after: 1 week


# 240415 12-Sep-2012 mm

Merge recent zfs vendor changes, sync code and adjust userland DEBUG.

Illumos issued covered:
1884 Empty "used" field for zfs *space commands
3006 VERIFY[S,U,P] and ASSERT[S,U,P] frequently check if first argument
is zero
3028 zfs {group,user}space -n prints (null) instead of numeric GID/UID
3048 zfs {user,group}space [-s|-S] is broken
3049 zfs {user,group}space -t doesn't really filter the results
3060 zfs {user,group}space -H output isn't tab-delimited
3061 zfs {user,group}space -o doesn't use specified fields order
3064 usr/src/cmd/zpool/zpool_main.c misspells "successful"
3093 zfs {user,group}space's -i is noop
3098 zfs userspace/groupspace fail without saying why when run as non-root

References:
https://www.illumos.org/issues/ + [issue_id]

Obtained from: illumos (vendor/illumos, vendor/illumos-sys)
MFC after: 2 weeks


# 239077 05-Aug-2012 marius

Include <vm/vm_param.h> for PA_LOCK_COUNT in order to fix kernel build
with options ZFS after r239065.


# 235781 22-May-2012 trasz

Fix enforcement of file size limit with O_APPEND on ZFS.

vn_rlimit_fsize takes uio->uio_offset and uio->uio_resid into account
when determining whether given write would exceed RLIMIT_FSIZE.

When APPEND flag is specified, ZFS updates uio->uio_offset to point to the
end of file.

But this happens after a call to vn_rlimit_fsize, so vn_rlimit_fsize check
can be rendered ineffective by thread that opens some file with O_APPEND
and lseeks below RLIMIT_FSIZE before calling write.

Submitted by: Mateusz Guzik <mjguzik at gmail dot com>
MFC after: 2 weeks


# 234607 23-Apr-2012 trasz

Remove unused thread argument to vrecycle().

Reviewed by: kib


# 234064 09-Apr-2012 attilio

- Introduce a cache-miss optimization for consistency with other
accesses of the cache member of vm_object objects.
- Use novel vm_page_is_cached() for checks outside of the vm subsystem.

Reviewed by: alc
MFC after: 2 weeks
X-MFC: r234039


# 233918 05-Apr-2012 avg

zfs_ioctl: no need for ddi_copyin/out here because sys_ioctl handles that

On FreeBSD the direct ioctl argument is automatically copied in/out
as necesary by the kernel ioctl entry point.

PR: kern/164445
Submitted by: Luis Garces-Erice <lge@ieee.org>
Tested by: Attila Nagy <bra@fsn.hu>
MFC after: 5 days


# 229425 03-Jan-2012 dim

In sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c, check the
the number of links against LINK_MAX (which is INT16_MAX), not against
UINT32_MAX. Otherwise, the constant would implicitly be converted to
-1.

Reviewed by: pjd
MFC after: 1 week


# 228685 18-Dec-2011 pjd

Implement replying of ACLs updates. ACL changes should go to ZIL only
if the 'sync' property is set to 'always', so replying them is not common.

MFC after: 1 month


# 225418 06-Sep-2011 kib

Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify afalgs.

Document the changes to flags field to only require the page lock.

Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.

Reviewed by: alc, attilio
Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by: re (bz)


# 224605 02-Aug-2011 mm

Fix panic in zfs_read() if IO_SYNC flag supplied by checking for
zfsvfs->z_log before calling zil_commit(). [1]
Do not call zfs_read() from zfs_getextattr() with the IO_SYNC flag.

Submitted by: Alexander Zagrebin <alex@zagrebin.ru> [1]
Reviewed by: pjd@
Approved by: re (kib)
MFC after: 3 days


# 224252 21-Jul-2011 delphij

Bring the code more in-line with OpenSolaris source to
ease future port.

Reviewed by: pjd, mm
Approved by: re (kib)


# 224251 21-Jul-2011 delphij

A different implementation of r224231 proposed by pjd@,
which does not require change in the znode structure.
Specifically, it queries rdev from the znode in the
same sa_bulk_lookup already done in zfs_getattr().

Submitted by: pjd (with some revisions)
Reviewed by: pjd, mm
Approved by: re (kib)


# 224231 20-Jul-2011 delphij

Add a new field to in-core znode, z_rdev, to represent device nodes.

PR: kern/159010
Reviewed by: mm@
Approved by: re (kib)
MFC after: 2 weeks


# 224174 18-Jul-2011 mm

Resurrect the ZFS "aclmode" property
Change default of "aclmode" to "discard".

Illumos-gate changeset: 13370:8c04143bd318

Obtained from: Illumos (Feature #742)
MFC after: 2 weeks


# 219973 24-Mar-2011 pjd

Checking file access on size change is bogus. The checks are done earlier by
VFS where we know if this is truncate(2) or ftruncate(2). If this is the
latter we should depend on the mode the file was opened and not on the current
permission.

PR: standards/154873
Reported by: Mark Martinec <Mark.Martinec@ijs.si>
Discussed with: Eric Schrock <eric.schrock@delphix.com>
Discussed with: Mark Maybee <Mark.Maybee@Oracle.COM>
MFC after: 1 month


# 219404 08-Mar-2011 pjd

Correct readdir over ZFS handling.

Reported by: Pierre Beyssac <pb@fasterix.frmug.org>
MFC after: 1 month


# 219089 27-Feb-2011 pjd

Finally... Import the latest open-source ZFS version - (SPA) 28.

Few new things available from now on:

- Data deduplication.
- Triple parity RAIDZ (RAIDZ3).
- zfs diff.
- zpool split.
- Snapshot holds.
- zpool import -F. Allows to rewind corrupted pool to earlier
transaction group.
- Possibility to import pool in read-only mode.

MFC after: 1 month


# 218550 11-Feb-2011 kib

For UIO_NOCOPY case of reading request on zfs vnode, which has vm object
attached, activate the page after the successful read, and free the page
if read was unsuccessfull.

Freshly allocated page is not on any queue yet, and not activating (or
deactivating) the page leaves it on no queue, excluding the page from
pagedaemon scans and making the memory disappeared until the vnode
reclaimed.

Reviewed by: avg
MFC after: 1 week


# 216378 11-Dec-2010 pjd

Remove redundant semicolon and empty like.


# 215401 16-Nov-2010 avg

zfs+sendfile: populate all requested pages, not just those already cached

kern_sendfile() uses vm_rdwr() to read-ahead blocks of data to populate
page cache. When sendfile stumbles upon a page that is not populated
yet, it sends out all the mbufs that it collected so far. This
resulted in very poor performance with ZFS when file data is not in the
page cache, because ZFS vop_read for UIO_NOCOPY case populated only
those pages that are already in cache, but not valid. Which means that
most of the time it populated only the first requested page in the
described above scenario.

Reported by: Alexander Zagrebin <alexz@visp.ru>
Tested by: Alexander Zagrebin <alexz@visp.ru>,
Artemiev Igor <ai@kliksys.ru>
MFC after: 12 days


# 214378 26-Oct-2010 mm

Bugfix merge from OpenSolaris:

OpenSolaris onnv-revision: 10209:91f47f0e7728
6830541 zfs_get_data_trips on a verify
6696242 multiple zfs_fillpage() zfs: accessing past end of object panics
6785914 zfs fails to drop dn_struct_rwlock in recovery code path

Approved by: delphij (mentor)
Obtained from: OpenSolaris (Bug ID 6830541, 6696242, 6785914)
MFC after: 2 weeks


# 213937 16-Oct-2010 avg

zfs: add vop_getpages method implementation

This should make vnode_pager_getpages path a bit shorter and clearer.
Also this should eliminate problems with partially valid pages.
Having this method opens room for future optimizations.

To do: try to satisfy other pages besides the required one taking into
account tradeofs between number of page faults, read throughput and read
latency. Also, eventually vop_putpages should be added too.

Reviewed by: kib, mm, pjd
MFC after: 3 weeks


# 213730 12-Oct-2010 avg

zfs + sendfile: do not produce partially valid pages for vnode's tail

Since r212650 and before this change sendfile(2) could produce
a partially valid page for a trailing portion of a ZFS vnode.
vm_fault() always wants to see a fully valid page even if it's the last
page that partially extends beyond vnode's end. Otherwise it calls
vop_getpages() to bring in the page. In the case of ZFS this means
that the data is read from the page into the same page and this breaks
checks in ZFS mappedread() - a thread that set VPO_BUSY on the page in
vm_fault() will get blocked forever waiting for it to be cleared.

Many thanks to Kai and Jeremy for reproducing the issue and providing
important debugging information and help.

Reported by: Kai Gallasch <gallasch@free.de>,
Jeremy Chadwick <freebsd@jdc.parodius.com>
Tested by: Kai Gallasch <gallasch@free.de>,
Jeremy Chadwick <freebsd@jdc.parodius.com>
Reviewed by: kib
MFC after: 3 days
To-Do: apply the same treatment to tmpfs + sendfile


# 213673 10-Oct-2010 pjd

Provide internal ioflags() function that converts ioflag provided by FreeBSD's
VFS to OpenSolaris-specific ioflag expected by ZFS. Use it for read and write
operations.

Reviewed by: mm
MFC after: 1 week


# 213634 08-Oct-2010 mm

Change FAPPEND to IO_APPEND as this is a ioflag and not a fflag.
This corrects writing to append-only files on ZFS.

PR: kern/149495 [1], kern/151082 [2]
Submitted by: Daniel Zhelev <daniel@zhelev.biz> [1], Michael Naef <cal@linu.gs> [2]
Approved by: delphij (mentor)
MFC after: 1 week


# 212951 21-Sep-2010 avg

zfs_map_page/zfs_unmap_page: do not use sched_pin() and SFB_CPUPRIVATE

zfs_map_page/zfs_unmap_page are mostly called around potential I/O paths
and it seems to be a not very good idea to do cpu pinning there.

Suggested by: kib
MFC after: 2 weeks


# 212950 21-Sep-2010 avg

zfs_vnops: use zfs_map_page/zfs_unmap_page helper functions in another place

MFC after: 2 weeks


# 212782 17-Sep-2010 avg

zfs+sendfile: advance uio_offset upon reading as well

Picked from analogous code in tmpfs.

MFC after: 1 week


# 212694 15-Sep-2010 mm

Fix kernel panic when moving a file to .zfs/shares
Fix possible loss of correct error return code in ZFS mount

OpenSolaris revisions and Bug IDs:

11824:53128e5db7cf
6863610 ZFS mount can lose correct error return

12079:13822b941977
6939941 problem with moving files in zfs (142901-12)

Approved by: delphij (mentor)
Obtained from: OpenSolaris (Bug ID 6863610, 6939941)
MFC after: 3 days


# 212657 15-Sep-2010 avg

zfs vn_has_cached_data: take into account v_object->cache != NULL

This mirrors code in tmpfs.
This changge shouldn't affect much read path, it may cause unnecessary
vm_page_lookup calls in the case where v_object has no active or inactive
pages but has some cache pages. I believe this situation to be non-essential.

In write path this change should allow us to properly detect the above
case and free a cache page when we write to a range that corresponds to it.
If this situation is undetected then we could have a discrepancy between
data in page cache and in ARC or on disk.

This change allows us to re-enable vn_has_cached_data() check in zfs_write.

NOTE: strictly speaking resident_page_count and cache fields of v_object
should be exmined under VM_OBJECT_LOCK, but for this particular usage
we may get away with it.

Discussed with: alc, kib
Approved by: pjd
Tested with: tools/regression/fsx
MFC after: 3 weeks


# 212655 15-Sep-2010 avg

zfs mappedread, update_pages: use int for offset and length within a page

uint64_t, int64_t were redundant there

Approved by: pjd
Tested by: tools/regression/fsx
MFC after: 2 weeks


# 212654 15-Sep-2010 avg

zfs mappedread: use uiomove_fromphys where possible

Reviewed by: alc
Approved by: pjd
Tested by: tools/regression/fsx
MFC after: 2 weeks


# 212652 15-Sep-2010 avg

zfs: catch up with vm_page_sleep_if_busy changes

Reviewed by: alc
Approved by: pjd
Tested by: tools/regression/fsx
MFC after: 2 weeks


# 212650 15-Sep-2010 avg

tmpfs, zfs + sendfile: mark page bits as valid after populating it with data

Otherwise, adding insult to injury, in addition to double-caching of data
we would always copy the data into a vnode's vm object page from backend.
This is specific to sendfile case only (VOP_READ with UIO_NOCOPY).

PR: kern/141305
Reported by: Wiktor Niesiobedzki <bsd@vink.pl>
Reviewed by: alc
Tested by: tools/regression/sockets/sendfile
MFC after: 2 weeks


# 212573 13-Sep-2010 pjd

Remove the page queues lock around vm_page_undirty() - it is no longer needed.

Reviewed by: alc


# 212002 30-Aug-2010 jh

execve(2) has a special check for file permissions: a file must have at
least one execute bit set, otherwise execve(2) will return EACCES even
for an user with PRIV_VFS_EXEC privilege.

Add the check also to vaccess(9), vaccess_acl_nfs4(9) and
vaccess_acl_posix1e(9). This makes access(2) to better agree with
execve(2). Because ZFS doesn't use vaccess(9) for VEXEC, add the check
to zfs_freebsd_access() too. There may be other file systems which are
not using vaccess*() functions and need to be handled separately.

PR: kern/125009
Reviewed by: bde, trasz
Approved by: pjd (ZFS part)


# 211932 28-Aug-2010 mm

Import changes from OpenSolaris that provide
- better ACL caching and speedup of ACL permission checks
- faster handling of stat()
- lowered mutex contention in the read/writer lock (rrwlock)
- several related bugfixes

Detailed information (OpenSolaris onnv changesets and Bug IDs):

9749:105f407a2680
6802734 Support for Access Based Enumeration (not used on FreeBSD)
6844861 inconsistent xattr readdir behavior with too-small buffer

9866:ddc5f1d8eb4e
6848431 zfs with rstchown=0 or file_chown_self privilege allows user to "take" ownership

9981:b4907297e740
6775100 stat() performance on files on zfs should be improved
6827779 rrwlock is overly protective of its counters

10143:d2d432dfe597
6857433 memory leaks found at: zfs_acl_alloc/zfs_acl_node_alloc
6860318 truncate() on zfsroot succeeds when file has a component of its path set without access permission

10232:f37b85f7e03e
6865875 zfs sometimes incorrectly giving search access to a dir

10250:b179ceb34b62
6867395 zpool_upgrade_007_pos testcase panic'd with BAD TRAP: type=e (#pf Page fault)

10269:2788675568fd
6868276 zfs_rezget() can be hazardous when znode has a cached ACL

10295:f7a18a1e9610
6870564 panic in zfs_getsecattr

Approved by: delphij (mentor)
Obtained from: OpenSolaris (multiple Bug IDs)
MFC after: 2 weeks


# 210470 25-Jul-2010 mm

Import two changesets from OpenSolaris to make future updates easier.

The changes do not affect FreeBSD code because
zfs_znode_move(), cleanlocks() and cleanshares() are not used.

OpenSolaris onnv changeset: 9788:f660bc44f2e8, 9909:aa280f585a3e

Approved by: pjd, delphij (mentor)
Obtained from: OpenSolaris (Bug ID 6843700, 6790232)
MFC after: 7 weeks


# 209962 13-Jul-2010 mm

Merge ZFS version 15 and almost all OpenSolaris bugfixes referenced
in Solaris 10 updates 141445-09 and 142901-14.

Detailed information:
(OpenSolaris revisions and Bug IDs, Solaris 10 patch numbers)

7844:effed23820ae
6755435 zfs_open() and zfs_close() needs to use ZFS_ENTER/ZFS_VERIFY_ZP (141445-01)

7897:e520d8258820
6748436 inconsistent zpool.cache in boot_archive could panic a zfs root filesystem upon boot-up (141445-01)

7965:b795da521357
6740164 zpool attach can create an illegal root pool (141909-02)

8084:b811cc60d650
6769612 zpool_import() will continue to write to cachefile even if altroot is set (N/A)

8121:7fd09d4ebd9c
6757430 want an option for zdb to disable space map loading and leak tracking (141445-01)

8129:e4f45a0bfbb0
6542860 ASSERT: reason != VDEV_LABEL_REMOVE||vdev_inuse(vd, crtxg, reason, 0) (141445-01)

8188:fd00c0a81e80
6761100 want zdb option to select older uberblocks (141445-01)

8190:6eeea43ced42
6774886 zfs_setattr() won't allow ndmp to restore SUNWattr_rw (141445-01)

8225:59a9961c2aeb
6737463 panic while trying to write out config file if root pool import fails (141445-01)

8227:f7d7be9b1f56
6765294 Refactor replay (141445-01)

8228:51e9ca9ee3a5
6572357 libzfs should do more to avoid mnttab lookups (141909-01)
6572376 zfs_iter_filesystems and zfs_iter_snapshots get objset stats twice (141909-01)

8241:5a60f16123ba
6328632 zpool offline is a bit too conservative (141445-01)
6739487 ASSERT: txg <= spa_final_txg due to scrub/export race (141445-01)
6767129 ASSERT: cvd->vdev_isspare, in spa_vdev_detach() (141445-01)
6747698 checksum failures after offline -t / export / import / scrub (141445-01)
6745863 ZFS writes to disk after it has been offlined (141445-01)
6722540 50% slowdown on scrub/resilver with certain vdev configurations (141445-01)
6759999 resilver logic rewrites ditto blocks on both source and destination (141445-01)
6758107 I/O should never suspend during spa_load() (141445-01)
6776548 codereview(1) runs off the page when faced with multi-line comments (N/A)
6761406 AMD errata 91 workaround doesn't work on 64-bit systems (141445-01)

8242:e46e4b2f0a03
6770866 GRUB/ZFS should require physical path or devid, but not both (141445-01)

8269:03a7e9050cfd
6674216 "zfs share" doesn't work, but "zfs set sharenfs=on" does (141445-01)
6621164 $SRC/cmd/zfs/zfs_main.c seems to have a syntax error in the translation note (141445-01)
6635482 i18n problems in libzfs_dataset.c and zfs_main.c (141445-01)
6595194 "zfs get" VALUE column is as wide as NAME (141445-01)
6722991 vdev_disk.c: error checking for ddi_pathname_to_dev_t() must test for NODEV (141445-01)
6396518 ASSERT strings shouldn't be pre-processed (141445-01)

8274:846b39508aff
6713916 scrub/resilver needlessly decompress data (141445-01)

8343:655db2375fed
6739553 libzfs_status msgid table is out of sync (141445-01)
6784104 libzfs unfairly rejects numerical values greater than 2^63 (141445-01)
6784108 zfs_realloc() should not free original memory on failure (141445-01)

8525:e0e0e525d0f8
6788830 set large value to reservation cause core dump (141445-01)
6791064 want sysevents for ZFS scrub (141445-01)
6791066 need to be able to set cachefile on faulted pools (141445-01)
6791071 zpool_do_import() should not enable datasets on faulted pools (141445-01)
6792134 getting multiple properties on a faulted pool leads to confusion (141445-01)

8547:bcc7b46e5ff7
6792884 Vista clients cannot access .zfs (141445-01)

8632:36ef517870a3
6798384 It can take a village to raise a zio (141445-01)

8636:7e4ce9158df3
6551866 deadlock between zfs_write(), zfs_freesp(), and zfs_putapage() (141909-01)
6504953 zfs_getpage() misunderstands VOP_GETPAGE() interface (141909-01)
6702206 ZFS read/writer lock contention throttles sendfile() benchmark (141445-01)
6780491 Zone on a ZFS filesystem has poor fork/exec performance (141445-01)
6747596 assertion failed: DVA_EQUAL(BP_IDENTITY(&zio->io_bp_orig), BP_IDENTITY(zio->io_bp))); (141445-01)

8692:692d4668b40d
6801507 ZFS read aggregation should not mind the gap (141445-01)

8697:e62d2612c14d
6633095 creating a filesystem with many properties set is slow (141445-01)

8768:dfecfdbb27ed
6775697 oracle crashes when overwriting after hitting quota on zfs (141909-01)

8811:f8deccf701cf
6790687 libzfs mnttab caching ignores external changes (141445-01)
6791101 memory leak from libzfs_mnttab_init (141445-01)

8845:91af0d9c0790
6800942 smb_session_create() incorrectly stores IP addresses (N/A)
6582163 Access Control List (ACL) for shares (141445-01)
6804954 smb_search - shortname field should be space padded following the NULL terminator (N/A)
6800184 Panic at smb_oplock_conflict+0x35() (N/A)

8876:59d2e67b4b65
6803822 Reboot after replacement of system disk in a ZFS mirror drops to grub> prompt (141445-01)

8924:5af812f84759
6789318 coredump when issue zdb -uuuu poolname/ (141445-01)
6790345 zdb -dddd -e poolname coredump (141445-01)
6797109 zdb: 'zdb -dddddd pool_name/fs_name inode' coredump if the file with inode was deleted (141445-01)
6797118 zdb: 'zdb -dddddd poolname inum' coredump if I miss the fs name (141445-01)
6803343 shareiscsi=on failed, iscsitgtd failed request to share (141445-01)

9030:243fd360d81f
6815893 hang mounting a dataset after booting into a new boot environment (141445-01)

9056:826e1858a846
6809691 'zpool create -f' no longer overwrites ufs infomation (141445-01)

9179:d8fbd96b79b3
6790064 zfs needs to determine uid and gid earlier in create process (141445-01)

9214:8d350e5d04aa
6604992 forced unmount + being in .zfs/snapshot/<snap1> = not happy (141909-01)
6810367 assertion failed: dvp->v_flag & VROOT, file: ../../common/fs/gfs.c, line: 426 (141909-01)

9229:e3f8b41e5db4
6807765 ztest_dsl_dataset_promote_busy needs to clean up after ENOSPC (141445-01)

9230:e4561e3eb1ef
6821169 offlining a device results in checksum errors (141445-01)
6821170 ZFS should not increment error stats for unavailable devices (141445-01)
6824006 need to increase issue and interrupt taskqs threads in zfs (141445-01)

9234:bffdc4fc05c4
6792139 recovering from a suspended pool needs some work (141445-01)
6794830 reboot command hangs on a failed zfs pool (141445-01)

9246:67c03c93c071
6824062 System panicked in zfs_mount due to NULL pointer dereference when running btts and svvs tests (141909-01)

9276:a8a7fc849933
6816124 System crash running zpool destroy on broken zpool (141445-03)

9355:09928982c591
6818183 zfs snapshot -r is slow due to set_snap_props() doing txg_wait_synced() for each new snapshot (141445-03)

9391:413d0661ef33
6710376 log device can show incorrect status when other parts of pool are degraded (141445-03)

9396:f41cf682d0d3 (part already merged)
6501037 want user/group quotas on ZFS (141445-03)
6827260 assertion failed in arc_read(): hdr == pbuf->b_hdr (141445-03)
6815592 panic: No such hold X on refcount Y from zfs_znode_move (141445-03)
6759986 zfs list shows temporary %clone when doing online zfs recv (141445-03)

9404:319573cd93f8
6774713 zfs ignores canmount=noauto when sharenfs property != off (141445-03)

9412:4aefd8704ce0
6717022 ZFS DMU needs zero-copy support (141445-03)

9425:e7ffacaec3a8
6799895 spa_add_spares() needs to be protected by config lock (141445-03)
6826466 want to post sysevents on hot spare activation (141445-03)
6826468 spa 'allowfaulted' needs some work (141445-03)
6826469 kernel support for storing vdev FRU information (141445-03)
6826470 skip posting checksum errors from DTL regions of leaf vdevs (141445-03)
6826471 I/O errors after device remove probe can confuse FMA (141445-03)
6826472 spares should enjoy some of the benefits of cache devices (141445-03)

9443:2a96d8478e95
6833711 gang leaders shouldn't have to be logical (141445-03)

9463:d0bd231c7518
6764124 want zdb to be able to checksum metadata blocks only (141445-03)

9465:8372081b8019
6830237 zfs panic in zfs_groupmember() (141445-03)

9466:1fdfd1fed9c4
6833162 phantom log device in zpool status (141445-03)

9469:4f68f041ddcd
6824968 add ZFS userquota support to rquotad (141445-03)

9470:6d827468d7b5
6834217 godfather I/O should reexecute (141445-03)

9480:fcff33da767f
6596237 Stop looking and start ganging (141909-02)

9493:9933d599bc93
6623978 lwb->lwb_buf != NULL, file ../../../uts/common/fs/zfs/zil.c, line 787, function zil_lwb_commit (141445-06)

9512:64cafcbcc337
6801810 Commit of aligned streaming rewrites to ZIL device causes unwanted disk reads (N/A)

9515:d3b739d9d043
6586537 async zio taskqs can block out userland commands (142901-09)

9554:787363635b6a
6836768 zfs_userspace() callback has no way to indicate failure (N/A)

9574:1eb6a6ab2c57
6838062 zfs panics when an error is encountered in space_map_load() (141909-02)

9583:b0696cd037cc
6794136 Panic BAD TRAP: type=e when importing degraded zraid pool. (141909-03)

9630:e25a03f552e0
6776104 "zfs import" deadlock between spa_unload() and spa_async_thread() (141445-06)

9653:a70048a304d1
6664765 Unable to remove files when using fat-zap and quota exceeded on ZFS filesystem (141445-06)

9688:127be1845343
6841321 zfs userspace / zfs get userused@ doesn't work on mounted snapshot (N/A)
6843069 zfs get userused@S-1-... doesn't work (N/A)

9873:8ddc892eca6e
6847229 assertion failed: refcount_count(&tx->tx_space_written) + delta <= tx->tx_space_towrite in dmu_tx.c (141445-06)

9904:d260bd3fd47c
6838344 kernel heap corruption detected on zil while stress testing (141445-06)

9951:a4895b3dd543
6844900 zfs_ioc_userspace_upgrade leaks (N/A)

10040:38b25aeeaf7a
6857012 zfs panics on zpool import (141445-06)

10000:241a51d8720c
6848242 zdb -e no longer works as expected (N/A)

10100:4a6965f6bef8
6856634 snv_117 not booting: zfs_parse_bootfs: error2 (141445-07)

10160:a45b03783d44
6861983 zfs should use new name <-> SID interfaces (N/A)
6862984 userquota commands can hang (141445-06)

10299:80845694147f
6696858 zfs receive of incremental replication stream can dereference NULL pointer and crash (N/A)

10302:a9e3d1987706
6696858 zfs receive of incremental replication stream can dereference NULL pointer and crash (fix lint) (N/A)

10575:2a8816c5173b (partial merge)
6882227 spa_async_remove() shouldn't do a full clear (142901-14)

10800:469478b180d9
6880764 fsync on zfs is broken if writes are greater than 32kb on a hard crash and no log attached (142901-09)
6793430 zdb -ivvvv assertion failure: bp->blk_cksum.zc_word[2] == dmu_objset_id(zilog->zl_os) (N/A)

10801:e0bf032e8673 (partial merge)
6822816 assertion failed: zap_remove_int(ds_next_clones_obj) returns ENOENT (142901-09)

10810:b6b161a6ae4a
6892298 buf->b_hdr->b_state != arc_anon, file: ../../common/fs/zfs/arc.c, line: 2849 (142901-09)

10890:499786962772
6807339 spurious checksum errors when replacing a vdev (142901-13)

11249:6c30f7dfc97b
6906110 bad trap panic in zil_replay_log_record (142901-13)
6906946 zfs replay isn't handling uid/gid correctly (142901-13)

11454:6e69bacc1a5a
6898245 suspended zpool should not cause rest of the zfs/zpool commands to hang (142901-10)

11546:42ea6be8961b (partial merge)
6833999 3-way deadlock in dsl_dataset_hold_ref() and dsl_sync_task_group_sync() (142901-09)

Discussed with: pjd
Approved by: delphij (mentor)
Obtained from: OpenSolaris (multiple Bug IDs)
MFC after: 2 months


# 209097 12-Jun-2010 mm

Fix ZFS panic deadlock: cycle in blocking chain via zfs_zget

OpenSolaris onnv-revision: 9774:0bb234ab2287

Obtained from: OpenSolaris (Bug ID 6788152)
Approved by: pjd, delphij (mentor)
MFC after: 3 days


# 208131 16-May-2010 mm

Fix deadlock between zfs_dirent_lock and zfs_rmdir

OpenSolaris onnv revision: 11321:506b7043a14c

Approved by: pjd, delphij (mentor)
Obtained from: OpenSolaris (Bug ID 6847615)
MFC after: 3 days


# 208030 13-May-2010 trasz

Add missing check to prevent local users from panicing the kernel by trying
to set malformed ACL.

MFC after: 3 days


# 207745 07-May-2010 trasz

Enforce RLIMIT_FSIZE in ZFS.

Reviewed by: pjd@


# 206667 15-Apr-2010 pjd

Fix 3-way deadlock that can happen because of ZFS and vnode lock
order reversal.

thread0 (vfs_fhtovp) thread1 (vop_getattr) thread2 (zfs_recv)
-------------------- --------------------- ------------------
vn_lock
rrw_enter_read
rrw_enter_write (hangs)
rrw_enter_read (hangs)
vn_lock (hangs)

Submitted by: Attila Nagy <bra@fsn.hu>
MFC after: 3 days


# 200162 05-Dec-2009 kib

Change VOP_FSYNC for zfs vnode from VOP_PANIC to zfs_freebsd_fsync(),
both to not panic when fsync(2) is called for fifo on zfs
filedescriptor, and to actually fsync fifo inode to permanent storage.

PR: kern/141177
Reviewed by: pjd
MFC after: 1 week


# 198703 30-Oct-2009 pjd

- zfs_zaccess() can handle VAPPEND too, so map V_APPEND to VAPPEND and call
zfs_access() instead of vaccess() in this case as well.
- If VADMIN is specified with another V* flag (unlikely) call both
zfs_access() and vaccess() after spliting V* flags.

This fixes "dirtying snapshot!" panic.

PR: kern/139806
Reported by: Carl Chave <carl@chave.us>
In co-operation with: jh
MFC after: 3 days


# 197861 08-Oct-2009 pjd

Allow file system owner to modify system flags if securelevel permits.

MFC after: 3 days


# 197831 07-Oct-2009 pjd

Fix situation where Mac OS X NFS client creates a file and when it tries
to set ownership and mode in the same setattr operation, the mode was
overwritten by secpolicy_vnode_setattr().

PR: kern/118320
Submitted by: Mark Thompson <info-gentoo@mark.thompson.bz>
MFC after: 3 days


# 197683 01-Oct-2009 delphij

Return EOPNOTSUPP instead of EINVAL when doing chflags(2) over an old
format ZFS, as defined in the manual page.

Submitted by: pjd (response of my original patch but bugs are mine)
MFC after: 3 days


# 197435 23-Sep-2009 trasz

In VOP_SETACL(9) and VOP_GETACL(9), specifying wrong ACL type should result
in EINVAL, not EOPNOTSUPP.


# 197153 13-Sep-2009 pjd

When zfs.ko is compiled with debug, make sure that znode and vnode point at
each other.

MFC after: 3 days


# 197152 13-Sep-2009 pjd

Extend scope of the z_teardown_lock lock for consistency and "just in case".

MFC after: 3 days


# 197133 12-Sep-2009 pjd

- Protect reclaim with z_teardown_inactive_lock.
- Be prepared for dbuf to disappear in zfs_reclaim_complete() and check if
z_dbuf field is NULL - this might happen in case of rollback or forced
unmount between zfs_freebsd_reclaim() and zfs_reclaim_complete().
- On forced unmount wait for all znodes to be destroyed - destruction can be
done asynchronously via zfs_reclaim_complete().

MFC after: 1 week


# 196949 07-Sep-2009 trasz

Enable NFSv4 ACL support in ZFS.

Reviewed by: pjd


# 196307 17-Aug-2009 pjd

Manage asynchronous vnode release just like Solaris.

Discussed with: kmacy
Approved by: re (kib)


# 196303 17-Aug-2009 pjd

- Reduce z_teardown_lock lock scope a bit.
- The error variable is int, not bool.
- Convert spaces to tabs where needed.

Approved by: re (kib)


# 196301 17-Aug-2009 pjd

If z_buf is NULL, we should free znode immediately.

Noticed by: avg
Approved by: re (kib)


# 196299 17-Aug-2009 pjd

- We need to recycle vnode instead of freeing znode.

Submitted by: avg

- Add missing vnode interlock unlock.
- Remove redundant znode locking.

Approved by: re (kib)


# 195822 22-Jul-2009 trasz

Fix extattr_list_file(2) on ZFS in case the attribute directory
doesn't exist and user doesn't have write access to the file.
Without this fix, it returns bogus value instead of 0. For some
reason this didn't manifest on my kernel compiled with -O0.

PR: kern/136601
Submitted by: Jaakko Heinonen <jh at saunalahti dot fi>
Approved by: re (kib)


# 195785 20-Jul-2009 trasz

Fix permission handling for extended attributes in ZFS. Without
this change, ZFS uses SunOS Alternate Data Streams semantics - each
EA has its own permissions, which are set at EA creation time
and - unlike SunOS - invisible to the user and impossible to change.
From the user point of view, it's just broken: sometimes access
is granted when it shouldn't be, sometimes it's denied when
it shouldn't be.

This patch makes it behave just like UFS, i.e. depend on current
file permissions. Also, it fixes returned error codes (ENOATTR
instead of ENOENT) and makes listextattr(2) return 0 instead
of EPERM where there is no EA directory (i.e. the file never had
any EA).

Reviewed by: pjd (idea, not actual code)
Approved by: re (kib)


# 194586 21-Jun-2009 kib

Add another flags argument to vn_open_cred. Use it to specify that some
vn_open_cred invocations shall not audit namei path.

In particular, specify VN_OPEN_NOAUDIT for dotdot lookup performed by
default implementation of vop_vptocnp, and for the open done for core
file. vn_fullpath is called from the audit code, and vn_open there need
to disable audit to avoid infinite recursion. Core file is created on
return to user mode, that, in particular, happens during syscall return.
The creation of the core file is audited by direct calls, and we do not
want to overwrite audit information for syscall.

Reported, reviewed and tested by: rwatson


# 192800 26-May-2009 trasz

MFp4 changes neccessary for NFSv4 ACLs support in ZFS. This is mostly
about removing a few #ifdefs and providing compatibility wrappers and
VOP implementations to get and set an ACL; ZFS does ACL enforcement all
by itself.

Note that the VOPs are ifdefed out for now, so this change should be
a no-op.

Reviewed by: pjd


# 192689 24-May-2009 trasz

Fix comment.


# 192237 17-May-2009 kmacy

SAVESTART implies SAVENAME


# 191900 07-May-2009 kmacy

Asynchronously release vnodes to avoid blocking on range locks when calling back in to zfs.
This is based on a fix that went in to opensolaris on March 9th. However, it uses a dedicated
thread instead of a Solaris' taskq to avoid doing a blocking memory allocation with the vnode
interlock held.

This fixes a long-time deadlock in ZFS. This is not, strictly speaking, an LOR. The spa_zio
thread releases a vnode, this calls in to vn_reclaim which in turn needs to acquire range locks
to sync dirty data out to disk. The range locks are already held by a user-level process waiting
on a condition variable that it the process is waiting on a spa_zio thread to signal it on. The
process could not be signalled because the spa_zio thread could not proceed.

The nature of this problem was not apparent due to ZFS locks opting out of witness which meant
that DDB did not know about the locks that were held by ZFS.

Reviewed by: pjd
MFC after: 7 days


# 190888 10-Apr-2009 rwatson

Remove VOP_LEASE and supporting functions. This hasn't been used since
the removal of NQNFS, but was left in in case it was required for NFSv4.
Since our new NFSv4 client and server can't use it for their
requirements, GC the old mechanism, as well as other unused lease-
related code and interfaces.

Due to its impact on kernel programming and binary interfaces, this
change should not be MFC'd.

Proposed by: jeff
Reviewed by: jeff
Discussed with: rmacklem, zach loafman @ isilon


# 188588 13-Feb-2009 jhb

Use shared vnode locks when invoking VOP_READDIR().

MFC after: 1 month


# 185172 22-Nov-2008 pjd

IFp4: Finish implemnetation of chflags(2) for ZFS. While doing this I found
that zfs_access() can only handle VREAD, VWRITE and VEXEC, for the rest we need
to use vaccess(9).


# 185029 17-Nov-2008 pjd

Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes.

This bring huge amount of changes, I'll enumerate only user-visible changes:

- Delegated Administration

Allows regular users to perform ZFS operations, like file system
creation, snapshot creation, etc.

- L2ARC

Level 2 cache for ZFS - allows to use additional disks for cache.
Huge performance improvements mostly for random read of mostly
static content.

- slog

Allow to use additional disks for ZFS Intent Log to speed up
operations like fsync(2).

- vfs.zfs.super_owner

Allows regular users to perform privileged operations on files stored
on ZFS file systems owned by him. Very careful with this one.

- chflags(2)

Not all the flags are supported. This still needs work.

- ZFSBoot

Support to boot off of ZFS pool. Not finished, AFAIK.

Submitted by: dfr

- Snapshot properties

- New failure modes

Before if write requested failed, system paniced. Now one
can select from one of three failure modes:
- panic - panic on write error
- wait - wait for disk to reappear
- continue - serve read requests if possible, block write requests

- Refquota, refreservation properties

Just quota and reservation properties, but don't count space consumed
by children file systems, clones and snapshots.

- Sparse volumes

ZVOLs that don't reserve space in the pool.

- External attributes

Compatible with extattr(2).

- NFSv4-ACLs

Not sure about the status, might not be complete yet.

Submitted by: trasz

- Creation-time properties

- Regression tests for zpool(8) command.

Obtained from: OpenSolaris


# 184413 28-Oct-2008 trasz

Introduce accmode_t. This is required for NFSv4 ACLs - it will be neccessary
to add more V* constants, and the variables changed by this patch were often
being assigned to mode_t variables, which is 16 bit.

Approved by: rwatson (mentor)


# 182905 10-Sep-2008 trasz

Remove VSVTX, VSGID and VSUID. This should be a no-op,
as VSVTX == S_ISVTX, VSGID == S_ISGID and VSUID == S_ISUID.

Approved by: rwatson (mentor)


# 182824 06-Sep-2008 pjd

Lock vnode exclusively around insmntque().


# 182371 28-Aug-2008 attilio

Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally unuseful.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


# 178243 16-Apr-2008 kib

Move the head of byte-level advisory lock list from the
filesystem-specific vnode data to the struct vnode. Provide the
default implementation for the vop_advlock and vop_advlockasync.
Purge the locks on the vnode reclaim by using the lf_purgelocks().
The default implementation is augmented for the nfs and smbfs.
In the nfs_advlock, push the Giant inside the nfs_dolock.

Before the change, the vop_advlock and vop_advlockasync have taken the
unlocked vnode and dereferenced the fs-private inode data, racing with
with the vnode reclamation due to forced unmount. Now, the vop_getattr
under the shared vnode lock is used to obtain the inode size, and
later, in the lf_advlockasync, after locking the vnode interlock, the
VI_DOOMED flag is checked to prevent an operation on the doomed vnode.

The implementation of the lf_purgelocks() is submitted by dfr.

Reported by: kris
Tested by: kris, pho
Discussed with: jeff, dfr
MFC after: 2 weeks


# 177633 26-Mar-2008 dfr

Add the new kernel-mode NFS Lock Manager. To use it instead of the
user-mode lock manager, build a kernel with the NFSLOCKD option and
add '-k' to 'rpc_lockd_flags' in rc.conf.

Highlights include:

* Thread-safe kernel RPC client - many threads can use the same RPC
client handle safely with replies being de-multiplexed at the socket
upcall (typically driven directly by the NIC interrupt) and handed
off to whichever thread matches the reply. For UDP sockets, many RPC
clients can share the same socket. This allows the use of a single
privileged UDP port number to talk to an arbitrary number of remote
hosts.

* Single-threaded kernel RPC server. Adding support for multi-threaded
server would be relatively straightforward and would follow
approximately the Solaris KPI. A single thread should be sufficient
for the NLM since it should rarely block in normal operation.

* Kernel mode NLM server supporting cancel requests and granted
callbacks. I've tested the NLM server reasonably extensively - it
passes both my own tests and the NFS Connectathon locking tests
running on Solaris, Mac OS X and Ubuntu Linux.

* Userland NLM client supported. While the NLM server doesn't have
support for the local NFS client's locking needs, it does have to
field async replies and granted callbacks from remote NLMs that the
local client has contacted. We relay these replies to the userland
rpc.lockd over a local domain RPC socket.

* Robust deadlock detection for the local lock manager. In particular
it will detect deadlocks caused by a lock request that covers more
than one blocking request. As required by the NLM protocol, all
deadlock detection happens synchronously - a user is guaranteed that
if a lock request isn't rejected immediately, the lock will
eventually be granted. The old system allowed for a 'deferred
deadlock' condition where a blocked lock request could wake up and
find that some other deadlock-causing lock owner had beaten them to
the lock.

* Since both local and remote locks are managed by the same kernel
locking code, local and remote processes can safely use file locks
for mutual exclusion. Local processes have no fairness advantage
compared to remote processes when contending to lock a region that
has just been unlocked - the local lock manager enforces a strict
first-come first-served model for both local and remote lockers.

Sponsored by: Isilon Systems
PR: 95247 107555 115524 116679
MFC after: 2 weeks


# 177230 15-Mar-2008 pjd

Fix mmap(2) on ZFS after some changes in VM subsystem.

Submitted by: alc
Reported by: kris (originally) and many others
Tested with: fsx
MFC after: 1 week


# 176559 25-Feb-2008 attilio

Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is
always curthread.

As KPI gets broken by this patch, manpages and __FreeBSD_version will be
updated by further commits.

Tested by: Andrea Barberio <insomniac at slackware dot it>


# 175294 13-Jan-2008 attilio

VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>


# 175202 10-Jan-2008 attilio

vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>


# 171567 24-Jul-2007 pjd

Update assertion after revision 1.23.

Reviewed by: dfr
Approved by: re (rwatson)


# 171316 09-Jul-2007 dfr

Correct a reference-counting mistake in the ZFS code which led to abnormal
memory usage and pessimal cache performance.

Reviewed by: pjd
Approved by: re (rwatson)


# 170044 28-May-2007 pjd

Adjust va_mask for setattr. FreeBSD doesn't have va_mask, so we initialize it
based on individual fields beeing set. This doesn't work for setattr replay,
because va_type is set there, so we add AT_TYPE flag to va_mask, which won't
be accepted by zfs_setattr().

Reported by: kris


# 169302 06-May-2007 pjd

MFp4: We don't need to cover vnode_pager_setsize() with the z_map_lock.


# 169198 02-May-2007 pjd

When parent directory has to be unlocked, lock it back with the same lock
type. Before this change, if directory was shared-locked, it was relocked
exclusively.


# 169197 02-May-2007 pjd

Lock vnode using cn_lkflags in case the caller wants the vnode to be
shared-locked.


# 169195 02-May-2007 pjd

ZFS should update timestamps upon the creat() of an existing file.

Obtained from: OpenSolaris
Bug: http://bugs.opensolaris.org/view_bug.do?bug_id=6465105


# 169172 01-May-2007 pjd

MFp4: Remove LK_RETRY flag when locking vnode in zfs_lookup, we don't want
dead vnodes here.

Suggested by: kib


# 169170 01-May-2007 pjd

White space fixes.


# 169167 01-May-2007 pjd

Add a comment explaining why we call dmu_write() unconditionally, even if
uiomove() fails, especially that it is different from what OpenSolaris
does (I'm not entirely sure they are right).

Suggested by: darrenr


# 169108 29-Apr-2007 pjd

- Define d_type for ".", ".." and ".zfs" directories.
- Add a TODO comment where d_type is still noe defined.


# 169107 29-Apr-2007 pjd

Oops, correct important typo in last commit.


# 169106 29-Apr-2007 pjd

Avoid freeing NULL pointer in case of an error.


# 169059 26-Apr-2007 pjd

MFp4: Optimize mappedwrite() and mappedread() functions to write/read as much
non-mapped data as possible at once and not page-by-page. Which this change we
combain I/Os, but also saves many VM_OBJECT_UNLOCK()/VM_OBJECT_LOCK()
operations.

Simple 'fsx -l 33554432 -o 524288 -N 10000 /tank/fsx' test shows ~23%
performance increase.


# 169057 26-Apr-2007 pjd

- Always try to write one whole page at a time.
- vm_page_undirty() is enough (instead of vm_page_set_validclean()), but it has
to be called before we write the data in case someone makes page dirty after
our write, but before our vm_page_undirty() call.
- Always dmu_write, not matter if uiomove() succeeded, because it could
partially be ok and we would lose some changes.

All good ideas from: ups


# 169025 24-Apr-2007 pjd

MFp4: Rearange the code so vobject is destroyed from reclaim() method like
in all other file system on FreeBSD (instead from inactive() method).

A nice side-effect of this change, except that it speedups file system
when mmaped file are often open/closed, is that it makes FreeBSD's
namecache work:)


# 169024 24-Apr-2007 pjd

MFp4: Once page is written successfully, we should clear the dirty bits.
This fixes slow operations on mmaped files, because without this fix,
pages were written to disk multiple times.

If one is looking for even greater speed up for such operation, he should
disable ZIL (by setting vfs.zfs.zil_disable to 1 in /boot/loader.conf).
Disabling ZIL makes fsx run ~9 times faster.


# 169023 24-Apr-2007 pjd

MFp4: Reduce diff against vendor.


# 168978 23-Apr-2007 pjd

Too much diff reduction. 'cmd' has to be u_long.

Reported by: delphij


# 168962 23-Apr-2007 pjd

MFp4: Reduce diff against vendor code:
- Move FreeBSD-specific code to zfs_freebsd_*() functions in zfs_vnops.c
and keep original functions as similar to vendor's code as possible.
- Add various includes back, now that we have them.


# 168958 22-Apr-2007 pjd

Fix st_rdev handling (implement it, actually).

Reported by: gj


# 168926 21-Apr-2007 pjd

MFp4:

@118370 Correct typo.

@118371 Integrate changes from vendor.

@118491 Show backtrace on unexpected code paths.

@118494 Integrate changes from vendor.

@118504 Fix sendfile(2). I had two ways of fixing it:
1. Fixing sendfile(2) itself to use VOP_GETPAGES() instead of
hacking around with vn_rdwr(UIO_NOCOPY), which was suggested
by ups.
2. Modify ZFS behaviour to handle this special case.

Although 1 is more correct, I've choosen 2, because hack from 1
have a side-effect of beeing faster - it reads ahead MAXBSIZE
bytes instead of reading page by page. This is not easy to implement
with VOP_GETPAGES(), at least not for me in this very moment.

Reported by: Andrey V. Elsukov <bu7cher@yandex.ru>

@118525 Reorganize the code to reduce diff.

@118526 This code path is expected. It is simply when file is opened with
O_FSYNC flag.

Reported by: kris
Reported by: Michal Suszko <dry@dry.pl>


# 168474 07-Apr-2007 des

Fix some type mismatches.

Reviewed by: pjd@


# 168404 06-Apr-2007 pjd

Please welcome ZFS - The last word in file systems.

ZFS file system was ported from OpenSolaris operating system. The code in under
CDDL license.

I'd like to thank all SUN developers that created this great piece of software.

Supported by: Wheel LTD (http://www.wheel.pl/)
Supported by: The FreeBSD Foundation (http://www.freebsdfoundation.org/)
Supported by: Sentex (http://www.sentex.net/)