274114 |
04-Nov-2014 |
des |
[SA-14:25] Fix kernel stack disclosure in setlogin(2) / getlogin(2).
[SA-14:26] Fix remote command execution in ftp(1).
[EN-14:12] Fix NFSv4 and ZFS cache consistency issue.
Approved by: so (des) |
267654 |
20-Jun-2014 |
gjb |
Copy stable/9 to releng/9.3 as part of the 9.3-RELEASE cycle.
Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation
|
267139 |
06-Jun-2014 |
delphij |
MFC r266915: MFV 266913+266914:
3897 zfs filesystem and snapshot limits (fix leak)
4901 zfs filesystem/snapshot limit leaks
Approved by: re (gjb)
|
266123 |
15-May-2014 |
smh |
MFC r264850
Add the ability to set a minimum ashift size for ZFS pool creation or root level vdev addition.
Change max_auto_ashift sysctl to error when an invalid value is requested instead of silently limiting it.
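The difference between silently limiting a value and rejecting it can be sketched in C as follows. This is an illustration only: the function names and the bounds `DEMO_MIN_ASHIFT` and `DEMO_MAX_ASHIFT` are assumptions made for the example and are not taken from the committed sysctl handler.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed bounds, for illustration only. */
#define DEMO_MIN_ASHIFT	9
#define DEMO_MAX_ASHIFT	13

/* Old style: silently limit an out-of-range request. */
static int
demo_set_ashift_clamp(int requested, int *current)
{
	assert(current != NULL);
	if (requested < DEMO_MIN_ASHIFT)
		requested = DEMO_MIN_ASHIFT;
	if (requested > DEMO_MAX_ASHIFT)
		requested = DEMO_MAX_ASHIFT;
	*current = requested;
	return (0);
}

/* New style: reject an out-of-range request and leave the value alone. */
static int
demo_set_ashift_strict(int requested, int *current)
{
	assert(current != NULL);
	if (requested < DEMO_MIN_ASHIFT || requested > DEMO_MAX_ASHIFT)
		return (EINVAL);
	*current = requested;
	return (0);
}
```

With the strict variant the administrator sees an error for a bad request instead of discovering later that a different value was applied.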
Sponsored by: Multiplay
|
265756 |
09-May-2014 |
delphij |
MFC r265458:
Import George Wilson's change for Illumos #4730:
4730 metaslab group taskq should be destroyed in metaslab_group_destroy()
Reviewed by: Alex Reece <alex.reece@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Original author: George Wilson
|
265754 |
09-May-2014 |
delphij |
MFC r264835: MFV r264829:
3897 zfs filesystem and snapshot limits
|
265752 |
09-May-2014 |
delphij |
MFC r264671:
MFV r264668:
4754 io issued to near-full luns even after setting noalloc threshold
4755 mg_alloc_failures is no longer needed
|
265751 |
09-May-2014 |
delphij |
MFC r264669: MFV r264666:
4374 dn_free_ranges should use range_tree_t
|
265146 |
30-Apr-2014 |
smh |
MFC r265046
Fix ZIO reordering issue which could cause data loss / corruption.
Sponsored by: Multiplay
|
264730 |
21-Apr-2014 |
mav |
MFC r264341: Create zvol devices on zfs clone.
While the big and shiny patch is not ready, it is better to have something.
PR: kern/178999
|
264505 |
15-Apr-2014 |
jhb |
Don't pass a timeout of 0 ticks to pause() for a delay of less than 1 hz tick. On 8.x this results in an infinite sleep as pause() does not support a delay of 0 ticks. Since all delay values are converted from nanoseconds to ticks using a floor function, skipping the sleep for a delay smaller than 1 tick is more consistent than rounding up to a single tick.
This is a direct commit to 8 and 9 as 10.x and later use pause_sbt() instead.
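The floor conversion and the skip-on-zero rule can be sketched as below. This is a minimal illustration, not the committed code: `demo_delay_ns_to_ticks()` and `demo_should_sleep()` are hypothetical helpers, and `hz` is passed in as a parameter rather than read from the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000LL

/* Convert nanoseconds to ticks, rounding down (floor). */
static int64_t
demo_delay_ns_to_ticks(int64_t delay_ns, int64_t hz)
{
	assert(delay_ns >= 0 && hz > 0);
	/* Split into seconds and remainder so delay_ns * hz cannot overflow. */
	int64_t secs = delay_ns / DEMO_NSEC_PER_SEC;
	int64_t rem = delay_ns % DEMO_NSEC_PER_SEC;
	return (secs * hz + (rem * hz) / DEMO_NSEC_PER_SEC);
}

/*
 * Store the tick count in *ticks_out and return 1 if the caller should
 * sleep, or 0 if the delay is shorter than one tick and must be skipped.
 */
static int
demo_should_sleep(int64_t delay_ns, int64_t hz, int64_t *ticks_out)
{
	int64_t ticks = demo_delay_ns_to_ticks(delay_ns, hz);

	*ticks_out = ticks;
	return (ticks > 0);
}
```

A caller would invoke the sleep primitive only when `demo_should_sleep()` returns 1, so a tick count of 0 never reaches it.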
Reviewed by: avg
|
263988 |
01-Apr-2014 |
mav |
MFC r263118: Report ZVOL block size as GEOM stripesize.
|
263410 |
20-Mar-2014 |
delphij |
MFC r260183:
MFV r260154 + 260182:
4369 implement zfs bookmarks
4368 zfs send filesystems from readonly pools
Illumos/illumos-gate@78f171005391b928aaf1642b3206c534ed644332
|
263402 |
20-Mar-2014 |
delphij |
MFC r260181:
Fix build on platforms where atomic_swap_64 is not available.
|
263400 |
20-Mar-2014 |
delphij |
MFC r260157: MFV r260153:
4121 vdev_label_init should treat request as succeeded when pool is read only
illumos/illumos-gate@973c78e94bf9634782164382c9e291bf81161fa5
|
263398 |
20-Mar-2014 |
delphij |
MFC r260150: MFV r259170:
4370 avoid transmitting holes during zfs send
4371 DMU code clean up
illumos/illumos-gate@43466aae47bfcd2ad9bf501faec8e75c08095e4f
NOTE: Make sure the boot code is updated if a zpool upgrade is done on boot zpool.
|
263396 |
19-Mar-2014 |
delphij |
MFC r260141: MFV r258385:
(Note: this change is not applicable to FreeBSD and the file is not included in build. It's integrated for completeness).
4128 disks in zpools never go away when pulled
illumos/illumos-gate@39cddb10a31c1c2e66aed69e6871d09caa4c8147
|
263394 |
19-Mar-2014 |
delphij |
MFC r260138: MFV r242733:
3306 zdb should be able to issue reads in parallel
3321 'zpool reopen' command should be documented in the man page and help message
illumos/illumos-gate@31d7e8fa33fae995f558673adb22641b5aa8b6e1
FreeBSD porting notes: the kernel part of this changeset depends on Solaris buf(9S) interfaces and is not really applicable for our use. vdev_disk.c is patched as-is to reduce divergence from upstream, but vdev_file.c is left intact.
|
263391 |
19-Mar-2014 |
delphij |
MFC r259813 + r259816: MFV r258374:
4171 clean up spa_feature_*() interfaces
4172 implement extensible_dataset feature for use by other zpool features
illumos/illumos-gate@2acef22db7808606888f8f92715629ff3ba555b9
|
263389 |
19-Mar-2014 |
delphij |
MFC r259811: MFV r258373:
4168 ztest assertion failure in dbuf_undirty
4169 verbatim import causes zdb to segfault
4170 zhack leaves pool in ACTIVE state
illumos/illumos-gate@7fdd916c474ea52896c671bbe7b56ba34a1ca132
|
263386 |
19-Mar-2014 |
delphij |
MFC r254587: MFV r254421:
Illumos ZFS issues: 3996 want a libzfs_core API to rollback to latest snapshot
|
263270 |
17-Mar-2014 |
delphij |
MFC r262676:
All callers of static method load_nvlist() in spa.c handle the error case, so there is no reason to assert that we won't hit an error. Instead, just return that error to the caller and have the upper layer handle it.
Obtained from: FreeNAS
Reported by: rodrigc
Reviewed by: Matthew Ahrens
|
262323 |
22-Feb-2014 |
delphij |
MFC r261620: MFV r261619:
4574 get_clones_stat does not call zap_count in non-debug kernel
zap_count(...) is never called in a non-DEBUG kernel. As a result, the "count" variable is always 0 and "goto fail" is always reached. This means the get_clones_stat function never builds the list of clones for the "clones" property.
|
262180 |
18-Feb-2014 |
avg |
MFC r259576: MFV r258923: 4188 assertion failed in dmu_tx_hold_free(): dn_datablkshift != 0
|
262177 |
18-Feb-2014 |
avg |
MFC r259052: Expose spa_asize_inflation
|
262175 |
18-Feb-2014 |
avg |
MFC r258294: Fix ZFS deadlock when sending a snapshot which is mounted
MFC slacker: smh
|
262173 |
18-Feb-2014 |
avg |
MFC r256889: Use the vdev's ashift to calculate the supported min block size passed to zio_compress_data
|
262171 |
18-Feb-2014 |
avg |
MFC r254757: MFV r254749: 4046 dsl_dataset_t ds_dir->dd_lock is highly contended
MFC slacker: delphij
|
262170 |
18-Feb-2014 |
avg |
MFC r254608: Add kstat entries for ZFS compression statistics
MFC slacker: gibbs
|
262168 |
18-Feb-2014 |
mav |
MFC r260236: In dmu_zfetch_stream_reclaim() replace division with multiplication and move it out of the loop and lock.
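The general transformation can be sketched as follows. This is an illustration of the technique only: the function names, the array of timestamps, and the parameters are hypothetical, the locking is left out, and the two variants are equivalent under the assumption that `min_sec` is non-negative and no timestamp lies in the future.

```c
#include <assert.h>
#include <stdint.h>

/* Original shape: one integer division per element, inside the loop. */
static int
demo_count_stale_div(const int64_t *last, int n, int64_t now, int64_t hz,
    int64_t min_sec)
{
	int stale = 0;

	assert(hz > 0);
	for (int i = 0; i < n; i++)
		if ((now - last[i]) / hz > min_sec)
			stale++;
	return (stale);
}

/* Rewritten shape: a single multiplication, hoisted out of the loop. */
static int
demo_count_stale_mul(const int64_t *last, int n, int64_t now, int64_t hz,
    int64_t min_sec)
{
	int stale = 0;
	/* floor(d / hz) > min_sec  is the same as  d >= (min_sec + 1) * hz */
	int64_t threshold = (min_sec + 1) * hz;

	assert(hz > 0);
	for (int i = 0; i < n; i++)
		if (now - last[i] >= threshold)
			stale++;
	return (stale);
}
```

Because the threshold no longer depends on the loop variable, it could also be computed before any lock protecting the list is taken.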
|
262165 |
18-Feb-2014 |
mav |
MFC r259168: Don't even try to read vdev labels from devices smaller than SPA_MINDEVSIZE (64MB). Even if we would find one somehow, ZFS kernel code rejects such devices. It is funny to look at attempts to read 4 256K vdev labels from a 1.44MB floppy, though it is not very practical and quite slow.
|
262164 |
18-Feb-2014 |
mav |
MFC: r258137 Introduce allocation cache to store LZ4 compression contexts without kicking VM subsystem twice for every written record.
Tests on a 24-core system show a twofold reduction of CPU time spent on copying a single large well-compressed file.
This patch is not really needed on illumos (though it does no harm either) since their memory allocator by default uses caching for all requests up to 128K.
|
262163 |
18-Feb-2014 |
mav |
MFC r253992: Disable r252840 when ZFS TRIM is enabled (vfs.zfs.trim.enabled=1) and really disable TRIM otherwise.
r252840 (illumos bug 3836) is based on assumption that zio_free_sync() has no lock dependencies and should complete immediately. Unfortunately, with our TRIM implementation that is not true due to ZIO_STAGE_VDEV_IO_START added to the ZIO_FREE_PIPELINE, which, while not really accessing devices, still acquires SCL_ZIO lock for read to be sure devices won't disappear.
When TRIM is disabled, this patch enables direct free execution from r252840 and removes ZIO_STAGE_VDEV_IO_START and ZIO_STAGE_VDEV_IO_ASSESS stages from the pipeline to avoid lock acquisition. Otherwise it queues free request as it was before r252840.
|
262160 |
18-Feb-2014 |
avg |
MFC r253820: MFV r253782: 3888 zfs recv -F should destroy any snapshots created since the incremental source
MFC slacker: delphij
|
262158 |
18-Feb-2014 |
avg |
MFC r253819: MFV r253781 + r253871: 3894 zfs should not allow snapshot of inconsistent dataset
MFC slacker: delphij
|
262156 |
18-Feb-2014 |
avg |
MFC r250149: In case ZFS doesn't use UMA for buffers there's no need to waste memory
|
262154 |
18-Feb-2014 |
avg |
MFC r240829: remove cache entries associated with the source and the target of rename()
MFC slacker: pjd
|
262118 |
17-Feb-2014 |
avg |
MFC r260185: MFV r260155: 4391 panic system rather than corrupting pool if we hit bug 4390
|
262116 |
17-Feb-2014 |
avg |
MFC r260835: MFV r260834: Fix memory leak of compressed buffers in l2arc_write_done
|
262111 |
17-Feb-2014 |
avg |
MFC r260704,260717: zfs: getnewvnode_reserve must be called outside of a zfs transaction
|
262108 |
17-Feb-2014 |
avg |
MFC r260812: traverse_visitbp: visit DMU_GROUPUSED_OBJECT before DMU_USERUSED_OBJECT
|
262097 |
17-Feb-2014 |
avg |
MFC r260706: zfs_deleteextattr: name buffer from namei is needed by zfs_remove
|
262094 |
17-Feb-2014 |
avg |
MFC r258717: MFV r258371,r258372: 4101 metaslab_debug should allow for fine-grained control
|
262089 |
17-Feb-2014 |
avg |
MFC r255750: MFV r254750: Add support of Illumos dumps on zvol over RAID-Z.
Note that this only adds the features. FreeBSD would still need more work to support dumping on zvols.
MFC slacker: delphij
|
262086 |
17-Feb-2014 |
avg |
MFC r254112: MFV r254079: multiple ZFS issues
|
262084 |
17-Feb-2014 |
avg |
MFC r254077: MFV r254071: Fix a regression introduced by fix for Illumos bug #3834
|
262082 |
17-Feb-2014 |
avg |
MFC r252840: 3836 zio_free() can be processed immediately in the common case
MFC slacker: mm
|
262081 |
17-Feb-2014 |
avg |
MFC r254591,255753: Enhance the ZFS vdev layer to maintain both a logical and a physical minimum allocation size for devices
|
262077 |
17-Feb-2014 |
avg |
MFC r253441: Manually merge part of vendor import r238583 from Illumos
|
262073 |
17-Feb-2014 |
avg |
MFC r255226: Add sysctl/tunables for various metaslab variables
MFC slacker: pjd
|
260785 |
16-Jan-2014 |
avg |
MFC r258744-258746: zfs: add zfs_freebsd_putpages
|
260777 |
16-Jan-2014 |
avg |
MFC r258720: MFV r258665: 4347 ZPL can use dmu_tx_assign(TXG_WAIT)
|
260774 |
16-Jan-2014 |
avg |
MFC r258739: zfs mappedread_sf: assert that a page is never partially valid
|
260770 |
16-Jan-2014 |
avg |
MFC r258634: MFV r258376: 3964 L2ARC should always compress metadata buffers
|
260766 |
16-Jan-2014 |
avg |
MFC r258633: MFV r255256: 3954 metaslabs continue to load even after hitting zfs_mg_alloc_failure limit
|
260764 |
16-Jan-2014 |
avg |
MFC r258632,258704: MFV r255255: 4045 zfs write throttle & i/o scheduler performance work
Note a change in dmu_tx_delay: pause_sbt is not available in this branch.
Sponsored by: HybridCluster [merge]
|
260760 |
16-Jan-2014 |
avg |
MFC r254074: MFV r254070: Merge vendor bugfix for ZFS test suite that triggers false positives
|
260756 |
16-Jan-2014 |
avg |
MFC r248426: Fix typo in sysctl description
MFC slacker: mm
|
260754 |
16-Jan-2014 |
avg |
MFC r255437: MFV r247844 (illumos-gate 13975:ef6409bc370f)
Note that a different kind of cv_timedwait_hires shim is provided in this branch because cv_timedwait_sbt is not available for better emulation.
Sponsored by: HybridCluster [merge]
|
260751 |
16-Jan-2014 |
avg |
MFC r258631: MFV r247578
3581 spa_zio_taskq[ZIO_TYPE_FREE][ZIO_TASKQ_ISSUE]->tq_lock is piping hot
|
260743 |
16-Jan-2014 |
avg |
MFC r258630: 734 taskq_dispatch_prealloc() desired
|
260732 |
16-Jan-2014 |
avg |
MFC r258638,258642: expose zfs_flags as debug.zfs_flags r/w tunable and sysctl
Sponsored by: HybridCluster
|
260722 |
16-Jan-2014 |
avg |
MFC r253821,254753,256259
MFV r253783: 3834 incremental replication of 'holey' file systems is slow
MFV r254747:4047 panic from dbuf_free_range() from dmu_free_object() while doing zfs receive
MFV r255257: 4082 zfs receive gets EFBIG from dmu_tx_hold_free()
|
260518 |
10-Jan-2014 |
asomers |
MFC 259240
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
When a da or ada device disappears, outstanding IOs fail with ENXIO, not EIO. The check for EIO was probably copied from Illumos, where that is indeed the correct errno.
Without this change, pulling a busy drive from a zpool would usually turn it into UNAVAIL, even though pulling an idle drive would turn it into REMOVED. With this change, it is REMOVED every time.
Also, vdev_geom_io_intr shouldn't do zfs_post_remove, because that results in devd getting two resource.fs.zfs.removed events. The comment said that the event had to be sent directly instead of through the async removal thread because "the DE engine is using this information to discard previous I/O errors". However, the fact that vdev_geom_io_intr was never actually sending the events until now, and that vdev_geom_orphan never sent them at all, and that vdev_geom_orphan usually gets called about 2 seconds after the actual removal, means that FreeBSD's userland can cope with a late event just fine.
|
258635 |
26-Nov-2013 |
avg |
MFC r229663: Allow to change vfs.zfs.arc_meta_limit at runtime.
- Change vfs.zfs.arc_meta_used from CTLFLAG_RDTUN to CTLFLAG_RD, as it is not a tunable.
MFC slacker: pjd
|
258560 |
25-Nov-2013 |
avg |
MFV r258377: 4088 use after free in arc_release()
illumos/illumos-gate@ccc22e130479b5bd7c0002267fee1e0602d3f772
|
258557 |
25-Nov-2013 |
avg |
MFC r258389: MFV r258378: 4089 NULL pointer dereference in arc_read()
illumos/illumos-gate@57815f6b95a743697e148327725b7f568e75e6ea
|
258555 |
25-Nov-2013 |
avg |
MFC r258353: zfs page_busy: fix the boundaries of the cleared range
This is a fix for a regression introduced in r246293.
vm_page_clear_dirty expects the range to have DEV_BSIZE aligned boundaries, otherwise it extends them. Thus it can happen that the whole page is marked clean while actually having some small dirty region(s). This commit makes the range properly aligned and ensures that only the clean data is marked as such.
It would be interesting to evaluate how much benefit clearing with DEV_BSIZE granularity produces. Perhaps instead we should clear the whole page when it is completely overwritten and not bother clearing any bits if only a portion of a page is written.
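One way to keep a range inside DEV_BSIZE boundaries is to round its start up and its end down, so that only fully covered blocks remain. The sketch below illustrates that idea under stated assumptions: `demo_shrink_to_dev_bsize()` is a hypothetical helper, `DEMO_DEV_BSIZE` is set to 512 for the example, and this is not the code from the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed block size for the example; must be a power of two. */
#define DEMO_DEV_BSIZE 512

/*
 * Shrink [off, off + len) to the largest sub-range whose boundaries are
 * multiples of DEMO_DEV_BSIZE. Returns 1 and fills *aoff / *alen when such
 * a sub-range exists, or returns 0 when no whole block is covered.
 */
static int
demo_shrink_to_dev_bsize(int off, int len, int *aoff, int *alen)
{
	assert(off >= 0 && len >= 0 && aoff != NULL && alen != NULL);

	int start = (off + DEMO_DEV_BSIZE - 1) & ~(DEMO_DEV_BSIZE - 1);
	int end = (off + len) & ~(DEMO_DEV_BSIZE - 1);

	if (end <= start) {
		*aoff = 0;
		*alen = 0;
		return (0);
	}
	*aoff = start;
	*alen = end - start;
	return (1);
}
```

Rounding inward rather than outward means a partially written block at either edge is never reported as clean.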
|
257253 |
28-Oct-2013 |
will |
MFC r248653: ZFS: Fix a panic while unmounting a busy filesystem.
This particular scenario was easily reproduced using an NFS export. When the first 'zfs unmount' occurred, it returned EBUSY via this path, while vflush() had flushed references on the filesystem's root vnode, which in turn caused its v_interlock to be destroyed. The next time 'zfs unmount' was called, vflush() tried to obtain this lock, which caused this panic.
Since vflush() on FreeBSD is a definitive call, there is no need to check vfsp->vfs_count after it completes. Simply #ifdef sun this check.
|
257119 |
25-Oct-2013 |
delphij |
MFC r253816: MFV r253780:
To quote Illumos #3875:
The problem here is that if we ever end up in the error path, we drop the locks protecting access to the zfsvfs_t prior to forcibly unmounting the filesystem. Because z_os is NULL, any thread that had already picked up the zfsvfs_t and was sitting in ZFS_ENTER() when we dropped our locks in zfs_resume_fs() will now acquire the lock, attempt to use z_os, and panic.
Illumos ZFS issues: 3875 panic in zfs_root() after failed rollback
|
255540 |
14-Sep-2013 |
mav |
MFC r253993: Block reporting of ZFS features for suspended pools.
Before executing any subcommand, the zpool tool fetches the pool configuration from the kernel. Before features support was added, the kernel regenerated that configuration from data always present in memory. Unfortunately, the pool features list and activity counters are not such data. They are stored in ZAP, which normally resides in ARC, but under heavy memory pressure may be swapped out. If the pool is suspended at this point, there is no way to recover it, since any zpool command will get stuck.
This change has one predictable flaw: `zpool upgrade` always wishes to upgrade suspended pools, but fortunately it can't do it due to the suspension.
|
255539 |
14-Sep-2013 |
mav |
MFC r253991: Make `zpool clear` also reopen reconnected cache and spare devices. Since `zpool status` reports such kinds of errors, it is strange that they are not cleared by `zpool clear`.
|
255538 |
14-Sep-2013 |
mav |
MFC r253990: Make ZFS use a separate thread to handle SPA_ASYNC_REMOVE async events. The existing async thread runs only on successful spa_sync() completion, which is impossible when a pool loses its required (last) disk(s). That indefinite delay of SPA_ASYNC_REMOVE processing kept ZFS from closing the lost disks, preventing GEOM/CAM from destroying devices and reusing names on later disk reattach.
In an earlier version of the patch I tried to just run the existing thread immediately, independent of spa_sync() completion, but that exposed a number of situations where it could get stuck due to locks held by a stuck spa_sync() that are required for other kinds of async events.
Experiments with an OpenIndiana snapshot confirmed that it also has this issue with reattaching lost disks.
|
255537 |
14-Sep-2013 |
mav |
MFC r253806: Allow three IOCTLs to be used on a suspended pool, restoring the state that existed before the IOCTL code refactoring merged change 4445fffb from illumos at r248571.
This change allows `zpool clear` to be used again to recover a suspended pool. It seems to be the only way supported by the code to restore pool operation after reconnecting lost disks that were required for data completeness. There are still cases where the `zpool clear` command can simply get stuck due to deadlocks inside the ZFS kernel part, but that is probably better than having no chance to recover at all.
|
255536 |
14-Sep-2013 |
mav |
MFC r253643: Following r222950, revert unintentional change cls -> class in argument name in r245264. Aside from non-uniformity, that again confused C++ compilers.
|
255519 |
13-Sep-2013 |
avg |
MFC r254714: zfs: do not reject any operations on a pool just because it's a boot pool
|
255517 |
13-Sep-2013 |
avg |
MFC r254445,254711: zfs: inline and remove zfs_vnode_lock
|
254696 |
23-Aug-2013 |
avg |
MFC r253606: zfs module: perform cleanup during shutdown in addition to module unload
|
254694 |
23-Aug-2013 |
avg |
MFC r253603: zfs: move vnode creation from zfs_znode_cache_constructor to zfs_znode_alloc
|
254203 |
11-Aug-2013 |
smh |
MFC: r253926
zfs_ioc_rename should not leave the value of zc_name passed in via zc altered on return.
|
254049 |
07-Aug-2013 |
avg |
MFC r253073: zfs: try to properly handle i/o errors in mappedread_sf
|
254047 |
07-Aug-2013 |
avg |
MFC r253070: zfs: load zpool.cache after a root fs is mounted
|
253855 |
01-Aug-2013 |
mav |
MFC r253754: Partially close a race between calls of the orphan() method from GEOM and the close() method from ZFS core, which reliably causes a use-after-free panic if an SSD vdev is detached during the initial erase.
Approved by: re (delphij)
|
252891 |
06-Jul-2013 |
gavin |
Merge r252337 from head:
Don't try to re-insert an already present but invalid page.
This could happen if a thread doing a page-in loses a ZFS range lock race to a thread writing to the same range.
This fixes "panic: vm_page_alloc: pindex already allocated" in http://docs.FreeBSD.org/cgi/mid.cgi?1372165971.96049.42.camel
Submitted by: avg
|
252764 |
05-Jul-2013 |
delphij |
MFC r251646 + r252219:
MFV r251644:
Poor ZFS send / receive performance due to snapshot hold / release processing (by smh@)
Illumos ZFS issues: 3740 Poor ZFS send / receive performance due to snapshot hold / release processing
MFV r252215:
Restore the behavior from before r251646, where, when destroying ZFS snapshots, the ioctl would return ENOENT when it hit an error on any of them in the errlist (the new behavior was to return ENOENT only when all of them returned an error).
Illumos ZFS issues: 3829 fix for 3740 changed behavior of zfs destroy/hold/release ioctl
|
252762 |
05-Jul-2013 |
delphij |
MFC r251636: illumos #3749 zfs event processing should work on R/O root filesystems
This log is a modified version of the original one written by gibbs@, to account for changes made during the illumos RTI process.
Allow ZFS asynchronous event handling to proceed even if the root file system is mounted read-only. This restriction appears to have been put in place to avoid errors with updating the configuration cache file. However:
o The majority of asynchronous event handling does not involve configuration cache file updates.
o The configuration cache file need not be on the root file system, so the check was not complete.
o Other classes of errors (e.g. file system full) can also prevent a successful update yet do not prevent asynchronous event processing.
o Configurations such as NanoBSD never have a read-write root, so ZFS event processing is permanently disabled in these systems.
o Failure to handle asynchronous events promptly can extend the window of time that a pool is in a critical state.
At worst, a missed configuration cache update will force the operator to perform a manual "zpool import" (note -f is not required) to inform the system about a newly created pool. To minimize the likelihood of this rare occurrence, configuration cache write failures now emit FMA events (via devctl) so the operator can take corrective action, and the write is retried every 5 minutes. The retry interval, in seconds, is tunable via the sysctl "vfs.zfs.ccw_retry_interval".
As a side effect of reporting configuration cache events, other sysevents, such as re-silver start/stop, are now also reported via devctl.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:
o As is done in zfs_fm.c, provide a manual declaration for devctl_notify(). Both declarations could be combined into spa_impl.h, but the declaration is fault management related, not spa specific. sys/fm/fs/zfs.h would be ideal if it weren't so public and reserved for FMA string definitions. I'm open to suggestions on how to improve this nit while minimizing our divergence from Solaris.
o Use devctl_notify() to implement sysevent support in spa_event_notify(). The subsystem is EC_ZFS so that these events can never collide with those emitted in zfs_fm.c.
o Add the sysctl "vfs.zfs.ccw_retry_interval". The value defaults to 5 minutes and is used to rate limit, on a per-pool basis, configuration cache file write attempts.
o Modify spa_async_dispatch to honor configuration cache write limiting. If other events are pending, a configuration cache write will be attempted at the same time, so the rate limiting only applies when the asynchronous dispatch system is otherwise idle. Async events should be rare (e.g. device arrival/departure) and configuration cache writes rarer, so a more complicated system to strictly honor the retry limit seems unwarranted.
o Remove check in spa_async_dispatch() for the root file system being read-write.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_config.c:
Instead of silently ignoring configuration cache write failures, report them via a new FMA event as well as to the console. The current zfs_ereport_post() doesn't allow arbitrary name=value pairs to be appended to the report, so the configuration cache file name is only available on the console output. This limitation should be addressed in a future update.
Note: This error report is only posted once per incident, to avoid spamming.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa_impl.h:
Add a hrtime_t to the spa data structure to track the time (via gethrtime()) of the last configuration cache file write failure. This is referenced in spa_async_dispatch() to effect the rate limiting.
sys/cddl/contrib/opensolaris/uts/common/sys/fm/fs/zfs.h:
Add FM_EREPORT_ZFS_CONFIG_CACHE_WRITE as an ereport class.
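The rate limiting described for spa_async_dispatch() can be pictured with a small decision function. This is only a sketch under stated assumptions: `demo_ccw_should_attempt()` and its parameters are hypothetical, timestamps are plain nanosecond counts standing in for hrtime_t values, and a stored failure time of 0 is assumed to mean that no failure has been recorded.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000LL

/*
 * Decide whether a configuration cache write should be attempted now.
 * Returns 1 to attempt the write, 0 to hold off until the retry interval
 * has elapsed since the last recorded failure.
 */
static int
demo_ccw_should_attempt(int64_t now_ns, int64_t last_fail_ns,
    int64_t retry_interval_sec, int other_events_pending)
{
	assert(retry_interval_sec >= 0);

	/* Other async work is being dispatched anyway: write alongside it. */
	if (other_events_pending)
		return (1);
	/* No failure on record, so there is nothing to rate limit. */
	if (last_fail_ns == 0)
		return (1);
	/* Otherwise idle: retry only once the interval has passed. */
	return (now_ns - last_fail_ns >=
	    retry_interval_sec * DEMO_NSEC_PER_SEC);
}
```

With the default interval of 5 minutes, `retry_interval_sec` would be 300.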
Submitted by: gibbs
Reviewed by: Matthew Ahrens <mahrens@delphix.com>, Eric Schrock <eric.schrock@delphix.com>, Christopher Siden <christopher.siden@delphix.com>
Sponsored by: Spectra Logic
|
252760 |
05-Jul-2013 |
delphij |
MFC r251635: illumos #3747 txg commit callbacks don't work
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c:
Fix commit callbacks by moving them to the task's list. Previously, list_move_tail() returned without doing anything because the task list was passed as the source rather than destination.
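The effect of swapping the two arguments can be reproduced with a toy list. The sketch below is not the Solaris list implementation: `demo_list_move_tail(dst, src)` is a simplified stand-in written for this example, assumed to share only the behavior described above (append everything from the source to the destination, and do nothing when the source is empty).

```c
#include <assert.h>
#include <stddef.h>

struct demo_node {
	int value;
	struct demo_node *next;
};

struct demo_list {
	struct demo_node *head;
	struct demo_node *tail;
};

/* Append every node of src to the tail of dst, leaving src empty. */
static void
demo_list_move_tail(struct demo_list *dst, struct demo_list *src)
{
	assert(dst != NULL && src != NULL);
	if (src->head == NULL)
		return;		/* nothing to move */
	if (dst->tail == NULL)
		dst->head = src->head;
	else
		dst->tail->next = src->head;
	dst->tail = src->tail;
	src->head = src->tail = NULL;
}

/* Count the nodes on a list. */
static size_t
demo_list_len(const struct demo_list *l)
{
	size_t n = 0;

	for (const struct demo_node *p = l->head; p != NULL; p = p->next)
		n++;
	return (n);
}
```

Calling `demo_list_move_tail(&callbacks, &task)` with an empty `task` list is a silent no-op, while `demo_list_move_tail(&task, &callbacks)` moves the pending callbacks onto the task's list.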
cddl/contrib/opensolaris/cmd/ztest/ztest.c:
Check the commit callback threshold correctly.
Submitted by: will
Reviewed by: Matthew Ahrens <mahrens@delphix.com>, Christopher Siden <christopher.siden@delphix.com>
Sponsored by: Spectra Logic
|
252756 |
05-Jul-2013 |
delphij |
MFC r251633: illumos #3744 zfs shouldn't ignore errors unmounting snapshots
Propagate errors from zfs_unmount_snap() up to its callers wherever feasible.
Submitted by: will
Reviewed by: Matthew Ahrens <mahrens@delphix.com>, Christopher Siden <christopher.siden@delphix.com>
Sponsored by: Spectra Logic
|
252754 |
05-Jul-2013 |
delphij |
MFC r251632: illumos #3743 zfs needs a refcount audit
Audit zap cursor usage and correct missing calls to zap_cursor_fini().
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_errlog.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c:
Correct early exit handling of several functions that previously failed to close a cursor prior to returning.
Submitted by: gibbs
Audit holders of dmu_bufs and correct missing calls to dmu_buf_rele().
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c:
Correct early exit handling of several functions that previously failed to release a dmu_buf prior to returning.
Submitted by: will
Reviewed by: Matthew Ahrens <mahrens@delphix.com>, Eric Schrock <eric.schrock@delphix.com>, George Wilson <george.wilson@delphix.com>, Christopher Siden <christopher.siden@delphix.com>
Sponsored by: Spectra Logic
|
252751 |
05-Jul-2013 |
delphij |
MFC r251631: illumos #3742 zfs comments need cleaner, more consistent style
- Make more of ZFS's comments use a natural English writing flow.
- Break up long paragraphs, fix various typos and spelling errors.
- Don't prefix a function description with its name when the function definition immediately follows.
- Remove useless comments.
- Add extra whitespace where it makes the comments more readable.
New comments were separated from this change and added in r251629.
Submitted by: asomers, gibbs, will
Reviewed by: Matthew Ahrens <mahrens@delphix.com>, George Wilson <george.wilson@delphix.com>, Eric Schrock <eric.schrock@delphix.com>, Christopher Siden <christopher.siden@delphix.com>
Sponsored by: Spectra Logic
|
252749 |
05-Jul-2013 |
delphij |
MFC r251629: 3741 zfs needs better comments
Embellish the comments in various components of ZFS. Move some comments around closer to what they describe. Specifically, answer the questions:
- What are some of the edge cases of the dbuf state machine?
- What does a txg quiesce do?
- When does the DMU notify threads waiting on txg's that they may proceed?
- How do the calculations for RAIDZ map allocations work?
- What process do the RAIDZ I/O start and done callbacks follow?
While here, adjust the function prototype of dmu_zfetch.c:dmu_zfetch_colinear() to match its comment which describes its return as a boolean.
Submitted by: asomers, gibbs, will
Reviewed by: Matthew Ahrens <mahrens@delphix.com>, Eric Schrock <eric.schrock@delphix.com>, Christopher Siden <christopher.siden@delphix.com>
Sponsored by: Spectra Logic
|
252524 |
02-Jul-2013 |
mm |
MFC r252380: Unbreak "zfs jail" and "zfs unjail" (broken in stable/9 since r249643)
I forgot to register zfs_ioc_jail and zfs_ioc_unjail as legacy ioctls with the new zfs_ioctl_register_legacy() function.
These operations do not modify pools or datasets so there is no need to log them to pool history.
Reported by: Alexander Leidinger <netchild@FreeBSD.org> on current@
|
252459 |
01-Jul-2013 |
smh |
MFC r252390: Remove invalid ASSERT
|
252308 |
27-Jun-2013 |
smh |
MFC r252056: Fix destroyed ZFS pools failing to import
MFC r252059: Fix ZFS zpool freeze (debug command) not processing due to invalid ioctl call syntax.
MFC r252060: Fix intermittent ZFS lock panic
MFC r252061: Switch ZFS mutex_owner macro to use sx_xholder as it's now exported via sx.h
|
252142 |
24-Jun-2013 |
delphij |
MFC r251520: MFV r251519:
* Illumos ZFS issue #3805 arc shouldn't cache freed blocks
|
252140 |
24-Jun-2013 |
delphij |
MFC r251478: MFV r251474:
* Illumos zfs issue #3137 L2ARC compression
Whether or not to compress buffers entering the L2ARC is controlled by the "compression" setting on the dataset: when compression is not "off", L2ARC compression is enabled.
The compression method is always LZ4 for L2ARC when enabled because it works best for the scenario.
|
252027 |
20-Jun-2013 |
smh |
MFC r248573: Don't register repair writes in the trim map.
|
251419 |
05-Jun-2013 |
smh |
Added ZFS TRIM support which is enabled by default. To disable ZFS TRIM support set vfs.zfs.trim.enabled=0 in loader.conf.
Creating new ZFS pools and adding new devices to existing pools first performs a full device level TRIM which can take a significant amount of time. The sysctl vfs.zfs.vdev.trim_on_init can be set to 0 to disable this behaviour.
ZFS TRIM requires the underlying device to support BIO_DELETE, which is currently provided by methods such as ATA TRIM and SCSI UNMAP via CAM, which are typically supported by SSDs.
Stats for ZFS TRIM can be monitored by looking at the sysctls under kstat.zfs.misc.zio_trim.
MFC r240868: Add TRIM support
MFC r244155: Renamed zfs trim stats
MFC r244187: Upgrade TRIM free request sizes optimisation
MFC r244188: Added vfs.zfs.vdev.trim_on_init sysctl
MFC r248572: Add TRIM support for L2ARC
MFC r248574: Improve TXG handling in the TRIM module
MFC r248575: TRIM cache devices based on time instead of TXGs
MFC r248576: Names the ZFS TRIM thread
MFC r248577: Optimisation of TRIM processing
MFC r248602: Fix for building libzpool under i386
MFC r249921: Enabled ZFS TRIM by default
|
251415 |
05-Jun-2013 |
smh |
MFC r248579: Add missing descriptions for ZFS sysctls
|
250098 |
30-Apr-2013 |
mm |
MFC r249858: Merge vendor bugfix for a possible deadlock related to async destroy and improve write performance by introducing a new lock protecting tx_open_txg.
Illumos ZFS issues:
3642 dsl_scan_active() should not issue I/O to determine if async destroying is active
3643 txg_delay should not hold the tc_lock
|
249920 |
26-Apr-2013 |
mm |
MFC r249787: The zfs synctask code restructuring introduced a new bug that makes it impossible to set quota and reservation on pools lower than version 22. Problem has been reported and a solution discussed with vendor.
Illumos ZFS issues: 3739 cannot set zfs quota or reservation on pool version < 22
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reported by: Steve Wills <swills@FreeBSD.org>
|
249689 |
20-Apr-2013 |
mm |
MFC r249047 (avg): spa_open_common: fix argument to zvol_create_minors
Prior to r248571 spa_open was always called with a bare pool name, but now it is called with a dataset name instead (spa_lookup handles that). So, when a ZFS root is mounted spa_open is called with a name of a root dataset, which can very well be different from the pool name. But zvol_create_minors should be called with the pool name, because it performs a recursive traversal of all datasets under the name to find all those that are volumes.
|
249643 |
19-Apr-2013 |
mm |
MFC 248571,248976,249004,249042,249188,249195-249196,249206,249207,249319, 249326,249356-249357
Merge libzfs_core and other ZFS bugfixes and improvements.
MFC r248571: MFV 238590, 238592: In the first zfs ioctl restructuring phase, the libzfs_core library was introduced. It is a new thin library that wraps around kernel ioctl's. The idea is to provide a forward-compatible way of dealing with new features. Arguments are passed in nvlists and not random zfs_cmd fields, new-style ioctls are logged to pool history using a new method of history logging.
http://blog.delphix.com/matt/2012/01/17/the-future-of-libzfs/
MFV 247580 [1]: To address issues of several deadlocks and race conditions the locking code around dsl_dataset was rewritten and the interface to synctasks was changed.
User-Visible Changes:
"zfs snapshot" can create multiple arbitrary snapshots at once (atomically)
"zfs destroy" destroys multiple snapshots at once
"zfs recv" has improved performance
Backward Compatibility: I have extended the compatibility layer to support full backward compatibility by remapping or rewriting the responsible ioctl arguments. Old utilities are fully supported by the new kernel module.
Forward Compatibility: New utilities work with old kernels with the following restrictions: - creating, destroying, holding and releasing of multiple snapshots at once is not supported, this includes recursive (-r) commands
Illumos ZFS issues: 2882 implement libzfs_core 2900 "zfs snapshot" should be able to create multiple, arbitrary snapshots at once 3464 zfs synctask code needs restructuring
MFC r248976: Call dmu_snapshot_list_next() in zvol.c with dsl_pool_config lock held
MFC r249004: Do not check against uninitialized rc and comment out vendor code
MFC r249042: Fix possible pool hold leak in dmu_send_impl()
Illumos ZFS issues: 3645 dmu_send_impl: possibilty of pool hold leak
MFC r249188: Import vendor change to reduce diff, no effect on FreeBSD.
Illumos ZFS issues: 3517 importing pool with autoreplace=on and "hole" vdevs crashes syseventd
MFC r249195: Merge change from vendor to reduce diff only. ZFS dtrace probes are not supported on FreeBSD yet.
Illumos ZFS issues: 3598 want to dtrace when errors are generated in zfs
MFC r249196: Provide a fix for kernel panic if receiving recursive deduplicated streams. Problem reported to vendor.
Illumos ZFS issues: 3692 Panic on zfs receive of a recursive deduplicated stream
MFC r249206: Merge vendor change - modify time processing in deadman thread.
Illumos ZFS issues: 3618 ::zio dcmd does not show timestamp data
MFC r249207: Allow zdb to output a histogram of compressed block sizes.
Illumos ZFS issues: 3641 want a histogram of compressed block sizes
MFC r249319: ZFS expects a copyout of zfs_cmd_t on an ioctl error. Our sys_ioctl() doesn't copyout in this case.
To solve this a new struct zfs_iocparm_t is introduced consisting of: - zfs_ioctl_version (future backwards compatibility purposes) - user space pointer to zfs_cmd_t (copyin and copyout) - size of zfs_cmd_t (verification purposes)
The copyin and copyout of zfs_cmd_t is now done the illumos (vendor) way, which makes porting new changes easier and ensures correct behavior when returning an error.
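The wrapper described for r249319 can be modelled roughly as follows. This is a Python sketch of the idea, not the kernel code: only `zfs_ioctl_version` is named in the commit message, while the other field names, the `zfs_ioctl` function, and the 16-byte `ZFS_CMD_SIZE` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable

EINVAL = 22          # errno value for an invalid argument
ZFS_CMD_SIZE = 16    # illustrative size only; the real zfs_cmd_t is far larger


@dataclass
class ZfsIocParm:
    """Model of the wrapper structure (field names partly assumed)."""
    zfs_ioctl_version: int   # kept for future backwards compatibility
    zfs_cmd: bytearray       # stands in for the user-space pointer to zfs_cmd_t
    zfs_cmd_size: int        # lets the kernel verify the caller's zfs_cmd_t size


def zfs_ioctl(parm: ZfsIocParm, handler: Callable[[bytearray], int]) -> int:
    """Run handler on a kernel copy of zfs_cmd and copy the result back out."""
    if parm.zfs_cmd_size != ZFS_CMD_SIZE or len(parm.zfs_cmd) != ZFS_CMD_SIZE:
        return EINVAL                      # size verification failed
    kernel_cmd = bytearray(parm.zfs_cmd)   # models copyin
    error = handler(kernel_cmd)
    parm.zfs_cmd[:] = kernel_cmd           # models copyout, done even on error
    return error
```

The point of the sketch is the last step: the command buffer is written back to the caller whether or not the handler failed, which is the behavior the commit message says ZFS expects.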
MFC r249326: Cast (void *)(uintptr_t) on copyout and copyin of zfs_iocparm_t.zfs_cmd
MFC r249356: Merge bugfixes accepted and integrated by vendor. Underlying problems have been reported by us and fixed in r240942 and r249196.
Illumos ZFS issues: 3645 dmu_send_impl: possibilty of pool hold leak 3692 Panic on zfs receive of a recursive deduplicated stream
MFC r249357: Fix libzfs to report an error instead of returning zero when trying to hold or release a non-existent snapshot of an existing dataset. In the recursive case, an error is reported if no snapshots with the requested name have been found.
Illumos ZFS issues: 3699 zfs hold or release of a non-existent snapshot does not output error
|
248945 |
31-Mar-2013 |
avg |
MFC r246293: zfs: fix, improve and re-organize page_lookup and page_unlock
|
248611 |
22-Mar-2013 |
mm |
MFC r240870 (pjd): It is possible to recursively destroy snapshots even if the snapshot doesn't exist on a dataset we are starting from. For example if we have the following configuration:
tank
tank/foo
tank/foo@snap
tank/bar
tank/bar@snap
We can execute:
# zfs destroy -r tank@snap
even though tank@snap doesn't exist.
Unfortunately it is not possible to do the same with recursive rename:
# zfs rename -r tank@snap tank@pans
cannot open 'tank@snap': dataset does not exist
...until now. This change allows snapshots to be renamed recursively even if the snapshot doesn't exist on the starting dataset.
Sponsored by: rsync.net
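The behavior described in r240870 can be sketched in Python as a toy model. The function `rename_snapshots_recursive` and the set-of-strings representation are hypothetical and only illustrate the rule that the starting dataset itself need not carry the snapshot.

```python
def rename_snapshots_recursive(snapshots: set, start: str, old: str, new: str) -> int:
    """Rename <dataset>@old to <dataset>@new for start and all its descendants.

    Returns the number of snapshots renamed. The starting dataset does not
    need to have the snapshot itself.
    """
    renamed = 0
    # Iterate over a sorted copy so the set can be modified inside the loop.
    for snap in sorted(snapshots):
        dataset, _, name = snap.partition("@")
        in_subtree = dataset == start or dataset.startswith(start + "/")
        if in_subtree and name == old:
            snapshots.remove(snap)
            snapshots.add(f"{dataset}@{new}")
            renamed += 1
    return renamed


# Same layout as in the commit message: there is no tank@snap.
snaps = {"tank/foo@snap", "tank/bar@snap"}
count = rename_snapshots_recursive(snaps, "tank", "snap", "pans")
print(count, sorted(snaps))
```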
|
248609 |
22-Mar-2013 |
mm |
MFC r248493: Plug memory leak in dsl_check_snap_cb(). This was unnoticed because the function is very rarely used.
|
248547 |
20-Mar-2013 |
mm |
MFC r242845, r247592:
MFC r242845 (delphij): Illumos r13840:97fd5cdf328a: 3145 single-copy arc 3212 ztest: race condition between vdev_online() and spa_vdev_remove()
Illumos r13849:3468a95b27cd: 3258 ztest's use of file descriptors is unstable
MFC r247592 (delphij): Import a fix to tighten the assertion on SPA versions from vendor (Illumos).
Illumos ZFS issue: 3543 Feature flags causes assertion in spa.c to miss certain cases
Approved by: delphij
|
248369 |
16-Mar-2013 |
mm |
MFC r247187,247265,247348,247398,247540,247585,247852,248265,248267 Merge various ZFS improvements and bugfixes
MFC r247187: Import vendor change to avoid "uninitialized variable" warnings.
Illumos ZFS issues: 3522 zfs module should not allow uninitialized variables
MFC r247265: Merge the ZFS I/O deadman thread from vendor (illumos). This feature panics the system on hanging ZFS I/O, helps debugging and resumes failed service.
The panic behavior can be controlled with the loader-only tunables: vfs.zfs.deadman_enabled (enable or disable panic on stalled ZFS I/O) vfs.zfs.deadman_synctime (expiration time for stalled ZFS I/O)
By default, the ZFS I/O deadman is enabled on amd64 and i386, excluding virtual guest machines.
MFC r247348: Be more verbose on ZFS deadman I/O panic. Patch suggested upstream.
MFC r247398: Import metaslab_sync() speedup from vendor (illumos).
Illumos ZFS issues: 3552 condensing one space map burns 3 seconds of CPU in spa_sync() thread 3564 spa_sync() spends 5-10% of its time in metaslab_sync() (when not condensing) 3578 transferring the freed map to the defer map should be constant time 3579 ztest trips assertion in metaslab_weight()
MFC r247540: Fix the zfs_ioctl compat layer to support the zfs_cmd size change introduced in r247265 (ZFS deadman thread). New utilities now support the old kernel, and the new kernel properly detects old utilities.
For future backwards compatibility, the vfs.zfs.version.ioctl read-only sysctl has been introduced. With this sysctl zfs utilities will be able to detect the ioctl interface version of the currently loaded zfs module.
MFC r247585: Merge new read-only zfs properties from vendor (illumos)
Illumos ZFS issues: 3588 provide zfs properties for logical (uncompressed) space used and referenced
MFC r247852: Import ZFS bpobj bugfix from vendor.
Illumos ZFS issues: 3603 panic from bpobj_enqueue_subobj() 3604 zdb should print bpobjs more verbosely
MFC r248265: Update zfs.8 manpage date (missing in r247585)
MFC r248267: Import minor ZFS changes from vendor
Illumos ZFS issues: 3604 zdb should print bpobjs more verbosely (fix zdb hang) 3606 zpool status -x shouldn't warn about old on-disk format
|
247888 |
06-Mar-2013 |
avg |
MFC r246532: zfs_vget, zfs_fhtovp: properly handle the z_shares_dir object
|
247406 |
27-Feb-2013 |
mm |
MFC r246631,246651,246666,246675,246678,246688: Merge various ZFS bugfixes
MFC r246631: Import vendor bugfixes
Illumos ZFS issues: 3422 zpool create/syseventd race yield non-importable pool 3425 first write to a new zvol can fail with EFBIG
MFC r246651: Import minor type change in refcount.h header from vendor (illumos).
MFC r246666: Import vendor ZFS bugfix fixing a problem in arc_read().
Illumos ZFS issues: 3498 panic in arc_read(): !refcount_is_zero(&pbuf->b_hdr->b_refcnt)
MFC r246675: Add tunable to allow block allocation on degraded vdevs.
Illumos ZFS issues: 3507 Tunable to allow block allocation even on degraded vdevs
MFC r246678: Import vendor bugfixes regarding SA rounding, header size and layout. This was already partially fixed by avg.
Illumos ZFS issues: 3512 rounding discrepancy in sa_find_sizes() 3513 mismatch between SA header size and layout
MFC r246688 [1]: Merge zfs_ioctl.c code that should have been merged together with ZFS v28. Fixes several problems if working with read-only pools.
Changed code originally introduced in onnv-gate 13061:bda0decf867b. Contains changes up to illumos-gate 13700:4bc0783f6064.
PR: kern/175897 [1] Suggested by: avg [1]
|
247309 |
26-Feb-2013 |
delphij |
MFC r246586,246587,246619,246624,246768,246808:
LZ4 compression support in ZFS.
(Note: sys/conf/files change omitted from this changeset).
|
246574 |
09-Feb-2013 |
delphij |
MFC r245264:
The current ZFS code expects ddt_zap_count to always succeed by asserting that the underlying zap_count() returns no errors. However, it is possible for the pool to reach a state where zap_count would return an error, leading to panics when a pool is imported.
This commit changes ddt_zap_count to return the error returned by zap_count and handles the error appropriately. With this change, it's now possible to let zpool roll back damaged transaction groups and import the pool.
Obtained from: ZFS on Linux github (e8fd45a0f975c6b8ae8cd644714fc21f14fac2bf)
|
246534 |
08-Feb-2013 |
avg |
MFC r246242: zfs: add MODULE_VERSION for zfsctrl
|
246240 |
02-Feb-2013 |
avg |
MFC r245945: spa_generate_rootconf: add support for old vdev labels
|
246200 |
01-Feb-2013 |
delphij |
MFC r245511:
Improve the comment in txg.c
Obtained from: Illumos (13910:f3454e0a097c)
|
245693 |
20-Jan-2013 |
avg |
MFC r243518: add zfs_bmap to aid vnode_pager_haspage
|
245692 |
20-Jan-2013 |
avg |
MFC r243763: zfs_getpages: make use of vm_page_readahead_finish
|
245691 |
20-Jan-2013 |
avg |
MFC r243517: zfs_getpages: optimize for large block sizes
|
245664 |
19-Jan-2013 |
kib |
MFC r245409: For zfs vnodes, use the standard inode number based hash algorithm.
|
244636 |
23-Dec-2012 |
avg |
MFC r244635: zfs: solaris doesn't have KM_ZERO, kmem_zalloc should be used instead
|
244626 |
23-Dec-2012 |
avg |
MFC r243520,243521: zfs: overhaul zfs-vfs glue for vnode life-cycle management
|
244624 |
23-Dec-2012 |
avg |
MFC r242567: zfs_mount: drop vfs.zfs.rootpool.prefer_cached_config tunable
|
244622 |
23-Dec-2012 |
avg |
MFC r243502: zfs rootpool: add support for multi-vdev configurations
|
244613 |
23-Dec-2012 |
avg |
MFC r243519: zfs_fhtovp: there is no reason to amend lock flags with LK_RETRY here
|
244611 |
23-Dec-2012 |
avg |
MFC r243497: zfs: create devices/geoms from zvols after receiving them
PR: kern/167066
|
244344 |
17-Dec-2012 |
delphij |
MFC r243807:
Use SA_ZPL_CRTIME instead of SA_ZPL_CTIME for creation time.
Submitted by: phil.stone at gmx.com
|
244087 |
10-Dec-2012 |
mm |
MFC recent ZFS changes from illumos: 243503, 243524, 243525, 243560, 243561
MFC r243503: Illumos 13879:4eac7a87eff2 3329 spa_sync() spends 10-20% of its time in spa_free_sync_cb() 3330 space_seg_t should have its own kmem_cache 3331 deferred frees should happen after sync_pass 1 3335 make SYNC_PASS_* constants tunable
New loader-only tunables: vfs.zfs.sync_pass_deferred_free vfs.zfs.sync_pass_dont_compress vfs.zfs.sync_pass_rewrite
References: https://www.illumos.org/issues/3329 https://www.illumos.org/issues/3330 https://www.illumos.org/issues/3331 https://www.illumos.org/issues/3335
MFC r243524: Import the zio nop-write improvement from Illumos. To reduce I/O, nop-write omits overwriting data if the checksum (cryptographically secure) of new data matches the checksum of existing data. It also saves space if snapshots are in use.
It currently works only on datasets with enabled compression, disabled deduplication and sha256 checksums.
IllumOS 13887:196932ec9e6a and 13888:7204b3392a58 3236 zio nop-write
References: https://www.illumos.org/issues/3236
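The nop-write conditions listed for r243524 can be sketched in Python. This is an illustrative model, not the ZFS implementation: the function names are hypothetical, and the sketch hashes the raw data, whereas the real code works on blocks as they are stored.

```python
import hashlib


def nopwrite_applies(compression: bool, dedup: bool, checksum: str) -> bool:
    """Dataset-level conditions named in the commit message."""
    return compression and not dedup and checksum == "sha256"


def can_skip_write(old_checksum: bytes, new_data: bytes,
                   compression: bool, dedup: bool, checksum: str) -> bool:
    """True if the overwrite can be omitted because the data is unchanged."""
    if not nopwrite_applies(compression, dedup, checksum):
        return False
    # A cryptographically secure checksum makes a match a safe equality test.
    return hashlib.sha256(new_data).digest() == old_checksum
```

Because the existing block is left in place when the checksums match, a snapshot that references it does not diverge from the live dataset, which is where the space saving comes from.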
MFC r243525: Add loader(8) tunable to enable/disable nopwrite functionality: vfs.zfs.nopwrite_enabled
MFC r243560: Introduce a new dataset aclmode setting "restricted" to protect ACLs from being destroyed or corrupted by a drive-by chmod.
illumos-gate 13889:a67716f16746 3254 add support in zfs for aclmode=restricted
MFC r243561: Update manpage dates in zfs.8 and zpool.8
|
243813 |
03-Dec-2012 |
delphij |
MFC r242332:
s/dettach/detach/g
Approved by: pjd
|
243777 |
01-Dec-2012 |
avg |
MFC r243270: zfs_remove: assert that delete_now case is never true on FreeBSD
|
243775 |
01-Dec-2012 |
avg |
MFC r243268: zfs_remove: set VV_NOSYNC flag if a node is unlinked
|
243773 |
01-Dec-2012 |
avg |
MFC r243501: spa_import_rootpool: initialize ub_version before calling spa_config_parse
|
243771 |
01-Dec-2012 |
avg |
MFC r243500: spa_import_rootpool: do not call spa_history_log_version
|
243767 |
01-Dec-2012 |
avg |
MFC r242575: zfs_dirlook: bailout early if directory is unlinked
|
243674 |
29-Nov-2012 |
mm |
Merge ZFS feature flags support and related bugfixes: 236884, 237001, 237119, 237458, 237972, 238113, 238391, 238422, 238926, 238950, 238951, 239389, 239394, 239620, 239774, 239953, 239958, 239967, 239968, 240063, 240133, 240153, 240303, 240345, 240415, 240955, 241655, 243014, 243505, 243506
MFC r236884: Introduce "feature flags" for ZFS pools (bump SPA version to 5000). Add first feature "com.delphix:async_destroy" (asynchronous destroy of ZFS datasets). Implement features support in ZFS boot code.
Illumos revisions merged: 13700:2889e2596bd6 13701:1949b688d5fb 2619 asynchronous destruction of ZFS file systems 2747 SPA versioning with zfs feature flags
References: https://www.illumos.org/issues/2619 https://www.illumos.org/issues/2747
MFC r237001: Fix ZFS boot with pre-features pools (version <= 28) broken in r236884
MFC r237119 [1]: Do not remount ZFS dataset if changing canmount property to "on" and dataset is already mounted.
MFC r237458: Import Illumos revision 13736:9f1d48e1681f 2901 ZFS receive fails for exabyte sparse files
References: https://www.illumos.org/issues/2901
MFC r237972: Expose scrub and resilver tunables. This allows the user to tune the priority trade-off between scrub/resilver and other ZFS I/O.
MFC r238113 (pjd): vdev_io_done stage is not used for ioctls.
MFC r238391: Change behavior introduced in r237119 to vendor solution
References: https://www.illumos.org/issues/2883
MFC r238422: Merge illumos commit 13749:df4cd82e2b60
1796 "ZFS HOLD" should not be used when doing "ZFS SEND" from a read-only pool 2871 support for __ZFS_POOL_RESTRICT used by ZFS test suite 2903 zfs destroy -d does not work 2957 zfs destroy -R/r sometimes fails when removing defer-destroyed snapshot
References: https://www.illumos.org/issues/1796 https://www.illumos.org/issues/2871 https://www.illumos.org/issues/2903 https://www.illumos.org/issues/2957
MFC r238926: Partial MFV (illumos-gate 13753:2aba784c276b) 2762 zpool command should have better support for feature flags
References: https://www.illumos.org/issues/2762
MFC r238950: Fix reporting of root pool upgrade notice.
MFC r238951: Fix wrong indent according to style(9)
MFC r239389: Backport fix for vendor issue #3085 3085 zfs diff panics, then panics in a loop on booting
References: https://www.illumos.org/issues/3085
MFC r239394: Update zfs(8) manpage with illumos version of "zfs diff"
Illumos issue: 2399 zfs manual page does not document use of "zfs diff"
References: https://www.illumos.org/issues/2399
MFC r239620 [2]: Merge recent vendor changes: 3086 unnecessarily setting DS_FLAG_INCONSISTENT on async destroyed datasets 3090 vdev_reopen() during reguid causes vdev to be treated as corrupt 3102 vdev_uberblock_load() and vdev_validate() may read the wrong label
References: https://www.illumos.org/issues/3086 https://www.illumos.org/issues/3090 https://www.illumos.org/issues/3102
MFC r239774: Merge recent vendor changes: 3100 zvol rename fails with EBUSY when dirty 3104 eliminate empty bpobjs 3120 zinject hangs in zfsdev_ioctl() due to uninitialized zc
References: https://www.illumos.org/issues/3100 https://www.illumos.org/issues/3104 https://www.illumos.org/issues/3120
MFC r239953 (joel): Mdoc fixes.
MFC r239958 (joel): Minor mdoc fixes.
MFC r239967 (joel): Mdoc fixes.
MFC r239968 (joel): Remove trailing whitespace.
MFC r240063 (gjb): Add myself to copyright sections, per CDDL license.
MFC r240133: Merge recent vendor changes and sync code: 1862 incremental zfs receive fails for sparse file > 8PB 3112 ztest does not honor ZFS_DEBUG 3122 zfs destroy filesystem should prefetch blocks 3129 'zpool reopen' restarts resilvers 3130 ztest failure: Assertion failed: 0 == dmu_objset_destroy(name, B_FALSE) (0x0 == 0x10)
References: https://www.illumos.org/issues/1862 https://www.illumos.org/issues/3112 https://www.illumos.org/issues/3122 https://www.illumos.org/issues/3129 https://www.illumos.org/issues/3130
MFC r240153 (gjb) [3]: Typo fix and minor word swap.
MFC r240303: Add assfail() and assfail3() to the opensolaris module. Remove obsoleted intermediate cddl/compat/opensolaris/sys/debug.h.
MFC r240345 (avg): zfs: fix sa_modify_attrs handling of variable-sized attributes
- skip length_idx index for a replaced variable-sized attribute - skip length_idx index for a removed variable-sized attribute - also re-arranged code to make sure that length_idx is always incremented for variable-sized attributes - additionally add an assertion that the number of actually produced attributes is the same as the expected number of resulting attributes
MFC r240415: Merge recent zfs vendor changes, sync code and adjust userland DEBUG.
Illumos issues covered: 1884 Empty "used" field for zfs *space commands 3006 VERIFY[S,U,P] and ASSERT[S,U,P] frequently check if first argument is zero 3028 zfs {group,user}space -n prints (null) instead of numeric GID/UID 3048 zfs {user,group}space [-s|-S] is broken 3049 zfs {user,group}space -t doesn't really filter the results 3060 zfs {user,group}space -H output isn't tab-delimited 3061 zfs {user,group}space -o doesn't use specified fields order 3064 usr/src/cmd/zpool/zpool_main.c misspells "successful" 3093 zfs {user,group}space's -i is noop 3098 zfs userspace/groupspace fail without saying why when run as non-root
References: https://www.illumos.org/issues/ + [issue_id]
MFC r240955 (partial): Merge recent vendor changes in ZFS.
Illumos issues covered: 3139 zdb dies when it tries to determine path of unlinked file 3189 kernel panic in ZFS test suite during hotspare_onoffline_004_neg 3208 moving zpool cross-endian results in incorrect user/group accounting
References: https://www.illumos.org/issues/ + [issue_id]
MFC r241655: Add missing initialization for do_prefix. Corrects porting error in r238391
Vendor issue and changeset reference: 2883 changing "canmount" property to "on" should not always remount dataset https://www.illumos.org/issues/2883 Changeset 13743:95aba6e49b9f
MFC r243014: Move zpool-features manual page from section 5 to section 7 and fix references
Reported by: pluknet
MFC r243505: Illumos 13886:e3261d03efbf
3349 zpool upgrade -V bumps the on disk version number, but leaves the in core version
References: https://www.illumos.org/issues/3349
MFC r243506: zfs sha256 checksum is missing in zfs.8 manpage
PR: kern/167905 [1], kern/170912 [2], kern/170914 [2], doc/171356 [3]
|
243495 |
24-Nov-2012 |
avg |
MFC r242574: zfsctl_snapdir_lookup: obtain a snapname in the remount case
|
243493 |
24-Nov-2012 |
avg |
MFC r242573: zfs: set MNTK_EXTENDED_SHARED flag
|
243488 |
24-Nov-2012 |
avg |
MFC r242571: zfs_vnode_forget: dispose of larvae vnode using public vfs api (mostly)
|
243486 |
24-Nov-2012 |
avg |
MFC r242570: zfs_umount: no need to set MNTK_UNMOUNTF here, dounmount handles that
|
243482 |
24-Nov-2012 |
avg |
MFC r242568: zfs_vnode_lock: no need to double-guess caller's intentions here
|
243480 |
24-Nov-2012 |
avg |
MFC r243213: spa_import_rootpool: fall back to use configuration from zpool.cache
|
243478 |
24-Nov-2012 |
avg |
MFC r242862: zfs_ioc_destroy_snaps_nvl: remove disk device entries for zvol snapshots
|
243214 |
18-Nov-2012 |
avg |
MFC r242566: zfs_freebsd_close: call zfs_close with count=1 instead of count=0
|
242858 |
10-Nov-2012 |
avg |
MFC r241773: zfs: wait in arc_lowmem only if curproc == pageproc
|
242554 |
04-Nov-2012 |
avg |
MFC r241286,r242135: zfs_mount: taste geom providers for root pool config
|
242240 |
28-Oct-2012 |
avg |
MFC r241628: zfs: make use of getnewvnode_reserve in zfs_mknode and zfs_zget
|
241769 |
20-Oct-2012 |
avg |
MFC r241297: zvol: set mediasize in geom provider right upon its creation
|
241634 |
17-Oct-2012 |
avg |
MFC r240831: zfs: allow a zvol to be used as a pool vdev, again
|
241270 |
06-Oct-2012 |
avg |
MFC r240632: zfs: correctly calculate dn_bonuslen for saving SAs to disk
|
241268 |
06-Oct-2012 |
avg |
MFC r240631: zfs: allow both DEBUG and ZFS_DEBUG to be defined on command line
|
241262 |
06-Oct-2012 |
avg |
MFC r240345: zfs: fix sa_modify_attrs handling of variable-sized attributes
|
240957 |
26-Sep-2012 |
mm |
MFC r236248-r236250:
MFC r236248 (pjd): Remove unused variable.
MFC r236249 (pjd): Eliminate 'where' argument, we don't use it.
MFC r236250 (pjd): Tighten up the assertion: because size can't be 0 and even if sm_space is equal to sm_size, any 'sm_space - size' will be less than sm_size.
|
237959 |
02-Jul-2012 |
mm |
MFC r236247 (pjd): Remove unused sysctl.
|
237746 |
29-Jun-2012 |
mm |
MFC r237458: Import Illumos revision 13736:9f1d48e1681f 2901 ZFS receive fails for exabyte sparse files
References: https://www.illumos.org/issues/2901
Obtained from: illumos (issue #2901)
|
237643 |
27-Jun-2012 |
mm |
MFC r236823 (pjd):
ds_guid of 0 is special, as it is used by snapshot receive code to differentiate between an incremental and full stream. Be sure not to generate guid equal to 0.
Reported by: someone who saw 0 being generated as 64bit random guid
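The fix described for r236823 amounts to retrying until the random value is non-zero. A minimal Python sketch follows, with a hypothetical function name and an injectable random source so the zero case can be exercised; the kernel uses its own random number routine.

```python
import secrets
from typing import Callable


def generate_ds_guid(rand64: Callable[[], int] = lambda: secrets.randbits(64)) -> int:
    """Return a random 64-bit guid, retrying while the value is 0."""
    guid = rand64()
    while guid == 0:      # 0 is reserved by the snapshot receive code
        guid = rand64()
    return guid
```

A zero result is a 1 in 2^64 event per draw, so the loop almost never iterates; it exists only to rule the reserved value out.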
|
236839 |
10-Jun-2012 |
mm |
MFC r236155: Import illumos changeset 13570:3411fd5f1589 1948 zpool list should show more detailed pool information
Display per-vdev information with "zpool list -v". The added expandsize property has currently no value on FreeBSD. This changeset allows adding expansion support to individual vdevs in the future.
References: https://www.illumos.org/issues/1948
Obtained from: illumos (issue #1948)
|
236636 |
05-Jun-2012 |
trasz |
MFC r235781:
Fix enforcement of file size limit with O_APPEND on ZFS.
vn_rlimit_fsize takes uio->uio_offset and uio->uio_resid into account when determining whether given write would exceed RLIMIT_FSIZE.
When APPEND flag is specified, ZFS updates uio->uio_offset to point to the end of file.
But this happens after a call to vn_rlimit_fsize, so vn_rlimit_fsize check can be rendered ineffective by thread that opens some file with O_APPEND and lseeks below RLIMIT_FSIZE before calling write.
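The ordering problem described for r235781 can be modelled in a few lines of Python. The function names below are hypothetical and the model is far simpler than the real vn_rlimit_fsize; it only shows why the check must see the offset after the O_APPEND adjustment.

```python
def exceeds_fsize_limit(offset: int, resid: int, limit: int) -> bool:
    """Would writing resid bytes at offset grow the file past limit?"""
    return offset + resid > limit


def write_allowed_old(file_size: int, offset: int, resid: int,
                      limit: int, append: bool) -> bool:
    """Old order: check the limit first, then move the offset to end of file."""
    if exceeds_fsize_limit(offset, resid, limit):
        return False
    if append:
        offset = file_size   # too late: the check already used the old offset
    return True


def write_allowed_fixed(file_size: int, offset: int, resid: int,
                        limit: int, append: bool) -> bool:
    """Fixed order: move the offset to end of file first, then check."""
    if append:
        offset = file_size
    return not exceeds_fsize_limit(offset, resid, limit)
```

With a 1000-byte file, a 1000-byte limit, and a descriptor opened with O_APPEND and seeked to offset 0, a 100-byte write passes the old check but is rejected by the fixed one.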
|
236505 |
03-Jun-2012 |
mm |
MFC r236145, r236146:
MFC r236145 [1]: Import illumos changeset 13564:cf89c0c60496 1946 incorrect formatting when listing output of multiple pools with zpool iostat -v
MFC r236146 [2]: Import illumos changeset 13605:b5c2b5db80d6 (partial) 763 FMD msg URLs should refer to something visible Replace sun.com URL's with illumos.org
References: https://www.illumos.org/issues/1946 [1] https://www.illumos.org/issues/763 [2]
Obtained from: illumos (issue #1946 [1], #763 [2])
|
235951 |
25-May-2012 |
mm |
MFC r235222: Import illumos changeset 13686:4bc0783f6064 2703 add mechanism to report ZFS send progress
If the zfs send command is used with the -v flag, the amount of bytes transmitted is reported in per second updates.
References: https://www.illumos.org/issues/2703
Obtained from: illumos (issue #2703)
|
234211 |
13-Apr-2012 |
avg |
MFC r233918: zfs_ioctl: no need for ddi_copyin/out here because sys_ioctl handles that
|
232728 |
09-Mar-2012 |
mm |
Jail-mount MFC: r231265,r231267,r231269,r232059,r232186,r232247, r232278,r232307,r232342
MFC r231265: Introduce the "ruleset=number" option for devfs(5) mounts. Add support for updating the devfs mount (currently only changing the ruleset number is supported). Check mnt_optnew with vfs_filteropt(9).
This new option sets the specified ruleset number as the active ruleset of the new devfs mount and applies all its rules at mount time. If the specified ruleset doesn't exist, a new empty ruleset is created.
MFC r231267 [1]: Add support for mounting devfs inside jails.
A new jail(8) option "devfs_ruleset" defines the ruleset enforcement for mounting devfs inside jails. A value of -1 disables mounting devfs in jails, a value of zero means no restrictions. Nested jails can only have mounting devfs disabled or inherit parent's enforcement as jails are not allowed to view or manipulate devfs(8) rules.
Utilizes new functions introduced in r231265.
MFC r231269: Allow mounting nullfs(5) inside jails.
This is now possible thanks to r230129.
MFC r232059 [1]: To improve control over the use of mount(8) inside a jail(8), introduce a new jail parameter node with the following parameters:
allow.mount.devfs: allow mounting the devfs filesystem inside a jail
allow.mount.nullfs: allow mounting the nullfs filesystem inside a jail
Both parameters are disabled by default (equals the behavior before devfs and nullfs in jails). Administrators have to explicitly allow mounting devfs and nullfs for each jail. The value "-1" of the devfs_ruleset parameter is removed in favor of the new allow setting.
MFC r232186: Analogous to r232059, add a parameter for the ZFS file system:
allow.mount.zfs: allow mounting the zfs filesystem inside a jail
This way the permissions for mounting all current VFCF_JAIL filesystems inside a jail are controlled via allow.mount.* jail parameters.
Update sysctl descriptions. Update jail(8) and zfs(8) manpages.
MFC r232247: mdoc(7) style - start new sentences on new line
MFC r232278 [1]: Add procfs to jail-mountable filesystems.
MFC r232291: Bump .Dd to reflect latest update
MFC r232307: Add "export" to devfs_opts[] and return EOPNOTSUPP if called with it. Fixes mountd warnings.
MFC r232342 (jamie) [2]: Handle the case where a boolean parameter is also a node.
PR: bin/165515 [2] Reviewed by: jamie [1]
|
232066 |
23-Feb-2012 |
kmacy |
MFC r230623
Exclude kmem_alloc'ed ARC data buffers from kernel minidumps on amd64. Excluding other allocations, including UMA, now entails the addition of a single flag to kmem_alloc or uma zone create.
Reviewed by: alc,avg
|
231946 |
20-Feb-2012 |
mm |
MFC r230397, r230438:
MFC r230397 (pjd): By default turn off prefetch when listing snapshots. In my tests it makes listing snapshots 19% faster with cold cache and 47% faster with warm cache.
MFC r230438 (pjd): Dramatically optimize listing snapshots when the user requests only snapshot names and wants to sort them by name, i.e. when executing:
# zfs list -t snapshot -o name -s name
Because only name is needed we don't have to read all snapshot properties.
Below you can find how long it takes to list 34509 snapshots from a single disk pool before and after this change with cold and warm cache:
before:
# time zfs list -t snapshot -o name -s name > /dev/null
cold cache: 525s
warm cache: 218s
after:
# time zfs list -t snapshot -o name -s name > /dev/null
cold cache: 1.7s
warm cache: 1.1s
|
231141 |
07-Feb-2012 |
mm |
MFC r230514: Merge illumos revisions 13572, 13573, 13574:
Rev. 13572: disk sync write perf regression when slog is used post oi_148 [1]
Rev. 13573: crash during reguid causes stale config [2] allow and unallow missing from zpool history since removal of pyzfs [5]
Rev. 13574: leaking a vdev when removing an l2cache device [3] memory leak when adding a file-based l2arc device [4] leak in ZFS from metaslab_group_create and zfs_ereport_checksum [6]
References: https://www.illumos.org/issues/1909 [1] https://www.illumos.org/issues/1949 [2] https://www.illumos.org/issues/1951 [3] https://www.illumos.org/issues/1952 [4] https://www.illumos.org/issues/1953 [5] https://www.illumos.org/issues/1954 [6]
Obtained from: illumos (issues #1909, #1949, #1951, #1952, #1953, #1954)
|
230497 |
24-Jan-2012 |
pluknet |
MFC r230256: Fix the "lock &zrl->zr_mtx already initialized" assertion by initializing the allocated memory before calling mtx_init(9) on mtx pointing to it. Otherwise, random contents of uninitialized memory might occasionally trigger the assertion.
Reported by: Pavel Polyakov <bsd kobyla org> Reviewed by: pjd
|
229925 |
10-Jan-2012 |
dim |
MFC r229425:
In sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c, check the number of links against LINK_MAX (which is INT16_MAX), not against UINT32_MAX. Otherwise, the constant would implicitly be converted to -1.
Reviewed by: pjd
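The implicit conversion mentioned in r229425 can be reproduced in Python with ctypes, whose integer types wrap silently in the same way. This sketch assumes the constant ends up in a signed 32-bit integer; the exact C types involved in zfs_vnops.c are not stated in the commit message, and `too_many_links` is a hypothetical helper.

```python
import ctypes

INT16_MAX = 0x7FFF
UINT32_MAX = 0xFFFFFFFF
LINK_MAX = INT16_MAX   # value stated in the commit message

# ctypes performs no overflow checking, mirroring an implicit C conversion.
as_signed32 = ctypes.c_int32(UINT32_MAX).value
print(as_signed32)


def too_many_links(nlinks: int, limit: int) -> bool:
    """Model of a link-count check against an upper limit."""
    return nlinks >= limit
```

Once the limit has become -1, every non-negative link count compares as greater than or equal to it, so the comparison no longer expresses a meaningful upper bound.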
|
229703 |
06-Jan-2012 |
kib |
MFC r227697: Change the interface for VOP_VPTOCNP(), now the dvp must be referenced. Convert all in-tree implementations of VOP_VPTOCNP(). This fixes VOP_VPTOCNP bypass for nullfs.
Approved by: re (bz)
|
229578 |
05-Jan-2012 |
mm |
MFC r228103, r228104:
MFC r228103: Merge new ZFS features from illumos:
1644 add ZFS "clones" property https://www.illumos.org/issues/1644
1645 add ZFS "written" and "written@..." properties https://www.illumos.org/issues/1645
1646 "zfs send" should estimate size of stream https://www.illumos.org/issues/1646
1647 "zfs destroy" should determine space reclaimed by destroying multiple snapshots https://www.illumos.org/issues/1647
1693 persistent 'comment' field for a zpool https://www.illumos.org/issues/1693
1708 adjust size of zpool history data https://www.illumos.org/issues/1708
1748 desire support for reguid in zfs https://www.illumos.org/issues/1748
MFC r228104: Fix typo in copyright notice.
Obtained from: illumos (changesets 13514, 13524, 13525)
|
229568 |
05-Jan-2012 |
mm |
MFC r228363, r228392:
MFC r228363 (pjd): The vfs.zfs.txg.timeout sysctl can be safely modified at run time.
MFC r228392 (pjd) [1]: Move ru_inblock increment into arc_read_nolock() so we don't account for cached reads.
Discussed with: gibbs No objections from: avg Tested by: Marcus Reid <marcus@blazingdot.com> [1] Approved by: pjd
|
229565 |
05-Jan-2012 |
mm |
MFC r226676, r226678, r226700, r226705, r226706, r226707:
MFC r226676 (pjd): Allow renaming file systems without remounting when possible. This is possible for file systems with the 'mountpoint' property set to 'legacy' or 'none' - we don't have to change the mount directory for them. Currently such file systems are unmounted on rename and not even mounted back.
This introduces layering violation, as we need to update 'f_mntfromname' field in statfs structure related to mountpoint (for the dataset we are renaming and all its children).
In my opinion it is worth it, as it allows updating FreeBSD in an even cleaner way - in a ZFS-only configuration the root file system is a ZFS file system with the 'mountpoint' property set to 'legacy'. If the root dataset is named system/rootfs, we can snapshot it (system/rootfs@upgrade), clone it (system/oldrootfs), update FreeBSD and if it doesn't boot we can boot back from system/oldrootfs and rename it back to system/rootfs while it is mounted as /. Before it was not possible, because unmounting / was not possible.
MFC r227768 (pjd): Include <sys/zfs_vfsops.h> only when compiling kernel module.
MFC r226700 (pjd): Don't forget to rename mounted snapshots of the file system being renamed.
MFC r226705 (pjd): Extend r226676 to allow rename without unmount even for file systems with non-legacy mountpoints. It is better to be able to rename such file systems and let them be mounted in old places until next reboot than using live CD, etc. to rename with remount.
This is implemented by adding -u option to 'zfs rename'. If file system's mountpoint property is set to 'legacy' or 'none', there is no need to specify -u.
Update zfs(8) manual page to reflect this addition.
MFC r226706 (pjd): Update copyright to include myself.
MFC r226707 (pjd): - Use better naming now that we allow to rename any mounted file system (not only legacy). - Update copyright to include myself.
Approved by: pjd
|
227923 |
24-Nov-2011 |
pjd |
MFC r227110,r227111:
r227110:
In zvol_open() if the spa_namespace_lock is already held, it means that ZFS is trying to open and taste ZVOL as its VDEV. This is not supported, so return an error instead of panicking on spa_namespace_lock recursion.
Reported by: Robert Millan <rmh@debian.org> PR: kern/162008
r227111:
Correct typo in comment.
Reported by: Fabian Keil <fk@fabiankeil.de>
Approved by: re (kib)
|
227704 |
19-Nov-2011 |
pjd |
MFC r226617:
zfs vdev_file_io_start: validate vdev before using vdev_tsd
vdev_tsd can be NULL for certain vdev states. At least in userland testing with ztest.
Submitted by: avg Approved by: re (kib)
|
227702 |
19-Nov-2011 |
pjd |
MFC r226620:
Update per-thread I/O statistics collection in ZFS. This makes per-process I/O activity visible in 'top -m io' output.
PR: kern/156218 Reported by: Marcus Reid <marcus@blazingdot.com> Patch by: avg Approved by: re (kib)
|
226944 |
30-Oct-2011 |
mm |
MFC r226512:
Import fix for Illumos bug #1475 to reduce diff against upstream.
Panic caused by this bug was already partially fixed by pjd@ in p4 CH 185940 and 185942.
Reference: 1475 zfs spill block hold can access invalid spill blkptr https://www.illumos.org/issues/1475
Reviewed by: delphij Obtained from: Illumos (issue 1475, changeset 13469:b8e89e5c4167) Approved by: re (kib)
|
226943 |
30-Oct-2011 |
mm |
MFC r226724, r226732:
MFC r226724: Update copyright information in several ZFS files, as the clause 3.3 of the CDDL licence explicitly requires every Contributor to add a copyright notice.
This also reflects the copyright notices for the changes recently added by Illumos.
MFC r226732: [1] Fix typo in copyright notice introduced in r226724 (missing character in e-mail address)
Reported by: pjd [1] Approved by: re (kib)
|
226581 |
20-Oct-2011 |
delphij |
MFC r226483:
Fix a bug in sa_find_sizes() which could lead to panic: When calculating space needed for SA_BONUS buffers, hdrsize is always rounded up to next 8-aligned boundary. However, in two places the round up was done against sum of 'total' plus hdrsize. On the other hand, hdrsize increments by 4 each time, which means in certain conditions, we would end up returning with will_spill == 0 and (total + hdrsize) larger than full_space, leading to a failed assertion because it's invalid for dmu_set_bonus.
Sponsored by: iXsystems, Inc. Reviewed by: mm Approved by: re (kib)
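The rounding discrepancy described above can be illustrated with a small arithmetic sketch. The function names and the sample sizes are hypothetical and are not taken from sa.c; the sketch only assumes that the header alone is padded to an 8-byte boundary before the attribute data is added.

```python
def roundup8(n):
    """Round n up to the next multiple of 8."""
    return (n + 7) & ~7


def needed_correct(total, hdrsize):
    # The header alone is padded to an 8-byte boundary, then data follows.
    return total + roundup8(hdrsize)


def needed_buggy(total, hdrsize):
    # Rounding the sum instead can underestimate the space needed
    # whenever total is not itself a multiple of 8.
    return roundup8(total + hdrsize)
```

With total = 20 and hdrsize = 12, rounding the sum yields 32 while the padded layout needs 36, so a check against a 32-byte limit would wrongly conclude that no spill is required.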
|
225736 |
23-Sep-2011 |
kensmith |
Copy head to stable/9 as part of 9.0-RELEASE release cycle.
Approved by: re (implicit)
|
225418 |
06-Sep-2011 |
kib |
Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic flags field. Updates to the atomic flags are performed using the atomic ops on the containing word, do not require any vm lock to be held, and are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9) functions are provided to modify aflags.
Document the changes to flags field to only require the page lock.
Introduce vm_page_reference(9) function to provide a stable KPI and KBI for filesystems like tmpfs and zfs which need to mark a page as referenced.
Reviewed by: alc, attilio Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64) Approved by: re (bz)
|
225166 |
25-Aug-2011 |
mm |
Generalize ffs_pages_remove() into vn_pages_remove().
Remove mapped pages for all dataset vnodes in zfs_rezget() using new vn_pages_remove() to fix mmapped files changed by zfs rollback or zfs receive -F.
PR: kern/160035, kern/156933 Reviewed by: kib, pjd Approved by: re (kib) MFC after: 1 week
|
225153 |
24-Aug-2011 |
pjd |
We need to unlock and destroy vnode attached to znode which we are freeing.
Reviewed by: kib Approved by: re (bz) MFC after: 1 week
|
224855 |
13-Aug-2011 |
mm |
zfs_ioctl.c: improve code readability in zfs_ioc_dataset_list_next()
zvol.c: fix calling of dmu_objset_prefetch() in zvol_create_minors() by passing full instead of relative dataset name and prefetching all visible datasets to be processed later instead of just the pool name
Reviewed by: pjd Approved by: re (kib) MFC after: 1 week
|
224814 |
13-Aug-2011 |
mm |
Fix race between dmu_objset_prefetch() invoked from zfs_ioc_dataset_list_next() and dsl_dir_destroy_check() indirectly invoked from dmu_recv_existing_end() via dsl_dataset_destroy() by not prefetching temporary clones, as these count as always inconsistent. In addition, do not prefetch hidden datasets at all as we are not going to process these later.
Filed as Illumos Bug #1346
PR: kern/157728 Tested by: Borja Marcos <borjam@sarenet.es>, mm Reviewed by: pjd Approved by: re (kib) MFC after: 1 week
|
224791 |
12-Aug-2011 |
pjd |
Eliminate the zfsdev_state_lock entirely and replace it with the spa_namespace_lock. This fixes LOR between the spa_namespace_lock and spa_config lock. LOR can cause deadlock on vdevs removal/insertion.
Reported by: gibbs, delphij Tested by: delphij Approved by: re (kib) MFC after: 1 week
|
224605 |
02-Aug-2011 |
mm |
Fix panic in zfs_read() if IO_SYNC flag supplied by checking for zfsvfs->z_log before calling zil_commit(). [1] Do not call zfs_read() from zfs_getextattr() with the IO_SYNC flag.
Submitted by: Alexander Zagrebin <alex@zagrebin.ru> [1] Reviewed by: pjd@ Approved by: re (kib) MFC after: 3 days
|
224579 |
01-Aug-2011 |
mm |
Fix integer overflow in txg_delay() by initializing the variable "timeout" as clock_t.
Filed as Illumos Bug #1313
Reviewed by: avg Approved by: re (kib) MFC after: 3 days
|
224526 |
30-Jul-2011 |
mm |
Fix serious bug in ZIL that can lead to pool corruption in the case of a held dataset during remount.
Detailed description is available at: https://www.illumos.org/issues/883
illumos-gate revision: 13380:161b964a0e10
Reviewed by: pjd Approved by: re (kib) Obtained from: Illumos (Bug #883) MFC after: 3 days
|
224252 |
21-Jul-2011 |
delphij |
Bring the code more in-line with OpenSolaris source to ease future port.
Reviewed by: pjd, mm Approved by: re (kib)
|
224251 |
21-Jul-2011 |
delphij |
A different implementation of r224231 proposed by pjd@, which does not require change in the znode structure. Specifically, it queries rdev from the znode in the same sa_bulk_lookup already done in zfs_getattr().
Submitted by: pjd (with some revisions) Reviewed by: pjd, mm Approved by: re (kib)
|
224231 |
20-Jul-2011 |
delphij |
Add a new field to in-core znode, z_rdev, to represent device nodes.
PR: kern/159010 Reviewed by: mm@ Approved by: re (kib) MFC after: 2 weeks
|
224177 |
18-Jul-2011 |
mm |
ZFS tries to allocate blocks evenly across all devices. This means when devices are imbalanced zfs will spend lots of CPU searching for space on devices which tend to be pretty full. It should instead fail quickly on the full devices and move onto devices which have more availability.
New loader tunable: vfs.zfs.mg_alloc_failures (min = 8)
Illumos-gate changeset: 13379:4df42cc92254
Obtained from: Illumos (Bug #1051) MFC after: 2 weeks
|
224174 |
18-Jul-2011 |
mm |
Resurrect the ZFS "aclmode" property Change default of "aclmode" to "discard".
Illumos-gate changeset: 13370:8c04143bd318
Obtained from: Illumos (Feature #742) MFC after: 2 weeks
|
223623 |
28-Jun-2011 |
mm |
Add a new "REFCOMPRESSRATIO" property.
For snapshots, this is the same as COMPRESSRATIO, but for filesystems/volumes, the COMPRESSRATIO is based on the data "USED" (ie, includes blocks in children, but not blocks shared with the origin).
This is needed to figure out how much space a filesystem would use if it were not compressed (ignoring snapshots).
Illumos-gate revision: 13387
Obtained from: Illumos (Feature #1092) MFC after: 2 weeks
|
223622 |
28-Jun-2011 |
mm |
Disable vdev cache (readahead) by default.
The vdev cache is very underutilized (hit ratio 30%-70%) and may consume excessive memory on systems with many vdevs.
Illumos-gate revision: 13346
Obtained from: Illumos (Bug #175) MFC after: 1 week
|
222950 |
10-Jun-2011 |
gibbs |
Remove C constructs that are incompatible with C++ from various OpenSolaris and ZFS header files. These changes are sufficient to allow a C++ program to use the libzfs library.
Note: The majority of these files already included 'extern "C"' declarations, so the intention of providing C++ compatibility already existed even if it wasn't provided.
cddl/compat/opensolaris/include/assert.h: Wrap our compatibility assert implementation in 'extern "C"'. Since this is a compatibility header I matched the Solaris style of doing this explicitly rather than rely on FreeBSD's __BEGIN/END_DECLS macro.
sys/cddl/compat/opensolaris/sys/kstat.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/arc.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_pool.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/ddt.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h: Rename parameters in function declarations that conflict with C++ keywords. This was the solution preferred by members of the Illumos community.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h: In C, nested structures are visible in the global namespace, but in C++, they take on the namespace of the structure in which they are contained. Flatten nested structure definitions within struct zfs_cmd so these structures are visible in the global namespace when compiled in both languages.
Sponsored by: Spectra Logic Corporation
|
222835 |
07-Jun-2011 |
mm |
Silence notice on pool creation, import and access.
Suggested by: Jeremy Chadwick (freebsd-stable@) Discussed with: pjd MFC after: 1 week
|
222268 |
24-May-2011 |
pjd |
Don't pass pointer to name buffer which is on the stack to another thread, because the stack might be paged out once the other thread tries to use the data. Instead, just allocate memory.
MFC after: 2 weeks
|
222267 |
24-May-2011 |
pjd |
Don't access task structure once we call task function. The task structure might be no longer available. This also eliminates the need for two tasks in the zio structure.
Submitted by: anonymous MFC after: 2 weeks
|
222199 |
22-May-2011 |
rmacklem |
Fix the zfs file system so that it uses the lock flags argument added to VFS_FHTOVP() by r222167.
Reviewed by: pjd
|
222167 |
22-May-2011 |
rmacklem |
Add a lock flags argument to the VFS_FHTOVP() file system method, so that callers can indicate the minimum vnode locking requirement. This will allow some file systems to choose to return a LK_SHARED locked vnode when LK_SHARED is specified for the flags argument. This patch only adds the flag. It does not change any file system to use it and all callers specify LK_EXCLUSIVE, so file system semantics are not changed.
Reviewed by: kib
|
222050 |
18-May-2011 |
mm |
Restore old (v15) behaviour for a recursive snapshot destroy. (zfs destroy -r pool/dataset@snapshot)
To destroy all descendent snapshots with the same name the top level snapshot was not required to exist. So if the top level snapshot does not exist, check permissions of the parent dataset instead.
Filed as Illumos Bug #1043
Reviewed by: delphij Approved by: pjd MFC after: together with v28
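The fallback described above can be sketched as follows. This is only a model of the decision, assuming the set of existing snapshot names is known; `permission_check_target` and `existing` are hypothetical names, not kernel functions.

```python
def permission_check_target(snapshot, existing):
    """Pick the dataset whose permissions are checked for 'zfs destroy -r'."""
    if snapshot in existing:
        return snapshot
    # Top-level snapshot is absent: fall back to its parent dataset.
    parent, _, _ = snapshot.partition("@")
    return parent
```

For example, if only pool/dataset/child@snapshot exists, a request for pool/dataset@snapshot is checked against pool/dataset.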
|
221409 |
03-May-2011 |
marius |
Convert the last use of xcopyout() to ddi_copyout() and remove the now unused xcopyin() as well as xcopyout(). MFC together with r219089.
Approved by: mm
|
221263 |
30-Apr-2011 |
mm |
Fix deduplicated zfs receive (dmu_recv_stream builds incomplete guid_to_ds_map)
Illumos-gate changeset: 13329:c48b8bf84ab7 MFC together with v28
Approved by: pjd Obtained from: Illumos (Bug #755)
|
219973 |
24-Mar-2011 |
pjd |
Checking file access on size change is bogus. The checks are done earlier by VFS where we know if this is truncate(2) or ftruncate(2). If this is the latter we should depend on the mode the file was opened and not on the current permission.
PR: standards/154873 Reported by: Mark Martinec <Mark.Martinec@ijs.si> Discussed with: Eric Schrock <eric.schrock@delphix.com> Discussed with: Mark Maybee <Mark.Maybee@Oracle.COM> MFC after: 1 month
|
219636 |
14-Mar-2011 |
pjd |
Fix potential panic in dbuf_sync_list() related to spill block handling.
Obtained from: IllumOS MFC after: 1 month
|
219404 |
08-Mar-2011 |
pjd |
Correct readdir over ZFS handling.
Reported by: Pierre Beyssac <pb@fasterix.frmug.org> MFC after: 1 month
|
219320 |
06-Mar-2011 |
pjd |
Fix libzpool build.
MFC after: 1 month
|
219317 |
05-Mar-2011 |
pjd |
Make renaming of a ZVOL, ZVOL's parent directory and ZVOL snapshot work.
Reported by: avg MFC after: 1 month
|
219316 |
05-Mar-2011 |
pjd |
Simplify zvol_remove_minors() a bit.
MFC after: 1 month
|
219089 |
27-Feb-2011 |
pjd |
Finally... Import the latest open-source ZFS version - (SPA) 28.
Few new things available from now on:
- Data deduplication.
- Triple parity RAIDZ (RAIDZ3).
- zfs diff.
- zpool split.
- Snapshot holds.
- zpool import -F. Allows to rewind corrupted pool to earlier transaction group.
- Possibility to import pool in read-only mode.
MFC after: 1 month
|
218550 |
11-Feb-2011 |
kib |
For UIO_NOCOPY case of reading request on zfs vnode, which has vm object attached, activate the page after the successful read, and free the page if read was unsuccessful.
Freshly allocated page is not on any queue yet, and not activating (or deactivating) the page leaves it on no queue, excluding the page from pagedaemon scans and making the memory disappear until the vnode is reclaimed.
Reviewed by: avg MFC after: 1 week
|
218386 |
06-Feb-2011 |
trasz |
Make it impossible to clear the MNT_NFS4ACLS flag on ZFS filesystem by using "mount -uw".
Reviewed by: pjd MFC after: 2 weeks
|
218278 |
04-Feb-2011 |
ae |
vdev's sectorsize should not be greater than 8 Kbytes and it should also be a power of 2. This prevents non-aligned access while probing vdev's labels.
PR: kern/147852 Reviewed by: pjd MFC after: 1 week
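The two constraints stated above amount to a simple check, sketched here. The function and constant names are hypothetical and the committed code may express the test differently.

```python
MAX_SECTORSIZE = 8192  # 8 Kbytes, the upper bound named in the commit message


def sectorsize_ok(sectorsize):
    """Return True if sectorsize is a power of 2 no larger than 8 Kbytes."""
    if sectorsize <= 0 or sectorsize > MAX_SECTORSIZE:
        return False
    # A power of 2 has exactly one bit set.
    return (sectorsize & (sectorsize - 1)) == 0
```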
|
217588 |
19-Jan-2011 |
trasz |
Add MNT_NFS4ACLS to ZFS mount flags. It's not conditional, since there is no way to disable NFSv4 ACLs in ZFS. This should make it easier for the NFS server to figure out whether the exported filesystem supports ACLs or not.
Reviewed by: pjd MFC after: 2 weeks
|
217367 |
13-Jan-2011 |
mdf |
Re-commit the zfs sysctl(9) type-safety changes.
Thanks to dim and pjd for the pointer to zfs_context.h for building userland.
|
217332 |
12-Jan-2011 |
mdf |
Revert cddl changes for sysctl(9) until I understand why this isn't building on universe.
|
217319 |
12-Jan-2011 |
mdf |
sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.
Commit the zfs piece.
|
216919 |
03-Jan-2011 |
mm |
MFp4 r186485, r186859:
Fix a race by defining two tasks in the zio structure as we can still be returning from issue task when interrupt task is used.
Tested by: pjd Approved by: pjd, delphij (mentor) MFC after: 3 days
|
216378 |
11-Dec-2010 |
pjd |
Remove redundant semicolon and empty line.
|
216256 |
07-Dec-2010 |
ivoras |
Undo r216230: the interaction between saved ashift in metadata and detected ashift does not support this. With this change, pools created while stripesize=512 could not be imported when stripesize becomes larger (on the same drive).
Noticed by: pjd
|
216230 |
06-Dec-2010 |
ivoras |
Use GEOM stripesize field when calculating ashift. This will enable correct alignment on drives with large sector sizes (e.g. 4 KiB) but the implementation might need to be revisited if devices with large stripesizes appear (e.g. if RAID controllers or flash drives start using the field), probably by introducing a physsectorsize field in GEOM providers.
Discussed with: mav, mostly silence on freebsd-geom@ and freebsd-fs@
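Since ashift is the base-2 logarithm of the allocation unit, the effect of taking the stripe size into account can be sketched as below. The function name and the use of `max()` are illustrative assumptions, not the committed code.

```python
def ashift_for(sectorsize, stripesize=0):
    """Return log2 of the larger of sectorsize and stripesize (powers of 2)."""
    size = max(sectorsize, stripesize)
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a positive power of 2")
    return size.bit_length() - 1
```

Under this assumption, a drive that reports 512-byte sectors and a 4 KiB stripe size would get an ashift of 12 rather than 9.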
|
215401 |
16-Nov-2010 |
avg |
zfs+sendfile: populate all requested pages, not just those already cached
kern_sendfile() uses vm_rdwr() to read-ahead blocks of data to populate page cache. When sendfile stumbles upon a page that is not populated yet, it sends out all the mbufs that it collected so far. This resulted in very poor performance with ZFS when file data is not in the page cache, because ZFS vop_read for UIO_NOCOPY case populated only those pages that are already in cache, but not valid. Which means that most of the time it populated only the first requested page in the scenario described above.
Reported by: Alexander Zagrebin <alexz@visp.ru> Tested by: Alexander Zagrebin <alexz@visp.ru>, Artemiev Igor <ai@kliksys.ru> MFC after: 12 days
|
215397 |
16-Nov-2010 |
avg |
fix misspelling in a comment
Reported by: Daniel Braniss <danny@cs.huji.ac.il> MFC after: 3 days
|
215260 |
13-Nov-2010 |
mm |
Disable VFS_HOLD placed on mnt_vnodecovered during the mount of a snapshot and VFS_RELE on a non-existing hold on snapshot parent's z_vfs.
This disables the changes from OpenSolaris onnv-revision 9234:bffdc4fc05c4 (bug IDs: 6792139, 6794830) - not applicable to FreeBSD.
This fixes the process hang if umounting a manually mounted snapshot.
Reported by: Alexander Zagrebin <alexz@visp.ru> Approved by: delphij (mentor) MFC after: 1 week
|
214854 |
05-Nov-2010 |
delphij |
Validate whether the zfs_cmd_t submitted from userland is not smaller than what we have. Without the check the kernel could access memory that does not belong to the request struct.
Note that we do not test if the struct equals in size at this time, which may facilitate forward compatibility with newer binaries.
Reviewed by: pjd at MeetBSD CA '2010 MFC after: 1 week
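The size check described above is a lower bound rather than an equality test. A minimal model, with hypothetical names and sizes:

```python
def cmd_size_ok(submitted_size, kernel_size):
    """Accept a request struct that is at least as large as the kernel's."""
    # Deliberately '>=' rather than '==': a larger struct from a newer
    # binary is tolerated, a smaller one is rejected.
    return submitted_size >= kernel_size
```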
|
214378 |
26-Oct-2010 |
mm |
Bugfix merge from OpenSolaris:
OpenSolaris onnv-revision: 10209:91f47f0e7728 6830541 zfs_get_data_trips on a verify 6696242 multiple zfs_fillpage() zfs: accessing past end of object panics 6785914 zfs fails to drop dn_struct_rwlock in recovery code path
Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6830541, 6696242, 6785914) MFC after: 2 weeks
|
213937 |
16-Oct-2010 |
avg |
zfs: add vop_getpages method implementation
This should make vnode_pager_getpages path a bit shorter and clearer. Also this should eliminate problems with partially valid pages. Having this method opens room for future optimizations.
To do: try to satisfy other pages besides the required one taking into account tradeoffs between number of page faults, read throughput and read latency. Also, eventually vop_putpages should be added too.
Reviewed by: kib, mm, pjd MFC after: 3 weeks
|
213790 |
13-Oct-2010 |
rpaulo |
In zfs_post_common(), use %d instead of %hhu.
Found with: clang
|
213730 |
12-Oct-2010 |
avg |
zfs + sendfile: do not produce partially valid pages for vnode's tail
Since r212650 and before this change sendfile(2) could produce a partially valid page for a trailing portion of a ZFS vnode. vm_fault() always wants to see a fully valid page even if it's the last page that partially extends beyond vnode's end. Otherwise it calls vop_getpages() to bring in the page. In the case of ZFS this means that the data is read from the page into the same page and this breaks checks in ZFS mappedread() - a thread that set VPO_BUSY on the page in vm_fault() will get blocked forever waiting for it to be cleared.
Many thanks to Kai and Jeremy for reproducing the issue and providing important debugging information and help.
Reported by: Kai Gallasch <gallasch@free.de>, Jeremy Chadwick <freebsd@jdc.parodius.com> Tested by: Kai Gallasch <gallasch@free.de>, Jeremy Chadwick <freebsd@jdc.parodius.com> Reviewed by: kib MFC after: 3 days To-Do: apply the same treatment to tmpfs + sendfile
|
213673 |
10-Oct-2010 |
pjd |
Provide internal ioflags() function that converts ioflag provided by FreeBSD's VFS to OpenSolaris-specific ioflag expected by ZFS. Use it for read and write operations.
Reviewed by: mm MFC after: 1 week
|
213634 |
08-Oct-2010 |
mm |
Change FAPPEND to IO_APPEND as this is a ioflag and not a fflag. This corrects writing to append-only files on ZFS.
PR: kern/149495 [1], kern/151082 [2] Submitted by: Daniel Zhelev <daniel@zhelev.biz> [1], Michael Naef <cal@linu.gs> [2] Approved by: delphij (mentor) MFC after: 1 week
|
213198 |
27-Sep-2010 |
mm |
Properly handle IO with B_FAILFAST Retry IO once with ZIO_FLAG_TRYHARD before declaring a pool faulted
OpenSolaris revision and Bug IDs:
9725:0bf7402e8022 6843014 ZFS B_FAILFAST handling is broken
Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6843014) MFC after: 3 weeks
|
213197 |
27-Sep-2010 |
mm |
Enable offlining of log devices.
OpenSolaris revision and Bug IDs:
9701:cc5b64682e64 6803605 should be able to offline log devices 6726045 vdev_deflate_ratio is not set when offlining a log device 6599442 zpool import has faults in the display
Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6803605, 6726045, 6599442) MFC after: 3 weeks
|
212951 |
21-Sep-2010 |
avg |
zfs_map_page/zfs_unmap_page: do not use sched_pin() and SFB_CPUPRIVATE
zfs_map_page/zfs_unmap_page are mostly called around potential I/O paths and it seems to be a not very good idea to do cpu pinning there.
Suggested by: kib MFC after: 2 weeks
|
212950 |
21-Sep-2010 |
avg |
zfs_vnops: use zfs_map_page/zfs_unmap_page helper functions in another place
MFC after: 2 weeks
|
212783 |
17-Sep-2010 |
avg |
zfs arc_reclaim_needed: fix typo in mismerge in r212780
PR: kern/146410, kern/138790 MFC after: 3 weeks X-MFC with: r212780
|
212782 |
17-Sep-2010 |
avg |
zfs+sendfile: advance uio_offset upon reading as well
Picked from analogous code in tmpfs.
MFC after: 1 week
|
212781 |
17-Sep-2010 |
avg |
zfs arc_reclaim_needed: remove redundant checks for arc_c_max and arc_c_min
Those checks are not present in upstream code and they are enforced in actual calculations of delta by which ARC size can be grown or should be reduced.
MFC after: 3 weeks
|
212780 |
17-Sep-2010 |
avg |
zfs arc_reclaim_needed: more reasonable threshold for available pages
vm_paging_target() is not a trigger of any kind for pagedaemon, but rather a "soft" target for it when it's already triggered. Thus, trying to keep 2048 pages above that level at the expense of ARC was simply driving ARC size into the ground even with normal memory loads. Instead, use a threshold at which a pagedaemon scan is triggered, so that ARC reclaiming helps with pagedaemon's task, but the latter still recycles active and inactive pages.
PR: kern/146410, kern/138790 MFC after: 3 weeks
|
212694 |
15-Sep-2010 |
mm |
Fix kernel panic when moving a file to .zfs/shares Fix possible loss of correct error return code in ZFS mount
OpenSolaris revisions and Bug IDs:
11824:53128e5db7cf 6863610 ZFS mount can lose correct error return
12079:13822b941977 6939941 problem with moving files in zfs (142901-12)
Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6863610, 6939941) MFC after: 3 days
|
212657 |
15-Sep-2010 |
avg |
zfs vn_has_cached_data: take into account v_object->cache != NULL
This mirrors code in tmpfs. This change shouldn't affect the read path much; it may cause unnecessary vm_page_lookup calls in the case where v_object has no active or inactive pages but has some cache pages. I believe this situation to be non-essential.
In write path this change should allow us to properly detect the above case and free a cache page when we write to a range that corresponds to it. If this situation is undetected then we could have a discrepancy between data in page cache and in ARC or on disk.
This change allows us to re-enable vn_has_cached_data() check in zfs_write.
NOTE: strictly speaking resident_page_count and cache fields of v_object should be examined under VM_OBJECT_LOCK, but for this particular usage we may get away with it.
Discussed with: alc, kib Approved by: pjd Tested with: tools/regression/fsx MFC after: 3 weeks
|
212655 |
15-Sep-2010 |
avg |
zfs mappedread, update_pages: use int for offset and length within a page
uint64_t, int64_t were redundant there
Approved by: pjd Tested by: tools/regression/fsx MFC after: 2 weeks
|
212654 |
15-Sep-2010 |
avg |
zfs mappedread: use uiomove_fromphys where possible
Reviewed by: alc Approved by: pjd Tested by: tools/regression/fsx MFC after: 2 weeks
|
212652 |
15-Sep-2010 |
avg |
zfs: catch up with vm_page_sleep_if_busy changes
Reviewed by: alc Approved by: pjd Tested by: tools/regression/fsx MFC after: 2 weeks
|
212650 |
15-Sep-2010 |
avg |
tmpfs, zfs + sendfile: mark page bits as valid after populating it with data
Otherwise, adding insult to injury, in addition to double-caching of data we would always copy the data into a vnode's vm object page from backend. This is specific to sendfile case only (VOP_READ with UIO_NOCOPY).
PR: kern/141305 Reported by: Wiktor Niesiobedzki <bsd@vink.pl> Reviewed by: alc Tested by: tools/regression/sockets/sendfile MFC after: 2 weeks
|
212611 |
14-Sep-2010 |
mm |
Remove duplicated VFS_HOLD due to a mismerge.
PR: kern/150544 Approved by: delphij (mentor) MFC after: 1 day
|
212605 |
14-Sep-2010 |
mm |
Add missing vop_vector zfsctl_ops_shares Add missing locks around VOP_READDIR and VOP_GETATTR with z_shares_dir
PR: kern/150544 Approved by: delphij (mentor) Obtained from: perforce (pjd) MFC after: 1 day
|
212573 |
13-Sep-2010 |
pjd |
Remove the page queues lock around vm_page_undirty() - it is no longer needed.
Reviewed by: alc
|
212425 |
10-Sep-2010 |
mdf |
Replace sbuf_overflowed() with sbuf_error(), which returns any error code associated with overflow or with the drain function. While this function is not expected to be used often, it produces more information in the form of an errno than sbuf_overflowed() did.
|
212385 |
09-Sep-2010 |
pjd |
On FreeBSD we can boot from pools that have multiple top-level vdevs or log vdevs, so don't deny adding new vdevs if bootfs property is set.
MFC after: 2 weeks
|
212160 |
02-Sep-2010 |
gibbs |
Correct bioq_disksort so that bioq_insert_tail() offers barrier semantics. Add the BIO_ORDERED flag for struct bio and update bio clients to use it.
The barrier semantics of bioq_insert_tail() were broken in two ways:
o In bioq_disksort(), an added bio could be inserted at the head of the queue, even when a barrier was present, if the sort key for the new entry was less than that of the last queued barrier bio.
o The last_offset used to generate the sort key for newly queued bios did not stay at the position of the barrier until either the barrier was de-queued, or a new barrier (which updates last_offset) was queued. When a barrier is in effect, we know that the disk will pass through the barrier position just before the "blocked bios" are released, so using the barrier's offset for last_offset is the optimal choice.
sys/geom/sched/subr_disk.c: sys/kern/subr_disk.c: o Update last_offset in bioq_insert_tail().
o Only update last_offset in bioq_remove() if the removed bio is at the head of the queue (typically due to a call via bioq_takefirst()) and no barrier is active.
o In bioq_disksort(), if we have a barrier (insert_point is non-NULL), set prev to the barrier and cur to its next element. Now that last_offset is kept at the barrier position, this change isn't strictly necessary, but since we have to take a decision branch anyway, it does avoid one, no-op, loop iteration in the while loop that immediately follows.
o In bioq_disksort(), bypass the normal sort for bios with the BIO_ORDERED attribute and instead insert them into the queue with bioq_insert_tail(). bioq_insert_tail() not only gives the desired command order during insertion, but also provides barrier semantics so that commands disksorted in the future cannot pass the just enqueued transaction.
sys/sys/bio.h: Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio.
sys/cam/ata/ata_da.c: sys/cam/scsi/scsi_da.c Use an ordered command for SCSI/ATA-NCQ commands issued in response to bios with the BIO_ORDERED flag set.
sys/cam/scsi/scsi_da.c Use an ordered tag when issuing a synchronize cache command.
Wrap some lines to 80 columns.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c sys/geom/geom_io.c Mark bios with the BIO_FLUSH command as BIO_ORDERED.
Sponsored by: Spectra Logic Corporation MFC after: 1 month
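The barrier behaviour described above can be illustrated with a toy model. This is a deliberate simplification, not the kernel implementation: it sorts by plain ascending offset instead of the one-way elevator order keyed on last_offset, and the class and attribute names are hypothetical.

```python
class BioQueue:
    """Toy model of a disk-sorted queue with barrier (ordered) entries."""

    def __init__(self):
        self.items = []      # queued offsets, head at index 0
        self.barrier = 0     # index just past the last barrier entry

    def insert_tail(self, offset):
        # An ordered entry goes to the tail and becomes the new barrier:
        # nothing queued later may be sorted in front of it.
        self.items.append(offset)
        self.barrier = len(self.items)

    def disksort(self, offset, ordered=False):
        if ordered:
            # Ordered entries bypass the sort entirely.
            self.insert_tail(offset)
            return
        # Sort only within the region after the last barrier.
        pos = self.barrier
        while pos < len(self.items) and self.items[pos] <= offset:
            pos += 1
        self.items.insert(pos, offset)

    def takefirst(self):
        first = self.items.pop(0)
        if self.barrier > 0:
            self.barrier -= 1
        return first
```

In this model an entry with a small offset that arrives after a barrier is placed behind the barrier, never at the head of the queue, which is the first of the two problems the commit message lists.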
|
212002 |
30-Aug-2010 |
jh |
execve(2) has a special check for file permissions: a file must have at least one execute bit set, otherwise execve(2) will return EACCES even for a user with PRIV_VFS_EXEC privilege.
Add the check also to vaccess(9), vaccess_acl_nfs4(9) and vaccess_acl_posix1e(9). This makes access(2) agree better with execve(2). Because ZFS doesn't use vaccess(9) for VEXEC, add the check to zfs_freebsd_access() too. There may be other file systems which are not using vaccess*() functions and need to be handled separately.
PR: kern/125009 Reviewed by: bde, trasz Approved by: pjd (ZFS part)
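The special case can be modelled as below. This is a sketch of the rule as stated in the message, not of vaccess(9) itself; the function name and its boolean parameters are hypothetical stand-ins for the privilege and ordinary permission checks.

```python
import stat

EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH  # 0o111


def exec_permitted(mode, has_exec_privilege, normal_check_passes):
    """Model of the special-case execute check described above."""
    if (mode & EXEC_BITS) == 0:
        # No execute bit at all: deny even a privileged caller.
        return False
    return has_exec_privilege or normal_check_passes
```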
|
211948 |
28-Aug-2010 |
pjd |
Return NULL pointer instead of B_FALSE as it is done in the vendor code.
Obtained from: //depot/user/pjd/zfs/...
|
211932 |
28-Aug-2010 |
mm |
Import changes from OpenSolaris that provide
- better ACL caching and speedup of ACL permission checks
- faster handling of stat()
- lowered mutex contention in the read/writer lock (rrwlock)
- several related bugfixes
Detailed information (OpenSolaris onnv changesets and Bug IDs):
9749:105f407a2680 6802734 Support for Access Based Enumeration (not used on FreeBSD) 6844861 inconsistent xattr readdir behavior with too-small buffer
9866:ddc5f1d8eb4e 6848431 zfs with rstchown=0 or file_chown_self privilege allows user to "take" ownership
9981:b4907297e740 6775100 stat() performance on files on zfs should be improved 6827779 rrwlock is overly protective of its counters
10143:d2d432dfe597 6857433 memory leaks found at: zfs_acl_alloc/zfs_acl_node_alloc 6860318 truncate() on zfsroot succeeds when file has a component of its path set without access permission
10232:f37b85f7e03e 6865875 zfs sometimes incorrectly giving search access to a dir
10250:b179ceb34b62 6867395 zpool_upgrade_007_pos testcase panic'd with BAD TRAP: type=e (#pf Page fault)
10269:2788675568fd 6868276 zfs_rezget() can be hazardous when znode has a cached ACL
10295:f7a18a1e9610 6870564 panic in zfs_getsecattr
Approved by: delphij (mentor) Obtained from: OpenSolaris (multiple Bug IDs) MFC after: 2 weeks
|
211931 |
28-Aug-2010 |
mm |
Update ZFS metaslab code from OpenSolaris. This provides a noticeable write speedup, especially on pools with less than 30% of free space.
Detailed information (OpenSolaris onnv changesets and Bug IDs):
11146:7e58f40bcb1c 6826241 Sync write IOPS drops dramatically during TXG sync 6869229 zfs should switch to shiny new metaslabs more frequently
11728:59fdb3b856f6 6918420 zdb -m has issues printing metaslab statistics
12047:7c1fcc8419ca 6917066 zfs block picking can be improved
Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6826241, 6869229, 6918420, 6917066) MFC after: 2 weeks
|
211900 |
27-Aug-2010 |
pjd |
Use ZFS_CTLDIR_NAME instead of hardcoding ".zfs".
|
211855 |
26-Aug-2010 |
pjd |
Update comment now that I finally committed r211854.
MFC after: 1 month
|
211762 |
24-Aug-2010 |
avg |
zfs arc_reclaim_thread: no need to call arc_reclaim_needed when resetting needfree
needfree is checked at the very start of arc_reclaim_needed. This change makes code easier to follow and maintain in face of potential changes in arc_reclaim_needed.
Also, put the whole sub-block under _KERNEL because needfree can be set only in kernel code.
To do: rename needfree to something else to avoid confusion with OpenSolaris global variable of the same name which is used in the same code, but has different meaning (page deficit).
Note: I have an impression that locking around accesses to this variable as well as mutual notifications between arc_reclaim_thread and arc_lowmem are not proper.
MFC after: 1 week
|
210999 |
07-Aug-2010 |
pjd |
In FreeBSD we use 'jailed' property.
MFC after: 2 weeks
|
210470 |
25-Jul-2010 |
mm |
Import two changesets from OpenSolaris to make future updates easier.
The changes do not affect FreeBSD code because zfs_znode_move(), cleanlocks() and cleanshares() are not used.
OpenSolaris onnv changeset: 9788:f660bc44f2e8, 9909:aa280f585a3e
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6843700, 6790232) MFC after: 7 weeks
|
210457 |
24-Jul-2010 |
mm |
Consider snapshots as descendants via zfs allow -d
OpenSolaris onnv changeset: 9847:2f3ba86e857a
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6809340) MFC after: 1 week
|
210427 |
23-Jul-2010 |
avg |
zfs arc_memory_throttle: available memory is free + cache
OpenSolaris freemem has the same meaning as our v_free_count + v_cache_count.
Obtained from: Artem Belevich <fbsdlist@src.cx>, Peter Jeremy <peterjeremy@acm.org> Discussed with: pjd MFC after: 2 weeks
|
210398 |
22-Jul-2010 |
mm |
Enable fake resolving of SMB RIDs by using nulldomain and UID_NOBODY - fixes panics when Solaris/OpenSolaris pools that contain files uploaded with the SMB protocol are accessed
Enable setting/unsetting the sharesmb property (dummy action) - allows users who import pools from Solaris/OpenSolaris to unset the sharesmb property and get rid of annoying messages
PR: kern/145778, kern/148709 Approved by: pjd, delphij (mentor) MFC after: 7 weeks
|
210282 |
20-Jul-2010 |
mm |
To improve latency, lower default vfs.zfs.vdev.max_pending from 35 to 10
OpenSolaris onnv changeset (partial): 10801:e0bf032e8673
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6891731) MFC after: 1 week
|
210192 |
17-Jul-2010 |
nwhitehorn |
Increase stack size for ZFS sync thread. This is required to make ZFS function on 64-bit PowerPC.
Reviewed by: pjd Obtained from: OpenSolaris changeset 14653:7cf402a7f374
|
210172 |
16-Jul-2010 |
jhb |
Revert the previous commit. The race is not applicable to the lockmgr implementation in 8.0 and later as its flags field does not hold dynamic state such as waiters flags, but is only modified in lockinit() aside from VN_LOCK_*().
Discussed with: attilio
|
210171 |
16-Jul-2010 |
jhb |
When the MNTK_EXTENDED_SHARED mount option was added, some filesystems were changed to defer the setting of VN_LOCK_ASHARE() (which clears LK_NOSHARE in the vnode lock's flags) until after they had determined if the vnode was a FIFO. This occurs after the vnode has been inserted into a VFS hash or some similar table, so it is possible for another thread to find this vnode via vget() on an i-node number and block on the vnode lock. If the lockmgr interlock (vnode interlock for vnode locks) is not held when clearing the LK_NOSHARE flag, then the lk_flags field can be clobbered. As a result the thread blocked on the vnode lock may never get woken up. Fix this by holding the vnode interlock while modifying the lock flags in this case.
MFC after: 3 days
|
209962 |
13-Jul-2010 |
mm |
Merge ZFS version 15 and almost all OpenSolaris bugfixes referenced in Solaris 10 updates 141445-09 and 142901-14.
Detailed information: (OpenSolaris revisions and Bug IDs, Solaris 10 patch numbers)
7844:effed23820ae 6755435 zfs_open() and zfs_close() needs to use ZFS_ENTER/ZFS_VERIFY_ZP (141445-01)
7897:e520d8258820 6748436 inconsistent zpool.cache in boot_archive could panic a zfs root filesystem upon boot-up (141445-01)
7965:b795da521357 6740164 zpool attach can create an illegal root pool (141909-02)
8084:b811cc60d650 6769612 zpool_import() will continue to write to cachefile even if altroot is set (N/A)
8121:7fd09d4ebd9c 6757430 want an option for zdb to disable space map loading and leak tracking (141445-01)
8129:e4f45a0bfbb0 6542860 ASSERT: reason != VDEV_LABEL_REMOVE||vdev_inuse(vd, crtxg, reason, 0) (141445-01)
8188:fd00c0a81e80 6761100 want zdb option to select older uberblocks (141445-01)
8190:6eeea43ced42 6774886 zfs_setattr() won't allow ndmp to restore SUNWattr_rw (141445-01)
8225:59a9961c2aeb 6737463 panic while trying to write out config file if root pool import fails (141445-01)
8227:f7d7be9b1f56 6765294 Refactor replay (141445-01)
8228:51e9ca9ee3a5 6572357 libzfs should do more to avoid mnttab lookups (141909-01) 6572376 zfs_iter_filesystems and zfs_iter_snapshots get objset stats twice (141909-01)
8241:5a60f16123ba 6328632 zpool offline is a bit too conservative (141445-01) 6739487 ASSERT: txg <= spa_final_txg due to scrub/export race (141445-01) 6767129 ASSERT: cvd->vdev_isspare, in spa_vdev_detach() (141445-01) 6747698 checksum failures after offline -t / export / import / scrub (141445-01) 6745863 ZFS writes to disk after it has been offlined (141445-01) 6722540 50% slowdown on scrub/resilver with certain vdev configurations (141445-01) 6759999 resilver logic rewrites ditto blocks on both source and destination (141445-01) 6758107 I/O should never suspend during spa_load() (141445-01) 6776548 codereview(1) runs off the page when faced with multi-line comments (N/A) 6761406 AMD errata 91 workaround doesn't work on 64-bit systems (141445-01)
8242:e46e4b2f0a03 6770866 GRUB/ZFS should require physical path or devid, but not both (141445-01)
8269:03a7e9050cfd 6674216 "zfs share" doesn't work, but "zfs set sharenfs=on" does (141445-01) 6621164 $SRC/cmd/zfs/zfs_main.c seems to have a syntax error in the translation note (141445-01) 6635482 i18n problems in libzfs_dataset.c and zfs_main.c (141445-01) 6595194 "zfs get" VALUE column is as wide as NAME (141445-01) 6722991 vdev_disk.c: error checking for ddi_pathname_to_dev_t() must test for NODEV (141445-01) 6396518 ASSERT strings shouldn't be pre-processed (141445-01)
8274:846b39508aff 6713916 scrub/resilver needlessly decompress data (141445-01)
8343:655db2375fed 6739553 libzfs_status msgid table is out of sync (141445-01) 6784104 libzfs unfairly rejects numerical values greater than 2^63 (141445-01) 6784108 zfs_realloc() should not free original memory on failure (141445-01)
8525:e0e0e525d0f8 6788830 set large value to reservation cause core dump (141445-01) 6791064 want sysevents for ZFS scrub (141445-01) 6791066 need to be able to set cachefile on faulted pools (141445-01) 6791071 zpool_do_import() should not enable datasets on faulted pools (141445-01) 6792134 getting multiple properties on a faulted pool leads to confusion (141445-01)
8547:bcc7b46e5ff7 6792884 Vista clients cannot access .zfs (141445-01)
8632:36ef517870a3 6798384 It can take a village to raise a zio (141445-01)
8636:7e4ce9158df3 6551866 deadlock between zfs_write(), zfs_freesp(), and zfs_putapage() (141909-01) 6504953 zfs_getpage() misunderstands VOP_GETPAGE() interface (141909-01) 6702206 ZFS read/writer lock contention throttles sendfile() benchmark (141445-01) 6780491 Zone on a ZFS filesystem has poor fork/exec performance (141445-01) 6747596 assertion failed: DVA_EQUAL(BP_IDENTITY(&zio->io_bp_orig), BP_IDENTITY(zio->io_bp))); (141445-01)
8692:692d4668b40d 6801507 ZFS read aggregation should not mind the gap (141445-01)
8697:e62d2612c14d 6633095 creating a filesystem with many properties set is slow (141445-01)
8768:dfecfdbb27ed 6775697 oracle crashes when overwriting after hitting quota on zfs (141909-01)
8811:f8deccf701cf 6790687 libzfs mnttab caching ignores external changes (141445-01) 6791101 memory leak from libzfs_mnttab_init (141445-01)
8845:91af0d9c0790 6800942 smb_session_create() incorrectly stores IP addresses (N/A) 6582163 Access Control List (ACL) for shares (141445-01) 6804954 smb_search - shortname field should be space padded following the NULL terminator (N/A) 6800184 Panic at smb_oplock_conflict+0x35() (N/A)
8876:59d2e67b4b65 6803822 Reboot after replacement of system disk in a ZFS mirror drops to grub> prompt (141445-01)
8924:5af812f84759 6789318 coredump when issue zdb -uuuu poolname/ (141445-01) 6790345 zdb -dddd -e poolname coredump (141445-01) 6797109 zdb: 'zdb -dddddd pool_name/fs_name inode' coredump if the file with inode was deleted (141445-01) 6797118 zdb: 'zdb -dddddd poolname inum' coredump if I miss the fs name (141445-01) 6803343 shareiscsi=on failed, iscsitgtd failed request to share (141445-01)
9030:243fd360d81f 6815893 hang mounting a dataset after booting into a new boot environment (141445-01)
9056:826e1858a846 6809691 'zpool create -f' no longer overwrites ufs infomation (141445-01)
9179:d8fbd96b79b3 6790064 zfs needs to determine uid and gid earlier in create process (141445-01)
9214:8d350e5d04aa 6604992 forced unmount + being in .zfs/snapshot/<snap1> = not happy (141909-01) 6810367 assertion failed: dvp->v_flag & VROOT, file: ../../common/fs/gfs.c, line: 426 (141909-01)
9229:e3f8b41e5db4 6807765 ztest_dsl_dataset_promote_busy needs to clean up after ENOSPC (141445-01)
9230:e4561e3eb1ef 6821169 offlining a device results in checksum errors (141445-01) 6821170 ZFS should not increment error stats for unavailable devices (141445-01) 6824006 need to increase issue and interrupt taskqs threads in zfs (141445-01)
9234:bffdc4fc05c4 6792139 recovering from a suspended pool needs some work (141445-01) 6794830 reboot command hangs on a failed zfs pool (141445-01)
9246:67c03c93c071 6824062 System panicked in zfs_mount due to NULL pointer dereference when running btts and svvs tests (141909-01)
9276:a8a7fc849933 6816124 System crash running zpool destroy on broken zpool (141445-03)
9355:09928982c591 6818183 zfs snapshot -r is slow due to set_snap_props() doing txg_wait_synced() for each new snapshot (141445-03)
9391:413d0661ef33 6710376 log device can show incorrect status when other parts of pool are degraded (141445-03)
9396:f41cf682d0d3 (part already merged) 6501037 want user/group quotas on ZFS (141445-03) 6827260 assertion failed in arc_read(): hdr == pbuf->b_hdr (141445-03) 6815592 panic: No such hold X on refcount Y from zfs_znode_move (141445-03) 6759986 zfs list shows temporary %clone when doing online zfs recv (141445-03)
9404:319573cd93f8 6774713 zfs ignores canmount=noauto when sharenfs property != off (141445-03)
9412:4aefd8704ce0 6717022 ZFS DMU needs zero-copy support (141445-03)
9425:e7ffacaec3a8 6799895 spa_add_spares() needs to be protected by config lock (141445-03) 6826466 want to post sysevents on hot spare activation (141445-03) 6826468 spa 'allowfaulted' needs some work (141445-03) 6826469 kernel support for storing vdev FRU information (141445-03) 6826470 skip posting checksum errors from DTL regions of leaf vdevs (141445-03) 6826471 I/O errors after device remove probe can confuse FMA (141445-03) 6826472 spares should enjoy some of the benefits of cache devices (141445-03)
9443:2a96d8478e95 6833711 gang leaders shouldn't have to be logical (141445-03)
9463:d0bd231c7518 6764124 want zdb to be able to checksum metadata blocks only (141445-03)
9465:8372081b8019 6830237 zfs panic in zfs_groupmember() (141445-03)
9466:1fdfd1fed9c4 6833162 phantom log device in zpool status (141445-03)
9469:4f68f041ddcd 6824968 add ZFS userquota support to rquotad (141445-03)
9470:6d827468d7b5 6834217 godfather I/O should reexecute (141445-03)
9480:fcff33da767f 6596237 Stop looking and start ganging (141909-02)
9493:9933d599bc93 6623978 lwb->lwb_buf != NULL, file ../../../uts/common/fs/zfs/zil.c, line 787, function zil_lwb_commit (141445-06)
9512:64cafcbcc337 6801810 Commit of aligned streaming rewrites to ZIL device causes unwanted disk reads (N/A)
9515:d3b739d9d043 6586537 async zio taskqs can block out userland commands (142901-09)
9554:787363635b6a 6836768 zfs_userspace() callback has no way to indicate failure (N/A)
9574:1eb6a6ab2c57 6838062 zfs panics when an error is encountered in space_map_load() (141909-02)
9583:b0696cd037cc 6794136 Panic BAD TRAP: type=e when importing degraded zraid pool. (141909-03)
9630:e25a03f552e0 6776104 "zfs import" deadlock between spa_unload() and spa_async_thread() (141445-06)
9653:a70048a304d1 6664765 Unable to remove files when using fat-zap and quota exceeded on ZFS filesystem (141445-06)
9688:127be1845343 6841321 zfs userspace / zfs get userused@ doesn't work on mounted snapshot (N/A) 6843069 zfs get userused@S-1-... doesn't work (N/A)
9873:8ddc892eca6e 6847229 assertion failed: refcount_count(&tx->tx_space_written) + delta <= tx->tx_space_towrite in dmu_tx.c (141445-06)
9904:d260bd3fd47c 6838344 kernel heap corruption detected on zil while stress testing (141445-06)
9951:a4895b3dd543 6844900 zfs_ioc_userspace_upgrade leaks (N/A)
10040:38b25aeeaf7a 6857012 zfs panics on zpool import (141445-06)
10000:241a51d8720c 6848242 zdb -e no longer works as expected (N/A)
10100:4a6965f6bef8 6856634 snv_117 not booting: zfs_parse_bootfs: error2 (141445-07)
10160:a45b03783d44 6861983 zfs should use new name <-> SID interfaces (N/A) 6862984 userquota commands can hang (141445-06)
10299:80845694147f 6696858 zfs receive of incremental replication stream can dereference NULL pointer and crash (N/A)
10302:a9e3d1987706 6696858 zfs receive of incremental replication stream can dereference NULL pointer and crash (fix lint) (N/A)
10575:2a8816c5173b (partial merge) 6882227 spa_async_remove() shouldn't do a full clear (142901-14)
10800:469478b180d9 6880764 fsync on zfs is broken if writes are greater than 32kb on a hard crash and no log attached (142901-09) 6793430 zdb -ivvvv assertion failure: bp->blk_cksum.zc_word[2] == dmu_objset_id(zilog->zl_os) (N/A)
10801:e0bf032e8673 (partial merge) 6822816 assertion failed: zap_remove_int(ds_next_clones_obj) returns ENOENT (142901-09)
10810:b6b161a6ae4a 6892298 buf->b_hdr->b_state != arc_anon, file: ../../common/fs/zfs/arc.c, line: 2849 (142901-09)
10890:499786962772 6807339 spurious checksum errors when replacing a vdev (142901-13)
11249:6c30f7dfc97b 6906110 bad trap panic in zil_replay_log_record (142901-13) 6906946 zfs replay isn't handling uid/gid correctly (142901-13)
11454:6e69bacc1a5a 6898245 suspended zpool should not cause rest of the zfs/zpool commands to hang (142901-10)
11546:42ea6be8961b (partial merge) 6833999 3-way deadlock in dsl_dataset_hold_ref() and dsl_sync_task_group_sync() (142901-09)
Discussed with: pjd Approved by: delphij (mentor) Obtained from: OpenSolaris (multiple Bug IDs) MFC after: 2 months
|
209275 |
17-Jun-2010 |
mm |
Import latest ARC change from OpenSolaris: - large ghost eviction causes high write latency - arc_adjust might adjust MRU unnecessarily - arc_adapt can lead to wild arc_p adjustment
OpenSolaris onnv-revision: 12636:13b5d698941e
Submitted by: avg Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6950219, 6953403, 6951024) MFC after: 1 month
|
209261 |
17-Jun-2010 |
pjd |
Turn off UMA allocations on all archs by default. It isn't stable even on amd64.
Reported by: many MFC after: 3 days
|
209230 |
16-Jun-2010 |
pjd |
Remove redundant assignment.
MFC after: 3 days
|
209101 |
12-Jun-2010 |
mm |
Fix arc_read_done may try to byteswap undefined data (sparc related)
OpenSolaris onnv-revision: 10839:cf83b553a2ab
Obtained from: OpenSolaris (Bug ID 6836714) Approved by: pjd, delphij (mentor) MFC after: 3 days
|
209100 |
12-Jun-2010 |
mm |
Fix panic in zfs_getsecattr
OpenSolaris onnv-revision: 10295:f7a18a1e9610
Obtained from: OpenSolaris (Bug ID 6870564) Approved by: pjd, delphij (mentor) MFC after: 3 days
|
209099 |
12-Jun-2010 |
mm |
Fix possible zfs panic on zpool import
OpenSolaris onnv-revision: 10040:38b25aeeaf7a
Obtained from: OpenSolaris (Bug ID 6857012) Approved by: pjd, delphij (mentor) MFC after: 3 days
|
209098 |
12-Jun-2010 |
mm |
Fix zpool resilver stalls with spa_scrub_thread in a 3 way deadlock
OpenSolaris onnv-revision: 9997:174d75a29a1c
Obtained from: OpenSolaris (Bug ID 6843235) Approved by: pjd, delphij (mentor) MFC after: 3 days
|
209097 |
12-Jun-2010 |
mm |
Fix ZFS panic deadlock: cycle in blocking chain via zfs_zget
OpenSolaris onnv-revision: 9774:0bb234ab2287
Obtained from: OpenSolaris (Bug ID 6788152) Approved by: pjd, delphij (mentor) MFC after: 3 days
|
209096 |
12-Jun-2010 |
mm |
Fix vdev_probe() starvation brings txg train to a screeching halt
OpenSolaris onnv-revision: 9722:e3866bad4e96
Obtained from: OpenSolaris (Bug ID 6844069) Approved by: pjd, delphij (mentor) MFC after: 3 days
|
209095 |
12-Jun-2010 |
mm |
Fix incomplete resilvering after disk replacement (raidz)
OpenSolaris onnv-revision: 9434:3bebded7c76a
Obtained from: OpenSolaris (Bug ID 6794570) Approved by: pjd, delphij (mentor) MFC after: 3 days
|
209094 |
12-Jun-2010 |
mm |
Fix zfs destroy fails to free object in open context, stops up txg train
OpenSolaris onnv-revision: 9409:9dc3f17354ed
Obtained from: OpenSolaris (Bug ID 6809683) Approved by: pjd, delphij (mentor) MFC after: 3 days
|
209093 |
12-Jun-2010 |
mm |
Fix unable to remove a file over NFS after hitting refquota limit
OpenSolaris onnv-revision: 8890:8c2bd5f17bf2
Obtained from: OpenSolaris (Bug ID 6798878) Approved by: pjd, delphij (mentor) MFC after: 3 days
|
208775 |
03-Jun-2010 |
mm |
Fix freeing space after deleting large files with holes.
OpenSolaris onnv revision: 9950:78fc41aa9bc5
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6792701) MFC after: 3 days
|
208689 |
01-Jun-2010 |
mm |
Fix ZIL close when doing zfs rollback or zfs receive on a mounted dataset.
The fix is a partial import and merge of OpenSolaris onnv revisions 8227:f7d7be9b1f56 and 9292:e112194b5b73.
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6798298) MFC after: 3 days
|
208683 |
31-May-2010 |
pjd |
Fix a bug where resilver is not started automatically on pool import or load. If a disk was missing on pool load or import and was present on the next pool load or import, resilver wasn't started automatically and ZFS reported all disks as ONLINE and healthy. Then, when another disk died, the pool became inaccessible, because in a 2-way mirror or RAIDZ1 the two vdevs were out of sync.
To fix the problem, start resilver automatically on pool load or import.
Obtained from: OpenSolaris MFC after: 3 days
|
208682 |
31-May-2010 |
pjd |
Fix panic when reading label from provider with non power of 2 sector size.
Reported by: James R. Van Artsdalen <james-freebsd-fs2@jrv.org> MFC after: 3 days
|
208474 |
23-May-2010 |
mm |
Remove kstat.zfs.arcstats.l2_write_bytes_written
The arcstats.l2_write_bytes_written kstat counter introduced in r205231 was a duplicate of the vendor's arcstats.l2_write_bytes counter imported in r208373 (OpenSolaris revision 8582:df9361868dbe)
Approved by: pjd, delphij (mentor) MFC after: 3 days
|
208472 |
23-May-2010 |
mm |
Fix zfs receive temporarily changing unchanged stream properties. Fix possible panic with zfs_enable_datasets.
OpenSolaris onnv revision: 8536:33bd5de3260e
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6748561, 6757075) MFC after: 3 days
|
208458 |
23-May-2010 |
pjd |
Create UMA zones unconditionally.
MFC after: 3 days
|
208454 |
23-May-2010 |
pjd |
Remove ZIO_USE_UMA from arc.c as well.
MFC after: 3 days
|
208443 |
23-May-2010 |
mm |
Fix kernel panic when calling spa_tryimport() on a corrupted pool.
OpenSolaris onnv revision: 8680:005fe27123ba
Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6786321) MFC after: 1 day
|
208442 |
23-May-2010 |
mm |
Fix mutex_exit misorder that can cause a kernel panic.
OpenSolaris onnv revision: 8667:5c308a17eb7c
Approved by: delphij (mentor) Obtained from: OpenSolaris (Bug ID 6795440) MFC after: 1 day
|
208373 |
21-May-2010 |
mm |
Update L2ARC code and fix several bugs.
- improve ARC memory consumption (Bug ID 6488341) - ARC/L2ARC metadata accounting (Bug ID 6748019) - L2ARC turbo warmup (Bug ID 6748023) - kstats for ARC content (Bug ID 6748023) - kstats for evicted bytes from ARC by L2ARC state (Bug ID 6871680) - fix panic on i386 systems (Bug ID 6821260)
OpenSolaris onnv revisions: 8582:df9361868dbe, 8628:97dcded6e556, 9215:7c4584f76b47, 9274:a10f8bd993c1, 10357:29060492b29d
OpenSolaris Bug IDs: 6748019, 6748023, 6748030, 6488341, 6798268, 6821260, 6790261, 6871680
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (multiple bug IDs) MFC after: 3 days
|
208372 |
21-May-2010 |
mm |
Reorder some already introduced locking variables.
OpenSolaris onnv revision: 8214:d7abf7c1f1c1
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6747934) MFC after: 3 days
|
208371 |
21-May-2010 |
mm |
Fix stack overflow in zfs send.
OpenSolaris onnv-revision: 8012:8ea30813950f
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6765626) MFC after: 3 days
|
208370 |
21-May-2010 |
mm |
Fix: vdev_reopen() can lead to failed allocations
OpenSolaris onnv-revision: 7980:589f37f25048
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6764914) MFC after: 3 days
|
208166 |
16-May-2010 |
pjd |
Fix userland build by making io_task available only for the kernel and by providing taskq_dispatch_safe() macro.
MFC after: 1 week
|
208148 |
16-May-2010 |
pjd |
Allow configuring UMA usage for ZIO data via loader and turn it on by default for amd64. On i386 I saw performance degradation when UMA was used, but for amd64 it should help.
MFC after: 3 days
|
208147 |
16-May-2010 |
pjd |
Add task structure to zio and use it instead of allocating one. This eliminates the only place where we can sleep when calling zio_interrupt(). As a side-effect this can actually improve performance a little as we allocate one less thing for every I/O.
Prodded by: kib MFC after: 1 week
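The idea of embedding the task in the request, rather than allocating one per dispatch, can be sketched in userland C roughly as follows. This is only an illustration under stated assumptions: `struct request`, `enqueue_embedded()`, `run_queue()` and `mark_done()` are hypothetical names, the queue is a trivial single-threaded LIFO list, and none of it is the actual zio or taskqueue code. Only the field name `io_task` is taken from the log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task descriptor: a callback plus its argument. */
struct task {
	void (*fn)(void *);
	void *arg;
	struct task *next;
};

/* Hypothetical request; the task is embedded, so dispatch allocates nothing. */
struct request {
	int done;
	struct task io_task;	/* lives exactly as long as the request */
};

/* Trivial single-threaded LIFO queue, a stand-in for a real task queue. */
struct task *queue_head;

/* Queue the request's own task; no allocation, so nothing here can sleep. */
void
enqueue_embedded(struct request *rq, void (*fn)(void *))
{
	assert(rq != NULL && fn != NULL);
	rq->io_task.fn = fn;
	rq->io_task.arg = rq;
	rq->io_task.next = queue_head;
	queue_head = &rq->io_task;
}

/* Run and remove every queued task. */
void
run_queue(void)
{
	while (queue_head != NULL) {
		struct task *t = queue_head;

		queue_head = t->next;
		t->fn(t->arg);
	}
}

/* Example completion callback. */
void
mark_done(void *arg)
{
	((struct request *)arg)->done = 1;
}
```

A caller would invoke `enqueue_embedded(&rq, mark_done)` from the completion path and let a worker call `run_queue()` later. The trade-off is a slightly larger request structure in exchange for one fewer allocation per I/O.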
|
208142 |
16-May-2010 |
pjd |
The whole point of having dedicated worker thread for each leaf VDEV was to avoid calling zio_interrupt() from geom_up thread context. It turns out that when provider is forcibly removed from the system and we kill worker thread there can still be some ZIOs pending. To complete pending ZIOs when there is no worker thread anymore we still have to call zio_interrupt() from geom_up context. To avoid this race just remove use of worker threads altogether. This should be more or less fine, because I also thought that zio_interrupt() does more work, but it only makes small UMA allocation with M_WAITOK. It also saves one context switch per I/O request.
PR: kern/145339 Reported by: Alex Bakhtin <Alex.Bakhtin@gmail.com> MFC after: 1 week
|
208131 |
16-May-2010 |
mm |
Fix deadlock between zfs_dirent_lock and zfs_rmdir
OpenSolaris onnv revision: 11321:506b7043a14c
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6847615) MFC after: 3 days
|
208130 |
16-May-2010 |
mm |
Fix performance problem with ZFS prefetch caching [1] Add statistics for ZFS prefetch (sysctl kstat.zfs.misc.zfetchstats)
Partial import of OpenSolaris onnv revision 10474:0e96dd3b905a
Reported by: jhell@dataix.net (private e-mail) [1] Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6859997, 6868951) MFC after: 3 days
|
208050 |
13-May-2010 |
mm |
Fix ZIL-related panic on zfs rollback.
OpenSolaris onnv-revision: 8746:e1d96ca6808c
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6796377) MFC after: 1 week
|
208047 |
13-May-2010 |
mm |
Import OpenSolaris revision 7837:001de5627df3. It includes the following changes:
- parallel reads in traversal code (Bug ID 6333409)
- faster traversal for zfs send (Bug ID 6418042)
- traversal code cleanup (Bug ID 6725675)
- fix for two scrub related bugs (Bug ID 6729696, 6730101)
- fix assertion in dbuf_verify (Bug ID 6752226)
- fix panic during zfs send with i/o errors (Bug ID 6577985)
- replace P2CROSS with P2BOUNDARY (Bug ID 6725680)
List of OpenSolaris Bug IDs: 6333409, 6418042, 6757112, 6725668, 6725675, 6725680, 6725698, 6729696, 6730101, 6752226, 6577985, 6755042
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (multiple Bug IDs) MFC after: 1 week
|
208030 |
13-May-2010 |
trasz |
Add missing check to prevent local users from panicking the kernel by trying to set a malformed ACL.
MFC after: 3 days
|
207956 |
12-May-2010 |
mm |
Fix possible hang when replaying large truncations.
OpenSolaris onnv revision: 7904:6a124a4ca9c5
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6761624) MFC after: 3 days
|
207936 |
11-May-2010 |
pjd |
Even though r203504 eliminates taste traffic provoked by vdev_geom.c, ZFS still likes to open all vdevs, close them and open them again, which in turn provokes taste traffic anyway.
I don't know of any clean way to fix it, so do it the hard way - if we can't open provider for writing just retry 5 times with 0.5-second pauses. This should eliminate accidental races caused by other classes tasting providers created on top of our vdevs.
MFC after: 3 days Reported by: James R. Van Artsdalen <james-freebsd-fs2@jrv.org> Reported by: Yuri Pankov <yuri.pankov@gmail.com>
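The retry approach described in this entry can be sketched as a bounded loop. This is a minimal userland illustration, not the code in vdev_geom.c: `open_fn`, `pause_fn`, `open_with_retry()` and the demo helper `fail_until_zero()` are hypothetical names, the pause is assumed to be half a second, and the sketch makes at most five attempts in total.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callbacks: try to open for writing (0 on success, errno
 * value on failure), and pause for a number of milliseconds. */
typedef int (*open_fn)(void *ctx);
typedef void (*pause_fn)(unsigned int ms);

#define OPEN_ATTEMPTS	5	/* "retry 5 times" in the commit message */
#define OPEN_PAUSE_MS	500	/* assumption: "0.5" means half a second */

/*
 * Make up to OPEN_ATTEMPTS attempts, pausing after each failure so that a
 * competing taster has time to release the provider.
 * Returns 0 on success or the last error seen.
 */
int
open_with_retry(open_fn try_open, pause_fn pause, void *ctx)
{
	int error = EBUSY;
	int i;

	assert(try_open != NULL);
	for (i = 0; i < OPEN_ATTEMPTS; i++) {
		error = try_open(ctx);
		if (error == 0)
			break;
		if (pause != NULL)
			pause(OPEN_PAUSE_MS);
	}
	return (error);
}

/* Demo helper: fails with EBUSY until the counter behind ctx reaches zero. */
int
fail_until_zero(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return (EBUSY);
	}
	return (0);
}
```

With a counter of 3, `open_with_retry(fail_until_zero, NULL, &n)` succeeds on the fourth attempt; with a counter of 10 it gives up after five attempts and returns `EBUSY`. A fixed, small retry budget bounds the added delay on providers that are busy for other reasons.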
|
207934 |
11-May-2010 |
pjd |
Add missing new line characters to the warnings.
MFC after: 3 days
|
207911 |
11-May-2010 |
mm |
Fix failed assertion on destroying datasets from an older pool version.
OpenSolaris onnv revision: 9390:887948510f80
PR: kern/146471 Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6826861) MFC after: 3 days
|
207910 |
11-May-2010 |
mm |
Fix possible panic with zfs destroy.
OpenSolaris onnv revision: 8779:f164e0e90508
PR: kern/146471 Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6784924) MFC after: 3 days
|
207909 |
11-May-2010 |
mm |
Fix zfs rename (may occasionally fail with dataset busy).
OpenSolaris onnv revision: 8517:41a0783dde17
PR: kern/146471 Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6784757) MFC after: 3 days
|
207908 |
11-May-2010 |
mm |
Fix endianness bug in ZFS intent log (ZIL).
OpenSolaris onnv revision: 8109:6147a1bdd359
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6760048) MFC after: 3 days
|
207745 |
07-May-2010 |
trasz |
Enforce RLIMIT_FSIZE in ZFS.
Reviewed by: pjd@
|
207683 |
05-May-2010 |
marius |
- Fix broken symlinks on cross platform zfs send/recv. [1] - Enable zfs_ace_byteswap() on FreeBSD as it works just fine (tested between amd64 and sparc64 in both directions by Michael Moll).
PR: 146272 Approved by: mm, pjd Obtained from: OpenSolaris (onnv rev. 8283:1ca59f393041; Bug ID 6764193) [1] MFC after: 3 days
|
207670 |
05-May-2010 |
mm |
Introduce hardforce export option (-F) for "zpool export". When exporting with this flag, zpool.cache remains untouched.
OpenSolaris onnv revision: 8211:32722be6ad3b
Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID: 6775357)
|
207626 |
04-May-2010 |
mm |
Speed up ZFS list operation with objset prefetching.
Partial import of OpenSolaris onnv revisions: 8415:8809e849f63e, 10474:0e96dd3b905a
PR: kern/146297 Submitted by: myself Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6386929, 6755389, 6847118) MFC after: 2 weeks
|
207624 |
04-May-2010 |
mm |
Fix deadlock during zfs receive.
OpenSolaris onnv revision: 9299:8809e849f63e
PR: kern/146296 Submitted by: myself Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris (Bug ID 6783818, 6826836) MFC after: 1 week
|
207481 |
01-May-2010 |
mm |
Add sysctl and loader tunable vfs.zfs.txg.write_limit_override. This tunable improves fine-tuning of ZFS write throttling.
PR: kern/146108 Suggested by: Nikolay Denev <ndenev at gmail.com> Approved by: pjd, delphij (mentor) MFC after: 2 weeks
|
207480 |
01-May-2010 |
mm |
Change description of tunable group vfs.zfs.txg to be more understandable.
Approved by: pjd, delphij (mentor) MFC after: 3 days
|
207427 |
30-Apr-2010 |
mm |
Fix improper pool write throughput calculation.
OpenSolaris onnv revision: 9366:17553395a745
PR: kern/146108 Approved by: pjd, delphij (mentor) Obtained from: OpenSolaris, Bug ID 6817339 MFC after: 2 weeks
|
207334 |
28-Apr-2010 |
pjd |
Backport fix for 'zfs_znode_dmu_init: existing znode for dbuf' panic from OpenSolaris.
PR: kern/144402 Reported by: Alex Bakhtin <alex.bakhtin@gmail.com> Tested by: Alex Bakhtin <alex.bakhtin@gmail.com> Obtained from: OpenSolaris, Bug ID 6895088 MFC after: 3 days
|
207068 |
22-Apr-2010 |
pjd |
Allow modifying a directory's content even if the ZFS_NOUNLINK (SF_NOUNLINK, sunlnk) flag is set. We only deny the directory's removal or rename.
PR: kern/143343 Reported by: marck MFC after: 3 days
|
206797 |
18-Apr-2010 |
pjd |
Restore previous order.
|
206796 |
18-Apr-2010 |
pjd |
Style fixes.
|
206795 |
18-Apr-2010 |
pjd |
Add missing list and lock destruction.
|
206794 |
18-Apr-2010 |
pjd |
Extend locks scope to match OpenSolaris.
|
206793 |
18-Apr-2010 |
pjd |
Remove racy assertion.
Obtained from: OpenSolaris
|
206792 |
18-Apr-2010 |
pjd |
Set ARC_L2_WRITING on L2ARC header creation.
Obtained from: OpenSolaris
|
206667 |
15-Apr-2010 |
pjd |
Fix 3-way deadlock that can happen because of ZFS and vnode lock order reversal.
thread0 (vfs_fhtovp)    thread1 (vop_getattr)   thread2 (zfs_recv)
--------------------    ---------------------   ------------------
vn_lock
                        rrw_enter_read
                                                rrw_enter_write (hangs)
rrw_enter_read (hangs)
                        vn_lock (hangs)
Submitted by: Attila Nagy <bra@fsn.hu> MFC after: 3 days
|
205346 |
19-Mar-2010 |
pjd |
The same code is used to import and to create a pool. The order of operations is the following:
1. Try to open vdev by remembered path and guid.
2. If 1 failed, try to find a vdev whose guid matches and ignore the path.
3. If 2 failed this means either that the vdev we're looking for is gone or that the pool is being created and the vdev doesn't contain a proper guid yet. To be able to handle pool creation we open vdev by path anyway.
Because of 3 it is possible that we open the wrong vdev on import, which can lead to confusion.
The solution for this is to check spa_load_state. On pool creation it will be equal to SPA_LOAD_NONE and we can open vdev only by path immediately and if it is not equal to SPA_LOAD_NONE we first open by path+guid and when that fails, we open by guid. We no longer open wrong vdev on import.
MFC after: 2 weeks
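The decision described in this entry can be sketched as a small lookup routine. Everything here is a hypothetical userland stand-in: `struct provider`, `find_provider()`, `open_vdev()` and the two-value `enum load_state` are illustrative names, while the real code works on GEOM providers and checks spa_load_state against SPA_LOAD_NONE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the pool load state: creation vs. anything else. */
enum load_state { LOAD_NONE, LOAD_OTHER };

/* Hypothetical provider record. */
struct provider {
	const char *path;
	uint64_t guid;
};

/*
 * Return the index of the first provider matching every non-NULL key,
 * or -1 if none matches.
 */
int
find_provider(const struct provider *p, size_t n, const char *path,
    const uint64_t *guid)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (path != NULL && strcmp(p[i].path, path) != 0)
			continue;
		if (guid != NULL && p[i].guid != *guid)
			continue;
		return ((int)i);
	}
	return (-1);
}

int
open_vdev(const struct provider *p, size_t n, const char *path,
    uint64_t guid, enum load_state state)
{
	int idx;

	assert(path != NULL);
	/* Pool creation: no guid on disk yet, so the path is all we have. */
	if (state == LOAD_NONE)
		return (find_provider(p, n, path, NULL));
	/* Import or load: prefer path+guid, then fall back to guid alone. */
	idx = find_provider(p, n, path, &guid);
	if (idx < 0)
		idx = find_provider(p, n, NULL, &guid);
	return (idx);
}
```

The point of the split is that the path-only fallback is never reached on import, so a device that happens to sit at the remembered path but carries a different guid is not opened by mistake.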
|
205264 |
17-Mar-2010 |
kmacy |
- cache line align arcs_lock array (h/t Marius Nuennerich) - fix ARCS_LOCK_PAD to use architecture defined CACHE_LINE_SIZE - cache line align buf_hash_table ht_locks array
MFC after: 7 days
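Cache-line alignment of a lock array can be illustrated with a short C11 sketch. It uses `_Alignas` rather than the explicit pad macro the commit refers to, so it shows the same goal by a different technique. `struct fake_lock`, `struct padded_lock` and `lock_array` are hypothetical names, and the fallback value of 64 for `CACHE_LINE_SIZE` is an assumption for systems that do not define it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed fallback; FreeBSD defines CACHE_LINE_SIZE per architecture. */
#ifndef CACHE_LINE_SIZE
#define CACHE_LINE_SIZE	64
#endif

#define NLOCKS	16

/* Hypothetical stand-in for a real lock type. */
struct fake_lock {
	volatile int held;
};

/*
 * Aligning the member to the cache line size also rounds the structure
 * size up to a multiple of it, so adjacent array elements never share
 * a cache line.
 */
struct padded_lock {
	_Alignas(CACHE_LINE_SIZE) struct fake_lock lock;
};

struct padded_lock lock_array[NLOCKS];
```

Each element then starts on its own cache line, which avoids false sharing when different CPUs take neighbouring locks.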
|
205253 |
17-Mar-2010 |
kmacy |
use CACHE_LINE_SIZE instead of hardcoding 128 for lock pad
pointed out by Marius Nuennerich and jhb@
|
205231 |
16-Mar-2010 |
kmacy |
- reduce contention by breaking up ARC state locks in to 16 for data and 16 for metadata - export L2ARC tunables as sysctls - add several kstats to track L2ARC state more precisely - avoid holding a contended lock when atomically incrementing a contended counter (no lock protection needed for atomics)
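The two contention ideas in this entry, splitting one lock into several and updating an atomic counter without holding a lock, can be sketched with C11 atomics. This is a userland illustration only: `struct shard`, `shard_index()`, `shard_insert()` and `total_bytes` are hypothetical names, a single set of 16 shards stands in for the separate data and metadata sets, and a simple spin flag stands in for a kernel mutex.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SHARDS	16

/* Hypothetical shard: a spin flag plus the state it protects. */
struct shard {
	atomic_bool busy;	/* stand-in for a real mutex */
	size_t nitems;		/* protected by busy */
};

struct shard shards[NUM_SHARDS];	/* static storage: starts zeroed */
atomic_uint_fast64_t total_bytes;	/* global counter, updated atomically */

/* Pick a shard from a key; any reasonably even hash would do. */
size_t
shard_index(uint64_t key)
{
	return ((size_t)(key % NUM_SHARDS));
}

void
shard_insert(uint64_t key, uint64_t bytes)
{
	struct shard *s = &shards[shard_index(key)];

	/* Only threads hashing to the same shard contend here. */
	while (atomic_exchange_explicit(&s->busy, true, memory_order_acquire))
		;	/* spin until the flag was previously clear */
	s->nitems++;
	atomic_store_explicit(&s->busy, false, memory_order_release);

	/* The counter is atomic, so it is updated after the lock is dropped. */
	atomic_fetch_add_explicit(&total_bytes, bytes, memory_order_relaxed);
}
```

Keeping the atomic add outside the critical section shortens the time the shard lock is held, which is the effect the last bullet of the commit message describes.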
|
205133 |
13-Mar-2010 |
kmacy |
fix compilation under ZIO_USE_UMA
|
205132 |
13-Mar-2010 |
kmacy |
Don't bottleneck on acquiring the stream locks - this avoids a massive drop off in throughput with large numbers of simultaneous reads
MFC after: 7 days
|
205079 |
12-Mar-2010 |
pjd |
Remove bogus assertion.
Reported by: Johan Ström <johan@stromnet.se> Obtained from: OpenSolaris, Bug ID 6827260 MFC after: 1 week
|
204804 |
06-Mar-2010 |
pjd |
Remove racy assertion.
Reported by: Attila Nagy <bra@fsn.hu> Obtained from: OpenSolaris, Bug ID 6827260 MFC after: 1 week
|
204101 |
19-Feb-2010 |
pjd |
Don't set f_bsize to recordsize. It might confuse some software (like squid).
Submitted by: Alexander Zagrebin <alexz@visp.ru> MFC after: 2 weeks
|
204073 |
18-Feb-2010 |
pjd |
Add tunable and sysctl to skip hostid check on pool import.
|
203504 |
04-Feb-2010 |
pjd |
Open provider for writing when we find the right one. Opening too many providers for writing provokes huge traffic related to taste events sent by GEOM on close. This can lead to various problems with opening GEOM providers that are created on top of other GEOM providers.
Reported by: Kurt Touet <ktouet@gmail.com>, mr Tested by: mr, Baginski Darren <kickbsd@ya.ru> MFC after: 2 weeks
|
202129 |
11-Jan-2010 |
delphij |
Report ZFS filesystem version instead of the zpool version when we say it.
Reported by: Yuri Pankov (on -fs@) Submitted by: delphij Approved by: pjd MFC after: 1 week
|
201756 |
07-Jan-2010 |
delphij |
Re-apply onnv-gate revisions 7994 and 8986 (corresponds to FreeBSD revision 200726 and 200727). It looks like the two revisions were not applied in the right sequence; I found this when comparing with the OpenSolaris code.
MFC after: 3 days Reviewed by: mm@
|
201406 |
02-Jan-2010 |
delphij |
Reduce diff against OpenSolaris - move Giant acquire/release to zfs_znode.c. As a side effect this also eliminates two potential Giant leaks.
Approved by: pjd MFC after: 1 month
|
201143 |
28-Dec-2009 |
delphij |
Apply OpenSolaris revision 8012 which brings our zpool to version 14, making it possible for zpools created on OpenSolaris 2009.06 to be used on FreeBSD.
PR: kern/141800 Submitted by: mm Reviewed by: pjd, trasz Obtained from: OpenSolaris MFC after: 2 weeks
|
200727 |
19-Dec-2009 |
delphij |
Apply fix for Solaris bug 6462803: zfs snapshot -r failed because filesystem was busy (onnv revision 8989)
Submitted by: mm Approved by: pjd Obtained from: OpenSolaris MFC after: 2 weeks
|
200726 |
19-Dec-2009 |
delphij |
Apply fix for Solaris bug 6801979: zfs recv can fail with E2BIG (onnv revision 8986)
Requested by: mm Submitted by: pjd Obtained from: OpenSolaris MFC after: 2 weeks
|
200724 |
19-Dec-2009 |
delphij |
Apply fix for Solaris bug 6462803: zfs snapshot -r failed because filesystem was busy.
Submitted by: mm Approved by: pjd MFC after: 2 weeks
|
200162 |
05-Dec-2009 |
kib |
Change VOP_FSYNC for zfs vnode from VOP_PANIC to zfs_freebsd_fsync(), both to not panic when fsync(2) is called for fifo on zfs filedescriptor, and to actually fsync fifo inode to permanent storage.
PR: kern/141177 Reviewed by: pjd MFC after: 1 week
|
200158 |
05-Dec-2009 |
pjd |
We have to eventually look for provider without checking guid as this is needed for attaching when there is no metadata yet.
Before r200125 the order of looking for providers was wrong. It was:
1. Find provider by name.
2. Find provider by guid.
3. Find provider by name and guid.
Where it should have been:
1. Find provider by name and guid.
2. Find provider by guid.
3. Find provider by name.
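The corrected order can be sketched in userland C as three passes, from the strictest match to the loosest. This is only an illustration: `struct provider` and `find_provider()` are hypothetical stand-ins, not the GEOM structures or the vdev_geom code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a GEOM provider; not the kernel's structure. */
struct provider {
	const char *name;
	uint64_t guid;		/* guid read from on-disk metadata */
};

/*
 * Return the best match using the corrected order:
 * 1. name and guid, 2. guid only, 3. name only.
 */
static const struct provider *
find_provider(const struct provider *pp, size_t n, const char *name,
    uint64_t guid)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (pp[i].guid == guid && strcmp(pp[i].name, name) == 0)
			return (&pp[i]);
	for (i = 0; i < n; i++)
		if (pp[i].guid == guid)
			return (&pp[i]);
	/* Last resort: needed when there is no metadata to compare yet. */
	for (i = 0; i < n; i++)
		if (strcmp(pp[i].name, name) == 0)
			return (&pp[i]);
	return (NULL);
}
```

With this ordering, a provider that merely happens to carry the expected name is only chosen when nothing with the expected guid exists.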
MFC after: 1 week
|
200126 |
05-Dec-2009 |
pjd |
Fix deadlock when ZVOLs are present and we are replacing dead component or calling scrub when pool is in a degraded state. It will try to taste ZVOLs, which will lead to deadlock, as ZVOL will try to acquire the same locks as replace/scrub is holding already.
We can't simply skip provider based on their GEOM class, because ZVOL can have providers built on top of it and we need to skip those as well.
We do it by asking for ZFS::iszvol attribute. Any ZVOL-based provider will give us positive answer and we have to skip those providers.
This way we remove possibility to create ZFS pools on top of ZVOLs, but it is not very useful anyway.
I believe deadlock is still possible in some very complex situations like when we have MD provider on top of UFS file on top of ZVOL. When we try to replace dead component in the pool mentioned ZVOL is based on, there might be a deadlock when ZFS will try to taste MD provider. There is no easy way to detect that, but it isn't very common.
MFC after: 1 week
|
200125 |
05-Dec-2009 |
pjd |
Always check guid when opening by path, because we may end up with provider that does have the same name, but only by accident.
MFC after: 1 week
|
200124 |
05-Dec-2009 |
pjd |
Avoid using additional variable for storing an error if we are not going to do anything with it.
|
199157 |
10-Nov-2009 |
pjd |
Be careful which vattr fields are set during setattr replay. Without this fix strange things can appear after unclean shutdown like files with mode set to 07777.
Reported by: des MFC after: 3 days
|
199156 |
10-Nov-2009 |
pjd |
Avoid passing invalid mountpoint to getnewvnode().
Reported by: rwatson Tested by: rwatson MFC after: 3 days
|
198703 |
30-Oct-2009 |
pjd |
- zfs_zaccess() can handle VAPPEND too, so map V_APPEND to VAPPEND and call zfs_access() instead of vaccess() in this case as well.
- If VADMIN is specified with another V* flag (unlikely) call both zfs_access() and vaccess() after splitting V* flags.
This fixes "dirtying snapshot!" panic.
PR: kern/139806 Reported by: Carl Chave <carl@chave.us> In co-operation with: jh MFC after: 3 days
|
197861 |
08-Oct-2009 |
pjd |
Allow file system owner to modify system flags if securelevel permits.
MFC after: 3 days
|
197843 |
07-Oct-2009 |
pjd |
On FreeBSD it is enough to report provider removal when orphan event is received, we don't have to do it on every ENXIO error in I/O path. Solaris has no GEOM so they have to handle it in a less clean way.
MFC after: 3 days
|
197842 |
07-Oct-2009 |
pjd |
Fix white-spaces.
MFC after: 3 days
|
197831 |
07-Oct-2009 |
pjd |
Fix situation where Mac OS X NFS client creates a file and when it tries to set ownership and mode in the same setattr operation, the mode was overwritten by secpolicy_vnode_setattr().
PR: kern/118320 Submitted by: Mark Thompson <info-gentoo@mark.thompson.bz> MFC after: 3 days
|
197816 |
06-Oct-2009 |
kmacy |
Prevent paging pressure from draining arc too much
- always drain arc if above arc_c_max
- never drain arc if arc is below arc_c_max
MFC after: 3 days
|
197683 |
01-Oct-2009 |
delphij |
Return EOPNOTSUPP instead of EINVAL when doing chflags(2) over an old format ZFS, as defined in the manual page.
Submitted by: pjd (response of my original patch but bugs are mine) MFC after: 3 days
|
197515 |
26-Sep-2009 |
pjd |
Handle cases where virtual (GFS) vnodes are referenced when doing forced unmount. In that case we cannot depend on the proper order of invalidating vnodes, so we have to free resources when we have a chance.
PR: kern/139062 Reported by: trasz MFC after: 3 days
|
197514 |
26-Sep-2009 |
pjd |
On lookup error VFS expects *vpp to be set to NULL, be sure to do that.
MFC after: 3 days
|
197513 |
26-Sep-2009 |
pjd |
Use traverse() function to find and return mount point's vnode instead of covered vnode when snapshot is already mounted.
MFC after: 3 days
|
197497 |
25-Sep-2009 |
pjd |
Switch to fletcher4 as the default checksum algorithm. Fletcher2 was proven to be a bit weak and OpenSolaris also switched to fletcher4.
PR: kern/139072 Reported by: Daniel Grund <bugs@dgrund.de> MFC after: 3 days
|
197459 |
24-Sep-2009 |
pjd |
Before calling vflush(FORCECLOSE) mark file system as unmounted so the following vnops will fail. This is very important, because without this change vnode could be reclaimed at any point, even if we increased usecount. The only way to ensure that vnode won't be reclaimed was to lock it, which would be very hard to do in ZFS without changing a lot of code. With this change simply increasing usecount is enough to be sure vnode won't be reclaimed from under us. To be precise it can still be reclaimed but we won't be able to see it, because every try to enter ZFS through VFS will result in EIO.
The only function that cannot return EIO, because it is needed for vflush() is zfs_root(). Introduce ZFS_ENTER_NOERROR() macro that only locks z_teardown_lock and never returns EIO.
MFC after: 3 days
|
197458 |
24-Sep-2009 |
pjd |
Close race in zfs_zget(). We have to increase usecount first and then check for VI_DOOMED flag. Before this change vnode could be reclaimed between checking for the flag and increasing usecount.
MFC after: 3 days
|
197435 |
23-Sep-2009 |
trasz |
In VOP_SETACL(9) and VOP_GETACL(9), specifying wrong ACL type should result in EINVAL, not EOPNOTSUPP.
|
197426 |
23-Sep-2009 |
pjd |
Restore BSD behaviour - when creating new directory entry use parent directory gid to set group ownership and not process gid.
This was overlooked during v6 -> v13 switch.
PR: kern/139076 Reported by: Sean Winn <sean@gothic.net.au> MFC after: 3 days
|
197351 |
20-Sep-2009 |
pjd |
Purge namecache in the same place OpenSolaris does.
|
197289 |
17-Sep-2009 |
pjd |
Purge file system namecache when receiving incremental stream and rolling back to it.
MFC after: 3 days
|
197287 |
17-Sep-2009 |
pjd |
Purge namecache for the file system being rolled back, so it doesn't point at invalid vnodes after the rollback resulting in EIO errors when trying to access files which are in the namecache.
Reported by: des MFC after: 3 days
|
197219 |
15-Sep-2009 |
pjd |
Forced unmounts work just fine in my tests under heavy load. There might still be a problem, but it isn't worth a warning.
|
197218 |
15-Sep-2009 |
pjd |
We believe ZFS is ready for production use. Remove a warning about it being experimental. :)
|
197201 |
14-Sep-2009 |
pjd |
- Mount ZFS snapshots with MNT_IGNORE flag, so they are not visible in regular df(1) and mount(8) output. This is a bit similar to OpenSolaris and follows ZFS route of not listing snapshots by default with 'zfs list' command.
- Add UPDATING entry to note that ZFS snapshots are no longer visible in mount(8) and df(1) output by default.
Reviewed by: kib MFC after: 3 days
|
197177 |
13-Sep-2009 |
pjd |
Support both cases: when snapshot is already mounted and when it is not yet mounted.
MFC after: 3 days
|
197172 |
13-Sep-2009 |
pjd |
Add missing \n.
Reported by: marck
|
197167 |
13-Sep-2009 |
pjd |
Work-around READDIRPLUS problem with .zfs/ and .zfs/snapshot/ directories by just returning EOPNOTSUPP. This will allow NFS server to fall back to regular READDIR.
Note that converting inode number to snapshot's vnode is an expensive operation. Snapshots are stored in AVL tree, but based on their names, not inode numbers, so to convert inode to snapshot vnode we have to iterate over all snapshots.
This is not a problem in OpenSolaris, because in their READDIRPLUS implementation they use VOP_LOOKUP() on d_name, instead of VFS_VGET() on d_fileno as we do.
PR: kern/125149 Reported by: Weldon Godfrey <wgodfrey@ena.com> Analysis by: Jaakko Heinonen <jh@saunalahti.fi> MFC after: 3 days
|
197153 |
13-Sep-2009 |
pjd |
When zfs.ko is compiled with debug, make sure that znode and vnode point at each other.
MFC after: 3 days
|
197152 |
13-Sep-2009 |
pjd |
Extend scope of the z_teardown_lock lock for consistency and "just in case".
MFC after: 3 days
|
197151 |
13-Sep-2009 |
pjd |
Be sure not to overflow struct fid.
MFC after: 3 days
|
197150 |
13-Sep-2009 |
pjd |
There is a bug where mze_insert() can trigger an assert() of inserting the same entry twice. This bug is not fixed yet, but leads to a situation where the kernel will panic when we try to access a corrupted directory. Until the bug is properly fixed, try to recover from it and log that it happened.
Reported by: marck OpenSolaris bug: 6709336 MFC after: 3 days
|
197133 |
12-Sep-2009 |
pjd |
- Protect reclaim with z_teardown_inactive_lock.
- Be prepared for dbuf to disappear in zfs_reclaim_complete() and check if z_dbuf field is NULL - this might happen in case of rollback or forced unmount between zfs_freebsd_reclaim() and zfs_reclaim_complete().
- On forced unmount wait for all znodes to be destroyed - destruction can be done asynchronously via zfs_reclaim_complete().
MFC after: 1 week
|
197131 |
12-Sep-2009 |
pjd |
Tighten up the check for race in zfs_zget() - ZTOV(zp) can not only contain NULL, but also can point to dead vnode, take that into account.
PR: kern/132068 Reported by: Edward Fisk <7ogcg7g02@sneakemail.com>, kris Fix based on patch from: Jaakko Heinonen <jh@saunalahti.fi> MFC after: 1 week
|
196985 |
08-Sep-2009 |
pjd |
Only log successful commands! Without this fix we log even unsuccessful commands executed by unprivileged users. Action is not really taken, but it is logged to pool history, which might be confusing.
Reported by: Denis Ahrens <denis@h3q.com> MFC after: 3 days
|
196982 |
08-Sep-2009 |
pjd |
We don't export individual snapshots, so mnt_export field in snapshot's mount point is NULL. That's why when we try to access snapshots over NFS use mnt_export field from the parent file system.
MFC after: 1 week
|
196980 |
08-Sep-2009 |
pjd |
When we automatically mount snapshot we want to return vnode of the mount point from the lookup and not covered vnode. This is one of the fixes for using .zfs/ over NFS.
MFC after: 1 week
|
196979 |
08-Sep-2009 |
pjd |
On FreeBSD we don't have to look for snapshot's mount point, because fhtovp method is already called with proper mount point.
MFC after: 1 week
|
196978 |
08-Sep-2009 |
pjd |
Call ZFS_EXIT() after locking the vnode.
MFC after: 1 week
|
196965 |
08-Sep-2009 |
pjd |
Fix reference count leak for a case where snapshot's mount point is updated. Such situation is not supported.
This problem was triggered by something like this:
	# zpool create tank da0
	# zfs snapshot tank@snap
	# cd /tank/.zfs/snapshot/snap (this will mount the snapshot)
	# cd
	# mount -u nosuid /tank/.zfs/snapshot/snap (refcount leak)
	# zpool export tank
	cannot export 'tank': pool is busy
MFC after: 1 week
|
196954 |
07-Sep-2009 |
pjd |
If we have to use avl_find(), optimize a bit and use avl_insert() instead of avl_add() (the latter is actually a wrapper around avl_find() + avl_insert()).
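The saving can be illustrated with a toy sorted set in place of the AVL tree. Everything here (`struct set`, `set_find()`, `set_insert()`, `set_add()`) is hypothetical and only mimics the shape of avl_find(), avl_insert() and avl_add(); in particular, the toy `set_add()` quietly ignores duplicates instead of panicking.

```c
#include <stddef.h>
#include <string.h>

#define	SET_CAP	16

/* Toy sorted set standing in for an AVL tree (illustration only). */
struct set {
	int keys[SET_CAP];
	size_t len;
	unsigned searches;	/* number of lookups performed so far */
};

/*
 * Like avl_find(): return 1 if key is present; otherwise return 0 and
 * store the insertion point in *where.
 */
static int
set_find(struct set *s, int key, size_t *where)
{
	size_t lo = 0, hi = s->len;

	s->searches++;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (s->keys[mid] < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (where != NULL)
		*where = lo;
	return (lo < s->len && s->keys[lo] == key);
}

/*
 * Like avl_insert(): insert at a position already computed by
 * set_find(). The caller must ensure s->len < SET_CAP.
 */
static void
set_insert(struct set *s, int key, size_t where)
{
	memmove(&s->keys[where + 1], &s->keys[where],
	    (s->len - where) * sizeof(s->keys[0]));
	s->keys[where] = key;
	s->len++;
}

/* Like avl_add(): a wrapper that searches first and then inserts. */
static void
set_add(struct set *s, int key)
{
	size_t where;

	if (!set_find(s, key, &where))
		set_insert(s, key, where);
}
```

A caller that has already run `set_find()` and then calls `set_add()` pays for two searches; passing the saved `where` to `set_insert()` pays for one.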
Fix similar case in the code that is currently commented out.
|
196953 |
07-Sep-2009 |
pjd |
When snapshot mount point is busy (for example we are still in it) we will fail to unmount it, but it won't be removed from the tree, so in that case there is no need to reinsert it.
This fixes a panic reproducible in the following steps:
	# zfs create tank/foo
	# zfs snapshot tank/foo@snap
	# cd /tank/foo/.zfs/snapshot/snap
	# umount /tank/foo
	panic: avl_find() succeeded inside avl_add()
Reported by: trasz MFC after: 3 days
|
196949 |
07-Sep-2009 |
trasz |
Enable NFSv4 ACL support in ZFS.
Reviewed by: pjd
|
196944 |
07-Sep-2009 |
pjd |
Don't recheck ownership on update mount. This will eliminate LOR between vfs_busy() and mount mutex. We check ownership in vfs_domount() anyway.
Noticed by: kib Reviewed by: kib MFC after: 1 week
|
196941 |
07-Sep-2009 |
trasz |
Prevent the line from wrapping.
|
196927 |
07-Sep-2009 |
pjd |
Changing provider size is not really supported by GEOM, but doing so when provider is closed should be ok.
When administrator requests to change ZVOL size do it immediately if ZVOL is closed or do it on last ZVOL close.
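The "now or on last close" rule can be modelled as a small state machine. This is a sketch under assumptions: `struct zvol_model` and the `zvol_model_*()` functions are made up for illustration and do not correspond to the actual ZVOL code.

```c
#include <stdint.h>

/* Hypothetical model of a ZVOL's exported size; not the kernel structure. */
struct zvol_model {
	uint64_t size;		/* size currently exposed to consumers */
	uint64_t pending_size;	/* requested size, applied on last close */
	int has_pending;
	unsigned open_count;
};

static void
zvol_model_open(struct zvol_model *zv)
{
	zv->open_count++;
}

/* Apply the new size at once if closed, otherwise remember it. */
static void
zvol_model_set_size(struct zvol_model *zv, uint64_t newsize)
{
	if (zv->open_count == 0) {
		zv->size = newsize;
		zv->has_pending = 0;
	} else {
		zv->pending_size = newsize;
		zv->has_pending = 1;
	}
}

/* On the last close, apply any size change that was deferred. */
static void
zvol_model_close(struct zvol_model *zv)
{
	if (zv->open_count > 0 && --zv->open_count == 0 && zv->has_pending) {
		zv->size = zv->pending_size;
		zv->has_pending = 0;
	}
}
```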
PR: kern/136942 Requested by: Bernard Buri <bsd@ask-us.at> MFC after: 1 week
|
196919 |
07-Sep-2009 |
pjd |
bzero() on-stack argument, so mutex_init() won't misinterpret that the lock is already initialized if we have some garbage on the stack.
PR: kern/135480 Reported by: Emil Mikulic <emikulic@gmail.com> MFC after: 3 days
|
196863 |
05-Sep-2009 |
trasz |
Improve wording.
Discussed with: pjd, cperciva, rink, wkoszek and des, in order of appearance.
|
196703 |
31-Aug-2009 |
pjd |
Backport the 'dirtying dbuf' panic fix from newer ZFS version.
Reported by: Thomas Backman <serenity@exscape.org> MFC after: 1 week
|
196662 |
30-Aug-2009 |
pjd |
Add missing mountpoint vnode locking.
This fixes panic on assertion with DEBUG_VFS_LOCKS and vfs.usermount=1 when regular user tries to mount dataset owned by him.
MFC after: 1 week
|
196458 |
23-Aug-2009 |
pjd |
- Hide ZFS kernel threads under zfskern process.
- Use better (shorter) threads names:
	'zvol:worker zvol/tank/vol00' -> 'zvol tank/vol00'
	'vdev:worker da0' -> 'vdev da0'
|
196457 |
23-Aug-2009 |
pjd |
Set priority of vdev_geom threads and zvol threads to PRIBIO.
|
196309 |
17-Aug-2009 |
pjd |
getcwd() (when __getcwd() fails) works by stating current directory, going up (..), calling readdir and looking for previous directory inode. In case of .zfs/ directory this doesn't work, because .zfs/ is hidden by default, so it won't be visible in readdir output.
Fix this by implementing VPTOCNP for snapshot directories, so __getcwd() doesn't fail and getcwd() doesn't have to use readdir method.
This fixes /bin/pwd from within .zfs/snapshot/<name>/.
Suggested by: kib Approved by: re (rwatson)
|
196307 |
17-Aug-2009 |
pjd |
Manage asynchronous vnode release just like Solaris.
Discussed with: kmacy Approved by: re (kib)
|
196303 |
17-Aug-2009 |
pjd |
- Reduce z_teardown_lock lock scope a bit.
- The error variable is int, not bool.
- Convert spaces to tabs where needed.
Approved by: re (kib)
|
196301 |
17-Aug-2009 |
pjd |
If z_buf is NULL, we should free znode immediately.
Noticed by: avg Approved by: re (kib)
|
196299 |
17-Aug-2009 |
pjd |
- We need to recycle vnode instead of freeing znode.
Submitted by: avg
- Add missing vnode interlock unlock. - Remove redundant znode locking.
Approved by: re (kib)
|
196297 |
17-Aug-2009 |
pjd |
Fix panic in zfs recv code. The last vnode (mountpoint's vnode) can have 0 usecount.
Reported by: Thomas Backman <serenity@exscape.org> Approved by: re (kib)
|
196295 |
17-Aug-2009 |
pjd |
Remove OpenSolaris taskq port (it performs very poorly in our kernel) and replace it with wrappers around our taskqueue(9). To make it possible implement taskqueue_member() function which returns 1 if the given thread was created by the given taskqueue.
Approved by: re (kib)
|
196291 |
17-Aug-2009 |
pjd |
- Fix a race where /dev/zfs control device is created before ZFS is fully initialized. Also destroy /dev/zfs before doing other deinitializations.
- Initialization through taskq is no longer needed and there is a race where one of the zpool/zfs commands loads zfs.ko and tries to do some work immediately, but /dev/zfs is not there yet.
Reported by: pav Approved by: re (kib)
|
195909 |
27-Jul-2009 |
pjd |
We don't support ephemeral IDs in FreeBSD and without this fix ZFS can panic in zfs_fuid_create_cred() when userid is negative. It is converted to an unsigned value, which makes the IS_EPHEMERAL() macro incorrectly report that this is an ephemeral ID. The most reasonable solution for now is to always report that the given ID is not ephemeral.
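The misclassification is plain unsigned conversion and can be reproduced in a few lines. The `MAXUID` value and the shape of `IS_EPHEMERAL()` below are assumed to resemble the OpenSolaris definitions and should be treated as illustrative; `looks_ephemeral()` is a hypothetical helper.

```c
#include <stdint.h>

/* Assumed to resemble the OpenSolaris definitions; illustrative only. */
#define	MAXUID		2147483647u
#define	IS_EPHEMERAL(x)	((x) > MAXUID)

/*
 * Hypothetical helper: a negative id converted to an unsigned type
 * becomes a value above MAXUID and is therefore reported as ephemeral.
 */
static int
looks_ephemeral(int32_t id)
{
	uint32_t uid = (uint32_t)id;

	return (IS_EPHEMERAL(uid));
}
```

Any negative id lands in the upper half of the unsigned range, which is exactly the range the macro treats as ephemeral.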
PR: kern/132337 Submitted by: Matthew West <freebsd@r.zeeb.org> Tested by: Thomas Backman <serenity@exscape.org>, Michael Reifenberger <mike@reifenberger.com> Approved by: re (kib) MFC after: 2 weeks
|
195822 |
22-Jul-2009 |
trasz |
Fix extattr_list_file(2) on ZFS in case the attribute directory doesn't exist and user doesn't have write access to the file. Without this fix, it returns bogus value instead of 0. For some reason this didn't manifest on my kernel compiled with -O0.
PR: kern/136601 Submitted by: Jaakko Heinonen <jh at saunalahti dot fi> Approved by: re (kib)
|
195785 |
20-Jul-2009 |
trasz |
Fix permission handling for extended attributes in ZFS. Without this change, ZFS uses SunOS Alternate Data Streams semantics - each EA has its own permissions, which are set at EA creation time and - unlike SunOS - invisible to the user and impossible to change. From the user point of view, it's just broken: sometimes access is granted when it shouldn't be, sometimes it's denied when it shouldn't be.
This patch makes it behave just like UFS, i.e. depend on current file permissions. Also, it fixes returned error codes (ENOATTR instead of ENOENT) and makes listextattr(2) return 0 instead of EPERM where there is no EA directory (i.e. the file never had any EA).
Reviewed by: pjd (idea, not actual code) Approved by: re (kib)
|
194586 |
21-Jun-2009 |
kib |
Add another flags argument to vn_open_cred. Use it to specify that some vn_open_cred invocations shall not audit namei path.
In particular, specify VN_OPEN_NOAUDIT for dotdot lookup performed by default implementation of vop_vptocnp, and for the open done for core file. vn_fullpath is called from the audit code, and vn_open there needs to disable audit to avoid infinite recursion. Core file is created on return to user mode, which, in particular, happens during syscall return. The creation of the core file is audited by direct calls, and we do not want to overwrite audit information for syscall.
Reported, reviewed and tested by: rwatson
|
194118 |
13-Jun-2009 |
jamie |
Rename the host-related prison fields to be the same as the host.* parameters they represent, and the variables they replaced, instead of abbreviated versions of them.
Approved by: bz (mentor)
|
194043 |
11-Jun-2009 |
kmacy |
pjd has requested that I keep the tunable as zfs_prefetch_disable to minimize gratuitous differences with Opensolaris' ZFS
Sorry for the churn
|
193980 |
11-Jun-2009 |
kmacy |
check against prefetch_enable
|
193953 |
10-Jun-2009 |
kmacy |
use default policy for enabling prefetching unless the TUNABLE is set
|
193878 |
10-Jun-2009 |
kmacy |
As far as I can tell systems that have less than 4GB are more often hurt by prefetch than helped. On i386 systems and systems with less than 4GB, prefetch is now disabled by default. I've added a prefetch enable tunable, to enable prefetching for those systems. The prefetch disable tunable will continue to unconditionally disable prefetching.
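The policy described in this message amounts to a short decision function. The sketch below models this message only; `prefetch_active()` and its parameter names are hypothetical and are not the tunables' real names.

```c
#include <stdint.h>

#define	FOUR_GB	(4ULL * 1024 * 1024 * 1024)

/*
 * Return 1 if prefetch should be active. The parameters are
 * hypothetical; they model the two tunables described in the message.
 */
static int
prefetch_active(int is_i386, uint64_t physmem, int enable_tunable,
    int disable_tunable)
{
	if (disable_tunable)
		return (0);	/* the disable tunable always wins */
	if (enable_tunable)
		return (1);	/* explicit opt-in on small systems */
	if (is_i386 || physmem < FOUR_GB)
		return (0);	/* new default for small systems */
	return (1);
}
```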
|
193440 |
04-Jun-2009 |
ps |
Support shared vnode locks for write operations when the offset is provided on filesystems that support it. This really improves mysql + innodb performance on ZFS.
Reviewed by: jhb, kmacy, jeffr
|
193163 |
31-May-2009 |
dfr |
Allow the bootfs property to be set for raidz pools on FreeBSD.
Reviewed by: pjd
|
193128 |
30-May-2009 |
kmacy |
fix xdrmem_control to be safe in an if statement
fix zfs to depend on krpc
remove xdr from zfs makefile
Submitted by: dchagin@freebsd.org
|
192800 |
26-May-2009 |
trasz |
MFp4 changes necessary for NFSv4 ACLs support in ZFS. This is mostly about removing a few #ifdefs and providing compatibility wrappers and VOP implementations to get and set an ACL; ZFS does ACL enforcement all by itself.
Note that the VOPs are ifdefed out for now, so this change should be a no-op.
Reviewed by: pjd
|
192689 |
24-May-2009 |
trasz |
Fix comment.
|
192360 |
19-May-2009 |
kmacy |
- back out direct map hack - it is no longer needed
|
192237 |
17-May-2009 |
kmacy |
SAVESTART implies SAVENAME
|
192211 |
16-May-2009 |
kmacy |
- allow forced unmounts
- don't assume snapshot was auto-mounted
|
192209 |
16-May-2009 |
kmacy |
only use direct map if system has more than 2GB
|
192207 |
16-May-2009 |
kmacy |
apply band-aid to x86_64 systems with more physical memory than kmem by allocating from the direct map
|
191990 |
11-May-2009 |
attilio |
Remove the thread argument from the FSD (File-System Dependent) parts of the VFS. Now all the VFS_* functions and related parts no longer take the context, since it always refers to curthread.
In some points, in particular when dealing with VOPs and functions living in the same namespace (eg. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP.
While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option.
The VFS KPI is heavily changed by this commit, so third-party modules need to be recompiled. Bump __FreeBSD_version in order to signal this.
|
191907 |
07-May-2009 |
kmacy |
don't call vn_rele_async_fini in the !_KERNEL case
|
191903 |
07-May-2009 |
kmacy |
avoid LOR and gratuitous extra lock acquisitions by moving user_evict list buffers to a temporary list
|
191902 |
07-May-2009 |
kmacy |
Allow the VM to provide backpressure on the ARC cache as it does on Solaris.
|
191900 |
07-May-2009 |
kmacy |
Asynchronously release vnodes to avoid blocking on range locks when calling back in to zfs. This is based on a fix that went in to opensolaris on March 9th. However, it uses a dedicated thread instead of a Solaris taskq to avoid doing a blocking memory allocation with the vnode interlock held.
This fixes a long-standing deadlock in ZFS. This is not, strictly speaking, an LOR. The spa_zio thread releases a vnode, which calls in to vn_reclaim, which in turn needs to acquire range locks to sync dirty data out to disk. The range locks are already held by a user-level process that is waiting on a condition variable for a spa_zio thread to signal it. The process could not be signalled because the spa_zio thread could not proceed.
The nature of this problem was not apparent due to ZFS locks opting out of witness which meant that DDB did not know about the locks that were held by ZFS.
Reviewed by: pjd MFC after: 7 days
|
190888 |
10-Apr-2009 |
rwatson |
Remove VOP_LEASE and supporting functions. This hasn't been used since the removal of NQNFS, but was left in in case it was required for NFSv4. Since our new NFSv4 client and server can't use it for their requirements, GC the old mechanism, as well as other unused lease-related code and interfaces.
Due to its impact on kernel programming and binary interfaces, this change should not be MFC'd.
Proposed by: jeff Reviewed by: jeff Discussed with: rmacklem, zach loafman @ isilon
|
190878 |
10-Apr-2009 |
thompsa |
Revert r190676,190677
The geom and CAM changes for root_hold are the wrong solution for USB design quirks.
Requested by: scottl
|
190676 |
03-Apr-2009 |
thompsa |
Add a how argument to root_mount_hold() so it can be passed NOWAIT and be called in situations where sleeping isn't allowed.
|
189967 |
18-Mar-2009 |
jhb |
The zfs_get_xattrdir() function is used to find the extended attribute directory for a znode. When the directory already exists, it returns a referenced but unlocked vnode. When a directory does not yet exist, it calls zfs_make_xattrdir() to create a new one. zfs_make_xattrdir() returns the vnode both referenced and locked and zfs_get_xattrdir() was leaking this vnode lock to its callers. Fix this by dropping the vnode lock if zfs_make_xattrdir() successfully creates a new extended attribute directory.
Reviewed by: pjd
|
189696 |
11-Mar-2009 |
jhb |
Add a new internal mount flag (MNTK_EXTENDED_SHARED) to indicate that a filesystem supports additional operations using shared vnode locks. Currently this is used to enable shared locks for open() and close() of read-only file descriptors.
- When an ISOPEN namei() request is performed with LOCKSHARED, use a shared vnode lock for the leaf vnode only if the mount point has the extended shared flag set.
- Set LOCKSHARED in vn_open_cred() for requests that specify O_RDONLY but not O_CREAT.
- Use a shared vnode lock around VOP_CLOSE() if the file was opened with O_RDONLY and the mountpoint has the extended shared flag set.
- Adjust md(4) to upgrade the vnode lock on the vnode it gets back from vn_open() since it now may only have a shared vnode lock.
- Don't enable shared vnode locks on FIFO vnodes in ZFS and UFS since FIFO's require exclusive vnode locks for their open() and close() routines. (My recent MPSAFE patches for UDF and cd9660 already included this change.)
- Enable extended shared operations on UFS, cd9660, and UDF.
Submitted by: ups Reviewed by: pjd (ZFS bits) MFC after: 1 month
|
188588 |
13-Feb-2009 |
jhb |
Use shared vnode locks when invoking VOP_READDIR().
MFC after: 1 month
|
187830 |
28-Jan-2009 |
ed |
Last step of splitting up minor and unit numbers: remove minor().
Inside the kernel, the minor() function was responsible for obtaining the device minor number of a character device. Because we made device numbers dynamically allocated and independent of the unit number passed to make_dev() a long time ago, it was actually a misnomer. If you really want to obtain the device number, you should use dev2udev().
We already converted all the drivers to use dev2unit() to obtain the device unit number, which is still used by a lot of drivers. I've noticed not a single driver passes NULL to dev2unit(). Even if they would, its behaviour would make little sense. This is why I've removed the NULL check.
This commit removes minor(), minor2unit() and unit2minor() from the kernel. Because there was a naming collision with uminor(), we can rename umajor() and uminor() back to major() and minor(). This means that the makedev(3) manual page also applies to kernel space code now.
I suspect umajor() and uminor() aren't used that often in external code, but to make it easier for other parties to port their code, I've increased __FreeBSD_version to 800062.
|
185321 |
25-Nov-2008 |
trasz |
MFp4: We don't support TX_CREATE_ACL_ATTR nor TX_MKDIR_ACL_ATTR; code found in zfs_replay.c will panic if it encounters transactions of this type. Make sure we don't put these into the ZIL.
Approved by: rwatson (mentor), pjd
|
185319 |
25-Nov-2008 |
pjd |
Fix locking (file descriptor table and Giant around VFS).
Most submitted by: kib Reviewed by: kib
|
185174 |
22-Nov-2008 |
pjd |
IFp4: Don't rely on disk IDs and always use vdev guids, which means always looking up components by reading metadata. This might be slower when there is a large number of disks in the system, but is definitely more reliable.
|
185172 |
22-Nov-2008 |
pjd |
IFp4: Finish implementation of chflags(2) for ZFS. While doing this I found that zfs_access() can only handle VREAD, VWRITE and VEXEC, for the rest we need to use vaccess(9).
|
185171 |
22-Nov-2008 |
pjd |
IFp4: Don't free pathname too soon, debugging code is still using it.
|
185029 |
17-Nov-2008 |
pjd |
Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes.
This brings a huge number of changes; I'll enumerate only the user-visible ones:
- Delegated Administration
Allows regular users to perform ZFS operations, like file system creation, snapshot creation, etc.
- L2ARC
Level 2 cache for ZFS - allows additional disks to be used for cache. Huge performance improvements, mostly for random reads of mostly static content.
- slog
Allows additional disks to be used for the ZFS Intent Log to speed up operations like fsync(2).
- vfs.zfs.super_owner
Allows regular users to perform privileged operations on files stored on ZFS file systems owned by them. Be very careful with this one.
- chflags(2)
Not all the flags are supported. This still needs work.
- ZFSBoot
Support to boot off of ZFS pool. Not finished, AFAIK.
Submitted by: dfr
- Snapshot properties
- New failure modes
Previously, if a write request failed, the system panicked. Now one can select one of three failure modes:
- panic - panic on write error
- wait - wait for disk to reappear
- continue - serve read requests if possible, block write requests
- Refquota, refreservation properties
Just quota and reservation properties, but don't count space consumed by children file systems, clones and snapshots.
- Sparse volumes
ZVOLs that don't reserve space in the pool.
- External attributes
Compatible with extattr(2).
- NFSv4-ACLs
Not sure about the status, might not be complete yet.
Submitted by: trasz
- Creation-time properties
- Regression tests for zpool(8) command.
Obtained from: OpenSolaris
|
184770 |
08-Nov-2008 |
trasz |
Require write access on a directory being moved from one parent directory to another in ZFS.
Approved by: rwatson (mentor), pjd
|
184740 |
06-Nov-2008 |
trasz |
Back out the last patch. It was overly restrictive - we want to check for write permission on target only when moving the target between two directories.
Approved by: rwatson (mentor)
|
184737 |
06-Nov-2008 |
trasz |
Change ZFS behaviour to match UFS: when moving (rename(2)) a subdirectory from one parent directory to another, in addition to the usual access checks one also needs write access to the subdirectory being moved.
Approved by: rwatson (mentor), pjd
|
184413 |
28-Oct-2008 |
trasz |
Introduce accmode_t. This is required for NFSv4 ACLs - it will be necessary to add more V* constants, and the variables changed by this patch were often being assigned to mode_t variables, which are 16 bits wide.
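The problem with a 16-bit type is silent truncation of any flag above bit 15. A minimal sketch, assuming a made-up flag value: `narrow_mode_t`, `wide_accmode_t` and `V_HYPOTHETICAL_NEW` are illustrative names, not definitions from the tree.

```c
#include <stdint.h>

typedef uint16_t narrow_mode_t;		/* stands in for a 16-bit mode_t */
typedef int	 wide_accmode_t;	/* stands in for a wider accmode_t */

#define	V_HYPOTHETICAL_NEW	0x10000	/* made-up flag above bit 15 */

/* Store the flags in the 16-bit type and test whether the flag survived. */
static int
survives_narrow(int flags)
{
	narrow_mode_t m = (narrow_mode_t)flags;

	return ((m & V_HYPOTHETICAL_NEW) != 0);
}

/* Same test with the wider type: the flag is preserved. */
static int
survives_wide(int flags)
{
	wide_accmode_t a = flags;

	return ((a & V_HYPOTHETICAL_NEW) != 0);
}
```

The narrow variant loses the new flag without any diagnostic, which is why a dedicated, wider type is needed before more V* constants can be added.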
Approved by: rwatson (mentor)
|
183754 |
10-Oct-2008 |
attilio |
Remove the unneeded struct thread argument from the bufobj interface. In particular, the KPI of the following functions is modified:
- bufobj_invalbuf()
- bufsync()
and BO_SYNC() "virtual method" of the buffer objects set. Main consumers of bufobj functions are affected by this change too and, in particular, functions which changed their KPI are:
- vinvalbuf()
- g_vfs_close()
Due to the KPI breakage, __FreeBSD_version will be bumped in a later commit.
As a side note, please consider the passing of the 'curthread' argument to VOP_SYNC() (in bufsync()) as only temporary, as it will be axed out ASAP.
Reviewed by: kib Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
|
183037 |
15-Sep-2008 |
pjd |
Add missing ZFS_EXIT().
PR: kern/124899 Submitted by: Masakazu Asama <m-asama@ginzado.ne.jp>
|
182905 |
10-Sep-2008 |
trasz |
Remove VSVTX, VSGID and VSUID. This should be a no-op, as VSVTX == S_ISVTX, VSGID == S_ISGID and VSUID == S_ISUID.
Approved by: rwatson (mentor)
|
182840 |
07-Sep-2008 |
pjd |
Initialize vp, so we don't call VOP_UNLOCK() with NULL vnode pointer.
Confirmed by: marcus
|
182824 |
06-Sep-2008 |
pjd |
Lock vnode exclusively around insmntque().
|
182781 |
05-Sep-2008 |
pjd |
Catch up after last insmntque() changes:
- The vnode has to be locked exclusively before calling insmntque().
- Until I find a way to handle insmntque() failures use VV_FORCEINSMQ flag to force insmntque() to always succeed.
Reported by: kris, trasz, des, others Suggested by: kib Tested by: trasz
|
182371 |
28-Aug-2008 |
attilio |
Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and entirely unneeded.
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
|
180660 |
21-Jul-2008 |
pjd |
We want to use LBOLT instead of lbolt on FreeBSD. I had already fixed this in p4, but the fix was never integrated into HEAD.
Reported by: ed
|
179758 |
12-Jun-2008 |
ed |
Remove the $FreeBSD$ tag again, now I know fbsd:nokeywords exists.
Requested by: pjd Approved by: philip (mentor)
|
179757 |
12-Jun-2008 |
ed |
Turn dev2unit(), minor(), unit2minor() and minor2unit() into macros.
Now that we got rid of the minor-to-unit conversion and the constraints on device minor numbers, we can convert the functions that operate on minor and unit numbers to simple macros. The unit2minor() and minor2unit() macros are now no-ops.
The ZFS code also defined a macro named `minor'. Change the ZFS code to use umajor() and uminor() here, as this is the correct approach. Also add $FreeBSD$ to keep SVN happy.
Approved by: philip (mentor), pjd
|
179310 |
25-May-2008 |
pjd |
Fix namespace collision after src/sys/sys/file.h:1.78.
|
179280 |
24-May-2008 |
jb |
Make the zfs module depend on the opensolaris module in preparation for it to share code with the DTrace modules.
|
178243 |
16-Apr-2008 |
kib |
Move the head of byte-level advisory lock list from the filesystem-specific vnode data to the struct vnode. Provide the default implementation for the vop_advlock and vop_advlockasync. Purge the locks on the vnode reclaim by using the lf_purgelocks(). The default implementation is augmented for the nfs and smbfs. In the nfs_advlock, push the Giant inside the nfs_dolock.
Before the change, the vop_advlock and vop_advlockasync took the unlocked vnode and dereferenced the fs-private inode data, racing with the vnode reclamation due to forced unmount. Now, the vop_getattr under the shared vnode lock is used to obtain the inode size, and later, in the lf_advlockasync, after locking the vnode interlock, the VI_DOOMED flag is checked to prevent an operation on the doomed vnode.
The implementation of the lf_purgelocks() is submitted by dfr.
Reported by: kris Tested by: kris, pho Discussed with: jeff, dfr MFC after: 2 weeks
|
177633 |
26-Mar-2008 |
dfr |
Add the new kernel-mode NFS Lock Manager. To use it instead of the user-mode lock manager, build a kernel with the NFSLOCKD option and add '-k' to 'rpc_lockd_flags' in rc.conf.
Highlights include:
* Thread-safe kernel RPC client - many threads can use the same RPC client handle safely with replies being de-multiplexed at the socket upcall (typically driven directly by the NIC interrupt) and handed off to whichever thread matches the reply. For UDP sockets, many RPC clients can share the same socket. This allows the use of a single privileged UDP port number to talk to an arbitrary number of remote hosts.
* Single-threaded kernel RPC server. Adding support for multi-threaded server would be relatively straightforward and would follow approximately the Solaris KPI. A single thread should be sufficient for the NLM since it should rarely block in normal operation.
* Kernel mode NLM server supporting cancel requests and granted callbacks. I've tested the NLM server reasonably extensively - it passes both my own tests and the NFS Connectathon locking tests running on Solaris, Mac OS X and Ubuntu Linux.
* Userland NLM client supported. While the NLM server doesn't have support for the local NFS client's locking needs, it does have to field async replies and granted callbacks from remote NLMs that the local client has contacted. We relay these replies to the userland rpc.lockd over a local domain RPC socket.
* Robust deadlock detection for the local lock manager. In particular it will detect deadlocks caused by a lock request that covers more than one blocking request. As required by the NLM protocol, all deadlock detection happens synchronously - a user is guaranteed that if a lock request isn't rejected immediately, the lock will eventually be granted. The old system allowed for a 'deferred deadlock' condition where a blocked lock request could wake up and find that some other deadlock-causing lock owner had beaten them to the lock.
* Since both local and remote locks are managed by the same kernel locking code, local and remote processes can safely use file locks for mutual exclusion. Local processes have no fairness advantage compared to remote processes when contending to lock a region that has just been unlocked - the local lock manager enforces a strict first-come first-served model for both local and remote lockers.
Sponsored by: Isilon Systems PR: 95247 107555 115524 116679 MFC after: 2 weeks
|
177230 |
15-Mar-2008 |
pjd |
Fix mmap(2) on ZFS after some changes in VM subsystem.
Submitted by: alc Reported by: kris (originally) and many others Tested with: fsx MFC after: 1 week
|
176559 |
25-Feb-2008 |
attilio |
Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is always curthread.
As KPI gets broken by this patch, manpages and __FreeBSD_version will be updated by further commits.
Tested by: Andrea Barberio <insomniac at slackware dot it>
|
176519 |
24-Feb-2008 |
attilio |
Introduce some functions in the vnode locks namespace and in the ffs namespace in order to handle lockmgr fields in a controlled way instead of spreading bogus stubs all around: - VN_LOCK_AREC() allows lock recursion for a specified vnode - VN_LOCK_ASHARE() allows lock sharing for a specified vnode
In FFS land: - BUF_AREC() allows lock recursion for a specified buffer lock - BUF_NOREC() disallows recursion for a specified buffer lock
Side note: union_subr.c::unionfs_node_update() is the only other function directly handling lockmgr fields. As this is not simple to fix, it has been left behind as "sole" exception.
|
175633 |
24-Jan-2008 |
pjd |
- Reduce how much ZFS caches by default. This is another change to mitigate 'kmem_map too small' panics. - Print two warnings if there is not enough memory and not enough address space. - Improve comment.
|
175294 |
13-Jan-2008 |
attilio |
VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjunction with a 'thread' argument that is always curthread. Remove the useless extra argument and pass curthread explicitly to lower layer functions, when necessary.
The KPI is broken by this change, which should affect several ports, so the version bump and manpage update will be committed separately.
Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
|
175202 |
10-Jan-2008 |
attilio |
vn_lock() is currently only used with 'curthread' passed as argument. Remove this argument and pass curthread directly to the underlying VOP_LOCK1() VFS method. This modification makes the code cleaner and in particular removes an annoying dependency, helping the next lockmgr() cleanup. The KPI, obviously, changes.
Manpage and FreeBSD_version will be updated through further commits.
As a side note, it is worth saying that the next commits will address a similar cleanup of VFS methods, in particular vop_lock1 and vop_unlock.
Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
|
174049 |
28-Nov-2007 |
jb |
* Check endianness the FreeBSD way.
* Use LBOLT rather than lbolt to avoid a clash with a FreeBSD global variable.
|
173419 |
07-Nov-2007 |
pjd |
Warn if kmem_map size is set to less than 512MB. Previous warning was a bit pointless, because default is set to something around 300MB and also insufficient.
MFC after: 3 days
|
173373 |
05-Nov-2007 |
pjd |
If setting a state to anything but open state, close access to vdev. This fixes replacing a drive in place, e.g. zpool replace tank da1 da1. Before, it complained that the device was already open.
MFC after: 1 week
|
173268 |
02-Nov-2007 |
lulf |
- Add sysctl for sizeof(znode_t), which will be used by fstat(1).
Approved by: pjd (mentor)
|
173250 |
01-Nov-2007 |
pjd |
Call zil_commit() (if ZIL is not disabled) after every non-read request (BIO_WRITE and BIO_FLUSH) as it is done in Solaris. The difference is that Solaris calls it only for sync requests, but in GEOM we can't tell if the request is sync or async, so we do it for every request.
MFC after: 1 week
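The rule described in this entry can be written as a small predicate. This is only a sketch under stated assumptions: the enum below is a hypothetical stand-in for the GEOM request types, and the `zil_disabled` flag stands in for whatever the real code consults; it is not the code from the commit.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the GEOM request types named in the entry. */
enum example_bio_cmd {
	EXAMPLE_BIO_READ,
	EXAMPLE_BIO_WRITE,
	EXAMPLE_BIO_FLUSH
};

/*
 * Decide whether the intent log should be committed after a request.
 * Every non-read request qualifies, because the sync/async distinction
 * is not visible at this layer; nothing is committed if ZIL is disabled.
 */
static bool
needs_zil_commit(enum example_bio_cmd cmd, bool zil_disabled)
{
	if (zil_disabled)
		return (false);
	return (cmd != EXAMPLE_BIO_READ);
}
```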
|
172836 |
20-Oct-2007 |
julian |
Rename the kthread_xxx (e.g. kthread_create()) calls to kproc_xxx as they actually make whole processes. This makes way for us to add REAL kthread_create() and friends that actually make threads. It turns out that most of these calls actually end up being moved back to the thread version when it's added, but we need to make this cosmetic change first.
I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.
|
172645 |
14-Oct-2007 |
thompsa |
ZFS_LOG adds a newline by itself.
Pointed out by: pjd
|
172624 |
14-Oct-2007 |
thompsa |
Print the ZFS ereport to the console if vfs.zfs.debug is set to help diagnose problems with zfs-on-root since devd isn't running yet.
Reviewed by: pjd
|
172443 |
04-Oct-2007 |
pjd |
Fix lock leak leading to the 'System call <name> returning with 1 locks held' panic.
Reported by: kris Approved by: re (kensmith)
|
172135 |
10-Sep-2007 |
pjd |
Reduce the limit of vnodes on i386 when ZFS is loaded to 3/4 of the original value, so we don't run out of KVA. The default vnodes limit fits UFS better, but ZFS allocates more file system specific memory for a vnode than UFS.
Don't touch the vnodes limit if we detect it was tuned by the system administrator, and restore the original value when ZFS is unloaded.
This isn't the final fix, but until we implement something better, this will help to stabilize ZFS under heavy load on i386.
Approved by: re (bmah)
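A minimal sketch of the policy described in this entry follows. How the real code detects administrator tuning is not stated here; comparing the current limit with the untuned default is an assumption made for illustration, and the function name is hypothetical.

```c
/*
 * Return the vnode limit to use while ZFS is loaded.
 * default_limit: the value the system would pick on its own.
 * current_limit: the value in effect when the module loads.
 * Assumption: a current value that differs from the default means
 * the administrator tuned it, so it is left alone.
 */
static long
zfs_load_vnode_limit(long default_limit, long current_limit)
{
	if (current_limit != default_limit)
		return (current_limit);
	/* Untuned: reduce to 3/4 of the original value. */
	return (current_limit / 4 * 3);
}
```

Under this sketch, restoring the original value on unload amounts to saving `current_limit` before the call and writing it back afterwards.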
|
172130 |
10-Sep-2007 |
pjd |
After dfr@ vnode leak fix, we can allow ARC to consume more memory.
Tested by: kris Approved by: re (bmah)
|
172030 |
01-Sep-2007 |
pjd |
Use CTLFLAG_RDTUN for tunable sysctls.
Approved by: re (bmah)
|
171567 |
24-Jul-2007 |
pjd |
Update assertion after revision 1.23.
Reviewed by: dfr Approved by: re (rwatson)
|
171316 |
09-Jul-2007 |
dfr |
Correct a reference-counting mistake in the ZFS code which led to abnormal memory usage and pessimal cache performance.
Reviewed by: pjd Approved by: re (rwatson)
|
171063 |
27-Jun-2007 |
dfr |
In zfs_vget, if we fail to translate an inode number to the corresponding vnode, make sure we return an error code to the caller.
Reviewed by: pjd Approved by: re
|
170431 |
08-Jun-2007 |
pjd |
- Reduce the number of atomic operations that need to be implemented in asm by implementing some of them using existing ones. - Allow ZFS to be compiled on all archs, using atomic operations surrounded by a global mutex on archs where we don't have or can't have all atomic operations needed by ZFS.
|
170281 |
04-Jun-2007 |
pjd |
Reimplement traverse() helper function: 1. Pass locking flags to VFS_ROOT(). 2. Check v_mountedhere while the vnode is locked. 3. Always return locked vnode on success.
Change 1 fixes the problem reported by Stephen M. Rumble - after the zfs_vfsops.c,1.9 change, zfs_root() no longer locks the vnode unconditionally and traverse() didn't pass the right lock type to VFS_ROOT(). The result was that the kernel panicked when the .zfs/ directory was accessed via NFS.
|
170044 |
28-May-2007 |
pjd |
Adjust va_mask for setattr. FreeBSD doesn't have va_mask, so we initialize it based on individual fields being set. This doesn't work for setattr replay, because va_type is set there, so we add AT_TYPE flag to va_mask, which won't be accepted by zfs_setattr().
Reported by: kris
|
170040 |
28-May-2007 |
pjd |
Because we allocate componentname structures on stack, bzero() them before use just in case.
|
169929 |
24-May-2007 |
pjd |
Initialize ZFS a bit earlier and block root mounting until initialization is complete. This fixes some root-on-ZFS configurations.
Reported by: Bruno Damour <freebsd.ruomad@free.fr> Tested by: Bruno Damour <freebsd.ruomad@free.fr>
|
169884 |
22-May-2007 |
pjd |
Lock vnode on lookup. This fixes ZIL replay for rmdir/unlink/rename.
Reported by: des
|
169430 |
09-May-2007 |
pjd |
Increase debug level - this message is not that important.
|
169325 |
06-May-2007 |
pjd |
- Add missing lock destruction and remove duplicate initializations. With this change it is possible to unload zfs.ko module from WITNESS-enabled kernel. - Remove bogus comment.
|
169303 |
06-May-2007 |
pjd |
Use provider's ident to handle situations when disks are moved around and show up with different names: first try to open provider using remembered name and compare its ident, if equal, this is our provider, if not equal or there is no provider with such name, find provider with remembered ident and don't care about the name.
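The two-step lookup described above can be sketched as follows. The structure, function name and sample data are hypothetical and only illustrate the order of the checks; they are not the GEOM API or the code from this commit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of a disk provider. */
struct example_provider {
	const char *name;	/* device name, may change between boots */
	const char *ident;	/* stable identifier */
};

/* Sample data: the disk remembered as "da0" now shows up as "da1". */
static const struct example_provider example_disks[] = {
	{ "da0", "SERIAL-B" },
	{ "da1", "SERIAL-A" },
};

/*
 * Return the index of the provider that matches the remembered
 * name/ident pair, or -1 if no provider carries the ident.
 */
static int
find_provider(const struct example_provider *p, size_t n,
    const char *name, const char *ident)
{
	size_t i;

	/* Step 1: try the remembered name and accept it only if the ident matches. */
	for (i = 0; i < n; i++)
		if (strcmp(p[i].name, name) == 0 &&
		    strcmp(p[i].ident, ident) == 0)
			return ((int)i);
	/* Step 2: otherwise search by ident and ignore the name. */
	for (i = 0; i < n; i++)
		if (strcmp(p[i].ident, ident) == 0)
			return ((int)i);
	return (-1);
}
```

With the sample data, looking up the remembered pair ("da0", "SERIAL-A") skips the provider now called da0, because its ident differs, and finds the disk under its new name.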
|
169302 |
06-May-2007 |
pjd |
MFp4: We don't need to cover vnode_pager_setsize() with the z_map_lock.
|
169199 |
02-May-2007 |
pjd |
Share-lock a vnode where possible.
|
169198 |
02-May-2007 |
pjd |
When parent directory has to be unlocked, lock it back with the same lock type. Before this change, if directory was shared-locked, it was relocked exclusively.
|
169197 |
02-May-2007 |
pjd |
Lock vnode using cn_lkflags in case the caller wants the vnode to be shared-locked.
|
169196 |
02-May-2007 |
pjd |
The getnewvnode() function sets LK_NOSHARE by default, so if we want to support shared vnodes locking, we need to remove that flag. Also add LK_CANRECURSE flag as found in nfsclient.
|
169195 |
02-May-2007 |
pjd |
ZFS should update timestamps upon the creat() of an existing file.
Obtained from: OpenSolaris Bug: http://bugs.opensolaris.org/view_bug.do?bug_id=6465105
|
169194 |
02-May-2007 |
pjd |
- Lock vnode with flags passed in as argument in zfs_vget() and zfs_root().
Pointed out by: ups Also reported by: kris
- Add comments where I'm not sure if LK_RETRY should be used.
|
169172 |
01-May-2007 |
pjd |
MFp4: Remove LK_RETRY flag when locking vnode in zfs_lookup, we don't want dead vnodes here.
Suggested by: kib
|
169170 |
01-May-2007 |
pjd |
White space fixes.
|
169167 |
01-May-2007 |
pjd |
Add a comment explaining why we call dmu_write() unconditionally, even if uiomove() fails, especially that it is different from what OpenSolaris does (I'm not entirely sure they are right).
Suggested by: darrenr
|
169108 |
29-Apr-2007 |
pjd |
- Define d_type for ".", ".." and ".zfs" directories. - Add a TODO comment where d_type is still not defined.
|
169107 |
29-Apr-2007 |
pjd |
Oops, correct important typo in last commit.
|
169106 |
29-Apr-2007 |
pjd |
Avoid freeing NULL pointer in case of an error.
|
169087 |
29-Apr-2007 |
pjd |
Fix two use-after-free cases.
|
169059 |
26-Apr-2007 |
pjd |
MFp4: Optimize mappedwrite() and mappedread() functions to write/read as much non-mapped data as possible at once and not page-by-page. With this change we combine I/Os and also save many VM_OBJECT_UNLOCK()/VM_OBJECT_LOCK() operations.
Simple 'fsx -l 33554432 -o 524288 -N 10000 /tank/fsx' test shows ~23% performance increase.
|
169057 |
26-Apr-2007 |
pjd |
- Always try to write one whole page at a time. - vm_page_undirty() is enough (instead of vm_page_set_validclean()), but it has to be called before we write the data in case someone makes the page dirty after our write, but before our vm_page_undirty() call. - Always dmu_write, no matter if uiomove() succeeded, because it could partially be ok and we would lose some changes.
All good ideas from: ups
|
169056 |
26-Apr-2007 |
pjd |
MFV: Free znodes immediately, allowing the ARC to hold onto less memory.
Full description at: http://bugs.opensolaris.org/view_bug.do?bug_id=6543706
|
169055 |
26-Apr-2007 |
pjd |
MFV: Functions name change.
|
169028 |
24-Apr-2007 |
pjd |
ZIL (ZFS Intent Log) can be safely turned on and off at run time, because it is only used when a dataset is being mounted to decide if the log should also be opened.
|
169025 |
24-Apr-2007 |
pjd |
MFp4: Rearrange the code so vobject is destroyed from the reclaim() method like in all other file systems on FreeBSD (instead of from the inactive() method).
A nice side-effect of this change, besides speeding up the file system when mmapped files are often opened/closed, is that it makes FreeBSD's namecache work:)
|
169024 |
24-Apr-2007 |
pjd |
MFp4: Once a page is written successfully, we should clear the dirty bits. This fixes slow operations on mmapped files, because without this fix, pages were written to disk multiple times.
If one is looking for an even greater speedup for such operations, one should disable ZIL (by setting vfs.zfs.zil_disable to 1 in /boot/loader.conf). Disabling ZIL makes fsx run ~9 times faster.
|
169023 |
24-Apr-2007 |
pjd |
MFp4: Reduce diff against vendor.
|
169022 |
24-Apr-2007 |
pjd |
MFp4: We have stronger 'lock already initialized' check now, so we can reduce diff against the vendor by removing bzero of this mutex.
|
168987 |
23-Apr-2007 |
bmah |
Mostly-cosmetic fixes in low-memory warning messages:
o Fix linewrap issues.
o Fix two typos (s/Recomended/Recommended/ and s/tunning/tuning/)
o Remove a couple of extra instances of the word "of".
o Update names of kmem_size variables.
Approved by: pjd
|
168978 |
23-Apr-2007 |
pjd |
Too much diff reduction. 'cmd' has to be u_long.
Reported by: delphij
|
168962 |
23-Apr-2007 |
pjd |
MFp4: Reduce diff against vendor code: - Move FreeBSD-specific code to zfs_freebsd_*() functions in zfs_vnops.c and keep original functions as similar to vendor's code as possible. - Add various includes back, now that we have them.
|
168959 |
22-Apr-2007 |
pjd |
Fix 'zpool status -v'. To get object number we should use ZFS_DIRENT_OBJ() macro, as za_first_integer field also contains type. This should be fixed in ZFS itself, but this bug is not visible on Solaris, because there, type is not stored in za_first_integer. On the other hand it will be visible on MacOS X.
Reported by: Barry Pederson <bp@barryp.org>
|
168958 |
22-Apr-2007 |
pjd |
Fix st_rdev handling (implement it, actually).
Reported by: gj
|
168926 |
21-Apr-2007 |
pjd |
MFp4:
@118370 Correct typo.
@118371 Integrate changes from vendor.
@118491 Show backtrace on unexpected code paths.
@118494 Integrate changes from vendor.
@118504 Fix sendfile(2). I had two ways of fixing it: 1. Fixing sendfile(2) itself to use VOP_GETPAGES() instead of hacking around with vn_rdwr(UIO_NOCOPY), which was suggested by ups. 2. Modify ZFS behaviour to handle this special case.
Although 1 is more correct, I've chosen 2, because the hack from 1 has a side-effect of being faster - it reads ahead MAXBSIZE bytes instead of reading page by page. This is not easy to implement with VOP_GETPAGES(), at least not for me at this very moment.
Reported by: Andrey V. Elsukov <bu7cher@yandex.ru>
@118525 Reorganize the code to reduce diff.
@118526 This code path is expected. It is simply when file is opened with O_FSYNC flag.
Reported by: kris Reported by: Michal Suszko <dry@dry.pl>
|
168839 |
18-Apr-2007 |
pjd |
MFp4: We check for PRIV_VFS_MOUNT already in mount(2) syscall and we don't want to do the check when snapshot is automatically mounted by an unprivileged user doing lookup on a snapshot directory.
|
168821 |
17-Apr-2007 |
pjd |
Ignore hostid check for root-on-ZFS configurations. Making hostid available before the root is mounted is tricky and having it in /boot/ is not really desirable.
Reported by: Zephiris <zephiris@gmail.com>
|
168775 |
16-Apr-2007 |
pjd |
Uncomment forgotten check. Without this check in-place, ZFS will panic on unload instead of returning EBUSY. This check tells if there are mounted ZFS file systems or not. We can't unload if there are mounted file systems.
Reported by: Andrey V. Elsukov <bu7cher@yandex.ru>
|
168738 |
14-Apr-2007 |
pjd |
Fix RAID-Z resilvering.
Obtained from: OpenSolaris
|
168724 |
14-Apr-2007 |
pjd |
MFp4: Hmm, it seems to work now.
|
168715 |
14-Apr-2007 |
pjd |
MFp4: Use max_ncpus, which is used in other places in the code.
|
168714 |
14-Apr-2007 |
pjd |
MFp4: Add more debug, so we can see if zpool.cache was loaded or why it wasn't loaded.
|
168713 |
14-Apr-2007 |
pjd |
MFp4: Allow to tune vfs.zfs.debug from loader.conf.
|
168712 |
14-Apr-2007 |
pjd |
MFp4: - Allow to tune number of spa_zio_* threads. - Reduce default number of spa_zio_* threads to N*spa_zio_issue plus N*spa_zio_intr threads per ZIO type, where N is the number of CPUs. - Put ZIO type number in thread's name.
|
168696 |
13-Apr-2007 |
pjd |
Fix overflow, which was causing endless loops when a 32-bit machine had more than 2GB of RAM. This was because our physmem is long and 'physmem*PAGESIZE' can be negative for more than 2GB of memory.
Reported by: Andrey V. Elsukov <bu7cher@yandex.ru>
It is not yet tested by Andrey, so there can be other problems, but this was definitely a bug, so I'm committing a fix now.
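The overflow can be reproduced in a small sketch. Fixed-width types stand in for a 32-bit long so the effect is visible on any machine, and the function names are hypothetical. Widening to 64 bits before multiplying is shown as one possible remedy; the entry does not say how the actual fix was written.

```c
#include <stdint.h>

/*
 * Emulate 'physmem * PAGESIZE' on a machine where long is 32 bits.
 * The product is formed in unsigned arithmetic (well defined) and then
 * converted back to a signed 32-bit value, which on two's complement
 * compilers yields the wrapped, possibly negative, result.
 */
static int32_t
bytes_32bit(int32_t pages, int32_t page_size)
{
	uint32_t product = (uint32_t)pages * (uint32_t)page_size;

	return ((int32_t)product);
}

/* Widen both operands first, so the product cannot wrap. */
static uint64_t
bytes_64bit(int32_t pages, int32_t page_size)
{
	return ((uint64_t)pages * (uint64_t)page_size);
}
```

For 3GB of RAM with 4096-byte pages (786432 pages), `bytes_32bit()` returns a negative number while `bytes_64bit()` returns the true byte count.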
|
168676 |
12-Apr-2007 |
pjd |
MFp4: Synchronize with vendor (mostly 'zfs rename -r').
|
168583 |
10-Apr-2007 |
pjd |
MFp4: Allow to set zfs_recover via vfs.zfs.recover from /boot/loader.conf.
|
168582 |
10-Apr-2007 |
pjd |
MFp4: Hide under '#ifdef _KERNEL' only what's really needed.
|
168566 |
10-Apr-2007 |
pjd |
Try to stabilize ZFS with regard to memory consumption: - Allow to shrink ARC down to 16MB (instead of 64MB). - Set arc_max to 1/2 of kmem_map by default. - Start freeing things earlier when low memory situation is detected. - Serialize execution of arc_lowmem().
I decided to set the minimum ZFS memory requirements to 512MB of RAM and 256MB of kmem_map size. If there is less RAM or kmem_map, a warning will be printed. World is cruel, be no better. In other words: modern file system requires modern hardware:)
From ZFS administration guide:
"Currently the minimum amount of memory recommended to install a Solaris system is 512 Mbytes. However, for good ZFS performance, at least one Gbyte or more of memory is recommended."
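The defaults described in this entry reduce to simple arithmetic, sketched below. The function and macro names are hypothetical, and whether the real checks use strict or inclusive comparisons is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_MB	(1024ULL * 1024ULL)

/* Default ARC ceiling: half of the kmem_map size. */
static uint64_t
arc_max_default(uint64_t kmem_map_size)
{
	return (kmem_map_size / 2);
}

/* Smallest size the ARC may shrink to after this change. */
static uint64_t
arc_min_floor(void)
{
	return (16 * EXAMPLE_MB);
}

/* True when a low-memory warning would be printed. */
static bool
below_minimum(uint64_t ram, uint64_t kmem_map_size)
{
	return (ram < 512 * EXAMPLE_MB || kmem_map_size < 256 * EXAMPLE_MB);
}
```

With the stated minimum of 256MB of kmem_map, the default ARC ceiling in this sketch comes out at 128MB.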
|
168565 |
10-Apr-2007 |
pjd |
Reduce diff against vendor - we have now stronger check for "mutex already initialized", so we can go back to kmem_alloc().
|
168559 |
09-Apr-2007 |
pjd |
Remove unused #define.
|
168511 |
09-Apr-2007 |
pjd |
We don't have to wait for the root file system to be mounted anymore, now that kobj KPI supports operating on files loaded by the loader.
|
168510 |
09-Apr-2007 |
pjd |
Drop the Giant lock before calling zfs_domount(), which is held when mounting root file system.
|
168498 |
08-Apr-2007 |
pjd |
MFp4: Synchronize with recent OpenSolaris changes.
|
168494 |
08-Apr-2007 |
pjd |
- Use 'name=value' so it can be properly recognized by devd(8). - Use only subclass as devd's type.
|
168488 |
08-Apr-2007 |
pjd |
Take vnode pointer and hold it under znode lock, so we won't race with zfs_reclaim(). This may or may not fix the problem reported by kris, but it's definitely better that way.
|
168481 |
07-Apr-2007 |
pjd |
Fix libzpool compilation.
Reported by: des
|
168474 |
07-Apr-2007 |
des |
Fix some type mismatches.
Reviewed by: pjd@
|
168473 |
07-Apr-2007 |
pjd |
Allow to tune maximum and minimum memory used by ARC.
|
168460 |
07-Apr-2007 |
pjd |
Add missing mutex_init() which was causing an assertion panic on clone destruction.
Reported by: kris
|
168404 |
06-Apr-2007 |
pjd |
Please welcome ZFS - The last word in file systems.
The ZFS file system was ported from the OpenSolaris operating system. The code is under the CDDL license.
I'd like to thank all SUN developers that created this great piece of software.
Supported by: Wheel LTD (http://www.wheel.pl/) Supported by: The FreeBSD Foundation (http://www.freebsdfoundation.org/) Supported by: Sentex (http://www.sentex.net/)
|