History log of /freebsd-10-stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dnode.h
Revision Date Author Comments
# 307292 14-Oct-2016 mav

MFC r305340: MFC r305337:
7004 dmu_tx_hold_zap() does dnode_hold() 7x on same object

Using a benchmark which has 32 threads creating 2 million files in the
same directory, on a machine with 16 CPU cores, I observed poor
performance. I noticed that dmu_tx_hold_zap() was using about 30% of
all CPU, and doing dnode_hold() 7 times on the same object (the ZAP
object that is being held).

dmu_tx_hold_zap() keeps a hold on the dnode_t the entire time it is
running, in dmu_tx_hold_t:txh_dnode, so it would be nice to use the
dnode_t that we already have in hand, rather than repeatedly calling
dnode_hold(). To do this, we need to pass the dnode_t down through
all the intermediate calls that dmu_tx_hold_zap() makes, making these
routines take the dnode_t* rather than an objset_t* and a uint64_t
object number. In particular, the following routines will need to have
analogous *_by_dnode() variants created:

dmu_buf_hold_noread()
dmu_buf_hold()
zap_lookup()
zap_lookup_norm()
zap_count_write()
zap_lockdir()
zap_count_write()

This can improve performance on the benchmark described above by 100%,
from 30,000 file creations per second to 60,000. (This improvement is on
top of that provided by working around the object allocation issue. Peak
performance of ~90,000 creations per second was observed with 8 CPUs;
adding CPUs past that decreased performance due to lock contention.) The
CPU used by dmu_tx_hold_zap() was reduced by 88%, from 340 CPU-seconds
to 40 CPU-seconds.

Sponsored by: Intel Corp.

Closes #109

Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Ned Bass <bass6@llnl.gov>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Author: Matthew Ahrens <mahrens@delphix.com>

openzfs/openzfs@d3e523d489a169ab36f9ec1b2a111a60a5563a9f


# 307126 11-Oct-2016 mav

MFC r305222: MFV r302993: 7104 increase indirect block size

illumos/illumos-gate@4b5c8e93cab28d3c65ba9d407fd8f46e3be1db1c
https://github.com/illumos/illumos-gate/commit/4b5c8e93cab28d3c65ba9d407fd8f46e3
be1db1c

https://www.illumos.org/issues/7104
The current default indirect block size is 16KB. We can improve
performance by increasing it to 128KB. This is especially helpful for
any workload that needs to read most of the metadata, e.g.
scrub/resilver, file deletion, filesystem deletion, and zfs send.
We also need to fix a few space estimation errors to make the tests
pass.

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>


# 299433 11-May-2016 mav

MFC r297832: MFV r297831: 6322 ZFS indirect block predictive prefetch

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Author: Alexander Motin <mav@FreeBSD.org>

Improve speculative prefetch of indirect blocks.

Scalability of many operations on wide ZFS pool can be limited by
requirement to prefetch indirect blocks first. Recently added
asynchronous indirect block read partially helped, but did not
solve the problem completely. This patch extends existing prefetcher
functionality to explicitly work with indirect blocks.

Before this change prefetcher issued reads for up to 8MB of data in
advance. With this change it also issues indirect block reads
for up to 64MB of data in advance, so that when it will be time to
actually read those data, it can be done immediately. Alike effect
can be achieved by just increasing maximal data prefetch distance,
but at higher memory cost.

Also this change introduces indirect block prefetch for rewrite
operations, that was never done before. Previously ARC miss for
Indirect blocks regularly blocked rewrites, converting perfectly
aligned asynchronous operations into synchronous read-write pairs,
significantly reducing maximal rewrite speed.

While being there this issue was also fixed:
- prefetch was done always, even if caching for the dataset was
completely disabled.

Testing on FreeBSD with zvol on top of 6x striped 2x mirrored pool
of 12 assorted HDDs shown me such performance numbers:
------- BEFORE --------
Write 491363677 bytes/sec
Read 312430631 bytes/sec
Rewrite 97680464 bytes/sec
-------- AFTER --------
Write 493524146 bytes/sec
Read 438598079 bytes/sec
Rewrite 277506044 bytes/sec

Closes #65
Closes #80

openzfs/openzfs@792fd28ac04f78cc5e43ead2d72a96f244ea84e8


# 288549 03-Oct-2015 mav

MFC r286575: 5056 ZFS deadlock on db_mtx and dn_holds

Reviewed by: Will Andrews <willa@spectralogic.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Justin Gibbs <justing@spectralogic.com>

illumos/illumos-gate@bc9014e6a81272073b9854d9f65dd59e18d18c35


# 288541 03-Oct-2015 mav

MFC r286545:
5630 stale bonus buffer in recycled dnode_t leads to data corruption

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Justin T. Gibbs <justing@spectralogic.com>


# 272332 30-Sep-2014 delphij

MFC r271526: MFV r271510:

Enforce 4K as smallest indirect block size (previously the smallest
indirect block size was 1K but that was never used).

This makes some space estimates more accurate and uses less memory
for some data structures.

Illumos issue:
5141 zfs minimum indirect block size is 4K

Approved by: re (gjb)


# 270809 29-Aug-2014 delphij

MFC r270383: MFV r270198:

Instead of using timestamp in the AVL, use the memory address when
comparing.

Illumos issue:
5095 panic when adding a duplicate dbuf to dn_dbufs


# 270127 18-Aug-2014 delphij

MFC r269431: MFV r269427:

In dnode_children_t, use C99's "[]" idiom for declaring the variable
sized array dnc_children at the end of the structure.

This prevents the compiler from mistakenly optimizing away accesses
beyond the array's defined size.

Illumos issue:
5038 Remove "old-style" flexible array usage in ZFS.
Author: Justin T. Gibbs <justing@spectralogic.com>


# 269845 11-Aug-2014 delphij

MFC r269229,269404,269466: MFV r269223:

Change dn->dn_dbufs from linked list to AVL tree.

Illumos issues:
4873 zvol unmap calls can take a very long time for larger datasets


# 265740 09-May-2014 delphij

MFC r264669: MFV r264666:

4374 dn_free_ranges should use range_tree_t

illumos/illumos-gate@bf16b11e8deb633dd6c4296d46e92399d1582df4


# 263390 19-Mar-2014 delphij

MFC r259813 + r259813: MFV r258374:

4171 clean up spa_feature_*() interfaces

4172 implement extensible_dataset feature for use by other zpool
features

illumos/illumos-gate@2acef22db7808606888f8f92715629ff3ba555b9


# 288549 03-Oct-2015 mav

MFC r286575: 5056 ZFS deadlock on db_mtx and dn_holds

Reviewed by: Will Andrews <willa@spectralogic.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Justin Gibbs <justing@spectralogic.com>

illumos/illumos-gate@bc9014e6a81272073b9854d9f65dd59e18d18c35


# 288541 03-Oct-2015 mav

MFC r286545:
5630 stale bonus buffer in recycled dnode_t leads to data corruption

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Justin T. Gibbs <justing@spectralogic.com>


# 272332 30-Sep-2014 delphij

MFC r271526: MFV r271510:

Enforce 4K as smallest indirect block size (previously the smallest
indirect block size was 1K but that was never used).

This makes some space estimates more accurate and uses less memory
for some data structures.

Illumos issue:
5141 zfs minimum indirect block size is 4K

Approved by: re (gjb)


# 270809 29-Aug-2014 delphij

MFC r270383: MFV r270198:

Instead of using timestamp in the AVL, use the memory address when
comparing.

Illumos issue:
5095 panic when adding a duplicate dbuf to dn_dbufs


# 270127 18-Aug-2014 delphij

MFC r269431: MFV r269427:

In dnode_children_t, use C99's "[]" idiom for declaring the variable
sized array dnc_children at the end of the structure.

This prevents the compiler from mistakenly optimizing away accesses
beyond the array's defined size.

Illumos issue:
5038 Remove "old-style" flexible array usage in ZFS.
Author: Justin T. Gibbs <justing@spectralogic.com>


# 269845 11-Aug-2014 delphij

MFC r269229,269404,269466: MFV r269223:

Change dn->dn_dbufs from linked list to AVL tree.

Illumos issues:
4873 zvol unmap calls can take a very long time for larger datasets


# 265740 09-May-2014 delphij

MFC r264669: MFV r264666:

4374 dn_free_ranges should use range_tree_t

illumos/illumos-gate@bf16b11e8deb633dd6c4296d46e92399d1582df4


# 263390 19-Mar-2014 delphij

MFC r259813 + r259813: MFV r258374:

4171 clean up spa_feature_*() interfaces

4172 implement extensible_dataset feature for use by other zpool
features

illumos/illumos-gate@2acef22db7808606888f8f92715629ff3ba555b9