#
302408 |
|
07-Jul-2016 |
gjb |
Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle. Prune svn:mergeinfo from the new branch, as nothing has been merged here.
Additional commits post-branch will follow.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation |
#
277185 |
|
14-Jan-2015 |
mav |
Fix overflow bug from r248577, turning 30s TRIM timeout into ~4s.
MFC after: 2 weeks
|
#
277169 |
|
14-Jan-2015 |
mav |
Reimplement TRIM throttling added in r248577.
Previous throttling implementation approached problem from the wrong side. It significantly limited useful delaying of TRIM requests and aggregation potential, while not so much controlled TRIM burstiness under heavy load.
With this change random 4K write benchmarks (probably the worst case for TRIM) show me IOPS increase by 20%, average latency reduction by 30%, peak TRIM bursts reduction by 3 times and same peak TRIM map size (memory usage).
Also the new logic does not force map size down so heavily, really allowing to keep deleted data for 32 TXG or 30 seconds under moderate load. It was practically impossible with old throttling logic, which pushed map down to only 64 segments.
Reviewed by: smh MFC after: 2 weeks Sponsored by: iXsystems, Inc.
|
#
276983 |
|
11-Jan-2015 |
mav |
When aggregating TRIM segments, move the new one to the list end.
New segment at the list head may block all TRIM requests until txg of that segment can be processed. On my random I/O tests this change reduce peak TRIM list length from 650 to 450 segments. Hopefully it should reduce TRIM burstiness when list processing is unblocked.
MFC after: 2 weeks
|
#
274619 |
|
17-Nov-2014 |
smh |
Disable TRIM on file backed ZFS vdevs and fix TRIM on init
After r265152 TRIM requests are ZIO_TYPE_FREE instead of ZIO_TYPE_IOCTL this meant file backed vdevs to attempted to process the ZIO as a write causing a panic.
We now disable TRIM on file backed vdevs and ASSERT the ZIO types supported by each vdev type to ensure we explicity support the ZIO type being processed.
Also ensure that TRIM on init is not procesed for devices which declare they didn't support TRIM via vdev_notrim.
PR: 195061, 194976, 191573 Sponsored by: Multiplay
|
#
267992 |
|
28-Jun-2014 |
hselasky |
Pull in r267961 and r267973 again. Fix for issues reported will follow.
|
#
267985 |
|
27-Jun-2014 |
gjb |
Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output, such as:
1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory
|
#
267961 |
|
27-Jun-2014 |
hselasky |
Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel.
Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change.
MFC after: 2 weeks Sponsored by: Mellanox Technologies
|
#
265218 |
|
02-May-2014 |
smh |
Removed pointless / duplicated call to trim_map_first.
MFC after: 1 month X-MFC-With: r265152
|
#
265152 |
|
30-Apr-2014 |
smh |
Reintroduce priority for the TRIM ZIOs instead of using the "NOW" priority
The changes how TRIM requests are generated to use ZIO_TYPE_FREE + a priority instead of ZIO_TYPE_IOCTL, until processed by vdev_geom; only then is it translated the required geom values. This reduces the amount of changes required for FREE requests to be supported by the new IO scheduler. This also eliminates the need for a specific DKIOCTRIM.
Also fixed FREE vdev child IO's from running ZIO_STAGE_VDEV_IO_DONE as part of their schedule.
As the new IO scheduler can result in a request to execute one type of IO to actually run a different type of IO it requires that zio_trim requests are processed without holding the trim map lock (tm->tm_lock), as the free request execute call may result in write request running hence triggering a trim_map_write_start call, which takes the trim map lock and hence would result in recused on no-recursive sx lock.
This is based off avg's original work, so credit to him.
MFC after: 1 month
|
#
249921 |
|
26-Apr-2013 |
smh |
Changed ZFS TRIM sysctl from vfs.zfs.trim_disable -> vfs.zfs.trim.enabled Enabled ZFS TRIM by default
Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks
|
#
248602 |
|
21-Mar-2013 |
smh |
Fix for building libzpool under i386.
Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks
|
#
248577 |
|
21-Mar-2013 |
smh |
Optimisation of TRIM processing.
Previously TRIM processing was very bursty. This was made worse by the fact that TRIM requests on SSD's are typically much slower than reads or writes. This often resulted in stalls while large numbers of TRIM's where processed.
In addition due to the way the TRIM thread was only woken by writes, deletes could stall in the queue for extensive periods of time.
This patch adds a number of controls to how often the TRIM thread for each SPA processes its outstanding delete requests. vfs.zfs.trim.timeout: Delay TRIMs by up to this many seconds vfs.zfs.trim.txg_delay: Delay TRIMs by up to this many TXGs (reduced to 32) vfs.zfs.vdev.trim_max_bytes: Maximum pending TRIM bytes for a vdev vfs.zfs.vdev.trim_max_pending: Maximum pending TRIM segments for a vdev vfs.zfs.trim.max_interval: Maximum interval between TRIM queue processing (seconds)
Given the most common TRIM implementation is ATA TRIM the current defaults are targeted at that.
Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks
|
#
248576 |
|
21-Mar-2013 |
smh |
Names the ZFS TRIM thread
Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks
|
#
248575 |
|
21-Mar-2013 |
smh |
TRIM cache devices based on time instead of TXGs. Currently, the trim module uses the same algorithm for data and cache devices when deciding to issue TRIM requests, based on how far in the past the TXG is.
Unfortunately, this is not ideal for cache devices, because the L2ARC doesn't use the concept of TXGs at all. In fact, when using a pool for reading only, the L2ARC is written but the TXG counter doesn't increase, and so no new TRIM requests are issued to the cache device.
This patch fixes the issue by using time instead of the TXG number as the criteria for trimming on cache devices. The basic delay principle stays the same, but parameters are expressed in seconds instead of TXGs. The new parameters are named trim_l2arc_limit and trim_l2arc_batch, and both default to 30 second.
Reviewed by: pjd (mentor) Approved by: pjd (mentor) Obtained from: https://github.com/dechamps/zfs/commit/17122c31ac7f82875e837019205c21651c05f8cd MFC after: 2 weeks
|
#
248574 |
|
21-Mar-2013 |
smh |
Improve TXG handling in the TRIM module. This patch adds some improvements to the way the trim module considers TXGs:
- Free ZIOs are registered with the TXG from the ZIO itself, not the current SPA syncing TXG (which may be out of date); - L2ARC are registered with a zero TXG number, as L2ARC has no concept of TXGs; - The TXG limit for issuing TRIMs is now computed from the last synced TXG, not the currently syncing TXG. Indeed, under extremely unlikely race conditions, there is a risk we could trim blocks which have been freed in a TXG that has not finished syncing, resulting in potential data corruption in case of a crash.
Reviewed by: pjd (mentor) Approved by: pjd (mentor) Obtained from: https://github.com/dechamps/zfs/commit/5b46ad40d9081d75505d6f3bf04ac652445df366 MFC after: 2 weeks
|
#
248572 |
|
21-Mar-2013 |
smh |
Add TRIM support for L2ARC.
This adds TRIM support to cache vdevs. When ARC buffers are removed from the L2ARC in arc_hdr_destroy(), arc_release() or l2arc_evict(), the size previously occupied by the buffer gets scheduled for TRIMming. As always, actual TRIMs are only issued to the L2ARC after txg_trim_limit.
Reviewed by: pjd (mentor) Approved by: pjd (mentor) Obtained from: https://github.com/dechamps/zfs/commit/31aae373994fd112256607edba7de2359da3e9dc MFC after: 2 weeks
|
#
244187 |
|
13-Dec-2012 |
smh |
Upgrades trim free request sizes before inserting them into to free map, making range consolidation much more effective particularly for small deletes.
This reduces memory used by the free map as well as reducing the number of bio requests down to geom required to process all deletes.
In tests this achieved a factor of 10 reduction of trim ranges / geom call downs.
While I'm here correct the description of zio_vdev_io_start.
PR: kern/173254 Submitted by: Steven Hartland Approved by: pjd (mentor)
|
#
240868 |
|
23-Sep-2012 |
pjd |
Add TRIM support.
The code builds a map of regions that were freed. On every write the code consults the map and eventually removes ranges that were freed before, but are now overwritten.
Freed blocks are not TRIMed immediately. There is a tunable that defines how many txg we should wait with TRIMming freed blocks (64 by default).
There is a low priority thread that TRIMs ranges when the time comes. During TRIM we keep in-flight ranges on a list to detect colliding writes - we have to delay writes that collide with in-flight TRIMs in case something will be reordered and write will reached the disk before the TRIM. We don't have to do the same for in-flight writes, as colliding writes just remove ranges to TRIM.
Sponsored by: multiplay.co.uk
This work includes some important fixes and some improvements obtained from the zfsonlinux project, including TRIMming entire vdevs on pool create/add/attach and on pool import for spare and cache vdevs.
Obtained from: zfsonlinux Submitted by: Etienne Dechamps <etienne.dechamps@ovh.net>
|