#
350996 |
|
13-Aug-2019 |
mav |
MFC kernel part of r350523, r350524, r350961: Add IOCTL to translate nvdX into nvmeY and NSID.
While very useful by itself, it also frees `nvmecontrol` from parsing hardcoded device names, which in turn makes it simple to accept nvdX (and potentially any other) device names as arguments.
The IOCTL bypass also added from nvdX to the respective nvmeYnsZ makes them interchangeable for management purposes.
|
#
346244 |
|
15-Apr-2019 |
mav |
MFC r344642 (by imp): Unconditionally support unmapped BIOs. This was another shim for supporting older kernels. However, all supported versions of FreeBSD have unmapped I/Os (as do several that have gone EOL), so remove it. It's unlikely the driver would work on the older kernels anyway at this point.
|
#
346242 |
|
15-Apr-2019 |
mav |
MFC r342862 (by chuck): Add NVMe drive to NOIOB quirk list
Dell-branded Intel P4600 NVMe drives benefit from NVMe 1.3's NOIOB feature. Unfortunately just like Intel DC P4500s, they don't advertise themselves as benefiting from this...
This change adds P4600s to the existing list of old drives which benefit from striping.
|
#
346240 |
|
15-Apr-2019 |
mav |
MFC r340412: Use atomic_load_acq_int() here too to poll done, ala r328521
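The polling pattern this entry refers to can be sketched in portable C11. This is only a userspace analogue under stated assumptions: `struct poll_status` and the `poll_status_*` helpers are hypothetical names, and C11 `<stdatomic.h>` stands in for the kernel's `atomic_load_acq_int()`. The point it illustrates is that the acquire load of `done` pairs with a release store on the completion side, so the result is visible once `done` is observed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical completion record shared by a poller and a completer. */
struct poll_status {
	int        result;	/* written by the completer before done is set */
	atomic_int done;	/* 0 = pending, 1 = completed */
};

static void
poll_status_init(struct poll_status *st)
{
	assert(st != NULL);
	st->result = 0;
	atomic_init(&st->done, 0);
}

/* Completion side: publish the result, then set done with release semantics. */
static void
poll_status_complete(struct poll_status *st, int result)
{
	st->result = result;
	atomic_store_explicit(&st->done, 1, memory_order_release);
}

/* Polling side: the acquire load pairs with the release store above. */
static int
poll_status_is_done(struct poll_status *st)
{
	return (atomic_load_explicit(&st->done, memory_order_acquire) != 0);
}

/* Spin until completion; result is safe to read once done is observed. */
static int
poll_status_wait(struct poll_status *st)
{
	while (!poll_status_is_done(st))
		;	/* a real driver would pause or yield here */
	return (st->result);
}
```

In this sketch the completer would run in another context (an interrupt thread, for instance) and call `poll_status_complete()` while the submitter sits in `poll_status_wait()`.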
|
#
335151 |
|
14-Jun-2018 |
mav |
MFC r332897 (by imp), r333123: Migrate to make_dev_s interface to populate /dev/nvmeX entries
|
#
335143 |
|
14-Jun-2018 |
mav |
MFC r330953 (by imp): Don't make the namespace devices eternal.
We'll need to delete namespaces soon, so go ahead and stop making these devices eternal. It doesn't help much, and will be getting in the way soon.
|
#
332824 |
|
20-Apr-2018 |
imp |
MFC r332780,r332783: Intel drives have an optimal alignment for I/O. While they honor I/Os that cross this boundary, they perform better when this isn't the case. Intel uses the 3rd byte in the vendor specific area for this. The DC P3500 was previously listed without any explanation. Add the DC P3520 and DC P4500 to the list.
There won't be any other drives needing this quirk. Intel has standardized a field in the namespace data in 1.3 (noiob). A future patch will use that if it exists, with fallback to this method.
Submitted by: Keith Busch Reviewed by: jimharris@ [[ plus tweak comments from 332783 ]]
Sponsored by: Netflix
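To make the notion of an I/O "crossing this boundary" concrete, here is a minimal sketch of the check, not the driver's code. The function name and its parameters are hypothetical, and the boundary is taken as a byte count supplied by the caller; how the driver derives it from the vendor specific area or from noiob is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when the byte range [offset, offset + length) spans a multiple
 * of `boundary`. All three parameter names are hypothetical.
 */
static bool
io_crosses_boundary(uint64_t offset, uint64_t length, uint64_t boundary)
{
	assert(boundary != 0 && length != 0);
	/* Compare the aligned chunk holding the first byte with the last byte's. */
	return (offset / boundary != (offset + length - 1) / boundary);
}
```

With an illustrative boundary of 131072 bytes, an 8192-byte I/O at offset 126976 crosses it, while a 4096-byte I/O at offset 131072 does not.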
|
#
331722 |
|
29-Mar-2018 |
eadler |
Revert r330897:
This was intended to be a non-functional change. It wasn't. The commit message was thus wrong. In addition it broke arm, and merged crypto related code.
Revert with prejudice.
This revert skips files touched in r316370 since that commit was since MFCed. This revert also skips files that require $FreeBSD$ property changes.
Thank you to those who helped me get out of this mess including but not limited to gonzo, kevans, rgrimes.
Requested by: gjb (re)
|
#
330897 |
|
14-Mar-2018 |
eadler |
Partial merge of the SPDX changes
These changes are incomplete but are making it difficult to determine what other changes can/should be merged.
No objections from: pfg
|
#
328689 |
|
01-Feb-2018 |
mav |
MFC r322902 (by imp): NVME Namespace ID is 32-bits, so widen interface to reflect that.
|
#
328676 |
|
01-Feb-2018 |
mav |
MFC r314884 (by imp): Make multi-namespace nvme drives more robust.
Fix assumptions about name spaces in NVME driver. First, it assumes cdata.nn is the number of configured devices. However, it is the number of supported name spaces. Second, it assumes that there will never be more than 16 name spaces supported, but a certain drive I'm testing reports 1024. It assumes that name spaces are a tightly packed namespace, but the standard seems to indicate otherwise. Finally, it assumes that an error would be generated when querying an unconfigured namespace. Instead, it succeeds but the identify data is all zeros.
Fix these by limiting the number of name spaces we probe to 16. Remove aborting when we find one in error. When the size of the name space is zero, ignore it.
This is admittedly a band-aid. The long term fix will be to participate in the enumeration and name space change protocols defined in the NVMe standard.
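The probing rules described in this entry (cap the probe at 16, do not abort on one bad namespace, ignore zero-sized ones) can be sketched roughly as follows. The array `nsze`, the constant and the function name are hypothetical stand-ins for this illustration, not the driver's data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_NAMESPACES 16	/* probe limit named in the commit message */

/*
 * Count usable namespaces. `nn` stands in for the controller's reported
 * number of supported namespaces; `nsze[i]` is the size taken from the
 * identify data of namespace i + 1. `nsze` must hold at least
 * min(nn, SKETCH_MAX_NAMESPACES) entries.
 */
static uint32_t
count_usable_namespaces(const uint64_t *nsze, uint32_t nn)
{
	uint32_t limit, i, usable = 0;

	assert(nsze != NULL);
	limit = nn < SKETCH_MAX_NAMESPACES ? nn : SKETCH_MAX_NAMESPACES;
	for (i = 0; i < limit; i++) {
		if (nsze[i] == 0)
			continue;	/* unconfigured: identify data is all zeros */
		usable++;
	}
	return (usable);
}
```

A controller reporting 1024 supported namespaces is therefore probed only for the first 16, and gaps in the numbering are skipped rather than treated as errors.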
|
#
302408 |
|
07-Jul-2016 |
gjb |
Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle. Prune svn:mergeinfo from the new branch, as nothing has been merged here.
Additional commits post-branch will follow.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation |
#
296617 |
|
10-Mar-2016 |
mav |
Revert r292074 (by smh): Limit stripesize reported from nvd(4) to 4K
I believe that this patch handled the problem from the wrong side. Instead of making ZFS properly handle large stripe sizes, it made an unrelated driver lie about its reported parameters to work around that.
An alternative solution for this problem, from the ZFS side, was committed at r296615.
Discussed with: smh
|
#
292074 |
|
11-Dec-2015 |
smh |
Limit stripesize reported from nvd(4) to 4K
Intel NVMe controllers have a slow path for I/Os that span a 128KB stripe boundary but ZFS limits ashift, which is derived from d_stripesize, to 13 (8KB) so we limit the stripesize reported to geom(8) to 4KB.
This may result in a small number of additional I/Os to require splitting in nvme(4), however the NVMe I/O path is very efficient so these additional I/Os will cause very minimal (if any) difference in performance or CPU utilisation.
This can be controlled by the new sysctl kern.nvme.max_optimal_sectorsize.
MFC after: 1 week Sponsored by: Multiplay Differential Revision: https://reviews.freebsd.org/D4446
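The arithmetic behind this limit can be checked with a short sketch. It assumes, as a simplification, that ashift is just the base-2 logarithm of the reported stripe size; the helper name and the constant are hypothetical and this is not ZFS code.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ASHIFT_MAX 13	/* ZFS limit quoted in the commit message (8KB) */

/* log2 of a power-of-two size; the helper name is hypothetical. */
static unsigned
ashift_for_size(uint64_t size)
{
	unsigned shift = 0;

	assert(size != 0 && (size & (size - 1)) == 0);
	while ((size >>= 1) != 0)
		shift++;
	return (shift);
}
```

Under that assumption a 128KB stripe would call for an ashift of 17, above the limit of 13, while the 4KB value reported instead gives 12.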
|
#
290199 |
|
30-Oct-2015 |
jimharris |
nvd, nvme: report stripesize through GEOM disk layer
MFC after: 3 days Sponsored by: Intel
|
#
290198 |
|
30-Oct-2015 |
jimharris |
nvme: fix race condition in split bio completion path
Fixes race condition observed under following circumstances:
1) I/O split on 128KB boundary with Intel NVMe controller. Current Intel controllers produce better latency when I/Os do not span a 128KB boundary - even if the I/O size itself is less than 128KB.
2) Per-CPU I/O queues are enabled.
3) Child I/Os are submitted on different submission queues.
4) Interrupts for child I/O completions occur almost simultaneously.
5) ithread for child I/O A increments bio_inbed, then immediately is preempted (rendezvous IPI, higher priority interrupt).
6) ithread for child I/O B increments bio_inbed, then completes parent bio since all children are now completed.
7) parent bio is freed, and immediately reallocated for a VFS or gpart bio (including setting bio_children to 1 and clearing bio_driver1).
8) ithread for child I/O A resumes processing. bio_children for what it thinks is the parent bio is set to 1, so it thinks it needs to complete the parent bio.
Result is either calling a NULL callback function, or double freeing the bio to its uma zone.
PR: 203746 Reported by: Drew Gallatin <gallatin@netflix.com>, Marc Goroff <mgoroff@quorum.net> Tested by: Drew Gallatin <gallatin@netflix.com> MFC after: 3 days Sponsored by: Intel
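The commit message describes the race but not the remedy. One common way to avoid this kind of race, shown here only as a generic sketch and not necessarily what the commit does, is to read the child count before the increment and let a single atomic fetch-and-add decide which completion owns the parent. The struct and function names are hypothetical, and C11 atomics stand in for kernel primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a parent I/O that was split into children. */
struct split_parent {
	unsigned    children;	/* number of child I/Os issued */
	atomic_uint inbed;	/* number of child I/Os completed so far */
};

/*
 * Called once per child completion. Returns true for exactly one caller:
 * the one whose increment brings inbed up to children. The child count is
 * read BEFORE the increment, so a caller that loses never touches the
 * parent again after another context may have freed it.
 */
static bool
split_child_done(struct split_parent *parent)
{
	unsigned children, inbed;

	assert(parent != NULL);
	children = parent->children;
	inbed = atomic_fetch_add(&parent->inbed, 1) + 1;
	return (inbed == children);
}
```

Only the caller that receives true goes on to complete and free the parent; every other caller returns without looking at the parent again, which removes the window described in steps 5 through 8.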
|
#
281283 |
|
08-Apr-2015 |
jimharris |
nvme: remove CHATHAM related code
Chatham was an internal NVMe prototype board used for early driver development.
MFC after: 1 week Sponsored by: Intel
|
#
257534 |
|
01-Nov-2013 |
jimharris |
Create a unique unit number for each controller and namespace cdev.
Sponsored by: Intel MFC after: 3 days
|
#
256169 |
|
08-Oct-2013 |
jimharris |
Fix the LINT build.
Approved by: re (implicit) MFC after: 1 week
|
#
256151 |
|
08-Oct-2013 |
jimharris |
Add driver-assisted striping for upcoming Intel NVMe controllers that can benefit from it.
Sponsored by: Intel Reviewed by: kib (earlier version), carl Approved by: re (hrs) MFC after: 1 week
|
#
254389 |
|
15-Aug-2013 |
ken |
Change the way that unmapped I/O capability is advertised.
The previous method was to set the D_UNMAPPED_IO flag in the cdevsw for the driver. The problem with this is that in many cases (e.g. sa(4)) there may be some instances of the driver that can handle unmapped I/O and some that can't. The isp(4) driver can handle unmapped I/O, but the esp(4) driver currently cannot. The cdevsw is shared among all driver instances.
So instead of setting a flag on the cdevsw, set a flag on the cdev. This allows drivers to indicate support for unmapped I/O on a per-instance basis.
sys/conf.h: Remove the D_UNMAPPED_IO cdevsw flag and replace it with an SI_UNMAPPED cdev flag.
kern_physio.c: Look at the cdev SI_UNMAPPED flag to determine whether or not a particular driver can handle unmapped I/O.
geom_dev.c: Set the SI_UNMAPPED flag for all GEOM cdevs. Since GEOM will create a temporary mapping when needed, setting SI_UNMAPPED unconditionally will work.
Remove the D_UNMAPPED_IO flag.
nvme_ns.c: Set the SI_UNMAPPED flag on cdevs created here if NVME_UNMAPPED_BIO_SUPPORT is enabled.
vfs_aio.c: In aio_qphysio(), check the SI_UNMAPPED flag on a cdev instead of the D_UNMAPPED_IO flag on the cdevsw.
sys/param.h: Bump __FreeBSD_version to 1000045 for the switch from setting the D_UNMAPPED_IO flag in the cdevsw to setting SI_UNMAPPED in the cdev.
Reviewed by: kib, jimharris MFC after: 1 week Sponsored by: Spectra Logic
|
#
253474 |
|
19-Jul-2013 |
jimharris |
Fix nvme(4) and nvd(4) to support non 512-byte sector sizes.
Recent testing with QEMU that has variable sector size support for NVMe uncovered some of these issues. Chatham prototype boards supported only 512 byte sectors.
Sponsored by: Intel Reviewed by: carl MFC after: 3 days
|
#
253112 |
|
09-Jul-2013 |
jimharris |
Update copyright dates.
MFC after: 3 days
|
#
249422 |
|
12-Apr-2013 |
jimharris |
Remove the NVME_IDENTIFY_CONTROLLER and NVME_IDENTIFY_NAMESPACE IOCTLs and replace them with the NVMe passthrough equivalent.
Sponsored by: Intel
|
#
249421 |
|
12-Apr-2013 |
jimharris |
Add support for passthrough NVMe commands.
This includes a new IOCTL to support a generic method for nvmecontrol(8) to pass IDENTIFY, GET_LOG_PAGE, GET_FEATURES and other commands to the controller, rather than separate IOCTLs for each.
Sponsored by: Intel
|
#
249419 |
|
12-Apr-2013 |
jimharris |
Remove the NVMe-specific physio and associated routines.
These were added early on for benchmarking purposes to avoid the mapped I/O penalties incurred in kern_physio. Now that FreeBSD (including kern_physio) supports unmapped I/O, the need for these NVMe-specific routines no longer exists.
Sponsored by: Intel
|
#
249418 |
|
12-Apr-2013 |
jimharris |
Add a mutex to each namespace, for general locking operations on the namespace.
Sponsored by: Intel
|
#
248977 |
|
01-Apr-2013 |
jimharris |
Add unmapped bio support to nvme(4) and nvd(4).
Sponsored by: Intel
|
#
248835 |
|
28-Mar-2013 |
jimharris |
Remove obsolete comment. This code has now been tested with the QEMU NVMe device emulator.
|
#
248773 |
|
26-Mar-2013 |
jimharris |
Clean up debug prints.
1) Consistently use device_printf. 2) Make dump_completion and dump_command into something more human-readable.
Sponsored by: Intel Reviewed by: carl
|
#
248770 |
|
26-Mar-2013 |
jimharris |
Change a number of malloc(9) calls to use M_WAITOK instead of M_NOWAIT.
Sponsored by: Intel Suggested by: carl Reviewed by: carl
|
#
248769 |
|
26-Mar-2013 |
jimharris |
Replace usages of mtx_pool_find used for admin commands with a polling mechanism.
Now that all requests are timed, we are guaranteed to get a completion notification, even if it is an abort status due to a timed out admin command.
This has the effect of simplifying the controller and namespace setup code, so that it reads straight through rather than broken up into a bunch of different callback functions.
Sponsored by: Intel Reviewed by: carl
|
#
248756 |
|
26-Mar-2013 |
jimharris |
Create struct nvme_status.
NVMe error log entries include status, so breaking this out into its own data structure allows it to be included in both the nvme_completion data structure as well as error log entry data structures.
While here, expose nvme_completion_is_error(), and change all of the places that were explicitly looking at sc/sct bits to use this macro instead.
Sponsored by: Intel Reviewed by: carl
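As a rough illustration of what checking the sc/sct bits amounts to, the sketch below treats the status as a 16-bit word. The bit positions (bit 0 phase, bits 1-8 status code, bits 9-11 status code type) are an assumption based on the NVMe completion entry layout, and the `SKETCH_*` and `sketch_*` names are hypothetical, not the driver's `struct nvme_status` or its macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static_assert(sizeof(uint16_t) == 2, "status word is 16 bits");

/* Assumed field positions: bit 0 phase, bits 1-8 SC, bits 9-11 SCT. */
#define SKETCH_SC(status)	(((status) >> 1) & 0xffu)
#define SKETCH_SCT(status)	(((status) >> 9) & 0x7u)

/* A completion counts as an error when either SC or SCT is nonzero. */
static bool
sketch_status_is_error(uint16_t status)
{
	return (SKETCH_SC(status) != 0 || SKETCH_SCT(status) != 0);
}
```

Centralizing the test this way means a caller never has to know which bits matter; the phase bit alone, for example, does not make a completion an error.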
|
#
248747 |
|
26-Mar-2013 |
jimharris |
Add API for nvme consumers to access controller and namespace identify data.
Sponsored by: Intel Reviewed by: carl
|
#
248746 |
|
26-Mar-2013 |
jimharris |
Add controller reset capability to nvme(4) and ability to explicitly invoke it from nvmecontrol(8).
Controller reset will be performed in cases where I/Os are repeatedly timing out, the controller reports an unrecoverable condition, or when explicitly requested via IOCTL or an nvme consumer. Since the controller may be in such a state where it cannot even process queue deletion requests, we will perform a controller reset without trying to clean up anything on the controller first.
Sponsored by: Intel Reviewed by: carl
|
#
248729 |
|
26-Mar-2013 |
jimharris |
Do not look at the namespace's thin provisioning field to determine if DSM command is supported. The two are not related.
Sponsored by: Intel
|
#
241657 |
|
17-Oct-2012 |
jimharris |
Add return codes to all functions used for submitting commands to I/O queues.
Sponsored by: Intel
|
#
240616 |
|
17-Sep-2012 |
jimharris |
This is the first of several commits which will add NVM Express (NVMe) support to FreeBSD. A full description of the overall functionality being added is below. nvmexpress.org defines NVM Express as "an optimized register interface, command set and feature set for PCI Express (PCIe)-based Solid-State Drives (SSDs)."
This commit adds nvme(4) and nvd(4) driver source code and Makefiles to the tree.
Full NVMe functionality description: Add nvme(4) and nvd(4) drivers and nvmecontrol(8) for NVM Express (NVMe) device support.
There will continue to be ongoing work on NVM Express support, but there is more than enough to allow for evaluation of pre-production NVM Express devices as well as soliciting feedback. Questions and feedback are welcome.
nvme(4) implements NVMe hardware abstraction and is a provider of NVMe namespaces. The closest equivalent of an NVMe namespace is a SCSI LUN. nvd(4) is an NVMe consumer, surfacing NVMe namespaces as GEOM disks. nvmecontrol(8) is used for NVMe configuration and management.
The following are currently supported:
nvme(4)
- full mandatory NVM command set support
- per-CPU IO queues (enabled by default but configurable)
- per-queue sysctls for statistics and full command/completion queue dumps for debugging
- registration API for NVMe namespace consumers
- I/O error handling (except for timeouts; see below)
- compilation switches for support back to stable-7
nvd(4)
- BIO_DELETE and BIO_FLUSH (if supported by controller)
- proper BIO_ORDERED handling
nvmecontrol(8)
- devlist: list NVMe controllers and their namespaces
- identify: display controller or namespace identify data in human-readable or hex format
- perftest: quick and dirty performance test to measure raw performance of NVMe device without userspace/physio/GEOM overhead
The following are still work in progress and will be completed over the next 3-6 months in rough priority order:
- complete man pages
- firmware download and activation
- asynchronous error requests
- command timeout error handling
- controller resets
- nvmecontrol(8) log page retrieval
This has been primarily tested on amd64, with light testing on i386. I would be happy to provide assistance to anyone interested in porting this to other architectures, but am not currently planning to do this work myself. Big-endian and dmamap sync for command/completion queues are the main areas that would need to be addressed.
The nvme(4) driver currently has references to Chatham, which is an Intel-developed prototype board which is not fully spec compliant. These references will all be removed over time.
Sponsored by: Intel Contributions from: Joe Golio/EMC <joseph dot golio at emc dot com>
|