#
8d6c0743 |
|
06-Nov-2023 |
Alexander Motin <mav@FreeBSD.org> |
nvme: Introduce longer timeouts for admin queue KIOXIA CD8 SSDs routinely take ~25 seconds to delete a non-empty namespace. In some cases, like hot-plug, it takes longer, triggering a timeout and controller resets after just 30 seconds. Linux has had a separate 60-second timeout for the admin queue for many years. This patch does the same. And it is good to be consistent. Sponsored by: iXsystems, Inc. Reviewed by: imp MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D42454
|
#
8052b01e |
|
25-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
nvme: Add exclusion for ISR Add a basically uncontended spinlock that we take out while the ISR is running. This has two effects: First, when we get a timeout, we can safely call the nvme_qpair_process_completions w/o racing any ISRs. Second, we can use it to ensure that we don't reset the card while the ISRs are active (right now we just sleep and hope for the best, which usually is fine, but not always). Sponsored by: Netflix MFC After: 2 weeks Reviewed by: chuck, gallatin Differential Revision: https://reviews.freebsd.org/D41452
|
#
685dc743 |
|
16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
#
7be0b068 |
|
07-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
nvme: Remove duplicate command printing routine Both nvme_dump_command and nvme_qpair_print_command print nvme commands. The latter is better. Recode the one call to nvme_dump_command to use nvme_qpair_print_command and delete the former. No sense having two nearly identical routines. A future commit will convert to sbuf. Sponsored by: Netflix Reviewed by: chuck, mav, jhb Differential Revision: https://reviews.freebsd.org/D41309
|
#
6f76d493 |
|
07-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
nvme: Remove duplicate completion printing routine Both nvme_dump_completion and nvme_qpair_print_completion print completions. The latter is better. Recode the two instances of nvme_dump_completion to use nvme_qpair_print_completion and delete the former. No sense having two nearly identical routines. A future commit will convert this to sbuf. Sponsored by: Netflix Reviewed by: chuck Differential Revision: https://reviews.freebsd.org/D41308
|
#
bdc81eed |
|
12-Jun-2023 |
Warner Losh <imp@FreeBSD.org> |
nvme: Switch to nda by default We already run nda by default on all the !x86 architectures. Switch the default to nda. nda creates nvd compatibility links by default, so this should be a nop. If this causes problems for your application, set hw.nvme.use_nvd=1 in your loader.conf. Sponsored by: Netflix
|
#
4d846d26 |
|
10-May-2023 |
Warner Losh <imp@FreeBSD.org> |
spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch up to that fact and revert to their recommended match of BSD-2-Clause. Discussed with: pfg MFC After: 3 days Sponsored by: Netflix
|
#
161fcf79 |
|
29-Mar-2022 |
Warner Losh <imp@FreeBSD.org> |
nvme: Publish the drive's capabilities Add cap_lo and cap_hi sysctl to each nvme drive. This publishes the raw capabilities of the drive. Currently these can only be discovered with bootverbose. Sponsored by: Netflix
|
#
5f8ccf65 |
|
30-Nov-2021 |
Gordon Bergling <gbe@FreeBSD.org> |
nvme(4): Correct a typo in a sysctl description - s/printting/printing/ MFC after: 3 days
|
#
587aa255 |
|
28-Sep-2021 |
Warner Losh <imp@FreeBSD.org> |
nvme: count number of ignored interrupts Count the number of times we're asked to process completions, but that we ignore because the state of the qpair isn't in RECOVERY_NONE. Sponsored by: Netflix Reviewed by: mav, chuck Differential Revision: https://reviews.freebsd.org/D32212
|
#
7d5eebe0 |
|
28-Sep-2021 |
Warner Losh <imp@FreeBSD.org> |
nvme: Add sanity check for phase on startup. The proper phase for the qpair right after reset in the first interrupt is 1. For it, make sure that we're not still in phase 0. This is an illegal state to be processing interrupts and indicates that we've failed to properly protect against a race between initializing our state and processing interrupts. Modify stat resetting code so it resets the number of interrupts to 1 instead of 0 so we don't trigger a false positive panic. Sponsored by: Netflix Reviewed by: cperciva, mav (prior version) Differential Revision: https://reviews.freebsd.org/D32211
|
#
b776de67 |
|
10-Aug-2021 |
Alexander Motin <mav@FreeBSD.org> |
Mark some sysctls as CTLFLAG_MPSAFE. MFC after: 2 weeks
|
#
0fc1d208 |
|
23-Oct-2020 |
Warner Losh <imp@FreeBSD.org> |
nvme: Remove compat code for older kernels Remove code that supported pre-2011 kernels. CTLTYPE_S64 was defined in rev 217616. All supported branches have it, so remove its compat definition as OBE.
|
#
d87b31e1 |
|
01-Sep-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
nvme: clean up empty lines in .c and .h files
|
#
4053f8ac |
|
02-May-2020 |
David Bright <dab@FreeBSD.org> |
Fix various Coverity-detected errors in nvme driver This fixes several Coverity-detected errors in the nvme driver. CIDs addressed: 1008344, 1009377, 1009380, 1193740, 1305470, 1403975, 1403980 Reviewed by: imp@, vangyzen@ MFC after: 5 days Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D24532
|
#
7029da5c |
|
26-Feb-2020 |
Pawel Biernacki <kaktus@FreeBSD.org> |
Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is a non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT. Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718
|
#
1eab19cb |
|
23-Sep-2019 |
Alexander Motin <mav@FreeBSD.org> |
Make nvme(4) driver some more NUMA aware. - For each queue pair precalculate CPU and domain it is bound to. If queue pairs are not per-CPU, then use the domain of the device. - Allocate most of queue pair memory from the domain it is bound to. - Bind callouts to the same CPUs as queue pair to avoid migrations. - Do not assign queue pairs to each SMT thread. It just wasted resources and increased lock contention. - Remove fixed multiplier of CPUs per queue pair, spread them evenly. This allows using more queue pairs in some hardware configurations. - If queue pair serves multiple CPUs, bind different NVMe devices to different CPUs. MFC after: 1 month Sponsored by: iXsystems, Inc.
|
#
5e83c2ff |
|
19-Jul-2019 |
Warner Losh <imp@FreeBSD.org> |
Keep track of the number of commands that exhaust their retry limit. While we print failure messages on the console, sometimes logs are lost or overwhelmed. Keeping a count of how many times we've failed retriable commands helps gauge the magnitude of the problem.
|
#
c37fc318 |
|
19-Jul-2019 |
Warner Losh <imp@FreeBSD.org> |
Keep track of the number of retried commands. Retried commands can indicate a performance degradation of an nvme drive. Keep track of the number of retries and report it out via sysctl, just like number of commands and interrupts.
|
#
1071b50a |
|
18-Jul-2019 |
Warner Losh <imp@FreeBSD.org> |
Use sysctl + CTLRWTUN for hw.nvme.verbose_cmd_dump. Also convert it to a bool. While the rest of the driver isn't yet bool clean, this will help. Reviewed by: cem@ Differential Revision: https://reviews.freebsd.org/D20988
|
#
718cf2cc |
|
27-Nov-2017 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
sys/dev: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.
|
#
850564b9 |
|
28-Aug-2017 |
Warner Losh <imp@FreeBSD.org> |
Add new compile-time option NVME_USE_NVD that sets the default value of the runtime hw.nvme.use_nvd tunable. We still default to nvd unless otherwise requested. Sponsored by: Netflix
|
#
8a5d94f9 |
|
03-Aug-2017 |
Warner Losh <imp@FreeBSD.org> |
Make nvd vs nda choice boot-time rather than build-time Introduce hw.nvme.use_nvd tunable. This tunable allows both nvd and nda to be installed in the kernel, while allowing only one of them to create devices. This is an all-or-nothing setting, and you can't change it after boot-time. However, it will allow easier A/B testing. Differential Revision: https://reviews.freebsd.org/D11825
|
#
ee7f4d81 |
|
10-Mar-2016 |
Alexander Motin <mav@FreeBSD.org> |
Revert r292074 (by smh): Limit stripesize reported from nvd(4) to 4K I believe that this patch handled the problem from the wrong side. Instead of making ZFS properly handle large stripe sizes, it made an unrelated driver lie about its reported parameters to work around that. Alternative solution for this problem from ZFS side was committed at r296615. Discussed with: smh
|
#
50dea2da |
|
07-Jan-2016 |
Jim Harris <jimharris@FreeBSD.org> |
nvme: add hw.nvme.min_cpus_per_ioq tunable Due to FreeBSD system-wide limits on number of MSI-X vectors (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199321), it may be desirable to allocate fewer than the maximum number of vectors for an NVMe device, in order to save vectors for other devices (usually Ethernet) that can take better advantage of them and may be probed after NVMe. This tunable is expressed in terms of minimum number of CPUs per I/O queue instead of max number of queues per controller, to allow for a more even distribution of CPUs per queue. This avoids cases where some number of CPUs have a dedicated queue, but other CPUs need to share queues. Ideally the PR referenced above will eventually be fixed and the mechanism implemented here becomes obsolete anyways. While here, fix a bug in the CPUs per I/O queue calculation to properly account for the admin queue's MSI-X vector. Reviewed by: gallatin MFC after: 3 days Sponsored by: Intel
|
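The queue-count reasoning in the entry above can be sketched in a few lines of C. This is an illustration only, not the driver's code: the function names, the argument list, and the fallback to a single queue when fewer than two vectors are available are all assumptions made for the example. The only facts taken from the commit message are that one MSI-X vector goes to the admin queue and that the tunable is a minimum number of CPUs per I/O queue.

```c
/*
 * Hypothetical helper: pick the number of I/O queues so that each queue
 * serves at least min_cpus_per_ioq CPUs and no more vectors are used than
 * are available after reserving one for the admin queue.
 */
static unsigned
ex_num_io_queues(unsigned ncpus, unsigned msix_vectors,
    unsigned min_cpus_per_ioq)
{
	unsigned by_vectors, by_cpus;

	if (min_cpus_per_ioq == 0)
		min_cpus_per_ioq = 1;
	/* Assumption: with no spare vector, fall back to one I/O queue. */
	if (ncpus == 0 || msix_vectors < 2)
		return (1);

	by_vectors = msix_vectors - 1;		/* one vector for the admin queue */
	by_cpus = ncpus / min_cpus_per_ioq;	/* cap implied by the tunable */
	if (by_cpus == 0)
		by_cpus = 1;

	return (by_vectors < by_cpus ? by_vectors : by_cpus);
}

/* Hypothetical helper: CPUs served by each queue, rounded up. */
static unsigned
ex_cpus_per_ioq(unsigned ncpus, unsigned num_io_queues)
{
	if (num_io_queues == 0)
		return (ncpus);
	return ((ncpus + num_io_queues - 1) / num_io_queues);
}
```

With 16 CPUs, 9 vectors and a minimum of 4 CPUs per queue, this sketch yields 4 I/O queues serving 4 CPUs each, which leaves vectors free for other devices while keeping the distribution even.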
#
fdf16a68 |
|
10-Dec-2015 |
Steven Hartland <smh@FreeBSD.org> |
Limit stripesize reported from nvd(4) to 4K Intel NVMe controllers have a slow path for I/Os that span a 128KB stripe boundary but ZFS limits ashift, which is derived from d_stripesize, to 13 (8KB) so we limit the stripesize reported to geom(8) to 4KB. This may result in a small number of additional I/Os to require splitting in nvme(4), however the NVMe I/O path is very efficient so these additional I/Os will cause very minimal (if any) difference in performance or CPU utilisation. This can be controlled by the new sysctl kern.nvme.max_optimal_sectorsize. MFC after: 1 week Sponsored by: Multiplay Differential Revision: https://reviews.freebsd.org/D4446
|
#
e9efbc13 |
|
09-Jul-2013 |
Jim Harris <jimharris@FreeBSD.org> |
Update copyright dates. MFC after: 3 days
|
#
be34f216 |
|
26-Mar-2013 |
Jim Harris <jimharris@FreeBSD.org> |
Remove the is_started flag from struct nvme_controller. This flag was originally added to communicate to the sysctl code which oids should be built, but there are easier ways to do this. This needs to be cleaned up prior to adding new controller states - for example, controller failure. Sponsored by: Intel Reviewed by: carl
|
#
94143332 |
|
26-Mar-2013 |
Jim Harris <jimharris@FreeBSD.org> |
Add a tunable for the I/O timeout interval. Default is still 30 seconds, but can be adjusted between a min/max of 5 and 120 seconds. Sponsored by: Intel Reviewed by: carl
|
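A minimal sketch of the range check described in the entry above, assuming out-of-range values are rejected rather than clamped (the commit message does not say which). The macro names, the variable, and the setter function are hypothetical; only the default of 30 seconds and the 5 to 120 second bounds come from the message.

```c
#include <errno.h>
#include <stdint.h>

/* Bounds and default from the commit message; the names are hypothetical. */
#define EX_MIN_TIMEOUT_SEC	5
#define EX_MAX_TIMEOUT_SEC	120
#define EX_DEFAULT_TIMEOUT_SEC	30

/* Current I/O timeout in seconds, starting at the default. */
static uint32_t ex_timeout_sec = EX_DEFAULT_TIMEOUT_SEC;

/*
 * Accept a new timeout only if it lies within [min, max].
 * Assumption: an out-of-range request leaves the old value untouched
 * and reports EINVAL.
 */
static int
ex_set_timeout(uint32_t requested_sec)
{
	if (requested_sec < EX_MIN_TIMEOUT_SEC ||
	    requested_sec > EX_MAX_TIMEOUT_SEC)
		return (EINVAL);
	ex_timeout_sec = requested_sec;
	return (0);
}
```

Rejecting instead of clamping makes a mistyped value visible to the administrator instead of silently applying a different timeout than the one requested.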
#
21b6da58 |
|
17-Oct-2012 |
Jim Harris <jimharris@FreeBSD.org> |
Preallocate a limited number of nvme_tracker objects per qpair, rather than dynamically creating them at runtime. Sponsored by: Intel
|
#
f2b19f67 |
|
17-Oct-2012 |
Jim Harris <jimharris@FreeBSD.org> |
Merge struct nvme_prp_list into struct nvme_tracker. This simplifies the driver significantly where it is constructing commands to be submitted to hardware. By reducing the number of PRPs (NVMe parlance for SGE) from 128 to 32, it ensures we do not allocate too much memory for more common smaller I/O sizes, while still supporting up to 128KB I/O sizes. This also paves the way for pre-allocation of nvme_tracker objects for each queue which will simplify the I/O path even further. Sponsored by: Intel
|
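The 128KB figure in the entry above follows from simple arithmetic if each PRP entry addresses one memory page and pages are 4 KiB. Both of those are assumptions of this sketch (the commit message does not state the page size), as are the function names; the sketch also assumes page-aligned buffers.

```c
#include <stdint.h>

/* Largest transfer that num_prps page-sized PRP entries can describe. */
static uint64_t
ex_max_io_bytes(uint32_t num_prps, uint32_t page_size)
{
	return ((uint64_t)num_prps * page_size);
}

/* PRP entries needed for a page-aligned transfer of io_bytes, rounded up. */
static uint32_t
ex_prps_needed(uint64_t io_bytes, uint32_t page_size)
{
	return ((uint32_t)((io_bytes + page_size - 1) / page_size));
}
```

Under these assumptions, 32 entries cover 32 * 4096 = 131072 bytes (128KB), while the previous 128 entries would cover 512KB per tracker, which is the memory the commit avoids reserving for the more common small I/Os.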
#
6568ebfc |
|
10-Oct-2012 |
Jim Harris <jimharris@FreeBSD.org> |
Count number of times each queue pair's interrupt handler is invoked. Also add sysctls to query and reset each queue pair's stats, including the new count added here. Sponsored by: Intel
|
#
bb0ec6b3 |
|
17-Sep-2012 |
Jim Harris <jimharris@FreeBSD.org> |
This is the first of several commits which will add NVM Express (NVMe) support to FreeBSD. A full description of the overall functionality being added is below. nvmexpress.org defines NVM Express as "an optimized register interface, command set and feature set for PCI Express (PCIe)-based Solid-State Drives (SSDs)."

This commit adds nvme(4) and nvd(4) driver source code and Makefiles to the tree.

Full NVMe functionality description:

Add nvme(4) and nvd(4) drivers and nvmecontrol(8) for NVM Express (NVMe) device support. There will continue to be ongoing work on NVM Express support, but there is more than enough to allow for evaluation of pre-production NVM Express devices as well as soliciting feedback. Questions and feedback are welcome.

nvme(4) implements NVMe hardware abstraction and is a provider of NVMe namespaces. The closest equivalent of an NVMe namespace is a SCSI LUN. nvd(4) is an NVMe consumer, surfacing NVMe namespaces as GEOM disks. nvmecontrol(8) is used for NVMe configuration and management.

The following are currently supported:

nvme(4)
- full mandatory NVM command set support
- per-CPU IO queues (enabled by default but configurable)
- per-queue sysctls for statistics and full command/completion queue dumps for debugging
- registration API for NVMe namespace consumers
- I/O error handling (except for timeouts, see below)
- compilation switches for support back to stable-7

nvd(4)
- BIO_DELETE and BIO_FLUSH (if supported by controller)
- proper BIO_ORDERED handling

nvmecontrol(8)
- devlist: list NVMe controllers and their namespaces
- identify: display controller or namespace identify data in human-readable or hex format
- perftest: quick and dirty performance test to measure raw performance of NVMe device without userspace/physio/GEOM overhead

The following are still work in progress and will be completed over the next 3-6 months in rough priority order:
- complete man pages
- firmware download and activation
- asynchronous error requests
- command timeout error handling
- controller resets
- nvmecontrol(8) log page retrieval

This has been primarily tested on amd64, with light testing on i386. I would be happy to provide assistance to anyone interested in porting this to other architectures, but am not currently planning to do this work myself. Big-endian and dmamap sync for command/completion queues are the main areas that would need to be addressed.

The nvme(4) driver currently has references to Chatham, which is an Intel-developed prototype board which is not fully spec compliant. These references will all be removed over time.

Sponsored by: Intel
Contributions from: Joe Golio/EMC <joseph dot golio at emc dot com>
|