History log of /haiku/src/add-ons/kernel/drivers/disk/nvme/nvme_disk.cpp
Revision Date Author Comments
# 629f071b 27-Feb-2024 X512 <danger_mail@list.ru>

pci: extend MSI interrupt vector number to 32 bits

Also increase MSI message data size to 32 bits according to PCIe spec.

Remove 0xff check for MSI interrupts because it is potentially valid
interrupt vector number. Reject 0xff only for legacy pin interrupts.

- MSI-X supports up to 2048 interrupts per device that do not fit to
`uint8`.

- Non-x86 systems may use separate interrupt vector ranges for
hard-wired interrupts and MSI interrupts so `uint8` is not enough to
represent all of them.

Change-Id: Iaf9ffb197ec23db0f97ffe3ea756d28d7bfc8705
Reviewed-on: https://review.haiku-os.org/c/haiku/+/7433
Reviewed-by: waddlesplash <waddlesplash@gmail.com>
Tested-by: Commit checker robot <no-reply+buildbot@haiku-os.org>


# 12bba381 05-Jan-2024 Augustin Cavalier <waddlesplash@gmail.com>

nvme, mmc: Fix SMAP violations in B_GET_MEDIA_STATUS.

Fixes #18736.


# be808057 27-Apr-2023 Augustin Cavalier <waddlesplash@gmail.com>

IORequest: Refactor IOOperation transferred-bytes and status accounting.

Until the introduction of the nvme_disk driver, these classes were
mostly only used directly by the IO scheduler, and then a few direct
usages of IOOperation itself in the individual disk drivers; so
API confusions were easily missed.

When writing the nvme_disk driver's IORequest support, however, it
became readily apparent that there were some pretty bad confusions
around transferred-bytes accounting in IOOperation. This commit
attempts to resolve all of those.

There are two basic changes here:

1. Move transferred-bytes accounting into IOOperation::SetStatus.

The "TransferredBytes" field of IOOperation is against the *original*
range, not the actual operation's range (which will be wider, due to
bouncing, etc.), and furthermore only applies to the actual content
of the request (and not e.g. to a read half of a bounced write.)

These two facts meant that determining what value to pass to
SetTransferredBytes was not trivial, and was easy to get wrong.
I recall messing that up when working on nvme_disk multiple times
before reading the API carefully.

2. Do not pass redundant values to IORequest::OperationFinished.

All of the values here can be derived (albeit indirectly) from the
IOOperation, and all consumers of this API basically did just that.
Rather than make them do it, make the IORequest take care of
computing all of those values itself.

Change-Id: Ic9ae29e1100319e5b7647647c4db7e5aad4d125e


# 8d2c997d 27-Apr-2023 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Add missing bounds check and adjust clamping.


# 3b91d73b 16-Dec-2022 X512 <danger_mail@list.ru>

bus & drivers: drop PCI_x86

Change-Id: I494deaf24a4793a5e0fe9fa46ecdce32f65e616a
Reviewed-on: https://review.haiku-os.org/c/haiku/+/6226
Reviewed-by: waddlesplash <waddlesplash@gmail.com>
Reviewed-by: Fredrik Holmqvist <fredrik.holmqvist@gmail.com>


# 1572bf59 15-Feb-2023 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Use the correct length value in the total-trimmed calculation.

We want to return the total trimmed size in bytes, not blocks.
Fixes incorrect trim size reporting under NVMe devices.


# 215b685f 11-Dec-2022 X512 <danger_mail@list.ru>

kernel: Drop non-standard GNU inline assignment syntax

* We needed this previously due to our gcc2 compiled kernel.
* Now that our kernel is always latest gcc, we can move to the
c++20 syntax for inline assignment.
* Improves compatibility with clang, less GNU-specific stuff

Change-Id: Ib7272a0a52554a31e9a0e788fd3f031db9049795
Reviewed-on: https://review.haiku-os.org/c/haiku/+/5898
Reviewed-by: Alex von Gluck IV <kallisti5@unixzen.com>
Reviewed-by: Adrien Destugues <pulkomandy@pulkomandy.tk>


# e1f6b2fb 19-Oct-2022 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Add a polling fallback mode.

Before implementing interrupts, all I/O was polled.
Now we only use polling as fallback if interrupts mode
does not work for some reason.

Should fix #17484.


# 688acf41 15-Sep-2022 Jérôme Duval <jerome.duval@gmail.com>

add physical_block_size field where applicable

only scsi_disk checks the actual value, other drivers take the logical block size.

This change reports the physical block size from the disk rather than the block
size used by IDE/SATA/SCSI commands. On typical modern SATA disks, the SATA
commands will use 512 byte blocks, but the disk will actually read and write
4K blocks internally. This is only of importance for partition alignment for DriveSetup,
and is independant of file systems or partitioning systems. This could also influence
the recommended block size for some file systems.

Change-Id: Id0f2e22659e89fcef64c1f8d04f81cd68995e01f
Reviewed-on: https://review.haiku-os.org/c/haiku/+/5667
Tested-by: Commit checker robot <no-reply+buildbot@haiku-os.org>
Reviewed-by: Adrien Destugues <pulkomandy@pulkomandy.tk>


# 07d5dba0 30-Apr-2022 Jim906 <jim_l@fastmail.com>

kernel/drivers/disk: enable Werror

* For NVMe library, disable warnings.
* Otherwise, change code to avoid generating warnings.
* No functional change.
* Fixes #9460.

Change-Id: Ia790de391e6b230c909dff7023f00a19bdd574be
Reviewed-on: https://review.haiku-os.org/c/haiku/+/5284
Tested-by: Commit checker robot <no-reply+buildbot@haiku-os.org>
Reviewed-by: Adrien Destugues <pulkomandy@gmail.com>


# 750ed8dd 23-Nov-2021 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Implement TRIM.

Only lightly tested (in QEMU with qcow2 images: it seems to work
and not to discard actual data.) Use with caution!


# ab2a1b20 23-Nov-2021 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Add a missing CALLED() and another assert.


# 52f94eb1 27-Oct-2021 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Add missing checks in the case where there is only one iovec.

May fix the remaining KDL in #16973.


# 535b7dc9 27-Oct-2021 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Include smp.h.

It is implicitly included on x86_64, but apparently not elsewhere.
Also fix a format warning.


# 9bc2885c 27-Oct-2021 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Allocate and utilize qpairs based on how many CPUs we have.

This replaces the previous round-robin system. It should reduce
lock contention on multi-core systems, with potential other
benefits depending on drive behavior.

FreeBSD's NVMe driver does basically the same thing.


# 18851359 26-Oct-2021 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Add the block size and count to the syslog output upon opening devices.

Since we do not entirely go through DMAResource, these numbers can
be relevant in diagnosing problems from tickets.


# 5250914d 26-Oct-2021 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Remove redundant check.

If the address % pagesize == 0, then clearly it is also aligned
on 32 bits, and we do not need to check & 0x3.


# dab646ac 26-Oct-2021 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk & virtio_block: Remove unneeded log2.

It was used, but then the result discarded. We have a log2()
in kernel/util/BitUtils.h anyway, so if it is really needed,
that should be used instead.


# b69c9008 26-Oct-2021 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Re-enable MSI-X.

It now works in QEMU, probably following the addition of 64-bit
address support in hrev55542.


# 9710cdc6 26-Oct-2021 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Move interrupt initialization to before qpair allocation.

At least QEMU's emulated controller seems to expect this.


# 69880fc7 18-Oct-2021 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Check addresses are 32-bit-aligned before deciding we do not need to bounce.

This should fix #16973.


# 62571f5e 18-Oct-2021 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Disable MSI-X support for now.

It is broken under QEMU, and it may well be on real hardware, too.
More investigation is required.


# 8f5199f0 18-Oct-2021 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Unconditionally mask off PCI_command_int_disable.

Otherwise legacy interrupts do not work at all on x86.
This fixes hrev55299, which contrary to the commit message,
did not "add" legacy interrupts support; it was already there,
and in fact that commit broke it on x86.


# 0877e4d7 25-Jul-2021 X512 <danger_mail@list.ru>

nvme: add legacy interrupts support

Change-Id: I29b9676c6c2407e7795af4c7b1c56a7afb546f6e
Reviewed-on: https://review.haiku-os.org/c/haiku/+/4312
Reviewed-by: Alex von Gluck IV <kallisti5@unixzen.com>
Reviewed-by: waddlesplash <waddlesplash@gmail.com>


# 550e8532 20-May-2020 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Perform the new lba_count computation only once.

As suggested by korli. No functional change.


# 4d0ad37a 18-May-2020 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Avoid doing I/O larger than the maximum size, if possible.

libnvme can break up transfers that are too large, but to do so,
it allocates one nvme_request struct (of which it has a large,
but finite, pool) per segment. Since we are already iterating
over "vecs", we might as well cut off transfers after they
would otherwise go over the limit.

Individual IOVs that are too large are left alone, though;
libnvme can still handle this. But at least we no longer try
to do all I/O in one go.

Tested in a similar manner to the previous commit.


# 8eb950cd 18-May-2020 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Fix ior_reset_sgl.

The "offset" parameter was not actually an IOV offset, but actually
a byte offset across all IOVs... whoops. Somehow, this went unnoticed
as most controllers have large enough maximum transfer sizes
that we would in practice not hit the limit (even with bs=1M
dd tests!)

KapiX's controller, as seen in #16049, however, has a maximum
transfer size of 64 pages; much smaller than these other controllers,
so it did trigger this behavior and exposed the bug.

Tested by adding an artificial limit of 2 blocks as the max
transfer size (which makes things pretty slow, as you might
expect.)


# 16a59954 18-May-2020 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Error check after do_nvme_io_request.

Fixes partial transfers being reported as larger than they actually were.


# 18e9c888 27-Apr-2020 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Use DMAResource for bouncing.

This is a "best of both worlds" approach: if nvme_disk
determines the I/O can be done with no bouncing at all,
it will do so; otherwise, the I/O is done by the _bounce
method that uses DMAResource.

Thanks to mmlr for helping investigate some of the DMAResource
crashes and other oddities.

Fixes #15818 and #15123.


# 1b3ccbc5 27-Apr-2020 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Overhaul to use physical-buffer-based I/O.

This fixes the virtual/physical mixup problems that plagued
the driver until now.

This is rather inefficent, though, as it does its own page-based
bouncing instead of using DMAResource. That will come in the
next commit.


# b6fccb79 20-Apr-2020 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Add an interrupt await timeout.

This way, if something stalls in an unexpected way, the whole system
won't just hang.

Should "fix" #15874, but there is probably some other underlying
problem.


# 9c878a9a 22-Mar-2020 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Use addr_t for pointer arithmetic.

No functional change (intended).


# 625dc38a 22-Mar-2020 Augustin Cavalier <waddlesplash@gmail.com>

libnvme: Rework qpairs to lock themselves.

They were theoretically guarded by the controller lock, but it appears
nvme_ns bypassed that, meaning that if ns_read was executed at the
same time as qpair_poll, unpredictable races could occur. This solves
that by making the qpairs guarded by their own mutex, which also
has the advantage of poll() being executable on more than one qpair
at a time.

Seems to fix the KDLs in #15123 (and maybe other NVMe tickets),
though the I/O corruptions remain.


# 54e9c79b 07-Mar-2020 Adrien Destugues <pulkomandy@pulkomandy.tk>

nvme: build fix.


# cbdbe1cf 07-Mar-2020 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Protect against simultaneous writes to "rounded" blocks.

May help with some of the stranger NVMe KDL tickets.


# f87a7ab0 07-Mar-2020 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Add missing linebreaks in error messages.


# 0bf26bd1 07-Mar-2020 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Poll before checking the "status" again.

Otherwise, we may have missed an interrupt and wind up in
a dead-wait.


# 475c3b42 07-Mar-2020 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Remove do_nvme_segmented_io.

It appears that libnvme handles breaking up I/O operations already
and submit them in a more organized fashion, so this is not needed.


# d447d747 07-Mar-2020 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Use interrupts instead of polling.

Requires the preceding commit due to how it uses ConditionVariables.
Massively decreases CPU usage under heavy disk load.

Change-Id: I967e5ed000751d9f3971dd811563e23bcb13dd50
Reviewed-on: https://review.haiku-os.org/c/haiku/+/2302
Reviewed-by: waddlesplash <waddlesplash@gmail.com>


# 0da74916 01-Mar-2020 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Initialize "status" in nvme_disk_write.

It is possible that no code path which assigns "status" is taken,
so we need to initialize this before it is checked against.

Spotted by Coverity.


# 06f4ddb0 28-Aug-2019 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Add missing cast to off_t.

Should fix the 64-bit build.


# 43895d31 27-Aug-2019 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Better protection against underflows.

This was using unsigned integer math and then trying to clamp to 0.
That won't work. Use off_t instead, which is an int64 and thus signed.
May fix behavior in some stranger error conditions.

While I'm at it, avoid reading in the beginning partial block
if we don't need to.


# e2aa2b55 29-Jun-2019 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Trace I/O errors to syslog.

This really should never occur, but in case it does, whoever called
us may not print such an error, so we should.


# 96e7ae6c 17-Apr-2019 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Fix build under GCC7.


# 2bd1323a 17-Apr-2019 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Respect the maximum transfer size by segmenting I/O.

On VirtualBox, the reported maximum transfer size is 2MB; but all
I/O >= 753KB fails in an identical way to the "maximum size" failures.
In other news, there are multiple open tickets in the VirtualBox
tracker about Linux systems failing to boot off NVMe...

I tested with a hacked-in maximum segment size of 752KB and everything
seemed to work just fine. Writing a HPKG to a FAT partition on NVMe,
rebooting, and then reading back the HPKG showed it had the same
sha256sum as the one stored in /system did. So this is working!


# a3fa1e70 17-Apr-2019 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Implement write support.

Tested only lightly. Please use caution when trying it!


# 872c8209 17-Apr-2019 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Allocate up to 8 qpairs and round-robin them for I/O.

Before this commit, only one qpair was allocated and it was
locked for the duration of the transfer. Now we allocate more
qpairs if the device supports it and lock only when actually
calling read(), enabling multiple reads to both be queued
and execute simultaneously.

This is probably a significant performance improvement,
even more so than it is/would be for SCSI, as SSDs
can actually read from as many sectors as bandwidth
allows at once.


# fa84c61a 14-Apr-2019 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Round the length to the proper offset for partial reads.

Following this commit, Haiku boots successfully in VMware from a
NVMe device!


# a3dc2dfa 14-Apr-2019 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Disable tracing.


# c9ba44b7 14-Apr-2019 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Properly implement partial reads.

The previous version did not account for the position not being a
multiple of the block size, among other problems.

Tested by hand with DiskProbe and then "dd skip=..." to read single
bytes from partial blocks, and validated as correct.


# 6ed6023d 14-Apr-2019 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Add more error checks in read().


# ace66546 14-Apr-2019 Augustin Cavalier <waddlesplash@gmail.com>

nvme_disk: Get the correct PCI information struct.

We want the one of the parent object, not the parent's parent.


# 26657c39 14-Apr-2019 Augustin Cavalier <waddlesplash@gmail.com>

nvme: Fix the build with GCC2.


# 2338299b 14-Apr-2019 Augustin Cavalier <waddlesplash@gmail.com>

nvme: Import the driver code.

Only one qpair is used for reading, which is rather inefficient.
We currently allocate a bounce buffer for every allocation, which is
also inefficient, due to the fact that we must read an integer multiple
of LBAs.

But it does work, and it is actually reasonably fast, even on an emulated
machine using a spinning-disk-backed NVMe device (88MB/s.)

I wasn't able to get it working in non-packaged, though; the device manager
called supports_device() on a number of PCI devices, but not the NVMe
device, so I have a different version with a hack that grabs the PCI info
manually. I didn't test inside haiku.hpkg yet; perhaps it will work in there.