History log of /linux-master/drivers/block/virtio_blk.c
Revision Date Author Comments
# 0e46064e 01-Mar-2024 Damien Le Moal <dlemoal@kernel.org>

virtio_blk: Do not use disk_set_max_open/active_zones()

In virtblk_read_zoned_limits(), setting a zoned block device maximum
number of open and active zones using the functions
disk_set_max_open_zones() and disk_set_max_active_zones() is incorrect
as setting the limits for the request queue is now done atomically when
the gendisk is created (with blk_mq_alloc_disk()). The value set by the
disk_set_max_open/active_zones() functions will be overwritten.
Fix this by setting the maximum number of open and active zones directly
in the queue_limits structure passed to virtblk_read_zoned_limits().

Fixes: 8b837256560c ("virtio_blk: pass queue_limits to blk_mq_alloc_disk")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20240301192639.410183-2-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 8b837256 13-Feb-2024 Christoph Hellwig <hch@lst.de>

virtio_blk: pass queue_limits to blk_mq_alloc_disk

Call virtblk_read_limits and most of virtblk_probe_zoned_device before
allocating the gendisk and thus request_queue and make them read into
a queue_limits structure instead. Pass this initialized queue_limits
to blk_mq_alloc_disk to set the queue up with the right parameters
from the start and only leave a few final touches for zoned devices
to be done just before adding the disk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240213073425.1621680-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 718628ad 13-Feb-2024 Christoph Hellwig <hch@lst.de>

virtio_blk: split virtblk_probe

Split out a virtblk_read_limits helper that just reads the various
queue limits to separate it from the higher level probing logic.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240213073425.1621680-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 27e32cd2 13-Feb-2024 Christoph Hellwig <hch@lst.de>

block: pass a queue_limits argument to blk_mq_alloc_disk

Pass a queue_limits to blk_mq_alloc_disk and apply it if non-NULL. This
will allow allocating queues with valid queue limits instead of setting
the values one at a time later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240213073425.1621680-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 4ce6e2db 29-Jan-2024 Yi Sun <yi.sun@unisoc.com>

virtio-blk: Ensure no requests in virtqueues before deleting vqs.

Ensure no remaining requests in virtqueues before resetting vdev and
deleting virtqueues. Otherwise these requests will never be completed.
It may cause the system to become unresponsive.

blk_mq_quiesce_queue() can ensure that requests have reached the
in_flight state, but it cannot guarantee that they have been processed
by the device. Virtqueues should never be deleted before all requests
have completed.

blk_mq_freeze_queue() ensures that all requests in the virtqueues have
completed and that no new requests can enter the virtqueues.

Signed-off-by: Yi Sun <yi.sun@unisoc.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20240129085250.1550594-1-yi.sun@unisoc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 04036d49 12-Jan-2024 Li RongQing <lirongqing@baidu.com>

virtio_blk: remove duplicate check if queue is broken in virtblk_done

virtqueue_enable_cb() will call virtqueue_poll(), which checks whether the
queue is broken at the beginning, so remove the virtqueue_is_broken() call.

Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# d73e93b4 17-Dec-2023 Christoph Hellwig <hch@lst.de>

block: simplify disk_set_zoned

Only use disk_set_zoned to actually enable zoned device support.
For clearing it, call disk_clear_zoned, which is renamed from
disk_clear_zone_settings and now directly clears the zoned flag as
well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20231217165359.604246-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 7437bb73 17-Dec-2023 Christoph Hellwig <hch@lst.de>

block: remove support for the host aware zone model

When zones were first added to the SCSI and ATA specs, two different
models were supported (in addition to the drive managed one that
is invisible to the host):

- host managed, where for non-conventional zones there is a strict
  requirement to write at the write pointer, or else an error is returned
- host aware, where a write pointer is maintained if writes always happen
  at it, otherwise it is left in an under-defined state and the
  sequential write preferred zones behave like conventional zones
  (probably very badly performing ones, though)

Not surprisingly this lukewarm model didn't prove to be very useful and
was finally removed from the ZBC and SBC specs (NVMe never implemented
it). Due to the easily disappearing write pointer, host software
could never rely on the write pointer to actually be useful for, say,
recovery.

Fortunately only a few HDD prototypes shipped using this model which
never made it to mass production. Drop the support before it is too
late. Note that any such host aware prototype HDD can still be used
with Linux as we'll now treat it as a conventional HDD.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20231217165359.604246-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# a971ed80 17-Dec-2023 Christoph Hellwig <hch@lst.de>

virtio_blk: remove the broken zone revalidation support

virtblk_revalidate_zones is called unconditionally from
virtblk_config_changed_work from the virtio config_changed callback.

virtblk_revalidate_zones is a bit odd in that it re-clears the zoned
state for host aware or non-zoned devices, which isn't needed unless the
zoned mode changed - but a zone mode change to a host managed model isn't
handled at all, and virtio_blk also doesn't handle any config change
other than a capacity change (and even if it did, the upper layers
above virtio_blk wouldn't handle it very well).

But even the useful case of a size change that would add or remove
zones isn't handled properly as blk_revalidate_disk_zones expects the
device capacity to cover all zones, but the capacity is only updated
after virtblk_revalidate_zones.

As this code appears to be entirely untested and is getting in the way
remove it for now, but it can be readded in a fixed version with
proper test coverage if needed.

Fixes: 95bfec41bd3d ("virtio-blk: add support for zoned block devices")
Fixes: f1ba4e674feb ("virtio-blk: fix to match virtio spec")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20231217165359.604246-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 77360cad 17-Dec-2023 Christoph Hellwig <hch@lst.de>

virtio_blk: cleanup zoned device probing

Move reading and checking the zoned model from virtblk_probe_zoned_device
into the caller, leaving only the code to perform the actual setup for
host managed zoned devices in virtblk_probe_zoned_device.

This allows sharing the model reading and checking between builds with
and without CONFIG_BLK_DEV_ZONED, and improves it for the
!CONFIG_BLK_DEV_ZONED case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20231217165359.604246-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# b8e07924 04-Dec-2023 Stefan Hajnoczi <stefanha@redhat.com>

virtio_blk: fix snprintf truncation compiler warning

Commit 4e0400525691 ("virtio-blk: support polling I/O") triggers the
following gcc 13 W=1 warnings:

drivers/block/virtio_blk.c: In function ‘init_vq’:
drivers/block/virtio_blk.c:1077:68: warning: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size 7 [-Wformat-truncation=]
1077 | snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%d", i);
| ^~
drivers/block/virtio_blk.c:1077:58: note: directive argument in the range [-2147483648, 65534]
1077 | snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%d", i);
| ^~~~~~~~~~~~~
drivers/block/virtio_blk.c:1077:17: note: ‘snprintf’ output between 11 and 21 bytes into a destination of size 16
1077 | snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%d", i);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is a false positive because the lower bound -2147483648 is
incorrect. The true range of i is [0, num_vqs - 1] where 0 < num_vqs <
65536.

The code mixes int, unsigned short, and unsigned int types in addition
to using "%d" for an unsigned value. Use unsigned short and "%u"
consistently to solve the compiler warning.
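
As a rough illustration of the fix described above, the following standalone sketch formats a poll queue name with an unsigned short index and "%u". VQ_NAME_LEN is taken as 16 from the warning text; the helper name format_poll_vq_name() is hypothetical and not part of the driver.

```c
#include <stdio.h>

#define VQ_NAME_LEN 16 /* destination size reported in the warning above */

/*
 * Hypothetical helper: format "req_poll.<i>" into a VQ_NAME_LEN buffer.
 * With an unsigned short index the longest output is "req_poll.65535"
 * (14 characters plus the terminating NUL), which fits in 16 bytes.
 * Returns the length snprintf() reports, excluding the NUL.
 */
static int format_poll_vq_name(char *buf, unsigned short i)
{
	/* The cast makes the argument type match "%u" exactly. */
	return snprintf(buf, VQ_NAME_LEN, "req_poll.%u", (unsigned int)i);
}
```

Because the index type bounds the value to at most five decimal digits, the compiler can see that the output always fits in the destination.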

Cc: Suwan Kim <suwan.kim027@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202312041509.DIyvEt9h-lkp@intel.com/
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20231204140743.1487843-1-stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# fafb51a6 04-Sep-2023 zhenwei pi <pizhenwei@bytedance.com>

virtio-blk: fix implicit overflow on virtio_max_dma_size

The following code has an implicit conversion from size_t to u32:
(u32)max_size = (size_t)virtio_max_dma_size(vdev);

This may lead to overflow, e.g. (size_t)4G -> (u32)0. If
virtio_max_dma_size() returns a size larger than U32_MAX, use U32_MAX
instead.
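
A minimal userspace sketch of the clamping idea, assuming uint32_t and UINT32_MAX as stand-ins for the kernel's u32 and U32_MAX. The helper name clamp_dma_size_to_u32() is hypothetical; the driver's actual code may be structured differently.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: narrow a size_t to 32 bits without wrapping.
 * A plain assignment would turn 4G into 0 on 64-bit builds; clamping
 * keeps the largest representable value instead.
 */
static uint32_t clamp_dma_size_to_u32(size_t max_dma_size)
{
	if (max_dma_size > UINT32_MAX)
		return UINT32_MAX;
	return (uint32_t)max_dma_size;
}
```

On a 32-bit build the comparison can never be true, so the helper reduces to a plain copy.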

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Message-Id: <20230904061045.510460-1-pizhenwei@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 217b613a 13-Sep-2023 Chengming Zhou <zhouchengming@bytedance.com>

blk-mq: update driver tags request table when start request

Now we update the driver tags request table in blk_mq_get_driver_tag(),
so drivers that support queue_rqs() have to update that inflight
table by themselves.

Move it to blk_mq_start_request(), which is a better place since that is
where we set up the deadline for the request timeout check. It is also
exactly where the request becomes inflight.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230913151616.3164338-5-chengming.zhou@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# a3d96ed2 02-Jul-2023 Damien Le Moal <dlemoal@kernel.org>

scsi: block: virtio_blk: Set zone limits before revalidating zones

In virtblk_probe_zoned_device(), execute blk_queue_chunk_sectors() and
blk_queue_max_zone_append_sectors() to respectively set the zoned device
zone size and maximum zone append sector limit before executing
blk_revalidate_disk_zones(). This is to allow the block layer zone
revalidation to check these device characteristics prior to checking all
zones of the device.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20230703024812.76778-5-dlemoal@kernel.org
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# afd384f0 08-Jun-2023 Michael S. Tsirkin <mst@redhat.com>

Revert "virtio-blk: support completion batching for the IRQ path"

This reverts commit 07b679f70d73483930e8d3c293942416d9cd5c13.

This change appears to have broken things...
We now see applications hanging during disk accesses.
e.g.
multi-port virtio-blk device running in h/w (FPGA)
Host running a simple 'fio' test.
[global]
thread=1
direct=1
ioengine=libaio
norandommap=1
group_reporting=1
bs=4K
rw=read
iodepth=128
runtime=1
numjobs=4
time_based
[job0]
filename=/dev/vda
[job1]
filename=/dev/vdb
[job2]
filename=/dev/vdc
...
[job15]
filename=/dev/vdp

i.e. 16 disks; 4 queues per disk; simple burst of 4KB reads
This is repeatedly run in a loop.

After a few, normally <10 seconds, fio hangs.
With 64 queues (16 disks), failure occurs within a few seconds; with 8 queues (2 disks) it may take ~hour before hanging.
Last message:
fio-3.19
Starting 8 threads
Jobs: 1 (f=1): [_(7),R(1)][68.3%][eta 03h:11m:06s]
I think this means at the end of the run 1 queue was left incomplete.

'diskstats' (run while fio is hung) shows no outstanding transactions.
e.g.
$ cat /proc/diskstats
...
252 0 vda 1843140071 0 14745120568 712568645 0 0 0 0 0 3117947 712568645 0 0 0 0 0 0
252 16 vdb 1816291511 0 14530332088 704905623 0 0 0 0 0 3117711 704905623 0 0 0 0 0 0
...

Other stats (in the h/w, and added to the virtio-blk driver ([a]virtio_queue_rq(), [b]virtblk_handle_req(), [c]virtblk_request_done()) all agree, and show every request had a completion, and that virtblk_request_done() never gets called.
e.g.
PF= 0 vq=0 1 2 3
[a]request_count - 839416590 813148916 105586179 84988123
[b]completion1_count - 839416590 813148916 105586179 84988123
[c]completion2_count - 0 0 0 0

PF= 1 vq=0 1 2 3
[a]request_count - 823335887 812516140 104582672 75856549
[b]completion1_count - 823335887 812516140 104582672 75856549
[c]completion2_count - 0 0 0 0

i.e. the issue is after the virtio-blk driver.

This change was introduced in kernel 6.3.0.
I am seeing this using 6.3.3.
If I run with an earlier kernel (5.15), it does not occur.
If I make a simple patch to the 6.3.3 virtio-blk driver, to skip the blk_mq_add_to_batch() call, it does not fail.
e.g.
kernel 5.15 - this is OK
virtio_blk.c,virtblk_done() [irq handler]
    if (likely(!blk_should_fake_timeout(req->q))) {
        blk_mq_complete_request(req);
    }

kernel 6.3.3 - this fails
virtio_blk.c,virtblk_handle_req() [irq handler]
    if (likely(!blk_should_fake_timeout(req->q))) {
        if (!blk_mq_complete_request_remote(req)) {
            if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) {
                virtblk_request_done(req); //this never gets called... so blk_mq_add_to_batch() must always succeed
            }
        }
    }

If I do, kernel 6.3.3 - this is OK
virtio_blk.c,virtblk_handle_req() [irq handler]
    if (likely(!blk_should_fake_timeout(req->q))) {
        if (!blk_mq_complete_request_remote(req)) {
            virtblk_request_done(req); //force this here...
            if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) {
                virtblk_request_done(req); //this never gets called... so blk_mq_add_to_batch() must always succeed
            }
        }
    }

Perhaps you might like to fix/test/revert this change...
Martin

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306090826.C1fZmdMe-lkp@intel.com/
Cc: Suwan Kim <suwan.kim027@gmail.com>
Tested-by: edliaw@google.com
Reported-by: "Roberts, Martin" <martin.roberts@intel.com>
Message-Id: <336455b4f630f329380a8f53ee8cad3868764d5c.1686295549.git.mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 10805eb5 30-Mar-2023 Dmitry Fomichev <dmitry.fomichev@wdc.com>

virtio-blk: fix ZBD probe in kernels without ZBD support

When the kernel is built without support for zoned block devices,
virtio-blk probe needs to error out any host-managed device scans
to prevent such devices from appearing in the system as non-zoned.
The current virtio-blk code simply bypasses all ZBD checks if
CONFIG_BLK_DEV_ZONED is not defined and this leads to host-managed
block devices being presented as non-zoned in the OS. This is one of
the main problems this patch series is aimed to fix.

In this patch, make VIRTIO_BLK_F_ZONED feature defined even when
CONFIG_BLK_DEV_ZONED is not. This change makes the code compliant with
the voted revision of virtio-blk ZBD spec. Modify the probe code to
look at the situation when VIRTIO_BLK_F_ZONED is negotiated in a kernel
that is built without ZBD support. In this case, the code checks
the zoned model of the device and fails the probe if the device
is host-managed.

The patch also adds a comment to clarify that the call to perform
the zoned device probe is correctly placed after virtio_device_ready().

Fixes: 95bfec41bd3d ("virtio-blk: add support for zoned block devices")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Message-Id: <20230330214953.1088216-3-dmitry.fomichev@wdc.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# f1ba4e67 30-Mar-2023 Dmitry Fomichev <dmitry.fomichev@wdc.com>

virtio-blk: fix to match virtio spec

The merged patch series to support zoned block devices in virtio-blk
is not the most up to date version. The merged patch can be found at

https://lore.kernel.org/linux-block/20221016034127.330942-3-dmitry.fomichev@wdc.com/

but the latest and reviewed version is

https://lore.kernel.org/linux-block/20221110053952.3378990-3-dmitry.fomichev@wdc.com/

The reason is apparently that the correct mailing lists and
maintainers were not copied.

The differences between the two are mostly cleanups, but there is one
change that is very important in terms of compatibility with the
approved virtio-zbd specification.

Before it was approved, the OASIS virtio spec had a change in
VIRTIO_BLK_T_ZONE_APPEND request layout that is not reflected in the
current virtio-blk driver code. In the running code, the status is
the first byte of the in-header that is followed by some pad bytes
and the u64 that carries the sector at which the data has been written
to the zone back to the driver, aka the append sector.

This layout turned out to be problematic to implement in QEMU, and
the request status byte was eventually made the last byte of the
in-header. The current code doesn't expect that, and this causes the
append sector value to always arrive as zero at the block layer. This needs
to be fixed ASAP.
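
To make the corrected layout concrete, here is a small standalone sketch that decodes a zone append in-header in which the append sector comes first and the status is the last byte. The 9-byte length, the little-endian sector encoding (as noted in the later type tweak commit), and all names below are assumptions for illustration only; they are not copied from the driver or the spec.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed in-header size: 8-byte append sector followed by 1 status byte. */
#define ZONE_APPEND_IN_HDR_LEN 9

/* Hypothetical decoded form of the in-header. */
struct zone_append_result {
	uint64_t sector; /* sector at which the data was written */
	uint8_t status;  /* request status, taken from the last byte */
};

static struct zone_append_result
parse_zone_append_in_hdr(const uint8_t hdr[ZONE_APPEND_IN_HDR_LEN])
{
	struct zone_append_result r = { 0, hdr[ZONE_APPEND_IN_HDR_LEN - 1] };
	size_t i;

	/* Assemble the little-endian 64-bit sector byte by byte. */
	for (i = 0; i < 8; i++)
		r.sector |= (uint64_t)hdr[i] << (8 * i);
	return r;
}
```

A reader that instead expected the status in the first byte would interpret the low byte of the sector as the status, and would look for the sector past where it actually begins.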

Fixes: 95bfec41bd3d ("virtio-blk: add support for zoned block devices")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Message-Id: <20230330214953.1088216-2-dmitry.fomichev@wdc.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 07b679f7 21-Dec-2022 Suwan Kim <suwan.kim027@gmail.com>

virtio-blk: support completion batching for the IRQ path

This patch adds completion batching to the IRQ path. It reuses batch
completion code of virtblk_poll(). It collects requests to io_comp_batch
and processes them all at once. It can boost up the performance by 2%.

To validate the performance improvement and stability, I ran fio tests with
a 4 vCPU VM and a 12 vCPU VM respectively. Both VMs have 8GB RAM and the same
number of HW queues as vCPUs.
The fio command is as follows; I ran fio 5 times and took the IOPS average.
(io_uring, randread, direct=1, bs=512, iodepth=64 numjobs=2,4)

Test result shows about 2% improvement.

4 vcpu VM         | numjobs=2   | numjobs=4
-----------------------------------------------------------
fio without patch | 367.2K IOPS | 397.6K IOPS
-----------------------------------------------------------
fio with patch    | 372.8K IOPS | 407.7K IOPS

12 vcpu VM        | numjobs=2   | numjobs=4
-----------------------------------------------------------
fio without patch | 363.6K IOPS | 374.8K IOPS
-----------------------------------------------------------
fio with patch    | 373.8K IOPS | 385.3K IOPS

Signed-off-by: Suwan Kim <suwan.kim027@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Message-Id: <20221221145456.281218-3-suwan.kim027@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 489e18f3 21-Dec-2022 Suwan Kim <suwan.kim027@gmail.com>

virtio-blk: set req->state to MQ_RQ_COMPLETE after polling I/O is finished

The driver should set req->state to MQ_RQ_COMPLETE after it finishes processing
req. But virtio-blk doesn't set MQ_RQ_COMPLETE after virtblk_poll() handles
req, and req->state still remains MQ_RQ_IN_FLIGHT. Fortunately so far there
is no issue with it because blk_mq_end_request_batch() sets req->state to
MQ_RQ_IDLE.

In this patch, virtblk_poll() calls blk_mq_complete_request_remote() to set
req->state to MQ_RQ_COMPLETE before it adds req to a batch completion list.
So it properly sets req->state after polling I/O is finished.

Fixes: 4e0400525691 ("virtio-blk: support polling I/O")
Signed-off-by: Suwan Kim <suwan.kim027@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Message-Id: <20221221145456.281218-2-suwan.kim027@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 2a9c844e 20-Dec-2022 Michael S. Tsirkin <mst@redhat.com>

virtio_blk: zone append in header type tweak

virtio blk returns a 64 bit append_sector in an input buffer,
in LE format. This field is not tagged as LE correctly, so
even though the generated code is ok, we get warnings from sparse:

drivers/block/virtio_blk.c:332:33: sparse: sparse: cast to restricted __le64

Make sparse happy by using the correct type.

Message-Id: <20221220125154.564265-1-mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 04e5421e 20-Dec-2022 Michael S. Tsirkin <mst@redhat.com>

virtio_blk: temporary variable type tweak

virtblk_result returns blk_status_t which is a bitwise restricted type,
so we are not supposed to stuff it in a plain int temporary variable.
All we do with it is pass it on to a function expecting blk_status_t so
the generated code is ok, but we get warnings from sparse:

drivers/block/virtio_blk.c:326:36: sparse: sparse: incorrect type in initializer (different base types) @@ expected int status @@
+got restricted blk_status_t @@
drivers/block/virtio_blk.c:334:33: sparse: sparse: incorrect type in argument 2 (different base types) @@ expected restricted
+blk_status_t [usertype] error @@ got int status @@

Make sparse happy by using the correct type.

Message-Id: <20221220124152.523531-1-mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>


# 95bfec41 15-Oct-2022 Dmitry Fomichev <dmitry.fomichev@wdc.com>

virtio-blk: add support for zoned block devices

This patch adds support for Zoned Block Devices (ZBDs) to the kernel
virtio-blk driver.

The patch accompanies the virtio-blk ZBD support draft that is now
being proposed for standardization. The latest version of the draft is
linked at

https://github.com/oasis-tcs/virtio-spec/issues/143 .

The QEMU zoned device code that implements these protocol extensions
has been developed by Sam Li and it is currently in review at the QEMU
mailing list.

A number of virtblk request structure changes have been introduced to
accommodate the functionality that is specific to zoned block devices
and, most importantly, make room for carrying the Zoned Append sector
value from the device back to the driver along with the request status.

The zone-specific code in the patch is heavily influenced by NVMe ZNS
code in drivers/nvme/host/zns.c, but it is simpler because the proposed
virtio ZBD draft only covers the zoned device features that are
relevant to the zoned functionality provided by Linux block layer.

includes the following fixup:

virtio-blk: fix probe without CONFIG_BLK_DEV_ZONED

When building without CONFIG_BLK_DEV_ZONED, VIRTIO_BLK_F_ZONED
is excluded from array of driver features.
As a result virtio_has_feature panics in virtio_check_driver_offered_feature
since that by design verifies that a feature we are checking for
is listed in the feature array.

To fix, replace the call to virtio_has_feature with a stub.

Message-Id: <20221016034127.330942-3-dmitry.fomichev@wdc.com>
Co-developed-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Message-Id: <20221220112340.518841-1-mst@redhat.com>
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Reported-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Debugged-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>


# b831f3a1 03-Feb-2023 Christoph Hellwig <hch@lst.de>

virtio_blk: use bvec_set_virt to initialize special_vec

Use the bvec_set_virt helper to initialize the special_vec.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20230203150634.3199647-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# a26116c1 21-Oct-2022 Rafael Mendonca <rafaelmendsr@gmail.com>

virtio_blk: Fix signedness bug in virtblk_prep_rq()

The virtblk_map_data() function returns negative error codes, however, the
'nents' field of vbr->sg_table is an unsigned int, which causes the error
handling not to work correctly.
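
The class of bug described above can be sketched in standalone C. The struct and function names here are hypothetical stand-ins, not the driver's own: the point is only that a negative error code stored directly into an unsigned field can no longer be detected with a "< 0" check, so the result is held in a signed temporary first.

```c
#include <errno.h>

/* Hypothetical stand-in for a table whose entry count is unsigned. */
struct sg_table_sketch {
	unsigned int nents;
};

/* Hypothetical mapping step: returns an entry count or a negative errno. */
static int map_data_stub(int fail)
{
	return fail ? -ENOMEM : 3;
}

/* Check the signed return value before storing it in the unsigned field. */
static int prep_rq_sketch(struct sg_table_sketch *table, int fail)
{
	int num = map_data_stub(fail);

	if (num < 0)
		return num; /* error propagates; nents is left untouched */

	table->nents = (unsigned int)num;
	return 0;
}
```

Had the return value been assigned straight to nents, -ENOMEM would have become a very large positive count and the error path would never run.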

Cc: stable@vger.kernel.org
Fixes: 0e9911fa768f ("virtio-blk: support mq_ops->queue_rqs()")
Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Message-Id: <20221021204126.927603-1-rafaelmendsr@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Suwan Kim <suwan.kim027@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>


# f4e468f7 09-Nov-2022 Angus Chen <angus.chen@jaguarmicro.com>

virtio_blk: use UINT_MAX instead of -1U

We use UINT_MAX to limit max_discard_sectors in virtblk_probe, so
use UINT_MAX to limit max_hw_sectors as well, for consistency.

No functional change intended.

Signed-off-by: Angus Chen <angus.chen@jaguarmicro.com>
Message-Id: <20221110030124.1986-1-angus.chen@jaguarmicro.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


# 258896fc 15-Oct-2022 Dmitry Fomichev <dmitry.fomichev@wdc.com>

virtio-blk: use a helper to handle request queuing errors

Define a new helper function, virtblk_fail_to_queue(), to
clean up the error handling code in virtio_queue_rq().

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Message-Id: <20221016034127.330942-2-dmitry.fomichev@wdc.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 92a34c46 30-Nov-2022 Pankaj Raghav <p.raghav@samsung.com>

virtio-blk: replace ida_simple[get|remove] with ida_[alloc_range|free]

ida_simple_[get|remove] are deprecated, and are just wrappers around
ida_[alloc_range|free]. Replace ida_simple_[get|remove] with their
corresponding counterparts.

No functional changes.

Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20221130123001.25473-1-p.raghav@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# e60d6407 21-Sep-2022 Alvaro Karsz <alvaro.karsz@solid-run.com>

virtio_blk: add SECURE ERASE command support

Support for the VIRTIO_BLK_F_SECURE_ERASE VirtIO feature.

A device that offers this feature can receive VIRTIO_BLK_T_SECURE_ERASE
commands.

A device which supports this feature has the following fields in the
virtio config:

- max_secure_erase_sectors
- max_secure_erase_seg
- secure_erase_sector_alignment

max_secure_erase_sectors and secure_erase_sector_alignment are expressed
in 512-byte units.

Every secure erase command has the following fields:

- sectors: The starting offset in 512-byte units.
- num_sectors: The number of sectors.

Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com>
Message-Id: <20220921082729.2516779-1-alvaro.karsz@solid-run.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


# a4e1d0b7 15-Aug-2022 Bart Van Assche <bvanassche@acm.org>

block: Change the return type of blk_mq_map_queues() into void

Since blk_mq_map_queues() and the .map_queues() callbacks always return 0,
change their return type into void. Most callers ignore the returned value
anyway.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20220815170043.19489-3-bvanassche@acm.org
[axboe: fold in fix from Bart]
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 37fafe6b 30-Aug-2022 Suwan Kim <suwan.kim027@gmail.com>

virtio-blk: Fix WARN_ON_ONCE in virtio_queue_rq()

If a request fails at virtio_queue_rqs(), it is inserted to requeue_list
and passed to virtio_queue_rq(). Then blk_mq_start_request() can be called
again at virtio_queue_rq() and trigger WARN_ON_ONCE like below trace because
request state was already set to MQ_RQ_IN_FLIGHT in virtio_queue_rqs()
despite the failure.

[ 1.890468] ------------[ cut here ]------------
[ 1.890776] WARNING: CPU: 2 PID: 122 at block/blk-mq.c:1143
blk_mq_start_request+0x8a/0xe0
[ 1.891045] Modules linked in:
[ 1.891250] CPU: 2 PID: 122 Comm: journal-offline Not tainted 5.19.0+ #44
[ 1.891504] Hardware name: ChromiumOS crosvm, BIOS 0
[ 1.891739] RIP: 0010:blk_mq_start_request+0x8a/0xe0
[ 1.891961] Code: 12 80 74 22 48 8b 4b 10 8b 89 64 01 00 00 8b 53
20 83 fa ff 75 08 ba 00 00 00 80 0b 53 24 c1 e1 10 09 d1 89 48 34 5b
41 5e c3 <0f> 0b eb b8 65 8b 05 2b 39 b6 7e 89 c0 48 0f a3 05 39 77 5b
01 0f
[ 1.892443] RSP: 0018:ffffc900002777b0 EFLAGS: 00010202
[ 1.892673] RAX: 0000000000000000 RBX: ffff888004bc0000 RCX: 0000000000000000
[ 1.892952] RDX: 0000000000000000 RSI: ffff888003d7c200 RDI: ffff888004bc0000
[ 1.893228] RBP: 0000000000000000 R08: 0000000000000001 R09: ffff888004bc0100
[ 1.893506] R10: ffffffffffffffff R11: ffffffff8185ca10 R12: ffff888004bc0000
[ 1.893797] R13: ffffc90000277900 R14: ffff888004ab2340 R15: ffff888003d86e00
[ 1.894060] FS: 00007ffa143a4640(0000) GS:ffff88807dd00000(0000)
knlGS:0000000000000000
[ 1.894412] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.894682] CR2: 00005648577d9088 CR3: 00000000053da004 CR4: 0000000000170ee0
[ 1.894953] Call Trace:
[ 1.895139] <TASK>
[ 1.895303] virtblk_prep_rq+0x1e5/0x280
[ 1.895509] virtio_queue_rq+0x5c/0x310
[ 1.895710] ? virtqueue_add_sgs+0x95/0xb0
[ 1.895905] ? _raw_spin_unlock_irqrestore+0x16/0x30
[ 1.896133] ? virtio_queue_rqs+0x340/0x390
[ 1.896453] ? sbitmap_get+0xfa/0x220
[ 1.896678] __blk_mq_issue_directly+0x41/0x180
[ 1.896906] blk_mq_plug_issue_direct+0xd8/0x2c0
[ 1.897115] blk_mq_flush_plug_list+0x115/0x180
[ 1.897342] blk_add_rq_to_plug+0x51/0x130
[ 1.897543] blk_mq_submit_bio+0x3a1/0x570
[ 1.897750] submit_bio_noacct_nocheck+0x418/0x520
[ 1.897985] ? submit_bio_noacct+0x1e/0x260
[ 1.897989] ext4_bio_write_page+0x222/0x420
[ 1.898000] mpage_process_page_bufs+0x178/0x1c0
[ 1.899451] mpage_prepare_extent_to_map+0x2d2/0x440
[ 1.899603] ext4_writepages+0x495/0x1020
[ 1.899733] do_writepages+0xcb/0x220
[ 1.899871] ? __seccomp_filter+0x171/0x7e0
[ 1.900006] file_write_and_wait_range+0xcd/0xf0
[ 1.900167] ext4_sync_file+0x72/0x320
[ 1.900308] __x64_sys_fsync+0x66/0xa0
[ 1.900449] do_syscall_64+0x31/0x50
[ 1.900595] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 1.900747] RIP: 0033:0x7ffa16ec96ea
[ 1.900883] Code: b8 4a 00 00 00 0f 05 48 3d 00 f0 ff ff 77 41 c3
48 83 ec 18 89 7c 24 0c e8 e3 02 f8 ff 8b 7c 24 0c 89 c2 b8 4a 00 00
00 0f 05 <48> 3d 00 f0 ff ff 77 36 89 d7 89 44 24 0c e8 43 03 f8 ff 8b
44 24
[ 1.901302] RSP: 002b:00007ffa143a3ac0 EFLAGS: 00000293 ORIG_RAX:
000000000000004a
[ 1.901499] RAX: ffffffffffffffda RBX: 0000560277ec6fe0 RCX: 00007ffa16ec96ea
[ 1.901696] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000016
[ 1.901884] RBP: 0000560277ec5910 R08: 0000000000000000 R09: 00007ffa143a4640
[ 1.902082] R10: 00007ffa16e4d39e R11: 0000000000000293 R12: 00005602773f59e0
[ 1.902459] R13: 0000000000000000 R14: 00007fffbfc007ff R15: 00007ffa13ba4000
[ 1.902763] </TASK>
[ 1.902877] ---[ end trace 0000000000000000 ]---

To avoid calling blk_mq_start_request() twice, this patch moves the
call to blk_mq_start_request() to the end of virtblk_prep_rq().
And instead of requeuing a failed request to the plug list in the error
path of virtblk_add_req_batch(), it uses blk_mq_requeue_request() to change
the failed request's state to MQ_RQ_IDLE. Then virtblk can safely handle
the request on the next attempt.
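
The state handling described above can be modelled in a few lines of
userspace C. This is only a sketch: the types and helpers
(request_sketch, start_request(), requeue_request()) are hypothetical
stand-ins, not the blk-mq API, and the warning is reduced to a counter.

```c
/* Simplified stand-ins for the two request states named above. */
enum rq_state { RQ_IDLE, RQ_IN_FLIGHT };

struct request_sketch {
	enum rq_state state;
	int warnings;	/* counts would-be WARN_ON_ONCE hits */
};

/* Models blk_mq_start_request(): starting a non-idle request warns. */
void start_request(struct request_sketch *rq)
{
	if (rq->state != RQ_IDLE)
		rq->warnings++;
	rq->state = RQ_IN_FLIGHT;
}

/* Models blk_mq_requeue_request(): puts the request back to idle. */
void requeue_request(struct request_sketch *rq)
{
	rq->state = RQ_IDLE;
}

/* Old flow: start in the batch path, fail, then start again on retry. */
int old_flow_warnings(void)
{
	struct request_sketch rq = { RQ_IDLE, 0 };

	start_request(&rq);	/* batch path; adding to the vq then fails */
	start_request(&rq);	/* single-request retry path */
	return rq.warnings;
}

/* New flow: the failed request is requeued (idle) before the retry. */
int new_flow_warnings(void)
{
	struct request_sketch rq = { RQ_IDLE, 0 };

	start_request(&rq);
	requeue_request(&rq);
	start_request(&rq);
	return rq.warnings;
}
```

In this model the old flow records one warning and the new flow none.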

Fixes: 0e9911fa768f ("virtio-blk: support mq_ops->queue_rqs()")
Reported-by: Alexandre Courbot <acourbot@chromium.org>
Tested-by: Alexandre Courbot <acourbot@chromium.org>
Signed-off-by: Suwan Kim <suwan.kim027@gmail.com>
Message-Id: <20220830150153.12627-1-suwan.kim027@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>


# 8d12ec10 10-Aug-2022 Shigeru Yoshida <syoshida@redhat.com>

virtio-blk: Avoid use-after-free on suspend/resume

hctx->user_data is set to vq in virtblk_init_hctx(). However, vq is
freed on suspend and reallocated on resume. So, hctx->user_data is
invalid after resume, and accessing it causes a use-after-free that
crashes the kernel, for example:

[ 22.428391] Call Trace:
[ 22.428899] <TASK>
[ 22.429339] virtqueue_add_split+0x3eb/0x620
[ 22.430035] ? __blk_mq_alloc_requests+0x17f/0x2d0
[ 22.430789] ? kvm_clock_get_cycles+0x14/0x30
[ 22.431496] virtqueue_add_sgs+0xad/0xd0
[ 22.432108] virtblk_add_req+0xe8/0x150
[ 22.432692] virtio_queue_rqs+0xeb/0x210
[ 22.433330] blk_mq_flush_plug_list+0x1b8/0x280
[ 22.434059] __blk_flush_plug+0xe1/0x140
[ 22.434853] blk_finish_plug+0x20/0x40
[ 22.435512] read_pages+0x20a/0x2e0
[ 22.436063] ? folio_add_lru+0x62/0xa0
[ 22.436652] page_cache_ra_unbounded+0x112/0x160
[ 22.437365] filemap_get_pages+0xe1/0x5b0
[ 22.437964] ? context_to_sid+0x70/0x100
[ 22.438580] ? sidtab_context_to_sid+0x32/0x400
[ 22.439979] filemap_read+0xcd/0x3d0
[ 22.440917] xfs_file_buffered_read+0x4a/0xc0
[ 22.441984] xfs_file_read_iter+0x65/0xd0
[ 22.442970] __kernel_read+0x160/0x2e0
[ 22.443921] bprm_execve+0x21b/0x640
[ 22.444809] do_execveat_common.isra.0+0x1a8/0x220
[ 22.446008] __x64_sys_execve+0x2d/0x40
[ 22.446920] do_syscall_64+0x37/0x90
[ 22.447773] entry_SYSCALL_64_after_hwframe+0x63/0xcd

This patch fixes this issue by getting vq from vblk, and removes
virtblk_init_hctx().
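
A rough userspace sketch of the idea, assuming hypothetical types
(vq_sketch, vblk_sketch, hctx_sketch) rather than the driver's real
structures: the queue is looked up through the device on every use, so
a reallocated array is always observed and no stale pointer is cached.

```c
#include <stdlib.h>

struct vq_sketch { int id; };

struct vblk_sketch {
	struct vq_sketch *vqs;	/* freed and reallocated across suspend/resume */
	unsigned int num_vqs;
};

struct hctx_sketch {
	struct vblk_sketch *vblk;	/* stable for the device's lifetime */
	unsigned int queue_num;		/* index, not a pointer into vqs */
};

/* Resolve the queue from the device each time instead of caching it. */
struct vq_sketch *get_vq(const struct hctx_sketch *hctx)
{
	return &hctx->vblk->vqs[hctx->queue_num];
}

/* Models suspend followed by resume: drop the old array, allocate anew. */
int reinit_vqs(struct vblk_sketch *vblk)
{
	free(vblk->vqs);
	vblk->vqs = calloc(vblk->num_vqs, sizeof(*vblk->vqs));
	return vblk->vqs ? 0 : -1;
}
```

A pointer saved before reinit_vqs() would dangle afterwards, while
get_vq() keeps returning an element of the current array.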

Fixes: 4e0400525691 ("virtio-blk: support polling I/O")
Cc: "Suwan Kim" <suwan.kim027@gmail.com>
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Message-Id: <20220810160948.959781-1-syoshida@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 8b9ab626 19-Jun-2022 Christoph Hellwig <hch@lst.de>

block: remove blk_cleanup_disk

blk_cleanup_disk is nothing but a trivial wrapper for put_disk now,
so remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20220619060552.1850436-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 6f8191fd 19-Jun-2022 Christoph Hellwig <hch@lst.de>

block: simplify disk shutdown

Set the queue dying flag and call blk_mq_exit_queue from del_gendisk for
all disks that do not have separately allocated queues, and thus remove
the need to call blk_cleanup_queue for them.

Rename blk_cleanup_queue to blk_mq_destroy_queue to make it clear that
this function is intended only for separately allocated blk-mq queues.

This saves an extra queue freeze for devices without a separately
allocated queue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20220619060552.1850436-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 0e9911fa 06-Apr-2022 Suwan Kim <suwan.kim027@gmail.com>

virtio-blk: support mq_ops->queue_rqs()

This patch supports mq_ops->queue_rqs() hook. It has an advantage of
batch submission to virtio-blk driver. It also helps polling I/O because
polling uses batched completion of block layer. Batch submission in
queue_rqs() can boost polling performance.

In queue_rqs(), it iterates plug->mq_list and collects requests that
belong to the same HW queue until it encounters a request from another
HW queue or sees the end of the list.
Then, virtio-blk adds the requests into the virtqueue and kicks the
virtqueue to submit them.

If there is an error, it inserts the failed request into requeue_list and
passes it to the ordinary block layer path.
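
The grouping step can be sketched with a singly linked list in plain C.
The names here (req_sketch, take_batch()) are made up for illustration
and this is not the driver's code: it only shows how a leading run of
requests for one hardware queue is split off the list.

```c
#include <stddef.h>

struct req_sketch {
	int hwq;			/* hardware queue this request maps to */
	struct req_sketch *next;	/* next request in the plug list */
};

/*
 * Detach the leading run of requests that share the first request's
 * hardware queue. Returns the head of that run, stores its length in
 * *count, and advances *list to the first request of another queue
 * (or NULL at the end of the list).
 */
struct req_sketch *take_batch(struct req_sketch **list, size_t *count)
{
	struct req_sketch *head = *list, *last = NULL, *cur = *list;

	*count = 0;
	while (cur && cur->hwq == head->hwq) {
		last = cur;
		cur = cur->next;
		(*count)++;
	}
	if (last)
		last->next = NULL;	/* terminate the detached batch */
	*list = cur;
	return head;
}
```

Calling take_batch() repeatedly until the list is empty yields one batch
per run of same-queue requests, each of which could then be added to its
virtqueue and kicked once.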

For verification, I did fio test.
(io_uring, randread, direct=1, bs=4K, iodepth=64 numjobs=N)
I set 4 vcpu and 2 virtio-blk queues for VM and run fio test 5 times.
It shows about 2% improvement.

                          | numjobs=2 | numjobs=4
-----------------------------------------------------------
fio without queue_rqs()   | 291K IOPS | 238K IOPS
-----------------------------------------------------------
fio with queue_rqs()      | 295K IOPS | 243K IOPS

For polling I/O performance, I also did fio test as below.
(io_uring, hipri, randread, direct=1, bs=512, iodepth=64 numjobs=4)
I set 4 vcpu and 2 poll queues for VM.
It shows about 2% improvement in polling I/O.

                              | IOPS | avg latency
-----------------------------------------------------------
fio poll without queue_rqs()  | 424K | 613.05 usec
-----------------------------------------------------------
fio poll with queue_rqs()     | 435K | 601.01 usec

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Suwan Kim <suwan.kim027@gmail.com>
Message-Id: <20220406153207.163134-3-suwan.kim027@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>


# 4e040052 06-Apr-2022 Suwan Kim <suwan.kim027@gmail.com>

virtio-blk: support polling I/O

This patch supports polling I/O via virtio-blk driver. Polling
feature is enabled by module parameter "poll_queues" and it sets
dedicated polling queues for virtio-blk. This patch improves the
polling I/O throughput and latency.

The virtio-blk driver does not have a poll function or a poll queue,
so it has been operating in interrupt-driven mode even if the polling
function is called in the upper layer.

virtio-blk polling is implemented upon 'batched completion' of block
layer. virtblk_poll() queues completed request to io_comp_batch->req_list
and later, virtblk_complete_batch() calls unmap function and ends
the requests in batch.

virtio-blk reads the number of poll queues from module parameter
"poll_queues". If VM sets queue parameter as below,
("num-queues=N" [QEMU property], "poll_queues=M" [module parameter])
It allocates N virtqueues to virtio_blk->vqs[N] and it uses [0..(N-M-1)]
as default queues and [(N-M)..(N-1)] as poll queues. Unlike the default
queues, the poll queues have no callback function.
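
The index split described above, [0..(N-M-1)] default and
[(N-M)..(N-1)] poll, reduces to a single comparison. A minimal sketch
with a hypothetical helper name, assuming M <= N and a valid index:

```c
#include <stdbool.h>

/*
 * Returns true if virtqueue index idx falls in the poll range, given
 * num_vqs (N) total virtqueues of which the last num_poll_vqs (M) are
 * poll queues. Assumes num_poll_vqs <= num_vqs and idx < num_vqs.
 */
bool is_poll_queue(unsigned int idx, unsigned int num_vqs,
		   unsigned int num_poll_vqs)
{
	return idx >= num_vqs - num_poll_vqs;
}
```

With N=4 and M=2, indexes 0 and 1 are default queues and indexes 2 and 3
are poll queues.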

Regarding HW-SW queue mapping, the default queue mapping uses the
existing method that considers the MSI irq vector. But the poll queue
doesn't have an irq, so it uses the regular blk-mq cpu mapping.

For verifying the improvement, I did Fio polling I/O performance test
with io_uring engine with the options below.
(io_uring, hipri, randread, direct=1, bs=512, iodepth=64 numjobs=N)
I set 4 vcpu and 4 virtio-blk queues - 2 default queues and 2 poll
queues for VM.

As a result, IOPS and average latency improved about 10%.

Test result:

- Fio io_uring poll without virtio-blk poll support
-- numjobs=1 : IOPS = 339K, avg latency = 188.33us
-- numjobs=2 : IOPS = 367K, avg latency = 347.33us
-- numjobs=4 : IOPS = 383K, avg latency = 682.06us

- Fio io_uring poll with virtio-blk poll support
-- numjobs=1 : IOPS = 385K, avg latency = 165.94us
-- numjobs=2 : IOPS = 408K, avg latency = 313.28us
-- numjobs=4 : IOPS = 424K, avg latency = 613.05us

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Suwan Kim <suwan.kim027@gmail.com>
Message-Id: <20220406153207.163134-2-suwan.kim027@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>


# 62952cc5 17-Apr-2022 Christoph Hellwig <hch@lst.de>

virtio_blk: fix the discard_granularity and discard_alignment queue limits

The discard_alignment queue limit is named a bit misleadingly: it means
the offset into the block device at which the discard granularity starts.

On the other hand the discard_sector_alignment from the virtio 1.1 looks
similar to what Linux uses as discard granularity (even if not very well
described):

"discard_sector_alignment can be used by OS when splitting a request
based on alignment. "

And at least qemu does set it to the discard granularity.

So stop setting the discard_alignment and use the virtio
discard_sector_alignment to set the discard granularity.
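
As a sketch of the resulting mapping, assuming that
discard_sector_alignment is expressed in 512-byte sectors and that the
Linux limit is in bytes (the struct and function names below are
hypothetical, and discard_alignment is simply modelled as staying 0):

```c
#include <stdint.h>

#define SECTOR_SHIFT_SKETCH 9	/* assumed 512-byte sectors */

struct discard_limits_sketch {
	uint32_t discard_granularity;	/* bytes */
	uint32_t discard_alignment;	/* bytes; offset where granules start */
};

/* Map the virtio field onto the granularity; leave the alignment at 0. */
void set_discard_limits(struct discard_limits_sketch *lim,
			uint32_t discard_sector_alignment)
{
	lim->discard_granularity =
		discard_sector_alignment << SECTOR_SHIFT_SKETCH;
	lim->discard_alignment = 0;	/* no longer derived from the device */
}
```

For example, a device reporting 8 sectors would give a 4096-byte discard
granularity under these assumptions.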

Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220418045314.360785-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 70200574 14-Apr-2022 Christoph Hellwig <hch@lst.de>

block: remove QUEUE_FLAG_DISCARD

Just use a non-zero max_discard_sectors as an indicator for discard
support, similar to what is done for write zeroes.

The only place that needs special attention is the RAID5 driver,
which must clear discard support for security reasons by default,
even if the default stacking rules would allow for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd]
Acked-by: Jan Höppner <hoeppner@linux.ibm.com> [s390]
Acked-by: Coly Li <colyli@suse.de> [bcache]
Acked-by: David Sterba <dsterba@suse.com> [btrfs]
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-25-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# bcfe9b6c 16-Mar-2022 Randy Dunlap <rdunlap@infradead.org>

virtio_blk: eliminate anonymous module_init & module_exit

Eliminate anonymous module_init() and module_exit(), which can lead to
confusion or ambiguity when reading System.map, crashes/oops/bugs,
or an initcall_debug log.

Give each of these init and exit functions unique driver-specific
names to eliminate the anonymous names.

Example 1: (System.map)
ffffffff832fc78c t init
ffffffff832fc79e t init
ffffffff832fc8f8 t init

Example 2: (initcall_debug log)
calling init+0x0/0x12 @ 1
initcall init+0x0/0x12 returned 0 after 15 usecs
calling init+0x0/0x60 @ 1
initcall init+0x0/0x60 returned 0 after 2 usecs
calling init+0x0/0x9a @ 1
initcall init+0x0/0x9a returned 0 after 74 usecs

Fixes: e467cde23818 ("Block driver using virtio.")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220316192010.19001-2-rdunlap@infradead.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 24b45e6c 15-Feb-2022 Christoph Hellwig <hch@lst.de>

virtio_blk: simplify refcounting

Implement the ->free_disk method to free the virtio_blk structure only
once the last gendisk reference goes away instead of keeping a local
refcount.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20220215094514.3828912-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# e030759a 04-Mar-2022 Xie Yongji <xieyongji@bytedance.com>

virtio-blk: Remove BUG_ON() in virtio_queue_rq()

Currently we have a BUG_ON() to make sure the number of sg
list does not exceed queue_max_segments() in virtio_queue_rq().
However, the block layer uses queue_max_discard_segments()
instead of queue_max_segments() to limit the sg list for
discard requests. So the BUG_ON() might be triggered if
virtio-blk device reports a larger value for max discard
segment than queue_max_segments(). To fix it, let's simply
remove the BUG_ON() which has become unnecessary after commit
02746e26c39e ("virtio-blk: avoid preallocating big SGL for data").
And the unused vblk->sg_elems can also be removed together.

Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support")
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Link: https://lore.kernel.org/r/20220304100058.116-2-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# dacc73ed 04-Mar-2022 Xie Yongji <xieyongji@bytedance.com>

virtio-blk: Don't use MAX_DISCARD_SEGMENTS if max_discard_seg is zero

Currently the value of max_discard_segment will be set to
MAX_DISCARD_SEGMENTS (256) with no basis in hardware if the device
sets max_discard_seg to 0 in configuration space. This is incorrect
since the device might not be able to handle such large descriptors.
To fix it, let's follow the max_segments restrictions in this case.
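
A minimal sketch of the selection rule, using a hypothetical helper.
The fallback to max_segments for a zero value comes from the text above;
capping non-zero values at 256 is an assumption made for this
illustration.

```c
#include <stdint.h>

#define MAX_DISCARD_SEGMENTS_SKETCH 256u

/*
 * Choose the discard segment limit. A device value of 0 means "not
 * specified", so fall back to the ordinary max_segments limit instead
 * of jumping straight to the 256 ceiling.
 */
uint32_t pick_max_discard_segments(uint32_t max_discard_seg,
				   uint32_t max_segments)
{
	uint32_t v = max_discard_seg;

	if (v == 0)
		v = max_segments;
	/* assumed: the ceiling still applies to whatever value was chosen */
	return v < MAX_DISCARD_SEGMENTS_SKETCH ? v : MAX_DISCARD_SEGMENTS_SKETCH;
}
```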

Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support")
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20220304100058.116-1-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# d9679d00 13-Oct-2021 Michael S. Tsirkin <mst@redhat.com>

virtio: wrap config->reset calls

This will enable cleanups down the road.
The idea is to disable cbs, then add "flush_queued_cbs" callback
as a parameter, this way drivers can flush any work
queued after callbacks have been disabled.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20211013105226.20225-1-mst@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# b84ba30b 26-Nov-2021 Christoph Hellwig <hch@lst.de>

block: remove the gendisk argument to blk_execute_rq

Remove the gendisk argument to blk_execute_rq and blk_execute_rq_nowait
given that it is unused now. Also convert the boolean at_head parameter
to actually use the bool type while touching the prototype.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211126121802.2090656-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 1ebe2e5f 22-Nov-2021 Christoph Hellwig <hch@lst.de>

block: remove GENHD_FL_EXT_DEVT

All modern drivers can support extra partitions using the extended
dev_t. In fact except for the ioctl method drivers never even see
partitions in normal operation.

So remove the GENHD_FL_EXT_DEVT and allow extra partitions for all
block devices that do support partitions, and require those that
do not support partitions to explicitly disallow them using
GENHD_FL_NO_PART.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 0466a39b 16-Nov-2021 Ye Guojin <ye.guojin@zte.com.cn>

virtio-blk: modify the value type of num in virtio_queue_rq()

This was found by coccicheck:
./drivers/block/virtio_blk.c, 334, 14-17, WARNING Unsigned expression
compared with zero num < 0

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Ye Guojin <ye.guojin@zte.com.cn>
Link: https://lore.kernel.org/r/20211117063955.160777-1-ye.guojin@zte.com.cn
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fixes: 02746e26c39e ("virtio-blk: avoid preallocating big SGL for data")
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


# 2b17d9f8 24-Nov-2021 Michael S. Tsirkin <mst@redhat.com>

Revert "virtio-blk: don't let virtio core to validate used length"

This reverts commit a40392edf1b2c7822bc0ce68413106661a9d4232.

Attempts to validate length in the core did not work out.
We'll drop them, so revert the dependent changes in drivers.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# a40392ed 26-Oct-2021 Jason Wang <jasowang@redhat.com>

virtio-blk: don't let virtio core to validate used length

We never try to use the used length, so this patch prevents the virtio
core from validating the used length.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211027022107.14357-4-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# f0839372 25-Oct-2021 Michael S. Tsirkin <mst@redhat.com>

virtio_blk: correct types for status handling

virtblk_setup_cmd returns blk_status_t in an int, callers then assign it
back to a blk_status_t variable. blk_status_t is either u32 or (more
typically) u8 so it works, but is inelegant and causes sparse warnings.

Pass the status in blk_status_t in a consistent way.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: b2c5221fd074 ("virtio-blk: avoid preallocating big SGL for data")
Cc: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>


# ead65f76 24-Oct-2021 Michael S. Tsirkin <mst@redhat.com>

virtio_blk: allow 0 as num_request_queues

The default value is 0 meaning "no limit". However if 0
is specified on the command line it is instead silently
converted to 1. Further, the value is already validated
at point of use, there's no point in duplicating code
validating the value when it is set.

Simplify the code while making the behaviour more consistent
by using plain module_param.
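
The "0 means no limit" semantics validated at the point of use can be
sketched as follows. pick_num_queues() is a hypothetical helper, and it
deliberately ignores any other inputs the real driver may take into
account, such as the number of CPUs.

```c
/*
 * Choose how many request queues to use. A parameter value of 0 means
 * "no limit", so the device-provided count is used unchanged; any
 * other value acts as an upper bound.
 */
unsigned int pick_num_queues(unsigned int num_request_queues,
			     unsigned int device_queues)
{
	if (num_request_queues == 0)
		return device_queues;
	return num_request_queues < device_queues ?
	       num_request_queues : device_queues;
}
```

Because the limit is applied where the value is consumed, no separate
validation is needed when the parameter is set.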

Fixes: 1a662cf6cb9a ("virtio-blk: add num_request_queues module parameter")
Cc: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# f1aa12f5 21-Oct-2021 Ye Guojin <ye.guojin@zte.com.cn>

virtio-blk: fixup coccinelle warnings

coccicheck complains about the use of snprintf() in sysfs show
functions:
WARNING use scnprintf or sprintf

Using sysfs_emit instead of scnprintf or sprintf makes more sense.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Ye Guojin <ye.guojin@zte.com.cn>
Link: https://lore.kernel.org/r/20211021065111.1047824-1-ye.guojin@zte.com.cn
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>


# 63b4ffa4 25-Oct-2021 Colin Ian King <colin.i.king@googlemail.com>

virtio_blk: Fix spelling mistake: "advertisted" -> "advertised"

There is a spelling mistake in a dev_err error message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20211025102240.22801-1-colin.i.king@gmail.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>


# 6ae6ff6f 19-Oct-2021 Jason Wang <jasowang@redhat.com>

virtio-blk: validate num_queues during probe

If an untrusted device negotiates BLK_F_MQ but advertises a zero
num_queues, the driver may end up trying to allocate zero-size
buffers, for which ZERO_SIZE_PTR is returned, and that may pass the
check against NULL. This will lead to unexpected results.

Fix this by failing the probe in this case.
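
A sketch of such a check in userspace C. The function name and
parameters are hypothetical; the point is only that a zero queue count
is rejected before it can reach an allocator.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Validate the queue count read from the device. Without the
 * multiqueue feature a single queue is used. With it, a zero count is
 * rejected up front, because a zero-size allocation can return a
 * non-NULL pointer and slip past later NULL checks.
 */
int validate_num_queues(int mq_negotiated, uint16_t num_queues,
			uint16_t *out)
{
	if (!mq_negotiated) {
		*out = 1;
		return 0;
	}
	if (num_queues == 0)
		return -EINVAL;	/* fail the probe */
	*out = num_queues;
	return 0;
}
```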

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211019070152.8236-2-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


# 0989c41b 02-Sep-2021 Max Gurtovoy <mgurtovoy@nvidia.com>

virtio-blk: add num_request_queues module parameter

Sometimes a user would like to control the number of request queues to
be created for a block device, for example to limit the memory
footprint of virtio-blk devices.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Link: https://lore.kernel.org/r/20210902204622.54354-1-mgurtovoy@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 02746e26 01-Sep-2021 Max Gurtovoy <mgurtovoy@nvidia.com>

virtio-blk: avoid preallocating big SGL for data

No need to pre-allocate a big buffer for the IO SGL anymore. If a device
has lots of deep queues, preallocation for the sg list can consume
substantial amounts of memory. For HW virtio-blk device, nr_hw_queues
can be 64 or 128 and each queue's depth might be 128. This means the
resulting preallocation for the data SGLs is big.

Switch to runtime allocation for SGL for lists longer than 2 entries.
This is the approach used by NVMe drivers so it should be reasonable for
virtio block as well. Runtime SGL allocation has always been the case
for the legacy I/O path so this is nothing new.

The preallocated small SGL depends on SG_CHAIN so if the ARCH doesn't
support SG_CHAIN, use only runtime allocation for the SGL.

Re-organize the setup of the IO request to fit the new sg chain
mechanism.

No performance degradation was seen (fio libaio engine with 16 jobs and
128 iodepth):

IO size  IOPs Rand Read (before/after)      IOPs Rand Write (before/after)
-------- ---------------------------------  ----------------------------------
512B     318K/316K                          329K/325K

4KB      323K/321K                          353K/349K

16KB     199K/208K                          250K/275K

128KB    36K/36.1K                          39.2K/41.7K

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Israel Rukshin <israelr@nvidia.com>
Link: https://lore.kernel.org/r/20210901131434.31158-1-mgurtovoy@nvidia.com
Reviewed-by: Feng Li <lifeng1519@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de> # kconfig fixups
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 0bf6d96c 25-Oct-2021 Christoph Hellwig <hch@lst.de>

block: remove blk_{get,put}_request

These are now pointless wrappers around blk_mq_{alloc,free}_request,
so remove them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20211025070517.1548584-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 57a13a5b 26-Oct-2021 Xie Yongji <xieyongji@bytedance.com>

virtio-blk: Use blk_validate_block_size() to validate block size

The block layer can't support a block size larger than
page size yet. And a block size that's too small or
not a power of two won't work either. If a misconfigured
device presents an invalid block size in configuration space,
the kernel will crash, for example:

[ 506.154324] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 506.160416] RIP: 0010:create_empty_buffers+0x24/0x100
[ 506.174302] Call Trace:
[ 506.174651] create_page_buffers+0x4d/0x60
[ 506.175207] block_read_full_page+0x50/0x380
[ 506.175798] ? __mod_lruvec_page_state+0x60/0xa0
[ 506.176412] ? __add_to_page_cache_locked+0x1b2/0x390
[ 506.177085] ? blkdev_direct_IO+0x4a0/0x4a0
[ 506.177644] ? scan_shadow_nodes+0x30/0x30
[ 506.178206] ? lru_cache_add+0x42/0x60
[ 506.178716] do_read_cache_page+0x695/0x740
[ 506.179278] ? read_part_sector+0xe0/0xe0
[ 506.179821] read_part_sector+0x36/0xe0
[ 506.180337] adfspart_check_ICS+0x32/0x320
[ 506.180890] ? snprintf+0x45/0x70
[ 506.181350] ? read_part_sector+0xe0/0xe0
[ 506.181906] bdev_disk_changed+0x229/0x5c0
[ 506.182483] blkdev_get_whole+0x6d/0x90
[ 506.183013] blkdev_get_by_dev+0x122/0x2d0
[ 506.183562] device_add_disk+0x39e/0x3c0
[ 506.184472] virtblk_probe+0x3f8/0x79b [virtio_blk]
[ 506.185461] virtio_dev_probe+0x15e/0x1d0 [virtio]

So let's use a block layer helper to validate the block size.
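
The three conditions named above (not too small, not larger than a
page, a power of two) can be written as one check. This is a sketch and
not the block layer helper itself: the 512-byte minimum and the
4096-byte page size are assumptions for illustration.

```c
#include <errno.h>
#include <stdint.h>

#define MIN_BLOCK_SIZE_SKETCH 512u	/* assumed lower bound */
#define PAGE_SIZE_SKETCH 4096u		/* assumed page size */

/* Returns 0 for an acceptable block size, -EINVAL otherwise. */
int validate_block_size(uint32_t bsize)
{
	if (bsize < MIN_BLOCK_SIZE_SKETCH || bsize > PAGE_SIZE_SKETCH)
		return -EINVAL;
	/* a power of two has exactly one bit set */
	if ((bsize & (bsize - 1)) != 0)
		return -EINVAL;
	return 0;
}
```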

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20211026144015.188-5-xieyongji@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# ff631988 04-Oct-2021 Michael S. Tsirkin <mst@redhat.com>

Revert "virtio-blk: Add validation for block size in config space"

It turns out that access to config space before completing the feature
negotiation is broken for big endian guests at least with QEMU hosts up
to 6.1 inclusive. This affects any device that accesses config space in
the validate callback: at the moment that is virtio-net with
VIRTIO_NET_F_MTU but since 82e89ea077b9 ("virtio-blk: Add validation for
block size in config space") that also started affecting virtio-blk with
VIRTIO_BLK_F_BLK_SIZE. Further, unlike VIRTIO_NET_F_MTU which is off by
default on QEMU, VIRTIO_BLK_F_BLK_SIZE is on by default, which resulted
in lots of people not being able to boot VMs on BE.

The spec is very clear that what we are doing is legal so QEMU needs to
be fixed, but given it's been broken for so many years and no one
noticed, we need to give QEMU a bit more time before applying this.

Further, this patch is incomplete (does not check blk size is a power
of two) and it duplicates the logic from nbd.

Revert for now, and we'll reapply a cleaner logic in the next release.

Cc: stable@vger.kernel.org
Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
Cc: Xie Yongji <xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 6105d1fe 05-Sep-2021 Max Gurtovoy <mgurtovoy@nvidia.com>

virtio-blk: remove unneeded "likely" statements

Usually we use "likely/unlikely" to optimize the fast path. Remove
redundant "likely/unlikely" statements in the control path to simplify
the code and make it easier to read.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Link: https://lore.kernel.org/r/20210905085717.7427-1-mgurtovoy@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


# dbb301f9 18-Aug-2021 Luis Chamberlain <mcgrof@kernel.org>

virtio_blk: add error handling support for add_disk()

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20210818144542.19305-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 358b348b 04-Aug-2021 Christoph Hellwig <hch@lst.de>

virtio_blk: use bvec_virt

Use bvec_virt instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20210804095634.460779-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 82e89ea0 09-Aug-2021 Xie Yongji <xieyongji@bytedance.com>

virtio-blk: Add validation for block size in config space

An untrusted device might present an invalid block size
in configuration space. This patch adds validation for it
in the validate callback and clears the VIRTIO_BLK_F_BLK_SIZE
feature bit if the value is out of the supported range.

And we also double check the value in virtblk_probe() in
case that it's changed after the validation.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20210809101609.148-1-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>


# 63947b34 24-May-2021 Stefan Hajnoczi <stefanha@redhat.com>

virtio-blk: limit seg_max to a safe value

The struct virtio_blk_config seg_max value is read from the device and
incremented by 2 to account for the request header and status byte
descriptors added by the driver.

In preparation for supporting untrusted virtio-blk devices, protect
against integer overflow and limit the value to a safe maximum.
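
The overflow concern is that adding 2 to a device-supplied 32-bit value
near its maximum would wrap around. A sketch of clamping before the
addition; the ceiling value and the names are hypothetical, not the
constants used by the driver.

```c
#include <stdint.h>

#define SG_LIMIT_SKETCH 2048u	/* assumed safe ceiling for this sketch */

/*
 * Compute the descriptor count for a request: the device-reported
 * seg_max plus two extra descriptors (request header and status byte).
 * Clamp first so that the addition cannot wrap.
 */
uint32_t total_descriptors(uint32_t seg_max)
{
	if (seg_max > SG_LIMIT_SKETCH - 2)
		seg_max = SG_LIMIT_SKETCH - 2;
	return seg_max + 2;
}
```

Without the clamp, a device reporting 0xffffffff would make the sum wrap
to 1.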

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20210524154020.98195-1-stefanha@redhat.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# b71ba22e 17-May-2021 Xie Yongji <xieyongji@bytedance.com>

virtio-blk: Fix memory leak among suspend/resume procedure

The vblk->vqs should be freed before we call init_vqs()
in virtblk_restore().

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20210517084332.280-1-xieyongji@bytedance.com
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 4f118472 29-Apr-2021 Sohaib <sohaib.amhmd@gmail.com>

virtio_blk: cleanups: remove check obsoleted by CONFIG_LBDAF removal

Prior to 72deb455b5ec ("block: remove CONFIG_LBDAF"), it was optional
whether a 32-bit kernel supported block devices and/or file sizes larger
than 2 TiB (considering the sector size is 512 bytes).
But now sector_t and blkcnt_t are always 64-bit in size.
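
The 2 TiB figure follows from a 32-bit sector count with 512-byte
sectors: 2^32 * 2^9 = 2^41 bytes. A small worked check, with a
hypothetical function name:

```c
#include <stdint.h>

/* Capacity addressable with a 32-bit sector count and 512-byte sectors. */
uint64_t max_bytes_32bit_sectors(void)
{
	uint64_t sectors = (uint64_t)1 << 32;	/* 2^32 distinct sector numbers */

	return sectors * 512;			/* 2^41 bytes = 2 TiB */
}
```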

Suggested-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: Sohaib Mohammed <sohaib.amhmd@gmail.com>
Link: https://lore.kernel.org/r/20210430103611.77345-1-sohaib.amhmd@gmail.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 89a5f065 02-Jun-2021 Christoph Hellwig <hch@lst.de>

virtio-blk: use blk_mq_alloc_disk

Use the blk_mq_alloc_disk API to simplify the gendisk and request_queue
allocation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# d1e9aa9c 22-Jan-2021 Joseph Qi <joseph.qi@linux.alibaba.com>

virtio-blk: support per-device queue depth

The module parameter 'virtblk_queue_depth' was first introduced for
testing/benchmarking purposes, as described in commit fc4324b4597c
("virtio-blk: base queue-depth on virtqueue ringsize or module param").
Currently 'virtblk_queue_depth' is also used as a saved value for the
first probed device.
Since different virtio-blk devices have different capabilities, we need
to support a per-device queue depth instead of a per-module one. So by
default use the number of free vq elements if the module parameter
'virtblk_queue_depth' is not set.
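
The selection rule described above can be sketched in plain userspace C.
This is only a model of the decision, not the driver code: the function
name pick_queue_depth() and both parameter names are hypothetical.

```c
/*
 * Model of the per-device queue depth choice.
 * module_param_depth: value of the module parameter, 0 meaning "not set"
 * vq_num_free:        free elements reported by this device's virtqueue
 * Both names are illustrative and do not come from the driver.
 */
unsigned int pick_queue_depth(unsigned int module_param_depth,
			      unsigned int vq_num_free)
{
	/* An explicit module parameter still overrides every device. */
	if (module_param_depth)
		return module_param_depth;

	/* Otherwise each device gets a depth based on its own virtqueue. */
	return vq_num_free;
}
```

With the parameter unset, two devices with differently sized virtqueues
end up with different depths instead of sharing the first probed value.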

Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/1611307306-71067-1-git-send-email-joseph.qi@linux.alibaba.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


# 684da762 24-Jan-2021 Guoqing Jiang <guoqing.jiang@cloud.ionos.com>

block: remove unnecessary argument from blk_execute_rq

We can remove 'q' from blk_execute_rq as well after the previous change
in blk_execute_rq_nowait.

And more importantly it never really was needed to start with given
that we can trivially derive it from struct request.

Cc: linux-scsi@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: linux-ide@vger.kernel.org
Cc: linux-mmc@vger.kernel.org
Cc: linux-nvme@lists.infradead.org
Cc: linux-nfs@vger.kernel.org
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# ddff331a 16-Nov-2020 Christoph Hellwig <hch@lst.de>

virtio-blk: remove a spurious call to revalidate_disk_size

revalidate_disk_size just updates the block device size from the disk
size. Thus calling it from virtblk_update_cache_mode doesn't actually
do anything.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 449f4ec9 16-Nov-2020 Christoph Hellwig <hch@lst.de>

block: remove the update_bdev parameter to set_capacity_revalidate_and_notify

The update_bdev argument is always set to true, so remove it. Also
rename the function to the slightly less verbose set_capacity_and_notify,
as propagating the disk size to the block device isn't really
revalidation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 659e56ba 01-Sep-2020 Christoph Hellwig <hch@lst.de>

block: add a new revalidate_disk_size helper

revalidate_disk is a relatively awkward helper for driver use, as it first
calls an optional driver method and then updates the block device size,
while most callers either don't need the method call at all, or want to
keep state between the caller and the called method.

Add a revalidate_disk_size helper that just performs the update of the
block device size from the gendisk one, and switch all drivers that do
not implement ->revalidate_disk to use the new helper instead of
revalidate_disk().

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 4ce79063 20-Aug-2020 Tian Tao <tiantao6@hisilicon.com>

virtio-blk: Use kobj_to_dev() instead of container_of()

Use kobj_to_dev() instead of container_of()

Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# af822aa6 17-Aug-2020 Ming Lei <ming.lei@redhat.com>

block: virtio_blk: fix handling single range discard request

1f23816b8eb8 ("virtio_blk: add discard and write zeroes support") starts
to support multi-range discard for virtio-blk. However, the virtio-blk
disk may report max discard segment as 1, at least that is exactly what
qemu is doing.

So far, the block layer switches to normal request merging if the max
discard segment limit is 1, and multiple bios can be merged into a single
segment. This may cause memory corruption in
virtblk_setup_discard_write_zeroes().

Fix the issue by handling the single max discard segment case in a
straightforward way.
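
A userspace sketch of the idea follows. It is a model under stated
assumptions, not the driver code: struct bio_model, struct range_model
and build_discard_ranges() are hypothetical names, and the bios of a
normally merged request are assumed to be contiguous.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a bio and a virtio discard range. */
struct bio_model { uint64_t sector; uint32_t num_sectors; };
struct range_model { uint64_t sector; uint32_t num_sectors; };

/*
 * Fill out[] (capacity cap) with the discard ranges of a request made of
 * nbios bios. Returns the number of ranges written, or 0 if the request
 * is empty or does not fit.
 */
size_t build_discard_ranges(const struct bio_model *bios, size_t nbios,
			    unsigned int max_discard_segments,
			    struct range_model *out, size_t cap)
{
	size_t i;

	if (nbios == 0 || cap == 0)
		return 0;

	if (max_discard_segments == 1) {
		/*
		 * Normal merging produced one contiguous request, so emit
		 * one range covering all bios instead of one per bio.
		 */
		uint32_t total = 0;

		for (i = 0; i < nbios; i++)
			total += bios[i].num_sectors;
		out[0].sector = bios[0].sector;
		out[0].num_sectors = total;
		return 1;
	}

	/* Multi-range case: one range per bio, never past the array end. */
	if (nbios > cap)
		return 0;
	for (i = 0; i < nbios; i++) {
		out[i].sector = bios[i].sector;
		out[i].num_sectors = bios[i].num_sectors;
	}
	return nbios;
}
```

Writing one range per bio into an array sized for a single segment is
the kind of overrun the fix avoids.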

Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Changpeng Liu <changpeng.liu@intel.com>
Cc: Daniel Verkamp <dverkamp@chromium.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# e7eea44e 14-Jun-2020 Hou Tao <houtao1@huawei.com>

virtio-blk: free vblk-vqs in error path of virtblk_probe()

Otherwise there will be a memory leak if alloc_disk() fails.

Fixes: 6a27b656fc02 ("block: virtio-blk: support multi virt queues per virtio-blk device")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 15f73f5b 11-Jun-2020 Christoph Hellwig <hch@lst.de>

blk-mq: move failure injection out of blk_mq_complete_request

Move the call to blk_should_fake_timeout out of blk_mq_complete_request
and into the drivers, skipping call sites that are obvious error
handlers, and remove the now superfluous blk_mq_force_complete_rq helper.
This ensures we don't keep injecting errors into completions that just
terminate the Linux request after the hardware has been reset or the
command has been aborted.

Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 90b5feb8 30-Apr-2020 Stefan Hajnoczi <stefanha@redhat.com>

virtio-blk: handle block_device_operations callbacks after hot unplug

A userspace process holding a file descriptor to a virtio_blk device can
still invoke block_device_operations after hot unplug. This leads to a
use-after-free accessing vblk->vdev in virtblk_getgeo() when
ioctl(HDIO_GETGEO) is invoked:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
IP: [<ffffffffc00e5450>] virtio_check_driver_offered_feature+0x10/0x90 [virtio]
PGD 800000003a92f067 PUD 3a930067 PMD 0
Oops: 0000 [#1] SMP
CPU: 0 PID: 1310 Comm: hdio-getgeo Tainted: G OE ------------ 3.10.0-1062.el7.x86_64 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
task: ffff9be5fbfb8000 ti: ffff9be5fa890000 task.ti: ffff9be5fa890000
RIP: 0010:[<ffffffffc00e5450>] [<ffffffffc00e5450>] virtio_check_driver_offered_feature+0x10/0x90 [virtio]
RSP: 0018:ffff9be5fa893dc8 EFLAGS: 00010246
RAX: ffff9be5fc3f3400 RBX: ffff9be5fa893e30 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff9be5fbc10b40
RBP: ffff9be5fa893dc8 R08: 0000000000000301 R09: 0000000000000301
R10: 0000000000000000 R11: 0000000000000000 R12: ffff9be5fdc24680
R13: ffff9be5fbc10b40 R14: ffff9be5fbc10480 R15: 0000000000000000
FS: 00007f1bfb968740(0000) GS:ffff9be5ffc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000090 CR3: 000000003a894000 CR4: 0000000000360ff0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
[<ffffffffc016ac37>] virtblk_getgeo+0x47/0x110 [virtio_blk]
[<ffffffff8d3f200d>] ? handle_mm_fault+0x39d/0x9b0
[<ffffffff8d561265>] blkdev_ioctl+0x1f5/0xa20
[<ffffffff8d488771>] block_ioctl+0x41/0x50
[<ffffffff8d45d9e0>] do_vfs_ioctl+0x3a0/0x5a0
[<ffffffff8d45dc81>] SyS_ioctl+0xa1/0xc0

A related problem is that virtblk_remove() leaks the vd_index_ida index
when something still holds a reference to vblk->disk during hot unplug.
This causes virtio-blk device names to be lost (vda, vdb, etc).

Fix these issues by protecting vblk->vdev with a mutex and reference
counting vblk so the vd_index_ida index can be removed in all cases.

Fixes: 48e4043d4529 ("virtio: add virtio disk geometry feature")
Reported-by: Lance Digby <ldigby@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20200430140442.171016-1-stefanha@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>


# 55a2415b 17-Apr-2020 Michael S. Tsirkin <mst@redhat.com>

virtio_blk: add a missing include

virtio_blk uses VIRTIO_RING_F_INDIRECT_DESC, pull in
the header defining that value.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 662155e2 12-Mar-2020 Balbir Singh <sblbir@amazon.com>

virtio_blk.c: Convert to use set_capacity_revalidate_and_notify

block/genhd provides set_capacity_revalidate_and_notify() for sending RESIZE
notifications via uevents.

Signed-off-by: Balbir Singh <sblbir@amazon.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 3d973b2e 13-Feb-2020 Halil Pasic <pasic@linux.ibm.com>

virtio-blk: improve virtqueue error to BLK_STS

Let's change the mapping from virtqueue_add errors to BLK_STS
statuses, so that -ENOSPC, which indicates a full virtqueue, is still
mapped to BLK_STS_DEV_RESOURCE, but -ENOMEM, which indicates a non-device
specific resource outage, is mapped to BLK_STS_RESOURCE.
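
The mapping can be sketched in userspace C as below. The enum and the
function map_add_error() are hypothetical stand-ins for the kernel's
blk_status_t values; treating every other error as an I/O error is an
assumption of this sketch.

```c
#include <errno.h>

/* Hypothetical stand-ins for the kernel's block status codes. */
enum blk_sts_model {
	STS_OK,
	STS_IOERR,
	STS_RESOURCE,		/* models BLK_STS_RESOURCE */
	STS_DEV_RESOURCE,	/* models BLK_STS_DEV_RESOURCE */
};

/* Map a virtqueue_add()-style return code to a block status. */
enum blk_sts_model map_add_error(int err)
{
	switch (err) {
	case 0:
		return STS_OK;
	case -ENOSPC:
		/* Virtqueue full: a completion on this device frees space. */
		return STS_DEV_RESOURCE;
	case -ENOMEM:
		/* Global outage: nothing device-specific will wake us up. */
		return STS_RESOURCE;
	default:
		return STS_IOERR;
	}
}
```

The distinction matters because only the device-resource status relies
on in-flight I/O of the same device to rerun the queue.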

Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Link: https://lore.kernel.org/r/20200213123728.61216-3-pasic@linux.ibm.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


# f5f6b95c 13-Feb-2020 Halil Pasic <pasic@linux.ibm.com>

virtio-blk: fix hw_queue stopped on arbitrary error

Since nobody else is going to restart our hw_queue for us, the
blk_mq_start_stopped_hw_queues() call in virtblk_done() is not
necessarily sufficient to ensure that the queue will get started again.
In case of a global resource outage (-ENOMEM because of a mapping failure,
because swiotlb is full) our virtqueue may be empty and we can get
stuck with a stopped hw_queue.

Let us not stop the queue on arbitrary errors, but only on -ENOSPC, which
indicates a full virtqueue, where the hw_queue is guaranteed to get
started by virtblk_done() when it makes sense to carry on
submitting requests. Let us also remove a stale comment.

Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Cc: Jens Axboe <axboe@kernel.dk>
Fixes: f7728002c1c7 ("virtio_ring: fix return code on DMA mapping fails")
Link: https://lore.kernel.org/r/20200213123728.61216-2-pasic@linux.ibm.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


# 782e067d 12-Dec-2019 Christoph Hellwig <hch@lst.de>

virtio-blk: remove VIRTIO_BLK_F_SCSI support

Since the need for a special flag to support SCSI passthrough on a
block device was added in May 2017 the SCSI passthrough support in
virtio-blk has been disabled. It has always been a bad idea
(just ask the original author..) and we have virtio-scsi for proper
passthrough. The feature also never made it into the virtio 1.0
or later specifications.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


# d320a955 15-Mar-2019 Arnd Bergmann <arnd@arndb.de>

compat_ioctl: scsi: move ioctl handling into drivers

Each driver calling scsi_ioctl() gets an equivalent compat_ioctl()
handler that implements the same commands by calling scsi_compat_ioctl().

The scsi_cmd_ioctl() and scsi_cmd_blk_ioctl() functions are compatible
at this point, so any driver that calls those can do so for both native
and compat mode, with the argument passed through compat_ptr().

With this, we can remove the entries from fs/compat_ioctl.c. The new
code is larger, but should be easier to maintain and keep updated with
newly added commands.

Reviewed-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>


# 09c434b8 19-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Add SPDX license identifier for more missed files

Add SPDX license identifiers to all files which:

- Have no license information of any form

- Have MODULE_LICENSE("GPL*") inside which was used in the initial
scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

GPL-2.0-only

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# bf348f9b 27-Mar-2019 Dongli Zhang <dongli.zhang@oracle.com>

virtio-blk: limit number of hw queues by nr_cpu_ids

When tag_set->nr_maps is 1, the block layer limits the number of hw queues
by nr_cpu_ids. No matter how many hw queues are used by virtio-blk, as it
has (tag_set->nr_maps == 1), it can use at most nr_cpu_ids hw queues.

In addition, specifically for pci scenario, when the 'num-queues' specified
by qemu is more than maxcpus, virtio-blk would not be able to allocate more
than maxcpus vectors in order to have a vector for each queue. As a result,
it falls back into MSI-X with one vector for config and one shared for
queues.

Considering above reasons, this patch limits the number of hw queues used
by virtio-blk by nr_cpu_ids.
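
As a minimal sketch, the limit amounts to a clamp. The function name
limit_hw_queues() is hypothetical; the parameters model the number of
queues offered by the device and the kernel's nr_cpu_ids.

```c
/*
 * Clamp the number of hw queues to the number of possible CPUs.
 * num_vqs:    queues offered by the device (e.g. qemu 'num-queues')
 * nr_cpu_ids: number of possible CPU ids on the guest
 */
unsigned int limit_hw_queues(unsigned int num_vqs, unsigned int nr_cpu_ids)
{
	return num_vqs < nr_cpu_ids ? num_vqs : nr_cpu_ids;
}
```

For example, a device offering 16 queues to a 4-CPU guest would end up
using 4 queues, so each queue can still get its own vector.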

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 9bc00750 11-Mar-2019 Dongli Zhang <dongli.zhang@oracle.com>

virtio_blk: replace 0 by HCTX_TYPE_DEFAULT to index blk_mq_tag_set->map

Use HCTX_TYPE_DEFAULT instead of 0 to avoid hardcoding.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# fd1068e1 06-Feb-2019 Joerg Roedel <jroedel@suse.de>

virtio-blk: Consider virtio_max_dma_size() for maximum segment size

Segments can't be larger than the maximum DMA mapping size
supported on the platform. Take that into account when
setting the maximum segment size for a block device.
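
A userspace sketch of the resulting limit is shown below. The function
pick_max_segment_size() and its parameters are hypothetical; in
particular, how the device-advertised size limit is read is not modeled
here and is passed in as plain arguments.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * max_dma_size: largest DMA mapping the platform supports
 * has_size_max: whether the device advertises its own segment size limit
 * size_max:     that device limit, only meaningful if has_size_max is set
 */
uint32_t pick_max_segment_size(size_t max_dma_size, bool has_size_max,
			       uint32_t size_max)
{
	/* Start from the platform limit, saturated to 32 bits. */
	uint32_t limit = max_dma_size > UINT32_MAX ?
			 UINT32_MAX : (uint32_t)max_dma_size;

	/* A device limit can only shrink the result further. */
	if (has_size_max && size_max < limit)
		limit = size_max;

	return limit;
}
```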

Cc: stable@vger.kernel.org
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 1f23816b 01-Nov-2018 Changpeng Liu <changpeng.liu@intel.com>

virtio_blk: add discard and write zeroes support

In commit 88c85538, "virtio-blk: add discard and write zeroes features
to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio
block specification has been extended to add VIRTIO_BLK_T_DISCARD and
VIRTIO_BLK_T_WRITE_ZEROES commands. This patch enables support for
discard and write zeroes in the virtio-blk driver when the device
advertises the corresponding features, VIRTIO_BLK_F_DISCARD and
VIRTIO_BLK_F_WRITE_ZEROES.

Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


# 944e7c87 26-Nov-2018 Jens Axboe <axboe@kernel.dk>

virtio_blk: implement mq_ops->commit_rqs() hook

We need this for blk-mq to kick things into gear, if we told it that
we had more IO coming, but then failed to deliver on that promise.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# ed76e329 29-Oct-2018 Jens Axboe <axboe@kernel.dk>

blk-mq: abstract out queue map

This is in preparation for allowing multiple sets of maps per
queue, if so desired.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# e982c4d0 28-Sep-2018 Hannes Reinecke <hare@suse.de>

virtio-blk: modernize sysfs attribute creation

Use new-style DEVICE_ATTR_RO/DEVICE_ATTR_RW to create the sysfs attributes
and register the disk with default sysfs attribute groups.

Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# fef912bf 28-Sep-2018 Hannes Reinecke <hare@suse.de>

block: genhd: add 'groups' argument to device_add_disk

Update device_add_disk() to take a 'groups' argument so that
individual drivers can register a device with additional sysfs
attributes.
This avoids the race condition the driver would otherwise have if these
groups were to be created with sysfs_add_groups().

Signed-off-by: Martin Wilck <martin.wilck@suse.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 5657a819 24-May-2018 Joe Perches <joe@perches.com>

block drivers/block: Use octal not symbolic permissions

Convert the S_<FOO> symbolic permissions to their octal equivalents as
using octal and not symbolic permissions is preferred by many as more
readable.

see: https://lkml.org/lkml/2016/8/2/1945

Done with automated conversion via:
$ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace <files...>

Miscellanea:

o Wrapped modified multi-line calls to a single line where appropriate
o Realign modified multi-line calls to open parenthesis

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# ff005a06 09-May-2018 Christoph Hellwig <hch@lst.de>

block: sanitize blk_get_request calling conventions

Switch everyone to blk_get_request_flags, and then rename
blk_get_request_flags to blk_get_request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# daf2a501 03-Jan-2018 Stefan Hajnoczi <stefanha@redhat.com>

virtio_blk: print capacity at probe time

Print the capacity of the block device when the driver is probed. Many
users expect this since SCSI disks (sd) do it. Moreover, kernel dmesg
output is the primary source of troubleshooting information so it's
helpful to include the disk size there.

The capacity is already printed by virtio_blk when a resize event
occurs. Extract the code and reuse it from virtblk_probe().

This patch also adds the block device name to the message so it can be
correlated with a specific device:

virtio_blk virtio0: [vda] 20971520 512-byte logical blocks (10.7 GB/10.0 GiB)

Cc: Rodrigo A B Freire <rfreire@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 86ff7c2a 30-Jan-2018 Ming Lei <ming.lei@redhat.com>

blk-mq: introduce BLK_STS_DEV_RESOURCE

This status is returned from driver to block layer if device related
resource is unavailable, but driver can guarantee that IO dispatch
will be triggered in future when the resource is available.

Convert some drivers to return BLK_STS_DEV_RESOURCE. Also, if driver
returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun queue after
a delay (BLK_MQ_DELAY_QUEUE) to avoid IO stalls. BLK_MQ_DELAY_QUEUE is
3 ms because both scsi-mq and nvmefc are using that magic value.

If a driver can make sure there is in-flight IO, it is safe to return
BLK_STS_DEV_RESOURCE because:

1) If all in-flight IOs complete before examining SCHED_RESTART in
blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue
is run immediately in this case by blk_mq_dispatch_rq_list();

2) if there is any in-flight IO after/when examining SCHED_RESTART
in blk_mq_dispatch_rq_list():
- if SCHED_RESTART isn't set, queue is run immediately as handled in 1)
- otherwise, this request will be dispatched after any in-flight IO is
completed via blk_mq_sched_restart()

3) if SCHED_RESTART is set concurrently in context because of
BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two
cases and make sure IO hang can be avoided.

One invariant is that queue will be rerun if SCHED_RESTART is set.

Suggested-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# efea2abc 27-Oct-2017 Bart Van Assche <bvanassche@acm.org>

virtio_blk: Fix an SG_IO regression

Avoid that submitting an SG_IO ioctl triggers a kernel oops that
is preceded by:

usercopy: kernel memory overwrite attempt detected to (null) (<null>) (6 bytes)
kernel BUG at mm/usercopy.c:72!

Reported-by: Dann Frazier <dann.frazier@canonical.com>
Fixes: commit ca18d6f769d2 ("block: Make most scsi_req_init() calls implicit")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Dann Frazier <dann.frazier@canonical.com>
Cc: <stable@vger.kernel.org> # v4.13
Reviewed-by: Christoph Hellwig <hch@lst.de>

Moved virtblk_initialize_rq() inside CONFIG_VIRTIO_BLK_SCSI.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 1046d304 26-Jul-2017 Stefan Hajnoczi <stefanha@redhat.com>

virtio_blk: fix incorrect message when disk is resized

The message printed on disk resize is incorrect. The following is
printed when resizing to 2 GiB:

$ truncate -s 1G test.img
$ qemu -device virtio-blk-pci,logical_block_size=4096,...
(qemu) block_resize drive1 2G

virtio_blk virtio0: new size: 4194304 4096-byte logical blocks (17.2 GB/16.0 GiB)

The virtio_blk capacity config field is in 512-byte sector units
regardless of logical_block_size as per the VIRTIO specification.
Therefore the message should read:

virtio_blk virtio0: new size: 524288 4096-byte logical blocks (2.15 GB/2.0 GiB)

Note that this only affects the printed message. Thankfully the actual
block device has the correct size because the block layer expects
capacity in sectors.
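
The arithmetic behind the corrected message can be checked with a small
userspace sketch. The helper name sectors_to_logical_blocks() and the
SECTOR_SHIFT_MODEL macro are hypothetical; block_size is assumed to be
a power of two of at least 512.

```c
#include <stdint.h>

/* The virtio capacity field always counts 512-byte sectors. */
#define SECTOR_SHIFT_MODEL 9

/*
 * Convert a capacity given in 512-byte sectors into a count of logical
 * blocks of block_size bytes.
 */
uint64_t sectors_to_logical_blocks(uint64_t capacity_sectors,
				   uint32_t block_size)
{
	uint64_t bytes = capacity_sectors << SECTOR_SHIFT_MODEL;

	return bytes / block_size;
}
```

For the 2 GiB example, 4194304 sectors are 2147483648 bytes, which is
524288 blocks of 4096 bytes, matching the corrected message.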

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 6e9fe8dd 17-Aug-2017 Bart Van Assche <bvanassche@acm.org>

virtio_blk: Use blk_rq_is_scsi()

This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# f53d5aa0 09-Jun-2017 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

virtio_blk: Use sysfs_match_string() helper

Use sysfs_match_string() helper instead of open coded variant.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>


# 9b3e9905 04-Jul-2017 Sagi Grimberg <sagi@grimberg.me>

virtio_blk: quiesce/unquiesce live IO when entering PM states

Without it, it's not guaranteed that no .queue_rq is inflight.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: virtio-dev@lists.oasis-open.org
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# 46685d1a 19-Jun-2017 Christoph Hellwig <hch@lst.de>

blk-mq: don't bounce by default

For historical reasons we default to bouncing highmem pages for all block
queues. But the blk-mq drivers are easy to audit to ensure that we don't
need this - scsi and mtip32xx set explicit limits and everyone else doesn't
have any particular ones.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# fc17b653 03-Jun-2017 Christoph Hellwig <hch@lst.de>

blk-mq: switch ->queue_rq return value to blk_status_t

Use the same values that are used for request completion errors as the return
value from ->queue_rq. BLK_STS_RESOURCE is special cased to cause
a requeue, and all the others are completed as-is.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 2a842aca 03-Jun-2017 Christoph Hellwig <hch@lst.de>

block: introduce new block status code type

Currently we use normal Linux errno values in the block layer, and while
we accept any error a few have overloaded magic meanings. This patch
instead introduces a new blk_status_t value that holds block layer specific
status codes and explicitly explains their meaning. Helpers to convert from
and to the previous special meanings are provided for now, but I suspect
we want to get rid of them in the long run - those drivers that have a
errno input (e.g. networking) usually get errnos that don't know about
the special block layer overloads, and similarly returning them to userspace
will usually return something that strictly speaking isn't correct
for file system operations, but that's left as an exercise for later.

For now the set of errors is a very limited set that closely corresponds
to the previous overloaded errno values, but there is some low hanging
fruit to improve it.

blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
typechecking, so that we can easily catch places passing the wrong values.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 9b2bbdb2 06-Mar-2017 Michael S. Tsirkin <mst@redhat.com>

virtio: wrap find_vqs

We are going to add more parameters to find_vqs, let's wrap the call so
we don't need to tweak all drivers every time.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# d6296d39 01-May-2017 Christoph Hellwig <hch@lst.de>

blk-mq: update ->init_request and ->exit_request prototypes

Remove the request_idx parameter, which can't be used safely now that we
support I/O schedulers with blk-mq. Except for a superfluous check in
mtip32xx it was unused anyway.

Also pass the tag_set instead of just the driver data - this allows drivers
to avoid some code duplication in a follow on cleanup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 08e0029a 20-Apr-2017 Christoph Hellwig <hch@lst.de>

blk-mq: remove the error argument to blk_mq_complete_request

Now that all drivers that call blk_mq_complete_request have a
->complete callback we can remove the direct call to blk_mq_end_request,
as well as the error argument to blk_mq_complete_request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 17d5363b 20-Apr-2017 Christoph Hellwig <hch@lst.de>

scsi: introduce a result field in struct scsi_request

This passes on the scsi_cmnd result field to users of passthrough
requests. Currently we abuse req->errors for this purpose, but that
field will go away in its current form.

Note that the old IDE code abuses the errors field in very creative
ways and stores all kinds of different values in it. I didn't dare
to touch this magic, so the abuses are brought forward 1:1.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# d19633d5 20-Apr-2017 Christoph Hellwig <hch@lst.de>

virtio_blk: don't use req->errors

Remove passing req->errors (which at that point is always 0) to
blk_mq_complete_request, and rely on the virtio status code for the
serial number passthrough request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# a1a6e62b 20-Apr-2017 Christoph Hellwig <hch@lst.de>

virtio: fix spelling of virtblk_scsi_request_done

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# b7819b92 20-Apr-2017 Christoph Hellwig <hch@lst.de>

block: remove the blk_execute_rq return value

The function only returns -EIO if rq->errors is non-zero, which is not
very useful and lets a large number of callers ignore the return value.

Just let the callers figure out their error themselves.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# f363b089 30-Mar-2017 Eric Biggers <ebiggers@google.com>

blk-mq: constify struct blk_mq_ops

Constify all instances of blk_mq_ops, as they are never modified.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# ad71473d 05-Feb-2017 Christoph Hellwig <hch@lst.de>

virtio_blk: use virtio IRQ affinity

Use automatic IRQ affinity assignment in the virtio layer if available,
and build the blk-mq queues based on it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# fb5e31d9 05-Feb-2017 Christoph Hellwig <hch@lst.de>

virtio: allow drivers to request IRQ affinity when creating VQs

Add a struct irq_affinity pointer to the find_vqs methods, which if set
is used to tell the PCI layer to create the MSI-X vectors for our I/O
virtqueues with the proper affinity from the start. Compared to after
the fact affinity hints this gives us an instantly working setup and
allows us to allocate the irq descriptors node-local and avoid interconnect
traffic. Last but not least this will allow blk-mq queues to be created
based on the interrupt affinity for storage drivers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# aebf526b 31-Jan-2017 Christoph Hellwig <hch@lst.de>

block: fold cmd_type into the REQ_OP_ space

Instead of keeping two levels of indirection for request types, fold it
all into the operations. The little caveat here is that previously
cmd_type only applied to struct request, while the request and bio op
fields were set to plain REQ_OP_READ/WRITE even for passthrough
operations.

Instead this patch adds new REQ_OP_* for SCSI passthrough and driver
private requests, although it has to add two for each so that we
can communicate the data in/out nature of the request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 97b50a65 28-Jan-2017 Christoph Hellwig <hch@lst.de>

virtio_blk: make SCSI passthrough support configurable

The SCSI passthrough idea was a bad idea to start with (guess who came
up with it?), and has been removed from the virtio 1.0 spec, and is not
enabled by default by any host I know of. Add a separate config option
for it so that we don't need to enable it for most setups. That way
any bugs related to it (like the one recently fixed for vmapped stacks)
do not affect other users, and the size of the virtblk_req structure
also shrinks significantly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 85dada09 28-Jan-2017 Christoph Hellwig <hch@lst.de>

virtio_blk: remove struct request backpointer from virtblk_req

We can simply use blk_mq_rq_from_pdu to get back at the request at
I/O completion time.
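
The trick works because the driver's per-request data is laid out
directly behind the request itself. A userspace model of that layout
follows; struct request_model, rq_to_pdu() and rq_from_pdu() are
hypothetical names that only mirror the idea.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct request. */
struct request_model {
	int tag;
};

/* The driver's private data starts right after the request. */
void *rq_to_pdu(struct request_model *rq)
{
	return rq + 1;
}

/* Step back over the request to recover it from the private data. */
struct request_model *rq_from_pdu(void *pdu)
{
	return (struct request_model *)((char *)pdu -
					sizeof(struct request_model));
}
```

Since the request can always be recomputed from the private data
pointer, storing a backpointer in every virtblk_req is redundant.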

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 82ed4db4 27-Jan-2017 Christoph Hellwig <hch@lst.de>

block: split scsi_request out of struct request

And require all drivers that want to support BLOCK_PC to allocate it
as the first thing of their private data. To support this the legacy
IDE and BSG code is switched to set cmd_size on their queues to let
the block layer allocate the additional space.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 2c935bc5 14-Nov-2016 Peter Zijlstra <peterz@infradead.org>

locking/atomic, kref: Add kref_read()

Since we need to change the implementation, stop exposing internals.

Provide kref_read() to read the current reference count; typically
used for debug messages.

Kills two anti-patterns:

atomic_read(&kref->refcount)
kref->refcount.counter

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 6bf6b0aa 09-Jan-2017 Omar Sandoval <osandov@fb.com>

virtio_blk: fix panic in initialization error path

If blk_mq_init_queue() returns an error, it gets assigned to
vblk->disk->queue. Then, when we call put_disk(), we end up calling
blk_put_queue() with the ERR_PTR, causing a bad dereference. Fix it by
only assigning to vblk->disk->queue on success.
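
A userspace sketch of the pattern, assuming an error-pointer encoding
similar in spirit to the kernel's ERR_PTR()/IS_ERR(). The helpers
err_ptr(), is_err(), ptr_err(), init_queue() and struct fake_disk are
hypothetical and only illustrate why the assignment must wait until the
result has been checked:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer (illustrative only). */
static void *err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True if the pointer falls in the range reserved for error codes. */
static int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

static long ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Hypothetical stand-in for the gendisk. */
struct fake_disk {
	void *queue;
};

/* Hypothetical queue constructor that can be told to fail. */
static void *init_queue(int fail)
{
	static int queue_storage;

	return fail ? err_ptr(-ENOMEM) : (void *)&queue_storage;
}

static int setup_queue(struct fake_disk *disk, int fail)
{
	void *q = init_queue(fail);

	if (is_err(q))
		return (int)ptr_err(q);	/* disk->queue is left untouched */

	disk->queue = q;		/* assign only on success */
	return 0;
}
```

With this ordering, a later cleanup path that inspects disk->queue sees
either NULL or a valid queue, never an encoded error.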

Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# a14d749f 09-Jan-2017 Christoph Hellwig <hch@lst.de>

virtio_blk: avoid DMA to stack for the sense buffer

Most users of BLOCK_PC requests allocate the sense buffer on the stack,
so to avoid DMA to the stack copy them to a field in the heap allocated
virtblk_req structure. Without that any attempt at SCSI passthrough I/O,
including the SG_IO ioctl from userspace will crash the kernel. Note that
this includes running tools like hdparm even when the host does not have
SCSI passthrough enabled.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Jens Axboe <axboe@fb.com>


# 2ff98449 13-Sep-2016 Markus Elfring <elfring@users.sourceforge.net>

virtio_blk: Delete an unnecessary initialisation in init_vq()

The local variable "err" will be set to an appropriate value
by a following statement.
Thus omit the explicit initialisation at the beginning.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 668866b6 13-Sep-2016 Markus Elfring <elfring@users.sourceforge.net>

virtio_blk: Use kmalloc_array() in init_vq()

Multiplications for the size determination of memory allocations
indicated that array data structures should be processed.
Thus use the corresponding function "kmalloc_array".
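
For illustration, the benefit of an array allocator over an open-coded
multiplication is the overflow check on n * size. The sketch below is a
userspace analogue under that assumption; alloc_array() is a
hypothetical name and is not the kernel implementation:

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate room for n elements of the given size, refusing requests
 * whose total byte count would overflow size_t.
 */
static void *alloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* n * size would wrap around */

	return malloc(n * size);
}
```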

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 7d7e0f90 14-Sep-2016 Christoph Hellwig <hch@lst.de>

blk-mq: remove ->map_queue

All drivers use the default, so provide an inline version of it. If we
ever need other queue mapping we can add an optional method back,
although supporting it will also require major changes to the queue setup
code.

This provides better code generation, and better debuggability as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 347a5293 09-Aug-2016 Minfei Huang <mnghuan@gmail.com>

virtio_blk: Fix a silent kernel panic

We do a lot of memory allocation in init_vq() and don't handle
allocation failure properly: the function returns 0 even though
initialization failed for lack of memory. The guest kernel then
panics if virtio is used to drive the disk.

To fix this bug, handle allocation failure and return the correct
error value so the caller knows what happened.

Tested-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Minfei Huang <mnghuan@gmail.com>
Signed-off-by: Minfei Huang <minfei.hmf@alibaba-inc.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 0c4de0f3 19-Jul-2016 Christoph Hellwig <hch@lst.de>

block: ensure bios return from blk_get_request are properly initialized

blk_get_request is used for BLOCK_PC and similar passthrough requests.
Currently we always need to call blk_rq_set_block_pc or an open coded
version of it to allow appending bios using the request mapping helpers
later on, which is a somewhat awkward API. Instead move the
initialization part of blk_rq_set_block_pc into blk_get_request, so that
we always have a safe to use request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# f9596695 19-Jul-2016 Christoph Hellwig <hch@lst.de>

virtio_blk: use blk_rq_map_kern

Similar to how SCSI and NVMe prepare passthrough requests. This avoids
poking into request internals too much.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 0d52c756 15-Jun-2016 Dan Williams <dan.j.williams@intel.com>

block: convert to device_add_disk()

For block drivers that specify a parent device, convert them to use
device_add_disk().

This conversion was done with the following semantic patch:

@@
struct gendisk *disk;
expression E;
@@

- disk->driverfs_dev = E;
...
- add_disk(disk);
+ device_add_disk(E, disk);

@@
struct gendisk *disk;
expression E1, E2;
@@

- disk->driverfs_dev = E1;
...
E2 = disk;
...
- add_disk(E2);
+ device_add_disk(E1, E2);

...plus some manual fixups for a few missed conversions.

Cc: Jens Axboe <axboe@fb.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 3a5e02ce 05-Jun-2016 Mike Christie <mchristi@redhat.com>

block, drivers: add REQ_OP_FLUSH operation

This adds a REQ_OP_FLUSH operation that is sent to request_fn
based drivers by the block layer's flush code, instead of
sending requests with the request->cmd_flags REQ_FLUSH bit set.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# ad9126ac 30-Mar-2016 Jens Axboe <axboe@fb.com>

virtio_blk: switch to using blk_queue_write_cache()

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 592002f5 24-Feb-2016 Michael S. Tsirkin <mst@redhat.com>

virtio_blk: VIRTIO_BLK_F_WCE->VIRTIO_BLK_F_FLUSH

Latest virtio spec says the feature bit name is VIRTIO_BLK_F_FLUSH,
VIRTIO_BLK_F_WCE is the legacy name. virtio blk header says exactly the
reverse - fix that and update driver code to match.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# f4829a9b 27-Sep-2015 Christoph Hellwig <hch@lst.de>

blk-mq: fix racy updates of rq->errors

blk_mq_complete_request may be a no-op if the request has already
been completed by others means (e.g. a timeout or cancellation), but
currently drivers have to set rq->errors before calling
blk_mq_complete_request, which might leave us with the wrong error value.

Add an error parameter to blk_mq_complete_request so that we can
defer setting rq->errors until we know we won the race to complete the
request.
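
A minimal sketch of the "only the winner records the error" idea, using
C11 atomics in userspace. struct fake_rq and complete_request() are
hypothetical and do not mirror the block layer's actual data structures:

```c
#include <stdatomic.h>

/* Hypothetical request with a completion marker and an error field. */
struct fake_rq {
	atomic_int completed;
	int errors;
};

/*
 * Try to complete the request. Returns 1 if this caller won the race
 * and its error value was recorded, 0 if someone else (for example a
 * timeout handler) had already completed the request.
 */
static int complete_request(struct fake_rq *rq, int error)
{
	if (atomic_exchange(&rq->completed, 1))
		return 0;	/* lost the race: leave rq->errors alone */

	rq->errors = error;
	return 1;
}
```

Passing the error as a parameter means a late caller cannot overwrite
the value stored by whoever completed the request first.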

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 5fa3142d 06-Sep-2015 Fam Zheng <famz@redhat.com>

virtio-blk: Allow extended partitions

This will allow up to DISK_MAX_PARTS (256) partitions, with for example
GPT in the guest. Otherwise, the partition scan code will only discover
the first 15 partitions.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 53eab6fd 21-Aug-2015 Paolo Bonzini <pbonzini@redhat.com>

virtio-blk: use VIRTIO_BLK_F_WCE and VIRTIO_BLK_F_CONFIG_WCE in virtio1

VIRTIO_BLK_F_CONFIG_WCE is important in order to achieve good performance
(up to 2x, though more realistically +30-40%) in latency-bound workloads.
However, it was removed by mistake together with VIRTIO_BLK_F_FLUSH.

It will be restored in the next revision of the virtio 1.0 standard, so
do the same in Linux.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 4f8c9510 17-Apr-2015 Christoph Hellwig <hch@lst.de>

block: rename REQ_TYPE_SPECIAL to REQ_TYPE_DRV_PRIV

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# b9f28d86 05-Mar-2015 James Bottomley <JBottomley@Odin.com>

sd, mmc, virtio_blk, string_helpers: fix block size units

The current string_get_size() overflows when the device size goes over
2^64 bytes because the string helper routine computes the suffix from
the size in bytes. However, the entirety of Linux thinks in terms of
blocks, not bytes, so this will artificially induce an overflow on very
large devices. Fix this by making the function string_get_size() take
blocks and the block size instead of bytes. This should allow us to
keep working until the current SCSI standard overflows.

Also fix virtio_blk and mmc (both of which were also artificially
multiplying by the block size to pass a byte size to string_get_size()).

The mathematics of this is pretty simple: we're taking a product of
size in blocks (S) and block size (B) and trying to re-express this in
exponential form: S*B = R*N^E (where N, the base, is either 1000 or
1024) and R < N. Mathematically, S = RS*N^ES and B = RB*N^EB, so if RS*RB
< N it's easy to see that S*B = RS*RB*N^(ES+EB). However, if RS*RB >= N,
we can see that this can be re-expressed as RS*RB = R*N (where R =
RS*RB/N < N) so the whole expression becomes R*N^(ES+EB+1).
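
The exponent bookkeeping can be sketched as follows. This is an
illustration only, not the string_get_size() implementation: it keeps
just the integer parts and drops remainders, so it is exact only for
inputs that divide evenly (as in the powers of two used below). struct
scaled, reduce() and size_in_units() are hypothetical names:

```c
#include <assert.h>

/* A value expressed as r * n^e with r < n. */
struct scaled {
	unsigned long long r;
	unsigned int e;
};

/* Divide v by n until it drops below n, counting the steps. */
static struct scaled reduce(unsigned long long v, unsigned int n)
{
	struct scaled s = { v, 0 };

	assert(n >= 2);
	while (s.r >= n) {
		s.r /= n;	/* remainder is dropped in this sketch */
		s.e++;
	}
	return s;
}

/*
 * Express blocks * blk_size as r * n^e without ever forming the full
 * product, so the result can exceed 64 bits' worth of bytes.
 */
static struct scaled size_in_units(unsigned long long blocks,
				   unsigned int blk_size, unsigned int n)
{
	struct scaled s = reduce(blocks, n);
	struct scaled b = reduce(blk_size, n);
	struct scaled out = { s.r * b.r, s.e + b.e };

	/* RS*RB may reach n: fold the excess into the exponent. */
	while (out.r >= n) {
		out.r /= n;
		out.e++;
	}
	return out;
}
```

For example, 2^31 blocks of 512 bytes reduce to RS = 2, ES = 3 and
RB = 512, EB = 0; RS*RB = 1024 folds into one more power of 1024, giving
1 * 1024^4.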

[jejb: fix incorrect 32 bit do_div spotted by kbuild test robot <fengguang.wu@intel.com>]
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <JBottomley@Odin.com>


# bb6ec576 15-Jan-2015 Michael S. Tsirkin <mst@redhat.com>

virtio_blk: coding style fixes

Most of our code has
struct foo {
}

Fix two instances where blk is inconsistent.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# a4379fd8 12-Jan-2015 Michael S. Tsirkin <mst@redhat.com>

virtio/blk: verify device has config space

Some devices might not implement config space access
(e.g. remoteproc used not to - before 3.9).
virtio/blk needs config space access so make it
fail gracefully if not there.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 35b489d3 02-Jan-2015 Ming Lei <ming.lei@canonical.com>

block: fix checking return value of blk_mq_init_queue

Check IS_ERR_OR_NULL(return value) instead of just return value.

Signed-off-by: Ming Lei <ming.lei@canonical.com>

Reduced to IS_ERR() by me, we never return NULL.
Signed-off-by: Jens Axboe <axboe@fb.com>


# 51cdc381 01-Dec-2014 Michael S. Tsirkin <mst@redhat.com>

virtio: drop VIRTIO_F_VERSION_1 from drivers

Core activates this bit automatically now,
drop it from drivers that set it explicitly.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 38f37b57 23-Oct-2014 Michael S. Tsirkin <mst@redhat.com>

virtio_blk: fix race at module removal

If a device appears while module is being removed,
driver will get a callback after we've given up
on the major number.

In theory this means this major number can get reused
by something else, resulting in a conflict.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>


# 393c525b 23-Oct-2014 Michael S. Tsirkin <mst@redhat.com>

virtio_blk: make serial attribute static

It's never declared so no need to make it extern.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>


# 19c1c5a6 07-Oct-2014 Michael S. Tsirkin <mst@redhat.com>

virtio_blk: v1.0 support

Based on patch by Cornelia Huck.

Note: for consistency, and to avoid sparse errors,
convert all fields, even those no longer in use
for virtio v1.0.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# 46652a86 09-Nov-2014 Michael S. Tsirkin <mst@redhat.com>

virtio_blk: fix race at module removal

If a device appears while module is being removed,
driver will get a callback after we've given up
on the major number.

In theory this means this major number can get reused
by something else, resulting in a conflict.

To fix, cleanup in reverse order of initialization.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 74c45052 29-Oct-2014 Jens Axboe <axboe@fb.com>

blk-mq: add a 'list' parameter to ->queue_rq()

Since we have the notion of a 'last' request in a chain, we can use
this to have the hardware optimize the issuing of requests. Add
a list_head parameter to queue_rq that the driver can use to
temporarily store hw commands for issue when 'last' is true. If we
are doing a chain of requests, pass in a NULL list for the first
request to force issue of that immediately, then batch the remainder
for deferred issue until the last request has been sent.

Instead of adding yet another argument to the hot ->queue_rq path,
encapsulate the passed arguments in a blk_mq_queue_data structure.
This is passed as a constant, and has been tested as faster than
passing 4 (or even 3) args through ->queue_rq. Update drivers for
the new ->queue_rq() prototype. There are no functional changes
in this patch for drivers - if they don't use the passed in list,
then they will just queue requests individually like before.

Signed-off-by: Jens Axboe <axboe@fb.com>


# 6d62c37f 14-Oct-2014 Michael S. Tsirkin <mst@redhat.com>

virtio_blk: enable VQs early on restore

virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after restore returns, virtio block violated
this rule on restore by restarting queues, which might in theory
cause the VQ to be used directly within restore.

To fix, call virtio_device_ready before starting queues.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 7a11370e 14-Oct-2014 Michael S. Tsirkin <mst@redhat.com>

virtio_blk: enable VQs early

virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after probe returns, virtio block violated this
rule by calling add_disk, which causes the VQ to be used directly within
probe.

To fix, call virtio_device_ready before using VQs.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 1f54b0c0 14-Oct-2014 Michael S. Tsirkin <mst@redhat.com>

virtio-blk: drop config_mutex

config_mutex served two purposes: prevent multiple concurrent config
change handlers, and synchronize access to config_enable flag.

Since commit dbf2576e37da0fcc7aacbfbb9fd5d3de7888a3c1
workqueue: make all workqueues non-reentrant
all workqueues are non-reentrant, and config_enable
is now gone.

Get rid of the unnecessary lock.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# cc74f719 14-Oct-2014 Michael S. Tsirkin <mst@redhat.com>

virtio_blk: drop config_enable

Now that virtio core ensures config changes don't
arrive during probing, drop config_enable flag
in virtio blk.
On removal, flush is now sufficient to guarantee that
no change work is queued.

This helps simplify the driver, and will allow
setting DRIVER_OK earlier without losing config
change notifications.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# c8a446ad 13-Sep-2014 Christoph Hellwig <hch@lst.de>

blk-mq: rename blk_mq_end_io to blk_mq_end_request

Now that we've changed the driver API on the submission side use the
opportunity to fix up the name on the completion side to fit into the
general scheme.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# e2490073 13-Sep-2014 Christoph Hellwig <hch@lst.de>

blk-mq: call blk_mq_start_request from ->queue_rq

When we call blk_mq_start_request from the core blk-mq code before calling into
->queue_rq there is a racy window where the timeout handler can hit before we've
fully set up the driver specific part of the command.

Move the call to blk_mq_start_request into the driver so the driver can start
the request only once it is fully set up.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# bf572297 13-Sep-2014 Christoph Hellwig <hch@lst.de>

blk-mq: remove REQ_END

Pass an explicit parameter for the last request in a batch to ->queue_rq
instead of using a request flag. Besides being a cleaner and non-stateful
interface this is also required for the next patch, which fixes the blk-mq
I/O submission code to not start a timer too early.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 6a27b656 26-Jun-2014 Ming Lei <ming.lei@canonical.com>

block: virtio-blk: support multi virt queues per virtio-blk device

Firstly this patch supports more than one virtual queue for a virtio-blk
device.

Secondly this patch maps the virtual queue to blk-mq's hardware queue.

With this approach, both scalability and performance can be improved.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# e8edca6f 29-May-2014 Ming Lei <ming.lei@canonical.com>

block: virtio_blk: don't hold spin lock during world switch

Firstly, it isn't necessary to hold lock of vblk->vq_lock
when notifying hypervisor about queued I/O.

Secondly, virtqueue_notify() will cause a world switch and
it may take a long time on some hypervisors (such as qemu-arm),
so it isn't good to hold the lock and block other vCPUs.

On arm64 quad core VM(qemu-kvm), the patch can increase I/O
performance a lot with VIRTIO_RING_F_EVENT_IDX enabled:
- without the patch: 14K IOPS
- with the patch: 34K IOPS

fio script:
[global]
direct=1
bsrange=4k-4k
timeout=10
numjobs=4
ioengine=libaio
iodepth=64

filename=/dev/vdc
group_reporting=1

[f1]
rw=randread

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org # 3.13+
Signed-off-by: Jens Axboe <axboe@fb.com>


# cdef54dd 28-May-2014 Christoph Hellwig <hch@lst.de>

blk-mq: remove alloc_hctx and free_hctx methods

There is no need for drivers to control hardware context allocation
now that we do the context to node mapping in common code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# aa0818c6 16-May-2014 Ming Lei <tom.leiming@gmail.com>

virtio_blk: fix race between start and stop queue

When there aren't enough vring descriptors to add to the vq,
blk-mq is put into the stopped state until some of the pending
descriptors are completed & freed.

Unfortunately, the vq's interrupt may come just before
blk-mq's BLK_MQ_S_STOPPED flag is set, so the blk-mq will
still be kept as stopped even though lots of descriptors
are completed and freed in the interrupt handler. The worst
case is that all pending descriptors are freed in the
interrupt handler, and the queue is kept as stopped forever.

This patch fixes the problem by starting/stopping blk-mq
while holding vq_lock.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>

Conflicts:
drivers/block/virtio_blk.c


# 0c29e93e 16-May-2014 Ming Lei <tom.leiming@gmail.com>

virtio_blk: fix race between start and stop queue

When there aren't enough vring descriptors to add to the vq,
blk-mq is put into the stopped state until some of the pending
descriptors are completed & freed.

Unfortunately, the vq's interrupt may come just before
blk-mq's BLK_MQ_S_STOPPED flag is set, so the blk-mq will
still be kept as stopped even though lots of descriptors
are completed and freed in the interrupt handler. The worst
case is that all pending descriptors are freed in the
interrupt handler, and the queue is kept as stopped forever.

This patch fixes the problem by starting/stopping blk-mq
while holding vq_lock.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 1b4a3258 16-Apr-2014 Christoph Hellwig <hch@lst.de>

blk-mq: add async parameter to blk_mq_start_stopped_hw_queues

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 24d2f903 15-Apr-2014 Christoph Hellwig <hch@lst.de>

blk-mq: split out tag initialization, support shared tags

Add a new blk_mq_tag_set structure that gets set up before we initialize
the queue. A single blk_mq_tag_set structure can be shared by multiple
queues.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Modular export of blk_mq_{alloc,free}_tagset added by me.

Signed-off-by: Jens Axboe <axboe@fb.com>


# e9b267d9 15-Apr-2014 Christoph Hellwig <hch@lst.de>

blk-mq: add ->init_request and ->exit_request methods

The current blk_mq_init_commands/blk_mq_free_commands interface has
two problems:

1) Because only the constructor is passed to blk_mq_init_commands there
is no easy way to clean up when a command initialization failed. The
current code simply leaks the allocations done in the constructor.

2) There is no good place to call blk_mq_free_commands: before
blk_cleanup_queue there is no guarantee that all outstanding
commands have completed, so we can't free them yet. After
blk_cleanup_queue the queue has usually been freed. This can be
worked around by grabbing an unconditional reference before calling
blk_cleanup_queue and dropping it after blk_mq_free_commands is
done, although that's not exactly pretty and driver writers are
guaranteed to get it wrong sooner or later.

Both issues are easily fixed by making the request constructor and
destructor normal blk_mq_ops methods.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 9d74e257 14-Apr-2014 Christoph Hellwig <hch@lst.de>

blk-mq: do not initialize req->special

Drivers can reach their private data easily using the blk_mq_rq_to_pdu
helper and don't need req->special. By not initializing it code can
be simplified nicely, and we also shave off a few more instructions from
the I/O path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# fc4324b4 19-Mar-2014 Rusty Russell <rusty@rustcorp.com.au>

virtio-blk: base queue-depth on virtqueue ringsize or module param

Venkatesh spake thus:

virtio-blk set the default queue depth to 64 requests, which was
insufficient for high-IOPS devices. Instead set the blk-queue depth to
the device's virtqueue depth divided by two (each I/O requires at least
two VQ entries).

But behold, Ted added a module parameter:

Also allow the queue depth to be something which can be set at module
load time or via a kernel boot-time parameter, for
testing/benchmarking purposes.

And I rewrote it substantially, mainly to take
VIRTIO_RING_F_INDIRECT_DESC into account.

As QEMU sets the vq size for PCI to 128, Venkatesh's patch wouldn't
have made a change. This version does (since QEMU also offers
VIRTIO_RING_F_INDIRECT_DESC).
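
A sketch of the selection rule as this message describes it, assuming
that a non-zero module parameter takes precedence and that the ring
size is halved only when indirect descriptors are unavailable.
pick_queue_depth() and its parameter names are hypothetical:

```c
/*
 * Choose the blk-mq queue depth.
 *
 * module_param:      depth requested at load/boot time, 0 if unset
 * vq_entries:        number of entries in the virtqueue
 * has_indirect_desc: non-zero if VIRTIO_RING_F_INDIRECT_DESC is offered
 */
static unsigned int pick_queue_depth(unsigned int module_param,
				     unsigned int vq_entries,
				     int has_indirect_desc)
{
	if (module_param)
		return module_param;

	/* Without indirect descriptors each I/O needs at least two entries. */
	return has_indirect_desc ? vq_entries : vq_entries / 2;
}
```

Under these assumptions a 128-entry QEMU ring yields a depth of 128 with
indirect descriptors and 64 without, which matches the observation that
plain halving would not have changed the old default of 64.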

Inspired-by: "Theodore Ts'o" <tytso@mit.edu>
Based-on-the-true-story-of: Venkatesh Srinivas <venkateshs@google.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: virtio-dev@lists.oasis-open.org
Cc: virtualization@lists.linux-foundation.org
Cc: Frank Swiderski <fes@google.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 95363efd 14-Mar-2014 Jens Axboe <axboe@fb.com>

blk-mq: allow blk_mq_init_commands() to return failure

If drivers do dynamic allocation in the hardware command init
path, then we need to be able to handle and return failures.

And if they do allocations or mappings in the init command path,
then we need a cleanup function to free up that space at exit
time. So add blk_mq_free_commands() as the cleanup function.

This is required for the mtip32xx driver conversion to blk-mq.

Signed-off-by: Jens Axboe <axboe@fb.com>


# 5261b85e 12-Mar-2014 Rusty Russell <rusty@rustcorp.com.au>

virtio_blk: don't crash, report error if virtqueue is broken.

A bad implementation of virtio might cause us to mark the virtqueue
broken: we'll dev_err() in that case, and the device is useless, but
let's not BUG_ON().

ENOMEM or ENOSPC implies the ring is full, and we should try again
later (-ENOMEM is documented to happen, but doesn't, as we fall
through to ENOSPC).

EIO means it's broken.
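
The error handling described above can be summarised in a small
classification helper. This is a sketch only; enum queue_result and
classify_add_error() are hypothetical names, and treating every other
error like EIO is an assumption rather than something this message
states:

```c
#include <errno.h>

enum queue_result {
	QUEUE_OK,	/* request was added to the ring */
	QUEUE_BUSY,	/* ring is full, retry later */
	QUEUE_ERROR	/* virtqueue is broken, fail the request */
};

/* Map the return value of adding a buffer to a queueing decision. */
static enum queue_result classify_add_error(int err)
{
	if (err == 0)
		return QUEUE_OK;
	if (err == -ENOMEM || err == -ENOSPC)
		return QUEUE_BUSY;

	return QUEUE_ERROR;	/* -EIO and anything unexpected */
}
```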

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 5124c285 10-Feb-2014 Christoph Hellwig <hch@infradead.org>

virtio_blk: use blk_mq_complete_request

Make sure to complete requests on the submitting CPU. Previously this
was done in blk_mq_end_io, but the responsibility shifted to the drivers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# f02b9ac3 19-Nov-2013 Shaohua Li <shli@fusionio.com>

virtio-blk: virtqueue_kick() must be ordered with other virtqueue operations

It isn't safe to call it without holding the vblk->vq_lock.

Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Shaohua Li <shli@fusionio.com>

Fixed another condition of virtqueue_kick() not holding the lock.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 1cf7e9c6 01-Nov-2013 Jens Axboe <axboe@kernel.dk>

virtio_blk: blk-mq support

Switch virtio-blk from the dual support for old-style requests and bios
to use the block-multiqueue.

Acked-by: Asias He <asias@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 7f03b17d 28-Oct-2013 Heinz Graalfs <graalfs@linux.vnet.ibm.com>

virtio_blk: verify if queue is broken after virtqueue_get_buf()

If virtqueue_get_buf() returns a NULL pointer, check whether the
virtqueue is broken so that we can leave the while loop.

Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 855e0c52 14-Oct-2013 Rusty Russell <rusty@rustcorp.com.au>

virtio: use size-based config accessors.

This lets the transport do endian conversion if necessary, and insulates
the drivers from the difference.

Most drivers can use the simple helpers virtio_cread() and virtio_cwrite().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 89107000 16-Sep-2013 Aaron Lu <aaron.lu@intel.com>

virtio: pm: use CONFIG_PM_SLEEP instead of CONFIG_PM

The freeze and restore functions defined in virtio drivers are used
for suspend and hibernate, so CONFIG_PM_SLEEP is more appropriate than
CONFIG_PM. This patch replaces all CONFIG_PM with CONFIG_PM_SLEEP for
virtio drivers that implement freeze and restore callbacks.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 2a647bfe 19-May-2013 Jonghwan Choi <jhbird.choi@samsung.com>

virtio_blk: Add missing 'static' qualifiers

Add missing 'static' qualifiers

Signed-off-by: Jonghwan Choi <jhbird.choi@samsung.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 0a11cc36 19-Mar-2013 Rusty Russell <rusty@rustcorp.com.au>

virtio_blk: remove nents member.

It's simply a flag as to whether we have data now, so make it an
explicit function parameter rather than a member of struct
virtblk_req.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Asias He <asias@redhat.com>


# 20af3cfd 19-Mar-2013 Paolo Bonzini <pbonzini@redhat.com>

virtio-blk: use virtqueue_add_sgs on req path

(This is a respin of Paolo Bonzini's patch, but it calls
virtqueue_add_sgs() instead of his multi-part API).

This is similar to the previous patch, but a bit more radical
because the bio and req paths now share the buffer construction
code. Because the req path doesn't use vbr->sg, however, we
need to add a couple of arguments to __virtblk_add_req.

We also need to teach __virtblk_add_req how to build SCSI command
requests.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Asias He <asias@redhat.com>


# 8f39db9d 19-Mar-2013 Paolo Bonzini <pbonzini@redhat.com>

virtio-blk: use virtqueue_add_sgs on bio path

(This is a respin of Paolo Bonzini's patch, but it calls
virtqueue_add_sgs() instead of his multi-part API).

Move the creation of the request header and response footer to
__virtblk_add_req. vbr->sg only contains the data scatterlist,
the header/footer are added separately using virtqueue_add_sgs().

With this change, virtio-blk (with use_bio) is not relying anymore on
the virtio functions ignoring the end markers in a scatterlist.
The next patch will do the same for the other path.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Asias He <asias@redhat.com>


# 5ee21a52 19-Mar-2013 Paolo Bonzini <pbonzini@redhat.com>

virtio-blk: reorganize virtblk_add_req

Right now, both virtblk_add_req and virtblk_add_req_wait call
virtqueue_add_buf. To prepare for the next patches, abstract the call
to virtqueue_add_buf into a new function __virtblk_add_req, and include
the waiting logic directly in virtblk_add_req.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Asias He <asias@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 9d9598b8 11-Mar-2013 Milos Vyletel <milos.vyletel@sde.cz>

virtio-blk: emit udev event when device is resized

When a virtio-blk device is resized from the host (using block_resize from
QEMU), emit a KOBJ_CHANGE uevent to notify the guest about the change. This
allows the user to have custom udev rules which take whatever action is
needed when such an event occurs. As a proof of concept I've created a simple
udev rule that automatically resizes the filesystem on a virtio-blk device.

ACTION=="change", KERNEL=="vd*", \
ENV{RESIZE}=="1", \
ENV{ID_FS_TYPE}=="ext[3-4]", \
RUN+="/sbin/resize2fs /dev/%k"
ACTION=="change", KERNEL=="vd*", \
ENV{RESIZE}=="1", \
ENV{ID_FS_TYPE}=="LVM2_member", \
RUN+="/sbin/pvresize /dev/%k"

Signed-off-by: Milos Vyletel <milos.vyletel@sde.cz>
Tested-by: Asias He <asias@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (minor simplification)


# 8d85fce7 21-Dec-2012 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Drivers: block: remove __dev* attributes.

CONFIG_HOTPLUG is going away as an option. As a result, the __dev*
markings need to be removed.

This change removes the use of __devinit, __devexit_p, __devinitdata,
__devinitconst, and __devexit from these drivers.

Based on patches originally written by Bill Pemberton, but redone by me
in order to handle some of the coding style issues better, by hand.

Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Mike Miller <mike.miller@hp.com>
Cc: Chirag Kantharia <chirag.kantharia@hp.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Jim Paris <jim@jtan.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: NeilBrown <neilb@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Tao Guo <Tao.Guo@emc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# f4953fe6 01-Jan-2013 Alexander Graf <agraf@suse.de>

virtio-blk: Don't free ida when disk is in use

When a file system is mounted on a virtio-blk disk and we then remove the
disk and reattach it, the reattached disk gets the same disk name and
ids as the hot-removed one.

This leads to very nasty effects - mostly rendering the newly attached
device completely unusable.

Trying what happens when I do the same thing with a USB device, I saw
that the sd node simply doesn't get free'd when a device gets forcefully
removed.

Imitate the same behavior for vd devices. This way broken vd devices
simply are never free'd and newly attached ones keep working just fine.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org


# bb811108 24-Sep-2012 Asias He <asias@redhat.com>

virtio-blk: Disable callback in virtblk_done()

This reduces unnecessary interrupts that the host could send to the guest
while the guest is in the process of irq handling.

If one vcpu is handling the irq while another interrupt comes in, then in
handle_edge_irq() the guest will mask the interrupt via mask_msi_irq(),
which is a very heavy operation that goes all the way down to the host.

Here are some performance numbers on qemu:

Before:
-------------------------------------
seq-read : io=0 B, bw=269730KB/s, iops=67432 , runt= 62200msec
seq-write : io=0 B, bw=339716KB/s, iops=84929 , runt= 49386msec
rand-read : io=0 B, bw=270435KB/s, iops=67608 , runt= 62038msec
rand-write: io=0 B, bw=354436KB/s, iops=88608 , runt= 47335msec
clat (usec): min=101 , max=138052 , avg=14822.09, stdev=11771.01
clat (usec): min=96 , max=81543 , avg=11798.94, stdev=7735.60
clat (usec): min=128 , max=140043 , avg=14835.85, stdev=11765.33
clat (usec): min=109 , max=147207 , avg=11337.09, stdev=5990.35
cpu : usr=15.93%, sys=60.37%, ctx=7764972, majf=0, minf=54
cpu : usr=32.73%, sys=120.49%, ctx=7372945, majf=0, minf=1
cpu : usr=18.84%, sys=58.18%, ctx=7775420, majf=0, minf=1
cpu : usr=24.20%, sys=59.85%, ctx=8307886, majf=0, minf=0
vdb: ios=8389107/8368136, merge=0/0, ticks=19457874/14616506,
in_queue=34206098, util=99.68%
43: interrupt in total: 887320
fio --exec_prerun="echo 3 > /proc/sys/vm/drop_caches" --group_reporting
--ioscheduler=noop --thread --bs=4k --size=512MB --direct=1 --numjobs=16
--ioengine=libaio --iodepth=64 --loops=3 --ramp_time=0
--filename=/dev/vdb --name=seq-read --stonewall --rw=read
--name=seq-write --stonewall --rw=write --name=rnd-read --stonewall
--rw=randread --name=rnd-write --stonewall --rw=randwrite

After:
-------------------------------------
seq-read : io=0 B, bw=309503KB/s, iops=77375 , runt= 54207msec
seq-write : io=0 B, bw=448205KB/s, iops=112051 , runt= 37432msec
rand-read : io=0 B, bw=311254KB/s, iops=77813 , runt= 53902msec
rand-write: io=0 B, bw=377152KB/s, iops=94287 , runt= 44484msec
clat (usec): min=81 , max=90588 , avg=12946.06, stdev=9085.94
clat (usec): min=57 , max=72264 , avg=8967.97, stdev=5951.04
clat (usec): min=29 , max=101046 , avg=12889.95, stdev=9067.91
clat (usec): min=52 , max=106152 , avg=10660.56, stdev=4778.19
cpu : usr=15.05%, sys=57.92%, ctx=7710941, majf=0, minf=54
cpu : usr=26.78%, sys=101.40%, ctx=7387891, majf=0, minf=2
cpu : usr=19.03%, sys=58.17%, ctx=7681976, majf=0, minf=8
cpu : usr=24.65%, sys=58.34%, ctx=8442632, majf=0, minf=4
vdb: ios=8389086/8361888, merge=0/0, ticks=17243780/12742010,
in_queue=30078377, util=99.59%
43: interrupt in total: 1259639
fio --exec_prerun="echo 3 > /proc/sys/vm/drop_caches" --group_reporting
--ioscheduler=noop --thread --bs=4k --size=512MB --direct=1 --numjobs=16
--ioengine=libaio --iodepth=64 --loops=3 --ramp_time=0
--filename=/dev/vdb --name=seq-read --stonewall --rw=read
--name=seq-write --stonewall --rw=write --name=rnd-read --stonewall
--rw=randread --name=rnd-write --stonewall --rw=randwrite

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# f22cf8eb 05-Sep-2012 Dan Carpenter <dan.carpenter@oracle.com>

virtio-blk: fix NULL checking in virtblk_alloc_req()

Smatch complains about the inconsistent NULL checking here. Fix it to
return NULL on failure.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (fixed accidental deletion)


# c85a1f91 08-Aug-2012 Asias He <asias@redhat.com>

virtio-blk: Add REQ_FLUSH and REQ_FUA support to bio path

We need to support both REQ_FLUSH and REQ_FUA for bio based path since
it does not get the sequencing of REQ_FUA into REQ_FLUSH that request
based drivers can request.

REQ_FLUSH is emulated by:
A) If the bio has no data to write:
1. Send VIRTIO_BLK_T_FLUSH to device,
2. In the flush I/O completion handler, finish the bio

B) If the bio has data to write:
1. Send VIRTIO_BLK_T_FLUSH to device
2. In the flush I/O completion handler, send the actual write data to device
3. In the write I/O completion handler, finish the bio

REQ_FUA is emulated by:
1. Send the actual write data to device
2. In the write I/O completion handler, send VIRTIO_BLK_T_FLUSH to device
3. In the flush I/O completion handler, finish the bio
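
The ordering rules above can be sketched as a small planning function. This
is only a user-space illustration, not the driver code: the names plan_bio()
and enum bio_step are hypothetical, and the handling of a bio that carries
both REQ_FLUSH and REQ_FUA (flush, write, flush) is an assumption obtained by
combining the two lists.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step names used only in this sketch. */
enum bio_step { STEP_FLUSH, STEP_WRITE, STEP_END_BIO };

/*
 * Fill out[] with the order of operations for one bio and return the
 * number of steps. out[] must have room for 4 entries.
 */
static size_t plan_bio(bool req_flush, bool req_fua, bool has_data,
                       enum bio_step out[4])
{
    size_t n = 0;

    if (req_flush)
        out[n++] = STEP_FLUSH;   /* flush sent before any data */
    if (has_data)
        out[n++] = STEP_WRITE;   /* the actual write */
    if (req_fua && has_data)
        out[n++] = STEP_FLUSH;   /* flush after the write emulates FUA */
    out[n++] = STEP_END_BIO;     /* finish the bio last */
    return n;
}
```

Each step would be issued from the completion handler of the previous one,
as the lists above describe.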

Changes in v7:
- Using vbr->flags to trace request type
- Dropped unnecessary struct virtio_blk *vblk parameter
- Reuse struct virtblk_req in bio done function

Changes in v6:
- Reworked REQ_FLUSH and REQ_FUA emulation order

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Shaohua Li <shli@kernel.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# a98755c5 08-Aug-2012 Asias He <asias@redhat.com>

virtio-blk: Add bio-based IO path for virtio-blk

This patch introduces bio-based IO path for virtio-blk.

Compared to the request-based IO path, the bio-based IO path uses the
driver provided ->make_request_fn() method to bypass the IO scheduler. It
hands the bio to the device directly without allocating a request in the
block layer. This shortens the IO path in the guest kernel to achieve
higher IOPS and lower latency. The downside is that the guest cannot use
the IO scheduler to merge and sort requests. However, this is not a big
problem if the backend disk on the host side is a fast device.

When the bio-based IO path is not enabled, virtio-blk still uses the
original request-based IO path, no performance difference is observed.

With a slow device, e.g. a normal SATA disk, sequential read and write on
the bio-based IO path are slower than on the request-based IO path due to
the lack of merging in the guest kernel. So we make the bio-based path
optional.

Performance evaluation:
-----------------------------
1) Fio test is performed in a 8 vcpu guest with ramdisk based guest using
kvm tool.

Short version:
With bio-based IO path, sequential read/write, random read/write
IOPS boost : 28%, 24%, 21%, 16%
Latency improvement: 32%, 17%, 21%, 16%

Long version:
With bio-based IO path:
seq-read : io=2048.0MB, bw=116996KB/s, iops=233991 , runt= 17925msec
seq-write : io=2048.0MB, bw=100829KB/s, iops=201658 , runt= 20799msec
rand-read : io=3095.7MB, bw=112134KB/s, iops=224268 , runt= 28269msec
rand-write: io=3095.7MB, bw=96198KB/s, iops=192396 , runt= 32952msec
clat (usec): min=0 , max=2631.6K, avg=58716.99, stdev=191377.30
clat (usec): min=0 , max=1753.2K, avg=66423.25, stdev=81774.35
clat (usec): min=0 , max=2915.5K, avg=61685.70, stdev=120598.39
clat (usec): min=0 , max=1933.4K, avg=76935.12, stdev=96603.45
cpu : usr=74.08%, sys=703.84%, ctx=29661403, majf=21354, minf=22460954
cpu : usr=70.92%, sys=702.81%, ctx=77219828, majf=13980, minf=27713137
cpu : usr=72.23%, sys=695.37%, ctx=88081059, majf=18475, minf=28177648
cpu : usr=69.69%, sys=654.13%, ctx=145476035, majf=15867, minf=26176375
With request-based IO path:
seq-read : io=2048.0MB, bw=91074KB/s, iops=182147 , runt= 23027msec
seq-write : io=2048.0MB, bw=80725KB/s, iops=161449 , runt= 25979msec
rand-read : io=3095.7MB, bw=92106KB/s, iops=184211 , runt= 34416msec
rand-write: io=3095.7MB, bw=82815KB/s, iops=165630 , runt= 38277msec
clat (usec): min=0 , max=1932.4K, avg=77824.17, stdev=170339.49
clat (usec): min=0 , max=2510.2K, avg=78023.96, stdev=146949.15
clat (usec): min=0 , max=3037.2K, avg=74746.53, stdev=128498.27
clat (usec): min=0 , max=1363.4K, avg=89830.75, stdev=114279.68
cpu : usr=53.28%, sys=724.19%, ctx=37988895, majf=17531, minf=23577622
cpu : usr=49.03%, sys=633.20%, ctx=205935380, majf=18197, minf=27288959
cpu : usr=55.78%, sys=722.40%, ctx=101525058, majf=19273, minf=28067082
cpu : usr=56.55%, sys=690.83%, ctx=228205022, majf=18039, minf=26551985
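
The IOPS boost figures in the short version of test 1 can be re-derived from
the iops values listed above. The helper below is a hypothetical illustration
(percent_gain() is not part of the patch) and assumes the percentages were
truncated rather than rounded, which is what the published numbers suggest.

```c
/*
 * Integer percentage gain of new_val over old_val, truncated toward zero.
 * Example: seq-read iops 233991 (bio) vs 182147 (request).
 */
static long percent_gain(long new_val, long old_val)
{
    return (new_val - old_val) * 100 / old_val;
}
```

Feeding in the four bio-based and request-based iops pairs reproduces the
28%, 24%, 21% and 16% quoted in the short version.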

2) Fio test is performed in a 8 vcpu guest with Fusion-IO based guest using
kvm tool.

Short version:
With bio-based IO path, sequential read/write, random read/write
IOPS boost : 11%, 11%, 13%, 10%
Latency improvement: 10%, 10%, 12%, 10%
Long Version:
With bio-based IO path:
read : io=2048.0MB, bw=58920KB/s, iops=117840 , runt= 35593msec
write: io=2048.0MB, bw=64308KB/s, iops=128616 , runt= 32611msec
read : io=3095.7MB, bw=59633KB/s, iops=119266 , runt= 53157msec
write: io=3095.7MB, bw=62993KB/s, iops=125985 , runt= 50322msec
clat (usec): min=0 , max=1284.3K, avg=128109.01, stdev=71513.29
clat (usec): min=94 , max=962339 , avg=116832.95, stdev=65836.80
clat (usec): min=0 , max=1846.6K, avg=128509.99, stdev=89575.07
clat (usec): min=0 , max=2256.4K, avg=121361.84, stdev=82747.25
cpu : usr=56.79%, sys=421.70%, ctx=147335118, majf=21080, minf=19852517
cpu : usr=61.81%, sys=455.53%, ctx=143269950, majf=16027, minf=24800604
cpu : usr=63.10%, sys=455.38%, ctx=178373538, majf=16958, minf=24822612
cpu : usr=62.04%, sys=453.58%, ctx=226902362, majf=16089, minf=23278105
With request-based IO path:
read : io=2048.0MB, bw=52896KB/s, iops=105791 , runt= 39647msec
write: io=2048.0MB, bw=57856KB/s, iops=115711 , runt= 36248msec
read : io=3095.7MB, bw=52387KB/s, iops=104773 , runt= 60510msec
write: io=3095.7MB, bw=57310KB/s, iops=114619 , runt= 55312msec
clat (usec): min=0 , max=1532.6K, avg=142085.62, stdev=109196.84
clat (usec): min=0 , max=1487.4K, avg=129110.71, stdev=114973.64
clat (usec): min=0 , max=1388.6K, avg=145049.22, stdev=107232.55
clat (usec): min=0 , max=1465.9K, avg=133585.67, stdev=110322.95
cpu : usr=44.08%, sys=590.71%, ctx=451812322, majf=14841, minf=17648641
cpu : usr=48.73%, sys=610.78%, ctx=418953997, majf=22164, minf=26850689
cpu : usr=45.58%, sys=581.16%, ctx=714079216, majf=21497, minf=22558223
cpu : usr=48.40%, sys=599.65%, ctx=656089423, majf=16393, minf=23824409

3) Fio test is performed in a 8 vcpu guest with normal SATA based guest
using kvm tool.

Short version:
With bio-based IO path, sequential read/write, random read/write
IOPS boost : -10%, -10%, 4.4%, 0.5%
Latency improvement: -12%, -15%, 2.5%, 0.8%
Long Version:
With bio-based IO path:
read : io=124812KB, bw=36537KB/s, iops=9060 , runt= 3416msec
write: io=169180KB, bw=24406KB/s, iops=6065 , runt= 6932msec
read : io=256200KB, bw=2089.3KB/s, iops=520 , runt=122630msec
write: io=257988KB, bw=1545.7KB/s, iops=384 , runt=166910msec
clat (msec): min=1 , max=1527 , avg=28.06, stdev=89.54
clat (msec): min=2 , max=344 , avg=41.12, stdev=38.70
clat (msec): min=8 , max=1984 , avg=490.63, stdev=207.28
clat (msec): min=33 , max=4131 , avg=659.19, stdev=304.71
cpu : usr=4.85%, sys=17.15%, ctx=31593, majf=0, minf=7
cpu : usr=3.04%, sys=11.45%, ctx=39377, majf=0, minf=0
cpu : usr=0.47%, sys=1.59%, ctx=262986, majf=0, minf=16
cpu : usr=0.47%, sys=1.46%, ctx=337410, majf=0, minf=0

With request-based IO path:
read : io=150120KB, bw=40420KB/s, iops=10037 , runt= 3714msec
write: io=194932KB, bw=27029KB/s, iops=6722 , runt= 7212msec
read : io=257136KB, bw=2001.1KB/s, iops=498 , runt=128443msec
write: io=258276KB, bw=1537.2KB/s, iops=382 , runt=168028msec
clat (msec): min=1 , max=1542 , avg=24.84, stdev=32.45
clat (msec): min=3 , max=628 , avg=35.62, stdev=39.71
clat (msec): min=8 , max=2540 , avg=503.28, stdev=236.97
clat (msec): min=41 , max=4398 , avg=653.88, stdev=302.61
cpu : usr=3.91%, sys=15.75%, ctx=26968, majf=0, minf=23
cpu : usr=2.50%, sys=10.56%, ctx=19090, majf=0, minf=0
cpu : usr=0.16%, sys=0.43%, ctx=20159, majf=0, minf=16
cpu : usr=0.18%, sys=0.53%, ctx=81364, majf=0, minf=0

How to use:
-----------------------------
Add 'virtio_blk.use_bio=1' to kernel cmdline or 'modprobe virtio_blk
use_bio=1' to enable ->make_request_fn() based I/O path.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Shaohua Li <shli@kernel.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# cd5d5038 03-Jul-2012 Paolo Bonzini <pbonzini@redhat.com>

virtio-blk: allow toggling host cache between writeback and writethrough

This patch adds support for the new VIRTIO_BLK_F_CONFIG_WCE feature,
which exposes the cache mode in the configuration space and lets the
driver modify it. The cache mode is exposed via sysfs.

Even if the host does not support the new feature, the cache mode is
visible (thanks to the existing VIRTIO_BLK_F_WCE), but not modifiable.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 2c95a329 25-May-2012 Asias He <asias@redhat.com>

virtio-blk: Use block layer provided spinlock

Block layer will allocate a spinlock for the queue if the driver does
not provide one in blk_init_queue().

The reason to use the internal spinlock is that blk_cleanup_queue() will
switch to use the internal spinlock in the cleanup code path.

if (q->queue_lock != &q->__queue_lock)
q->queue_lock = &q->__queue_lock;

However, processes which are in D state might have taken the driver
provided spinlock; when the processes wake up, they would release the
block layer provided spinlock.

=====================================
[ BUG: bad unlock balance detected! ]
3.4.0-rc7+ #238 Not tainted
-------------------------------------
fio/3587 is trying to release lock (&(&q->__queue_lock)->rlock) at:
[<ffffffff813274d2>] blk_queue_bio+0x2a2/0x380
but there are no more locks to release!

other info that might help us debug this:
1 lock held by fio/3587:
#0: (&(&vblk->lock)->rlock){......}, at:
[<ffffffff8132661a>] get_request_wait+0x19a/0x250

Other drivers use block layer provided spinlock as well, e.g. SCSI.

Switching to the block layer provided spinlock saves a bit of memory and
does not increase lock contention. Performance tests show no real
difference before and after this patch.

Changes in v2: Improve commit log as Michael suggested.

Cc: virtualization@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Cc: stable@kernel.org
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 483001c7 24-May-2012 Asias He <asias@redhat.com>

virtio-blk: Reset device after blk_cleanup_queue()

blk_cleanup_queue() will call blk_drain_queue() to drain all the
requests before queue DEAD marking. If we reset the device before
blk_cleanup_queue() the drain would fail.

1) if the queue is stopped in do_virtblk_request() because device is
full, the q->request_fn() will not be called.

blk_drain_queue() {
while(true) {
...
if (!list_empty(&q->queue_head))
__blk_run_queue(q) {
if (queue is not stopped)
q->request_fn()
}
...
}
}

Not resetting the device before blk_cleanup_queue() gives the interrupt
handler blk_done() the chance to start the queue.

2) In commit b79d866c8b7014a51f611a64c40546109beaf24a, we abort requests
dispatched to driver before blk_cleanup_queue(). There is a race if
requests are dispatched to driver after the abort and before the queue
DEAD mark. To fix this, instead of aborting the requests explicitly, we
can just reset the device after blk_cleanup_queue() so that the
device can complete all the requests before queue DEAD marking in the
drain process.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: virtualization@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Cc: stable@kernel.org
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 02e2b124 24-May-2012 Asias He <asias@redhat.com>

virtio-blk: Call del_gendisk() before disable guest kick

del_gendisk() might not return due to failing to remove the
/sys/block/vda/serial sysfs entry when another thread (udev) is
trying to read it.

virtblk_remove()
vdev->config->reset() : guest will not kick us through interrupt
del_gendisk()
device_del()
kobject_del(): got stuck, sysfs entry ref count non zero

sysfs_open_file(): user space process read /sys/block/vda/serial
sysfs_get_active() : got sysfs entry ref count
dev_attr_show()
virtblk_serial_show()
blk_execute_rq() : got stuck, interrupt is disabled
request cannot be finished

This patch fixes it by calling del_gendisk() before we disable the guest's
interrupt so that the request sent in virtblk_serial_show() will be
finished and del_gendisk() will succeed.

This fixes another race in hot-unplug process.

It is safe to call del_gendisk(vblk->disk) before
flush_work(&vblk->config_work) which might access vblk->disk, because
vblk->disk is not freed until put_disk(vblk->disk).

Cc: virtualization@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Cc: stable@kernel.org
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# f65ca1dc 29-Mar-2012 Asias He <asias@redhat.com>

virtio_blk: Drop unused request tracking list

Benchmark shows small performance improvement on fusion io device.

Before:
seq-read : io=1,024MB, bw=19,982KB/s, iops=39,964, runt= 52475msec
seq-write: io=1,024MB, bw=20,321KB/s, iops=40,641, runt= 51601msec
rnd-read : io=1,024MB, bw=15,404KB/s, iops=30,808, runt= 68070msec
rnd-write: io=1,024MB, bw=14,776KB/s, iops=29,552, runt= 70963msec

After:
seq-read : io=1,024MB, bw=20,343KB/s, iops=40,685, runt= 51546msec
seq-write: io=1,024MB, bw=20,803KB/s, iops=41,606, runt= 50404msec
rnd-read : io=1,024MB, bw=16,221KB/s, iops=32,442, runt= 64642msec
rnd-write: io=1,024MB, bw=15,199KB/s, iops=30,397, runt= 68991msec

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# b79d866c 04-May-2012 Asias He <asias@redhat.com>

virtio-blk: Fix hot-unplug race in remove method

If we reset the virtio-blk device before the requests already dispatched
to the virtio-blk driver from the block layer are finished, we will get
stuck in blk_cleanup_queue() and the remove will fail.

blk_cleanup_queue() calls blk_drain_queue() to drain all requests queued
before DEAD marking. However it will never succeed if the device is
already stopped. We'll have q->in_flight[] > 0, so the drain will not
finish.

How to reproduce the race:
1. hot-plug a virtio-blk device
2. keep reading/writing the device in guest
3. hot-unplug while the device is busy serving I/O

Test:
~1000 rounds of hot-plug/hot-unplug test passed with this patch.

Changes in v3:
- Drop blk_abort_queue and blk_abort_request
- Use __blk_end_request_all to complete request dispatched to driver

Changes in v2:
- Drop req_in_flight
- Use virtqueue_detach_unused_buf to get request dispatched to driver

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# c0aa3e09 10-Apr-2012 Ren Mingxin <renmx@cn.fujitsu.com>

virtio_blk: helper function to format disk names

The current virtio block's naming algorithm just supports 18278
(26^3 + 26^2 + 26) disks. If there are more virtio blocks,
there will be disks with the same name.

Based on commit 3e1a7ff8a0a7b948f2684930166954f9e8e776fe, add
a function "virtblk_name_format()" for virtio block to support mass
of disks naming.

Notes:
- Our naming scheme is ugly. We are stuck with it
for virtio but don't use it for any new driver:
new drivers should name their devices PREFIX%d
where the sequence number can be allocated by ida
- sd_format_disk_name has exactly the same logic.
Moving it to a central place was deferred over worries
that this will make people keep using the legacy naming
in new drivers.
We kept code identical in case someone wants to deduplicate later.
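
For readers unfamiliar with the legacy scheme, the following is a user-space
sketch of the letter-suffix naming it refers to (a..z, aa..zz, aaa..zzz, and
so on). It is modeled on the description above; format_disk_name() is a
hypothetical name used here for illustration and is not claimed to be the
helper added by this patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Write prefix followed by a letter suffix for index into buf.
 * index 0 -> "a", 25 -> "z", 26 -> "aa", 702 -> "aaa", ...
 * Returns 0 on success, -1 if the name does not fit in buflen bytes.
 */
static int format_disk_name(const char *prefix, int index,
                            char *buf, int buflen)
{
    const int base = 'z' - 'a' + 1;          /* 26 letters */
    size_t plen = strlen(prefix);
    char *begin, *end, *p;

    if (index < 0 || buflen <= 0 || (size_t)buflen <= plen)
        return -1;

    begin = buf + plen;
    end = buf + buflen;
    p = end - 1;
    *p = '\0';

    /* Build the suffix backwards from the end of the buffer. */
    do {
        if (p == begin)
            return -1;                       /* out of room */
        *--p = 'a' + (index % base);
        index = (index / base) - 1;
    } while (index >= 0);

    memmove(begin, p, end - p);              /* includes the trailing NUL */
    memcpy(buf, prefix, plen);
    return 0;
}
```

With a "vd" prefix, index 18277 yields "vdzzz" (the last three-letter name)
and index 18278 yields "vdaaaa", which is where a scheme limited to three
letters would run out.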

Signed-off-by: Ren Mingxin <renmx@cn.fujitsu.com>
Acked-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


# e9986f30 29-Mar-2012 Vivek Goyal <vgoyal@redhat.com>

virtio-blk: Call revalidate_disk() upon online disk resize

If a virtio disk is open in the guest and a disk resize operation is done
(virsh blockresize), the new size is not visible to tools like "fdisk -l".
This seems to happen because we update only part->nr_sects and not the
bdev->bd_inode size.

Call revalidate_disk() which should take care of it. I tested growing disk
size of already open disk and it works for me.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 577ebb37 12-Jan-2012 Paolo Bonzini <pbonzini@redhat.com>

block: add and use scsi_blk_cmd_ioctl

Introduce a wrapper around scsi_cmd_ioctl that takes a block device.

The function will then be enhanced to detect partition block devices
and, in that case, subject the ioctls to whitelisting.

Cc: linux-scsi@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: James Bottomley <JBottomley@parallels.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f8fb5bc2 22-Dec-2011 Amit Shah <amit.shah@redhat.com>

virtio: blk: Add freeze, restore handlers to support S4

Delete the vq and flush any pending requests from the block queue on the
freeze callback to prepare for hibernation.

Re-create the vq in the restore callback to resume normal function.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 6abd6e5a 22-Dec-2011 Amit Shah <amit.shah@redhat.com>

virtio: blk: Move vq initialization to separate function

The probe and PM restore functions will share this code.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 4678d6f9 11-Jan-2012 Michael S. Tsirkin <mst@redhat.com>

virtio_blk: fix config handler race

Fix a theoretical race related to config work
handler: a config interrupt might happen
after we flush config work but before we
reset the device. It will then cause the
config work to run during or after reset.

Two problems with this:
- if this runs after device is gone we will get use after free
- access of config while reset is in progress is racy
(as layout is changing).

As a solution
1. flush after reset when we know there will be no more interrupts
2. add a flag to disable config access before reset

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# f96fde41 11-Jan-2012 Rusty Russell <rusty@rustcorp.com.au>

virtio: rename virtqueue_add_buf_gfp to virtqueue_add_buf

Remove wrapper functions. This makes the allocation type explicit in
all callers; I used GFP_KERNEL where it seemed obvious, left it at
GFP_ATOMIC otherwise.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 5087a50e 30-Oct-2011 Michael S. Tsirkin <mst@redhat.com>

virtio-blk: use ida to allocate disk index

Based on a patch by Mark Wu <dwu@redhat.com>

Current index allocation in virtio-blk is based on a monotonically
increasing variable "index". This means we'll run out of numbers
after a while. It also could cause confusion about the disk
name in the case of hot-plugging disks.
Change virtio-blk to use ida to allocate index, instead.
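
To illustrate the difference from a monotonically increasing counter, here
is a toy user-space allocator that always hands out the lowest free index
and lets freed indexes be reused. It is not the kernel ida API: id_alloc(),
id_free() and the 64-entry limit are assumptions made for this sketch only.

```c
#include <stdint.h>

#define SKETCH_MAX_IDS 64           /* arbitrary limit for the sketch */

static uint64_t ids_in_use;         /* bit i set => index i is allocated */

/* Return the lowest free index, or -1 if all are taken. */
static int id_alloc(void)
{
    for (int i = 0; i < SKETCH_MAX_IDS; i++) {
        if (!(ids_in_use & (UINT64_C(1) << i))) {
            ids_in_use |= UINT64_C(1) << i;
            return i;
        }
    }
    return -1;
}

/* Release an index so a later hot-plugged disk can reuse it. */
static void id_free(int i)
{
    if (i >= 0 && i < SKETCH_MAX_IDS)
        ids_in_use &= ~(UINT64_C(1) << i);
}
```

With a plain counter, unplugging and replugging a disk would keep consuming
new numbers; with an allocator like this, the released index is handed out
again.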

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 0c8d44f2 01-Jul-2011 Paul Gortmaker <paul.gortmaker@windriver.com>

block: Fix files that are modules and hence need module.h

We want to remove the implicit everywhere presence of module.h
so fix up the people relying on that implicit presence in advance.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# a0eda625 31-Oct-2011 Michael S. Tsirkin <mst@redhat.com>

virtio-blk: use ida to allocate disk index

Based on a patch by Mark Wu <dwu@redhat.com>

Current index allocation in virtio-blk is based on a monotonically
increasing variable "index". This means we'll run out of numbers
after a while. It also could cause confusion about the disk
name in the case of hot-plugging disks.
Change virtio-blk to use ida to allocate index, instead.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 6917f83f 23-Apr-2011 Liu Yuan <tailai.ly@taobao.com>

drivers, block: virtio_blk: Replace cryptic number with the macro

It is easier to figure out the context by reading SCSI_SENSE_BUFFERSIZE
instead of plain '96'.

Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 7a7c924c 01-Feb-2011 Christoph Hellwig <hch@lst.de>

virtio_blk: allow re-reading config space at runtime

Wire up the virtio_driver config_changed method to get notified about
config changes raised by the host. For now we just re-read the device
size to support online resizing of devices, but once we add more
attributes that might be changeable they could be added as well.

Note that the config_changed method is called from irq context, so
we'll have to use the workqueue infrastructure to provide us a proper
user context for our changes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# fe5a50a1 14-Sep-2010 Christoph Hellwig <hch@lst.de>

virtio_blk: remove BKL leftovers

Remove the BKL usage added in "block: push down BKL into .locked_ioctl".
Virtio-blk doesn't use the BKL for anything, and doesn't implement any
ioctl command by itself, but only uses the generic scsi_cmd_ioctl
which is fine without the BKL.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# e4c4776d 08-Oct-2010 Mike Snitzer <snitzer@redhat.com>

virtio-blk: fix request leak.

Must drop reference taken by blk_make_request().

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org # .35.x
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 02c42b7a 03-Sep-2010 Tejun Heo <tj@kernel.org>

virtio_blk: drop REQ_HARDBARRIER support

Remove now unused REQ_HARDBARRIER support. virtio_blk already
supports REQ_FLUSH and the usefulness of REQ_FUA for virtio_blk is
questionable at this point, so there's nothing else to do to support
new REQ_FLUSH/FUA interface.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>


# 4913efe4 03-Sep-2010 Tejun Heo <tj@kernel.org>

block: deprecate barrier and replace blk_queue_ordered() with blk_queue_flush()

Barrier is deemed too heavy and will soon be replaced by FLUSH/FUA
requests. Deprecate barrier. All REQ_HARDBARRIERs are failed with
-EOPNOTSUPP and blk_queue_ordered() is replaced with simpler
blk_queue_flush().

blk_queue_flush() takes combinations of REQ_FLUSH and FUA. If a
device has write cache and can flush it, it should set REQ_FLUSH. If
the device can handle FUA writes, it should also set REQ_FUA.

All blk_queue_ordered() users are converted.

* ORDERED_DRAIN is mapped to 0 which is the default value.
* ORDERED_DRAIN_FLUSH is mapped to REQ_FLUSH.
* ORDERED_DRAIN_FLUSH_FUA is mapped to REQ_FLUSH | REQ_FUA.
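
The mapping in the list above can be written as a small lookup. The enum and
flag values below are hypothetical stand-ins chosen for this sketch; they are
not the kernel's definitions, and ordered_to_flush_flags() is not a kernel
function.

```c
/* Hypothetical stand-ins for the old ordered modes. */
enum ordered_mode {
    SKETCH_ORDERED_DRAIN,
    SKETCH_ORDERED_DRAIN_FLUSH,
    SKETCH_ORDERED_DRAIN_FLUSH_FUA,
};

/* Hypothetical stand-ins for the new flush flags. */
#define SKETCH_REQ_FLUSH (1u << 0)
#define SKETCH_REQ_FUA   (1u << 1)

/* Translate an old ordered mode into the flag set described above. */
static unsigned int ordered_to_flush_flags(enum ordered_mode mode)
{
    switch (mode) {
    case SKETCH_ORDERED_DRAIN_FLUSH:
        return SKETCH_REQ_FLUSH;
    case SKETCH_ORDERED_DRAIN_FLUSH_FUA:
        return SKETCH_REQ_FLUSH | SKETCH_REQ_FUA;
    case SKETCH_ORDERED_DRAIN:
    default:
        return 0;                   /* the default: no flush, no FUA */
    }
}
```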

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Pierre Ossman <drzeus@drzeus.cx>
Cc: Stefan Weinhuber <wein@de.ibm.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>


# 6958f145 03-Sep-2010 Tejun Heo <tj@kernel.org>

block: kill QUEUE_ORDERED_BY_TAG

Nobody is making meaningful use of ORDERED_BY_TAG now and queue
draining for barrier requests will be removed soon which will render
the advantage of tag ordering moot. Kill ORDERED_BY_TAG. The
following users are affected.

* brd: converted to ORDERED_DRAIN.
* virtio_blk: ORDERED_TAG path was already marked deprecated. Removed.
* xen-blkfront: ORDERED_TAG case dropped.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>


# 8a6cfeb6 08-Jul-2010 Arnd Bergmann <arnd@arndb.de>

block: push down BKL into .locked_ioctl

As a preparation for the removal of the big kernel
lock in the block layer, this removes the BKL
from the common ioctl handling code, moving it
into every single driver still using it.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>


# 00fff265 03-Jul-2010 FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>

block: remove q->prepare_flush_fn completely

This removes q->prepare_flush_fn completely (changes the
blk_queue_ordered API).

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>


# dd40e456 03-Jul-2010 FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>

virtio_blk: stop using q->prepare_flush_fn

use REQ_FLUSH flag instead.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>


# 15fa6e81 17-Jun-2010 Jens Axboe <jaxboe@fusionio.com>

virtio_blk: add default case to cmd type switch

On compilation, gcc correctly detects that we do not handle
all types:

In function ‘blk_done’:
warning: enumeration value ‘REQ_TYPE_FS’ not handled in switch
warning: enumeration value ‘REQ_TYPE_SENSE’ not handled in switch
warning: enumeration value ‘REQ_TYPE_PM_SUSPEND’ not handled in switch
warning: enumeration value ‘REQ_TYPE_PM_RESUME’ not handled in switch
warning: enumeration value ‘REQ_TYPE_PM_SHUTDOWN’ not handled in switch
warning: enumeration value ‘REQ_TYPE_LINUX_BLOCK’ not handled in switch
warning: enumeration value ‘REQ_TYPE_ATA_TASKFILE’ not handled in switch
warning: enumeration value ‘REQ_TYPE_ATA_PC’ not handled in switch

which is a bit pointless since this is at the end of the request
processing. Add a default case that just breaks out.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>


# 33659ebb 07-Aug-2010 Christoph Hellwig <hch@lst.de>

block: remove wrappers for request type/flags

Remove all the trivial wrappers for the cmd_type and cmd_flags fields in
struct requests. This allows much easier grepping for different request
types instead of unwinding through macros.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>


# 6c99a852 23-Jun-2010 Ryan Harper <ryanh@us.ibm.com>

virtio_blk: Remove VBID ioctl

With the availability of a sysfs device attribute for examining disk serial
numbers the ioctl is no longer needed. The user-space changes for this aren't
upstream yet so we don't have any users to worry about.

Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# a5eb9e4f 23-Jun-2010 Ryan Harper <ryanh@us.ibm.com>

virtio_blk: Add 'serial' attribute to virtio-blk devices (v2)

Create a new attribute for virtio-blk devices that will fetch the serial number
of the block device. This attribute can be used by udev to create disk/by-id
symlinks for devices that don't have a UUID (filesystem) associated with them.

ATA_IDENTIFY strings are special in that they can be up to 20 chars long
and aren't required to be nul-terminated. The buffer is also zero-padded
meaning that if the serial is 19 chars or less that we get a nul-terminated
string. When copying this value into a string buffer, we must be careful to
copy up to the nul (if it is present) and only 20 if it is longer and not to
attempt to nul terminate; this isn't needed.
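
The copy rule described above can be sketched in plain userspace C. This is
only an illustration under stated assumptions, not the driver's code:
SERIAL_ID_BYTES and copy_serial() are hypothetical names, and the 20-byte
field size is taken from the text above.

```c
#include <stddef.h>
#include <string.h>

#define SERIAL_ID_BYTES 20 /* hypothetical name; size taken from the text above */

/*
 * Copy a serial number out of a fixed-size, zero-padded ID field.
 * The field may hold exactly SERIAL_ID_BYTES characters with no
 * terminating nul, so never scan past SERIAL_ID_BYTES bytes.
 * No nul terminator is written to 'out'; the caller uses the
 * returned length instead.
 */
static size_t copy_serial(char *out, const char id[SERIAL_ID_BYTES])
{
	size_t len = 0;

	/* Stop at the first nul, or at the end of the field. */
	while (len < SERIAL_ID_BYTES && id[len] != '\0')
		len++;

	memcpy(out, id, len);
	return len;
}
```

Returning the length rather than terminating the destination mirrors the
"not to attempt to nul terminate" remark; a caller that wants a C string
would have to add the terminator itself.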

Changes since v1:
- Added BUILD_BUG_ON() for PAGE_SIZE check
- Removed min() since BUILD_BUG_ON() handles the check
- Replaced serial_sysfs() by copying id directly to buffer

Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Signed-off-by: john cooper <john.cooper@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 10bc310c 15-Jun-2010 Christoph Hellwig <hch@lst.de>

virtio_blk: support barriers without FLUSH feature

If we want to support barriers with the cache=writethrough mode in qemu
we need to tell the block layer that we only need queue drains to
implement a barrier. Follow the model set by SCSI and IDE and assume
that there is no volatile write cache if the host doesn't advertise it.
While this might imply working barriers on old qemu versions or other
hypervisors that actually have a volatile write cache this is only a
cosmetic issue - these hypervisors don't guarantee any data integrity
with or without this patch, but with the patch we at least provide
data ordering.
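
The decision described above can be sketched as a small userspace C
function. This is a hedged illustration, not the driver's code: the enum
values and pick_barrier_mode() are hypothetical stand-ins for whatever the
block layer of that era provided.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the block layer's barrier modes. */
enum barrier_mode {
	BARRIER_DRAIN,       /* queue drain only: no volatile write cache assumed */
	BARRIER_DRAIN_FLUSH, /* queue drain plus an explicit cache flush */
};

/*
 * Pick a barrier mode from whether the host advertised a flush feature.
 * Without the feature we assume there is no volatile write cache, so
 * draining the queue is enough to provide ordering.
 */
static enum barrier_mode pick_barrier_mode(bool host_has_flush)
{
	return host_has_flush ? BARRIER_DRAIN_FLUSH : BARRIER_DRAIN;
}
```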

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# a5b365a6 25-May-2010 Christoph Hellwig <hch@lst.de>

virtio-blk: fix minimum number of S/G elements

We need at least one S/G element to operate properly, as does the block
layer which increments it to one anyway. We hit this due to a qemu
bug which advertises a sg_elements of 0 under some circumstances.
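
A minimal sketch of the clamping rule, written as standalone C for
illustration. The function name and the have_value parameter are
assumptions of this sketch, not names taken from the driver.

```c
/*
 * Return the number of S/G elements to use.
 * 'have_value' is nonzero when the host supplied a value at all;
 * 'advertised' is that value. A missing or zero value falls back to 1,
 * since at least one element is needed to operate.
 */
static unsigned int effective_sg_elems(int have_value, unsigned int advertised)
{
	if (!have_value || advertised == 0)
		return 1;
	return advertised;
}
```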

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (tweaked logic)


# 09ec6b69 12-Apr-2010 Michael S. Tsirkin <mst@redhat.com>

virtio_blk: use virtqueue_xxx wrappers

Switch virtio_blk to new virtqueue_xxx wrappers.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# bdb4a130 19-May-2010 Rusty Russell <rusty@rustcorp.com.au>

virtio_blk: remove multichar constant.

drivers/block/virtio_blk.c:228:13: warning: multi-character character constant

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: john cooper <john.cooper@redhat.com>


# 234f2725 24-Mar-2010 john cooper <john.cooper@redhat.com>

Add virtio disk identification ioctl

Return serial string to the guest application via
ioctl driver call.

Note this form of interface to the guest userland
was the consensus when the prior version using
the ATA_IDENTIFY came under dispute.

Signed-off-by: john cooper <john.cooper@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 4cb2ea28 24-Mar-2010 john cooper <john.cooper@redhat.com>

Add virtio disk identification support

Add virtio-blk device id (s/n) support via virtio request.

Signed-off-by: john cooper <john.cooper@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 5a0e3ad6 24-Mar-2010 Tejun Heo <tj@kernel.org>

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>


# ee714f2d 09-Mar-2010 Martin K. Petersen <martin.petersen@oracle.com>

block: Finalize conversion of block limits functions

Remove compatibility wrappers and update remaining drivers.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


# 69740c8b 24-Feb-2010 Christoph Hellwig <hch@lst.de>

virtio_blk: add block topology support

Allow reading various alignment values from the config page. This
allows the guest to much better align I/O requests depending on the
storage topology.

Note that the formats for the config values appear a bit messed up,
but we follow the formats used by ATA and SCSI so they are expected in
the storage world.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 47483e25 10-Jan-2010 Márton Németh <nm127@freemail.hu>

block: make virtio device id constant

The id_table field of the struct virtio_driver is constant in <linux/virtio.h>
so it is worth making id_table constant as well.

The semantic match that finds this kind of pattern is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@r@
disable decl_init,const_decl_init;
identifier I1, I2, x;
@@
struct I1 {
...
const struct I2 *x;
...
};
@s@
identifier r.I1, y;
identifier r.x, E;
@@
struct I1 y = {
.x = E,
};
@c@
identifier r.I2;
identifier s.E;
@@
const struct I2 E[] = ... ;
@depends on !c@
identifier r.I2;
identifier s.E;
@@
+ const
struct I2 E[] = ...;
// </smpl>

Signed-off-by: Márton Németh <nm127@freemail.hu>
Cc: Julia Lawall <julia@diku.dk>
Cc: cocci@diku.dk
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


# 3225beab 22-Oct-2009 Rusty Russell <rusty@rustcorp.com.au>

virtio_blk: Revert serial number support

This reverts "Add serial number support for virtio_blk, V4a".

Turns out that virtio_pci, lguest and s/390 all have an 8 bit limit
on virtio config space, so no one could ever use this.

This is coming back later in a cleaner form.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: john cooper <john.cooper@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>


# e95646c3 30-Sep-2009 Christian Borntraeger <borntraeger@de.ibm.com>

virtio: let header files include virtio_ids.h

Rusty,

commit 3ca4f5ca73057a617f9444a91022d7127041970a
virtio: add virtio IDs file
moved all device IDs into a single file. While the change itself is
a very good one, it can break userspace applications. For example
if a userspace tool wanted to get the ID of virtio_net it used to
include virtio_net.h. This no longer works, since virtio_net.h
does not include virtio_ids.h.
This patch moves all "#include <linux/virtio_ids.h>" from the C
files into the header files, making the header files compatible with
the old ones.

In addition, this patch exports virtio_ids.h to userspace.

CC: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# f8b12e51 04-Sep-2009 Christoph Hellwig <hch@lst.de>

virtio_blk: revert QUEUE_FLAG_VIRT addition

It seems like the addition of QUEUE_FLAG_VIRT causes major performance
regressions for Fedora users:

https://bugzilla.redhat.com/show_bug.cgi?id=509383
https://bugzilla.redhat.com/show_bug.cgi?id=505695

while I can't reproduce those extreme regressions myself I think the flag
is wrong.

Rationale:

QUEUE_FLAG_VIRT expands to QUEUE_FLAG_NONROT, which causes the queue to be
unplugged immediately. This is not good behaviour for at least
qemu and kvm, where we do have significant overhead for every
I/O operation. Even with all the latest speedups (native AIO,
MSI support, zero copy) we can only get native speed for up to 128kb
I/O requests; we are already down to 66% of native performance for 4kb
requests even on my laptop running the Intel X25-M SSD for which
QUEUE_FLAG_NONROT was designed.
If we ever get virtio-blk overhead low enough that this flag makes
sense it should only be set based on a feature flag set by the host.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# f1b0ef06 17-Sep-2009 Christoph Hellwig <hch@lst.de>

virtio_blk: add support for cache flush

Recent qemu has added a VIRTIO_BLK_F_FLUSH flag to advertise that the
virtual disk has a volatile write cache that needs to be flushed. If
we see this feature, tell the Linux block layer about it
and use the new VIRTIO_BLK_T_FLUSH to flush the cache when required. This
allows for a correct and simple implementation of write barriers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 3ca4f5ca 31-Jul-2009 Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>

virtio: add virtio IDs file

Virtio IDs are spread all over the tree which makes assigning new IDs
bothersome. Putting them together should make the process less error-prone.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 3c1b27d5 23-Sep-2009 Rusty Russell <rusty@rustcorp.com.au>

virtio: make add_buf return capacity remaining

This API change means that virtio_net can tell how much capacity
remains for buffers. It's necessarily fuzzy, since
VIRTIO_RING_F_INDIRECT_DESC means we can fit any number of descriptors
in one, *if* we can kmalloc.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dinesh Subhraveti <dineshs@us.ibm.com>


# 83d5cde4 21-Sep-2009 Alexey Dobriyan <adobriyan@gmail.com>

const: make block_device_operations const

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4fbfff76 17-Jul-2009 Rakib Mullick <rakib.mullick@gmail.com>

virtio_blk: mark virtio_blk with __refdata to kill spurious section mismatch

The variable virtio_blk references the function virtblk_probe() (which
is in .devinit section) and also references the function
virtblk_remove() ( which is in .devexit section). So, virtio_blk
simultaneously refers .devinit and .devexit section. To avoid this
messup, we mark virtio_blk as __refdata.

We were warned by the following warning:

LD drivers/block/built-in.o
WARNING: drivers/block/built-in.o(.data+0xc8dc): Section mismatch in
reference from the variable virtio_blk to the function
.devinit.text:virtblk_probe()
The variable virtio_blk references
the function __devinit virtblk_probe()
If the reference is valid then annotate the
variable with __init* or __refdata (see linux/init.h) or name the variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console,

WARNING: drivers/block/built-in.o(.data+0xc8e0): Section mismatch in
reference from the variable virtio_blk to the function
.devexit.text:virtblk_remove()
The variable virtio_blk references
the function __devexit virtblk_remove()
If the reference is valid then annotate the
variable with __exit* (see linux/init.h) or name the variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console,

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# d9ecdea7 20-Jun-2009 Christoph Hellwig <hch@lst.de>

virtio_blk: ioctl return value fix

Block driver ioctl methods must return -ENOTTY and not -ENOIOCTLCMD if
they expect the block layer to handle generic ioctls.

This triggered a BLKROSET failure in xfsqa #200.
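
The effect of the return value can be shown with a toy model in userspace
C. This is a sketch of the dispatch the message implies, not the block
layer's actual code: every toy_* name, TOY_CMD_GENERIC and the fallback
logic are assumptions made for illustration. TOY_ENOIOCTLCMD stands in for
the kernel-internal ENOIOCTLCMD, which <errno.h> does not provide.

```c
#include <errno.h>

#define TOY_ENOIOCTLCMD 515 /* stand-in for the kernel-internal ENOIOCTLCMD */
#define TOY_CMD_GENERIC 1   /* hypothetical command the generic layer handles */

/*
 * A driver ioctl handler that recognises no commands. 'fixed' selects
 * between the return value before the fix (0) and after it (nonzero).
 */
static int toy_driver_ioctl(int fixed, unsigned int cmd)
{
	(void)cmd;
	return fixed ? -ENOTTY : -TOY_ENOIOCTLCMD;
}

/*
 * Toy generic layer: it only tries its own handlers when the driver
 * answers -ENOTTY. Any other value is treated as the final result.
 */
static int toy_block_ioctl(int fixed, unsigned int cmd)
{
	int ret = toy_driver_ioctl(fixed, cmd);

	if (ret != -ENOTTY)
		return ret;
	return cmd == TOY_CMD_GENERIC ? 0 : -ENOTTY;
}
```

In this model the generic command succeeds only when the driver returns
-ENOTTY; with the old return value the caller sees the driver's error
instead.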

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 4eff3cae 17-Jul-2009 Christoph Hellwig <hch@lst.de>

virtio_blk: don't bounce highmem requests

By default a block driver bounces highmem requests, but virtio-blk is
perfectly fine with any request that fits into its 64-bit addressing scheme,
mapped in the kernel virtual space or not.

Besides improving performance on highmem systems this also makes the
reproducible oops in __bounce_end_io go away (but hiding the real cause).

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 98e94444 18-May-2009 Mike Frysinger <vapier@gentoo.org>

virtio_blk: add missing __dev{init,exit} markings

The remove member of the virtio_driver structure uses __devexit_p(), so
the remove function itself should be marked with __devexit. And where
there be __devexit on the remove, so is there __devinit on the probe.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# d2a7ddda 12-Jun-2009 Michael S. Tsirkin <mst@redhat.com>

virtio: find_vqs/del_vqs virtio operations

This replaces find_vq/del_vq with find_vqs/del_vqs virtio operations,
and updates all drivers. This is needed for MSI support, because MSI
needs to know the total number of vectors upfront.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (+ lguest/9p compile fixes)


# 9499f5e7 12-Jun-2009 Rusty Russell <rusty@rustcorp.com.au>

virtio: add names to virtqueue struct, mapping from devices to queues.

Add a linked list of all virtqueues for a virtio device: this helps for
debugging and is also needed for upcoming interface change.

Also, add a "name" field for clearer debug messages.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 1d589bb1 09-Jun-2009 john cooper <john.cooper@redhat.com>

Add serial number support for virtio_blk, V4a

This patch extracts the opaque data from pci i/o
region 0 via the added VIRTIO_BLK_F_IDENTIFY
field. By convention this data takes the form of
that returned by an ATA IDENTIFY DEVICE command,
however the driver (except for structure size)
makes no interpretation of the data. The structure
data is copied wholesale to userspace via a
HDIO_GET_IDENTITY ioctl command (eg: hdparm -i <dev>).

Signed-off-by: john cooper <john.cooper@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


# e1defc4f 22-May-2009 Martin K. Petersen <martin.petersen@oracle.com>

block: Do away with the notion of hardsect_size

Until now we have had a 1:1 mapping between storage device physical
block size and the logical block size used when addressing the device.
With SATA 4KB drives coming out that will no longer be the case. The
sector size will be 4KB but the logical block size will remain
512-bytes. Hence we need to distinguish between the physical block size
and the logical ditto.

This patch renames hardsect_size to logical_block_size.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


# f831cc03 18-May-2009 Jens Axboe <jens.axboe@oracle.com>

virtio_blk: get rid of unused variable

drivers/block/virtio_blk.c: In function 'blk_done':
drivers/block/virtio_blk.c:53: warning: unused variable 'nr_bytes'

Leftover from commit 1cde26f928863d90e9e7c1217880c8450464d305

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


# 1cde26f9 18-May-2009 Hannes Reinecke <hare@suse.de>

virtio_blk: SG_IO passthru support

Add support for SG_IO passthru to virtio_blk. We add the scsi command
block after the normal outhdr, and the scsi inhdr with full status
information as well as the sense buffer before the regular inhdr.

[hch: forward ported, added the VIRTIO_BLK_F_SCSI flags, some comments
and tested the whole beast]
[axboe: updated to use ->resid and not dual-path the byte count]

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (+ checkpatch.pl tweak)
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


# 6c3b46f7 18-May-2009 Christoph Hellwig <hch@lst.de>

virtio_blk: don't blindly dereference req->rq_disk

request->rq_disk is only set for FS requests or BLOCK_PC requests
originating from the generic block layer scsi ioctls. It's not set
for requests originating from other sources or internal cache flush
commands implemented by the patch I'll send after this.

So instead of using it to get at the private data in do_virtblk_request
setup queue->queuedata and use it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


# 9934c8c0 07-May-2009 Tejun Heo <tj@kernel.org>

block: implement and enforce request peek/start/fetch

Till now block layer allowed two separate modes of request execution.
A request is always acquired from the request queue via
elv_next_request(). After that, drivers are free to either dequeue it
or process it without dequeueing. Dequeue allows elv_next_request()
to return the next request so that multiple requests can be in flight.

Executing requests without dequeueing has its merits mostly in
allowing drivers for simpler devices which can't do sg to deal with
segments only without considering request boundary. However, the
benefit this brings is dubious and declining while the cost of the API
ambiguity is increasing. Segment based drivers are usually for very
old or limited devices and as converting to dequeueing model isn't
difficult, it doesn't justify the API overhead it puts on block layer
and its more modern users.

Previous patches converted all block low level drivers to dequeueing
model. This patch completes the API transition by...

* renaming elv_next_request() to blk_peek_request()

* renaming blkdev_dequeue_request() to blk_start_request()

* adding blk_fetch_request() which is combination of peek and start

* disallowing completion of queued (not started) requests

* applying new API to all LLDs

Renamings are for consistency and to break out of tree code so that
it's apparent that out of tree drivers need updating.
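
The relationship between the three operations can be illustrated with a
toy queue in userspace C. This is a sketch only: the toy_* types and
functions are hypothetical and do not reproduce the block layer's data
structures or signatures.

```c
#include <stddef.h>

#define TOY_QUEUE_MAX 8

struct toy_request {
	int id;
	int started; /* nonzero once the request has been dequeued */
};

struct toy_queue {
	struct toy_request *slots[TOY_QUEUE_MAX];
	size_t head;  /* index of the next request */
	size_t count; /* number of queued (not started) requests */
};

/* Peek: look at the next request without removing it from the queue. */
static struct toy_request *toy_peek(struct toy_queue *q)
{
	return q->count ? q->slots[q->head] : NULL;
}

/* Start: dequeue the request at the head and mark it as started. */
static void toy_start(struct toy_queue *q, struct toy_request *rq)
{
	q->head = (q->head + 1) % TOY_QUEUE_MAX;
	q->count--;
	rq->started = 1;
}

/* Fetch: the combination of peek and start. */
static struct toy_request *toy_fetch(struct toy_queue *q)
{
	struct toy_request *rq = toy_peek(q);

	if (rq)
		toy_start(q, rq);
	return rq;
}

/* Completion is refused (-1) for requests that were never started. */
static int toy_complete(const struct toy_request *rq)
{
	return rq->started ? 0 : -1;
}
```

Peeking twice returns the same request, while fetching advances the queue
so that a later peek sees the following request.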

[ Impact: block request issue API cleanup, no functional change ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Mike Miller <mike.miller@hp.com>
Cc: unsik Kim <donari75@gmail.com>
Cc: Paul Clements <paul.clements@steeleye.com>
Cc: Tim Waugh <tim@cyberelk.net>
Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Laurent Vivier <Laurent@lvivier.info>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Adrian McMenamin <adrian@mcmen.demon.co.uk>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Borislav Petkov <petkovbb@googlemail.com>
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: Pierre Ossman <drzeus@drzeus.cx>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Markus Lidel <Markus.Lidel@shadowconnect.com>
Cc: Stefan Weinhuber <wein@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Pete Zaitcev <zaitcev@redhat.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


# 83096ebf 07-May-2009 Tejun Heo <tj@kernel.org>

block: convert to pos and nr_sectors accessors

With recent cleanups, there is no place where low level driver
directly manipulates request fields. This means that the 'hard'
request fields always equal the !hard fields. Convert all
rq->sectors, nr_sectors and current_nr_sectors references to
accessors.

While at it, drop superfluous blk_rq_pos() < 0 test in swim.c.

[ Impact: use pos and nr_sectors accessors ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Tested-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Tested-by: Adrian McMenamin <adrian@mcmen.demon.co.uk>
Acked-by: Adrian McMenamin <adrian@mcmen.demon.co.uk>
Acked-by: Mike Miller <mike.miller@hp.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Borislav Petkov <petkovbb@googlemail.com>
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Eric Moore <Eric.Moore@lsi.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Pete Zaitcev <zaitcev@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Paul Clements <paul.clements@steeleye.com>
Cc: Tim Waugh <tim@cyberelk.net>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Dario Ballabio <ballabio_dario@emc.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: unsik Kim <donari75@gmail.com>
Cc: Laurent Vivier <Laurent@lvivier.info>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


# 40cbbb78 22-Apr-2009 Tejun Heo <tj@kernel.org>

block: implement and use [__]blk_end_request_all()

There are many [__]blk_end_request() call sites which call it with
full request length and expect full completion. Many of them ensure
that the request actually completes by doing BUG_ON() the return
value, which is awkward and error-prone.

This patch adds [__]blk_end_request_all() which takes @rq and @error
and fully completes the request. BUG_ON() is added to ensure that
this actually happens.

Most conversions are simple but there are a few noteworthy ones.

* cdrom/viocd: viocd_end_request() replaced with direct calls to
__blk_end_request_all().

* s390/block/dasd: dasd_end_request() replaced with direct calls to
__blk_end_request_all().

* s390/char/tape_block: tapeblock_end_request() replaced with direct
calls to blk_end_request_all().

[ Impact: cleanup ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Mike Miller <mike.miller@hp.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>


# b194aee9 26-Nov-2008 Randy Dunlap <randy.dunlap@oracle.com>

virtio_blk: fix type warning

Fix parameter type warning:

linux-next-20081126/drivers/block/virtio_blk.c:307: warning: large integer implicitly truncated to unsigned type

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 0864b79a 30-Dec-2008 Rusty Russell <rusty@rustcorp.com.au>

virtio: block: dynamic maximum segments

Enhance the driver to handle whatever maximum segment number the host
tells us to handle. To do this, we need to allocate the scatterlist
dynamically.

We set max_phys_segments and max_hw_segments to the same value (1 if
the host doesn't tell us, since that's safest and all known hosts do
tell us).

Note that kmalloc'ing the structure for large sg_elems might be
problematic: the fix for this is sg_table, but that requires more
work.
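
One way to size a scatterlist at allocation time is a flexible array
member, sketched here in userspace C with calloc() in place of kmalloc().
The toy_* names are hypothetical, and treating 0 as "the host did not tell
us" is an assumption of this sketch rather than the driver's convention.

```c
#include <stdlib.h>

/* Hypothetical stand-in for one scatterlist entry. */
struct toy_sg_entry {
	void *addr;
	unsigned int len;
};

struct toy_blk {
	unsigned int sg_elems;
	struct toy_sg_entry sg[]; /* flexible array, sized at allocation time */
};

/*
 * Allocate the device structure together with 'sg_elems' entries.
 * A value of 0 falls back to a single entry, the safest choice.
 * Returns NULL on allocation failure; release with free().
 */
static struct toy_blk *toy_blk_alloc(unsigned int sg_elems)
{
	struct toy_blk *blk;

	if (sg_elems == 0)
		sg_elems = 1;

	blk = calloc(1, sizeof(*blk) + sg_elems * sizeof(blk->sg[0]));
	if (blk)
		blk->sg_elems = sg_elems;
	return blk;
}
```

A single allocation like this grows with sg_elems, which is the concern
the note above raises for large values.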

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 4b7f7e20 30-Dec-2008 Rusty Russell <rusty@rustcorp.com.au>

virtio: set max_segment_size and max_sectors to infinite.

Setting max_segment_size allows more than 64k per sg element, unless
the host specified a limit. Setting max_sectors indicates that our
max_hw_segments is the only limit.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 7d116b62 27-Oct-2008 Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp>

virtio_blk: set queue paravirt flag

As a paravirt front-end driver, virtio_blk is not a rotational device so
we want to avoid idling in AS/CFQ. Tell the block layer about this.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


# 4e109852 02-Mar-2008 Al Viro <viro@zeniv.linux.org.uk>

[PATCH] switch virtio_blk

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# d4430d62 02-Mar-2008 Al Viro <viro@zeniv.linux.org.uk>

[PATCH] beginning of methods conversion

To keep the size of changesets sane we split the switch by drivers;
to keep the damn thing bisectable we do the following:
1) rename the affected methods, add ones with correct
prototypes, make (few) callers handle both. That's this changeset.
2) for each driver convert to new methods. *ALL* drivers
are converted in this series.
3) kill the old (renamed) methods.

Note that it _is_ a flagday; all in-tree drivers are converted and by the
end of this series no trace of old methods remain. The only reason why
we do that this way is to keep the damn thing bisectable and allow per-driver
debugging if anything goes wrong.

New methods:
open(bdev, mode)
release(disk, mode)
ioctl(bdev, mode, cmd, arg) /* Called without BKL */
compat_ioctl(bdev, mode, cmd, arg)
locked_ioctl(bdev, mode, cmd, arg) /* Called with BKL, legacy */

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 74f3c8af 27-Aug-2007 Al Viro <viro@zeniv.linux.org.uk>

[PATCH] switch scsi_cmd_ioctl() to passing fmode_t

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 8316982a 01-Oct-2008 Kiyoshi Ueda <k-ueda@ct.jp.nec.com>

virtio_blk: change to use __blk_end_request()

This patch converts virtio_blk to use __blk_end_request() directly
so that end_{queued|dequeued}_request() can be removed.
Related 'uptodate' argument is converted to 'error'.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


# 766ca442 14-Aug-2008 Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp>

virtio_blk: use a wrapper function to access io context information of IO requests

struct request has an ioprio member but it is never updated because
currently bios do not hold io context information. The implication of
this is that virtio_blk ends up passing useless information to the
backend driver.

That said, some IO schedulers such as CFQ do store io context
information in struct request, but use private members for that, which
means that that information cannot be directly accessed in an IO
scheduler-independent way.

This patch adds a function to obtain the ioprio of a request. We should
avoid accessing ioprio directly and use this function instead, so that
its users do not have to care about future changes in block layer
structures or what the currently active IO controller is.

This patch does not introduce any functional changes but paves the way
for future clean-ups and enhancements.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


# 066f4d82 29-May-2008 Christian Borntraeger <borntraeger@de.ibm.com>

virtio_blk: check for hardsector size from host

Currently virtio_blk assumes a 512 byte hard sector size. This can cause
trouble / performance issues if the backing has a different block size
(like a file on an ext3 file system formatted with 4k block size or a dasd).

Let's add a feature flag that tells the guest to use a different hard sector
size than 512 byte.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 3ef53609 16-May-2008 Christian Borntraeger <borntraeger@de.ibm.com>

virtio_blk: allow read-only disks

Hello Rusty,

sometimes it is useful to share a disk (e.g. usr). To avoid file system
corruption, the disk should be mounted read-only in that case. This patch
adds a new feature flag that allows the host to specify whether the disk should
be considered read-only.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# ac9d463a 30-May-2008 Chris Lalancette <clalance@redhat.com>

Fix crash in virtio_blk during modprobe ; rmmod ; modprobe

Fix a modprobe virtio_blk ; rmmod virtio_blk ; modprobe virtio_blk crash; this
was basically because we weren't doing "del_gendisk()" in the remove path.

Signed-off-by: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (moved del_gendisk up)


# 48e4043d 16-Apr-2008 Ryan Harper <ryanh@us.ibm.com>

virtio: add virtio disk geometry feature

Rather than faking up some geometry, allow the backend to push the disk
geometry via virtio pci config option. Keep the old geo code around for
compatibility.

Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (modified to single struct)


# c45a6816 02-May-2008 Rusty Russell <rusty@rustcorp.com.au>

virtio: explicit advertisement of driver features

A recent proposed feature addition to the virtio block driver revealed
some flaws in the API: in particular, we assume that feature
negotiation is complete once a driver's probe function returns.

There is nothing in the API to require this, however, and even I
didn't notice when it was violated.

So instead, we require the driver to specify what features it supports
in a table; we can then move the feature negotiation into the virtio
core. The intersection of device and driver features is presented in
a new 'features' bitmap in the struct virtio_device.

Note that this highlights the difference between Linux unsigned-long
bitmaps where each unsigned long is in native endian, and a
straight-forward little-endian array of bytes.
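The difference between the two bitmap layouts can be illustrated with a
small sketch. This is not the virtio core's code: the names
demo_le_test_bit() and demo_negotiate() are hypothetical, and the driver
"feature table" is assumed here to be a plain array of bit numbers.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BITS_PER_ULONG (CHAR_BIT * sizeof(unsigned long))

/* Test a bit in a little-endian array of bytes: bit 0 is the LSB of byte 0. */
static bool demo_le_test_bit(const uint8_t *bytes, size_t nbytes,
			     unsigned int bit)
{
	if (bit / 8 >= nbytes)
		return false;
	return (bytes[bit / 8] >> (bit % 8)) & 1;
}

/*
 * Hypothetical negotiation: for every bit the driver lists in its table,
 * set it in a Linux-style unsigned-long bitmap if the device offers it.
 * The result is the intersection of device and driver features.
 */
static void demo_negotiate(const uint8_t *dev, size_t dev_len,
			   const unsigned int *table, size_t table_len,
			   unsigned long *out, size_t out_words)
{
	size_t i;

	for (i = 0; i < table_len; i++) {
		unsigned int bit = table[i];

		if (bit / DEMO_BITS_PER_ULONG >= out_words)
			continue;
		if (demo_le_test_bit(dev, dev_len, bit))
			out[bit / DEMO_BITS_PER_ULONG] |=
				1UL << (bit % DEMO_BITS_PER_ULONG);
	}
}
```

Indexing the byte array by bit / 8 gives the same answer on every host,
whereas the position of a given bit inside an unsigned long in memory
depends on the native byte order and word size.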

Drivers can still remove feature bits in their probe routine if they
really have to.

API changes:
- dev->config->feature() no longer gets and acks a feature.
- drivers should advertise their features in the 'feature_table' field
- use virtio_has_feature() for extra sanity when checking feature bits

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 72e61eb4 02-May-2008 Rusty Russell <rusty@rustcorp.com.au>

virtio: change config to guest endian.

A recent proposed feature addition to the virtio block driver revealed
some flaws in the API, in particular how easy it is to break big
endian machines.

The virtio config space was originally chosen to be little-endian,
because we thought the config might be part of the PCI config space
for virtio_pci. It's actually a separate mmio region, so that
argument holds little water; as only x86 is currently using the virtio
mechanism, we can change this (but must do so now, before the
impending s390 merge).

API changes:
- __virtio_config_val() just becomes a straight vdev->config_get() call.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 2e895e4c 24-Apr-2008 Marcelo Tosatti <mtosatti@redhat.com>

virtio-blk: fix remove oops

Do not unregister the major at device remove, since there might be
other device instances around.

(qemu) pci_del 0 11
(qemu) ACPI: PCI interrupt for device 0000:00:0b.0 disabled
(qemu) pci_del 0 10
(qemu) ------------[ cut here ]------------
WARNING: at block/genhd.c:126 unregister_blkdev+0x74/0x9e()
ACPI: PCI interrupt for device 0000:00:0a.0 disabled

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# cb38fa23 02-May-2008 Rusty Russell <rusty@rustcorp.com.au>

virtio: de-structify virtio_block status byte

Ron Minnich points out that a struct containing a char is not always
sizeof(char); simplest to remove the structure to avoid confusion.

Cc: "ron minnich" <rminnich@gmail.com>

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# c4839346 02-Mar-2008 Jeremy Katz <katzj@redhat.com>

virtio: Fix sysfs bits to have proper block symlink

Fix up so that the virtio_blk devices in sysfs link correctly to their
block device. This then allows them to be detected by hal, etc.

Signed-off-by: Jeremy Katz <katzj@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# d50ed907 01-Feb-2008 Christian Borntraeger <borntraeger@de.ibm.com>

virtio_blk: implement naming for vda-vdz,vdaa-vdzz,vdaaa-vdzzz

On Friday, 1 February 2008, Christian Borntraeger wrote:
> Right. I will fix that with an additional patch.

This patch goes on top of the minor number patch. Please let me know if
you want a merged patch:

Currently virtio_blk creates the disk name combining "vd" with 'a'++.
This will give strange names after vdz. I have implemented names up to
vdzzz - inspired by the sd.c code. That should be sufficient for now.
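A minimal sketch of such a naming scheme, assuming the sd.c-style
"letters as digits" approach mentioned above. The helper name
vd_format_name() and its signature are hypothetical and not taken from
the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the driver's actual function): format the name
 * for disk number `index` as "vd" plus a letter suffix, so that
 * 0 -> "vda", 25 -> "vdz", 26 -> "vdaa", 701 -> "vdzz", 702 -> "vdaaa".
 * Returns 0 on success, -1 if the index is negative or the name does not
 * fit in buf.
 */
static int vd_format_name(int index, char *buf, size_t buflen)
{
	char suffix[8];
	size_t pos = sizeof(suffix);
	size_t len;

	if (index < 0)
		return -1;

	/* Emit letters from the last one backwards. */
	do {
		if (pos == 0)
			return -1;
		suffix[--pos] = (char)('a' + index % 26);
		index = index / 26 - 1;
	} while (index >= 0);

	len = sizeof(suffix) - pos;
	if (buflen < 2 + len + 1)
		return -1;
	memcpy(buf, "vd", 2);
	memcpy(buf + 2, suffix + pos, len);
	buf[2 + len] = '\0';
	return 0;
}
```

The "- 1" after each division is what makes "vdaa" follow "vdz" directly
instead of skipping to "vdba", which a plain base-26 conversion would do.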

There is one driver in the kernel (drivers/s390/block/dasd_genhd.c) that
implements names from dasda-dasdzzzz allowing even more disks. Maybe
a janitor can come up with a common implementation usable for all kinds
of block device drivers.

I have tested this patch with 100 disks - seems to work.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 4f3bf19c 31-Jan-2008 Christian Borntraeger <borntraeger@de.ibm.com>

virtio_blk: Dont waste major numbers

Rusty,

currently virtio_blk uses one major number per device. While this works
quite well on most systems it is wasteful and will exhaust major numbers
on larger installations.

This patch allocates a major number on init and will use 16 minor numbers
for each disk. That will allow ~64k virtio_blk disks.
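The "~64k" figure follows from simple arithmetic, sketched below under
the assumption that minor numbers are 20 bits wide (as in the Linux
dev_t encoding) and that 16 minors per disk means 4 bits are reserved
for partitions. The DEMO_ names are hypothetical.

```c
/* Assumption: 20-bit minor numbers, 16 (= 1 << 4) minors per disk. */
#define DEMO_MINOR_BITS 20
#define DEMO_PART_BITS  4

/* First minor number used by disk number `index`. */
static unsigned int demo_index_to_minor(unsigned int index)
{
	return index << DEMO_PART_BITS;
}

/* Number of disks that fit under a single major number. */
static unsigned int demo_max_disks(void)
{
	return 1u << (DEMO_MINOR_BITS - DEMO_PART_BITS);
}
```

With these assumptions, 2^20 / 16 = 65536 disks share one major number.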

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 135da0b0 23-Jan-2008 Christian Borntraeger <borntraeger@de.ibm.com>

virtio_blk: provide getgeo

Rusty,

I am currently trying to make my guest boot from a virtio root device
without having an external kernel. Some of the tools that I tried
expect HDIO_GETGEO to work. The most interesting value is likely
the geo.start value to get the offset of a partition. This value
is filled by block/ioctl.c if fops->getgeo is set. This patch also
fills in some standard values for heads, sectors and cylinders.
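As a rough sketch of what "standard values" could look like: the
64-head, 32-sector convention used below is an assumption for
illustration, not a value quoted from the patch, and struct demo_geo and
demo_fill_geo() are hypothetical stand-ins for the kernel's geometry
structure and getgeo hook. The clamp on cylinders is also an addition of
this sketch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's geometry structure. */
struct demo_geo {
	unsigned char heads;
	unsigned char sectors;
	unsigned short cylinders;
};

/* Fill in a synthetic geometry for a disk of `capacity` 512-byte sectors. */
static void demo_fill_geo(uint64_t capacity, struct demo_geo *geo)
{
	uint64_t cylinders;

	geo->heads = 64;	/* assumed value */
	geo->sectors = 32;	/* assumed value */

	/* 64 heads * 32 sectors = 2048 sectors per cylinder. */
	cylinders = capacity / (64 * 32);

	/* Clamp so that large disks do not wrap the 16-bit field. */
	geo->cylinders = cylinders > 0xFFFF ? 0xFFFF
					    : (unsigned short)cylinders;
}
```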

Makes sense?

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 6e5aa7ef 04-Feb-2008 Rusty Russell <rusty@rustcorp.com.au>

virtio: reset function

A reset function solves three problems:

1) It allows us to renegotiate features, eg. if we want to upgrade a
guest driver without rebooting the guest.

2) It gives us a clean way of shutting down virtqueues: after a reset,
we know that the buffers won't be used by the host, and

3) It helps the guest recover from messed-up drivers.

So we remove the ->shutdown hook, and the only way we now remove
feature bits is via reset.

We leave it to the driver to do the reset before it deletes queues:
the balloon driver, for example, needs to chat to the host in its
remove function.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 18445c4d 04-Feb-2008 Rusty Russell <rusty@rustcorp.com.au>

virtio: explicit enable_cb/disable_cb rather than callback return.

It seems that virtio_net wants to disable callbacks (interrupts) before
calling netif_rx_schedule(), so we can't use the return value to do so.

Rename "restart" to "cb_enable" and introduce "cb_disable" hook: callback
now returns void, rather than a boolean.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# a586d4f6 04-Feb-2008 Rusty Russell <rusty@rustcorp.com.au>

virtio: simplify config mechanism.

Previously we used a type/len pair within the config space, but this
seems overkill. We now simply define a structure which represents the
layout in the config space: the config space can now only be extended
at the end.

The main driver-visible changes:
1) We indicate what fields are present with an explicit feature bit.
2) Virtqueues are explicitly numbered, and not in the config space.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 74b2553f 19-Nov-2007 Rusty Russell <rusty@rustcorp.com.au>

virtio: fix module/device unloading

The virtio code never hooked through the ->remove callback. Although
no one supports device removal at the moment, this code is already
needed for module unloading.

This of course also revealed bugs in virtio_blk, virtio_net and lguest
unloading paths.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 3d1266c7 24-Oct-2007 Jens Axboe <jens.axboe@oracle.com>

SG: audit of drivers that use blk_rq_map_sg()

They need to properly init the sg table, or blk_rq_map_sg() will
complain if CONFIG_DEBUG_SG is set.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


# e467cde2 21-Oct-2007 Rusty Russell <rusty@rustcorp.com.au>

Block driver using virtio.

The block driver uses scatter-gather lists with sg[0] being the
request information (struct virtio_blk_outhdr) with the type, sector
and inbuf id. The next N sg entries are the bio itself, then the last
sg is the status byte. Whether the N entries are in or out depends on
whether it's a read or a write.
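The layout described above can be sketched as follows. This is an
illustration only: struct demo_outhdr stands in for struct
virtio_blk_outhdr with assumed field widths, and struct demo_seg and
demo_layout() are hypothetical rather than the kernel's scatterlist API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the request header; field widths are assumptions. */
struct demo_outhdr {
	uint32_t type;
	uint64_t sector;
	uint64_t id;
};

/* Hypothetical segment descriptor (address plus length). */
struct demo_seg {
	void *addr;
	size_t len;
};

/*
 * Lay out one request: header first, then the data segments, then the
 * status byte. Reports how many segments the host reads (*out) and how
 * many it writes (*in). Returns the total segment count, or 0 if sg is
 * too small.
 */
static size_t demo_layout(struct demo_seg *sg, size_t max_segs,
			  struct demo_outhdr *hdr,
			  const struct demo_seg *data, size_t ndata,
			  uint8_t *status, bool is_write,
			  size_t *out, size_t *in)
{
	size_t i, n = 0;

	if (ndata + 2 > max_segs)
		return 0;

	sg[n].addr = hdr;		/* sg[0]: request information */
	sg[n].len = sizeof(*hdr);
	n++;

	for (i = 0; i < ndata; i++)	/* next N entries: the data */
		sg[n++] = data[i];

	sg[n].addr = status;		/* last entry: status byte */
	sg[n].len = sizeof(*status);
	n++;

	/* The header is always out and the status is always in. */
	*out = 1 + (is_write ? ndata : 0);
	*in = n - *out;
	return n;
}
```

For a write the data segments count as "out" (read by the host); for a
read they count as "in" (filled by the host), which is the only
difference between the two cases.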

We accept the normal (SCSI) ioctls: they get handed through to the other
side which can then handle it or reply that it's unsupported. It's
not clear that this actually works in general, since I don't know
if blk_pc_request() requests have an accurate rq_data_dir().

Although we try to reply -ENOTTY on unsupported commands, ioctl(fd,
CDROMEJECT) returns success to userspace. This needs a separate
patch.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <jens.axboe@oracle.com>