History log of /linux-master/drivers/scsi/sd.h
Revision Date Author Comments
# 4f53138f 30-Jan-2024 Bart Van Assche <bvanassche@acm.org>

scsi: sd: Translate data lifetime information

Recently T10 standardized SBC constrained streams. This mechanism allows to
pass data lifetime information to SCSI devices in the group number field.
Add support for translating write hint information into a permanent stream
number in the sd driver. Use WRITE(10) instead of WRITE(6) if data lifetime
information is present because the WRITE(6) command does not have a GROUP
NUMBER field.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240130214911.1863909-12-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# 96b171d6 30-Jan-2024 Bart Van Assche <bvanassche@acm.org>

scsi: core: Query the Block Limits Extension VPD page

Parse the Reduced Stream Control Supported (RSCS) bit from the block limits
extension VPD page. The RSCS bit is defined in SBC-5 r05
(https://www.t10.org/cgi-bin/ac.pl?t=f&f=sbc5r05.pdf).

Reviewed-by: Avri Altman <avri.altman@wdc.com>
Reviewed-by: Daejun Park <daejun7.park@samsung.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240130214911.1863909-10-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# 99398d20 08-Sep-2023 Damien Le Moal <dlemoal@kernel.org>

scsi: sd: Do not issue commands to suspended disks on shutdown

If an error occurs when resuming a host adapter before the devices
attached to the adapter are resumed, the adapter low level driver may
remove the scsi host, resulting in a call to sd_remove() for the
disks of the host. This in turn results in a call to sd_shutdown() which
will issue a synchronize cache command and a start stop unit command to
spindown the disk. sd_shutdown() issues the commands only if the device
is not already runtime suspended but does not check the power state for
system-wide suspend/resume. That is, the commands may be issued with the
device in a suspended state, which causes PM resume to hang, forcing a
reset of the machine to recover.

Fix this by tracking the suspended state of a disk by introducing the
suspended boolean field in the scsi_disk structure. This flag is set to
true when the disk is suspended is sd_suspend_common() and resumed with
sd_resume(). When suspended is true, sd_shutdown() is not executed from
sd_remove().

Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>


# 785538bf 16-Aug-2022 Bart Van Assche <bvanassche@acm.org>

scsi: sd: Revert "Rework asynchronous resume support"

Although commit 88f1669019bd ("scsi: sd: Rework asynchronous resume support")
eliminates a delay for some ATA disks after resume, it causes resume of ATA
disks to fail on other setups. See also:

* "Resume process hangs for 5-6 seconds starting sometime in 5.16"
(https://bugzilla.kernel.org/show_bug.cgi?id=215880).

* Geert's regression report
(https://lore.kernel.org/linux-scsi/alpine.DEB.2.22.394.2207191125130.1006766@ramsan.of.borg/).

This is what I understand about this issue:

* During resume, ata_port_pm_resume() starts the SCSI error handler. This
changes the SCSI host state into SHOST_RECOVERY and causes
scsi_queue_rq() to return BLK_STS_RESOURCE.

* sd_resume() calls sd_start_stop_device() for ATA devices. That function
in turn calls sd_submit_start() which tries to submit a START STOP UNIT
command. That command can only be submitted after the SCSI error handler
has changed the SCSI host state back to SHOST_RUNNING.

* The SCSI error handler runs on its own thread and calls
schedule_work(&(ap->scsi_rescan_task)). That causes
ata_scsi_dev_rescan() to be called from the context of a kernel
workqueue. That call hangs in blk_mq_get_tag(). I'm not sure why - maybe
because all available tags have been allocated by sd_submit_start()
calls (this is a guess).

Link: https://lore.kernel.org/r/20220816172638.538734-1-bvanassche@acm.org
Fixes: 88f1669019bd ("scsi: sd: Rework asynchronous resume support")
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: gzhqyz@gmail.com
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: gzhqyz@gmail.com
Reported-and-tested-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: John Garry <john.garry@huawei.com>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# 88f16690 30-Jun-2022 Bart Van Assche <bvanassche@acm.org>

scsi: sd: Rework asynchronous resume support

For some technologies, e.g. an ATA bus, resuming can take multiple
seconds. Waiting for resume to finish can cause a very noticeable delay.
Hence this commit that restores the behavior from before "scsi: core: pm:
Rely on the device driver core for async power management" for most SCSI
devices.

This commit introduces a behavior change: if the START command fails, do
not consider this as a SCSI disk resume failure.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215880
Link: https://lore.kernel.org/r/20220630195703.10155-3-bvanassche@acm.org
Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management")
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: John Garry <john.garry@huawei.com>
Cc: ericspero@icloud.com
Cc: jason600.groome@gmail.com
Tested-by: jason600.groome@gmail.com
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# 30c4fdc3 01-Jun-2022 Damien Le Moal <damien.lemoal@opensource.wdc.com>

scsi: sd_zbc: Prevent zone information memory leak

Make sure to always free a scsi disk zone information, even for regular
disks. This ensures that there is no memory leak, even in the case of a
zoned disk changing type to a regular disk (e.g. with a reformat using the
FORMAT WITH PRESET command or other vendor proprietary command).

To do this, rename sd_zbc_clear_zone_info() to sd_zbc_free_zone_info() and
remove sd_zbc_release_disk(). A call to sd_zbc_free_zone_info() is added to
sd_zbc_read_zones() for drives for which sd_is_zoned() returns
false. Furthermore, sd_zbc_free_zone_info() code make s sure that the sdkp
rev_mutex is never used while not being initialized by gating the cleanup
code with a a check on the zone_wp_update_buf field as it is never NULL
when rev_mutex has been initialized.

Link: https://lore.kernel.org/r/20220601062544.905141-3-damien.lemoal@opensource.wdc.com
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# 631669a2 01-Mar-2022 Martin K. Petersen <martin.petersen@oracle.com>

scsi: sd: Optimal I/O size should be a multiple of reported granularity

Commit a83da8a4509d ("scsi: sd: Optimal I/O size should be a multiple of
physical block size") validated the reported optimal I/O size against the
physical block size to overcome problems with devices reporting nonsensical
transfer sizes.

However, some devices claim conformity to older SCSI versions that predate
the physical block size being reported. Other devices do not report a
physical block size at all. We need to be able to validate the optimal I/O
size on those devices as well.

Many devices report an OPTIMAL TRANSFER LENGTH GRANULARITY in the same VPD
page as the OPTIMAL TRANSFER LENGTH. Use this value to validate the optimal
I/O size. Also check that the reported granularity is a multiple of the
physical block size, if supported.

Link: https://lore.kernel.org/r/33fb522e-4f61-1b76-914f-c9e6a3553c9b@gmail.com
Link: https://lore.kernel.org/r/20220302053559.32147-9-martin.petersen@oracle.com
Reported-by: Bernhard Sulzer <micraft.b@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# c976e588 21-Apr-2022 Damien Le Moal <damien.lemoal@opensource.wdc.com>

scsi: sd: sd_zbc: Hide gap zones

ZBC-2 allows host-managed disks to report gap zones. This allow zoned disks
to report an offset between data zone starts that is a power of two even if
the number of logical blocks with data per zone is not a power of two.

Another new feature in ZBC-2 is support for constant zone starting LBA
offsets. For zoned disks that report a constant zone starting LBA offset,
hide the gap zones from the block layer. Report the offset between data
zone starts as zone size and report the number of logical blocks with data
per zone as the zone capacity.

Link: https://lore.kernel.org/r/20220421183023.3462291-7-bvanassche@acm.org
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
[ bvanassche: Reworked this patch ]
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# 628617be 21-Apr-2022 Bart Van Assche <bvanassche@acm.org>

scsi: sd: sd_zbc: Introduce struct zoned_disk_info

Deriving the meaning of the nr_zones, rev_nr_zones, zone_blocks and
rev_zone_blocks member variables requires careful analysis of the source
code. Make the meaning of these member variables easier to understand by
introducing struct zoned_disk_info.

Link: https://lore.kernel.org/r/20220421183023.3462291-5-bvanassche@acm.org
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# aa96bfb4 21-Apr-2022 Bart Van Assche <bvanassche@acm.org>

scsi: sd: sd_zbc: Improve source code documentation

Add several kernel-doc headers. Declare input arrays const. Specify the
array size in function declarations.

Link: https://lore.kernel.org/r/20220421183023.3462291-2-bvanassche@acm.org
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# fad45c30 07-Mar-2022 Christoph Hellwig <hch@lst.de>

sd: rename the scsi_disk.dev field

dev is very hard to grep for. Give the field a more descriptive name and
documents its purpose.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220308055200.735835-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# e7f76552 07-Mar-2022 Christoph Hellwig <hch@lst.de>

scsi: don't use disk->private_data to find the scsi_driver

Requiring every ULP to have the scsi_drive as first member of the
private data is rather fragile and not necessary anyway. Just use
the driver hanging off the SCSI device instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220308055200.735835-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# e815d365 26-Oct-2021 Damien Le Moal <damien.lemoal@wdc.com>

scsi: sd: add concurrent positioning ranges support

Add the sd_read_cpr() function to the sd scsi disk driver to discover
if a device has multiple concurrent positioning ranges (i.e. multiple
actuators on an HDD). The existence of VPD page B9h indicates if a
device has multiple concurrent positioning ranges. The page content
describes each range supported by the device.

sd_read_cpr() is called from sd_revalidate_disk() and uses the block
layer functions disk_alloc_independent_access_ranges() and
disk_set_independent_access_ranges() to represent the set of actuators
of the device as independent access ranges.

The format of the Concurrent Positioning Ranges VPD page B9h is defined
in section 6.6.6 of SBC-5.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20211027022223.183838-3-damien.lemoal@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 0610959f 01-Oct-2020 Mike Christie <michael.christie@oracle.com>

scsi: sd: Allow user to configure command retries

Some iSCSI targets went with the traditional "export N ports" approach and
then allowed the initiator to multipath over them. Other targets went the
opposite direction and export a single port, and then software on the
target side performs load balancing and failover to other targets via an
iSCSI specific feature or IP takover.

The problem for the 2nd type of config is we quickly run out of our five
retries and get I/O errors. In these setups we want to reduce resource use
on the initiator side so we only wanted the one session and no
dm-multipath. To handle traditional multipath operations like failover we
do IP takover on the target side. So we would have an iSCSI target running
on node1. Some monitoring software decides it's dead or the node is
overloaded so it starts the iSCSI target on node2. The problem is for the
failover case where we might have the equivalent of a dm-multipath
temporary all paths down, or we just have to try more than 5 nodes before
finding a good one.

To handle this type of issue allow the user to configure the disk cmd
retries from -1 to the current max of 5. -1 means infinite retries and
should be used for setups where some other setting is going to control when
to fail. For example iSCSI has the replacement/recovery timeout and fc
(some users have used FC with NPIV and done something similar as IP
takover) has dev_loss_tmo/fast_io_fail which will eventually expire and
fail I/O.

Link: https://lore.kernel.org/r/1601566554-26752-3-git-send-email-michael.christie@oracle.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# 6c5dee18 15-Sep-2020 Damien Le Moal <damien.lemoal@wdc.com>

scsi: sd: sd_zbc: Fix ZBC disk initialization

Make sure to call sd_zbc_init_disk() when the sdkp->zoned field is known,
that is, once sd_read_block_characteristics() is executed in
sd_revalidate_disk(), so that host-aware disks also get initialized. To do
so, move sd_zbc_init_disk() call in sd_zbc_revalidate_zones() and make sure
to execute it for all zoned disks, including for host-aware disks used as
regular disks as these disk zoned model may be changed back to BLK_ZONED_HA
when partitions are deleted.

Link: https://lore.kernel.org/r/20200915073347.832424-3-damien.lemoal@wdc.com
Fixes: 5795eb443060 ("scsi: sd_zbc: emulate ZONE_APPEND commands")
Cc: <stable@vger.kernel.org> # v5.8+
Reported-by: Borislav Petkov <bp@alien8.de>
Tested-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# 27ba3e8f 15-Sep-2020 Damien Le Moal <damien.lemoal@wdc.com>

scsi: sd: sd_zbc: Fix handling of host-aware ZBC disks

When CONFIG_BLK_DEV_ZONED is disabled, allow using host-aware ZBC disks as
regular disks. In this case, ensure that command completion is correctly
executed by changing sd_zbc_complete() to return good_bytes instead of 0
and causing a hang during device probe (endless retries).

When CONFIG_BLK_DEV_ZONED is enabled and a host-aware disk is detected to
have partitions, it will be used as a regular disk. In this case, make sure
to not do anything in sd_zbc_revalidate_zones() as that triggers warnings.

Since all these different cases result in subtle settings of the disk queue
zoned model, introduce the block layer helper function
blk_queue_set_zoned() to generically implement setting up the effective
zoned model according to the disk type, the presence of partitions on the
disk and CONFIG_BLK_DEV_ZONED configuration.

Link: https://lore.kernel.org/r/20200915073347.832424-2-damien.lemoal@wdc.com
Fixes: b72053072c0b ("block: allow partitions on host aware zone devices")
Cc: <stable@vger.kernel.org>
Reported-by: Borislav Petkov <bp@alien8.de>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# a3d8a257 30-Jul-2020 Damien Le Moal <damien.lemoal@wdc.com>

scsi: sd_zbc: Improve zone revalidation

Currently, for zoned disks, since blk_revalidate_disk_zones() requires the
disk capacity to be set already to operate correctly, zones revalidation
can only be done on the second revalidate scan once the gendisk capacity is
set at the end of the first scan. As a result, if zone revalidation fails,
there is no second chance to recover from the failure and the disk capacity
is changed to 0, with the disk left unusable.

This can be improved by shuffling around code, specifically, by moving the
call to sd_zbc_revalidate_zones() from sd_zbc_read_zones() to the end of
sd_revalidate_disk(), after set_capacity_revalidate_and_notify() is called
to set the gendisk capacity. With this change, if sd_zbc_revalidate_zones()
fails on the first scan, the second scan will call it again to recover, if
possible.

Using the new struct scsi_disk fields rev_nr_zones and rev_zone_blocks,
sd_zbc_revalidate_zones() does actual work only if it detects a change with
the disk zone configuration. This means that for a successful zones
revalidation on the first scan, the second scan will not cause another
heavy full check.

While at it, remove the unecesary "extern" declaration of
sd_zbc_read_zones().

Link: https://lore.kernel.org/r/20200731054928.668547-1-damien.lemoal@wdc.com
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# ca0800a6 14-Jul-2020 YueHaibing <yuehaibing@huawei.com>

scsi: sd_zbc: Remove unused inline functions

These are no longer used and can be removed.

Link: https://lore.kernel.org/r/20200715025523.34620-1-yuehaibing@huawei.com
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# 5795eb44 12-May-2020 Johannes Thumshirn <johannes.thumshirn@wdc.com>

scsi: sd_zbc: emulate ZONE_APPEND commands

Emulate ZONE_APPEND for SCSI disks using a regular WRITE(16) command
with a start LBA set to the target zone write pointer position.

In order to always know the write pointer position of a sequential write
zone, the write pointer of all zones is tracked using an array of 32bits
zone write pointer offset attached to the scsi disk structure. Each
entry of the array indicate a zone write pointer position relative to
the zone start sector. The write pointer offsets are maintained in sync
with the device as follows:
1) the write pointer offset of a zone is reset to 0 when a
REQ_OP_ZONE_RESET command completes.
2) the write pointer offset of a zone is set to the zone size when a
REQ_OP_ZONE_FINISH command completes.
3) the write pointer offset of a zone is incremented by the number of
512B sectors written when a write, write same or a zone append
command completes.
4) the write pointer offset of all zones is reset to 0 when a
REQ_OP_ZONE_RESET_ALL command completes.

Since the block layer does not write lock zones for zone append
commands, to ensure a sequential ordering of the regular write commands
used for the emulation, the target zone of a zone append command is
locked when the function sd_zbc_prepare_zone_append() is called from
sd_setup_read_write_cmnd(). If the zone write lock cannot be obtained
(e.g. a zone append is in-flight or a regular write has already locked
the zone), the zone append command dispatching is delayed by returning
BLK_STS_ZONE_RESOURCE.

To avoid the need for write locking all zones for REQ_OP_ZONE_RESET_ALL
requests, use a spinlock to protect accesses and modifications of the
zone write pointer offsets. This spinlock is initialized from sd_probe()
using the new function sd_zbc_init().

Co-developed-by: Damien Le Moal <Damien.LeMoal@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# a35989a0 25-Nov-2019 Damien Le Moal <damien.lemoal@wdc.com>

scsi: sd_zbc: Improve report zones error printout

In the case of a report zones command failure, instead of simply printing
the host_byte and driver_byte values returned, print a message that is more
human readable and useful, adding sense codes too.

To do so, use the already defined sd_print_sense_hdr() and
sd_print_result() functions by moving the declaration of these functions
into sd.h.

Link: https://lore.kernel.org/r/20191125070518.951717-1-damien.lemoal@wdc.com
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# d4100351 10-Nov-2019 Christoph Hellwig <hch@lst.de>

block: rework zone reporting

Avoid the need to allocate a potentially large array of struct blk_zone
in the block layer by switching the ->report_zones method interface to
a callback model. Now the caller simply supplies a callback that is
executed on each reported zone, and private data for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# ad512f20 27-Oct-2019 Ajay Joshi <ajay.joshi@wdc.com>

scsi: sd_zbc: add zone open, close, and finish support

Implement REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH
support to allow explicit control of zone states.

Contains contributions from Matias Bjorling, Hans Holmberg,
Keith Busch and Damien Le Moal.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com>
Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# d81e9d49 01-Aug-2019 Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

scsi: implement REQ_OP_ZONE_RESET_ALL

This patch implements the zone reset all operation for sd_zbc.c. We add
a new boolean parameter for the sd_zbc_setup_reset_cmd() to indicate
REQ_OP_ZONE_RESET_ALL command setup. Along with that we add support in
the completion path for the zone reset all.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# bd976e52 30-Jun-2019 Damien Le Moal <damien.lemoal@wdc.com>

block: Kill gfp_t argument of blkdev_report_zones()

Only GFP_KERNEL and GFP_NOIO are used with blkdev_report_zones(). In
preparation of using vmalloc() for large report buffer and zone array
allocations used by this function, remove its "gfp_t gfp_mask" argument
and rely on the caller context to use memalloc_noio_save/restore() where
necessary (block layer zone revalidation and dm-zoned I/O error path).

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# df46cac3 05-Feb-2019 Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>

scsi: sd: Fix typo in sd_first_printk()

Commit b2bff6ceb61a9 ("[SCSI] sd: Quiesce mode sense error messages")
added the macro sd_first_printk(). The macro takes "sdsk" as argument
but dereferences "sdkp". This hasn't caused any real issues since all
callers of sd_first_printk() have an sdkp. But fix the typo.

[mkp: Turned this into a real patch and tweaked commit description]

Signed-off-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# 082c2cd2 08-Jan-2019 John Garry <john.garry@huawei.com>

scsi: sd: Make protection lookup tables static and relocate functions

Currently the protection lookup tables in sd_prot_flag_mask() and
sd_prot_op() are declared as non-static. As such, they will be rebuilt for
each respective function call.

Optimise by making them static.

This saves ~100B object code for sd.c:

Before:
text data bss dec hex filename
25403 1024 16 26443 674b drivers/scsi/sd.o

After:
text data bss dec hex filename
25299 1024 16 26339 66e3 drivers/scsi/sd.o

In addition, since those same functions are declared in sd.h, but each are
only referenced in sd.c, relocate them to that same c file.

The inline specifier is dropped also, since gcc should be able to make the
decision to inline.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# 159b2cbf 09-Nov-2018 Christoph Hellwig <hch@lst.de>

scsi: return blk_status_t from scsi_init_io and ->init_command

Replace the old BLKPREP_* values with the BLK_STS_ ones that they are
converted to later anyway.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# bf505456 12-Oct-2018 Damien Le Moal <damien.lemoal@wdc.com>

block: Introduce blk_revalidate_disk_zones()

Drivers exposing zoned block devices have to initialize and maintain
correctness (i.e. revalidate) of the device zone bitmaps attached to
the device request queue (seq_zones_bitmap and seq_zones_wlock).

To simplify coding this, introduce a generic helper function
blk_revalidate_disk_zones() suitable for most (and likely all) cases.
This new function always update the seq_zones_bitmap and seq_zones_wlock
bitmaps as well as the queue nr_zones field when called for a disk
using a request based queue. For a disk using a BIO based queue, only
the number of zones is updated since these queues do not have
schedulers and so do not need the zone bitmaps.

With this change, the zone bitmap initialization code in sd_zbc.c can be
replaced with a call to this function in sd_zbc_read_zones(), which is
called from the disk revalidate block operation method.

A call to blk_revalidate_disk_zones() is also added to the null_blk
driver for devices created with the zoned mode enabled.

Finally, to ensure that zoned devices created with dm-linear or
dm-flakey expose the correct number of zones through sysfs, a call to
blk_revalidate_disk_zones() is added to dm_table_set_restrictions().

The zone bitmaps allocated and initialized with
blk_revalidate_disk_zones() are freed automatically from
__blk_release_queue() using the block internal function
blk_queue_free_zone_bitmaps().

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# e76239a3 12-Oct-2018 Christoph Hellwig <hch@lst.de>

block: add a report_zones method

Dispatching a report zones command through the request queue is a major
pain due to the command reply payload rewriting necessary. Given that
blkdev_report_zones() is executing everything synchronously, implement
report zones as a block device file operation instead, allowing major
simplification of the code in many places.

sd, null-blk, dm-linear and dm-flakey being the only block device
drivers supporting exposing zoned block devices, these drivers are
modified to provide the device side implementation of the
report_zones() block device file operation.

For device mappers, a new report_zones() target type operation is
defined so that the upper block layer calls blkdev_report_zones() can
be propagated down to the underlying devices of the dm targets.
Implementation for this new operation is added to the dm-linear and
dm-flakey targets.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[Damien]
* Changed method block_device argument to gendisk
* Various bug fixes and improvements
* Added support for null_blk, dm-linear and dm-flakey.
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 10c41ddd 29-Jul-2018 Max Gurtovoy <maxg@mellanox.com>

block: move dif_prepare/dif_complete functions to block layer

Currently these functions are implemented in the scsi layer, but their
actual place should be the block layer since T10-PI is a general data
integrity feature that is used in the nvme protocol as well. Also, use
the tuple size from the integrity profile since it may vary between
integrity types.

Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 354f1132 16-Apr-2018 Bart Van Assche <bvanassche@acm.org>

scsi: sd_zbc: Change the type of the ZBC fields into u32

This patch does not change any functionality but makes it clear that it is on
purpose that these fields are 32 bits wide.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# 39051dd8 20-Dec-2017 Damien Le Moal <damien.lemoal@wdc.com>

scsi: sd: Remove zone write locking

The block layer now handles zone write locking.

[mkp: removed SCMD_ZONE_WRITE_LOCK reference in scsi_debugfs]

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# b2441318 01-Nov-2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.

For non */uapi/* files that summary was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139

and resulted in the first patch in this series.

If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930

and resulted in the second patch in this series.

- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:

SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1

and that resulted in the third patch in this series.

- when the two scanners agreed on the detected license(s), that became
the concluded license(s).

- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.

- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).

- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.

- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct

This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# d80210f2 19-Jun-2017 Christoph Hellwig <hch@lst.de>

sd: add support for TCG OPAL self encrypting disks

Just wire up the generic TCG OPAL infrastructure to the SCSI disk driver
and the Security In/Out commands.

Note that I don't know of any actual SCSI disks that do support TCG OPAL,
but this is required to support ATA disks through libata.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# a90dfdc2 24-Apr-2017 Damien Le Moal <damien.lemoal@wdc.com>

scsi: sd: sd_zbc: Rename sd_zbc_setup_write_cmnd

Rename sd_zbc_setup_write_cmnd() to sd_zbc_write_lock_zone() to be clear
about what the function actually does. To be consistent, also rename
sd_zbc_cancel_write_cmnd() to sd_zbc_write_unlock_zone().

No functional change is introduced by this patch.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# 6eadc612 24-Apr-2017 Damien Le Moal <damien.lemoal@wdc.com>

scsi: sd: Improve sd_completed_bytes

Re-shuffle the code to be more efficient by not initializing variables
upfront (i.e. do it only when necessary). Also replace the do_div calls
with calls to sectors_to_logical().

No functional change is introduced by this patch.

[mkp: bytes_to_logical()]

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# e6bd9312 05-Apr-2017 Martin K. Petersen <martin.petersen@oracle.com>

scsi: sd: Separate zeroout and discard command choices

Now that zeroout and discards are distinct operations we need to
separate the policy of choosing the appropriate command. Create a
zeroing_mode which can be one of:

write: Zeroout assist not present, use regular WRITE
writesame: Allow WRITE SAME(10/16) with a zeroed payload
writesame_16_unmap: Allow WRITE SAME(16) with UNMAP
writesame_10_unmap: Allow WRITE SAME(10) with UNMAP

The last two are conditional on the device being thin provisioned with
LBPRZ=1 and LBPWS=1 or LBPWS10=1 respectively.

Whether to set the UNMAP bit or not depends on the REQ_NOUNMAP flag. And
if none of the _unmap variants are supported, regular WRITE SAME will be
used if the device supports it.

The zeroout_mode is exported in sysfs and the detected mode for a given
device can be overridden using the string constants above.

With this change in place we can now issue WRITE SAME(16) with UNMAP set
for block zeroing applications that require hard guarantees and
logical_block_size granularity. And at the same time use the UNMAP
command with the device's preferred granulary and alignment for discard
operations.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 7a38dc0b 06-Apr-2017 Hannes Reinecke <hare@suse.de>

scsi: scsi_error: count medium access timeout only once per EH run

The current medium access timeout counter will be increased for
each command, so if there are enough failed commands we'll hit
the medium access timeout for even a single device failure and
the following kernel message is displayed:

sd H:C:T:L: [sdXY] Medium access timeout failure. Offlining disk!

Fix this by making the timeout per EH run, ie the counter will
only be increased once per device and EH run.

Fixes: 18a4d0a ("[SCSI] Handle disk devices which can not process medium access commands")
Cc: Ewan Milne <emilne@redhat.com>
Cc: Lawrence Obermann <loberman@redhat.com>
Cc: Benjamin Block <bblock@linux.vnet.ibm.com>
Cc: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# 89d94756 18-Oct-2016 Hannes Reinecke <hare@suse.de>

sd: Implement support for ZBC devices

Implement ZBC support functions to setup zoned disks, both
host-managed and host-aware models. Only zoned disks that satisfy
the following conditions are supported:
1) All zones are the same size, with the exception of an eventual
last smaller runt zone.
2) For host-managed disks, reads are unrestricted (reads are not
failed due to zone or write pointer alignement constraints).
Zoned disks that do not satisfy these 2 conditions are setup with
a capacity of 0 to prevent their use.

The function sd_zbc_read_zones, called from sd_revalidate_disk,
checks that the device satisfies the above two constraints. This
function may also change the disk capacity previously set by
sd_read_capacity for devices reporting only the capacity of
conventional zones at the beginning of the LBA range (i.e. devices
reporting rc_basis set to 0).

The capacity message output was moved out of sd_read_capacity into
a new function sd_print_capacity to include this eventual capacity
change by sd_zbc_read_zones. This new function also includes a call
to sd_zbc_print_zones to display the number of zones and zone size
of the device.

Signed-off-by: Hannes Reinecke <hare@suse.de>

[Damien: * Removed zone cache support
* Removed mapping of discard to reset write pointer command
* Modified sd_zbc_read_zones to include checks that the
device satisfies the kernel constraints
* Implemeted REPORT ZONES setup and post-processing based
on code from Shaun Tancheff <shaun.tancheff@seagate.com>
* Removed confusing use of 512B sector units in functions
interface]
Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Shaun Tancheff <shaun.tancheff@seagate.com>
Tested-by: Shaun Tancheff <shaun.tancheff@seagate.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 8475c811 11-Sep-2016 Christoph Hellwig <hch@lst.de>

scsi: sd: Move DIF protection types to t10-pi.h

These should go together with the rest of the T10 protection information
defintions.

[mkp: s/T10_DIF/T10_PI/]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# 6ebf105c 11-Sep-2016 Christoph Hellwig <hch@lst.de>

scsi: scsi_debug: Use struct t10_pi_tuple instead of struct sd_dif_tuple

And remove the declaration of the latter in sd.h as scsi_debug was the
only user.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# 6b7e9cde 12-May-2016 Martin K. Petersen <martin.petersen@oracle.com>

sd: Fix rw_max for devices that report an optimal xfer size

For historic reasons, io_opt is in bytes and max_sectors in block layer
sectors. This interface inconsistency is error prone and should be
fixed. But for 4.4--4.7 let's make the unit difference explicit via a
wrapper function.

Fixes: d0eb20a863ba ("sd: Optimal I/O size is in bytes, not sectors")
Cc: stable@vger.kernel.org # 4.4+
Reported-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Andrew Patterson <andrew.patterson@hpe.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# f08bb1e0 28-Mar-2016 Martin K. Petersen <martin.petersen@oracle.com>

sd: Fix excessive capacity printing on devices with blocks bigger than 512 bytes

During revalidate we check whether device capacity has changed before we
decide whether to output disk information or not.

The check for old capacity failed to take into account that we scaled
sdkp->capacity based on the reported logical block size. And therefore
the capacity test would always fail for devices with sectors bigger than
512 bytes and we would print several copies of the same discovery
information.

Avoid scaling sdkp->capacity and instead adjust the value on the fly
when setting the block device capacity and generating fake C/H/S
geometry.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: <stable@vger.kernel.org>
Reported-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Reviewed-by: Ewan Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# ca369d51 13-Nov-2015 Martin K. Petersen <martin.petersen@oracle.com>

block/sd: Fix device-imposed transfer length limits

Commit 4f258a46346c ("sd: Fix maximum I/O size for BLOCK_PC requests")
had the unfortunate side-effect of removing an implicit clamp to
BLK_DEF_MAX_SECTORS for REQ_TYPE_FS requests in the block layer
code. This caused problems for some SMR drives.

Debugging this issue revealed a few problems with the existing
infrastructure since the block layer didn't know how to deal with
device-imposed limits, only limits set by the I/O controller.

- Introduce a new queue limit, max_dev_sectors, which is used by the
ULD to signal the maximum sectors for a REQ_TYPE_FS request.

- Ensure that max_dev_sectors is correctly stacked and taken into
account when overriding max_sectors through sysfs.

- Rework sd_read_block_limits() so it saves the max_xfer and opt_xfer
values for later processing.

- In sd_revalidate() set the queue's max_dev_sectors based on the
MAXIMUM TRANSFER LENGTH value in the Block Limits VPD. If this value
is not reported, fall back to a cap based on the CDB TRANSFER LENGTH
field size.

- In sd_revalidate(), use OPTIMAL TRANSFER LENGTH from the Block Limits
VPD--if reported and sane--to signal the preferred device transfer
size for FS requests. Otherwise use BLK_DEF_MAX_SECTORS.

- blk_limits_max_hw_sectors() is no longer used and can be removed.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=93581
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: sweeneygj@gmx.com
Tested-by: Arzeets <anatol.pomozov@gmail.com>
Tested-by: David Eisner <david.eisner@oriel.oxon.org>
Tested-by: Mario Kicherer <dev@kicherer.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# 22e0d994 24-Oct-2014 Hannes Reinecke <hare@suse.de>

scsi: introduce sdev_prefix_printk()

Like scmd_printk(), but the device name is passed in as
a string. Can be used by eg ULDs which do not have access
to the scsi_cmnd structure.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# c611529e 26-Sep-2014 Martin K. Petersen <martin.petersen@oracle.com>

sd: Honor block layer integrity handling flags

A set of flags introduced in the block layer enable better control over
how protection information is handled. These flags are useful for both
error injection and data recovery purposes. Checking can be enabled and
disabled for controller and disk, and the guard tag format is now a
per-I/O property.

Update sd_protect_op to communicate the relevant information to the
low-level device driver via a set of flags in scsi_cmnd.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# bcdb247c 03-Jun-2014 Martin K. Petersen <martin.petersen@oracle.com>

sd: Limit transfer length

Until now the per-command transfer length has exclusively been gated by
the max_sectors parameter in the scsi_host template. Given that the size
of this parameter has been bumped to an unsigned int we have to be
careful not to exceed the target device's capabilities.

If the if the device specifies a Maximum Transfer Length in the Block
Limits VPD we'll use that value. Otherwise we'll use 0xffffffff for
devices that have use_16_for_rw set and 0xffff for the rest. We then
combine the chosen disk limit with max_sectors in the host template. The
smaller of the two will be used to set the max_hw_sectors queue limit.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# b2bff6ce 03-Jan-2014 Martin K. Petersen <martin.petersen@oracle.com>

[SCSI] sd: Quiesce mode sense error messages

Messages about discovered disk properties are only printed once unless
they are found to have changed. Errors encountered during mode sense,
however, are printed every time we revalidate.

Quiesce mode sense errors so they are only printed during the first
scan.

[jejb: checkpatch fixes]
Bugzilla: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=733565
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>


# 7e660100 04-Oct-2013 James Bottomley <jbottomley@parallels.com>

[SCSI] Derive the FLUSH_TIMEOUT from the basic I/O timeout

Rather than having a separate constant for specifying the timeout on FLUSH
operations, use the basic I/O timeout value that is already configurable
on a per target basis to derive the FLUSH timeout. Looking at the current
definitions of these timeout values, the FLUSH operation is supposed to have
a value that is twice the normal timeout value. This patch preserves this
relationship while leveraging the flexibility of specifying the I/O timeout.

Based on a prior patch by KY Srinivasan <kys@microsoft.com>

Reviewed-by: KY Srinivasan <kys@microsoft.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>


# 66c28f97 06-Jun-2013 Martin K. Petersen <martin.petersen@oracle.com>

[SCSI] sd: Update WRITE SAME heuristics

SATA drives located behind a SAS controller would incorrectly receive
WRITE SAME commands. Tweak the heuristics so that:

- If REPORT SUPPORTED OPERATION CODES is provided we will use that to
choose between WRITE SAME(16), WRITE SAME(10) and disabled. This also
fixes an issue with the old code which would issue WRITE SAME(10)
despite the command not being whitelisted in REPORT SUPPORTED
OPERATION CODES.

- If REPORT SUPPORTED OPERATION CODES is not provided we will fall back
to WRITE SAME(10) unless the device has an ATA Information VPD page.
The assumption is that a SATL which is smart enough to implement
WRITE SAME would also provide REPORT SUPPORTED OPERATION CODES.

To facilitate the new heuristics scsi_report_opcode() has been modified
to so we can distinguish between "operation not supported" and "RSOC not
supported".

Reported-by: H. Peter Anvin <hpa@zytor.com>
Tested-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>


# 39c60a09 24-Apr-2013 James Bottomley <JBottomley@Parallels.com>

[SCSI] sd: fix array cache flushing bug causing performance problems

Some arrays synchronize their full non volatile cache when the sd driver sends
a SYNCHRONIZE CACHE command. Unfortunately, they can have Terrabytes of this
and we send a SYNCHRONIZE CACHE for every barrier if an array reports it has a
writeback cache. This leads to massive slowdowns on journalled filesystems.

The fix is to allow userspace to turn off the writeback cache setting as a
temporary measure (i.e. without doing the MODE SELECT to write it back to the
device), so even though the device reported it has a writeback cache, the
user, knowing that the cache is non volatile and all they care about is
filesystem correctness, can turn that bit off in the kernel and avoid the
performance ruinous (and safety irrelevant) SYNCHRONIZE CACHE commands.

The way you do this is add a 'temporary' prefix when performing the usual
cache setting operations, so

echo temporary write through > /sys/class/scsi_disk/<disk>/cache_type

Reported-by: Ric Wheeler <rwheeler@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>


# 5db44863 17-Sep-2012 Martin K. Petersen <martin.petersen@oracle.com>

[SCSI] sd: Implement support for WRITE SAME

Implement support for WRITE SAME(10) and WRITE SAME(16) in the SCSI disk
driver.

- We set the default maximum to 0xFFFF because there are several
devices out there that only support two-byte block counts even with
WRITE SAME(16). We only enable transfers bigger than 0xFFFF if the
device explicitly reports MAXIMUM WRITE SAME LENGTH in the BLOCK
LIMITS VPD.

- max_write_same_blocks can be overriden per-device basis in sysfs.

- The UNMAP discovery heuristics remain unchanged but the discard
limits are tweaked to match the "real" WRITE SAME commands.

- In the error handling logic we now distinguish between WRITE SAME
with and without UNMAP set.

The discovery process heuristics are:

- If the device reports a SCSI level of SPC-3 or greater we'll issue
READ SUPPORTED OPERATION CODES to find out whether WRITE SAME(16) is
supported. If that's the case we will use it.

- If the device supports the block limits VPD and reports a MAXIMUM
WRITE SAME LENGTH bigger than 0xFFFF we will use WRITE SAME(16).

- Otherwise we will use WRITE SAME(10) unless the target LBA is beyond
0xFFFFFFFF or the block count exceeds 0xFFFF.

- no_write_same is set for ATA, FireWire and USB.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>


# 8c579ab6 28-Aug-2012 Martin K. Petersen <martin.petersen@oracle.com>

[SCSI] sd: Avoid remapping bad reference tags

It does not make sense to translate ref tags with unexpected values.
Instead we simply ignore them and let the upper layers catch the
problem. Ref tags that contain the expected value are still remapped.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>


# 18a4d0a2 09-Feb-2012 Martin K. Petersen <martin.petersen@oracle.com>

[SCSI] Handle disk devices which can not process medium access commands

We have experienced several devices which fail in a fashion we do not
currently handle gracefully in SCSI. After a failure these devices will
respond to the SCSI primary command set (INQUIRY, TEST UNIT READY, etc.)
but any command accessing the storage medium will time out.

The following patch adds an callback that can be used by upper level
drivers to inspect the results of an error handling command. This in
turn has been used to implement additional checking in the SCSI disk
driver.

If a medium access command fails twice but TEST UNIT READY succeeds both
times in the subsequent error handling we will offline the device. The
maximum number of failed commands required to take a device offline can
be tweaked in sysfs.

Also add a new error flag to scsi_debug which allows this scenario to be
easily reproduced.

[jejb: fix up integer parsing to use kstrtouint]
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>


# 21208ae5 19-Oct-2011 Dave Kleikamp <dave.kleikamp@oracle.com>

[SCSI] sd: remove arbitrary SD_MAX_DISKS namespace limit

There is no reason to limit the SCSI disk namespace to sdXXX.

Add new error messages to sd_probe() in the unlikely event that either
ida_get_new() or sd_format_disk_name() fail.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>


# c98a0eb0 08-Mar-2011 Martin K. Petersen <martin.petersen@oracle.com>

[SCSI] sd: Logical Block Provisioning update

SBC3r26 contains many changes to the Logical Block Provisioning
interfaces (formerly known as Thin Provisioning ditto). This patch
implements support for both the old and new schemes using the same
heuristic as before (whether the LBP VPD page is present).

The new code also allows the provisioning mode (i.e. choice of command)
to be overridden on a per-device basis via sysfs. Two additional modes
are supported in this version:

- WRITE SAME(10) with the UNMAP bit set

- WRITE SAME(10) without the UNMAP bit set. This allows us to support
devices that predate the TP/LBP enhancements in SBC3 and which work
by way zero-detection

Switching between modes has been consolidated in a helper function that
also updates the block layer topology according to the limitations of
the chosen command.

I experimented with trying WRITE SAME(16) if UNMAP fails, WRITE SAME(10)
if WRITE SAME(16) fails, etc. but found several devices that got
cranky. So for now we'll disable discard if one of the commands
fail. The user still has the option of selecting a different mode in
sysfs.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>


# 2bae0093 18-Dec-2010 Tejun Heo <tj@kernel.org>

[SCSI] sd: implement sd_check_events()

Replace sd_media_change() with sd_check_events().

* Move media removed logic into set_media_not_present() and
media_not_present() and set sdev->changed iff an existing media is
removed or the device indicates UNIT_ATTENTION.

* Make sd_check_events() sets sdev->changed if previously missing
media becomes present.

* Event is reported only if sdev->changed is set.

This makes media presence event reported if scsi_disk->media_present
actually changed or the device indicated UNIT_ATTENTION. For backward
compatibility, SDEV_EVT_MEDIA_CHANGE is generated each time
sd_check_events() detects media change event.

[jejb: fix boot failure]
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>


# eface65c 18-Dec-2010 Tejun Heo <tj@kernel.org>

[SCSI] sd: implement sd_check_events()

Replace sd_media_change() with sd_check_events().

* Move media removed logic into set_media_not_present() and
media_not_present() and set sdev->changed iff an existing media is
removed or the device indicates UNIT_ATTENTION.

* Make sd_check_events() sets sdev->changed if previously missing
media becomes present.

* Event is reported only if sdev->changed is set.

This makes media presence event reported if scsi_disk->media_present
actually changed or the device indicated UNIT_ATTENTION. For backward
compatibility, SDEV_EVT_MEDIA_CHANGE is generated each time
sd_check_events() detects media change event.

[jejb: fix boot failure]
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>


# fcc57045 22-Dec-2010 Jens Axboe <jaxboe@fusionio.com>

Revert "sd: implement sd_check_events()"

This reverts commit c8d2e937355d02db3055c2fc203e5f017297ee1f.

We run into merging problems with the SCSI tree, revert this one
so it can be handled by a postmerge tree there.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>


# c8d2e937 08-Dec-2010 Tejun Heo <tj@kernel.org>

sd: implement sd_check_events()

Replace sd_media_change() with sd_check_events(). sd used to set the
changed state whenever the device is not ready, which can cause event
loop while the device is not ready. Media presence handling code is
changed such that the changed state is set iff the media presence
actually changes. UA still always sets the changed state and
NOT_READY always (at least where it used to set ->changed) clears
media presence, so no event is lost.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>


# 526f7c79 28-Sep-2010 Martin K. Petersen <martin.petersen@oracle.com>

[SCSI] sd: Fix overflow with big physical blocks

The hw_sector_size variable could overflow if a device reported huge
physical blocks. Switch to the more accurate physical_block_size
terminology and make sure we use an unsigned int to match the range
permitted by READ CAPACITY(16).

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>


# 045d3fe7 09-Sep-2010 Martin K. Petersen <martin.petersen@oracle.com>

[SCSI] sd: Update thin provisioning support

Add support for the Thin Provisioning VPD page and use the TPU and TPWS
bits to switch between UNMAP and WRITE SAME(16) for discards. If no TP
VPD page is present we fall back to old scheme where the max descriptor
count combined with the max lba count are used trigger UNMAP.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>


# e3b3e624 11-Aug-2010 Mike Christie <mchristi@redhat.com>

[SCSI] scsi/block: increase flush/sync timeout

We have been seeing the flush request timeout with a wide
range of hardware from tgt+iser to FC targets from a major vendor.

After discussions about if the value should be configurable and
what the best value should be, this patch just increases the flush/sync
cache timeout to 1 minute. 2 minutes was determined to be too long, and
making it configurable was troublesome for users.

This patch was made over Linus's tree. It is not made over scsi-misc
or scsi-rc-fixes, because Linus's had block layer changes that my
patch was built over.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>


# 409f3499 07-Jul-2010 Arnd Bergmann <arnd@arndb.de>

scsi/sd: remove big kernel lock

Every user of the BKL in the sd driver is the
result of the pushdown from the block layer
into the open/close/ioctl functions.

The only place that used to rely on the BKL is
the sdkp->openers variable, which gets converted
into an atomic_t.

Nothing else seems to rely on the BKL, since the
functions do not touch global data without holding
another lock, and the open/close functions are
still protected from concurrent execution using
the bdev->bd_mutex.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: linux-scsi@vger.kernel.org
Cc: "James E.J. Bottomley" <James.Bottomley@suse.de>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>


# e339c1a7 25-Nov-2009 Martin K. Petersen <martin.petersen@oracle.com>

[SCSI] sd: WRITE SAME(16) / UNMAP support

Implement a function for handling discard requests that sends either
WRITE SAME(16) or UNMAP(10) depending on parameters indicated by the
device in the block limits VPD.

Extract unmap constraints and report them to the block layer.

Based in part by a patch by Christoph Hellwig <hch@lst.de>.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>


# 4e7392ec 20-Sep-2009 Martin K. Petersen <martin.petersen@oracle.com>

[SCSI] sd: Support disks formatted with DIF Type 2

Disks formatted with DIF Type 2 reject READ/WRITE 6/10/12/16 commands
when protection is enabled. Only the 32-byte variants are supported.

Implement support for issusing 32-byte READ/WRITE and enable Type 2
drives in the protection type detection logic.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>


# 35e1a5d9 18-Sep-2009 Martin K. Petersen <martin.petersen@oracle.com>

[SCSI] sd: Detach DIF from block integrity infrastructure

So far we have only issued DIF commands if CONFIG_BLK_DEV_INTEGRITY is
enabled. However, communication between initiator and target should be
independent of protection information DMA. There are DIF-only host
adapters coming out that will be able to take advantage of this.

Move the relevant DIF bits to sd.c.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>


# ea09bcc9 23-May-2009 Martin K. Petersen <martin.petersen@oracle.com>

sd: Physical block size and alignment support

Extract physical block size and lowest aligned LBA from READ
CAPACITY(16) response and adjust queue parameters.

Report physical block size and alignment when applicable.

[jejb: fix up trailing whitespace]
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>


# 70a9b873 09-Mar-2009 Martin K. Petersen <martin.petersen@oracle.com>

[SCSI] sd: Make revalidate less chatty

sd_revalidate ends up being called several times during device setup.
With this patch we print everything during the first scan. Subsequent
invocations will only print a message if the parameter in question has
actually changed (LUN capacity has increased, etc.).

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>


# 4c393e6e 13-Oct-2008 James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] sd: fix compile failure with CONFIG_BLK_DEV_INTEGRITY=n

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>


# 9e06688e 19-Sep-2008 Martin K. Petersen <martin.petersen@oracle.com>

[SCSI] sd: Correctly handle all combinations of DIF and DIX

The old detection code couldn't handle all possible combinations of
DIX and DIF. This version does, giving priority to DIX if the
controller is capable.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>


# 18351070 05-Aug-2008 Linus Torvalds <torvalds@linux-foundation.org>

Re-introduce "[SCSI] extend the last_sector_bug flag to cover more sectors"

This re-introduces commit 2b142900784c6e38c8d39fa57d5f95ef08e735d8,
which was reverted due to the regression it caused by commit
fca082c9f1e11ec07efa8d2f9f13688521253f36.

That regression was not root-caused by the original commit, it was just
uncovered by it, and the real fix was done by Alan Stern in commit
580da34847488b404218d1d7f53b156f245f5555 ("Fix USB storage hang on
command abort").

We can thus re-introduce the change that was confirmed by Alan Jenkins
to be still required by his odd card reader.

Cc: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# fca082c9 04-Aug-2008 Linus Torvalds <torvalds@linux-foundation.org>

Revert "[SCSI] extend the last_sector_bug flag to cover more sectors"

This reverts commit 2b142900784c6e38c8d39fa57d5f95ef08e735d8, since it
seems to break some other USB storage devices (at least a JMicron USB to
ATA bridge). As such, while it apparently fixes some cardreaders, it
would need to be made conditional on the exact reader it fixes in order
to avoid causing regressions.

Cc: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2b142900 27-Jul-2008 Alan Jenkins <alan-jenkins@tuffmail.co.uk>

[SCSI] extend the last_sector_bug flag to cover more sectors

The last_sector_bug flag was added to work around a bug in certain usb
cardreaders, where they would crash if a multiple sector read included the
last sector. The original implementation avoids this by e.g. splitting an 8
sector read which includes the last sector into a 7 sector read, and a single
sector read for the last sector. The flag is enabled for all USB devices.

This revealed a second bug in other usb cardreaders, which crash when they
get a multiple sector read which stops 1 sector short of the last sector.
Affected hardware includes the Kingston "MobileLite" external USB cardreader
and the internal USB cardreader on the Asus EeePC.

Extend the last_sector_bug workaround to ensure that any access which touches
the last 8 hardware sectors of the device is a single sector long. Requests
are shrunk as necessary to meet this constraint.

This gives us a safety margin against potential unknown or future bugs
affecting multi-sector access to the end of the device. The two known bugs
only affect the last 2 sectors. However, they suggest that these devices
are prone to fencepost errors and that multi-sector access to the end of the
device is not well tested. Popular OS's use multi-sector accesses, but they
rarely read the last few sectors. Linux (with udev & vol_id) automatically
reads sectors from the end of the device on insertion. It is assumed that
single sector accesses are more thoroughly tested during development.

Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Tested-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>


# af55ff67 17-Jul-2008 Martin K. Petersen <martin.petersen@oracle.com>

[SCSI] sd: Support for SCSI disk (SBC) Data Integrity Field

Support for controllers and disks that implement DIF protection
information:

- During command preparation the RDPROTECT/WRPROTECT must be set
correctly if the target has DIF enabled.

- READ(6) and WRITE(6) are not supported when DIF is on.

- The controller must be told how to handle the I/O via the
protection operation field in scsi_cmnd.

- Refactor the I/O completion code that extracts failed LBA from the
returned sense data and handle DIF failures correctly.

- sd_dif.c implements the functions required to prepare and complete
requests with protection information attached.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>


# e0597d70 17-Jul-2008 Martin K. Petersen <martin.petersen@oracle.com>

[SCSI] sd: Identify DIF protection type and application tag ownership

If a disk is formatted with protection information (Inquiry bit
PROTECT=1) it is required to support Read Capacity(16). Force use of
the 16-bit command in this case and extract the P_TYPE field which
indicates whether the disk is formatted using DIF Type 1, 2 or 3.

The ATO (App Tag Own) bit in the Control Mode Page indicates whether
the storage device or the initiator own the contents of the
DIF application tag.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>


# 5b635da1 25-Jun-2008 Martin K. Petersen <martin.petersen@oracle.com>

[SCSI] sd: Move scsi_disk() accessor function to sd.h

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>


# aa91696e 16-Jun-2008 Martin K. Petersen <martin.petersen@oracle.com>

[SCSI] sd: Move sd.h header file

Christoph objected to having sd.h in include/scsi since it is internal
to the sd driver. Move it to drivers/scsi/sd.h.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>