History log of /linux-master/drivers/scsi/scsi_debug.c
Revision	Date	Author	Comments
# af180c08	30-Jan-2024	Bart Van Assche <bvanassche@acm.org>	scsi: scsi_debug: Maintain write statistics per group number Track per GROUP NUMBER how many write commands have been processed. Make this information available in sysfs. Reset these statistics if any data is written into the sysfs attribute. Note: SCSI devices should only interpret the information in the GROUP NUMBER field as a stream identifier if the ST_ENBLE bit has been set to one. This patch follows a simpler approach: count the number of writes per GROUP NUMBER whether or not the group number represents a stream identifier. Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Tested-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240130214911.1863909-20-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
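The per-group accounting described in af180c08 can be pictured with a small user-space model. This is only a sketch, not the driver's code: the class name, the 64-entry table size (which assumes a 6-bit GROUP NUMBER field), and the space-separated output format are all illustrative assumptions.

```python
from typing import List

# Assumption: GROUP NUMBER is a 6-bit field, so there are 64 possible values.
NUM_GROUPS = 64


class GroupWriteStats:
    """Hypothetical model of per-GROUP-NUMBER write counters."""

    def __init__(self) -> None:
        self.writes: List[int] = [0] * NUM_GROUPS

    def record_write(self, group_number: int) -> None:
        # Count every write, whether or not the group number is a stream ID.
        if not 0 <= group_number < NUM_GROUPS:
            raise ValueError("group number out of range")
        self.writes[group_number] += 1

    def show(self) -> str:
        # Illustrative read format; the real sysfs attribute may differ.
        return " ".join(str(n) for n in self.writes)

    def store(self, data: str) -> None:
        # Any data written to the attribute resets all counters.
        self.writes = [0] * NUM_GROUPS


stats = GroupWriteStats()
stats.record_write(1)
stats.record_write(1)
stats.record_write(5)
```

Counting without checking ST_ENBLE mirrors the "simpler approach" named in the commit message: the counters are keyed purely on the field value.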
# ad620bec	30-Jan-2024	Bart Van Assche <bvanassche@acm.org>	scsi: scsi_debug: Implement GET STREAM STATUS Implement the GET STREAM STATUS SCSI command. Report that the first five stream indexes correspond to permanent streams. Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Tested-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240130214911.1863909-19-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# f8ab2710	30-Jan-2024	Bart Van Assche <bvanassche@acm.org>	scsi: scsi_debug: Implement the IO Advice Hints Grouping mode page Implement an IO Advice Hints Grouping mode page with three permanent streams. A permanent stream is a stream for which the device server does not allow closing or otherwise modifying the configuration of that stream. The stream identifier enable (ST_ENBLE) bit specifies whether the stream identifier may be used in the GROUP NUMBER field of SCSI WRITE commands. Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Tested-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240130214911.1863909-18-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# b952eb27	30-Jan-2024	Bart Van Assche <bvanassche@acm.org>	scsi: scsi_debug: Allocate the MODE SENSE response from the heap Make the MODE SENSE response buffer larger and allocate it from the heap. This patch prepares for adding support for the IO Advice Hints Grouping mode page. Suggested-by: Douglas Gilbert <dgilbert@interlog.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Tested-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240130214911.1863909-17-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# f19c3e4f	30-Jan-2024	Bart Van Assche <bvanassche@acm.org>	scsi: scsi_debug: Rework subpage code error handling Move the subpage code checks into the switch statement to make it easier to add support for new page code / subpage code combinations. Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Tested-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240130214911.1863909-16-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# b2f86090	30-Jan-2024	Bart Van Assche <bvanassche@acm.org>	scsi: scsi_debug: Rework page code error handling Instead of tracking whether or not the page code is valid in a boolean variable, jump to error handling code if an unsupported page code is encountered. Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Tested-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240130214911.1863909-15-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# b1e5c0b3	30-Jan-2024	Bart Van Assche <bvanassche@acm.org>	scsi: scsi_debug: Support the block limits extension VPD page From SBC-5 r05: "Reduced stream control: a) reduces the maximum number of streams that the device server supports; and b) increases the number of write commands that are able to specify a stream to be written in any write command that contains the GROUP NUMBER field in its CDB. If the RSCS bit (see 6.6.5) is set to one, then the device server shall: a) support per group stream identifier usage as described in 4.32.2; b) support the IO Advice Hints Grouping mode page (see 6.5.7); and c) set the MAXIMUM NUMBER OF STREAMS field (see 6.6.5) to a value that is less than 64. Device servers that set the RSCS bit to one may support other features (e.g., permanent streams (see 4.32.4)). 4.32.4 Permanent streams A permanent stream is a stream for which the device server does not allow closing or otherwise modifying the configuration of that stream. The PERM bit (see 5.9.2.3) indicates whether a stream is a permanent stream. If a STREAM CONTROL command (see 5.32) specifies the closing of a permanent stream, the device server terminates that command with CHECK CONDITION status instead of closing the specified stream. A permanent stream is always an open stream. Device servers should assign the lowest numbered stream identifiers to permanent streams." Report that reduced stream control is supported. Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Tested-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240130214911.1863909-14-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# a5fe98eb	30-Jan-2024	Bart Van Assche <bvanassche@acm.org>	scsi: scsi_debug: Reduce code duplication All VPD pages have the page code in byte one. Reduce code duplication by storing the VPD page code once. Reviewed-by: Avri Altman <avri.altman@wdc.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Tested-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240130214911.1863909-13-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# ac0dd0f3	03-Feb-2024	Ricardo B. Marliere <ricardo@marliere.net>	scsi: scsi_debug: Make pseudo_lld_bus const Now that the driver core can properly handle constant struct bus_type, move the pseudo_lld_bus variable to be a constant structure as well, placing it into read-only memory which can not be modified at runtime. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Link: https://lore.kernel.org/r/20240203-bus_cleanup-scsi-v1-3-6f552fb24f71@marliere.net Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 7437bb73	17-Dec-2023	Christoph Hellwig <hch@lst.de>	block: remove support for the host aware zone model When zones were first added to the SCSI and ATA specs, two different models were supported (in addition to the drive managed one that is invisible to the host): - host managed, where for non-conventional zones there is a strict requirement to write at the write pointer, or else an error is returned - host aware, where a write pointer is maintained if writes always happen at it, otherwise it is left in an under-defined state and the sequential write preferred zones behave like conventional zones (probably very badly performing ones, though) Not surprisingly this lukewarm model didn't prove to be very useful and was finally removed from the ZBC and SBC specs (NVMe never implemented it). Due to the easily disappearing write pointer, host software could never rely on the write pointer to actually be useful for, say, recovery. Fortunately only a few HDD prototypes shipped using this model, and they never made it to mass production. Drop the support before it is too late. Note that any such host aware prototype HDD can still be used with Linux as we'll now treat it as a conventional HDD. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20231217165359.604246-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 037fbd3f	06-Nov-2023	Dan Carpenter <dan.carpenter@linaro.org>	scsi: scsi_debug: Delete some bogus error checking Smatch complains that "dentry" is never initialized. These days everyone initializes all their stack variables to zero so this means that it will trigger a warning every time this function is run. Really, debugfs functions are not supposed to be checked for errors in normal code. For example, if we updated this code to check the correct variable then it would print a warning if CONFIG_DEBUGFS was disabled. We don't want that. Just delete the check. Fixes: f084fe52c640 ("scsi: scsi_debug: Add debugfs interface to fail target reset") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/c602c9ad-5e35-4e18-a47f-87ed956a9ec2@moroto.mountain Reviewed-by: Wenchao Hao <haowenchao2@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 860c3d03	06-Nov-2023	Dan Carpenter <dan.carpenter@linaro.org>	scsi: scsi_debug: Fix some bugs in sdebug_error_write() There are two bugs in this code: 1) If count is zero, then it will lead to a NULL dereference. The kmalloc() will successfully allocate zero bytes and the test for "if (buf[0] == '-')" will read beyond the end of the zero size buffer and Oops. 2) The code does not ensure that the user's string is properly NUL terminated, which could lead to a read overflow. Fixes: a9996d722b11 ("scsi: scsi_debug: Add interface to manage error injection for a single device") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/7733643d-e102-4581-8d29-769472011c97@moroto.mountain Reviewed-by: Wenchao Hao <haowenchao2@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 573c2d06	10-Oct-2023	Wenchao Hao <haowenchao2@huawei.com>	scsi: scsi_debug: Add param to control sdev's allow_restart Add new module param "allow_restart" to control scsi_device's allow_restart flag. This flag determines if EH is triggered after a command completes with sense_key 0x6, ASC 0x4 and ASCQ 0x2. EH would be triggered if allow_restart=1 in this condition. The new param can be used with the error injection capability to test how commands completing with sense_key 0x6, ASC 0x4 and ASCQ 0x2 are handled. Signed-off-by: Wenchao Hao <haowenchao2@huawei.com> Link: https://lore.kernel.org/r/20231010092051.608007-11-haowenchao2@huawei.com Tested-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# f084fe52	10-Oct-2023	Wenchao Hao <haowenchao2@huawei.com>	scsi: scsi_debug: Add debugfs interface to fail target reset The interface is found at /sys/kernel/debug/scsi_debug/target<h:c:t>/fail_reset where <h:c:t> identifies the target to inject errors on. It's a simple bool type interface which would make this target's reset fail if set to 'Y'. Signed-off-by: Wenchao Hao <haowenchao2@huawei.com> Link: https://lore.kernel.org/r/20231010092051.608007-10-haowenchao2@huawei.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
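As a small illustration of the path layout given in f084fe52, the helper below builds the `fail_reset` file name for a target. It is a sketch only: the function name and the `debugfs_root` parameter are hypothetical, and the default root simply assumes debugfs is mounted at `/sys/kernel/debug`.

```python
def fail_reset_path(host: int, channel: int, target: int,
                    debugfs_root: str = "/sys/kernel/debug") -> str:
    """Build the per-target fail_reset path (hypothetical helper).

    The layout target<h:c:t>/fail_reset comes from the commit message;
    debugfs_root is an assumption about where debugfs is mounted.
    """
    return (f"{debugfs_root}/scsi_debug/"
            f"target{host}:{channel}:{target}/fail_reset")


path = fail_reset_path(0, 0, 0)
```

Writing `Y` to the resulting file (as root) is what the commit message describes as making that target's reset fail.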
# 02678116	10-Oct-2023	Wenchao Hao <haowenchao2@huawei.com>	scsi: scsi_debug: Add new error injection type: Reset LUN failed
Add error injection type 4 to make scsi_debug_device_reset() return FAILED.
Fail reset LUN format:
+--------+------+-------------------------------------------------------+
| Column | Type | Description                                           |
+--------+------+-------------------------------------------------------+
|   1    |  u8  | Error type, fixed to 0x4                              |
+--------+------+-------------------------------------------------------+
|   2    | s32  | Error count                                           |
|        |      | 0: this rule will be ignored                          |
|        |      | positive: the rule will always take effect            |
|        |      | negative: the rule takes effect n times where -n is   |
|        |      |           the value given. Ignored after n times      |
+--------+------+-------------------------------------------------------+
|   3    |  x8  | SCSI command opcode, 0xff for all commands            |
+--------+------+-------------------------------------------------------+
Examples:
    error=/sys/kernel/debug/scsi_debug/0:0:0:1/error
    echo "4 -10 0x12" > ${error}
will make the device return FAILED when trying to reset LUN with inquiry command 10 times.
    error=/sys/kernel/debug/scsi_debug/0:0:0:1/error
    echo "4 -10 0xff" > ${error}
will make the device return FAILED when trying to reset LUN 10 times. Usually we do not care about what command it is when trying to perform reset LUN, so 0xff could be applied.
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231010092051.608007-9-haowenchao2@huawei.com
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 5551ce92	10-Oct-2023	Wenchao Hao <haowenchao2@huawei.com>	scsi: scsi_debug: Add new error injection type: Abort Failed
Add error injection type 3 to make scsi_debug_abort() return FAILED.
Fail abort command format:
+--------+------+-------------------------------------------------------+
| Column | Type | Description                                           |
+--------+------+-------------------------------------------------------+
|   1    |  u8  | Error type, fixed to 0x3                              |
+--------+------+-------------------------------------------------------+
|   2    | s32  | Error count                                           |
|        |      | 0: this rule will be ignored                          |
|        |      | positive: the rule will always take effect            |
|        |      | negative: the rule takes effect n times where -n is   |
|        |      |           the value given. Ignored after n times      |
+--------+------+-------------------------------------------------------+
|   3    |  x8  | SCSI command opcode, 0xff for all commands            |
+--------+------+-------------------------------------------------------+
Examples:
    error=/sys/kernel/debug/scsi_debug/0:0:0:1/error
    echo "3 -10 0x12" > ${error}
will make the device return FAILED when aborting inquiry command 10 times.
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231010092051.608007-8-haowenchao2@huawei.com
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 33592274	10-Oct-2023	Wenchao Hao <haowenchao2@huawei.com>	scsi: scsi_debug: Set command result and sense data if error is injected
If a fail command error is injected, set the command's status and sense data, then finish this SCSI command.
Set SCSI command's status and sense data format:
+--------+------+-------------------------------------------------------+
| Column | Type | Description                                           |
+--------+------+-------------------------------------------------------+
|   1    |  u8  | Error type, fixed to 0x2                              |
+--------+------+-------------------------------------------------------+
|   2    | s32  | Error count                                           |
|        |      | 0: the rule will be ignored                           |
|        |      | positive: the rule will always take effect            |
|        |      | negative: the rule takes effect n times where -n is   |
|        |      |           the value given. Ignored after n times      |
+--------+------+-------------------------------------------------------+
|   3    |  x8  | SCSI command opcode, 0xff for all commands            |
+--------+------+-------------------------------------------------------+
|   4    |  x8  | Host byte in scsi_cmd::status                         |
|        |      | [scsi_cmd::status has 32 bits holding these 3 bytes]  |
+--------+------+-------------------------------------------------------+
|   5    |  x8  | Driver byte in scsi_cmd::status                       |
+--------+------+-------------------------------------------------------+
|   6    |  x8  | SCSI Status byte in scsi_cmd::status                  |
+--------+------+-------------------------------------------------------+
|   7    |  x8  | SCSI Sense Key in scsi_cmnd                           |
+--------+------+-------------------------------------------------------+
|   8    |  x8  | SCSI ASC in scsi_cmnd                                 |
+--------+------+-------------------------------------------------------+
|   9    |  x8  | SCSI ASCQ in scsi_cmnd                                |
+--------+------+-------------------------------------------------------+
Examples:
    error=/sys/kernel/debug/scsi_debug/0:0:0:1/error
    echo "2 -10 0x88 0 0 0x2 0x3 0x11 0x0" >${error}
will make the device's read command return with a media error with additional sense of "Unrecovered read error" (UNC).
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231010092051.608007-7-haowenchao2@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
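The nine-column "fail command" rule in 33592274 is easy to get wrong by hand, so a small builder may help. This is a minimal sketch under stated assumptions: the function name and argument names are hypothetical, and every hex column is emitted with a `0x` prefix (the interface description says the prefix is optional, so `0` and `0x0` should be equivalent).

```python
def fail_command_rule(count: int, opcode: int, host_byte: int,
                      driver_byte: int, status: int,
                      sense_key: int, asc: int, ascq: int) -> str:
    """Build an error-type-2 rule line (hypothetical helper).

    Column order follows the table in the commit message:
    type, count, opcode, host byte, driver byte, SCSI status,
    sense key, ASC, ASCQ.
    """
    hex_fields = [opcode, host_byte, driver_byte, status,
                  sense_key, asc, ascq]
    for value in hex_fields:
        # Columns 3..9 are all typed x8, i.e. one byte each.
        if not 0 <= value <= 0xFF:
            raise ValueError("x8 fields must fit in one byte")
    return " ".join(["2", str(count)] + [f"0x{v:x}" for v in hex_fields])


# Same values as the commit's example: READ(16) (0x88) completes with
# CHECK CONDITION (0x2), sense key MEDIUM ERROR (0x3), ASC/ASCQ 0x11/0x00.
rule = fail_command_rule(-10, 0x88, 0, 0, 0x2, 0x3, 0x11, 0x0)
```

The resulting string would be written to the device's `error` file in debugfs, exactly like the `echo` example in the commit message.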
# 33bccf55	10-Oct-2023	Wenchao Hao <haowenchao2@huawei.com>	scsi: scsi_debug: Return failed value if error is injected
If a fail queuecommand error is injected, return the failed value defined in the rule from queuecommand.
Make queuecommand return format:
+--------+------+-------------------------------------------------------+
| Column | Type | Description                                           |
+--------+------+-------------------------------------------------------+
|   1    |  u8  | Error type, fixed to 0x1                              |
+--------+------+-------------------------------------------------------+
|   2    | s32  | Error count                                           |
|        |      | 0: this rule will be ignored                          |
|        |      | positive: the rule will always take effect            |
|        |      | negative: the rule takes effect n times where -n is   |
|        |      |           the value given. Ignored after n times      |
+--------+------+-------------------------------------------------------+
|   3    |  x8  | SCSI command opcode, 0xff for all commands            |
+--------+------+-------------------------------------------------------+
|   4    | x32  | The queuecommand() return value we want               |
+--------+------+-------------------------------------------------------+
Examples:
    error=/sys/kernel/debug/scsi_debug/0:0:0:1/error
    echo "1 1 0x12 0x1055" > ${error}
will make each INQUIRY command sent to that device return 0x1055 (SCSI_MLQUEUE_HOST_BUSY).
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231010092051.608007-6-haowenchao2@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
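The fourth column of a type-1 rule is the raw value that queuecommand should return. The sketch below names a few such values and builds the rule string. Only 0x1055 (SCSI_MLQUEUE_HOST_BUSY) appears in the commit message; the other two constants are quoted from memory of the kernel's `include/scsi/scsi.h` and should be verified against the tree in use. The helper name is hypothetical.

```python
# Mid-layer queueing return codes. Only HOST_BUSY is given in the commit
# message; the other values are an assumption to be checked in scsi.h.
SCSI_MLQUEUE_HOST_BUSY = 0x1055
SCSI_MLQUEUE_DEVICE_BUSY = 0x1056
SCSI_MLQUEUE_TARGET_BUSY = 0x1058

INQUIRY = 0x12  # opcode used in the commit's example


def fail_queuecommand_rule(count: int, opcode: int, retval: int) -> str:
    """Build an error-type-1 rule line (hypothetical helper)."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode is an x8 field")
    if not 0 <= retval <= 0xFFFFFFFF:
        raise ValueError("return value is an x32 field")
    return f"1 {count} 0x{opcode:02x} 0x{retval:x}"


rule = fail_queuecommand_rule(1, INQUIRY, SCSI_MLQUEUE_HOST_BUSY)
```

A positive count such as `1` keeps the rule active indefinitely, which is why the commit's example affects every INQUIRY sent to the device.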
# 32be8b6e	10-Oct-2023	Wenchao Hao <haowenchao2@huawei.com>	scsi: scsi_debug: Time out command if the error is injected
If a timeout error is injected, return 0 from scsi_debug_queuecommand to make the command time out.
Time out SCSI command format:
+--------+------+-------------------------------------------------------+
| Column | Type | Description                                           |
+--------+------+-------------------------------------------------------+
|   1    |  u8  | Error type, fixed to 0x0                              |
+--------+------+-------------------------------------------------------+
|   2    | s32  | Error count                                           |
|        |      | 0: this rule will be ignored                          |
|        |      | positive: the rule will always take effect            |
|        |      | negative: the rule takes effect n times where -n is   |
|        |      |           the value given. Ignored after n times      |
+--------+------+-------------------------------------------------------+
|   3    |  x8  | SCSI command opcode, 0xff for all commands            |
+--------+------+-------------------------------------------------------+
Examples:
    error=/sys/kernel/debug/scsi_debug/0:0:0:1/error
    echo "0 -10 0x12" > ${error}
will make the device's inquiry command time out 10 times.
    echo "0 1 0x12" > ${error}
will make the device's inquiry time out each time it is invoked on this device.
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231010092051.608007-5-haowenchao2@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
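The error-count column has three regimes (zero, positive, negative), and the two examples in 32be8b6e exercise two of them. The following user-space model is a sketch of those semantics as the commit messages describe them, not the driver's implementation; the function names are hypothetical.

```python
from typing import Tuple


def consume(count: int) -> Tuple[bool, int]:
    """Return (rule_fires, new_count) for one matching command.

    0        -> rule is ignored
    positive -> rule always fires and the count is left unchanged
    negative -> rule fires and the count steps toward 0, so a value
                of -n fires exactly n times
    """
    if count == 0:
        return False, 0
    if count > 0:
        return True, count
    return True, count + 1


def simulate(count: int, commands: int) -> Tuple[int, int]:
    """Send `commands` matching commands; return (times_fired, final_count)."""
    fired = 0
    for _ in range(commands):
        hit, count = consume(count)
        fired += hit
    return fired, count
```

With this model, `simulate(-10, 25)` corresponds to the `"0 -10 0x12"` example (ten timeouts, then normal behaviour), and `simulate(1, 25)` to `"0 1 0x12"` (a timeout on every INQUIRY).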
# 962d77cd	10-Oct-2023	Wenchao Hao <haowenchao2@huawei.com>	scsi: scsi_debug: Define grammar to remove added error injection The grammar to remove error injection is a line with fixed 3 columns separated by spaces. First column is fixed to "-". It tells this is a removal operation. Second column is the error code to match. Third column is the scsi command to match. For example the following command would remove timeout injection of inquiry command: echo "- 0 0x12" > /sys/kernel/debug/scsi_debug/0:0:0:1/error Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Wenchao Hao <haowenchao2@huawei.com> Link: https://lore.kernel.org/r/20231010092051.608007-4-haowenchao2@huawei.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
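The removal grammar in 962d77cd is a fixed three-column line, which a tiny helper can generate. This is a sketch only; the function name is hypothetical, and the opcode is always emitted with a `0x` prefix for readability.

```python
def removal_line(error_code: int, opcode: int) -> str:
    """Build a rule-removal line: "-", error code, SCSI opcode.

    Hypothetical helper; the three-column layout is taken from the
    commit message.
    """
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must fit in one byte")
    return f"- {error_code} 0x{opcode:02x}"


# Remove the timeout (error code 0) rule for INQUIRY (0x12).
line = removal_line(0, 0x12)
```

The string produced here matches the `echo "- 0 0x12"` example in the commit message and would be written to the same `error` file that rules are added through.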
# a9996d72	10-Oct-2023	Wenchao Hao <haowenchao2@huawei.com>	scsi: scsi_debug: Add interface to manage error injection for a single device
This new facility uses the debugfs pseudo file system which is typically mounted under the /sys/kernel/debug directory and requires root permissions to access. The interface file is found at /sys/kernel/debug/scsi_debug/<h:c:t:l>/error where <h:c:t:l> identifies the device (logical unit (LU)) to inject errors on.
For the following description the ${error} environment variable is assumed to be set to /sys/kernel/debug/scsi_debug/1:0:0:0/error where 1:0:0:0 is a pseudo device (LU) owned by the scsi_debug driver. Rules are written to ${error} in the normal sysfs fashion (e.g. 'echo "0 -2 0x12" > ${error}'). More than one rule can be active on a device at a time and inactive rules (i.e. those whose error count is 0) remain in the rule listing. The existing rules can be read with 'cat ${error}', which prints one line of output for each rule.
The interface format is line-by-line; each line is an error injection rule. Each rule contains integers separated by spaces. The first three columns correspond to "Error code", "Error count" and "SCSI command"; other columns depend on the Error code.
General rule format:
+--------+------+-------------------------------------------------------+
| Column | Type | Description                                           |
+--------+------+-------------------------------------------------------+
|   1    |  u8  | Error code                                            |
|        |      | 0: timeout SCSI command                               |
|        |      | 1: fail queuecommand, make queuecommand return        |
|        |      |    given value                                        |
|        |      | 2: fail command, finish command with SCSI status,     |
|        |      |    sense key and ASC/ASCQ values                      |
|        |      | 3: make abort commands for specific command fail      |
|        |      | 4: make reset lun for specific command fail           |
+--------+------+-------------------------------------------------------+
|   2    | s32  | Error count                                           |
|        |      | 0: this rule will be ignored                          |
|        |      | positive: the rule will always take effect            |
|        |      | negative: the rule takes effect n times where -n is   |
|        |      |           the value given. Ignored after n times      |
+--------+------+-------------------------------------------------------+
|   3    |  x8  | SCSI command opcode, 0xff for all commands            |
+--------+------+-------------------------------------------------------+
|  ...   | xxx  | Error type specific fields                            |
+--------+------+-------------------------------------------------------+
Notes:
- When multiple error inject rules are added for the same SCSI command, the one with the smaller error code will take effect (and the others will be ignored).
- If the same error (i.e. same Error code and SCSI command) is added, the older one will be overwritten.
- Currently, the basic types are (u8/u16/u32/u64/s8/s16/s32/s64) and the hexadecimal types (x8/x16/x32/x64).
- Where a hexadecimal value is expected (e.g. Column 3: SCSI command opcode) the "0x" prefix is optional on the value (e.g. the INQUIRY opcode can be given as '0x12' or '12').
- When the Error count is negative, reading ${error} will show that value incrementing, stopping when it gets to 0.
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231010092051.608007-3-haowenchao2@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
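The general rule format and the notes in a9996d72 can be illustrated with a small user-space parser and rule table. This is a sketch under stated assumptions, not a port of the driver: all names are hypothetical, the error code and count are assumed to be decimal, and treating 0xff as a wildcard and skipping count-0 rules during selection is one reading of the notes rather than confirmed driver behaviour.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

ERROR_CODES = {0: "timeout", 1: "fail queuecommand", 2: "fail command",
               3: "fail abort", 4: "fail reset LUN"}
ALL_COMMANDS = 0xFF


@dataclass
class Rule:
    code: int          # column 1, u8
    count: int         # column 2, s32
    opcode: int        # column 3, x8 ("0x" prefix optional)
    extra: List[str]   # error-type-specific columns, kept unparsed


def parse_rule(line: str) -> Rule:
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("a rule needs at least three columns")
    code = int(fields[0], 10)
    if code not in ERROR_CODES:
        raise ValueError("unknown error code")
    count = int(fields[1], 10)
    opcode = int(fields[2], 16)  # accepts both "0x12" and "12"
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode is an x8 field")
    return Rule(code, count, opcode, fields[3:])


RuleTable = Dict[Tuple[int, int], Rule]


def add_rule(table: RuleTable, rule: Rule) -> None:
    # Same error code and SCSI command: the older rule is overwritten.
    table[(rule.code, rule.opcode)] = rule


def select_rule(table: RuleTable, opcode: int) -> Optional[Rule]:
    # Among active rules matching the command, the smallest code wins.
    matches = [r for r in table.values()
               if r.count != 0 and r.opcode in (opcode, ALL_COMMANDS)]
    return min(matches, key=lambda r: r.code) if matches else None
```

Keying the table on `(code, opcode)` makes the "older one will be overwritten" note fall out of ordinary dictionary assignment, and `min` over the matching rules expresses the "smaller error code will take effect" note.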
# 6e2d15f5	10-Oct-2023	Wenchao Hao <haowenchao2@huawei.com>	scsi: scsi_debug: Create scsi_debug directory in the debugfs filesystem Create directory scsi_debug in the root of the debugfs filesystem. Prepare to add an interface to manage error injection. Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Wenchao Hao <haowenchao2@huawei.com> Link: https://lore.kernel.org/r/20231010092051.608007-2-haowenchao2@huawei.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 23815df5	28-Jun-2023	Maurizio Lombardi <mlombard@redhat.com>	scsi: scsi_debug: Remove dead code The ramdisk rwlocks are not used anymore. Fixes: 87c715dcde63 ("scsi: scsi_debug: Add per_host_store option") Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Link: https://lore.kernel.org/r/20230628150638.53218-1-mlombard@redhat.com Reviewed-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 0c028b6a	16-Apr-2023	John Garry <john.g.garry@oracle.com>	scsi: scsi_debug: Abort commands from scsi_debug_device_reset() Currently scsi_debug_device_reset() does not do much apart from setting the SDEBUG_UA_POR ("Power on, reset, or bus device reset") flag, which is eventually passed back to the SCSI midlayer later for a "unit attention" command. There is a report that blktest scsi/007 test fails due to commit 1107c7b24ee3 ("scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd"). The problem there is that there are dangling scsi_debug queued commands when we attempt to remove the driver. scsi/007 test triggers SCSI EH and attempts to abort a timed-out command. Function scsi_debug_device_reset() is called as part of the EH, but does not deal with outstanding erroneous command. Prior to the named commit, removing the driver caused all dangling queued commands to be stopped - this should have not been necessary. Fix by aborting outstanding commands on a scsi_device basis from scsi_debug_device_reset(). Fixes: 1107c7b24ee3 ("scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd") Reported-by: kernel test robot <yujie.liu@intel.com> Link: https://lore.kernel.org/oe-lkp/202304071111.e762fcbd-yujie.liu@intel.com Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20230416175654.159163-1-john.g.garry@oracle.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# b32283d7	06-Apr-2023	Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>	scsi: scsi_debug: Fix missing error code in scsi_debug_init() Smatch reports: drivers/scsi/scsi_debug.c:6996 scsi_debug_init() warn: missing error code 'ret' Although it is unlikely that KMEM_CACHE fails, if it does then ret might be zero. To fix this, explicitly set ret to "-ENOMEM" and then goto driver_unreg. Fixes: 1107c7b24ee3 ("scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd") Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Link: https://lore.kernel.org/r/20230406074607.3637097-1-harshit.m.mogalapalli@oracle.com Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# f1437cd1	27-Mar-2023	John Garry <john.g.garry@oracle.com>	scsi: scsi_debug: Drop sdebug_queue It's easy to get scsi_debug to error on throughput testing when we have multiple shosts: $ lsscsi [7:0:0:0] disk Linux scsi_debug 0191 [0:0:0:0] disk Linux scsi_debug 0191 $ fio --filename=/dev/sda --filename=/dev/sdb --direct=1 --rw=read --bs=4k --iodepth=256 --runtime=60 --numjobs=40 --time_based --name=jpg --eta-newline=1 --readonly --ioengine=io_uring --hipri --exitall_on_error jpg: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=256 ... fio-3.28 Starting 40 processes [ 27.521809] hrtimer: interrupt took 33067 ns [ 27.904660] sd 7:0:0:0: [sdb] tag#171 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s [ 27.904660] sd 0:0:0:0: [sda] tag#58 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s fio: io_u error [ 27.904667] sd 0:0:0:0: [sda] tag#58 CDB: Read(10) 28 00 00 00 27 00 00 01 18 00 on file /dev/sda[ 27.904670] sd 0:0:0:0: [sda] tag#62 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s The issue is related to how the driver manages submit queues and tags. A single array of submit queues - sdebug_q_arr - with its own set of tags is shared among all shosts. As such, for occasions when we have more than one shost it is possible to overload the submit queues and run out of tags. The struct sdebug_queue is to manage tags and hold the associated queued command entry pointer (for that tag). Since the tagset iters are now used for functions like sdebug_blk_mq_poll(), there is no need to manage these queues. Indeed, blk-mq already provides what we need for managing tags and queues. Drop sdebug_queue and all its usage in the driver. Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20230327074310.1862889-12-john.g.garry@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 57f7225a	27-Mar-2023	John Garry <john.g.garry@oracle.com>	scsi: scsi_debug: Only allow sdebug_max_queue be modified when no shosts The shost->can_queue value is initially used to set per-HW queue context tag depth in the block layer. This ensures that the shost is not sent too many commands which it can deal with. However lowering sdebug_max_queue separately means that we can easily overload the shost, as in the following example: $ cat /sys/bus/pseudo/drivers/scsi_debug/max_queue 192 $ cat /sys/class/scsi_host/host0/can_queue 192 $ echo 100 > /sys/bus/pseudo/drivers/scsi_debug/max_queue $ cat /sys/class/scsi_host/host0/can_queue 192 $ fio --filename=/dev/sda --direct=1 --rw=read --bs=4k --iodepth=256 --runtime=1200 --numjobs=10 --time_based --group_reporting --name=iops-test-job --eta-newline=1 --readonly --ioengine=io_uring --hipri --exitall_on_error iops-test-job: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=256 ... fio-3.28 Starting 10 processes [ 111.269885] scsi_io_completion_action: 400 callbacks suppressed [ 111.269885] blk_print_req_error: 400 callbacks suppressed [ 111.269889] I/O error, dev sda, sector 440 op 0x0:(READ) flags 0x1200000 phys_seg 1 prio class 2 [ 111.269892] sd 0:0:0:0: [sda] tag#132 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s [ 111.269897] sd 0:0:0:0: [sda] tag#132 CDB: Read(10) 28 00 00 00 01 68 00 00 08 00 [ 111.277058] I/O error, dev sda, sector 360 op 0x0:(READ) flags 0x1200000 phys_seg 1 prio class 2 [...] Ensure that this cannot happen by allowing sdebug_max_queue be modified only when we have no shosts. As such, any shost->can_queue value will match sdebug_max_queue, and sdebug_max_queue cannot be modified separately. Since retired_max_queue is no longer set, remove support. Continue to apply the restriction that sdebug_host_max_queue cannot be modified when sdebug_host_max_queue is set. 
Adding support for that would mean extra code, and no one has complained about this restriction previously. A command like the following may be used to remove a shost: echo -1 > /sys/bus/pseudo/drivers/scsi_debug/add_host Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20230327074310.1862889-11-john.g.garry@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 12f3eef0	27-Mar-2023	John Garry <john.g.garry@oracle.com>	scsi: scsi_debug: Use scsi_host_busy() in delay_store() and ndelay_store() The functions to update the ndelay and delay values first check whether we have any in-flight IO for any host. They do this by checking if any tag is used in the global submit queues. We can achieve the same by setting the host as blocked and then ensuring that we have no in-flight commands with scsi_host_busy(). Note that scsi_host_busy() checks the SCMD_STATE_INFLIGHT flag, which is only set per command after we ensure that the host is not blocked, i.e. we see no more commands active after the check for scsi_host_busy() returns 0. Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20230327074310.1862889-10-john.g.garry@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 9c559c9b	27-Mar-2023	John Garry <john.g.garry@oracle.com>	scsi: scsi_debug: Use blk_mq_tagset_busy_iter() in stop_all_queued() Instead of iterating all deferred commands in the submission queue structures, use blk_mq_tagset_busy_iter(), which is a standard API for this. Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20230327074310.1862889-9-john.g.garry@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 600d9ead	27-Mar-2023	John Garry <john.g.garry@oracle.com>	scsi: scsi_debug: Use blk_mq_tagset_busy_iter() in sdebug_blk_mq_poll() Instead of iterating all deferred commands in the submission queue structures, use blk_mq_tagset_busy_iter(), which is a standard API for this. Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20230327074310.1862889-8-john.g.garry@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 1107c7b2	27-Mar-2023	John Garry <john.g.garry@oracle.com>	scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd Eventually we will drop the sdebug_queue struct as it is not really required, so start with making the sdebug_queued_cmd dynamically allocated for the lifetime of the scsi_cmnd in the driver. As an interim measure, make sdebug_queued_cmd.sd_dp a pointer to struct sdebug_defer. Also keep a value of the index allocated in sdebug_queued_cmd.qc_arr in struct sdebug_queued_cmd. To deal with any races in accessing the scsi cmnd allocated struct sdebug_queued_cmd, add a spinlock for the scsi command in its priv area. Races may be between scheduling a command for completion, aborting a command, and the command actually completing and freeing the struct sdebug_queued_cmd. [mkp: typo fix] Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20230327074310.1862889-7-john.g.garry@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# a0473bf3	27-Mar-2023	John Garry <john.g.garry@oracle.com>	scsi: scsi_debug: Use scsi_block_requests() to block queues The feature to block queues is quite dubious, since it races with in-flight IO. Indeed, it seems unnecessary to block queues at any of the times we do so. Anyway, to keep the same behaviour, use the standard SCSI API to stop IO being sent - scsi_{un}block_requests(). Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20230327074310.1862889-6-john.g.garry@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 25b80b2c	27-Mar-2023	John Garry <john.g.garry@oracle.com>	scsi: scsi_debug: Protect block_unblock_all_queues() with mutex There is no reason that calls to block_unblock_all_queues() from different contexts can't race with one another, so protect them with the sdebug_host_list_mutex. There's no need for more fine-grained per-shost locking here (and we don't have a per-host lock anyway). Also simplify some touched code in sdebug_change_qdepth(). Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20230327074310.1862889-5-john.g.garry@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 0aaa3fad	27-Mar-2023	John Garry <john.g.garry@oracle.com>	scsi: scsi_debug: Change shost list lock to a mutex The shost list lock, sdebug_host_list_lock, is a spinlock. We would only lock in non-atomic context in this driver, so use a mutex instead, which is friendlier if we need to schedule when iterating. Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20230327074310.1862889-4-john.g.garry@oracle.com Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 00f9d622	27-Mar-2023	John Garry <john.g.garry@oracle.com>	scsi: scsi_debug: Don't iter all shosts in clear_luns_changed_on_target() In clear_luns_changed_on_target(), we iterate over all devices for all shosts to conditionally clear the SDEBUG_UA_LUNS_CHANGED flag in the per-device uas_bm. One condition for clearing the flag is that the host for the device under consideration is the same as the matching device's (devip) host. This check will only ever pass for devices on the same shost, so only iterate over the devices for the matching device's shost. We can now drop the spinlock'ing of the sdebug_host_list_lock in the same function. This will allow us to use a mutex instead of the spinlock for the global shost lock, as clear_luns_changed_on_target() could be called in non-blocking context, in scsi_debug_queuecommand() -> make_ua() -> clear_luns_changed_on_target() (which is why a spinlock was required). Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20230327074310.1862889-3-john.g.garry@oracle.com Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 6500d204	27-Mar-2023	John Garry <john.g.garry@oracle.com>	scsi: scsi_debug: Fix check for sdev queue full There is a report that the blktests scsi/004 test for "TASK SET FULL" (TSF) now fails. The condition upon which we should issue this TSF is when the sdev queue is full. The check for a full queue has an off-by-1 error. Previously we would increment the number of requests in the queue after testing if the queue would be full, i.e. test if one less than full. Since we now use scsi_device_busy() to count the number of requests in the queue, this would already account for the current request, so fix the test for queue full accordingly. Fixes: 151f0ec9ddb5 ("scsi: scsi_debug: Drop sdebug_dev_info.num_in_q") Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/oe-lkp/202303201334.18b30edc-oliver.sang@intel.com Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20230327074310.1862889-2-john.g.garry@oracle.com Acked-by: Douglas Gilbert <dgilbert@interlog.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# c45b3804	18-Mar-2023	Lizhe <sensor1010@163.com>	scsi: scsi_debug: Remove redundant driver match function If there is no driver match function, the driver core assumes that each candidate pair (driver, device) matches, see driver_match_device(). Drop the pseudo_lld bus match function that always returned 1. This results in the same behaviour as when there is no match function. [mkp+jgg: patch description] Signed-off-by: Lizhe <sensor1010@163.com> Link: https://lore.kernel.org/r/20230319042732.278691-1-sensor1010@163.com Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 548ebb33	13-Mar-2023	John Garry <john.g.garry@oracle.com>	scsi: scsi_debug: Add poll mode deferred completions to statistics Currently commands completed via poll mode are not included in the statistics gathering for deferred completions and missed CPUs. Poll mode completions should be treated the same as other deferred completion types, so add poll mode completions to the statistics. Signed-off-by: John Garry <john.g.garry@oracle.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Link: https://lore.kernel.org/r/20230313093114.1498305-12-john.g.garry@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# f037b5cb	13-Mar-2023	John Garry <john.g.garry@oracle.com>	scsi: scsi_debug: Get command abort feature working again The command abort feature allows us to test aborting a command which has timed out. The idea is that for specific commands we just don't call scsi_done() and allow the request to time out, which ensures SCSI EH kicks in and we try to abort the command. Since commit 4a0c6f432d15 ("scsi: scsi_debug: Add new defer type for mq_poll") this does not seem to work. The issue is that we clear the sd_dp->aborted flag in schedule_resp() before the completion callback has run. When the completion callback actually runs, it calls scsi_done() as normal because sd_dp->aborted is unset. This is all very racy. Fix by not clearing sd_dp->aborted in schedule_resp(). Also move the call to blk_abort_request() from schedule_resp() to sdebug_q_cmd_complete(), which makes the code have a more logical sequence. I also note that this feature only works for commands which are classed as "SDEG_RES_IMMED_MASK", but is only practically triggered with prior RW commands. So for my experiment I need to run fio to trigger the error on the "nth" command (see inject_on_this_cmd()), and then run something like sg_sync to queue a command to actually trigger the abort. Signed-off-by: John Garry <john.g.garry@oracle.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Link: https://lore.kernel.org/r/20230313093114.1498305-11-john.g.garry@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 151f0ec9	13-Mar-2023	John Garry <john.g.garry@oracle.com>	scsi: scsi_debug: Drop sdebug_dev_info.num_in_q In schedule_resp(), under certain conditions we check whether the per-device queue is full (num_in_q == queue depth - 1) and we may inject a "task set full" (TSF) error if it is. However how we read num_in_q is racy - many threads may see the same "queue is full" value (and also issue a TSF). There is per-queue locking in reading per-device num_in_q, but that would not help. Replace how we read num_in_q at this location with a call to scsi_device_busy(). Calling scsi_device_busy() is likewise racy (as is reading num_in_q), so nothing is lost or gained. Calling scsi_device_busy() is also slow as it needs to read all bits in the per-device budget bitmap, but we can live with that since we're just a simulator and it's only under certain configs that we would see this. Also move the "task set full" print earlier as it would only be called now under this condition. However, previously it may not have been called - like when returning early - but keep it simple and always call it. At this point we can drop sdebug_dev_info.num_in_q - it is difficult to maintain properly and adds extra normal case command processing. Signed-off-by: John Garry <john.g.garry@oracle.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Link: https://lore.kernel.org/r/20230313093114.1498305-10-john.g.garry@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>