Cross Reference: /linux-master/drivers/scsi/scsi

History log of /linux-master/drivers/scsi/scsi_lib.c
Revision	Date	Author	Comments
# ca91259b	25-Mar-2024	Bart Van Assche <bvanassche@acm.org>	scsi: core: Fix handling of SCMD_FAIL_IF_RECOVERING There is code in the SCSI core that sets the SCMD_FAIL_IF_RECOVERING flag but there is no code that clears this flag. Instead of only clearing SCMD_INITIALIZED in scsi_end_request(), clear all flags. It is never necessary to preserve any command flags inside scsi_end_request(). Cc: stable@vger.kernel.org Fixes: 310bcaef6d7e ("scsi: core: Support failing requests while recovering") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240325224417.1477135-1-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# f7c7190f	07-Feb-2024	Lukas Bulwahn <lukas.bulwahn@gmail.com>	scsi: core: Really include kunit tests with SCSI_LIB_KUNIT_TEST Commit 25a1f7a0a1fe ("scsi: core: Add kunit tests for scsi_check_passthrough()") adds the config SCSI_LIB_KUNIT_TEST and corresponding tests. Due to naming confusion, the actual tests would only be included when the non-existing config SCSI_KUNIT_TEST is enabled (note this missing 'LIB' in the config name). So, they are basically dead right now. Adjust the name to actual existing config. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Link: https://lore.kernel.org/r/20240207145603.15680-1-lukas.bulwahn@gmail.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 25a1f7a0	22-Jan-2024	Mike Christie <michael.christie@oracle.com>	scsi: core: Add kunit tests for scsi_check_passthrough() Add some kunit tests for scsi_check_passthrough() so we can easily make sure we are hitting the cases for which it's difficult to replicate in hardware or even scsi_debug. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20240123002220.129141-20-michael.christie@oracle.com Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 21bdff48	22-Jan-2024	Mike Christie <michael.christie@oracle.com>	scsi: core: Have midlayer retry scsi_mode_sense() UAs This has scsi_mode_sense() have the SCSI midlayer retry UAs instead of driving them itself. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20240123002220.129141-13-michael.christie@oracle.com Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 994724e6	22-Jan-2024	Mike Christie <michael.christie@oracle.com>	scsi: core: Allow passthrough to request midlayer retries For passthrough we don't retry any error which we get a check condition for. This results in a lot of callers driving their own retries for all UAs, specific UAs, NOT_READY, specific sense values or any type of failure. This adds the core code to allow passthrough users to specify what errors they want the SCSI midlayer to retry for them. We can then convert users to drop a lot of their sense parsing and retry handling. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20240123002220.129141-2-michael.christie@oracle.com Reviewed-by: John Garry <john.g.garry@oracle.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 4e6c9011	02-Feb-2024	Ming Lei <ming.lei@redhat.com>	scsi: core: Move scsi_host_busy() out of host lock if it is for per-command Commit 4373534a9850 ("scsi: core: Move scsi_host_busy() out of host lock for waking up EH handler") intended to fix a hard lockup issue triggered by EH. The core idea was to move scsi_host_busy() out of the host lock when processing individual commands for EH. However, a suggested style change inadvertently caused scsi_host_busy() to remain under the host lock. Fix this by calling scsi_host_busy() outside the lock. Fixes: 4373534a9850 ("scsi: core: Move scsi_host_busy() out of host lock for waking up EH handler") Cc: Sathya Prakash Veerichetty <safhya.prakash@broadcom.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20240203024521.2006455-1-ming.lei@redhat.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 4373534a	12-Jan-2024	Ming Lei <ming.lei@redhat.com>	scsi: core: Move scsi_host_busy() out of host lock for waking up EH handler Inside scsi_eh_wakeup(), scsi_host_busy() is called & checked with host lock every time for deciding if error handler kthread needs to be waken up. This can be too heavy in case of recovery, such as: - N hardware queues - queue depth is M for each hardware queue - each scsi_host_busy() iterates over (N * M) tag/requests If recovery is triggered in case that all requests are in-flight, each scsi_eh_wakeup() is strictly serialized, when scsi_eh_wakeup() is called for the last in-flight request, scsi_host_busy() has been run for (N * M - 1) times, and request has been iterated for (NM - 1) (N * M) times. If both N and M are big enough, hard lockup can be triggered on acquiring host lock, and it is observed on mpi3mr(128 hw queues, queue depth 8169). Fix the issue by calling scsi_host_busy() outside the host lock. We don't need the host lock for getting busy count because host the lock never covers that. [mkp: Drop unnecessary 'busy' variables pointed out by Bart] Cc: Ewan Milne <emilne@redhat.com> Fixes: 6eb045e092ef ("scsi: core: avoid host-wide host_busy counter for scsi_mq") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20240112070000.4161982-1-ming.lei@redhat.com Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Sathya Prakash Veerichetty <safhya.prakash@broadcom.com> Tested-by: Sathya Prakash Veerichetty <safhya.prakash@broadcom.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 3dc985bf	18-Oct-2023	Wenchao Hao <haowenchao2@huawei.com>	scsi: core: Clean up scsi_dev_queue_ready() This is just a cleanup for scsi_dev_queue_ready() to avoid a redundant goto and if statement. No functional change. Signed-off-by: Wenchao Hao <haowenchao2@huawei.com> Link: https://lore.kernel.org/r/20231018113746.1940197-2-haowenchao2@huawei.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 2bbeb8d1	14-Oct-2023	Douglas Gilbert <dgilbert@interlog.com>	scsi: core: Handle depopulation and restoration in progress The default handling of the NOT READY sense key is to wait for the device to become ready. The "wait" is assumed to be relatively short. However there is a sub-class of NOT READY that have the "... in progress" phrase in their additional sense code and these can take much longer. Following on from commit 505aa4b6a883 ("scsi: sd: Defer spinning up drive while SANITIZE is in progress") we now have element depopulation and restoration that can take a long time. For example, over 24 hours for a 20 TB, 7200 rpm hard disk to depopulate 1 of its 20 elements. Add handling of ASC/ASCQ: 0x4,0x24 (depopulation in progress) and ASC/ASCQ: 0x4,0x25 (depopulation restoration in progress) to sd.c . The scsi_lib.c has incomplete handling of these two messages, so complete it. Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Link: https://lore.kernel.org/r/20231015050650.131145-1-dgilbert@interlog.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# f43158ee	04-Oct-2023	Mike Christie <michael.christie@oracle.com>	scsi: Fix sshdr use in scsi_test_unit_ready If scsi_execute_cmd returns < 0, it doesn't initialize the sshdr, so we shouldn't access the sshdr. If it returns 0, then the cmd executed successfully, so there is no need to check the sshdr. This has us access the sshdr when we get a return value > 0. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20231004210013.5601-10-michael.christie@oracle.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 79519528	22-Aug-2023	Bart Van Assche <bvanassche@acm.org>	scsi: core: Improve type safety of scsi_rescan_device() Most callers of scsi_rescan_device() have the scsi_device pointer readily available. Pass a struct scsi_device pointer to scsi_rescan_device() instead of a struct device pointer. This change prevents that a pointer to another struct device would be passed accidentally to scsi_rescan_device(). Remove the scsi_rescan_device() declaration from the scsi_priv.h header file since it duplicates the declaration in <scsi/scsi_host.h>. Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Cc: Mike Christie <michael.christie@oracle.com> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230822153043.4046244-1-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 65a558f6	21-Jul-2023	Bart Van Assche <bvanassche@acm.org>	block: Improve performance for BLK_MQ_F_BLOCKING drivers blk_mq_run_queue() runs the queue asynchronously if BLK_MQ_F_BLOCKING has been set. This is suboptimal since running the queue asynchronously is slower than running the queue synchronously. This patch modifies blk_mq_run_queue() as follows if BLK_MQ_F_BLOCKING has been set: - Run the queue synchronously if it is allowed to sleep. - Run the queue asynchronously if it is not allowed to sleep. Additionally, blk_mq_run_hw_queue(hctx, false) calls are modified into blk_mq_run_hw_queue(hctx, hctx->flags & BLK_MQ_F_BLOCKING) if the caller may be invoked from atomic context. The following caller chains have been reviewed: blk_mq_run_hw_queue(hctx, false) blk_mq_get_tag() /* may sleep, hence the functions it calls may also sleep / blk_execute_rq() / may sleep / blk_mq_run_hw_queues(q, async=false) blk_freeze_queue_start() / may sleep / blk_mq_requeue_work() / may sleep / scsi_kick_queue() scsi_requeue_run_queue() / may sleep / scsi_run_host_queues() scsi_ioctl_reset() / may sleep / blk_mq_insert_requests(hctx, ctx, list, run_queue_async=false) blk_mq_dispatch_plug_list(plug, from_sched=false) blk_mq_flush_plug_list(plug, from_schedule=false) __blk_flush_plug(plug, from_schedule=false) blk_add_rq_to_plug() blk_mq_submit_bio() / may sleep if REQ_NOWAIT has not been set / blk_mq_plug_issue_direct() blk_mq_flush_plug_list() / see above / blk_mq_dispatch_plug_list(plug, from_sched=false) blk_mq_flush_plug_list() / see above / blk_mq_try_issue_directly() blk_mq_submit_bio() / may sleep if REQ_NOWAIT has not been set / blk_mq_try_issue_list_directly(hctx, list) blk_mq_insert_requests() / see above */ Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230721172731.955724-4-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
# d42e2e34	21-Jul-2023	Bart Van Assche <bvanassche@acm.org>	scsi: Remove a blk_mq_run_hw_queues() call blk_mq_kick_requeue_list() calls blk_mq_run_hw_queues() asynchronously. Leave out the direct blk_mq_run_hw_queues() call. This patch causes scsi_run_queue() to call blk_mq_run_hw_queues() asynchronously instead of synchronously. Since scsi_run_queue() is not called from the hot I/O submission path, this patch does not affect the hot path. This patch prepares for allowing blk_mq_run_hw_queue() to sleep if BLK_MQ_F_BLOCKING has been set. scsi_run_queue() may be called from atomic context and must not sleep. Hence the removal of the blk_mq_run_hw_queues(q, false) call. See also scsi_unblock_requests(). Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20230721172731.955724-3-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
# b5ca9acf	21-Jul-2023	Bart Van Assche <bvanassche@acm.org>	scsi: Inline scsi_kick_queue() Inline scsi_kick_queue() to prepare for modifying the second argument passed to blk_mq_run_hw_queues(). Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20230721172731.955724-2-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 6d7160c7	13-Jun-2023	Martin Wilck <mwilck@suse.com>	scsi: core: Improve warning message in scsi_device_block() If __scsi_internal_device_block() returns an error, it is always -EINVAL because of an invalid state transition. For debugging purposes, it makes more sense to print the device state. Signed-off-by: Martin Wilck <mwilck@suse.com> Link: https://lore.kernel.org/r/20230614103616.31857-8-mwilck@suse.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 31950192	13-Jun-2023	Martin Wilck <mwilck@suse.com>	scsi: core: Replace scsi_target_block() with scsi_block_targets() All callers (fc_remote_port_delete(), __iscsi_block_session(), __srp_start_tl_fail_timers(), srp_reconnect_rport(), snic_tgt_del()) pass parent devices of scsi_target devices to scsi_target_block(). Rename the function to scsi_block_targets(), and simplify it by assuming that it is always passed a parent device. Also, have callers pass the Scsi_Host pointer to scsi_block_targets(), as every caller has this pointer readily available. Suggested-by: Christoph Hellwig <hch@lst.de> Suggested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin Wilck <mwilck@suse.com> Link: https://lore.kernel.org/r/20230614103616.31857-7-mwilck@suse.com Cc: Karan Tilak Kumar <kartilak@cisco.com> Cc: Sesidhar Baddela <sebaddel@cisco.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# e20fff8a	13-Jun-2023	Martin Wilck <mwilck@suse.com>	scsi: core: Don't wait for quiesce in scsi_device_block() scsi_device_block() is only called from scsi_target_block(), which calls it repeatedly for every child device. For targets with many devices, waiting for every queue to quiesce may cause a substantial delay (we measured more than 100s delay for blocking a FC rport with 2048 LUNs). Just call blk_mq_wait_quiesce_done() once from scsi_target_block() after stopping all queues. Signed-off-by: Martin Wilck <mwilck@suse.com> Link: https://lore.kernel.org/r/20230614103616.31857-6-mwilck@suse.com Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# d7035b73	13-Jun-2023	Martin Wilck <mwilck@suse.com>	scsi: core: Don't wait for quiesce in scsi_stop_queue() scsi_stop_queue() has just two callers, one with and one without "nowait". As blk_mq_quiesce_queue() comes down to blk_mq_quiesce_queue_nowait() followed by blk_mq_wait_quiesce_done(), we might as well open-code this in scsi_device_block(). Also, add a comment explaining why blk_mq_quiesce_queue_nowait() must be called with the state_mutex held, see https://lore.kernel.org/linux-scsi/3b8b13bf-a458-827a-b916-07d7eee8ae00@acm.org/. Signed-off-by: Martin Wilck <mwilck@suse.com> Link: https://lore.kernel.org/r/20230614103616.31857-5-mwilck@suse.com Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# c5e46f7a	13-Jun-2023	Martin Wilck <mwilck@suse.com>	scsi: core: Merge scsi_internal_device_block() and device_block() scsi_internal_device_block() is only called from device_block(). Merge the two functions, and call the result scsi_device_block(), as the name device_block() is confusingly generic. Signed-off-by: Martin Wilck <mwilck@suse.com> Link: https://lore.kernel.org/r/20230614103616.31857-4-mwilck@suse.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# b125bb99	29-May-2023	Bart Van Assche <bvanassche@acm.org>	scsi: core: Support setting BLK_MQ_F_BLOCKING Prepare for adding code in ufshcd_queuecommand() that may sleep. This patch is similar to a patch posted last year by Mike Christie. See also https://lore.kernel.org/all/20220308003957.123312-2-michael.christie@oracle.com/ Cc: Mike Christie <michael.christie@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230529202640.11883-3-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# c854bcdf	29-May-2023	Bart Van Assche <bvanassche@acm.org>	scsi: core: Rework scsi_host_block() Make scsi_host_block() easier to read by converting it to the widely used early-return style. See also commit f983622ae605 ("scsi: core: Avoid calling synchronize_rcu() for each device in scsi_host_block()"). Reviewed-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Cc: Ye Bin <yebin10@huawei.com> Cc: Hannes Reinecke <hare@suse.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230529202640.11883-2-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 8b566edb	18-May-2023	Bart Van Assche <bvanassche@acm.org>	scsi: core: Only kick the requeue list if necessary Instead of running the request queue of each device associated with a host every 3 ms (BLK_MQ_RESOURCE_DELAY) while host error handling is in progress, run the request queue after error handling has finished. Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: John Garry <john.g.garry@oracle.com> Cc: Mike Christie <michael.christie@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230518193159.1166304-4-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 390e2d1a	10-May-2023	Niklas Cassel <niklas.cassel@wdc.com>	scsi: sd: Handle read/write CDL timeout failures Commands using a duration limit descriptor that has limit policies set to a value other than 0x0 may be failed by the device if one of the limits are exceeded. For such commands, since the failure is the result of the user duration limit configuration and workload, the commands should not be retried and terminated immediately. Furthermore, to allow the user to differentiate these "soft" failures from hard errors due to hardware problem, a different error code than EIO should be returned. There are 2 cases to consider: (1) The failure is due to a limit policy failing the command with a check condition sense key, that is, any limit policy other than 0xD. For this case, scsi_check_sense() is modified to detect failures with the ABORTED COMMAND sense key and the COMMAND TIMEOUT BEFORE PROCESSING or COMMAND TIMEOUT DURING PROCESSING or COMMAND TIMEOUT DURING PROCESSING DUE TO ERROR RECOVERY additional sense code. For these failures, a SUCCESS disposition is returned so that scsi_finish_command() is called to terminate the command. (2) The failure is due to a limit policy set to 0xD, which result in the command being terminated with a GOOD status, COMPLETED sense key, and DATA CURRENTLY UNAVAILABLE additional sense code. To handle this case, the scsi_check_sense() is modified to return a SUCCESS disposition so that scsi_finish_command() is called to terminate the command. In addition, scsi_decide_disposition() has to be modified to see if a command being terminated with GOOD status has sense data. This is as defined in SCSI Primary Commands - 6 (SPC-6), so all according to spec, even if GOOD status commands were not checked before. If scsi_check_sense() detects sense data representing a duration limit, scsi_check_sense() will set the newly introduced SCSI ML byte SCSIML_STAT_DL_TIMEOUT. This SCSI ML byte is checked in scsi_noretry_cmd(), so that a command that failed because of a CDL timeout cannot be retried. The SCSI ML byte is also checked in scsi_result_to_blk_status() to complete the command request with the BLK_STS_DURATION_LIMIT status, which result in the user seeing ETIME errors for the failed commands. Co-developed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Link: https://lore.kernel.org/r/20230511011356.227789-12-nks@flawful.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# a6cdc35f	10-May-2023	Damien Le Moal <dlemoal@kernel.org>	scsi: core: Support retrieving sub-pages of mode pages Allow scsi_mode_sense() to retrieve sub-pages of mode pages by adding the subpage argument. Change all the current caller sites to specify the subpage 0. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Link: https://lore.kernel.org/r/20230511011356.227789-7-nks@flawful.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 73432693	10-May-2023	Niklas Cassel <niklas.cassel@wdc.com>	scsi: core: Rename and move get_scsi_ml_byte() SCSI has two different getters: - get_XXX_byte() (in scsi_cmnd.h) which takes a struct scsi_cmnd *, and - XXX_byte() (in scsi.h) which takes a scmd->result. The proper name for get_scsi_ml_byte() should thus be without the get_ prefix, as it takes a scmd->result. Rename the function to rectify this. (This change was suggested by Mike Christie.) Additionally, move get_scsi_ml_byte() to scsi_priv.h since both scsi_lib.c and scsi_error.c will need to use this helper in a follow-up patch. Cc: Mike Christie <michael.christie@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Link: https://lore.kernel.org/r/20230511011356.227789-6-nks@flawful.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 7ba15083	07-Apr-2023	Mike Christie <michael.christie@oracle.com>	block: Rename BLK_STS_NEXUS to BLK_STS_RESV_CONFLICT BLK_STS_NEXUS is used for NVMe/SCSI reservation conflicts and DASD's locking feature which works similar to NVMe/SCSI reservations where a host can get a lock on a device and when the lock is taken it will get failures. This patch renames BLK_STS_NEXUS so it better reflects this type of use. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20230407200551.12660-3-michael.christie@oracle.com Acked-by: Stefan Haberland <sth@linux.ibm.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 09e797c8	15-May-2023	Wenchao Hao <haowenchao2@huawei.com>	scsi: core: Decrease scsi_device's iorequest_cnt if dispatch failed If scsi_dispatch_cmd() failed, the SCSI command was not sent to the target, scsi_queue_rq() would return BLK_STS_RESOURCE and the related request would be requeued. The timeout of this request would not fire, no one would increase iodone_cnt. The above flow would result the iodone_cnt smaller than iorequest_cnt. So decrease the iorequest_cnt if dispatch failed to workaround the issue. Signed-off-by: Wenchao Hao <haowenchao2@huawei.com> Reported-by: Ming Lei <ming.lei@redhat.com> Closes: https://lore.kernel.org/r/ZF+zB+bB7iqe0wGd@ovpn-8-17.pek2.redhat.com Link: https://lore.kernel.org/r/20230515070156.1790181-3-haowenchao2@huawei.com Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 6ca9818d	15-May-2023	Wenchao Hao <haowenchao2@huawei.com>	scsi: Revert "scsi: core: Do not increase scsi_device's iorequest_cnt if dispatch failed" The "atomic_inc(&cmd->device->iorequest_cnt)" in scsi_queue_rq() would cause kernel panic because cmd->device may be freed after returning from scsi_dispatch_cmd(). This reverts commit cfee29ffb45b1c9798011b19d454637d1b0fe87d. Signed-off-by: Wenchao Hao <haowenchao2@huawei.com> Reported-by: Ming Lei <ming.lei@redhat.com> Closes: https://lore.kernel.org/r/ZF+zB+bB7iqe0wGd@ovpn-8-17.pek2.redhat.com Link: https://lore.kernel.org/r/20230515070156.1790181-2-haowenchao2@huawei.com Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 35cd2f55	10-Feb-2023	Bart Van Assche <bvanassche@acm.org>	scsi: core: Extend struct scsi_exec_args Allow SCSI LLDs to specify SCMD_* flags. Link: https://lore.kernel.org/r/20230210193258.4004923-2-bvanassche@acm.org Cc: Mike Christie <michael.christie@oracle.com> Cc: John Garry <john.g.garry@oracle.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 7dfe0b5e	29-Dec-2022	Mike Christie <michael.christie@oracle.com>	scsi: core: Convert to scsi_execute_cmd() scsi_execute_req() is going to be removed. Convert SCSI midlayer to scsi_execute_cmd(). Signed-off-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# d0949565	29-Dec-2022	Mike Christie <michael.christie@oracle.com>	scsi: core: Add struct for args to execution functions Move the SCSI execution functions to use a struct for passing in optional args. This commit adds the new struct, temporarily converts scsi_execute() and scsi_execute_req() ands a new helper, scsi_execute_cmd(), which takes the scsi_exec_args struct. There should be no change in behavior. We no longer allow users to pass in any request->rq_flags value, but they were only passing in RQF_PM which we do support by allowing users to pass in the BLK_MQ_REQ flags used by blk_mq_alloc_request(). Subsequent commits will convert scsi_execute() and scsi_execute_req() users to the new helpers then remove scsi_execute() and scsi_execute_req(). Signed-off-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# cfee29ff	23-Nov-2022	Wenchao Hao <haowenchao@huawei.com>	scsi: core: Do not increase scsi_device's iorequest_cnt if dispatch failed If scsi_dispatch_cmd() failed, the SCSI command was not sent to the target. scsi_queue_rq() would return BLK_STS_RESOURCE if scsi_dispatch_cmd() failed, and the related request would be requeued. The timeout of this request would not fire, so noone would increase iodone_cnt. Signed-off-by: Wenchao Hao <haowenchao@huawei.com> Link: https://lore.kernel.org/r/20221123122137.150776-3-haowenchao@huawei.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 310bcaef	18-Oct-2022	Bart Van Assche <bvanassche@acm.org>	scsi: core: Support failing requests while recovering The current behavior for SCSI commands submitted while error recovery is ongoing is to retry command submission after error recovery has finished. See also the scsi_host_in_recovery() check in scsi_host_queue_ready(). Add support for failing SCSI commands while host recovery is in progress. This functionality will be used to fix a deadlock in the UFS driver. Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mike Christie <michael.christie@oracle.com> Cc: Hannes Reinecke <hare@suse.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20221018202958.1902564-4-bvanassche@acm.org Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# d460f624	14-Oct-2022	Bart Van Assche <bvanassche@acm.org>	scsi: core: Rework scsi_single_lun_run() Use __starget_for_each_device() instead of open-coding starget_for_each_device(). Run the queues asynchronously instead of synchronously. This commit removes code that calls scsi_device_put() from atomic context. Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: John Garry <john.garry@huawei.com> Cc: Mike Christie <michael.christie@oracle.com> Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20221015002418.30955-6-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 483239c7	01-Nov-2022	Christoph Hellwig <hch@lst.de>	blk-mq: pass a tagset to blk_mq_wait_quiesce_done Nothing in blk_mq_wait_quiesce_done needs the request_queue now, so just pass the tagset, and move the non-mq check into the only caller that needs it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chao Leng <lengchao@huawei.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20221101150050.3510-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 7dfaae6a	11-Aug-2022	Mike Christie <michael.christie@oracle.com>	scsi: core: Convert scsi_decide_disposition() to use SCSIML_STAT Don't use: - DID_TARGET_FAILURE - DID_NEXUS_FAILURE - DID_ALLOC_FAILURE - DID_MEDIUM_ERROR Instead use the SCSI midlayer internal values. Link: https://lore.kernel.org/r/20220812010027.8251-10-michael.christie@oracle.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 36ebf1e2	11-Aug-2022	Mike Christie <michael.christie@oracle.com>	scsi: core: Add error codes for internal SCSI midlayer use If a driver returns: - DID_TARGET_FAILURE - DID_NEXUS_FAILURE - DID_ALLOC_FAILURE - DID_MEDIUM_ERROR we hit a couple bugs: 1. The SCSI error handler runs because scsi_decide_disposition() has no case statements for them and we return FAILED. 2. For SG IO the userspace app gets a success status instead of failed, because scsi_result_to_blk_status() clears those errors. This patch adds a new internal error code byte for use by the SCSI midlayer. This will be used instead of the above error codes, so we don't have to play that clearing the host code game in scsi_result_to_blk_status() and drivers cannot accidentally use them. A subsequent commit will then remove the internal users of the above codes and convert us to use the new ones. Link: https://lore.kernel.org/r/20220812010027.8251-9-michael.christie@oracle.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# a4e1d0b7	15-Aug-2022	Bart Van Assche <bvanassche@acm.org>	block: Change the return type of blk_mq_map_queues() into void Since blk_mq_map_queues() and the .map_queues() callbacks always return 0, change their return type into void. Most callers ignore the returned value anyway. Cc: Christoph Hellwig <hch@lst.de> Cc: Jason Wang <jasowang@redhat.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Doug Gilbert <dgilbert@interlog.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: John Garry <john.garry@huawei.com> Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Link: https://lore.kernel.org/r/20220815170043.19489-3-bvanassche@acm.org [axboe: fold in fix from Bart] Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 8fe4ce58	25-Aug-2022	Bart Van Assche <bvanassche@acm.org>	scsi: core: Fix a use-after-free There are two .exit_cmd_priv implementations. Both implementations use resources associated with the SCSI host. Make sure that these resources are still available when .exit_cmd_priv is called by waiting inside scsi_remove_host() until the tag set has been freed. This commit fixes the following use-after-free: ================================================================== BUG: KASAN: use-after-free in srp_exit_cmd_priv+0x27/0xd0 [ib_srp] Read of size 8 at addr ffff888100337000 by task multipathd/16727 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0x5e/0x5db kasan_report+0xab/0x120 srp_exit_cmd_priv+0x27/0xd0 [ib_srp] scsi_mq_exit_request+0x4d/0x70 blk_mq_free_rqs+0x143/0x410 __blk_mq_free_map_and_rqs+0x6e/0x100 blk_mq_free_tag_set+0x2b/0x160 scsi_host_dev_release+0xf3/0x1a0 device_release+0x54/0xe0 kobject_put+0xa5/0x120 device_release+0x54/0xe0 kobject_put+0xa5/0x120 scsi_device_dev_release_usercontext+0x4c1/0x4e0 execute_in_process_context+0x23/0x90 device_release+0x54/0xe0 kobject_put+0xa5/0x120 scsi_disk_release+0x3f/0x50 device_release+0x54/0xe0 kobject_put+0xa5/0x120 disk_release+0x17f/0x1b0 device_release+0x54/0xe0 kobject_put+0xa5/0x120 dm_put_table_device+0xa3/0x160 [dm_mod] dm_put_device+0xd0/0x140 [dm_mod] free_priority_group+0xd8/0x110 [dm_multipath] free_multipath+0x94/0xe0 [dm_multipath] dm_table_destroy+0xa2/0x1e0 [dm_mod] __dm_destroy+0x196/0x350 [dm_mod] dev_remove+0x10c/0x160 [dm_mod] ctl_ioctl+0x2c2/0x590 [dm_mod] dm_ctl_ioctl+0x5/0x10 [dm_mod] __x64_sys_ioctl+0xb4/0xf0 dm_ctl_ioctl+0x5/0x10 [dm_mod] __x64_sys_ioctl+0xb4/0xf0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Link: https://lore.kernel.org/r/20220826002635.919423-1-bvanassche@acm.org Fixes: 65ca846a5314 ("scsi: core: Introduce {init,exit}_cmd_priv()") Cc: Ming Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Mike Christie <michael.christie@oracle.com> Cc: Hannes Reinecke <hare@suse.de> Cc: John Garry <john.garry@huawei.com> Cc: Li Zhijian <lizhijian@fujitsu.com> Reported-by: Li Zhijian <lizhijian@fujitsu.com> Tested-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# fac8e558	11-Aug-2022	Mike Christie <michael.christie@oracle.com>	scsi: core: Fix passthrough retry counter handling Passthrough users will set the scsi_cmnd->allowed value and were expecting up to $allowed retries. The problem is that before: commit 6aded12b10e0 ("scsi: core: Remove struct scsi_request") we used to set the retries on the scsi_request then copy them over to scsi_cmnd->allowed in scsi_setup_scsi_cmnd. With that patch we now set scsi_cmnd->allowed to 0 in scsi_prepare_cmd and overwrite what the passthrough user set. This moves the allowed initialization to after the blk_rq_is_passthrough() check so it's only done for the non-passthrough path where the ULD init_command will normally set an allowed value it prefers. Link: https://lore.kernel.org/r/20220812011206.9157-1-michael.christie@oracle.com Fixes: 6aded12b10e0 ("scsi: core: Remove struct scsi_request") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 54249306	29-Jul-2022	Brian Bunker <brian@purestorage.com>	scsi: core: Allow the ALUA transitioning state enough time The error path for the SCSI check condition of not ready, target in ALUA state transition, will result in the failure of that path after the retries are exhausted. In most cases that is well ahead of the transition timeout established in the SCSI ALUA device handler. Instead, reprep the command and re-add it to the queue after a 1 second delay. This will allow the handler to take care of the timeout and only fail the path if the target has exceeded the transition expiry timeout (default 60 seconds). If the expiry timeout is exceeded, the handler will change the path state from transitioning to standby leading to a path failure eliminating the potential of this re-prep to continue endlessly. In most cases the target will exit the transitioning state well before the expiry timeout but after the retries are exhausted as mentioned. Additionally remove the scsi_io_completion_reprep() function which provides little value. Link: https://lore.kernel.org/r/20220729214110.58576-1-brian@purestorage.com Reviewed-by: Martin Wilck <mwilck@suse.com> Acked-by: Krishna Kant <krishna.kant@purestorage.com> Acked-by: Seamus Connor <sconnor@purestorage.com> Signed-off-by: Brian Bunker <brian@purestorage.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# bb7d1283	14-Jul-2022	John Garry <john.garry@huawei.com>	scsi: core: cap shost max_sectors according to DMA limits only once The shost->max_sectors is repeatedly capped according to the host DMA mapping limit for each sdev in __scsi_init_queue(). This is unnecessary, so set only once when adding the host. Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
# 90552cd2	30-Jun-2022	Bart Van Assche <bvanassche@acm.org>	scsi: core: Move the definition of SCSI_QUEUE_DELAY Move the definition of SCSI_QUEUE_DELAY to just above the function that uses it. Link: https://lore.kernel.org/r/20220630195703.10155-2-bvanassche@acm.org Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: John Garry <john.garry@huawei.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 2599cac5	14-Jul-2022	Bart Van Assche <bvanassche@acm.org>	scsi/core: Use the new blk_opf_t type Use the new blk_opf_t type for arguments and variables that represent request flags. Use the !! operator in scsi_noretry_cmd() to convert the blk_opf_t type into a boolean. This patch does not change any functionality. Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: John Garry <john.garry@huawei.com> Cc: Mike Christie <michael.christie@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-42-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
# ea957547	14-Jul-2022	Bart Van Assche <bvanassche@acm.org>	scsi/core: Improve static type checking Improve static type checking by using the new blk_opf_t type for the combination of a request operation and its flags. Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: John Garry <john.garry@huawei.com> Cc: Mike Christie <michael.christie@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-40-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
# deef1be1	06-Jul-2022	John Garry <john.garry@huawei.com>	scsi: core: Remove reserved request time-out handling The SCSI core code does not currently support reserved commands. As such, requests which time-out would never be reserved, and scsi_timeout() 'reserved' arg should never be set. Remove handling for reserved requests, drop the wrapper scsi_timeout() as it now just calls scsi_times_out() always, and finally rename scsi_times_out() -> scsi_timeout() to match the blk_mq_ops method name. Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/1657109034-206040-2-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 6f8191fd	19-Jun-2022	Christoph Hellwig <hch@lst.de>	block: simplify disk shutdown Set the queue dying flag and call blk_mq_exit_queue from del_gendisk for all disks that do not have separately allocated queues, and thus remove the need to call blk_cleanup_queue for them. Rename blk_cleanup_disk to blk_mq_destroy_queue to make it clear that this function is intended only for separately allocated blk-mq queues. This saves an extra queue freeze for devices without a separately allocated queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# b3bc1a88	23-May-2022	Hannes Reinecke <hare@suse.de>	scsi: core: Return BLK_STS_TRANSPORT for ALUA transitioning When the 'ALUA state transitioning' sense code is returned we cannot use BLK_STS_AGAIN, as this has a very specific use-case. So return BLK_STS_TRANSPORT here. Link: https://lore.kernel.org/r/20220524055631.85480-3-hare@suse.de Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 973dac8a	30-Mar-2022	John Garry <john.garry@huawei.com>	scsi: core: Refine how we set tag_set NUMA node For SCSI hosts which enable host_tagset the NUMA node returned from blk_mq_hw_queue_to_node() is NUMA_NO_NODE always. Then, since in scsi_mq_setup_tags() the default we choose for the tag_set NUMA node is NUMA_NO_NODE, we always evaluate the NUMA node as NUMA_NO_NODE in functions like blk_mq_alloc_rq_map(). The reason we get NUMA_NO_NODE from blk_mq_hw_queue_to_node() is that the hctx_idx passed is BLK_MQ_NO_HCTX_IDX - so we can't match against a (HW) queue mapping index. Improve this by defaulting the tag_set NUMA node to the same NUMA node of the SCSI host DMA dev. Link: https://lore.kernel.org/r/1648640315-21419-1-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 6aded12b	24-Feb-2022	Christoph Hellwig <hch@lst.de>	scsi: core: Remove struct scsi_request Let submitters initialize the scmd->allowed field directly instead of indirecting through struct scsi_request and remove the now superfluous structure. Link: https://lore.kernel.org/r/20220224175552.988286-8-hch@lst.de Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# dbb4c84d	24-Feb-2022	Christoph Hellwig <hch@lst.de>	scsi: core: Move the result field from struct scsi_request to struct scsi_cmnd Prepare for removing the scsi_request structure by moving the result field to struct scsi_cmnd. Link: https://lore.kernel.org/r/20220224175552.988286-7-hch@lst.de Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# a9a4ea11	24-Feb-2022	Christoph Hellwig <hch@lst.de>	scsi: core: Move the resid_len field from struct scsi_request to struct scsi_cmnd Prepare for removing the scsi_request structure by moving the resid_len field to struct scsi_cmnd. Link: https://lore.kernel.org/r/20220224175552.988286-6-hch@lst.de Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 5b794f98	24-Feb-2022	Christoph Hellwig <hch@lst.de>	scsi: core: Remove the sense and sense_len fields from struct scsi_request Just use the sense_buffer field in struct scsi_cmnd for the sense data and move the sense_len field over to struct scsi_cmnd. Link: https://lore.kernel.org/r/20220224175552.988286-5-hch@lst.de Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# ce70fd9a	24-Feb-2022	Christoph Hellwig <hch@lst.de>	scsi: core: Remove the cmd field from struct scsi_request Now that each scsi_request is backed by a scsi_cmnd, there is no need to indirect the CDB storage. Change all submitters of SCSI passthrough requests to store the CDB information directly in the scsi_cmnd, and while doing so allocate the full 32 bytes that cover all Linux supported SCSI hosts instead of requiring dynamic allocation for > 16 byte CDBs. On 64-bit systems this does not change the size of the scsi_cmnd at all, while on 32-bit systems it slightly increases it for now, but that increase will be made up by the removal of the remaining scsi_request fields. Link: https://lore.kernel.org/r/20220224175552.988286-4-hch@lst.de Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 71bada34	24-Feb-2022	Christoph Hellwig <hch@lst.de>	scsi: core: Don't memset() the entire scsi_cmnd in scsi_init_command() Replace the big fat memset that requires saving and restoring various fields with just initializing those fields that need initialization. All the clearing to 0 is moved to scsi_prepare_cmd() as scsi_ioctl_reset() alreadly uses kzalloc() to allocate a pre-zeroed command. This is still conservative and can probably be optimized further. Link: https://lore.kernel.org/r/20220224175552.988286-3-hch@lst.de Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# b84b6ec0	03-Feb-2022	Sebastian Andrzej Siewior <sebastian@breakpoint.cc>	scsi: core: Add scsi_done_direct() for immediate completion Add scsi_done_direct() which behaves like scsi_done() except that it invokes blk_mq_complete_request_direct() in order to complete the request. Callers from process context can complete the request directly instead waking ksoftirqd. Link: https://lore.kernel.org/r/Yfw7JaszshmfYa1d@flow Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 9574d434	03-Feb-2022	Song Liu <song@kernel.org>	scsi: use BLK_STS_OFFLINE for not fully online devices The new error message for such case looks like [ 172.809565] device offline error, dev sda, sector 3138208 ... which will not be confused with regular I/O error (BLK_STS_IOERR). Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Song Liu <song@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220203192827.1370270-4-song@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 81d3f500	29-Sep-2021	Damien Le Moal <damien.lemoal@wdc.com>	scsi: core: Fix scsi_mode_select() interface The modepage argument is unused. Remove it. Link: https://lore.kernel.org/r/20210929091744.706003-3-damien.lemoal@wdc.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# b84ba30b	26-Nov-2021	Christoph Hellwig <hch@lst.de>	block: remove the gendisk argument to blk_execute_rq Remove the gendisk aregument to blk_execute_rq and blk_execute_rq_nowait given that it is unused now. Also convert the boolean at_head parameter to actually use the bool type while touching the prototype. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211126121802.2090656-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# f3fa33ac	26-Nov-2021	Christoph Hellwig <hch@lst.de>	block: remove the ->rq_disk field in struct request Just use the disk attached to the request_queue instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211126121802.2090656-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 79478bf9	16-Nov-2021	Christoph Hellwig <hch@lst.de>	block: move blk_rq_err_bytes to scsi blk_rq_err_bytes is only used by the scsi midlayer, so move it there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20211117061404.331732-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 703535e6	03-Nov-2021	Tadeusz Struk <tadeusz.struk@linaro.org>	scsi: core: Remove command size deduction from scsi_setup_scsi_cmnd() No need to deduce command size in scsi_setup_scsi_cmnd() anymore as appropriate checks have been added to scsi_fill_sghdr_rq() function and the cmd_len should never be zero here. The code to do that wasn't correct anyway, as it used uninitialized cmd->cmnd, which caused a null-ptr-deref if the command size was zero as in the trace below. Fix this by removing the unneeded code. KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 0 PID: 1822 Comm: repro Not tainted 5.15.0 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014 Call Trace: blk_mq_dispatch_rq_list+0x7c7/0x12d0 __blk_mq_sched_dispatch_requests+0x244/0x380 blk_mq_sched_dispatch_requests+0xf0/0x160 __blk_mq_run_hw_queue+0xe8/0x160 __blk_mq_delay_run_hw_queue+0x252/0x5d0 blk_mq_run_hw_queue+0x1dd/0x3b0 blk_mq_sched_insert_request+0x1ff/0x3e0 blk_execute_rq_nowait+0x173/0x1e0 blk_execute_rq+0x15c/0x540 sg_io+0x97c/0x1370 scsi_ioctl+0xe16/0x28e0 sd_ioctl+0x134/0x170 blkdev_ioctl+0x362/0x6e0 block_ioctl+0xb0/0xf0 vfs_ioctl+0xa7/0xf0 do_syscall_64+0x3d/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae ---[ end trace 8b086e334adef6d2 ]--- Kernel panic - not syncing: Fatal exception Link: https://lore.kernel.org/r/20211103170659.22151-2-tadeusz.struk@linaro.org Fixes: 2ceda20f0a99 ("scsi: core: Move command size detection out of the fast path") Cc: Bart Van Assche <bvanassche@acm.org> Cc: Christoph Hellwig <hch@lst.de> Cc: James E.J. Bottomley <jejb@linux.ibm.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: <linux-scsi@vger.kernel.org> Cc: <linux-kernel@vger.kernel.org> Cc: <stable@vger.kernel.org> # 5.15, 5.14, 5.10 Reported-by: syzbot+5516b30f5401d4dcbcae@syzkaller.appspotmail.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 5ae17501	29-Oct-2021	Ewan D. Milne <emilne@redhat.com>	scsi: core: Avoid leaving shost->last_reset with stale value if EH does not run The changes to issue the abort from the scmd->abort_work instead of the EH thread introduced a problem if eh_deadline is used. If aborting the command(s) is successful, and there are never any scmds added to the shost->eh_cmd_q, there is no code path which will reset the ->last_reset value back to zero. The effect of this is that after a successful abort with no EH thread activity, a subsequent timeout, perhaps a long time later, might immediately be considered past a user-set eh_deadline time, and the host will be reset with no attempt at recovery. Fix this by resetting ->last_reset back to zero in scmd_eh_abort_handler() if it is determined that the EH thread will not run to do this. Thanks to Gopinath Marappan for investigating this problem. Link: https://lore.kernel.org/r/20211029194311.17504-2-emilne@redhat.com Fixes: e494f6a72839 ("[SCSI] improved eh timeout handler") Cc: stable@vger.kernel.org Signed-off-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 93542fbf	09-Nov-2021	Ming Lei <ming.lei@redhat.com>	scsi: make sure that request queue queiesce and unquiesce balanced For fixing queue quiesce race between driver and block layer(elevator switch, update nr_requests, ...), we need to support concurrent quiesce and unquiesce, which requires the two call balanced. It isn't easy to audit that in all scsi drivers, especially the two may be called from different contexts, so do it in scsi core with one per-device atomic variable to balance quiesce and unquiesce. Reported-by: Yi Zhang <yi.zhang@redhat.com> Fixes: e70feb8b3e68 ("blk-mq: support concurrent queue quiesce/unquiesce") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211109071144.181581-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
# d2b9f12b	09-Nov-2021	Ming Lei <ming.lei@redhat.com>	scsi: avoid to quiesce sdev->request_queue two times For fixing queue quiesce race between driver and block layer(elevator switch, update nr_requests, ...), we need to support concurrent quiesce and unquiesce, which requires the two to be balanced. blk_mq_quiesce_queue() calls blk_mq_quiesce_queue_nowait() for updating quiesce depth and marking the flag, then scsi_internal_device_block() calls blk_mq_quiesce_queue_nowait() two times actually. Fix the double quiesce and keep quiesce and unquiesce balanced. Reported-by: Yi Zhang <yi.zhang@redhat.com> Fixes: e70feb8b3e68 ("blk-mq: support concurrent queue quiesce/unquiesce") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211109071144.181581-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 11b68e36	07-Oct-2021	Bart Van Assche <bvanassche@acm.org>	scsi: core: Call scsi_done directly Conditional statements are faster than indirect calls. Hence call scsi_done() directly. Since this patch removes the last user of the scsi_done member, also remove that data structure member. Link: https://lore.kernel.org/r/20211007204618.2196847-11-bvanassche@acm.org Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# a710eacb	07-Oct-2021	Bart Van Assche <bvanassche@acm.org>	scsi: core: Rename scsi_mq_done() into scsi_done() and export it Since the removal of the legacy block layer there is only one completion function left in the SCSI core, namely scsi_mq_done(). Rename it into scsi_done(). Export that function to allow SCSI LLDs to call it directly. Link: https://lore.kernel.org/r/20211007202923.2174984-3-bvanassche@acm.org Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# bf23e619	07-Oct-2021	Bart Van Assche <bvanassche@acm.org>	scsi: core: Use a structure member to track the SCSI command submitter Conditional statements are faster than indirect calls. Use a structure member to track the SCSI command submitter such that later patches can call scsi_done(scmd) instead of scmd->scsi_done(scmd). The asymmetric behavior that scsi_send_eh_cmnd() sets the submission context to the SCSI error handler and that it does not restore the submission context to the SCSI core is retained. Link: https://lore.kernel.org/r/20211007202923.2174984-2-bvanassche@acm.org Cc: Hannes Reinecke <hare@suse.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# e9076e7f	29-Sep-2021	Bart Van Assche <bvanassche@acm.org>	scsi: core: Fix spelling in a source code comment The typo in this source code comment makes the comment confusing. Clear up the confusion by fixing the typo. Link: https://lore.kernel.org/r/20210929182318.2060489-1-bvanassche@acm.org Fixes: bc85dc500f9d ("scsi: remove scsi_end_request") Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# a7d6840b	20-Aug-2021	Damien Le Moal <damien.lemoal@wdc.com>	scsi: core: Fix scsi_mode_select() buffer length handling The MODE SELECT(6) command allows handling mode page buffers that are up to 255 bytes, including the 4 byte header needed in front of the page buffer. For requests larger than this limit, automatically use the MODE SELECT(10) command. In both cases, since scsi_mode_select() adds the mode select page header, checks on the buffer length value must include this header size to avoid overflows of the command CDB allocation length field. While at it, use put_unaligned_be16() for setting the header block descriptor length and CDB allocation length when using MODE SELECT(10). [mkp: fix MODE SENSE vs. MODE SELECT confusion] Link: https://lore.kernel.org/r/20210820070255.682775-3-damien.lemoal@wdc.com Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 17b49bcb	20-Aug-2021	Damien Le Moal <damien.lemoal@wdc.com>	scsi: core: Fix scsi_mode_sense() buffer length handling Several problems exist with scsi_mode_sense() buffer length handling: 1) The allocation length field of the MODE SENSE(10) command is 16-bits, occupying bytes 7 and 8 of the CDB. With this command, access to mode pages larger than 255 bytes is thus possible. However, the CDB allocation length field is set by assigning len to byte 8 only, thus truncating buffer length larger than 255. 2) If scsi_mode_sense() is called with len smaller than 8 with sdev->use_10_for_ms set, or smaller than 4 otherwise, the buffer length is increased to 8 and 4 respectively, and the buffer is zero filled with these increased values, thus corrupting the memory following the buffer. Fix these 2 problems by using put_unaligned_be16() to set the allocation length field of MODE SENSE(10) CDB and by returning an error when len is too small. Furthermore, if len is larger than 255B, always try MODE SENSE(10) first, even if the device driver did not set sdev->use_10_for_ms. In case of invalid opcode error for MODE SENSE(10), access to mode pages larger than 255 bytes are not retried using MODE SENSE(6). To avoid buffer length overflows for the MODE_SENSE(10) case, check that len is smaller than 65535 bytes. While at it, also fix the folowing: * Use get_unaligned_be16() to retrieve the mode data length and block descriptor length fields of the mode sense reply header instead of using an open coded calculation. * Fix the kdoc dbd argument explanation: the DBD bit stands for Disable Block Descriptor, which is the opposite of what the dbd argument description was. Link: https://lore.kernel.org/r/20210820070255.682775-2-damien.lemoal@wdc.com Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 0bf6d96c	25-Oct-2021	Christoph Hellwig <hch@lst.de>	block: remove blk_{get,put}_request These are now pointless wrappers around blk_mq_{alloc,free}_request, so remove them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20211025070517.1548584-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 4845012e	21-Oct-2021	Christoph Hellwig <hch@lst.de>	block: remove QUEUE_FLAG_SCSI_PASSTHROUGH Export scsi_device_from_queue for use with pktcdvd and use that instead of the otherwise unused QUEUE_FLAG_SCSI_PASSTHROUGH queue flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20211021060607.264371-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 68ec3b81	21-Oct-2021	Christoph Hellwig <hch@lst.de>	scsi: add a scsi_alloc_request helper Add a new helper that calls blk_get_request and initializes the scsi_request to avoid the indirect call through ->.initialize_rq_fn. Note that this makes the pktcdvd driver depend on the SCSI core, but given that only SCSI devices support SCSI passthrough requests that is not a functional change. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20211021060607.264371-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 5a72e899	12-Oct-2021	Jens Axboe <axboe@kernel.dk>	block: add a struct io_comp_batch argument to fops->iopoll() struct io_comp_batch contains a list head and a completion handler, which will allow completions to more effciently completed batches of IO. For now, no functional changes in this patch, we just define the io_comp_batch structure and add the argument to the file_operations iopoll handler. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# fe45e630	20-Sep-2021	Christoph Hellwig <hch@lst.de>	block: move integrity handling out of <linux/blkdev.h> Split the integrity/metadata handling definitions out into a new header. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 4c7b6ea3	13-Aug-2021	John Garry <john.garry@huawei.com>	scsi: core: Remove scsi_cmnd.tag It is never read, so get rid of it. Link: https://lore.kernel.org/r/1628862553-179450-4-git-send-email-john.garry@huawei.com Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 2266a2de	09-Aug-2021	Bart Van Assche <bvanassche@acm.org>	scsi: core: Remove the request member from struct scsi_cmnd Since all scsi_cmnd.request users are gone, remove the request pointer from struct scsi_cmnd. Link: https://lore.kernel.org/r/20210809230355.8186-53-bvanassche@acm.org Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# aa8e25e5	09-Aug-2021	Bart Van Assche <bvanassche@acm.org>	scsi: core: Use scsi_cmd_to_rq() instead of scsi_cmnd.request Prepare for removal of the request pointer by using scsi_cmd_to_rq() instead. Cast away constness where necessary when passing a SCSI command pointer to scsi_cmd_to_rq(). This patch does not change any functionality. Link: https://lore.kernel.org/r/20210809230355.8186-3-bvanassche@acm.org Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 2cece377	24-Jul-2021	Christoph Hellwig <hch@lst.de>	scsi: scsi_ioctl: Remove scsi_req_init() Merge scsi_req_init() into its only caller. Link: https://lore.kernel.org/r/20210724072033.1284840-16-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 422969bb	09-Jul-2021	Bart Van Assche <bvanassche@acm.org>	scsi: core: Fix the documentation of the scsi_execute() time parameter The unit of the scsi_execute() timeout parameter is 1/HZ seconds instead of one second, just like the timeouts used in the block layer. Fix the documentation header above the definition of the scsi_execute() macro. Link: https://lore.kernel.org/r/20210709202638.9480-3-bvanassche@acm.org Fixes: "[SCSI] use scatter lists for all block pc requests and simplify hw handlers" # v2.6.16.28 Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: John Garry <john.garry@huawei.com> Reviewed-by: Avri Altman <avri.altman@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 104739ac	29-Jun-2021	Quat Le <quat.le@oracle.com>	scsi: core: Retry I/O for Notify (Enable Spinup) Required error If the device is power-cycled, it takes time for the initiator to transmit the periodic NOTIFY (ENABLE SPINUP) SAS primitive, and for the device to respond to the primitive to become ACTIVE. Retry the I/O request to allow the device time to become ACTIVE. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210629155826.48441-1-quat.le@oracle.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Quat Le <quat.le@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 59506abe	21-Jun-2021	Bart Van Assche <bvanassche@acm.org>	scsi: core: Inline scsi_mq_alloc_queue() Since scsi_mq_alloc_queue() only has one caller, inline it. This change was suggested by Christoph Hellwig. Link: https://lore.kernel.org/r/20210622024654.12543-1-bvanassche@acm.org Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Ed Tsai <ed.tsai@mediatek.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# da6269da	24-Jun-2021	Christoph Hellwig <hch@lst.de>	block: remove REQ_OP_SCSI_{IN,OUT} With the legacy IDE driver gone drivers now use either REQ_OP_DRV_* or REQ_OP_SCSI_*, so unify the two concepts of passthrough requests into a single one. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 3d45cefc	27-Apr-2021	Hannes Reinecke <hare@suse.de>	scsi: core: Drop obsolete Linux-specific SCSI status codes Originally the SCSI subsystem has been using 'special' SCSI status codes, which were the SAM-specified ones but shifted by 1. As most drivers have now been modified to use the SAM-specified ones, having two nearly identical sets of definitions only causes confusion. The Linux-specifed SCSI status codes have been marked obsolete for several years so drop them and use the SAM-specified status codes throughout. Link: https://lore.kernel.org/r/20210427083046.31620-41-hare@suse.de Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# a7479a84	27-Apr-2021	Hannes Reinecke <hare@suse.de>	scsi: core: Kill message byte Remove last vestiges of SCSI status message bytes. Link: https://lore.kernel.org/r/20210427083046.31620-39-hare@suse.de Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 464a00c9	27-Apr-2021	Hannes Reinecke <hare@suse.de>	scsi: core: Kill DRIVER_SENSE Replace the check for DRIVER_SENSE with a check for scsi_status_is_check_condition(). Audit all callsites to ensure the SAM status is set correctly. For backwards compability move the DRIVER_SENSE definition to sg.h, and update sg, bsg, and scsi_ioctl to set the DRIVER_SENSE driver_status whenever SAM_STAT_CHECK_CONDITION is present. [mkp: fix zeroday srp warning] Link: https://lore.kernel.org/r/20210427083046.31620-10-hare@suse.de Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> fix
# d0672a03	27-Apr-2021	Hannes Reinecke <hare@suse.de>	scsi: core: Introduce scsi_status_is_check_condition() Add a helper function scsi_status_is_check_condition() to encapsulate the frequent checks for SAM_STAT_CHECK_CONDITION. Link: https://lore.kernel.org/r/20210427083046.31620-9-hare@suse.de Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# f2b1e9c6	27-Apr-2021	Hannes Reinecke <hare@suse.de>	scsi: core: Introduce scsi_build_sense() Introduce scsi_build_sense() as a wrapper around scsi_build_sense_buffer() to format the buffer and set the correct SCSI status. Link: https://lore.kernel.org/r/20210427083046.31620-8-hare@suse.de Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# ced202f7	27-Apr-2021	Hannes Reinecke <hare@suse.de>	scsi: core: Stop using DRIVER_ERROR Return the actual error code in __scsi_execute() (which, according to the documentation, should have happened anyway). And audit all callers to cope with negative return values from __scsi_execute() and friends. [mkp: resolve conflict and return bool] Link: https://lore.kernel.org/r/20210427083046.31620-7-hare@suse.de Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 64aaa3dd	27-Apr-2021	Hannes Reinecke <hare@suse.de>	scsi: core: Reshuffle response handling in scsi_mode_sense() Reshuffle response handling in scsi_mode_sense() to make the code easier to follow. [mkp: fix build] Link: https://lore.kernel.org/r/20210427083046.31620-5-hare@suse.de Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 8793613d	27-Apr-2021	Hannes Reinecke <hare@suse.de>	scsi: core: Fixup calling convention for scsi_mode_sense() The description for scsi_mode_sense() claims to return the number of valid bytes on success, which is not what the code does. Additionally there is no gain in returning the SCSI status, as everything the callers do is to check against scsi_result_is_good(), which is what scsi_mode_sense() does already. So change the calling convention to return a standard error code on failure, and 0 on success, and adapt the description and all callers. Link: https://lore.kernel.org/r/20210427083046.31620-4-hare@suse.de Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# b8e162f9	15-Apr-2021	Bart Van Assche <bvanassche@acm.org>	scsi: core: Introduce enum scsi_disposition Improve readability of the code in the SCSI core by introducing an enumeration type for the values used internally that decide how to continue processing a SCSI command. The eh_*_handler return values have not been changed because that would involve modifying all SCSI drivers. The output of the following command has been inspected to verify that no out-of-range values are assigned to a variable of type enum scsi_disposition: KCFLAGS=-Wassign-enum make CC=clang W=1 drivers/scsi/ Link: https://lore.kernel.org/r/20210415220826.29438-6-bvanassche@acm.org Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Daniel Wagner <dwagner@suse.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 0d2810cd	15-Apr-2021	Bart Van Assche <bvanassche@acm.org>	scsi: core: Rename scsi_softirq_done() into scsi_complete() Commit 320ae51feed5 ("blk-mq: new multi-queue block IO queueing mechanism"; v3.13) introduced a code path that calls the blk-mq completion function from interrupt context. scsi-mq was introduced by commit d285203cf647 ("scsi: add support for a blk-mq based I/O path."; v3.17). Since the introduction of scsi-mq, scsi_softirq_done() can be called from interrupt context. That made the name of the function misleading, rename it to scsi_complete(). Link: https://lore.kernel.org/r/20210415220826.29438-4-bvanassche@acm.org Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Daniel Wagner <dwagner@suse.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 76fc0df9	15-Apr-2021	Bart Van Assche <bvanassche@acm.org>	scsi: core: Make the scsi_alloc_sgtables() documentation more accurate The current scsi_alloc_sgtables() documentation does not accurately explain what this function does. Hence improve the documentation of this function. Link: https://lore.kernel.org/r/20210415220826.29438-2-bvanassche@acm.org Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Daniel Wagner <dwagner@suse.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# aaff5eba	31-Mar-2021	Christoph Hellwig <hch@lst.de>	scsi: remove the unchecked_isa_dma flag Remove the unchecked_isa_dma now that all users are gone. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20210331073001.46776-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 4309ea74	15-Feb-2021	Kashyap Desai <kashyap.desai@broadcom.com>	scsi: core: Set shost as hctx driver_data hctx->driver_data is not set for SCSI currently. Set hctx->driver_data = shost. Link: https://lore.kernel.org/r/20210215074048.19424-6-kashyap.desai@broadcom.com Suggested-by: John Garry <john.garry@huawei.com> Reviewed-by: John Garry <john.garry@huawei.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# af183095	15-Feb-2021	Kashyap Desai <kashyap.desai@broadcom.com>	scsi: core: Add mq_poll support to SCSI layer Currently IOPOLL support is only available in block layer. This patch adds mq_poll support to the SCSI layer. Link: https://lore.kernel.org/r/20210215074048.19424-2-kashyap.desai@broadcom.com Cc: sumit.saxena@broadcom.com Cc: chandrakanth.patil@broadcom.com Cc: linux-block@vger.kernel.org Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 020b0f0a	21-Jan-2021	Ming Lei <ming.lei@redhat.com>	scsi: core: Replace sdev->device_busy with sbitmap SCSI currently uses an atomic variable to track queue depth for each attached device. The queue depth depends on many factors such as transport type and device implementation. In addition, the SCSI device queue depth is not a static entity but changes over time as a result of congestion management. While blk-mq currently tracks queue depth for each hctx, it can't easily be changed to accommodate the SCSI per-device requirement. The current approach of using an atomic variable doesn't scale well when there are lots of CPU cores and the disk is very fast. IOPS can be substantially impacted by the atomic in the hot path. Replace the atomic variable sdev->device_busy with an sbitmap for tracking the SCSI device queue depth. It has been observed that IOPS is improved ~30% by this patchset in the following test: 1) test machine(32 logical CPU cores) Thread(s) per core: 2 Core(s) per socket: 8 Socket(s): 2 NUMA node(s): 2 Model name: Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz 2) setup scsi_debug: modprobe scsi_debug virtual_gb=128 max_luns=1 submit_queues=32 delay=0 max_queue=256 3) fio script: fio --rw=randread --size=128G --direct=1 --ioengine=libaio --iodepth=2048 \ --numjobs=32 --bs=4k --group_reporting=1 --group_reporting=1 --runtime=60 \ --loops=10000 --name=job1 --filename=/dev/sdN [mkp: fix device_busy reference in mpt3sas] Link: https://lore.kernel.org/r/20210122023317.687987-14-ming.lei@redhat.com Link: https://lore.kernel.org/linux-block/20200119071432.18558-6-ming.lei@redhat.com/ Cc: Omar Sandoval <osandov@fb.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com> Cc: Ewan D. Milne <emilne@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 8278807a	21-Jan-2021	Ming Lei <ming.lei@redhat.com>	scsi: core: Add scsi_device_busy() wrapper Add scsi_device_busy() helper to prepare drivers for tracking device queue depth via sbitmap_queue. Link: https://lore.kernel.org/r/20210122023317.687987-12-ming.lei@redhat.com Cc: Omar Sandoval <osandov@fb.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com> Cc: Ewan D. Milne <emilne@redhat.com> Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 2a5a24aa	21-Jan-2021	Ming Lei <ming.lei@redhat.com>	scsi: blk-mq: Return budget token from .get_budget callback SCSI uses a global atomic variable to track queue depth for each LUN/request queue. This doesn't scale well when there are lots of CPU cores and the disk is very fast. It has been observed that IOPS is affected a lot by tracking queue depth via sdev->device_busy in the I/O path. Return budget token from .get_budget callback. The budget token can be passed to driver so that we can replace the atomic variable with sbitmap_queue and alleviate the scaling problems that way. Link: https://lore.kernel.org/r/20210122023317.687987-9-ming.lei@redhat.com Cc: Omar Sandoval <osandov@fb.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com> Cc: Ewan D. Milne <emilne@redhat.com> Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# d022d18c	21-Jan-2021	Ming Lei <ming.lei@redhat.com>	scsi: blk-mq: Add callbacks for storing & retrieving budget token Since SCSI is the only driver which requires dispatch budget move the token from struct request to struct scsi_cmnd. Link: https://lore.kernel.org/r/20210122023317.687987-8-ming.lei@redhat.com Cc: Omar Sandoval <osandov@fb.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com> Cc: Ewan D. Milne <emilne@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 962c8dcd	06-Jan-2021	Muneendra Kumar <muneendra.kumar@broadcom.com>	scsi: core: Add a new error code DID_TRANSPORT_MARGINAL in scsi.h Add code in scsi_result_to_blk_status to translate a new error DID_TRANSPORT_MARGINAL to the corresponding blk_status_t i.e BLK_STS_TRANSPORT. Add DID_TRANSPORT_MARGINAL case to scsi_decide_disposition(). Link: https://lore.kernel.org/r/1609969748-17684-2-git-send-email-muneendra.kumar@broadcom.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 684da762	24-Jan-2021	Guoqing Jiang <guoqing.jiang@cloud.ionos.com>	block: remove unnecessary argument from blk_execute_rq We can remove 'q' from blk_execute_rq as well after the previous change in blk_execute_rq_nowait. And more importantly it never really was needed to start with given that we can trivial derive it from struct request. Cc: linux-scsi@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Cc: linux-ide@vger.kernel.org Cc: linux-mmc@vger.kernel.org Cc: linux-nvme@lists.infradead.org Cc: linux-nfs@vger.kernel.org Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# e6044f71	08-Dec-2020	Bart Van Assche <bvanassche@acm.org>	scsi: core: Only process PM requests if rpm_status != RPM_ACTIVE Instead of submitting all SCSI commands submitted with scsi_execute() to a SCSI device if rpm_status != RPM_ACTIVE, only submit RQF_PM (power management requests) if rpm_status != RPM_ACTIVE. This patch makes the SCSI core handle the runtime power management status (rpm_status) as it should be handled. Link: https://lore.kernel.org/r/20201209052951.16136-7-bvanassche@acm.org Cc: Can Guo <cang@codeaurora.org> Cc: Stanley Chu <stanley.chu@mediatek.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Ming Lei <ming.lei@redhat.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Martin Kepplinger <martin.kepplinger@puri.sm> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Can Guo <cang@codeaurora.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 673235f9	02-Dec-2020	Ming Lei <ming.lei@redhat.com>	scsi: core: Fix race between handling STS_RESOURCE and completion When queuing I/O request to LLD, STS_RESOURCE may be returned because: - Host is in recovery or blocked - Target queue throttling or target is blocked - LLD rejection In these scenarios BLK_STS_DEV_RESOURCE is returned to the block layer to avoid an unnecessary re-run of the queue. However, all of the requests queued to this SCSI device may complete immediately after reading 'sdev->device_busy' and BLK_STS_DEV_RESOURCE is returned to block layer. In that case the current I/O won't get a chance to get queued since it is invisible at that time for both scsi_run_queue_async() and blk-mq's RESTART. Fix the issue by not returning BLK_STS_DEV_RESOURCE in this situation. Link: https://lore.kernel.org/r/20201202100419.525144-1-ming.lei@redhat.com Fixes: 86ff7c2a80cd ("blk-mq: introduce BLK_STS_DEV_RESOURCE") Cc: Hannes Reinecke <hare@suse.com> Cc: Sumit Saxena <sumit.saxena@broadcom.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Ewan Milne <emilne@redhat.com> Cc: Long Li <longli@microsoft.com> Reported-by: John Garry <john.garry@huawei.com> Tested-by: "chenxiang (M)" <chenxiang66@hisilicon.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 16d6317e	29-Oct-2020	Martin Wilck <mwilck@suse.com>	scsi: core: Replace while-loop by for-loop in scsi_vpd_lun_id() This makes the code slightly more readable. Link: https://lore.kernel.org/r/20201029170846.14786-2-mwilck@suse.com Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 2e4209b3	29-Oct-2020	Martin Wilck <mwilck@suse.com>	scsi: core: Fix VPD LUN ID designator priorities The current implementation of scsi_vpd_lun_id() uses the designator length as an implicit measure of priority. This works most of the time, but not always. For example, some Hitachi storage arrays return this in VPD 0x83: VPD INQUIRY: Device Identification page Designation descriptor number 1, descriptor length: 24 designator_type: T10 vendor identification, code_set: ASCII associated with the Addressed logical unit vendor id: HITACHI vendor specific: 5030C3502025 Designation descriptor number 2, descriptor length: 6 designator_type: vendor specific [0x0], code_set: Binary associated with the Target port vendor specific: 08 03 Designation descriptor number 3, descriptor length: 20 designator_type: NAA, code_set: Binary associated with the Addressed logical unit NAA 6, IEEE Company_id: 0x60e8 Vendor Specific Identifier: 0x7c35000 Vendor Specific Identifier Extension: 0x30c35000002025 [0x60060e8007c350000030c35000002025] The current code would use the first descriptor because it's longer than the NAA descriptor. But this is wrong, the kernel is supposed to prefer NAA descriptors over T10 vendor ID. Designator length should only be used to compare designators of the same type. This patch addresses the issue by separating designator priority and length. Link: https://lore.kernel.org/r/20201029170846.14786-1-mwilck@suse.com Fixes: 9983bed3907c ("scsi: Add scsi_vpd_lun_id()") Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 0d882320	30-Sep-2020	Hannes Reinecke <hare@suse.de>	scsi: core: Return BLK_STS_AGAIN for ALUA transitioning Whenever we encounter a sense code of ALUA transitioning in scsi_io_completion() it means that the SCSI midlayer ran out of retries trying to wait for ALUA transitioning. In these cases we should be passing up the error, but signalling that the I/O might be retried, preferably on another path. So return BLK_STS_AGAIN in these cases. [mkp: typo + fallthrough] Link: https://lore.kernel.org/r/20200930080256.90964-5-hare@suse.de Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 268940b8	30-Sep-2020	Hannes Reinecke <hare@suse.de>	scsi: scsi_dh_alua: Return BLK_STS_AGAIN for ALUA transitioning state When the ALUA state indicates transitioning we should not retry the command immediately, but rather complete the command with BLK_STS_AGAIN to signal the completion handler that it might be retried. This allows multipathing to redirect the command to another path if possible, and avoid stalls during lengthy transitioning times. Link: https://lore.kernel.org/r/20200930080256.90964-3-hare@suse.de Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# ae6b4e69	23-Oct-2020	Mauro Carvalho Chehab <mchehab+huawei@kernel.org>	scsi: doc: Fix some kernel-doc markups Some identifiers have different names between their prototypes and the kernel-doc markup. [mkp: fix whitespace] Link: https://lore.kernel.org/r/8ed7f149f25a363eea76e514c253c4e337c59379.1603469755.git.mchehab+huawei@kernel.org Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# d8f53b0a	24-Sep-2020	Damien Le Moal <damien.lemoal@wdc.com>	scsi: handle zone resources errors ZBC or ZAC disks that have a limit on the number of open zones may fail a zone open command or a write to a zone that is not already implicitly or explicitly open if the total number of open zones is already at the maximum allowed. For these operations, instead of returning the generic BLK_STS_IOERR, return BLK_STS_ZONE_OPEN_RESOURCE which is returned as -ETOOMANYREFS to the I/O issuer, allowing the device user to act appropriately on these relatively benign zone resource errors. Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# b6ba9b0e	08-Oct-2020	Christoph Hellwig <hch@lst.de>	scsi: core: Set sc_data_direction to DMA_NONE for no-transfer commands No having the special DMA_NONE logic makes libata rather unhappy. Link: https://lore.kernel.org/r/20201008200611.1818099-3-hch@lst.de Fixes: 40b93836a136 ("scsi: core: Use rq_dma_dir in scsi_setup_cmnd()") Reported-by: Qian Cai <cai@redhat.com> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# ed7fb2d0	05-Oct-2020	Christoph Hellwig <hch@lst.de>	scsi: core: Only start the request just before dispatching This has no change in behavior, but improves the accounting a bit. Link: https://lore.kernel.org/r/20201005084130.143273-11-hch@lst.de Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 74e5e6c1	05-Oct-2020	Christoph Hellwig <hch@lst.de>	scsi: core: Remove scsi_setup_cmnd() and scsi_setup_fs_cmnd() Move this trivial functionality into scsi_prepare_cmd() instead of splitting it over multiple small functions, and update the comments to better document passthrough commands as the special case. Link: https://lore.kernel.org/r/20201005084130.143273-10-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 7007e9dd	05-Oct-2020	Christoph Hellwig <hch@lst.de>	scsi: core: Clean up allocation and freeing of sgtables Rename scsi_init_io() to scsi_alloc_sgtables(), and ensure callers call scsi_free_sgtables() to cleanup failures close to scsi_init_io() instead of leaking it down the generic I/O submission path. Link: https://lore.kernel.org/r/20201005084130.143273-9-hch@lst.de Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 5843cc3d	05-Oct-2020	Christoph Hellwig <hch@lst.de>	scsi: core: Rename scsi_mq_prep_fn() to scsi_prepare_cmd() The old name is rather confusing now that the the legacy prep_fn is gone. Link: https://lore.kernel.org/r/20201005084130.143273-8-hch@lst.de Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 822bd2db	05-Oct-2020	Christoph Hellwig <hch@lst.de>	scsi: core: Rename scsi_prep_state_check() to scsi_device_state_check() The old name is rather confusing now that the the legacy prep_fn is gone. Link: https://lore.kernel.org/r/20201005084130.143273-7-hch@lst.de Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 40b93836	05-Oct-2020	Christoph Hellwig <hch@lst.de>	scsi: core: Use rq_dma_dir in scsi_setup_cmnd() Link: https://lore.kernel.org/r/20201005084130.143273-6-hch@lst.de Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 2ceda20f	05-Oct-2020	Christoph Hellwig <hch@lst.de>	scsi: core: Move command size detection out of the fast path We only need to detect the command size for ioctl request from userspace, which is limited to the passthrough path. Move the check there instead of doing it for all queuecommand invocations. Link: https://lore.kernel.org/r/20201005084130.143273-4-hch@lst.de Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 3a8dc5bb	05-Oct-2020	Christoph Hellwig <hch@lst.de>	scsi: core: Remove scsi_init_cmd_errh There is no good reason to keep this functionality as a separate function, just merge it into the only caller. Link: https://lore.kernel.org/r/20201005084130.143273-3-hch@lst.de Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 2ba87c43	05-Oct-2020	Christoph Hellwig <hch@lst.de>	scsi: core: Don't export scsi_device_from_queue() This function is only used by code built into scsi_mod.ko. Link: https://lore.kernel.org/r/20201005084130.143273-2-hch@lst.de Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# bdb01301	19-Aug-2020	Hannes Reinecke <hare@suse.com>	scsi: Add host and host template flag 'host_tagset' Add Host and host template flag 'host_tagset' so hostwide tagset can be shared on multiple reply queues after the SCSI device's reply queue is converted to blk-mq hw queue. [jpg: Update comment on .can_queue and add Scsi_Host.host_tagset] Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: John Garry <john.garry@huawei.com> Tested-by: Don Brace<don.brace@microsemi.com> #SCSI resv cmds patches used Tested-by: Douglas Gilbert <dgilbert@interlog.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 2a242d59	01-Oct-2020	Mike Christie <michael.christie@oracle.com>	scsi: core: Add limitless cmd retry support Add infinite retry support to SCSI midlayer by combining common checks for retries into some helper functions, and then checking for the -1/SCSI_CMD_RETRIES_NO_LIMIT. Link: https://lore.kernel.org/r/1601566554-26752-2-git-send-email-michael.christie@oracle.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# ed5dd6a6	10-Sep-2020	Ming Lei <ming.lei@redhat.com>	scsi: core: Only re-run queue in scsi_end_request() if device queue is busy The request queue is currently run unconditionally in scsi_end_request() if both target queue and host queue are ready. Recently Long Li reported that cost of a queue run can be very heavy in case of high queue depth. Improve this situation by only running the request queue when this LUN is busy. Link: https://lore.kernel.org/r/20200910075056.36509-1-ming.lei@redhat.com Reported-by: Long Li <longli@microsoft.com> Tested-by: Long Li <longli@microsoft.com> Tested-by: Kashyap Desai <kashyap.desai@broadcom.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# df561f66	23-Aug-2020	Gustavo A. R. Silva <gustavoars@kernel.org>	treewide: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
# f30785db	17-Jul-2020	Ye Bin <yebin10@huawei.com>	scsi: core: Add missing scsi_device_put() in scsi_host_block() The scsi_host_block() case was missing in commit 4dea170f4fb2 ("scsi: core: Fix incorrect usage of shost_for_each_device"). Link: https://lore.kernel.org/r/20200717090921.29243-1-yebin10@huawei.com Fixes: 2bb955840c1d ("scsi: core: add scsi_host_(block,unblock) helper function") Fixes: 4dea170f4fb2 ("scsi: core: Fix incorrect usage of shost_for_each_device") Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 3f0dcfbc	19-Jul-2020	Ming Lei <ming.lei@redhat.com>	scsi: core: Run queue in case of I/O resource contention failure I/O requests may be held in scheduler queue because of resource contention. The starvation scenario was handled properly in the regular completion path but we failed to account for it during I/O submission. This lead to the hang captured below. Make sure we run the queue when resource contention is encountered in the submission path. [ 39.054963] scsi 13:0:0:0: rejecting I/O to dead device [ 39.058700] scsi 13:0:0:0: rejecting I/O to dead device [ 39.087855] sd 13:0:0:1: [sdd] Synchronizing SCSI cache [ 39.088909] scsi 13:0:0:1: rejecting I/O to dead device [ 39.095351] scsi 13:0:0:1: rejecting I/O to dead device [ 39.096962] scsi 13:0:0:1: rejecting I/O to dead device [ 247.021859] INFO: task scsi-stress-rem:813 blocked for more than 122 seconds. [ 247.023258] Not tainted 5.8.0-rc2 #8 [ 247.024069] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 247.025331] scsi-stress-rem D 0 813 802 0x00004000 [ 247.025334] Call Trace: [ 247.025354] __schedule+0x504/0x55f [ 247.027987] schedule+0x72/0xa8 [ 247.027991] blk_mq_freeze_queue_wait+0x63/0x8c [ 247.027994] ? do_wait_intr_irq+0x7a/0x7a [ 247.027996] blk_cleanup_queue+0x4b/0xc9 [ 247.028000] __scsi_remove_device+0xf6/0x14e [ 247.028002] scsi_remove_device+0x21/0x2b [ 247.029037] sdev_store_delete+0x58/0x7c [ 247.029041] kernfs_fop_write+0x10d/0x14f [ 247.031281] vfs_write+0xa2/0xdf [ 247.032670] ksys_write+0x6b/0xb3 [ 247.032673] do_syscall_64+0x56/0x82 [ 247.034053] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 247.034059] RIP: 0033:0x7f69f39e9008 [ 247.036330] Code: Bad RIP value. [ 247.036331] RSP: 002b:00007ffdd8116498 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 247.037613] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f69f39e9008 [ 247.039714] RDX: 0000000000000002 RSI: 000055cde92a0ab0 RDI: 0000000000000001 [ 247.039715] RBP: 000055cde92a0ab0 R08: 000000000000000a R09: 00007f69f3a79e80 [ 247.039716] R10: 000000000000000a R11: 0000000000000246 R12: 00007f69f3abb780 [ 247.039717] R13: 0000000000000002 R14: 00007f69f3ab6740 R15: 0000000000000002 Link: https://lore.kernel.org/r/20200720025435.812030-1-ming.lei@redhat.com Cc: linux-block@vger.kernel.org Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 65c76369	30-Jun-2020	Ming Lei <ming.lei@redhat.com>	blk-mq: pass request queue into get/put budget callback blk-mq budget is abstract from scsi's device queue depth, and it is always per-request-queue instead of hctx. It can be quite absurd to get a budget from one hctx, then dequeue a request from scheduler queue, and this request may not belong to this hctx, at least for bfq and deadline. So fix the mess and always pass request queue to get/put budget callback. Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Baolin Wang <baolin.wang7@gmail.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Baolin Wang <baolin.wang7@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Douglas Anderson <dianders@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 15f73f5b	11-Jun-2020	Christoph Hellwig <hch@lst.de>	blk-mq: move failure injection out of blk_mq_complete_request Move the call to blk_should_fake_timeout out of blk_mq_complete_request and into the drivers, skipping call sites that are obvious error handlers, and remove the now superflous blk_mq_force_complete_rq helper. This ensures we don't keep injecting errors into completions that just terminate the Linux request after the hardware has been reset or the command has been aborted. Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 4c7b4d63	19-Jun-2020	Bean Huo <beanhuo@micron.com>	scsi: core: Fix formatting errors in scsi_lib.c Delete trailing whitespace, multiple blank lines, and make switch/case be at the same indentation. Link: https://lore.kernel.org/r/20200619154117.10262-3-huobean@gmail.com Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 71df6fb9	19-Jun-2020	Bean Huo <beanhuo@micron.com>	scsi: core: Remove scsi_sdb_cache After commit f664a3cc17b7 ("scsi: kill off the legacy IO path"), scsi_sdb_cache is not used anymore. Remove it. Link: https://lore.kernel.org/r/20200619154117.10262-2-huobean@gmail.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 840e1b55	18-May-2020	Ye Bin <yebin10@huawei.com>	scsi: core: Refactor scsi_mq_setup_tags function shost->tag_set is used too many times, introduce temporary parameter tag_set instead of &shost->tag_set. Link: https://lore.kernel.org/r/20200518074732.39679-1-yebin10@huawei.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 4dea170f	18-May-2020	Ye Bin <yebin10@huawei.com>	scsi: core: Fix incorrect usage of shost_for_each_device shost_for_each_device(sdev, shost) \ for ((sdev) = __scsi_iterate_devices((shost), NULL); \ (sdev); \ (sdev) = __scsi_iterate_devices((shost), (sdev))) When terminating shost_for_each_device() iteration with break or return, scsi_device_put() should be used to prevent stale scsi device references from being left behind. Link: https://lore.kernel.org/r/20200518074420.39275-1-yebin10@huawei.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 0512a75b	12-May-2020	Keith Busch <kbusch@kernel.org>	block: Introduce REQ_OP_ZONE_APPEND Define REQ_OP_ZONE_APPEND to append-write sectors to a zone of a zoned block device. This is a no-merge write operation. A zone append write BIO must: * Target a zoned block device * Have a sector position indicating the start sector of the target zone * The target zone must be a sequential write zone * The BIO must not cross a zone boundary * The BIO size must not be split to ensure that a single range of LBAs is written with a single command. Implement these checks in generic_make_request_checks() using the helper function blk_check_zone_append(). To avoid write append BIO splitting, introduce the new max_zone_append_sectors queue limit attribute and ensure that a BIO size is always lower than this limit. Export this new limit through sysfs and check these limits in bio_full(). Also when a LLDD can't dispatch a request to a specific zone, it will return BLK_STS_ZONE_RESOURCE indicating this request needs to be delayed, e.g. because the zone it will be dispatched to is still write-locked. If this happens set the request aside in a local list to continue trying dispatching requests such as READ requests or a WRITE/ZONE_APPEND requests targetting other zones. This way we can still keep a high queue depth without starving other requests even if one request can't be served due to zone write-locking. Finally, make sure that the bio sector position indicates the actual write position as indicated by the device on completion. Signed-off-by: Keith Busch <kbusch@kernel.org> [ jth: added zone-append specific add_page and merge_page helpers ] Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 20a66f2b	28-Apr-2020	Johannes Thumshirn <johannes.thumshirn@wdc.com>	scsi: core: free sgtables in case command setup fails In case scsi_setup_fs_cmnd() fails we're not freeing the sgtables allocated by scsi_init_io(), thus we leak the allocated memory. Free the sgtables allocated by scsi_init_io() in case scsi_setup_fs_cmnd() fails. Technically scsi_setup_scsi_cmnd() does not suffer from this problem as it can only fail if scsi_init_io() fails, so it does not have sgtables allocated. But to maintain symmetry and as a measure of defensive programming, free the sgtables on scsi_setup_scsi_cmnd() failure as well. scsi_mq_free_sgtables() has safeguards against double-freeing of memory so this is safe to do. While we're at it, rename scsi_mq_free_sgtables() to scsi_free_sgtables(). Link: https://bugzilla.kernel.org/show_bug.cgi?id=205595 Link: https://lore.kernel.org/r/20200428104605.8143-2-johannes.thumshirn@wdc.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# ea941016	18-Apr-2020	André Almeida <andrealmeid@collabora.com>	scsi: core: doc: Change function comments to kernel-doc style Despite of functions being documented, they are not in the kernel-doc specification, and could not be included in kernel documentation. Change the style of functions comments to be compliant to the kernel-doc style. When the function comments are outdated, update then. [mkp: a few edits] Link: https://lore.kernel.org/r/20200419050148.33371-1-andrealmeid@collabora.com Signed-off-by: André Almeida <andrealmeid@collabora.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# f983622a	22-Apr-2020	Ming Lei <ming.lei@redhat.com>	scsi: core: Avoid calling synchronize_rcu() for each device in scsi_host_block() scsi_host_block() calls scsi_internal_device_block() for each scsi_device and scsi_internal_device_block() calls blk_mq_quiesce_queue() for each LUN. Since synchronize_rcu() is called from blk_mq_quiesce_queue(), this can cause substantial slowdowns on systems with many LUNs. Use scsi_internal_device_block_nowait() to implement scsi_host_block() so it is sufficient to run synchronize_rcu() once. This is safe since SCSI does not set the BLK_MQ_F_BLOCKING flag. [mkp: commit desc and comment tweaks] Link: https://lore.kernel.org/r/20200423020713.332743-1-ming.lei@redhat.com Cc: Steffen Maier <maier@linux.ibm.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Dexuan Cui <decui@microsoft.com> Cc: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# bdf8710d	14-Apr-2020	Christoph Hellwig <hch@lst.de>	block: move dma_pad handling from blk_rq_map_sg into the callers There are only two callers of blk_rq_map_sg/__blk_rq_map_sg that set the dma_pad value in the queue. Move the handling into those callers instead of burdening the common code, and move the ->extra_len field from struct request to struct scsi_cmnd. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# cc97923a	14-Apr-2020	Christoph Hellwig <hch@lst.de>	block: move dma drain handling to scsi Don't burden the common block code with with specifics of the libata DMA draining mechanism. Instead move most of the code to the scsi midlayer. That also means the nr_phys_segments adjustments in the blk-mq fast path can go away entirely, given that SCSI never looks at nr_phys_segments after mapping the request to a scatterlist. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 0475bd6c	14-Apr-2020	Christoph Hellwig <hch@lst.de>	scsi: merge scsi_init_sgtable into scsi_init_io scsi_init_io is the only caller of scsi_init_sgtable. Merge the two function to make upcoming changes a little easier. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 6cbb7aed	17-Apr-2020	Dexuan Cui <decui@microsoft.com>	scsi: core: Allow the state change from SDEV_QUIESCE to SDEV_BLOCK The APIs scsi_host_block()/scsi_host_unblock() were recently added by commit 2bb955840c1d ("scsi: core: add scsi_host_(block,unblock) helper function") and so far the APIs are only used by: commit 3d3ca53b1639 ("scsi: aacraid: use scsi_host_(block,unblock) to block I/O"). However, from reading the code, I think the APIs don't really work for aacraid, because, in the resume path of hibernation, when aac_suspend() -> scsi_host_block() is called, scsi_device_quiesce() has set the state to SDEV_QUIESCE, so aac_suspend() -> scsi_host_block() returns -EINVAL. Fix the issue by allowing the state change. Link: https://lore.kernel.org/r/1587170445-50013-1-git-send-email-decui@microsoft.com Fixes: 2bb955840c1d ("scsi: core: add scsi_host_(block,unblock) helper function") Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# b4fd63f4	20-Apr-2020	Douglas Anderson <dianders@chromium.org>	Revert "scsi: core: run queue if SCSI device queue isn't ready and queue is idle" This reverts commit 7e70aa789d4a0c89dbfbd2c8a974a4df717475ec. Now that we have the patches ("blk-mq: In blk_mq_dispatch_rq_list() "no budget" is a reason to kick") and ("blk-mq: Rerun dispatching in the case of budget contention") we should no longer need the fix in the SCSI code. Revert it, resolving conflicts with other patches that have touched this code. With this revert (and the two new patches) I can run the script that was in commit 7e70aa789d4a ("scsi: core: run queue if SCSI device queue isn't ready and queue is idle") in a loop with no failure. If I do this revert without the two new patches I can easily get a failure. Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# b0962c53	11-Mar-2020	Ewan D. Milne <emilne@redhat.com>	scsi: core: avoid repetitive logging of device offline messages Large queues of I/O to offline devices that are eventually submitted when devices are unblocked result in a many repeated "rejecting I/O to offline device" messages. These messages can fill up the dmesg buffer in crash dumps so no useful prior messages remain. In addition, if a serial console is used, the flood of messages can cause a hard lockup in the console code. Introduce a flag indicating the message has already been logged for the device, and reset the flag when scsi_device_set_state() changes the device state. Link: https://lore.kernel.org/r/20200311143930.20674-1-emilne@redhat.com Reviewed-by: Bart van Assche <bvanassche@acm.org> Signed-off-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 65ca846a	22-Jan-2020	Bart Van Assche <bvanassche@acm.org>	scsi: core: Introduce {init,exit}_cmd_priv() The current behavior of the SCSI core is to clear driver-private data before preparing a request for submission to the SCSI LLD. Make it possible for SCSI LLDs to disable clearing of driver-private data. These hooks will be used by a later patch, namely "scsi: ufs: Let the SCSI core allocate per-command UFS data". Link: https://lore.kernel.org/r/20200123035637.21848-2-bvanassche@acm.org Cc: Tomas Winkler <tomas.winkler@intel.com> Cc: Stanley Chu <stanley.chu@mediatek.com> Cc: Bean Huo <beanhuo@micron.com> Cc: Avri Altman <avri.altman@wdc.com> Cc: Can Guo <cang@codeaurora.org> Cc: Ming Lei <ming.lei@redhat.com> Cc: Johannes Thumshirn <jth@kernel.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# c5a97076	28-Feb-2020	Hannes Reinecke <hare@suse.de>	scsi: core: Remove cmd_list functionality Remove cmd_list functionality; no users left. With that the scsi_put_command() becomes empty, so remove that one, too. Link: https://lore.kernel.org/r/20200228075318.91255-14-hare@suse.de Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart van Assche <bvanassche@acm.org> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 2bb95584	28-Feb-2020	Hannes Reinecke <hare@suse.de>	scsi: core: add scsi_host_(block,unblock) helper function Add helper functions to call scsi_internal_device_block()/ scsi_internal_device_unblock() for all attached devices on a SCSI host. Link: https://lore.kernel.org/r/20200228075318.91255-9-hare@suse.de Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 0ec96913	04-Dec-2019	Can Guo <cang@codeaurora.org>	scsi: core: Adjust DBD setting in MODE SENSE for caching mode page per LLD UFS JEDEC standards require DBD field to be set to 1 in mode sense command. This patch allows LLD to define the setting of DBD, if required. Link: https://lore.kernel.org/r/0101016ed3d643f9-ffd45d6c-c593-4a13-a18f-a32da3d3bb97-000000@us-west-2.amazonses.com Reviewed-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 9393c8de	04-Nov-2019	Michael Schmitz <schmitzmic@gmail.com>	scsi: core: Handle drivers which set sg_tablesize to zero In scsi_mq_setup_tags(), cmd_size is calculated based on zero size for the scatter-gather list in case the low level driver uses SG_NONE in its host template. cmd_size is passed on to the block layer for calculation of the request size, and we've seen NULL pointer dereference errors from the block layer in drivers where SG_NONE is used and a mq IO scheduler is active, apparently as a consequence of this (see commit 68ab2d76e4be ("scsi: cxlflash: Set sg_tablesize to 1 instead of SG_NONE"), and a recent patch by Finn Thain converting the three m68k NFR5380 drivers to avoid setting SG_NONE). Try to avoid these errors by accounting for at least one sg list entry when calculating cmd_size, regardless of whether the low level driver set a zero sg_tablesize. Tested on 030 m68k with the atari_scsi driver - setting sg_tablesize to SG_NONE no longer results in a crash when loading this driver. CC: Finn Thain <fthain@telegraphics.com.au> Link: https://lore.kernel.org/r/1572922150-4358-1-git-send-email-schmitzmic@gmail.com Signed-off-by: Michael Schmitz <schmitzmic@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 6eb045e0	25-Oct-2019	Ming Lei <ming.lei@redhat.com>	scsi: core: avoid host-wide host_busy counter for scsi_mq It isn't necessary to check the host depth in scsi_queue_rq() any more since it has been respected by blk-mq before calling scsi_queue_rq() via getting driver tag. Lots of LUNs may attach to same host and per-host IOPS may reach millions, so we should avoid expensive atomic operations on the host-wide counter in the IO path. This patch implements scsi_host_busy() via blk_mq_tagset_busy_iter() with one scsi command state for reading the count of busy IOs for scsi_mq. It is observed that IOPS is increased by 15% in IO test on scsi_debug (32 LUNs, 32 submit queues, 1024 can_queue, libaio/dio) in a dual-socket system. Cc: Jens Axboe <axboe@kernel.dk> Cc: Ewan D. Milne <emilne@redhat.com> Cc: Omar Sandoval <osandov@fb.com>, Cc: "Martin K. Petersen" <martin.petersen@oracle.com>, Cc: James Bottomley <james.bottomley@hansenpartnership.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Laurence Oberman <loberman@redhat.com> Cc: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20191025065855.6309-1-ming.lei@redhat.com Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 6b6fa7a5	07-Aug-2019	Steffen Maier <maier@linux.ibm.com>	scsi: core: fix dh and multipathing for SCSI hosts without request batching This was missing from scsi_device_from_queue() due to the introduction of another new scsi_mq_ops_no_commit of linux-next commit 8930a6c20791 ("scsi: core: add support for request batching") from Martin's scsi/5.4/scsi-queue or James' scsi/misc. Only devicehandler code seems to call scsi_device_from_queue(): *** drivers/scsi/scsi_dh.c: scsi_dh_activate[255] sdev = scsi_device_from_queue(q); scsi_dh_set_params[302] sdev = scsi_device_from_queue(q); scsi_dh_attach[325] sdev = scsi_device_from_queue(q); scsi_dh_attached_handler_name[363] sdev = scsi_device_from_queue(q); Fixes multipath tools follow-on errors: $ multipath -v6 ... libdevmapper: ioctl/libdm-iface.c(1887): device-mapper: reload ioctl on mpatha failed: No such device ... mpatha: failed to load map, error 19 ... showing also as kernel messages: device-mapper: table: 252:0: multipath: error attaching hardware handler device-mapper: ioctl: error adding target to table Signed-off-by: Steffen Maier <maier@linux.ibm.com> Fixes: 8930a6c20791 ("scsi: core: add support for request batching") Cc: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 82a9ac71	07-Aug-2019	Steffen Maier <maier@linux.ibm.com>	scsi: core: fix missing .cleanup_rq for SCSI hosts without request batching This was missing from scsi_mq_ops_no_commit of linux-next commit 8930a6c20791 ("scsi: core: add support for request batching") from Martin's scsi/5.4/scsi-queue or James' scsi/misc. See also linux-next commit b7e9e1fb7a92 ("scsi: implement .cleanup_rq callback") from block/for-next. Signed-off-by: Steffen Maier <maier@linux.ibm.com> Fixes: 8930a6c20791 ("scsi: core: add support for request batching") Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Ming Lei <ming.lei@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 94ef80a5	01-Aug-2019	Bart Van Assche <bvanassche@acm.org>	scsi: core: Complain if scsi_target_block() fails If scsi_target_block() fails that can break the code that calls this function. Hence complain loudly if scsi_target_block() fails. Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 09addb1d	01-Aug-2019	Bart Van Assche <bvanassche@acm.org>	scsi: core: Make scsi_internal_device_unblock_nowait() reject invalid new_state The only 'new_state' values passed by upstream kernel code to scsi_internal_device_unblock_nowait() are SDEV_RUNNING and SDEV_TRANSPORT_OFFLINE. These are the only values that should be passed to this function. Hence check the value of the 'new_state' argument to avoid that scsi_internal_device_unblock_nowait() would be used to trigger an illegal SCSI device state transition. In this context 'illegal' means not allowed by scsi_device_set_state(). Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# b7e9e1fb	24-Jul-2019	Ming Lei <ming.lei@redhat.com>	scsi: implement .cleanup_rq callback Implement .cleanup_rq() callback for freeing driver private part of the request. Then we can avoid to leak this part if the request isn't completed by SCSI, and freed by blk-mq or upper layer(such as dm-rq) finally. Cc: Ewan D. Milne <emilne@redhat.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: <stable@vger.kernel.org> Fixes: 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback") Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 1b5d9a6e	22-Jul-2019	Christoph Hellwig <hch@lst.de>	scsi: core: fix the dma_max_mapping_size call We should only call dma_max_mapping_size for devices that have a DMA mask set, otherwise we can run into a NULL pointer dereference that will crash the system. Also we need to do right shift to get the sectors from the size in bytes, not a left shift. Fixes: bdd17bdef7d8 ("scsi: core: take the DMA max mapping size into account") Reported-by: Bart Van Assche <bvanassche@acm.org> Reported-by: Ming Lei <tom.leiming@gmail.com> Tested-by: Guilherme G. Piccoli <gpiccoli@canonical.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 8930a6c2	30-May-2019	Paolo Bonzini <pbonzini@redhat.com>	scsi: core: add support for request batching This allows a list of requests to be issued, with the LLD only writing the hardware doorbell when necessary, after the last request was prepared. This is more efficient if we have lists of requests to issue, particularly on virtualized hardware, where writing the doorbell is more expensive than on real hardware. The use case for this is plugged IO, where blk-mq flushes a batch of requests all at once. The API is the same as for blk-mq, just with blk-mq concepts tweaked to fit the SCSI subsystem API: the "last" flag in blk_mq_queue_data becomes a flag in scsi_cmnd, while the queue_num in the commit_rqs callback is extracted from the hctx and passed as a parameter. The only complication is that blk-mq uses different plugging heuristics depending on whether commit_rqs is present or not. So we have two different sets of blk_mq_ops and pick one depending on whether the scsi_host template uses commit_rqs or not. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# bdd17bde	17-Jun-2019	Christoph Hellwig <hch@lst.de>	scsi: core: take the DMA max mapping size into account We need to limit the device's max_sectors to what the DMA mapping implementation can support. If not, we risk running out of swiotlb buffers easily. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 7ad388d8	17-Jun-2019	Christoph Hellwig <hch@lst.de>	scsi: core: add a host / host template field for the virt boundary This allows drivers setting it up easily instead of branching out to block layer calls in slave_alloc, and ensures the upgraded max_segment_size setting gets picked up by the DMA layer. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Kashyap Desai < kashyap.desai@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# f9b0530f	11-Jul-2019	Ming Lei <ming.lei@redhat.com>	scsi: core: Fix race on creating sense cache When scsi_init_sense_cache(host) is called concurrently from different hosts, each code path may find that no cache has been created and allocate a new one. The lack of locking can lead to potentially overriding a cache allocated by a different host. Fix the issue by moving 'mutex_lock(&scsi_sense_cache_mutex)' before scsi_select_sense_cache(). Fixes: 0a6ac4ee7c21 ("scsi: respect unchecked_isa_dma for blk-mq") Cc: Stable <stable@vger.kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 463cdad8	02-Jul-2019	Maurizio Lombardi <mlombard@redhat.com>	scsi: core: use scmd_printk() to print which command timed out With a possibly faulty disk the following messages may appear in the logs: kernel: sd 0:0:9:0: timing out command, waited 180s kernel: sd 0:0:9:0: timing out command, waited 20s kernel: sd 0:0:9:0: timing out command, waited 20s kernel: sd 0:0:9:0: timing out command, waited 60s kernel: sd 0:0:9:0: timing out command, waited 20s This is not very informative because it's not possible to identify the command that timed out. This patch replaces sdev_printk() with scmd_printk(). Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# bbe9fb0d	17-Jun-2019	Bart Van Assche <bvanassche@acm.org>	scsi: Avoid that .queuecommand() gets called for a blocked SCSI device Several SCSI transport and LLD drivers surround code that does not tolerate concurrent calls of .queuecommand() with scsi_target_block() / scsi_target_unblock(). These last two functions use blk_mq_quiesce_queue() / blk_mq_unquiesce_queue() for scsi-mq request queues to prevent concurrent .queuecommand() calls. However, that is not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd(). Hence surround the .queuecommand() call from the SCSI error handler with code that avoids that .queuecommand() gets called in the blocked state. Note: converting the .queuecommand() call in scsi_send_eh_cmnd() into code that calls blk_get_request() + blk_execute_rq() is not an option since scsi_send_eh_cmnd() must be able to make forward progress even if all requests have been allocated. Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 3e99b3b1	06-Jun-2019	Ming Lei <ming.lei@redhat.com>	scsi: core: don't preallocate small SGL in case of NO_SG_CHAIN The preallocated small SGL depends on SG_CHAIN so if the ARCH doesn't support SG_CHAIN, preallocation of small SGL can't work at all. Fix this issue by not using small preallocation in case of NO_SG_CHAIN. Cc: Christoph Hellwig <hch@lst.de> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Ewan D. Milne <emilne@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Guenter Roeck <linux@roeck-us.net> Reported-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 3dccdf53	28-Apr-2019	Ming Lei <ming.lei@redhat.com>	scsi: core: avoid preallocating big SGL for data scsi_mq_setup_tags() preallocates a big buffer for the IO SGL. The size is based on scsi_mq_sgl_size() which is determined based on shost->sg_tablesize and SG_CHUNK_SIZE. Modern DMA engines are often capable of dealing with very big segments so the resulting scsi_mq_sgl_size() is often too big. SG_CHUNK_SIZE results in a static 4KB SGL allocation per command. If an HBA has lots of deep queues, preallocation for the sg list can consume substantial amounts of memory. For lpfc, nr_hw_queues can be 70 and each queue's depth 3781. This means the resulting preallocation for the data SGL is 7037812K = 517MB. Switch to runtime allocation for SGL for lists longer than 2 entries. This is the approach used by NVMe PCI so it should be reasonable for SCSI as well. Runtime SGL allocation has always been the case for the legacy I/O path so this is nothing new. [mkp: attempted to clarify commit desc] Cc: Christoph Hellwig <hch@lst.de> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Ewan D. Milne <emilne@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 92524fa1	28-Apr-2019	Ming Lei <ming.lei@redhat.com>	scsi: core: avoid preallocating big SGL for protection information scsi_mq_setup_tags() currently preallocates a big buffer for protection SGL entries. scsi_mq_sgl_size() is used to determine the size for both data and protection information scatterlists but the protection buffer is usually much smaller. For example, one 512-byte sector needs 8 bytes of protection information. Given that the maximum number of sectors for one request is 2560 (BLK_DEF_MAX_SECTORS) sectors, the max protection information buffer size is just 20K. The protection information segment count generally matches the number of bios in the request. As a result, the typical actual number of segments won't be very big. And should the need arise, allocating a bigger SGL from slab is fast enough. Pre-allocate only one SGL entry for protection information and switch to runtime allocation in case that the protection information segment number is bigger than 1. This reduces memory tied up by static command allocations. For example, 500+ MB is saved on single lpfc HBA. [mkp: attempted to clarify commit desc] Cc: Christoph Hellwig <hch@lst.de> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Ewan D. Milne <emilne@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 4635873c	28-Apr-2019	Ming Lei <ming.lei@redhat.com>	scsi: lib/sg_pool.c: improve APIs for allocating sg pool sg_alloc_table_chained() currently allows the caller to provide one preallocated SGL and returns if the requested number isn't bigger than size of that SGL. This is used to inline an SGL for an IO request. However, scattergather code only allows that size of the 1st preallocated SGL to be SG_CHUNK_SIZE(128). This means a substantial amount of memory (4KB) is claimed for the SGL for each IO request. If the I/O is small, it would be prudent to allocate a smaller SGL. Introduce an extra parameter to sg_alloc_table_chained() and sg_free_table_chained() for specifying size of the preallocated SGL. Both __sg_free_table() and __sg_alloc_table() assume that each SGL has the same size except for the last one. Change the code to allow both functions to accept a variable size for the 1st preallocated SGL. [mkp: attempted to clarify commit desc] Cc: Christoph Hellwig <hch@lst.de> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Ewan D. Milne <emilne@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: netdev@vger.kernel.org Cc: linux-nvme@lists.infradead.org Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 026104bf	30-Apr-2019	Christoph Hellwig <hch@lst.de>	scsi: core: add SPDX tags to scsi midlayer files missing licensing information Add the default kernel GPLv2 annotation to SCSI midlayer files missing any licensing information. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 457c8996	19-May-2019	Thomas Gleixner <tglx@linutronix.de>	treewide: Add SPDX license identifier for missed files Add SPDX license identifiers to all files which: - Have no license information of any form - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
# be549d49	09-Apr-2019	Jaesoo Lee <jalee@purestorage.com>	scsi: core: set result when the command cannot be dispatched When SCSI blk-mq is enabled, there is a bug in handling errors in scsi_queue_rq. Specifically, the bug is not setting result field of scsi_request correctly when the dispatch of the command has been failed. Since the upper layer code including the sg_io ioctl expects to receive any error status from result field of scsi_request, the error is silently ignored and this could cause data corruptions for some applications. Fixes: d285203cf647 ("scsi: add support for a blk-mq based I/O path.") Cc: <stable@vger.kernel.org> Signed-off-by: Jaesoo Lee <jalee@purestorage.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 18c4f0a4	11-Apr-2019	Ming Lei <ming.lei@redhat.com>	scsi: core: don't hold device refcount in IO path scsi_device's refcount is always grabbed in IO path. Turns out it isn't necessary, because blk_queue_cleanup() will drain any in-flight IOs, then cancel timeout/requeue work, and SCSI's requeue_work is canceled too in __scsi_remove_device(). Also scsi_device won't go away until blk_cleanup_queue() is done. So don't hold the refcount in IO path, especially the refcount isn't required in IO path since blk_queue_enter() / blk_queue_exit() is introduced in the legacy block layer. Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen <martin.petersen@oracle.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>, Cc: jianchao wang <jianchao.w.wang@oracle.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 99bbf484	11-Mar-2019	Dongli Zhang <dongli.zhang@oracle.com>	scsi: core: Use HCTX_TYPE_DEFAULT for blk_mq_tag_set->map Use HCTX_TYPE_DEFAULT instead of 0 to avoid hardcoding. Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 17605afa	15-Mar-2019	Bart Van Assche <bvanassche@acm.org>	scsi: core: Avoid that a kernel warning appears during system resume Since scsi_device_quiesce() skips SCSI devices that have another state than RUNNING, OFFLINE or TRANSPORT_OFFLINE, scsi_device_resume() should not complain about SCSI devices that have been skipped. Hence this patch. This patch avoids that the following warning appears during resume: WARNING: CPU: 3 PID: 1039 at blk_clear_pm_only+0x2a/0x30 CPU: 3 PID: 1039 Comm: kworker/u8:49 Not tainted 5.0.0+ #1 Hardware name: LENOVO 4180F42/4180F42, BIOS 83ET75WW (1.45 ) 05/10/2013 Workqueue: events_unbound async_run_entry_fn RIP: 0010:blk_clear_pm_only+0x2a/0x30 Call Trace: ? scsi_device_resume+0x28/0x50 ? scsi_dev_type_resume+0x2b/0x80 ? async_run_entry_fn+0x2c/0xd0 ? process_one_work+0x1f0/0x3f0 ? worker_thread+0x28/0x3c0 ? process_one_work+0x3f0/0x3f0 ? kthread+0x10c/0x130 ? __kthread_create_on_node+0x150/0x150 ? ret_from_fork+0x1f/0x30 Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Oleksandr Natalenko <oleksandr@natalenko.name> Cc: Martin Steigerwald <martin@lichtvoll.de> Cc: <stable@vger.kernel.org> Reported-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Tested-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Fixes: 3a0a529971ec ("block, scsi: Make SCSI quiesce and resume work reliably") # v4.15 Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# db983f6e	18-Mar-2019	Bart Van Assche <bvanassche@acm.org>	scsi: core: Also call destroy_rcu_head() for passthrough requests cmd->rcu is initialized by scsi_initialize_rq(). For passthrough requests, blk_get_request() calls scsi_initialize_rq(). For filesystem requests, scsi_init_command() calls scsi_initialize_rq(). Make sure that destroy_rcu_head() is called for passthrough requests. Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Ewan D. Milne <emilne@redhat.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Reported-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# b9cef509	26-Feb-2019	Hannes Reinecke <hare@suse.com>	scsi: kill command serial number No users left, kill it. Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 388b4e6a	26-Feb-2019	Bart Van Assche <bvanassche@acm.org>	scsi: core: Avoid that system resume triggers a kernel warning scsi_device_quiesce() and scsi_device_resume() are called during system-wide suspend and resume. scsi_device_quiesce() only succeeds for SCSI devices that are in one of the RUNNING, OFFLINE or TRANSPORT_OFFLINE states (see also scsi_set_device_state()). This patch avoids that the following warning is triggered when resuming a system for which quiescing a SCSI device failed: WARNING: CPU: 2 PID: 11303 at drivers/scsi/scsi_lib.c:2600 scsi_device_resume+0x4f/0x58 CPU: 2 PID: 11303 Comm: kworker/u8:70 Not tainted 5.0.0-rc1+ #50 Hardware name: LENOVO 80E3/Lancer 5B2, BIOS A2CN45WW(V2.13) 08/04/2016 Workqueue: events_unbound async_run_entry_fn Call Trace: scsi_dev_type_resume+0x2e/0x60 async_run_entry_fn+0x32/0xd8 process_one_work+0x1f4/0x420 worker_thread+0x28/0x3c0 kthread+0x118/0x130 ret_from_fork+0x22/0x40 Cc: Przemek Socha <soprwa@gmail.com> Reported-by: Przemek Socha <soprwa@gmail.com> Fixes: 3a0a529971ec ("block, scsi: Make SCSI quiesce and resume work reliably") # v4.15 Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 4a067cf8	14-Feb-2019	Martin Wilck <mwilck@suse.com>	scsi: core: reset host byte in DID_NEXUS_FAILURE case Up to 4.12, __scsi_error_from_host_byte() would reset the host byte to DID_OK for various cases including DID_NEXUS_FAILURE. Commit 2a842acab109 ("block: introduce new block status code type") replaced this function with scsi_result_to_blk_status() and removed the host-byte resetting code for the DID_NEXUS_FAILURE case. As the line set_host_byte(cmd, DID_OK) was preserved for the other cases, I suppose this was an editing mistake. The fact that the host byte remains set after 4.13 is causing problems with the sg_persist tool, which now returns success rather then exit status 24 when a RESERVATION CONFLICT error is encountered. Fixes: 2a842acab109 "block: introduce new block status code type" Signed-off-by: Martin Wilck <mwilck@suse.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 56d18f62	15-Feb-2019	Ming Lei <ming.lei@redhat.com>	block: kill BLK_MQ_F_SG_MERGE QUEUE_FLAG_NO_SG_MERGE has been killed, so kill BLK_MQ_F_SG_MERGE too. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 9fa505ad	08-Feb-2019	Bart Van Assche <bvanassche@acm.org>	scsi: core: Move resid from scsi_data_buffer to scsi_cmnd This patch does not change any functionality but reduces the size of struct scsi_cmnd. Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# b9f91992	08-Nov-2018	Christoph Hellwig <hch@lst.de>	scsi: stop setting up request->special No more need in a blk-mq world where the scsi command and request are allocated together. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# ae3d56d8	29-Jan-2019	Christoph Hellwig <hch@lst.de>	scsi: remove bidirectional command support No real need for bidi support once the OSD code is gone. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# cd464d83	15-Jan-2019	Bart Van Assche <bvanassche@acm.org>	scsi: core: Remove an atomic instruction from the hot path From scsi_init_command(), a function called by scsi_mq_prep_fn(): /* zero out the cmd, except for the embedded scsi_request / memset((char )cmd + sizeof(cmd->req), 0, sizeof(*cmd) - sizeof(cmd->req) + dev->host->hostt->cmd_size); In other words, scsi_mq_prep_fn() clears scsi_cmnd.flags. Hence move the clear_bit() call into the else branch, the only branch in which this code is necessary. See also commit f1342709d18a ("scsi: Do not rely on blk-mq for double completions"). Cc: Keith Busch <keith.busch@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# a8cf59a6	16-Jan-2019	Christoph Hellwig <hch@lst.de>	scsi: communicate max segment size to the DMA mapping code When a host driver sets a maximum segment size we should not only propagate that setting to the block layer, which can merge segments, but also to the DMA mapping layer which can merge segments as well. Fixes: 50c2e9107f ("scsi: introduce a max_segment_size host_template parameters") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 4af14d11	13-Dec-2018	Christoph Hellwig <hch@lst.de>	scsi: remove the use_clustering flag The same effects can be achieved by setting the dma_boundary to PAGE_SIZE - 1 and the max_segment_size to PAGE_SIZE, so shift those settings into the drivers. Note that in many cases the setting might be bogus, but this keeps the status quo. [mkp: fix myrs and myrb] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 50c2e910	13-Dec-2018	Christoph Hellwig <hch@lst.de>	scsi: introduce a max_segment_size host_template parameters This allows the host driver to indicate the maximum supported segment size in a nice an easy way, so that the driver doesn't have to worry about DMA-layer imposed limitations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 2a3d4eb8	13-Dec-2018	Christoph Hellwig <hch@lst.de>	scsi: flip the default on use_clustering Most SCSI drivers want to enable "clustering", that is merging of segments so that they might span more than a single page. Remove the ENABLE_CLUSTERING define, and require drivers to explicitly set DISABLE_CLUSTERING to disable this feature. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# f1342709	26-Nov-2018	Keith Busch <kbusch@kernel.org>	scsi: Do not rely on blk-mq for double completions The scsi timeout error handling had been directly updating the block layer's request state to prevent a error handling and a natural completion from completing the same request twice. Fix this layering violation by having scsi control the fate of its commands with scsi owned flags rather than use blk-mq's. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 8dc765d4	14-Nov-2018	Ming Lei <ming.lei@redhat.com>	SCSI: fix queue cleanup race before queue initialization is done c2856ae2f315d ("blk-mq: quiesce queue before freeing queue") has already fixed this race, however the implied synchronize_rcu() in blk_mq_quiesce_queue() can slow down LUN probe a lot, so caused performance regression. Then 1311326cf4755c7 ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()") tried to quiesce queue for avoiding unnecessary synchronize_rcu() only when queue initialization is done, because it is usual to see lots of inexistent LUNs which need to be probed. However, turns out it isn't safe to quiesce queue only when queue initialization is done. Because when one SCSI command is completed, the user of sending command can be waken up immediately, then the scsi device may be removed, meantime the run queue in scsi_end_request() is still in-progress, so kernel panic can be caused. In Red Hat QE lab, there are several reports about this kind of kernel panic triggered during kernel booting. This patch tries to address the issue by grabing one queue usage counter during freeing one request and the following run queue. Fixes: 1311326cf4755c7 ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()") Cc: Andrew Jones <drjones@redhat.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com> Cc: stable <stable@vger.kernel.org> Cc: jianchao.wang <jianchao.w.wang@oracle.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 4c1cb67c	09-Nov-2018	Christoph Hellwig <hch@lst.de>	scsi: return blk_status_t from device handler ->prep_fn Remove the last use of the old BLKPREP_* values, which get converted to BLK_STS_* later anyway. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 159b2cbf	09-Nov-2018	Christoph Hellwig <hch@lst.de>	scsi: return blk_status_t from scsi_init_io and ->init_command Replace the old BLKPREP_* values with the BLK_STS_ ones that they are converted to later anyway. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 14784565	09-Nov-2018	Christoph Hellwig <hch@lst.de>	scsi: clean up error handling in scsi_init_io There is no need to call scsi_mq_free_sgtables until we have actually allocated sgtables. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 785ba83b	09-Nov-2018	Christoph Hellwig <hch@lst.de>	scsi: push blk_status_t up into scsi_setup_{fs,scsi}_cmnd This just moves the prep_to_mq calls up in preparation of further removal of BLKPREP_* usage. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# c092d4ec	09-Nov-2018	Christoph Hellwig <hch@lst.de>	scsi: simplify scsi_prep_state_check Return a blk_status_t directly, and make the code a little more compact by handling the fast path in the caller. Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# ed76e329	29-Oct-2018	Jens Axboe <axboe@kernel.dk>	blk-mq: abstract out queue map This is in preparation for allowing multiple sets of maps per queue, if so desired. Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# f664a3cc	01-Nov-2018	Jens Axboe <axboe@kernel.dk>	scsi: kill off the legacy IO path This removes the legacy (non-mq) IO path for SCSI. Cc: linux-scsi@vger.kernel.org Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Tested-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 3a7ea2c4	29-Oct-2018	Jens Axboe <axboe@kernel.dk>	scsi: provide mq_ops->busy() hook Only the SCSI legacy path provides a way to check if target is currently busy, provide the same for the MQ path. Cc: linux-scsi@vger.kernel.org Reviewed-by: Hannes Reinecke <hare@suse.com> Tested-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# a33e5bfb	07-Oct-2018	Hannes Reinecke <hare@suse.com>	scsi: core: Allow state transitions from OFFLINE to BLOCKED When an RSCN gets delayed (or not being sent at all), the transport class will detect an error, EH kicks in, and eventually will be setting the device to offline. If we receive an RSCN after that, the device will stay in 'offline'. This patch allows for an 'offline' to 'blocked' transition, thereby allowing the device to become active again. Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# cd84a62e	26-Sep-2018	Bart Van Assche <bvanassche@acm.org>	block, scsi: Change the preempt-only flag into a counter The RQF_PREEMPT flag is used for three purposes: - In the SCSI core, for making sure that power management requests are executed even if a device is in the "quiesced" state. - For domain validation by SCSI drivers that use the parallel port. - In the IDE driver, for IDE preempt requests. Rename "preempt-only" into "pm-only" because the primary purpose of this mode is power management. Since the power management core may but does not have to resume a runtime suspended device before performing system-wide suspend and since a later patch will set "pm-only" mode as long as a block device is runtime suspended, make it possible to set "pm-only" mode from more than one context. Since with this change scsi_device_quiesce() is no longer idempotent, make that function return early if it is called for a quiesced queue. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Cc: Jianchao Wang <jianchao.w.wang@oracle.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 6f1d8a53	05-Sep-2018	Igor Stoppa <igor.stoppa@gmail.com>	scsi: core: remove unnecessary unlikely() BUG_ON() already contains an unlikely(), there is no need for another one. Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: linux-scsi@vger.kernel.org CC: linux-kernel@vger.kernel.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# d772a65d	27-Aug-2018	Ming Lei <ming.lei@redhat.com>	Revert "scsi: core: avoid host-wide host_busy counter for scsi_mq" This reverts commit 328728630d9f2bf14b82ca30b5e47489beefe361. There is fundamental issue in commit 328728630d9f2bf1 (scsi: core: avoid host-wide host_busy counter for scsi_mq) because SCSI's host busy counter may not be same with counter of blk-mq's inflight tags, especially in case of none io scheduler. We may switch to other approach for addressing this scsi_mq's performance issue, such as percpu counter or kind of ways, so revert this commit first for fixing this kind of issue in EH path, as reported by Jens. Cc: Omar Sandoval <osandov@fb.com>, Cc: "Martin K. Petersen" <martin.petersen@oracle.com>, Cc: James Bottomley <james.bottomley@hansenpartnership.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: Don Brace <don.brace@microsemi.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Laurence Oberman <loberman@redhat.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Jens Axboe <axboe@kernel.dk> Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 23aa8e69	27-Aug-2018	Ming Lei <ming.lei@redhat.com>	Revert "scsi: core: fix scsi_host_queue_ready" This reverts commit 265d59aacbce7e50bdc1f5d25033c38dd70b3767. There is fundamental issue in commit 328728630d9f2bf1 (scsi: core: avoid host-wide host_busy counter for scsi_mq) because SCSI's host busy counter may not be same with counter of blk-mq's inflight tags, especially in case of none io scheduler. So revert this commit first. Cc: Omar Sandoval <osandov@fb.com>, Cc: "Martin K. Petersen" <martin.petersen@oracle.com>, Cc: James Bottomley <james.bottomley@hansenpartnership.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: Don Brace <don.brace@microsemi.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Laurence Oberman <loberman@redhat.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Jens Axboe <axboe@kernel.dk> Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 51372570	08-Aug-2018	Jianchao Wang <jianchao.w.wang@oracle.com>	scsi: core: use blk_mq_run_hw_queues in scsi_kick_queue We don't use blk-mq start/stop hw queue any more, so no reason to use blk_mq_start_hw_queues which does clear_bit, replace it with blk_mq_run_hw_queues. Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 704f8392	31-Jul-2018	Kees Cook <keescook@chromium.org>	scsi: Check sense buffer size at build time To avoid introducing problems like those fixed in commit f7068114d45e ("sr: pass down correctly sized SCSI sense buffer"), this creates a macro wrapper for scsi_execute() that verifies the size of the sense buffer similar to what was done for command string sizes in commit 3756f6401c30 ("exec: avoid gcc-8 warning for get_task_comm"). Another solution could be to add a length argument to scsi_execute(), but this function already takes a lot of arguments and Jens was not fond of that approach. Additionally, this moves the SCSI_SENSE_BUFFERSIZE definition into scsi_device.h, and removes a redundant include for scsi_device.h from scsi_cmnd.h. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 265d59aa	29-Jun-2018	Ming Lei <ming.lei@redhat.com>	scsi: core: fix scsi_host_queue_ready 328728630d9f ("scsi: avoid to hold host-wide counter of host_busy for scsi_mq") adds one extra check on scsi_host_busy(shost) in scsi_host_queue_ready(), which is wrong and not necessary, can causes booting stall on LSI53c895A. So remove the check. Cc: Omar Sandoval <osandov@fb.com>, Cc: "Martin K. Petersen" <martin.petersen@oracle.com>, Cc: James Bottomley <james.bottomley@hansenpartnership.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: Don Brace <don.brace@microsemi.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Laurence Oberman <loberman@redhat.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Guenter Roeck <linux@roeck-us.net> Reported-by: Guenter Roeck <linux@roeck-us.net> Fixes: 328728630d9f ("scsi: avoid to hold host-wide counter of host_busy for scsi_mq") Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 32872863	24-Jun-2018	Ming Lei <ming.lei@redhat.com>	scsi: core: avoid host-wide host_busy counter for scsi_mq It isn't necessary to check the host depth in scsi_queue_rq() any more since it has been respected by blk-mq before calling scsi_queue_rq() via getting driver tag. Lots of LUNs may attach to same host and per-host IOPS may reach millions, so we should avoid expensive atomic operations on the host-wide counter in the IO path. This patch implements scsi_host_busy() via blk_mq_tagset_busy_iter() for reading the count of busy IOs for scsi_mq. It is observed that IOPS is increased by 15% in IO test on scsi_debug (32 LUNs, 32 submit queues, 1024 can_queue, libaio/dio) in a dual-socket system. [mkp: clarified commit message] Cc: Omar Sandoval <osandov@fb.com>, Cc: "Martin K. Petersen" <martin.petersen@oracle.com>, Cc: James Bottomley <james.bottomley@hansenpartnership.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: Don Brace <don.brace@microsemi.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Laurence Oberman <loberman@redhat.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# c65be1a6	25-Jun-2018	Johannes Thumshirn <jthumshirn@suse.de>	scsi: core: check for equality of result byte values When evaluating a SCSI command's result using the field access macros, check for equality of the fields and not if a specific bit is set. This is a preparation patch, for reworking the results field in the SCSI command. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 8e1695a0	22-Jun-2018	Douglas Gilbert <dgilbert@interlog.com>	scsi: core: scsi_io_completion convert BUGs to WARNs The scsi_io_completion function contains three BUG() and BUG_ON() calls. Replace them with WARN variants. Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 0d437906	22-Jun-2018	Douglas Gilbert <dgilbert@interlog.com>	scsi: core: scsi_io_completion hints on fastpath Add likely() and unlikely() hints to conditionals on or near the fastpath. Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 4ae61c68	22-Jun-2018	Douglas Gilbert <dgilbert@interlog.com>	scsi: core: add scsi_io_completion_reprep helper Since the action "reprep" is called from two places, rather than repeat the code, make a new scsi_io_completion helper with "reprep" as its suffix. Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# da32baea	22-Jun-2018	Douglas Gilbert <dgilbert@interlog.com>	scsi: core: add scsi_io_completion_action helper Place scsi_io_completion()'s complex error processing associated with a local enumeration into a static helper function. That enumeration's values start with "ACTION_" so use the suffix "_action" in the helper function's name. Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# ab831084	22-Jun-2018	Douglas Gilbert <dgilbert@interlog.com>	scsi: core: add scsi_io_completion_nz_result function Break out several intertwined paths when cmd->result is non zero and place them in the scsi_io_completion_nz_result helper function. The logic is not changed. Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 1f7cbb8e	22-Jun-2018	Douglas Gilbert <dgilbert@interlog.com>	scsi: core: scsi_io_completion: rename variables Change and add some variable names, adjust some associated comments for clarity. Correct some misleading comments. Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 7e63b5a4	22-Jun-2018	Douglas Gilbert <dgilbert@interlog.com>	scsi: core: scsi_io_completion: comment on end_request return scsi_end_request() is called multiple times from scsi_io_completion() which branches on its bool returned value. Add comment before the static definition of scsi_end_request() about the meaning of that return. Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# e37c7d9a	18-May-2018	Douglas Gilbert <dgilbert@interlog.com>	scsi: core: sanitize++ in progress Commit 505aa4b6a883 ("scsi: sd: Defer spinning up drive while SANITIZE is in progress") may not be sufficient, especially if the SCSI SANITIZE command is sent via the bsg or sg pass-throughs, since they don't use the sd driver. Add "Sanitize in progress" plus some other recent "... in progress" additional sense codes into the scsi mid-level so they are treated in a similar fashion to "Format in progress". [mkp: checkpatch] Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 0eb0b63c	09-May-2018	Christoph Hellwig <hch@lst.de>	block: consistently use GFP_NOIO instead of __GFP_NORECLAIM Same numerical value (for now at least), but a much better documentation of intent. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# ff005a06	09-May-2018	Christoph Hellwig <hch@lst.de>	block: sanitize blk_get_request calling conventions Switch everyone to blk_get_request_flags, and then rename blk_get_request_flags to blk_get_request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 21e07dba	03-Apr-2018	Christoph Hellwig <hch@lst.de>	scsi: reduce use of block bounce buffers We can rely on the dma-mapping code to handle any DMA limits that is bigger than the ISA DMA mask for us (either using an iommu or swiotlb), so remove setting the block layer bounce limit for anything but the unchecked_isa_dma case, or the bouncing for highmem pages. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk>
# f4abab3f	05-Apr-2018	Bart Van Assche <bvanassche@acm.org>	scsi: core: Make scsi_result_to_blk_status() recognize CONDITION MET Ensure that CONDITION MET and other non-zero status values that indicate success are translated into BLK_STS_OK. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Lee Duncan <lduncan@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# a77b32d8	05-Apr-2018	Bart Van Assche <bvanassche@acm.org>	scsi: core: Rename __scsi_error_from_host_byte() into scsi_result_to_blk_status() Since the next patch will modify this function such that it checks more than just the host byte of the SCSI result, rename __scsi_error_from_host_byte() into scsi_result_to_blk_status(). This patch does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Lee Duncan <lduncan@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# cbe095e2	05-Apr-2018	Bart Van Assche <bvanassche@acm.org>	Revert "scsi: core: return BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()" The description of commit e39a97353e53 is wrong: it mentions that commit 2a842acab109 introduced a bug in __scsi_error_from_host_byte() although that commit did not change the behavior of that function. Additionally, commit e39a97353e53 introduced a bug: it causes commands that fail with hostbyte=DID_OK and driverbyte=DRIVER_SENSE to be completed with BLK_STS_OK. Hence revert that commit. Fixes: e39a97353e53 ("scsi: core: return BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()") Reported-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Lee Duncan <lduncan@suse.com> Cc: stable@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 17cb960f	13-Mar-2018	Christoph Hellwig <hch@lst.de>	bsg: split handling of SCSI CDBs vs transport requeues The current BSG design tries to shoe-horn the transport-specific passthrough commands into the overall framework for SCSI passthrough requests. This has a couple problems: - each passthrough queue has to set the QUEUE_FLAG_SCSI_PASSTHROUGH flag despite not dealing with SCSI commands at all. Because of that these queues could also incorrectly accept SCSI commands from in-kernel users or through the legacy SCSI_IOCTL_SEND_COMMAND ioctl. - the real SCSI bsg queues also incorrectly accept bsg requests of the BSG_SUB_PROTOCOL_SCSI_TRANSPORT type - the bsg transport code is almost unredable because it tries to reuse different SCSI concepts for its own purpose. This patch instead adds a new bsg_ops structure to handle the two cases differently, and thus solves all of the above problems. Another side effect is that the bsg-lib queues also don't need to embedd a struct scsi_request anymore. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 1875ede0	06-Mar-2018	Douglas Gilbert <dgilbert@interlog.com>	scsi: core: Make SCSI Status CONDITION MET equivalent to GOOD The SCSI PRE-FETCH (10 or 16) command is present both on hard disks and some SSDs. It is useful when the address of the next block(s) to be read is known but it is not following the LBA of the current READ (so read-ahead won't help). It returns two "good" SCSI Status values. If the requested blocks have fitted (or will most likely fit (when the IMMED bit is set)) into the disk's cache, it returns CONDITION MET. If it didn't (or will not) fit then it returns GOOD status. The goal of this patch is to stop the SCSI subsystem treating the CONDITION MET SCSI status as an error. The current state makes the PRE-FETCH command effectively unusable via pass-throughs. Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 8b904b5b	07-Mar-2018	Bart Van Assche <bvanassche@acm.org>	block: Use blk_queue_flag_() in drivers instead of queue_flag_() This patch has been generated as follows: for verb in set_unlocked clear_unlocked set clear; do replace-in-files queue_flag_${verb} blk_queue_flag_${verb%_unlocked} \ $(git grep -lw queue_flag_${verb} drivers block/bsg*) done Except for protecting all queue flag changes with the queue lock this patch does not change any functionality. Cc: Mike Snitzer <snitzer@redhat.com> Cc: Shaohua Li <shli@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 2f793a27	02-Mar-2018	Jianchao Wang <jianchao.w.wang@oracle.com>	scsi: core: use blk_mq_requeue_request in __scsi_queue_insert In scsi core, __scsi_queue_insert should just put request back on the queue and retry using the same command as before. However, for blk-mq, scsi_mq_requeue_cmd is employed here which will unprepare the request. To align with the semantics of __scsi_queue_insert, use blk_mq_requeue_request with kick_requeue_list == true and put the reference of scsi_device. Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# e39a9735	26-Feb-2018	Hannes Reinecke <hare@suse.de>	scsi: core: return BLK_STS_OK for DID_OK in __scsi_error_from_host_byte() When converting __scsi_error_from_host_byte() to BLK_STS error codes the case DID_OK was forgotten, resulting in it always returning an error. Fixes: 2a842acab109 ("block: introduce new block status code type") Cc: Doug Gilbert <dgilbert@interlog.com> Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 3be8828f	22-Feb-2018	Bart Van Assche <bvanassche@acm.org>	scsi: core: Avoid that ATA error handling can trigger a kernel hang or oops Avoid that the recently introduced call_rcu() call in the SCSI core triggers a double call_rcu() call. Reported-by: Natanael Copa <ncopa@alpinelinux.org> Reported-by: Damien Le Moal <damien.lemoal@wdc.com> References: https://bugzilla.kernel.org/show_bug.cgi?id=198861 Fixes: 3bd6f43f5cb3 ("scsi: core: Ensure that the SCSI error handler gets woken up") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Tested-by: Damien Le Moal <damien.lemoal@wdc.com> Cc: Natanael Copa <ncopa@alpinelinux.org> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Alexandre Oliva <oliva@gnu.org> Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 5ee0524b	28-Feb-2018	Bart Van Assche <bvanassche@acm.org>	block: Add 'lock' as third argument to blk_alloc_queue_node() This patch does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 9b91fd34	12-Feb-2018	Bart Van Assche <bvanassche@acm.org>	scsi: core: Reduce number of scsi_test_unit_ready() retries Make scsi_test_unit_ready() send at most as many TURs as specified in the 'retries' argument instead of retries * (retries + 1) / 2. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 86ff7c2a	30-Jan-2018	Ming Lei <ming.lei@redhat.com>	blk-mq: introduce BLK_STS_DEV_RESOURCE This status is returned from driver to block layer if device related resource is unavailable, but driver can guarantee that IO dispatch will be triggered in future when the resource is available. Convert some drivers to return BLK_STS_DEV_RESOURCE. Also, if driver returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun queue after a delay (BLK_MQ_DELAY_QUEUE) to avoid IO stalls. BLK_MQ_DELAY_QUEUE is 3 ms because both scsi-mq and nvmefc are using that magic value. If a driver can make sure there is in-flight IO, it is safe to return BLK_STS_DEV_RESOURCE because: 1) If all in-flight IOs complete before examining SCHED_RESTART in blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue is run immediately in this case by blk_mq_dispatch_rq_list(); 2) if there is any in-flight IO after/when examining SCHED_RESTART in blk_mq_dispatch_rq_list(): - if SCHED_RESTART isn't set, queue is run immediately as handled in 1) - otherwise, this request will be dispatched after any in-flight IO is completed via blk_mq_sched_restart() 3) if SCHED_RESTART is set concurently in context because of BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two cases and make sure IO hang can be avoided. One invariant is that queue will be rerun if SCHED_RESTART is set. Suggested-by: Jens Axboe <axboe@kernel.dk> Tested-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 0afe76e8	10-Jun-2017	David Windsor <dave@nullcore.net>	scsi: Define usercopy region in scsi_sense_cache slab cache SCSI sense buffers, stored in struct scsi_cmnd.sense and therefore contained in the scsi_sense_cache slab cache, need to be copied to/from userspace. cache object allocation: drivers/scsi/scsi_lib.c: scsi_select_sense_cache(...): return ... ? scsi_sense_isadma_cache : scsi_sense_cache scsi_alloc_sense_buffer(...): return kmem_cache_alloc_node(scsi_select_sense_cache(), ...); scsi_init_request(...): ... cmd->sense_buffer = scsi_alloc_sense_buffer(...); ... cmd->req.sense = cmd->sense_buffer example usage trace: block/scsi_ioctl.c: (inline from sg_io) blk_complete_sghdr_rq(...): struct scsi_request *req = scsi_req(rq); ... copy_to_user(..., req->sense, len) scsi_cmd_ioctl(...): sg_io(...); In support of usercopy hardening, this patch defines a region in the scsi_sense_cache slab cache in which userspace copy operations are allowed. This region is known as the slab cache's usercopy region. Slab caches can now check that each dynamically sized copy operation involving cache-managed memory falls entirely within the slab's usercopy region. Signed-off-by: David Windsor <dave@nullcore.net> [kees: adjust commit log, provide usage trace] Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: linux-scsi@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
# 08640e81	10-Jan-2018	Bart Van Assche <bvanassche@acm.org>	scsi: core: Change third __scsi_queue_insert() argument from int to bool This patch does not change any functionality but makes the SCSI core source code slightly easier to read. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# e4c9470b	07-Dec-2017	Bart Van Assche <bvanassche@acm.org>	scsi: core: Unexport scsi_initialize_rq() Commit 651a01364994 ("scsi: scsi_transport_sas: switch to bsg-lib for SMP passthrough") removed the only call to scsi_initialize_rq() from outside the SCSI core. Hence unexport scsi_initialize_rq(). Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 3bd6f43f	04-Dec-2017	Bart Van Assche <bvanassche@acm.org>	scsi: core: Ensure that the SCSI error handler gets woken up If scsi_eh_scmd_add() is called concurrently with scsi_host_queue_ready() while shost->host_blocked > 0 then it can happen that neither function wakes up the SCSI error handler. Fix this by making every function that decreases the host_busy counter wake up the error handler if necessary and by protecting the host_failed checks with the SCSI host lock. Reported-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> References: https://marc.info/?l=linux-kernel&m=150461610630736 Fixes: commit 746650160866 ("scsi: convert host_busy to atomic_t") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Tested-by: Stuart Hayes <stuart.w.hayes@gmail.com> Cc: Konstantin Khorenko <khorenko@virtuozzo.com> Cc: Stuart Hayes <stuart.w.hayes@gmail.com> Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 7e70aa78	05-Dec-2017	Ming Lei <ming.lei@redhat.com>	scsi: core: run queue if SCSI device queue isn't ready and queue is idle Before commit 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for blk-mq"), we run queue after 3ms if queue is idle and SCSI device queue isn't ready, which is done in handling BLK_STS_RESOURCE. After commit 0df21c86bdbf is introduced, queue won't be run any more under this situation. IO hang is observed when timeout happened, and this patch fixes the IO hang issue by running queue after delay in scsi_dev_queue_ready, just like non-mq. This issue can be triggered by the following script[1]. There is another issue which can be covered by running idle queue: when .get_budget() is called on request coming from hctx->dispatch_list, if one request just completes during .get_budget(), we can't depend on SCSI's restart to make progress any more. This patch fixes the race too. With this patch, we basically recover to previous behaviour (before commit 0df21c86bdbf) of handling idle queue when running out of resource. [1] script for test/verify SCSI timeout rmmod scsi_debug modprobe scsi_debug max_queue=1 DEVICE=`ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter/host/target//block/* \| head -1 \| xargs basename` DISK_DIR=`ls -d /sys/block/$DEVICE/device/scsi_disk/*` echo "using scsi device $DEVICE" echo "-1" >/sys/bus/pseudo/drivers/scsi_debug/every_nth echo "temporary write through" >$DISK_DIR/cache_type echo "128" >/sys/bus/pseudo/drivers/scsi_debug/opts echo none > /sys/block/$DEVICE/queue/scheduler dd if=/dev/$DEVICE of=/dev/null bs=1M iflag=direct count=1 & sleep 5 echo "0" >/sys/bus/pseudo/drivers/scsi_debug/opts wait echo "SUCCESS" Fixes: 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for blk-mq") Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 90addc6b	21-Nov-2017	Huacai Chen <chenhuacai@kernel.org>	scsi: use dma_get_cache_alignment() as minimum DMA alignment In non-coherent DMA mode, kernel uses cache flushing operations to maintain I/O coherency, so scsi's block queue should be aligned to the value returned by dma_get_cache_alignment(). Otherwise, If a DMA buffer and a kernel structure share a same cache line, and if the kernel structure has dirty data, cache_invalidate (no writeback) will cause data corruption. Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen <chenhc@lemote.com> [hch: rebased and updated the comment and changelog] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 3a0a5299	09-Nov-2017	Bart Van Assche <bvanassche@acm.org>	block, scsi: Make SCSI quiesce and resume work reliably The contexts from which a SCSI device can be quiesced or resumed are: * Writing into /sys/class/scsi_device//device/state. SCSI parallel (SPI) domain validation. * The SCSI device power management methods. See also scsi_bus_pm_ops. It is essential during suspend and resume that neither the filesystem state nor the filesystem metadata in RAM changes. This is why while the hibernation image is being written or restored that SCSI devices are quiesced. The SCSI core quiesces devices through scsi_device_quiesce() and scsi_device_resume(). In the SDEV_QUIESCE state execution of non-preempt requests is deferred. This is realized by returning BLKPREP_DEFER from inside scsi_prep_state_check() for quiesced SCSI devices. Avoid that a full queue prevents power management requests to be submitted by deferring allocation of non-preempt requests for devices in the quiesced state. This patch has been tested by running the following commands and by verifying that after each resume the fio job was still running: for ((i=0; i<10; i++)); do ( cd /sys/block/md0/md && while true; do [ "$(<sync_action)" = "idle" ] && echo check > sync_action sleep 1 done ) & pids=($!) for d in /sys/class/block/sd[a-z]; do bdev=${d#/sys/class/block/} hcil=$(readlink "$d/device") hcil=${hcil#../../../} echo 4 > "$d/queue/nr_requests" echo 1 > "/sys/class/scsi_device/$hcil/device/queue_depth" fio --name="$bdev" --filename="/dev/$bdev" --buffered=0 --bs=512 \ --rw=randread --ioengine=libaio --numjobs=4 --iodepth=16 \ --iodepth_batch=1 --thread --loops=$((2*31)) & pids+=($!) done sleep 1 echo "$(date) Hibernating ..." >>hibernate-test-log.txt systemctl hibernate sleep 10 kill "${pids[@]}" echo idle > /sys/block/md0/md/sync_action wait echo "$(date) Done." >>hibernate-test-log.txt done Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name> References: "I/O hangs after resuming from suspend-to-ram" (https://marc.info/?l=linux-block&m=150340235201348). Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Tested-by: Martin Steigerwald <martin@lichtvoll.de> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 039c635f	09-Nov-2017	Bart Van Assche <bvanassche@acm.org>	ide, scsi: Tell the block layer at request allocation time about preempt requests Convert blk_get_request(q, op, __GFP_RECLAIM) into blk_get_request_flags(q, op, BLK_MQ_PREEMPT). This patch does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Tested-by: Martin Steigerwald <martin@lichtvoll.de> Acked-by: David S. Miller <davem@davemloft.net> [ for IDE ] Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Cc: Ming Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# a817e73f	07-Nov-2017	Linus Torvalds <torvalds@linux-foundation.org>	Revert "scsi: make 'state' device attribute pollable" This reverts commit 8a97712e5314aefe16b3ffb4583a34deaa49de04. This commit added a call to sysfs_notify() from within scsi_device_set_state(), which in turn turns out to make libata very unhappy, because ata_eh_detach_dev() does spin_lock_irqsave(ap->lock, flags); .. if (ata_scsi_offline_dev(dev)) { dev->flags \|= ATA_DFLAG_DETACHED; ap->pflags \|= ATA_PFLAG_SCSI_HOTPLUG; } and ata_scsi_offline_dev() then does that scsi_device_set_state() to set it offline. So now we called sysfs_notify() from within a spinlocked region, which really doesn't work. The 0day robot reported this as: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238 because sysfs_notify() ends up calling kernfs_find_and_get_ns() which then does mutex_lock(&kernfs_mutex).. The pollability of the device state isn't critical, so revert this all for now, and maybe we'll do it differently in the future. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# 88022d72	04-Nov-2017	Ming Lei <ming.lei@redhat.com>	blk-mq: don't handle failure in .get_budget It is enough to just check if we can get the budget via .get_budget(). And we don't need to deal with device state change in .get_budget(). For SCSI, one issue to be fixed is that we have to call scsi_mq_uninit_cmd() to free allocated ressources if SCSI device fails to handle the request. And it isn't enough to simply call blk_mq_end_request() to do that if this request is marked as RQF_DONTPREP. Fixes: 0df21c86bdbf(scsi: implement .get_budget and .put_budget for blk-mq) Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 826a70a0	03-Nov-2017	Ming Lei <ming.lei@redhat.com>	SCSI: don't get target/host busy_count in scsi_mq_get_budget() It is very expensive to atomic_inc/atomic_dec the host wide counter of host->busy_count, and it should have been avoided via blk-mq's mechanism of getting driver tag, which uses the more efficient way of sbitmap queue. Also we don't check atomic_read(&sdev->device_busy) in scsi_mq_get_budget() and don't run queue if the counter becomes zero, so IO hang may be caused if all requests are completed just before the current SCSI device is added to shost->starved_list. Fixes: 0df21c86bdbf(scsi: implement .get_budget and .put_budget for blk-mq) Reported-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 0df21c86	14-Oct-2017	Ming Lei <ming.lei@redhat.com>	scsi: implement .get_budget and .put_budget for blk-mq We need to tell blk-mq to reserve resources before queuing one request, so implement these two callbacks. Then blk-mq can avoid to dequeue request too early, and IO merging can be improved a lot. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# aeec7762	14-Oct-2017	Ming Lei <ming.lei@redhat.com>	scsi: allow passing in null rq to scsi_prep_state_check() In the following patch, we will implement scsi_get_budget() which need to call scsi_prep_state_check() when rq isn't dequeued yet. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 8fe8ffb1	20-Oct-2017	Bart Van Assche <bvanassche@acm.org>	scsi: Suppress a kernel warning in case the prep function returns BLKPREP_DEFER The legacy block layer handles requests as follows: - If the prep function returns BLKPREP_OK, let blk_peek_request() return the pointer to that request. - If the prep function returns BLKPREP_DEFER, keep the RQF_STARTED flag and retry calling the prep function later. - If the prep function returns BLKPREP_KILL or BLKPREP_INVALID, end the request. In none of these cases it is correct to clear the SCMD_INITIALIZED flag from inside scsi_prep_fn(). Since scsi_prep_fn() already guarantees that scsi_init_command() will be called once even if scsi_prep_fn() is called multiple times, remove the code that clears SCMD_INITIALIZED from scsi_prep_fn(). The scsi-mq code handles requests as follows: - If scsi_mq_prep_fn() returns BLKPREP_OK, set the RQF_DONTPREP flag and submit the request to the SCSI LLD. - If scsi_mq_prep_fn() returns BLKPREP_DEFER, call blk_mq_delay_run_hw_queue() and return BLK_STS_RESOURCE. - If the prep function returns BLKPREP_KILL or BLKPREP_INVALID, call scsi_mq_uninit_cmd() and let the blk-mq core end the request. In none of these cases scsi_mq_prep_fn() should clear the SCMD_INITIALIZED flag. Hence remove the code from scsi_mq_prep_fn() function that clears that flag. This patch avoids that the following warning is triggered when using the legacy block layer: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 4198 at drivers/scsi/scsi_lib.c:654 scsi_end_request+0x1de/0x220 CPU: 1 PID: 4198 Comm: mkfs.f2fs Not tainted 4.14.0-rc5+ #1 task: ffff91c147a4b800 task.stack: ffffb282c37b8000 RIP: 0010:scsi_end_request+0x1de/0x220 Call Trace: <IRQ> scsi_io_completion+0x204/0x5e0 scsi_finish_command+0xce/0xe0 scsi_softirq_done+0x126/0x130 blk_done_softirq+0x6e/0x80 __do_softirq+0xcf/0x2a8 irq_exit+0xab/0xb0 do_IRQ+0x7b/0xc0 common_interrupt+0x90/0x90 </IRQ> RIP: 0010:_raw_spin_unlock_irqrestore+0x9/0x10 __test_set_page_writeback+0xc7/0x2c0 __block_write_full_page+0x158/0x3b0 block_write_full_page+0xc4/0xd0 blkdev_writepage+0x13/0x20 __writepage+0x12/0x40 write_cache_pages+0x204/0x500 generic_writepages+0x48/0x70 blkdev_writepages+0x9/0x10 do_writepages+0x34/0xc0 __filemap_fdatawrite_range+0x6c/0x90 file_write_and_wait_range+0x31/0x90 blkdev_fsync+0x16/0x40 vfs_fsync_range+0x44/0xa0 do_fsync+0x38/0x60 SyS_fsync+0xb/0x10 entry_SYSCALL_64_fastpath+0x13/0x94 ---[ end trace 86e8ef85a4a6c1d1 ]--- Fixes: commit 64104f703212 ("scsi: Call scsi_initialize_rq() for filesystem requests") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# cf3431bb	17-Oct-2017	Hannes Reinecke <hare@suse.de>	scsi: scsi_error: Handle power-on reset unit attention As per SAM there is a status precedence, with any sense code 29/XX taking second place just after an ACA ACTIVE status. Additionally, each target might prefer to not queue any unit attention conditions, but just report one. Due to the above, this will be that one with the highest precedence. This results in the sense code 29/XX effectively overwriting any other unit attention. Hence we should report the power-on reset to userland so that it can take appropriate action. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# e98f42bc	10-Oct-2017	Damien Le Moal <damien.lemoal@wdc.com>	scsi: sd_zbc: Fix comments and indentation Fix comments style (use kernel-doc style) and content to clarify some functions. Also fix some functions signature indentation and remove a useless blank line in sd_zbc_read_zones(). No functional change is introduced by this patch. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# a45a1f36	30-Aug-2017	Bart Van Assche <bvanassche@acm.org>	scsi: scsi-mq: Always unprepare before requeuing a request One of the two scsi-mq functions that requeue a request unprepares a request before requeueing (scsi_io_completion()) but the other function not (__scsi_queue_insert()). Make sure that a request is unprepared before requeuing it. Fixes: commit d285203cf647 ("scsi: add support for a blk-mq based I/O path.") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 832889f5	30-Aug-2017	Bart Van Assche <bvanassche@acm.org>	scsi: Improve requeuing behavior Requests are unprepared and reprepared when being requeued. Avoid that requeuing resets .jiffies_at_alloc and .retries by initializing these two member variables from inside scsi_initialize_rq() and by preserving both member variables when preparing a request. This patch affects the requeuing behavior of both the legacy scsi and the scsi-mq code paths. Reported-by: Brian King <brking@linux.vnet.ibm.com> References: https://lkml.org/lkml/2017/8/18/923 ("Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Brian King <brking@linux.vnet.ibm.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 64104f70	30-Aug-2017	Bart Van Assche <bvanassche@acm.org>	scsi: Call scsi_initialize_rq() for filesystem requests If a pass-through request is submitted then blk_get_request() initializes that request by calling scsi_initialize_rq(). Also call this function for filesystem requests. Introduce CMD_INITIALIZED to keep track of whether or not a request has already been initialized. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Brian King <brking@linux.vnet.ibm.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# ccf1e004	29-Aug-2017	Bart Van Assche <bvanassche@acm.org>	scsi: Rework handling of scsi_device.vpd_pg8[03] Introduce struct scsi_vpd for the VPD page length, data and the RCU head that will be used to free the VPD data. Use kfree_rcu() instead of kfree() to free VPD data. Move the VPD buffer pointer check inside the RCU read lock in the sysfs code. Only annotate pointers that are shared across threads with __rcu. Use rcu_dereference() when dereferencing an RCU pointer. This patch suppresses about twenty sparse complaints about the vpd_pg8[03] pointers. This patch also fixes a race condition, namely that updating of the VPD pointers and length variables in struct scsi_device was not atomic with reference to the code reading these variables. See also "Does the update code tolerate concurrent accesses?" in Documentation/RCU/checklist.txt. Fixes: commit 09e2b0b14690 ("scsi: rescan VPD attributes") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Acked-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Shane Seymour <shane.seymour@hpe.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Shane Seymour <shane.seymour@hpe.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 35c0506f	24-Aug-2017	Jonathan Corbet <corbet@lwn.net>	scsi: Fix the kerneldoc for scsi_initialize_rq() The kerneldoc comment for scsi_initialize_rq() neglected to document the "rq" parameter, leading to this docs build warning: ./drivers/scsi/scsi_lib.c:1116: warning: No description found for parameter 'rq' Document the parameter and make the build slightly quieter. [mkp: used wording suggested by Bart] Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 23cb27fd	25-Aug-2017	Hannes Reinecke <hare@suse.de>	scsi: fix comment in scsi_device_set_state() The function returns '0' if successful; with the original comment the function doesn't have a way to indicate success ... Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bart van Assche <bvanassche@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# bed2213d	25-Aug-2017	Bart Van Assche <bvanassche@acm.org>	scsi: Use blk_mq_rq_to_pdu() to convert a request to a SCSI command pointer Since commit e9c787e65c0c ("scsi: allocate scsi_cmnd structures as part of struct request") struct request and struct scsi_cmnd are adjacent. This means that there is now an alternative to reading req->special to convert a pointer to a prepared request into a SCSI command pointer, namely by using blk_mq_rq_to_pdu(). Make this change where appropriate. Although this patch does not change any functionality, it slightly improves performance and slightly improves readability. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# e7008ff5	25-Aug-2017	Bart Van Assche <bvanassche@acm.org>	scsi: Document which queue type a function is intended for Rename several functions to make it easy to see which queue type a function is intended for. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 8a97712e	11-Aug-2017	Hannes Reinecke <hare@suse.de>	scsi: make 'state' device attribute pollable While the 'state' attribute can (and will) change occasionally, calling 'poll()' or 'select()' on it fails as sysfs is never notified that the state has changed. With this patch calling 'poll()' or 'select()' will work properly. Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 8cd1ec78	11-Aug-2017	Hannes Reinecke <hare@suse.de>	scsi: scsi_lib: rework scsi_internal_device_unblock_nowait() Rework scsi_internal_device_unblock_nowait() into using a switch statement. No functional changes. Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# c8d9cf22	20-Jun-2017	Bart Van Assche <bvanassche@acm.org>	block: Change argument type of scsi_req_init() Since scsi_req_init() works on a struct scsi_request, change the argument type into struct scsi_request *. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# ca18d6f7	20-Jun-2017	Bart Van Assche <bvanassche@acm.org>	block: Make most scsi_req_init() calls implicit Instead of explicitly calling scsi_req_init() after blk_get_request(), call that function from inside blk_get_request(). Add an .initialize_rq_fn() callback function to the block drivers that need it. Merge the IDE .init_rq_fn() function into .initialize_rq_fn() because it is too small to keep it as a separate function. Keep the scsi_req_init() call in ide_prep_sense() because it follows a blk_rq_init() call. References: commit 82ed4db499b8 ("block: split scsi_request out of struct request") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# f660174e	06-Jun-2017	Ming Lei <ming.lei@redhat.com>	blk-mq: use the introduced blk_mq_unquiesce_queue() blk_mq_unquiesce_queue() is used for unquiescing the queue explicitly, so replace blk_mq_start_stopped_hw_queues() with it. For the scsi part, this patch takes Bart's suggestion to switch to block quiesce/unquiesce API completely. Cc: linux-nvme@lists.infradead.org Cc: linux-scsi@vger.kernel.org Cc: dm-devel@redhat.com Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 08f78436	02-Jun-2017	Bart Van Assche <bvanassche@acm.org>	scsi: Make scsi_mq_prep_fn() call scsi_init_command() This patch reduces code duplication. There are two functional changes in this patch: - It causes scsi_mq_prep_fn() to clear driver-private command data, just like the already upstream commit 1bad6c4a57ef ("scsi: zero per-cmd private driver data for each MQ I/O"). - The initialization of .prot_sdb is moved from scsi_mq_prep_fn() into scsi_init_request(). [mkp: applied by hand] Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# be4c186c	02-Jun-2017	Bart Van Assche <bvanassche@acm.org>	scsi: Introduce scsi_mq_sgl_size() This patch does not change any functionality but makes the next patch easier to read. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 2dd6fb59	02-Jun-2017	Bart Van Assche <bvanassche@acm.org>	scsi: Only add commands to the device command list if required by the LLD Just like for the scsi-mq code path, in the single queue SCSI code path only add commands to the per-device command list if required by the SCSI LLD. This patch will make it easier to merge the single-queue and multiqueue command initialization code. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 255ee932	02-Jun-2017	Bart Van Assche <bvanassche@acm.org>	scsi: Make __scsi_remove_device go straight from BLOCKED to DEL If a device is blocked, make __scsi_remove_device() cause it to transition to the DEL state. This means that all the commands issued in .shutdown() will error in the mid-layer, thus making the removal proceed without being stopped. This patch is a slightly modified version of a patch from James Bottomley. This patch avoids that the following lockup occurs: Call Trace: schedule+0x35/0x80 schedule_timeout+0x237/0x2d0 io_schedule_timeout+0xa6/0x110 wait_for_completion_io+0xa3/0x110 blk_execute_rq+0xdf/0x120 scsi_execute+0xce/0x150 [scsi_mod] scsi_execute_req_flags+0x8f/0xf0 [scsi_mod] sd_sync_cache+0xa9/0x190 [sd_mod] sd_shutdown+0x6a/0x100 [sd_mod] sd_remove+0x64/0xc0 [sd_mod] __device_release_driver+0x8d/0x120 device_release_driver+0x1e/0x30 bus_remove_device+0xf9/0x170 device_del+0x127/0x240 __scsi_remove_device+0xc1/0xd0 [scsi_mod] scsi_forget_host+0x57/0x60 [scsi_mod] scsi_remove_host+0x72/0x110 [scsi_mod] srp_remove_work+0x8b/0x200 [ib_srp] Reported-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Israel Rukshin <israelr@mellanox.com> Cc: Max Gurtovoy <maxg@mellanox.com> Cc: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 66483a4a	02-Jun-2017	Bart Van Assche <bvanassche@acm.org>	scsi: Introduce scsi_start_queue() This patch does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Israel Rukshin <israelr@mellanox.com> Cc: Max Gurtovoy <maxg@mellanox.com> Cc: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 0db6ca8a	02-Jun-2017	Bart Van Assche <bvanassche@acm.org>	scsi: Protect SCSI device state changes with a mutex Serializing SCSI device state changes avoids that two state changes can occur concurrently, e.g. the state changes in scsi_target_block() and __scsi_remove_device(). This serialization is essential to make patch "Make __scsi_remove_device go straight from BLOCKED to DEL" work reliably. Enable this mechanism for all scsi_target_*block() callers but not for the scsi_internal_device_unblock() calls from the mpt3sas driver because that driver can call scsi_internal_device_unblock() from atomic context. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 43f7571b	02-Jun-2017	Bart Van Assche <bvanassche@acm.org>	scsi: Create two versions of scsi_internal_device_unblock() This will make it easier to serialize SCSI device state changes through a mutex. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 551eb598	02-Jun-2017	Bart Van Assche <bvanassche@acm.org>	scsi: Split scsi_internal_device_block() Instead of passing a "wait" argument to scsi_internal_device_block(), split this function into a function that waits and a function that doesn't wait. This will make it easier to serialize SCSI device state changes through a mutex. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 8e688254	02-Jun-2017	Bart Van Assche <bvanassche@acm.org>	scsi: Avoid that scsi_exit_rq() triggers a use-after-free Dereferencing shost from scsi_exit_rq() is not safe because the SCSI host may already have been freed when scsi_exit_rq() is called. Increasing the shost reference count in scsi_init_rq() and dropping that reference in scsi_exit_rq() is nontrivial since scsi_host_dev_release() may sleep and since scsi_exit_rq() may be called from interrupt context. Since scsi_exit_rq() only needs a single bit from shost, copy that bit into struct scsi_cmnd. Reported-by: Scott Bauer <scott.bauer@intel.com> Fixes: e9c787e65c0c ("scsi: allocate scsi_cmnd structures as part of struct request") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Scott Bauer <scott.bauer@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# fc17b653	03-Jun-2017	Christoph Hellwig <hch@lst.de>	blk-mq: switch ->queue_rq return value to blk_status_t Use the same values for use for request completion errors as the return value from ->queue_rq. BLK_STS_RESOURCE is special cased to cause a requeue, and all the others are completed as-is. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
# 2a842aca	03-Jun-2017	Christoph Hellwig <hch@lst.de>	block: introduce new block status code type Currently we use nornal Linux errno values in the block layer, and while we accept any error a few have overloaded magic meanings. This patch instead introduces a new blk_status_t value that holds block layer specific status codes and explicitly explains their meaning. Helpers to convert from and to the previous special meanings are provided for now, but I suspect we want to get rid of them in the long run - those drivers that have a errno input (e.g. networking) usually get errnos that don't know about the special block layer overloads, and similarly returning them to userspace will usually return somethings that strictly speaking isn't correct for file system operations, but that's left as an exercise for later. For now the set of errors is a very limited set that closely corresponds to the previous overloaded errno values, but there is some low hanging fruite to improve it. blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse typechecking, so that we can easily catch places passing the wrong values. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
# 9efc160f	31-May-2017	Bart Van Assche <bvanassche@acm.org>	block: Introduce queue flag QUEUE_FLAG_SCSI_PASSTHROUGH From the context where a SCSI command is submitted it is not always possible to figure out whether or not the queue the command is submitted to has struct scsi_request as the first member of its private data. Hence introduce the flag QUEUE_FLAG_SCSI_PASSTHROUGH. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Don Brace <don.brace@microsemi.com> Signed-off-by: Jens Axboe <axboe@fb.com>
# 1bad6c4a	18-May-2017	Long Li <longli@microsoft.com>	scsi: zero per-cmd private driver data for each MQ I/O In lower layer driver's (LLD) scsi_host_template, the driver may optionally ask SCSI to allocate its private driver memory for each command, by specifying cmd_size. This memory is allocated at the end of scsi_cmnd by SCSI. Later when SCSI queues a command, the LLD can use scsi_cmd_priv to get to its private data. Some LLD, e.g. hv_storvsc, doesn't clear its private data before use. In this case, the LLD may get to stale or uninitialized data in its private driver memory. This may result in unexpected driver and hardware behavior. Fix this problem by also zeroing the private driver memory before passing them to LLD. Signed-off-by: Long Li <longli@microsoft.com> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Reviewed-by: KY Srinivasan <kys@microsoft.com> Reviewed-by: Christoph Hellwig <hch@infradead.org> CC: <stable@vger.kernel.org> # 4.11+ Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 7aa686d3	02-May-2017	Bart Van Assche <bvanassche@acm.org>	scsi: scsi_lib: Add #include <scsi/scsi_transport.h> This patch avoids that when building with W=1 the compiler complains that __scsi_init_queue() has not been declared. See also commit d48777a633d6 ("scsi: remove __scsi_alloc_queue"). Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# d6296d39	01-May-2017	Christoph Hellwig <hch@lst.de>	blk-mq: update ->init_request and ->exit_request prototypes Remove the request_idx parameter, which can't be used safely now that we support I/O schedulers with blk-mq. Except for a superflous check in mtip32xx it was unused anyway. Also pass the tag_set instead of just the driver data - this allows drivers to avoid some code duplication in a follow on cleanup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
# 0eebd005	26-Apr-2017	Bart Van Assche <bvanassche@acm.org>	scsi: Implement blk_mq_ops.show_rq() Show the SCSI CDB for pending SCSI commands in /sys/kernel/debug/block//mq//dispatch and */rq_list. An example of how SCSI commands are displayed by this code: ffff8801703245c0 {.op=READ, .cmd_flags=META PRIO, .rq_flags=DONTPREP IO_STAT STATS, .tag=14, .internal_tag=-1, .cmd=Read(10) 28 00 2a 81 1b 30 00 00 08 00} Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Hannes Reinecke <hare@suse.com> Cc: <linux-scsi@vger.kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
# 08e0029a	20-Apr-2017	Christoph Hellwig <hch@lst.de>	blk-mq: remove the error argument to blk_mq_complete_request Now that all drivers that call blk_mq_complete_requests have a ->complete callback we can remove the direct call to blk_mq_end_request, as well as the error argument to blk_mq_complete_request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
# 17d5363b	20-Apr-2017	Christoph Hellwig <hch@lst.de>	scsi: introduce a result field in struct scsi_request This passes on the scsi_cmnd result field to users of passthrough requests. Currently we abuse req->errors for this purpose, but that field will go away in its current form. Note that the old IDE code abuses the errors field in very creative ways and stores all kinds of different values in it. I didn't dare to touch this magic, so the abuses are brought forward 1:1. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
# e7661a8e	12-Apr-2017	Johannes Thumshirn <jthumshirn@suse.de>	scsi: return correct blkprep status code in case scsi_init_io() fails. When instrumenting the SCSI layer to run into the !blk_rq_nr_phys_segments(rq) case the following warning emitted from the block layer: blk_peek_request: bad return=-22 This happens because since commit fd3fc0b4d730 ("scsi: don't BUG_ON() empty DMA transfers") we return the wrong error value from scsi_prep_fn() back to the block layer. [mkp: silenced checkpatch] Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Fixes: fd3fc0b4d730 scsi: don't BUG_ON() empty DMA transfers Cc: <stable@vger.kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 36e3cf27	07-Apr-2017	Bart Van Assche <bvanassche@acm.org>	scsi: Avoid that SCSI queues get stuck If a .queue_rq() function returns BLK_MQ_RQ_QUEUE_BUSY then the block driver that implements that function is responsible for rerunning the hardware queue once requests can be queued again successfully. commit 52d7f1b5c2f3 ("blk-mq: Avoid that requeueing starts stopped queues") removed the blk_mq_stop_hw_queue() call from scsi_queue_rq() for the BLK_MQ_RQ_QUEUE_BUSY case. Hence change all calls to functions that are intended to rerun a busy queue such that these examine all hardware queues instead of only stopped queues. Since no other functions than scsi_internal_device_block() and scsi_internal_device_unblock() should ever stop or restart a SCSI queue, change the blk_mq_delay_queue() call into a blk_mq_delay_run_hw_queue() call. Fixes: commit 52d7f1b5c2f3 ("blk-mq: Avoid that requeueing starts stopped queues") Fixes: commit 7e79dadce222 ("blk-mq: stop hardware queue in blk_mq_delay_queue()") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Long Li <longli@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Jens Axboe <axboe@fb.com>
# a0658632	06-Apr-2017	Hannes Reinecke <hare@suse.de>	scsi: make asynchronous aborts mandatory There hasn't been any reports for HBAs where asynchronous abort would not work, so we should make it mandatory and remove the fallback. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 2171b6d0	06-Apr-2017	Hannes Reinecke <hare@suse.de>	scsi: make scsi_eh_scmd_add() always succeed scsi_eh_scmd_add() currently only will fail if no error handler thread is started (which will never be the case) or if the state machine encounters an illegal transition. But if we're encountering an invalid state transition chances is we cannot fixup things with the error handler. So better add a WARN_ON for illegal host states and make scsi_dh_scmd_add() a void function. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 64c7f1d1	05-Apr-2017	Christoph Hellwig <hch@lst.de>	block, scsi: move the retries field to struct scsi_request Instead of bloating the generic struct request with it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
# f363b089	30-Mar-2017	Eric Biggers <ebiggers@google.com>	blk-mq: constify struct blk_mq_ops Constify all instances of blk_mq_ops, as they are never modified. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jens Axboe <axboe@fb.com>
# 8893cf6c	01-Mar-2017	Bart Van Assche <bvanassche@acm.org>	scsi: mpt3sas: Avoid sleeping in interrupt context Commit 669f044170d8 ("scsi: srp_transport: Move queuecommand() wait code to SCSI core") can make scsi_internal_device_block() sleep. However, the mpt3sas driver can call this function from an interrupt handler. Hence add a second argument to scsi_internal_device_block() that restores the old behavior of this function for the mpt3sas handler. The call chain that triggered an "IRQ handler enabled interrupts" complaint is as follows: _base_interrupt() -> _base_async_event() -> mpt3sas_scsih_event_callback() -> _scsih_check_topo_delete_events() -> _scsih_block_io_to_children_attached_directly() -> _scsih_block_io_device() -> _scsih_internal_device_block() -> scsi_internal_device_block() Reported-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Omar Sandoval <osandov@osandov.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Christoph Hellwig <hch@lst.de> Cc: Sathya Prakash <sathya.prakash@broadcom.com> Cc: Chaitra P B <chaitra.basappa@broadcom.com> Cc: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com> Cc: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com> Cc: <stable@vger.kernel.org> # v4.10+ Tested-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# fcbfffe2	23-Feb-2017	Christoph Hellwig <hch@lst.de>	scsi: remove scsi_execute_req_flags And switch all callers to use scsi_execute instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 76aaf87b	23-Feb-2017	Christoph Hellwig <hch@lst.de>	scsi: merge __scsi_execute into scsi_execute All but one caller want the decoded sense header, so offer the existing __scsi_execute helper as the public scsi_execute API to simply the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 3949e2f0	14-Feb-2017	Christoph Hellwig <hch@lst.de>	scsi: simplify scsi_execute_req_flags Add a sshdr argument to __scsi_execute so that we can decode the sense data directly into the sense header instead of needing a copy of it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 74a78ebd	14-Feb-2017	Christoph Hellwig <hch@lst.de>	scsi: make the sense header argument to scsi_test_unit_ready mandatory It's a tiny structure that can be allocated on the stack, don't complicate the code by making it optional. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 857de6e0	17-Feb-2017	Hannes Reinecke <hare@suse.de>	scsi: use 'scsi_device_from_queue()' for scsi_dh The device handler needs to check if a given queue belongs to a scsi device; only then does it make sense to attach a device handler. [mkp: dropped flags] Cc: <stable@vger.kernel.org> Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# ee524236	21-Feb-2017	Christoph Hellwig <hch@lst.de>	scsi: zero per-cmd driver data before each I/O Without this drivers that don't clear the state themselves can see off effects. For example Hyper-V VMs using the storvsc driver will often hang during boot due to uncleared Test Unit Ready failures. Fixes: e9c787e6 ("scsi: allocate scsi_cmnd structures as part of struct request") Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Dexuan Cui <decui@microsoft.com> Tested-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
# fd3fc0b4	31-Jan-2017	Johannes Thumshirn <jthumshirn@suse.de>	scsi: don't BUG_ON() empty DMA transfers Don't crash the machine just because of an empty transfer. Use WARN_ON() combined with returning an error. Found by Dmitry Vyukov and syzkaller. [ Changed to "WARN_ON_ONCE()". Al has a patch that should fix the root cause, but a BUG_ON() is not acceptable in any case, and a WARN_ON() might still be a cause of excessive log spamming. NOTE! If this warning ever triggers, we may end up leaking resources, since this doesn't bother to try to clean the command up. So this WARN_ON_ONCE() triggering does imply real problems. But BUG_ON() is much worse. People really need to stop using BUG_ON() for "this shouldn't ever happen". It makes pretty much any bug worse. - Linus ] Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: James Bottomley <jejb@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# aebf526b	31-Jan-2017	Christoph Hellwig <hch@lst.de>	block: fold cmd_type into the REQ_OP_ space Instead of keeping two levels of indirection for requests types, fold it all into the operations. The little caveat here is that previously cmd_type only applied to struct request, while the request and bio op fields were set to plain REQ_OP_READ/WRITE even for passthrough operations. Instead this patch adds new REQ_OP_* for SCSI passthrough and driver private requests, althought it has to add two for each so that we can communicate the data in/out nature of the request. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
# 57292b58	31-Jan-2017	Christoph Hellwig <hch@lst.de>	block: introduce blk_rq_is_passthrough This can be used to check for fs vs non-fs requests and basically removes all knowledge of BLOCK_PC specific from the block layer, as well as preparing for removing the cmd_type field in struct request. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
# 82ed4db4	27-Jan-2017	Christoph Hellwig <hch@lst.de>	block: split scsi_request out of struct request And require all drivers that want to support BLOCK_PC to allocate it as the first thing of their private data. To support this the legacy IDE and BSG code is switched to set cmd_size on their queues to let the block layer allocate the additional space. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
# e9c787e6	02-Jan-2017	Christoph Hellwig <hch@lst.de>	scsi: allocate scsi_cmnd structures as part of struct request Rely on the new block layer functionality to allocate additional driver specific data behind struct request instead of implementing it in SCSI itѕelf. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
# d48777a6	02-Jan-2017	Christoph Hellwig <hch@lst.de>	scsi: remove __scsi_alloc_queue Instead do an internal export of __scsi_init_queue for the transport classes that export BSG nodes. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
# 0a6ac4ee	02-Jan-2017	Christoph Hellwig <hch@lst.de>	scsi: respect unchecked_isa_dma for blk-mq Currently blk-mq always allocates the sense buffer using normal GFP_KERNEL allocation. Refactor the cmd pool code to split the cmd and sense allocation and share the code to allocate the sense buffers as well as the sense buffer slab caches between the legacy and blk-mq path. Note that this switches to lazy allocation of the sense slab caches - the slab caches (not the actual allocations) won't be destroy until the scsi module is unloaded instead of keeping track of hosts using them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>
# fd102b12	12-Jan-2017	Christoph Hellwig <hch@lst.de>	scsi: use blk_rq_payload_bytes Without that we'll pass a wrong payload size in cmd->sdb, which can lead to hangs with drivers that need the total transfer size. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Chris Valean <v-chvale@microsoft.com> Reported-by: Dexuan Cui <decui@microsoft.com> Fixes: f9d03f96 ("block: improve handling of the magic discard payload") Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
# 7dbbf0fa	22-Nov-2016	Bart Van Assche <bvanassche@acm.org>	scsi: scsi-mq: Wait for .queue_rq() if necessary Ensure that if scsi-mq is enabled that scsi_internal_device_block() waits until ongoing shost->hostt->queuecommand() calls have finished. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: James Bottomley <jejb@linux.vnet.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Doug Ledford <dledford@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# f9d03f96	08-Dec-2016	Christoph Hellwig <hch@lst.de>	block: improve handling of the magic discard payload Instead of allocating a single unused biovec for discard requests, send them down without any payload. Instead we allow the driver to add a "special" payload using a biovec embedded into struct request (unioned over other fields never used while in the driver), and overloading the number of segments for this case. This has a couple of advantages: - we don't have to allocate the bio_vec - the amount of special casing for discard requests in the block layer is significantly reduced - using this same scheme for other request types is trivial, which will be important for implementing the new WRITE_ZEROES op on devices where it actually requires a payload (e.g. SCSI) - we can get rid of playing games with the request length, as we'll never touch it and completions will work just fine - it will allow us to support ranged discard operations in the future by merging non-contiguous discard bios into a single request - last but not least it removes a lot of code This patch is the common base for my WIP series for ranges discards and to remove discard_zeroes_data in favor of always using REQ_OP_WRITE_ZEROES, so it would be good to get it in quickly. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
# 669f0441	22-Nov-2016	Bart Van Assche <bvanassche@acm.org>	scsi: srp_transport: Move queuecommand() wait code to SCSI core Additionally, rename srp_wait_for_queuecommand() into scsi_wait_for_queuecommand() and add a comment about the queuecommand() call from scsi_send_eh_cmnd(). Note: this patch changes scsi_internal_device_block from a function that did not sleep into a function that may sleep. This is fine for all callers of this function: * scsi_internal_device_block() is called from the mpt3sas device while that driver holds the ioc->dm_cmds.mutex. This means that the mpt3sas driver calls this function from thread context. * scsi_target_block() is called by __iscsi_block_session() from kernel thread context and with IRQs enabled. * The SRP transport code also calls scsi_target_block() from kernel thread context while sleeping is allowed. * The snic driver also calls scsi_target_block() from a context from which sleeping is allowed. The scsi_target_block() call namely occurs immediately after a scsi_flush_work() call. [mkp: s/shost/sdev/] Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: James Bottomley <jejb@linux.vnet.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Doug Ledford <dledford@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 2868f13c	15-Nov-2016	Omar Sandoval <osandov@fb.com>	scsi_lib: untangle 0 and BLK_MQ_RQ_QUEUE_OK Let's not depend on any of the BLK_MQ_RQ_QUEUE_* constants having specific values. No functional change. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
# 2d9c5c20	01-Nov-2016	Christoph Hellwig <hch@lst.de>	scsi: allow LLDDs to expose the queue mapping to blk-mq Just hand through the blk-mq map_queues method in the host template. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 2b053aca	28-Oct-2016	Bart Van Assche <bvanassche@acm.org>	blk-mq: Add a kick_requeue_list argument to blk_mq_requeue_request() Most blk_mq_requeue_request() and blk_mq_add_to_requeue_list() calls are followed by kicking the requeue list. Hence add an argument to these two functions that allows to kick the requeue list. This was proposed by Christoph Hellwig. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
# 52d7f1b5	28-Oct-2016	Bart Van Assche <bvanassche@acm.org>	blk-mq: Avoid that requeueing starts stopped queues Since blk_mq_requeue_work() starts stopped queues and since execution of this function can be scheduled after a queue has been stopped it is not possible to stop queues without using an additional state variable to track whether or not the queue has been stopped. Hence modify blk_mq_requeue_work() such that it does not start stopped queues. My conclusion after a review of the blk_mq_stop_hw_queues() and blk_mq_{delay_,}kick_requeue_list() callers is as follows: * In the dm driver starting and stopping queues should only happen if __dm_suspend() or __dm_resume() is called and not if the requeue list is processed. * In the SCSI core queue stopping and starting should only be performed by the scsi_internal_device_block() and scsi_internal_device_unblock() functions but not by any other function. Although the blk_mq_stop_hw_queue() call in scsi_queue_rq() may help to reduce CPU load if a LLD queue is full, figuring out whether or not a queue should be restarted when requeueing a command would require to introduce additional locking in scsi_mq_requeue_cmd() to avoid a race with scsi_internal_device_block(). Avoid this complexity by removing the blk_mq_stop_hw_queue() call from scsi_queue_rq(). * In the NVMe core only the functions that call blk_mq_start_stopped_hw_queues() explicitly should start stopped queues. * A blk_mq_start_stopped_hwqueues() call must be added in the xen-blkfront driver in its blkif_recover() function. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Roger Pau Monné <roger.pau@citrix.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: James Bottomley <jejb@linux.vnet.ibm.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
# e8064021	20-Oct-2016	Christoph Hellwig <hch@lst.de>	block: split out request-only flags into a new namespace A lot of the REQ_* flags are only used on struct requests, and only of use to the block layer and a few drivers that dig into struct request internals. This patch adds a new req_flags_t rq_flags field to struct request for them, and thus dramatically shrinks the number of common requests. It also removes the unfortunate situation where we have to fit the fields from the same enum into 32 bits for struct bio and 64 bits for struct request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Shaun Tancheff <shaun.tancheff@seagate.com> Signed-off-by: Jens Axboe <axboe@fb.com>
# 7d7e0f90	14-Sep-2016	Christoph Hellwig <hch@lst.de>	blk-mq: remove ->map_queue All drivers use the default, so provide an inline version of it. If we ever need other queue mapping we can add an optional method back, although supporting will also require major changes to the queue setup code. This provides better code generation, and better debugability as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
# a621bac3	13-May-2016	James Bottomley <James.Bottomley@HansenPartnership.com>	scsi_lib: correctly retry failed zero length REQ_TYPE_FS commands When SCSI was written, all commands coming from the filesystem (REQ_TYPE_FS commands) had data. This meant that our signal for needing to complete the command was the number of bytes completed being equal to the number of bytes in the request. Unfortunately, with the advent of flush barriers, we can now get zero length REQ_TYPE_FS commands, which confuse this logic because they satisfy the condition every time. This means they never get retried even for retryable conditions, like UNIT ATTENTION because we complete them early assuming they're done. Fix this by special casing the early completion condition to recognise zero length commands with errors and let them drop through to the retry code. Cc: stable@vger.kernel.org Reported-by: Sebastian Parschauer <s.parschauer@gmx.de> Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com> Tested-by: Jack Wang <jinpu.wang@profitbricks.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# d230823a	09-May-2016	Hannes Reinecke <hare@suse.de>	scsi_lib: Decode T10 vendor IDs Some arrays / HBAs will only present T10 vendor IDs, so we should be decoding them, too. [mkp: Fixed T10 spelling] Suggested-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Hannes Reinecke <hare@suse.com> Tested-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 9b1d6c89	04-Apr-2016	Ming Lin <ming.l@ssi.samsung.com>	lib: scatterlist: move SG pool code from SCSI driver to lib/sg_pool.c Now it's ready to move the mempool based SG chained allocator code from SCSI driver to lib/sg_pool.c, which will be compiled only based on a Kconfig symbol CONFIG_SG_POOL. SCSI selects CONFIG_SG_POOL. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 65e8617f	04-Apr-2016	Ming Lin <ming.l@ssi.samsung.com>	scsi: rename SCSI_MAX_{SG, SG_CHAIN}_SEGMENTS Rename SCSI_MAX_SG_SEGMENTS to SG_CHUNK_SIZE, which means the amount we fit into a single scatterlist chunk. Rename SCSI_MAX_SG_CHAIN_SEGMENTS to SG_MAX_SEGMENTS. Will move these 2 generic definitions to scatterlist.h later. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Bart Van Assche <bart.vanassche@sandisk.com> (for ib_srp changes) Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 001d63be	04-Apr-2016	Ming Lin <ming.l@ssi.samsung.com>	scsi: rename SG related struct and functions Rename SCSI specific struct and functions to more genenic names. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Sagi Grimberg <sgi@grimberg.me> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 22cc3d4c	04-Apr-2016	Ming Lin <ming.l@ssi.samsung.com>	scsi: replace "mq" with "first_chunk" in SG functions Parameter "bool mq" is block driver specific. Change it to "first_chunk" to make it more generic. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 91dbc08d	04-Apr-2016	Ming Lin <ming.l@ssi.samsung.com>	scsi: replace "scsi_data_buffer" with "sg_table" in SG functions Replace parameter "struct scsi_data_buffer" with "struct sg_table" in SG alloc/free functions to make them generic. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# e1cd3911	16-Feb-2016	jiangyiwen <jiangyiwen@huawei.com>	SCSI: Free resources when we return BLKPREP_INVALID When called scsi_prep_fn return BLKPREP_INVALID, we should use the same code with BLKPREP_KILL in scsi_prep_return. Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# d3d32891	19-Feb-2016	Hannes Reinecke <hare@suse.de>	scsi_dh: add 'rescan' callback If a device needs to be rescanned the device_handler might need to be rechecked, too. So add a 'rescan' callback to the device handler and call it upon scsi_rescan_device(). The rescan callback will be invoked from the Unit Attention handling of ASC/ASCQ 3F 03 (INQUIRY DATA HAS CHANGED). Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# a8aa3978	01-Dec-2015	Hannes Reinecke <hare@suse.de>	scsi: Add scsi_vpd_tpg_id() Implement scsi_vpd_tpg_id() to extract the target port group id and the relative port id from SCSI VPD page 0x83. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 9983bed3	01-Dec-2015	Hannes Reinecke <hare@suse.de>	scsi: Add scsi_vpd_lun_id() Add a function scsi_vpd_lun_id() to return a unique device identifcation based on the designation descriptors of VPD page 0x83. As devices might implement several descriptors the order of preference is: - NAA IEE Registered Extended - EUI-64 based 16-byte - EUI-64 based 12-byte - NAA IEEE Registered - NAA IEEE Extended A SCSI name string descriptor is preferred to all of them if the identification is longer than 16 bytes. The returned unique device identification will be formatted as a SCSI Name string to avoid clashes between different designator types. [mkp: Fixed up kernel doc comment from Johannes] Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ewan Milne <emilne@redhat.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
# 71baba4b	06-Nov-2015	Mel Gorman <mgorman@techsingularity.net>	mm, page_alloc: rename __GFP_WAIT to __GFP_RECLAIM __GFP_WAIT was used to signal that the caller was in atomic context and could not sleep. Now it is possible to distinguish between true atomic context and callers that are not willing to sleep. The latter should clear __GFP_DIRECT_RECLAIM so kswapd will still wake. As clearing __GFP_WAIT behaves differently, there is a risk that people will clear the wrong flags. This patch renames __GFP_WAIT to __GFP_RECLAIM to clearly indicate what it does -- setting it allows all reclaim activity, clearing them prevents it. [akpm@linux-foundation.org: fix build] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# f4829a9b	27-Sep-2015	Christoph Hellwig <hch@lst.de>	blk-mq: fix racy updates of rq->errors blk_mq_complete_request may be a no-op if the request has already been completed by others means (e.g. a timeout or cancellation), but currently drivers have to set rq->errors before calling blk_mq_complete_request, which might leave us with the wrong error value. Add an error parameter to blk_mq_complete_request so that we can defer setting rq->errors until we known we won the race to complete the request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jens Axboe <axboe@fb.com>
# ee14c674	27-Aug-2015	Christoph Hellwig <hch@lst.de>	scsi_dh: kill struct scsi_dh_data Add a ->handler and a ->handler_data field to struct scsi_device and kill this indirection. Also move struct scsi_device_handler to scsi_dh.h so that changes to it don't require rebuilding every SCSI LLDD. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Odin.com>
# 14c3e677	06-Jul-2015	Hannes Reinecke <hare@suse.de>	scsi: Add ALUA state change UA handling Log the ALUA state change unit attention correctly with the message log and emit an event to allow user-space tools to react to it. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
# 0ae80ba9	12-Jun-2015	Hannes Reinecke <hare@suse.de>	scsi: retry MODE SENSE on unit attention The 'sd' driver is calling scsi_mode_sense() to figure out internal details. But scsi_mode_sense() never checks for any pending unit attentions, so we're getting annoying error messages like: MODE SENSE: unimplemented page/subpage: 0x00/0x00 and a possible wrong decision for device cache handling. Reviewed-by: Ewan Milne <emilne@redhat.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Odin.com>
# 0c958ecc	16-Jul-2015	Tony Battersby <tonyb@cybernetics.com>	scsi: fix memory leak with scsi-mq Fix a memory leak with scsi-mq triggered by commands with large data transfer length. __sg_alloc_table() sets both table->nents and table->orig_nents to the same value. When the scatterlist is DMA-mapped, table->nents is overwritten with the (possibly smaller) size of the DMA-mapped scatterlist, while table->orig_nents retains the original size of the allocated scatterlist. scsi_free_sgtable() should therefore check orig_nents instead of nents, and all code that initializes sdb->table without calling __sg_alloc_table() should set both nents and orig_nents. Fixes: d285203cf647 ("scsi: add support for a blk-mq based I/O path.") Cc: <stable@vger.kernel.org> # 3.17+ Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
# bba0bdd7	04-Mar-2015	Bart Van Assche <bvanassche@acm.org>	Defer processing of REQ_PREEMPT requests for blocked devices SCSI transport drivers and SCSI LLDs block a SCSI device if the transport layer is not operational. This means that in this state no requests should be processed, even if the REQ_PREEMPT flag has been set. This patch avoids that a rescan shortly after a cable pull sporadically triggers the following kernel oops: BUG: unable to handle kernel paging request at ffffc9001a6bc084 IP: [<ffffffffa04e08f2>] mlx4_ib_post_send+0xd2/0xb30 [mlx4_ib] Process rescan-scsi-bus (pid: 9241, threadinfo ffff88053484a000, task ffff880534aae100) Call Trace: [<ffffffffa0718135>] srp_post_send+0x65/0x70 [ib_srp] [<ffffffffa071b9df>] srp_queuecommand+0x1cf/0x3e0 [ib_srp] [<ffffffffa0001ff1>] scsi_dispatch_cmd+0x101/0x280 [scsi_mod] [<ffffffffa0009ad1>] scsi_request_fn+0x411/0x4d0 [scsi_mod] [<ffffffff81223b37>] __blk_run_queue+0x27/0x30 [<ffffffff8122a8d2>] blk_execute_rq_nowait+0x82/0x110 [<ffffffff8122a9c2>] blk_execute_rq+0x62/0xf0 [<ffffffffa000b0e8>] scsi_execute+0xe8/0x190 [scsi_mod] [<ffffffffa000b2f3>] scsi_execute_req+0xa3/0x130 [scsi_mod] [<ffffffffa000c1aa>] scsi_probe_lun+0x17a/0x450 [scsi_mod] [<ffffffffa000ce86>] scsi_probe_and_add_lun+0x156/0x480 [scsi_mod] [<ffffffffa000dc2f>] __scsi_scan_target+0xdf/0x1f0 [scsi_mod] [<ffffffffa000dfa3>] scsi_scan_host_selected+0x183/0x1c0 [scsi_mod] [<ffffffffa000edfb>] scsi_scan+0xdb/0xe0 [scsi_mod] [<ffffffffa000ee13>] store_scan+0x13/0x20 [scsi_mod] [<ffffffff811c8d9b>] sysfs_write_file+0xcb/0x160 [<ffffffff811589de>] vfs_write+0xce/0x140 [<ffffffff81158b53>] sys_write+0x53/0xa0 [<ffffffff81464592>] system_call_fastpath+0x16/0x1b [<00007f611c9d9300>] 0x7f611c9d92ff Reported-by: Max Gurtuvoy <maxg@mellanox.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Cc: <stable@vger.kernel.org> Signed-off-by: James Bottomley <JBottomley@Odin.com>
# 24391c0d	23-Jan-2015	Shaohua Li <shli@fb.com>	blk-mq: add tag allocation policy This is the blk-mq part to support tag allocation policy. The default allocation policy isn't changed (though it's not a strict FIFO). The new policy is round-robin for libata. But it's a try-best implementation. If multiple tasks are competing, the tags returned will be mixed (which is unavoidable even with !mq, as requests from different tasks can be mixed in queue) Cc: Jens Axboe <axboe@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
# 91724c20	15-Jan-2015	Ewan D. Milne <emilne@redhat.com>	scsi: Avoid crashing if device uses DIX but adapter does not support it This can happen if a multipathed device uses DIX and another path is added via an adapter that does not support it. Multipath should not allow this path to be added, but we should not depend upon that to avoid crashing. Signed-off-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
# 70a0f2c1	05-Jan-2015	Christoph Hellwig <hch@lst.de>	scsi: ->queue_rq can't sleep The blk-mq ->queue_rq method is always called from process context, but might have preemption disabled. This means we still always have to use GFP_ATOMIC for memory allocations, and thus need to revert part of commit 3c356bde1 ("scsi: stop passing a gfp_mask argument down the command setup path"). Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Sasha Levin <sasha.levin@oracle.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Tested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
# 120bb3e1	08-Dec-2014	Tony Battersby <tonyb@cybernetics.com>	scsi: fix random memory corruption with scsi-mq + T10 PI This fixes random memory corruption triggered when all three of the following are true: * scsi-mq enabled * T10 Protection Information (DIF) enabled * SCSI host with sg_tablesize > SCSI_MAX_SG_SEGMENTS (128) The symptoms of this bug are unpredictable memory corruption, BUG()s, oopses, lockups, etc., any of which may appear to be completely unrelated to the root cause. Cc: <stable@vger.kernel.org> # 3.17.x, 3.18.x Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
# 82042a2c	05-Sep-2014	Christoph Hellwig <hch@lst.de>	scsi: move scsi_dispatch_cmd to scsi_lib.c scsi_lib.c is where the rest of the I/O submission path lives, so move scsi_dispatch_cmd there and mark it static. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.de>
# 3c356bde	05-Sep-2014	Christoph Hellwig <hch@lst.de>	scsi: stop passing a gfp_mask argument down the command setup path There is no reason for ULDs to pass in a flag on how to allocate the S/G lists. While we don't need GFP_ATOMIC for the blk-mq case because we don't hold locks, that decision can be made way down the chain without having to pass a pointless gfp_mask argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.de>
# bb3ec62a	05-Sep-2014	Christoph Hellwig <hch@lst.de>	scsi: remove scsi_next_command There's only one caller left, so inline it and reduce the blk-mq vs !blk-mq diff a litte bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.de>
# 125c99bc	02-Nov-2014	Christoph Hellwig <hch@lst.de>	scsi: add new scsi-command flag for tagged commands Currently scsi piggy backs on the block layer to define the concept of a tagged command. But we want to be able to have block-level host-wide tags assigned even for untagged commands like the initial INQUIRY, so add a new SCSI-level flag for commands that are tagged at the scsi level, so that even commands without that set can have tags assigned to them. Note that this alredy is the case for the blk-mq code path, and this just lets the old path catch up with it. We also set this flag based upon sdev->simple_tags instead of the block queue flag, so that it is entirely independent of the block layer tagging, and thus always correct even if a driver doesn't use block level tagging yet. Also remove the old blk_rq_tagged; it was only used by SCSI drivers, and removing it forces them to look for the proper replacement. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de>
# efec4b90	30-Oct-2014	Bart Van Assche <bvanassche@acm.org>	scsi: add support for multiple hardware queues Allow a SCSI LLD to declare how many hardware queues it supports by setting Scsi_Host.nr_hw_queues before calling scsi_add_host(). Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
# f1569ff1	24-Oct-2014	Hannes Reinecke <hare@suse.de>	scsi: ratelimit I/O error messages There can be quite a lot of I/O error messages, even on smaller machines. So we need to ratelimit them to not overwhelm logging. Signed-off-by: Hannes Reinecke <hare@suse.de> Tested-by: Robert Elliott <elliott@hp.com> Reviewed-by: Robert Elliott <elliott@hp.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
# c11c004b	24-Oct-2014	Hannes Reinecke <hare@suse.de>	scsi: simplify scsi_log_(send\|completion) Simplify scsi_log_(send\|completion) by externalizing scsi_mlreturn_string() and always print the command address. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Robert Elliott <elliott@hp.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
# 4753cbc0	24-Oct-2014	Hannes Reinecke <hare@suse.de>	scsi: use 'bool' as return value for scsi_normalize_sense() Convert scsi_normalize_sense() and friends to return 'bool' instead of an integer. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Robert Elliott <elliott@hp.com> Reviewed-by: Yoshihiro Yunomae <yoshihiro.yunomae.ez@hitachi.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
# d811b848	24-Oct-2014	Hannes Reinecke <hare@suse.de>	scsi: use sdev as argument for sense code printing We should be using the standard dev_printk() variants for sense code printing. [hch: remove __scsi_print_sense call in xen-scsiback, Acked by Juergen] [hch: folded bracing fix from Dan Carpenter] Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Robert Elliott <elliott@hp.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
# 037e6d86	14-Oct-2014	Mark Rustad <mark.d.rustad@intel.com>	scsi: resolve some missing-field-initializers warnings Resolve some missing-field-initializers warnings by using designated initialization. [hch: W=2 with modern gcc warns about this. Pretty pointless to me, but I'd prefer to keep us warning free] Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
# 74c45052	29-Oct-2014	Jens Axboe <axboe@fb.com>	blk-mq: add a 'list' parameter to ->queue_rq() Since we have the notion of a 'last' request in a chain, we can use this to have the hardware optimize the issuing of requests. Add a list_head parameter to queue_rq that the driver can use to temporarily store hw commands for issue when 'last' is true. If we are doing a chain of requests, pass in a NULL list for the first request to force issue of that immediately, then batch the remainder for deferred issue until the last request has been sent. Instead of adding yet another argument to the hot ->queue_rq path, encapsulate the passed arguments in a blk_mq_queue_data structure. This is passed as a constant, and has been tested as faster than passing 4 (or even 3) args through ->queue_rq. Update drivers for the new ->queue_rq() prototype. There are no functional changes in this patch for drivers - if they don't use the passed in list, then they will just queue requests individually like before. Signed-off-by: Jens Axboe <axboe@fb.com>
# b1dd2aac	19-Oct-2014	Christoph Hellwig <hch@lst.de>	scsi: set REQ_QUEUE for the blk-mq case To generate the right SPI tag messages we need to properly set QUEUE_FLAG_QUEUED in the request_queue and mirror it to the request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Jens Axboe <axboe@kernel.dk> Reported-by: Meelis Roos <mroos@linux.ee> Tested-by: Meelis Roos <mroos@linux.ee> Cc: stable@vger.kernel.org
# fe052529	22-Sep-2014	Christoph Hellwig <hch@lst.de>	scsi: move blk_mq_start_request call earlier Some ATA drivers need the dma drain size workaround, and thus need to call blk_mq_start_request before the S/G mapping. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
# 0152fb6b	13-Sep-2014	Christoph Hellwig <hch@lst.de>	blk-mq: pass a reserved argument to the timeout handler Allow blk-mq to pass an argument to the timeout handler to indicate if we're timing out a reserved or regular command. For many drivers those need to be handled different. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
# c8a446ad	13-Sep-2014	Christoph Hellwig <hch@lst.de>	blk-mq: rename blk_mq_end_io to blk_mq_end_request Now that we've changed the driver API on the submission side use the opportunity to fix up the name on the completion side to fit into the general scheme. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
# e2490073	13-Sep-2014	Christoph Hellwig <hch@lst.de>	blk-mq: call blk_mq_start_request from ->queue_rq When we call blk_mq_start_request from the core blk-mq code before calling into ->queue_rq there is a racy window where the timeout handler can hit before we've fully set up the driver specific part of the command. Move the call to blk_mq_start_request into the driver so the driver can start the request only once it is fully set up. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
# bf572297	13-Sep-2014	Christoph Hellwig <hch@lst.de>	blk-mq: remove REQ_END Pass an explicit parameter for the last request in a batch to ->queue_rq instead of using a request flag. Besides being a cleaner and non-stateful interface this is also required for the next patch, which fixes the blk-mq I/O submission code to not start a time too early. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
# f81426a8	16-Sep-2014	Daniel Gryniewicz <dang@linuxbox.com>	[SCSI] fix for bidi use after free When ending a bi-directionional SCSI request, blk_finish_request() cleans up and frees the request, but scsi_release_bidi_buffers() tries to indirect through the request to find it's data buffers. This causes a panic due to a null pointer dereference. Move the call to scsi_release_bidi_buffers() before the call to blk_finish_request(). Signed-off-by: Daniel Gryniewicz <dang@linuxbox.com> Reviewed-by: Webb Scales <webbnh@hp.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# 64bdcbc4	20-Aug-2014	Kashyap.Desai@avagotech.com <Kashyap.Desai@avagotech.com>	scsi: add use_cmd_list flag Add a use_cmd_list flag in struct Scsi_Host to request keeping track of all outstanding commands per device. Default behaviour is not to keep track of cmd_list per sdev, as this may introduce lock contention. (overhead is more on multi-node NUMA.), and only enable it on the two drivers that need it. Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
# a492f075	28-Aug-2014	Joe Lawrence <joe.lawrence@stratus.com>	block,scsi: fixup blk_get_request dead queue scenarios The blk_get_request function may fail in low-memory conditions or during device removal (even if __GFP_WAIT is set). To distinguish between these errors, modify the blk_get_request call stack to return the appropriate ERR_PTR. Verify that all callers check the return status and consider IS_ERR instead of a simple NULL pointer check. For consistency, make a similar change to the blk_mq_alloc_request leg of blk_get_request. It may fail if the queue is dead, or the caller was unwilling to wait. Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com> Acked-by: Jiri Kosina <jkosina@suse.cz> [for pktdvd] Acked-by: Boaz Harrosh <bharrosh@panasas.com> [for osd] Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
# 6f4a1626	22-Aug-2014	Tony Battersby <tonyb@cybernetics.com>	scsi-mq: fix requests that use a separate CDB buffer This patch fixes code such as the following with scsi-mq enabled: rq = blk_get_request(...); blk_rq_set_block_pc(rq); rq->cmd = my_cmd_buffer; /* separate CDB buffer */ blk_execute_rq_nowait(...); Code like this appears in e.g. sg_start_req() in drivers/scsi/sg.c (for large CDBs only). Without this patch, scsi_mq_prep_fn() will set rq->cmd back to rq->__cmd, causing the wrong CDB to be sent to the device. Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Signed-off-by: Jens Axboe <axboe@fb.com>
# 480cadc2	10-Aug-2014	Guenter Roeck <linux@roeck-us.net>	scsi: Fix qemu boot hang problem The latest kernel fails to boot qemu arm images when using scsi for disk access. Boot gets stuck after the following messages. brd: module loaded sym53c8xx 0000:00:0c.0: enabling device (0100 -> 0103) sym0: <895a> rev 0x0 at pci 0000:00:0c.0 irq 93 sym0: No NVRAM, ID 7, Fast-40, LVD, parity checking sym0: SCSI BUS has been reset. scsi host0: sym-2.2.3 Bisect points to commit 71e75c97f97a ("scsi: convert device_busy to atomic_t"). Code inspection shows the following suspicious change in scsi_request_fn. out_delay: - if (sdev->device_busy == 0 && !scsi_device_blocked(sdev)) + if (atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev)) blk_delay_queue(q, SCSI_QUEUE_DELAY); } 'sdev->device_busy == 0' was replaced with 'atomic_read(&sdev->device_busy)', meaning the logic was reversed. Changing this expression to '!atomic_read(&sdev->device_busy)' fixes the problem. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Jens Axboe <axboe@fb.com> Reviewed-by: Venkatesh Srinivas <venkateshs@google.com> Reviewed-by: Webb Scales <webbnh@hp.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# 045065d8	10-Aug-2014	Guenter Roeck <linux@roeck-us.net>	[SCSI] fix qemu boot hang problem The latest kernel fails to boot qemu arm images when using scsi for disk access. Boot gets stuck after the following messages. brd: module loaded sym53c8xx 0000:00:0c.0: enabling device (0100 -> 0103) sym0: <895a> rev 0x0 at pci 0000:00:0c.0 irq 93 sym0: No NVRAM, ID 7, Fast-40, LVD, parity checking sym0: SCSI BUS has been reset. scsi host0: sym-2.2.3 Bisect points to commit 71e75c97f97a ("scsi: convert device_busy to atomic_t"). Code inspection shows the following suspicious change in scsi_request_fn. out_delay: - if (sdev->device_busy == 0 && !scsi_device_blocked(sdev)) + if (atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev)) blk_delay_queue(q, SCSI_QUEUE_DELAY); } 'sdev->device_busy == 0' was replaced with 'atomic_read(&sdev->device_busy)', meaning the logic was reversed. Changing this expression to '!atomic_read(&sdev->device_busy)' fixes the problem. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Jens Axboe <axboe@fb.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Fixes: 71e75c97f97a Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# d285203c	16-Jan-2014	Christoph Hellwig <hch@lst.de>	scsi: add support for a blk-mq based I/O path. This patch adds support for an alternate I/O path in the scsi midlayer which uses the blk-mq infrastructure instead of the legacy request code. Use of blk-mq is fully transparent to drivers, although for now a host template field is provided to opt out of blk-mq usage in case any unforseen incompatibilities arise. In general replacing the legacy request code with blk-mq is a simple and mostly mechanical transformation. The biggest exception is the new code that deals with the fact the I/O submissions in blk-mq must happen from process context, which slightly complicates the I/O completion handler. The second biggest differences is that blk-mq is build around the concept of preallocated requests that also include driver specific data, which in SCSI context means the scsi_cmnd structure. This completely avoids dynamic memory allocations for the fast path through I/O submission. Due the preallocated requests the MQ code path exclusively uses the host-wide shared tag allocator instead of a per-LUN one. This only affects drivers actually using the block layer provided tag allocator instead of their own. Unlike the old path blk-mq always provides a tag, although drivers don't have to use it. For now the blk-mq path is disable by defauly and must be enabled using the "use_blk_mq" module parameter. Once the remaining work in the block layer to make blk-mq more suitable for slow devices is complete I hope to make it the default and eventually even remove the old code path. Based on the earlier scsi-mq prototype by Nicholas Bellinger. Thanks to Bart Van Assche and Robert Elliot for testing, benchmarking and various sugestions and code contributions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>
# c53c6d6a	15-Apr-2014	Christoph Hellwig <hch@lst.de>	scatterlist: allow chaining to preallocated chunks Blk-mq drivers usually preallocate their S/G list as part of the request, but if we want to support the very large S/G lists currently supported by the SCSI code that would tie up a lot of memory in the preallocated request pool. Add support to the scatterlist code so that it can initialize a S/G list that uses a preallocated first chunks and dynamically allocated additional chunks. That way the scsi-mq code can preallocate a first page worth of S/G entries as part of the request, and dynamically extend the S/G list when needed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>
# f6d47e74	16-Feb-2014	Christoph Hellwig <hch@lst.de>	scsi: unwind blk_end_request_all and blk_end_request_err calls Replace the calls to the various blk_end_request variants with opencode equivalents. Blk-mq is using a model that gives the driver control between the bio updates and the actual completion, and making the old code follow that same model allows us to keep the code more similar for both paths. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>
# 2ccbb008	26-Mar-2014	Christoph Hellwig <hch@lst.de>	scsi: only maintain target_blocked if the driver has a target queue limit This saves us an atomic operation for each I/O submission and completion for the usual case where the driver doesn't set a per-target can_queue value. Only a few iscsi hardware offload drivers set the per-target can_queue value at the moment. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>
# cd9070c9	22-Jan-2014	Christoph Hellwig <hch@lst.de>	scsi: fix the {host,target,device}_blocked counter mess Seems like these counters are missing any sort of synchronization for updates, as a over 10 year old comment from me noted. Fix this by using atomic counters, and while we're at it also make sure they are in the same cacheline as the _busy counters and not needlessly stored to in every I/O completion. With the new model the _busy counters can temporarily go negative, so all the readers are updated to check for > 0 values. Longer term every successful I/O completion will reset the counters to zero, so the temporarily negative values will not cause any harm. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>
# 71e75c97	11-Apr-2014	Christoph Hellwig <hch@lst.de>	scsi: convert device_busy to atomic_t Avoid taking the queue_lock to check the per-device queue limit. Instead we do an atomic_inc_return early on to grab our slot in the queue, and if necessary decrement it after finishing all checks. Unlike the host and target busy counters this doesn't allow us to avoid the queue_lock in the request_fn due to the way the interface works, but it'll allow us to prepare for using the blk-mq code, which doesn't use the queue_lock at all, and it at least avoids a queue_lock round trip in scsi_device_unbusy, which is still important given how busy the queue_lock is. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>
# 74665016	22-Jan-2014	Christoph Hellwig <hch@lst.de>	scsi: convert host_busy to atomic_t Avoid taking the host-wide host_lock to check the per-host queue limit. Instead we do an atomic_inc_return early on to grab our slot in the queue, and if necessary decrement it after finishing all checks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>
# 7ae65c0f	22-Jan-2014	Christoph Hellwig <hch@lst.de>	scsi: convert target_busy to an atomic_t Avoid taking the host-wide host_lock to check the per-target queue limit. Instead we do an atomic_inc_return early on to grab our slot in the queue, and if necessary decrement it after finishing all checks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>
# cf68d334	22-Jan-2014	Christoph Hellwig <hch@lst.de>	scsi: push host_lock down into scsi_{host,target}_queue_ready Prepare for not taking a host-wide lock in the dispatch path by pushing the lock down into the places that actually need it. Note that this patch is just a preparation step, as it will actually increase lock roundtrips and thus decrease performance on its own. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>
# 3b5382c4	05-May-2014	Christoph Hellwig <hch@lst.de>	scsi: set ->scsi_done before calling scsi_dispatch_cmd The blk-mq code path will set this to a different function, so make the code simpler by setting it up in a legacy-request specific place. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>
# d0d3bbf9	22-Jan-2014	Christoph Hellwig <hch@lst.de>	scsi: centralize command re-queueing in scsi_dispatch_fn Make sure we only have the logic for requeing commands in one place. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>
# de3e8bf3	23-Jan-2014	Christoph Hellwig <hch@lst.de>	scsi: split __scsi_queue_insert Factor out a helper to set the _blocked values, which we'll reuse for the blk-mq code path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>
# 6af7a4ff	08-Jul-2014	Christoph Hellwig <hch@lst.de>	scsi: add scsi_setup_cmnd helper Factor out command setup code that will be shared with the blk-mq code path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Webb Scales <webbnh@hp.com>
# 4f1e5765	27-Jun-2014	Christoph Hellwig <hch@lst.de>	scsi: mark scsi_setup_blk_pc_cmnd static Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de>
# 5158a899	28-Jun-2014	Christoph Hellwig <hch@lst.de>	scsi: set sc_data_direction in common code The data direction fiel in the SCSI command is derived only from the block request structure. Move setting it up into common code instead of duplicating it in the ULDs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de>
# 3868cf8e	28-Jun-2014	Christoph Hellwig <hch@lst.de>	scsi: restructure command initialization for TYPE_FS requests We should call the device handler prep_fn for all TYPE_FS requests, not just simple read/write calls that are handled by the disk driver. Restructure the common I/O code to call the prep_fn handler and zero out the CDB, and just leave the call to scsi_init_io to the ULDs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de>
# 635d98b1	28-Jun-2014	Christoph Hellwig <hch@lst.de>	scsi: move the nr_phys_segments assert into scsi_init_io scsi_init_io should only be called for requests that transfer data, so move the assert that a request has segments from the callers into scsi_init_io. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de>
# e6c11dbb	10-Jul-2014	Maurizio Lombardi <mlombard@redhat.com>	scsi_lib: remove the description string in scsi_io_completion() During IO with fabric faults, one generally sees several "Unhandled error code" messages in the syslog as shown below: sd 4:0:6:2: [sdbw] Unhandled error code sd 4:0:6:2: [sdbw] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK sd 4:0:6:2: [sdbw] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00 end_request: I/O error, dev sdbw, sector 0 This comes from scsi_io_completion (in scsi_lib.c) while handling error codes other than DID_RESET or not deferred sense keys i.e. this is actually handled by the SCSI mid layer. But what gets displayed here is "Unhandled error code" which is quite misleading as it indicates something that is not addressed by the mid layer. The description string is based on the sense key and sometimes on the additional sense code; since the ACTION_FAIL case always prints the sense key and the additional sense code, this patch removes the description string completely because it does not add useful information. Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
# f1bea55d	14-Apr-2014	Christoph Hellwig <hch@lst.de>	scsi: remove various exports that were only used by scsi_tgt Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de>
# 91921e01	25-Jun-2014	Hannes Reinecke <hare@suse.de>	scsi: use dev_printk variants where possible Using dev_printk variants prefixes the logging message with the originating device, which makes debugging easier. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
# 89fb4cd1	03-Jul-2014	James Bottomley <JBottomley@Parallels.com>	scsi: handle flush errors properly Flush commands don't transfer data and thus need to be special cased in the I/O completion handler so that we can propagate errors to the block layer and filesystem. Signed-off-by: James Bottomley <JBottomley@Parallels.com> Reported-by: Steven Haber <steven@qumulo.com> Tested-by: Steven Haber <steven@qumulo.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Christoph Hellwig <hch@lst.de>
# f27b087b	06-Jun-2014	Jens Axboe <axboe@fb.com>	block: add blk_rq_set_block_pc() With the optimizations around not clearing the full request at alloc time, we are leaving some of the needed init for REQ_TYPE_BLOCK_PC up to the user allocating the request. Add a blk_rq_set_block_pc() that sets the command type to REQ_TYPE_BLOCK_PC, and properly initializes the members associated with this type of request. Update callers to use this function instead of manipulating rq->cmd_type directly. Includes fixes from Christoph Hellwig <hch@lst.de> for my half-assed attempt. Signed-off-by: Jens Axboe <axboe@fb.com>
# a1b73fc1	01-May-2014	Christoph Hellwig <hch@lst.de>	scsi: reintroduce scsi_driver.init_command Instead of letting the ULD play games with the prep_fn move back to the model of a central prep_fn with a callback to the ULD. This already cleans up and shortens the code by itself, and will be required to properly support blk-mq in the SCSI midlayer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Hannes Reinecke <hare@suse.de>
# bc85dc50	01-May-2014	Christoph Hellwig <hch@lst.de>	scsi: remove scsi_end_request By folding scsi_end_request into its only caller we can significantly clean up the completion logic. We can use simple goto labels now to only have a single place to finish or requeue command there instead of the previous convoluted logic. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Hannes Reinecke <hare@suse.de>
# c682adf3	01-May-2014	Christoph Hellwig <hch@lst.de>	scsi: explicitly release bidi buffers Instead of trying to guess when we have a BIDI buffer in scsi_release_buffers add a function to explicitly free the BIDI ressoures in the one place that handles them. This avoids needing a special __scsi_release_buffers for the case where we already have freed the request as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
# 644373a4	28-Mar-2014	Alan Stern <stern@rowland.harvard.edu>	[SCSI] Fix command result state propagation We're seeing a case where the contents of scmd->result isn't being reset after a SCSI command encounters an error, is resubmitted, times out and then gets handled. The error handler acts on the stale result of the previous error instead of the timeout. Fix this by properly zeroing the scmd->status before the command is resubmitted. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# 68c03d91	14-Apr-2014	Christoph Hellwig <hch@lst.de>	[SCSI] don't reference freed command in scsi_prep_return Patch commit 0479633686d370303e3430256ace4bd5f7f138dc Author: Christoph Hellwig <hch@infradead.org> Date: Thu Feb 20 14:20:55 2014 -0800 [SCSI] do not manipulate device reference counts in scsi_get/put_command Introduced a use after free:I in the kill case of scsi_prep_return we have to release our device reference, but we do this trying to reference the just freed command. Use the local sdev pointer instead. Fixes: 0479633686d370303e3430256ace4bd5f7f138dc Reported-by: Joe Lawrence <joe.lawrence@stratus.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# 5e012aad	14-Apr-2014	Christoph Hellwig <hch@lst.de>	[SCSI] don't reference freed command in scsi_init_sgtable Patch commit 0479633686d370303e3430256ace4bd5f7f138dc Author: Christoph Hellwig <hch@infradead.org> Date: Thu Feb 20 14:20:55 2014 -0800 [SCSI] do not manipulate device reference counts in scsi_get/put_command Introduced a use after free: when scsi_init_io fails we have to release our device reference, but we do this trying to reference the just freed command. Add a local scsi_device pointer to fix this. Fixes: 0479633686d370303e3430256ace4bd5f7f138dc Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# b4f42e28	10-Apr-2014	Jens Axboe <axboe@fb.com>	block: remove struct request buffer member This was used in the olden days, back when onions were proper yellow. Basically it mapped to the current buffer to be transferred. With highmem being added more than a decade ago, most drivers map pages out of a bio, and rq->buffer isn't pointing at anything valid. Convert old style drivers to just use bio_data(). For the discard payload use case, just reference the page in the bio. Signed-off-by: Jens Axboe <axboe@fb.com>
# 2bfad21e	09-Apr-2014	Martin K. Petersen <martin.petersen@oracle.com>	scsi: Make sure cmd_flags are 64-bit cmd_flags in struct request is now 64 bits wide but the scsi_execute functions truncated arguments passed to int leading to errors. Make sure the flags parameters are u64. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: Jens Axboe <axboe@fb.com> CC: Jan Kara <jack@suse.cz> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
# 59c3d45e	08-Apr-2014	Jens Axboe <axboe@fb.com>	block: remove 'q' parameter from kblockd_schedule_*_work() The queue parameter is never used, just get rid of it. Signed-off-by: Jens Axboe <axboe@fb.com>
# 134997a0	20-Feb-2014	Christoph Hellwig <hch@infradead.org>	[SCSI] remove a useless get/put_device pair in scsi_requeue_command Avoid a spurious device get/put pair by cleaning up scsi_requeue_command and folding scsi_unprep_request into it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# 27e9e0f1	20-Feb-2014	Bart Van Assche <bvanassche@acm.org>	[SCSI] remove a useless get/put_device pair in scsi_next_command Eliminate a get_device() / put_device() pair from scsi_next_command(). Both are atomic operations hence removing these slightly improves performance. [hch: slight changes due to different context] Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# 613be1f6	20-Feb-2014	Bart Van Assche <bvanassche@acm.org>	[SCSI] remove a useless get/put_device pair in scsi_request_fn SCSI devices may only be removed by calling scsi_remove_device(). That function must invoke blk_cleanup_queue() before the final put of sdev->sdev_gendev. Since blk_cleanup_queue() waits for the block queue to drain and then tears it down, scsi_request_fn cannot be active anymore after blk_cleanup_queue() has returned and hence the get_device()/put_device() pair in scsi_request_fn is unnecessary. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# 04796336	20-Feb-2014	Christoph Hellwig <hch@infradead.org>	[SCSI] do not manipulate device reference counts in scsi_get/put_command Many callers won't need this and we can optimize them away. In addition the handling in the __-prefixed variants was inconsistant to start with. Based on an earlier patch from Bart Van Assche. [jejb: fix kerneldoc probelm picked up by Fengguang Wu] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# 21a05df5	20-Feb-2014	Christoph Hellwig <hch@infradead.org>	[SCSI] avoid taking host_lock in scsi_run_queue unless nessecary If we don't have starved devices we don't need to take the host lock to iterate over them. Also split the function up to be more clear. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# ee60b2c5	10-Feb-2014	Eiichi Tsukata <eiichi.tsukata.xh@hitachi.com>	[SCSI] Add timeout to avoid infinite command retry Currently, scsi error handling in scsi_io_completion() tries to unconditionally requeue scsi command when device keeps some error state. For example, UNIT_ATTENTION causes infinite retry with action == ACTION_RETRY. This is because retryable errors are thought to be temporary and the scsi device will soon recover from those errors. Normally, such retry policy is appropriate because the device will soon recover from temporary error state. But there is no guarantee that device is able to recover from error state immediately. Some hardware error can prevent device from recovering. This patch adds timeout in scsi_io_completion() to avoid infinite command retry in scsi_io_completion(). Once scsi command retry time is longer than this timeout, the command is treated as failure. Signed-off-by: Eiichi Tsukata <eiichi.tsukata.xh@hitachi.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# e83b3664	11-Feb-2014	Russell King <rmk+kernel@arm.linux.org.uk>	Fix uses of dma_max_pfn() when converting to a limiting address We must use a 64-bit for this, otherwise overflowed bits get lost, and that can result in a lower than intended value set. Fixes: 8e0cb8a1f6ac ("ARM: 7797/1: mmc: Use dma_max_pfn(dev) helper for bounce_limit calculations") Fixes: 7d35496dd982 ("ARM: 7796/1: scsi: Use dma_max_pfn(dev) helper for bounce_limit calculations") Tested-Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
# 7d35496d	29-Jul-2013	Santosh Shilimkar <santosh.shilimkar@ti.com>	ARM: 7796/1: scsi: Use dma_max_pfn(dev) helper for bounce_limit calculations DMA bounce limit is the maximum direct DMA'able memory beyond which bounce buffers has to be used to perform dma operations. SCSI driver relies on dma_mask but its calculation is based on max_*pfn which don't have uniform meaning across architectures. So make use of dma_max_pfn() which is expected to return the DMAable maximum pfn value across architectures. Cc: linux-scsi@vger.kernel.org Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
# 279afdfe	08-Aug-2013	Ewan D. Milne <emilne@redhat.com>	[SCSI] Generate uevents on certain unit attention codes Generate a uevent when the following Unit Attention ASC/ASCQ codes are received: 2A/01 MODE PARAMETERS CHANGED 2A/09 CAPACITY DATA HAS CHANGED 38/07 THIN PROVISIONING SOFT THRESHOLD REACHED 3F/03 INQUIRY DATA HAS CHANGED 3F/0E REPORTED LUNS DATA HAS CHANGED Log kernel messages when the following Unit Attention ASC/ASCQ codes are received that are not as specific as those above: 2A/xx PARAMETERS CHANGED 3F/xx TARGET OPERATING CONDITIONS HAVE CHANGED Added logic to set expecting_lun_change for other LUNs on the target after REPORTED LUNS DATA HAS CHANGED is received, so that duplicate uevents are not generated, and clear expecting_lun_change when a REPORT LUNS command completes, in accordance with the SPC-3 specification regarding reporting of the 3F 0E ASC/ASCQ UA. [jejb: remove SPC3 test in scsi_report_lun_change and some docbook fixes and unused variable fix, both reported by Fengguang Wu] Signed-off-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# 7e782af5	01-Jul-2013	Hannes Reinecke <hare@suse.de>	[SCSI] Return ENODATA on medium error When a medium error is detected the SCSI stack should return ENODATA to the upper layers. [jejb: fix whitespace error] Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# a9d6ceb8	01-Jul-2013	Hannes Reinecke <hare@suse.de>	[SCSI] return ENOSPC on thin provisioning failure When the thin provisioning hard threshold is reached we should return ENOSPC to inform upper layers about this fact. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# 0f7f6234	01-Jul-2013	Hannes Reinecke <hare@suse.de>	[SCSI] Document enhanced error codes Document the various error codes returned on I/O failure. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# f1bc1e4c	22-Aug-2013	Aaron Lu <aaron.lu@intel.com>	ata: acpi: rework the ata acpi bind support Binding ACPI handle to SCSI device has several drawbacks, namely: 1 During ATA device initialization time, ACPI handle will be needed while SCSI devices are not created yet. So each time ACPI handle is needed, instead of retrieving the handle by ACPI_HANDLE macro, a namespace scan is performed to find the handle for the corresponding ATA device. This is inefficient, and also expose a restriction on calling path not holding any lock. 2 The binding to SCSI device tree makes code complex, while at the same time doesn't bring us any benefit. All ACPI handlings are still done in ATA module, not in SCSI. Rework the ATA ACPI binding code to bind ACPI handle to ATA transport devices(ATA port and ATA device). The binding needs to be done only once, since the ATA transport devices do not go away with hotplug. And due to this, the flush_work call in hotplug handler for ATA bay is no longer needed. Tested on an Intel test platform for binding and runtime power off for ODD(ZPODD) and hard disk; on an ASUS S400C for binding and normal boot and S3, where its SATA port node has _SDD and _GTF control methods when configured as an AHCI controller and its PATA device node has _GTF control method when configured as an IDE controller. SATA PMP binding and ATA hotplug is not tested. Signed-off-by: Aaron Lu <aaron.lu@intel.com> Tested-by: Dirk Griesbach <spamthis@freenet.de> Signed-off-by: Tejun Heo <tj@kernel.org>
# 0516c08d	02-Jul-2013	Bart Van Assche <bvanassche@acm.org>	[SCSI] enable destruction of blocked devices which fail LUN scanning If something goes wrong during LUN scanning, e.g. a transport layer failure occurs, then __scsi_remove_device() can get invoked by the LUN scanning code for a SCSI device in state SDEV_CREATED_BLOCK and before the SCSI device has been added to sysfs (is_visible == 0). Make sure that even in this case the transition into state SDEV_DEL occurs. This avoids that __scsi_remove_device() can get invoked a second time by scsi_forget_host() if this last function is invoked from another thread than the thread that performs LUN scanning. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# e2eb7244	02-Jul-2013	James Bottomley <JBottomley@Parallels.com>	[SCSI] Fix race between starved list and device removal scsi_run_queue() examines all SCSI devices that are present on the starved list. Since scsi_run_queue() unlocks the SCSI host lock a SCSI device can get removed after it has been removed from the starved list and before its queue is run. Protect against that race condition by holding a reference on the queue while running it. Reported-by: Chanho Min <chanho.min@lge.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# 9b21493c	22-Mar-2013	Lin Ming <ming.m.lin@intel.com>	[SCSI] sd: use REQ_PM in sd's runtime suspend operation With the introduction of REQ_PM, modify sd's runtime suspend operation functions to use that flag so that the operations to put the device into runtime suspended state(i.e. sync cache and stop device) will not affect its runtime PM status. Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Aaron Lu <aaron.lu@intel.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# 53540098	03-Mar-2013	Rafael J. Wysocki <rafael.j.wysocki@intel.com>	ACPI / glue: Add .match() callback to struct acpi_bus_type USB uses the .find_bridge() callback from struct acpi_bus_type incorrectly, because as a result of the way it is used by USB every device in the system that doesn't have a bus type or parent is passed to usb_acpi_find_device() for inspection. What USB actually needs, though, is to call usb_acpi_find_device() for USB ports that don't have a bus type defined, but have usb_port_device_type as their device type, as well as for USB devices. To fix that replace the struct bus_type pointer in struct acpi_bus_type used for matching devices to specific subsystems with a .match() callback to be used for this purpose and update the users of struct acpi_bus_type, including USB, accordingly. Define the .match() callback routine for USB, usb_acpi_bus_match(), in such a way that it will cover both USB devices and USB ports and remove the now redundant .find_bridge() callback pointer from usb_acpi_bus. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Jeff Garzik <jgarzik@pobox.com>
# 6f4c827e	23-Jan-2013	Aaron Lu <aaron.lu@intel.com>	[libata] scsi: no poll when ODD is powered off When the ODD is powered off, any action the user did to the ODD that would generate a media event will trigger an ACPI interrupt, so the poll for media event is no longer necessary. And the poll will also cause a runtime status change, which will stop the ODD from staying in powered off state, so the poll should better be stopped. But since we don't have access to the gendisk structure in LLDs, here comes the disk_events_disable_depth for scsi device. This field is a hint set by LLDs to convey information to upper layer drivers. A value of 0 means media poll is necessary for the device, while values above 0 means media poll is not needed and should better be skipped. So we can increase its value when we are to power off the ODD in ATA layer and decrease its value when the ODD is powered on, effectively silence the media events poll. Signed-off-by: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
# 3f3299d5	28-Nov-2012	Bart Van Assche <bvanassche@acm.org>	block: Rename queue dead flag QUEUE_FLAG_DEAD is used to indicate that queuing new requests must stop. After this flag has been set queue draining starts. However, during the queue draining phase it is still safe to invoke the queue's request_fn, so QUEUE_FLAG_DYING is a better name for this flag. This patch has been generated by running the following command over the kernel source tree: git grep -lEw 'blk_queue_dead\|QUEUE_FLAG_DEAD' \| xargs sed -i.tmp -e 's/blk_queue_dead/blk_queue_dying/g' \ -e 's/QUEUE_FLAG_DEAD/QUEUE_FLAG_DYING/g'; \ sed -i.tmp -e "s/QUEUE_FLAG_DYING$(printf \\t)*5/QUEUE_FLAG_DYING$(printf \\t)5/g" \ include/linux/blkdev.h; \ sed -i.tmp -e 's/ DEAD/ DYING/g' -e 's/dead queue/a dying queue/' \ -e 's/Dead queue/A dying queue/' block/blk-core.c Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: James Bottomley <JBottomley@Parallels.com> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Jens Axboe <axboe@kernel.dk> Cc: Chanho Min <chanho.min@lge.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 5db44863	17-Sep-2012	Martin K. Petersen <martin.petersen@oracle.com>	[SCSI] sd: Implement support for WRITE SAME Implement support for WRITE SAME(10) and WRITE SAME(16) in the SCSI disk driver. - We set the default maximum to 0xFFFF because there are several devices out there that only support two-byte block counts even with WRITE SAME(16). We only enable transfers bigger than 0xFFFF if the device explicitly reports MAXIMUM WRITE SAME LENGTH in the BLOCK LIMITS VPD. - max_write_same_blocks can be overriden per-device basis in sysfs. - The UNMAP discovery heuristics remain unchanged but the discard limits are tweaked to match the "real" WRITE SAME commands. - In the error handling logic we now distinguish between WRITE SAME with and without UNMAP set. The discovery process heuristics are: - If the device reports a SCSI level of SPC-3 or greater we'll issue READ SUPPORTED OPERATION CODES to find out whether WRITE SAME(16) is supported. If that's the case we will use it. - If the device supports the block limits VPD and reports a MAXIMUM WRITE SAME LENGTH bigger than 0xFFFF we will use WRITE SAME(16). - Otherwise we will use WRITE SAME(10) unless the target LBA is beyond 0xFFFFFFFF or the block count exceeds 0xFFFF. - no_write_same is set for ATA, FireWire and USB. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# 0e58076b	09-Aug-2012	Vikas Chaudhary <vikas.chaudhary@qlogic.com>	[SCSI] scsi_lib: Set the device state from transport-offline to running FC and iSCSI class set SCSI devices to transport-offline state after fast_io_fail/replacement_timeout has fired, but after relogin, function scsi_internal_device_unblock() is not setting scsi device state to running. Due to this the devices even after being relogged in remain offline. Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# 27c41973	31-May-2012	Mike Snitzer <snitzer@redhat.com>	[SCSI] scsi_lib: fix scsi_io_completion's SG_IO error propagation The following v3.4-rc1 commit unmasked an existing bug in scsi_io_completion's SG_IO error handling: 47ac56d [SCSI] scsi_error: classify some ILLEGAL_REQUEST sense as a permanent TARGET_ERROR Given that certain ILLEGAL_REQUEST are now properly categorized as TARGET_ERROR the host_byte is being set (before host_byte wasn't ever set for these ILLEGAL_REQUEST). In scsi_io_completion, initialize req->errors with cmd->result _after_ the SG_IO block that calls __scsi_error_from_host_byte (which may modify the host_byte). Before this fix: cdb to send: 12 01 01 00 00 00 ioctl(3, SG_IO, {'S', SG_DXFER_NONE, cmd[6]=[12, 01, 01, 00, 00, 00], mx_sb_len=32, iovec_count=0, dxfer_len=0, timeout=20000, flags=0, status=02, masked_status=01, sb[19]=[70, 00, 05, 00, 00, 00, 00, 0b, 00, 00, 00, 00, 24, 00, 00, 00, 00, 00, 00], host_status=0x10, driver_status=0x8, resid=0, duration=0, info=0x1}) = 0 SCSI Status: Check Condition Sense Information: sense buffer empty After: cdb to send: 12 01 01 00 00 00 ioctl(3, SG_IO, {'S', SG_DXFER_NONE, cmd[6]=[12, 01, 01, 00, 00, 00], mx_sb_len=32, iovec_count=0, dxfer_len=0, timeout=20000, flags=0, status=02, masked_status=01, sb[19]=[70, 00, 05, 00, 00, 00, 00, 0b, 00, 00, 00, 00, 24, 00, 00, 00, 00, 00, 00], host_status=0, driver_status=0x8, resid=0, duration=0, info=0x1}) = 0 SCSI Status: Check Condition Sense Information: Fixed format, current; Sense key: Illegal Request Additional sense: Invalid field in cdb Raw sense data (in hex): 70 00 05 00 00 00 00 0b 00 00 00 00 24 00 00 00 00 00 00 Reported-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Babu Moger <babu.moger@netapp.com> Cc: stable@vger.kernel.org # 3.4 Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# b485462a	29-Jun-2012	Bart Van Assche <bvanassche@acm.org>	[SCSI] Stop accepting SCSI requests before removing a device Avoid that the code for requeueing SCSI requests triggers a crash by making sure that that code isn't scheduled anymore after a device has been removed. Also, source code inspection of __scsi_remove_device() revealed a race condition in this function: no new SCSI requests must be accepted for a SCSI device after device removal started. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# 84feb166	29-Jun-2012	Bart Van Assche <bvanassche@acm.org>	[SCSI] Change return type of scsi_queue_insert() into void The return value of scsi_queue_insert() is ignored by all its callers, hence change the return type of this function into void. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Tejun Heo <tj@kernel.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# 940f5d47	29-Jun-2012	Bart Van Assche <bvanassche@acm.org>	[SCSI] Avoid dangling pointer in scsi_requeue_command() When we call scsi_unprep_request() the command associated with the request gets destroyed and therefore drops its reference on the device. If this was the only reference, the device may get released and we end up with a NULL pointer deref when we call blk_requeue_request. Reported-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: <stable@kernel.org> [jejb: enhance commend and add commit log for stable] Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# 67bd9413	29-Jun-2012	Bart Van Assche <bvanassche@acm.org>	[SCSI] Fix device removal NULL pointer dereference Use blk_queue_dead() to test whether the queue is dead instead of !sdev. Since scsi_prep_fn() may be invoked concurrently with __scsi_remove_device(), keep the queuedata (sdev) pointer in __scsi_remove_device(). This patch fixes a kernel oops that can be triggered by USB device removal. See also http://www.spinics.net/lists/linux-scsi/msg56254.html. Other changes included in this patch: - Swap the blk_cleanup_queue() and kfree() calls in scsi_host_dev_release() to make that code easier to grasp. - Remove the queue dead check from scsi_run_queue() since the queue state can change anyway at any point in that function where the queue lock is not held. - Remove the queue dead check from the start of scsi_request_fn() since it is redundant with the scsi_device_online() check. Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: <stable@kernel.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# d075498c	17-May-2012	Mike Christie <michaelc@cs.wisc.edu>	[SCSI] remove old comment from block/unblock functions We do not hold the host lock when calling these functions, so remove comment. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# 5d9fb5cc	17-May-2012	Mike Christie <michaelc@cs.wisc.edu>	[SCSI] core, classes, mpt2sas: have scsi_internal_device_unblock take new state This has scsi_internal_device_unblock/scsi_target_unblock take the new state to set the devices as an argument instead of always setting to running. The patch also converts users of these functions. This allows the FC and iSCSI class to transition devices from blocked to transport-offline, so that when fast_io_fail/replacement_timeout has fired we do not set the devices back to running. Instead, we set them to SDEV_TRANSPORT_OFFLINE. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# 1b8d2620	17-May-2012	Mike Christie <michaelc@cs.wisc.edu>	[SCSI] add new SDEV_TRANSPORT_OFFLINE state This patch adds a new state SDEV_TRANSPORT_OFFLINE. It will be used by transport classes to offline devices for cases like when the fast_io_fail/recovery_tmo fires. In those cases we want all IO to fail, and we have not yet escalated to dev_loss_tmo behavior where we are removing the devices. Currently to handle this state, transport classes are setting the scsi_device's state to running, setting their internal session/port structs state to something that indicates failed, and then failing IO from some transport check in the queuecommand. The reason for the new value is so that users can distinguish between a device failure that is a result of a transport problem vs the wide range of errors that devices get offlined for when a scsi command times out and we offline the devices there. It also fixes the confusion as to why the transport class is failing IO, but has set the device state from blocked to running. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# de50ada5	25-Jun-2012	Holger Macht <holger@homac.de>	[SCSI] add wrapper to access and set scsi_bus_type in struct acpi_bus_type For being able to bind ata devices against acpi devices, scsi_bus_type needs to be set as bus in struct acpi_bus_type. So add wrapper to scsi_lib to accomplish that. Signed-off-by: Holger Macht <holger@homac.de> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
# b7e94a16	22-May-2012	Jun'ichi Nomura <j-nomura@ce.jp.nec.com>	[SCSI] Fix dm-multipath starvation when scsi host is busy block congestion control doesn't have any concept of fairness across multiple queues. This means that if SCSI reports the host as busy in the queue congestion control it can result in an unfair starvation situation in dm-mp if there are multiple multipath devices on the same host. For example: http://www.redhat.com/archives/dm-devel/2012-May/msg00123.html The fix for this is to report only the sdev busy state (and ignore the host busy state) in the block congestion control call back. The host is still congested, but the SCSI subsystem will sort out the congestion in a fair way because it knows the relation between the queues and the host. [jejb: fixed up trailing whitespace] Reported-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Tested-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: <stable@vger.kernel.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# a7a20d10	22-Mar-2012	Dan Williams <dan.j.williams@intel.com>	[SCSI] sd: limit the scope of the async probe domain sd injects and synchronizes probe work on the global kernel-wide domain. This runs into conflict with PM that wants to perform resume actions in async context: [ 494.237079] INFO: task kworker/u:3:554 blocked for more than 120 seconds. [ 494.294396] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 494.360809] kworker/u:3 D 0000000000000000 0 554 2 0x00000000 [ 494.420739] ffff88012e4d3af0 0000000000000046 ffff88013200c160 ffff88012e4d3fd8 [ 494.484392] ffff88012e4d3fd8 0000000000012500 ffff8801394ea0b0 ffff88013200c160 [ 494.548038] ffff88012e4d3ae0 00000000000001e3 ffffffff81a249e0 ffff8801321c5398 [ 494.611685] Call Trace: [ 494.632649] [<ffffffff8149dd25>] schedule+0x5a/0x5c [ 494.674687] [<ffffffff8104b968>] async_synchronize_cookie_domain+0xb6/0x112 [ 494.734177] [<ffffffff810461ff>] ? __init_waitqueue_head+0x50/0x50 [ 494.787134] [<ffffffff8131a224>] ? scsi_remove_target+0x48/0x48 [ 494.837900] [<ffffffff8104b9d9>] async_synchronize_cookie+0x15/0x17 [ 494.891567] [<ffffffff8104ba49>] async_synchronize_full+0x54/0x70 <-- here we wait for async contexts to complete [ 494.943783] [<ffffffff8104b9f5>] ? async_synchronize_full_domain+0x1a/0x1a [ 495.002547] [<ffffffffa00114b1>] sd_remove+0x2c/0xa2 [sd_mod] [ 495.051861] [<ffffffff812fe94f>] __device_release_driver+0x86/0xcf [ 495.104807] [<ffffffff812fe9bd>] device_release_driver+0x25/0x32 <-- here we take device_lock() [ 853.511341] INFO: task kworker/u:4:549 blocked for more than 120 seconds. [ 853.568693] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 853.635119] kworker/u:4 D ffff88013097b5d0 0 549 2 0x00000000 [ 853.695129] ffff880132773c40 0000000000000046 ffff880130790000 ffff880132773fd8 [ 853.758990] ffff880132773fd8 0000000000012500 ffff88013288a0b0 ffff880130790000 [ 853.822796] 0000000000000246 0000000000000040 ffff88013097b5c8 ffff880130790000 [ 853.886633] Call Trace: [ 853.907631] [<ffffffff8149dd25>] schedule+0x5a/0x5c [ 853.949670] [<ffffffff8149cc44>] __mutex_lock_common+0x220/0x351 [ 854.001225] [<ffffffff81304bd7>] ? device_resume+0x58/0x1c4 [ 854.049082] [<ffffffff81304bd7>] ? device_resume+0x58/0x1c4 [ 854.097011] [<ffffffff8149ce48>] mutex_lock_nested+0x2f/0x36 <-- here we wait for device_lock() [ 854.145591] [<ffffffff81304bd7>] device_resume+0x58/0x1c4 [ 854.192066] [<ffffffff81304d61>] async_resume+0x1e/0x45 [ 854.237019] [<ffffffff8104bc93>] async_run_entry_fn+0xc6/0x173 <-- ...while running in async context Provide a 'scsi_sd_probe_domain' so that async probe actions actions can be flushed without regard for the state of PM, and allow for the resume path to handle devices that have transitioned from SDEV_QUIESCE to SDEV_DEL prior to resume. Acked-by: Alan Stern <stern@rowland.harvard.edu> [alan: uplevel scsi_sd_probe_domain, clarify scsi_device_resume] Signed-off-by: Dan Williams <dan.j.williams@intel.com> [jejb: remove unneeded config guards in include file] Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# 6f381fa3	11-Apr-2012	Lin Ming <ming.m.lin@intel.com>	[SCSI] scsi_lib: use correct DMA device in __scsi_alloc_queue Currently, __scsi_alloc_queue uses SCSI host's parent device as DMA device to set segment boundary. But the parent device may not refer to the DMA device. For example, for ATA disk, SCSI host's parent device now refers to ATA port. Since commit d139b9b([SCSI] scsi_lib_dma: fix bug with dma maps on nested scsi objects), a new field Scsi_Host->dma_dev was introduced to refer to the real DMA device. Use ->dma_dev in __scsi_alloc_queue to correctly set segment boundary. Bug report: http://marc.info/?l=linux-ide&m=133177818318187&w=2 Reported-and-tested-by: Jörg Sommer <joerg@alea.gnuu.de> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# 77dfce07	25-Nov-2011	Cong Wang <amwang@redhat.com>	scsi: remove the second argument of k[un]map_atomic() Signed-off-by: Cong Wang <amwang@redhat.com>
# 66a651aa	13-Feb-2012	Martin K. Petersen <martin.petersen@oracle.com>	[SCSI] Ensure discard failure gets treated as a target problem The error reported up the stack for a discard failure did not clearly indicate that the command was processed and subsequently failed by the target device. Return -EREMOTEIO so multipathing does not classify this condition as a path failure. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# 2082ebc4	24-Jan-2012	Moger, Babu <Babu.Moger@netapp.com>	[SCSI] fix the new host byte settings (DID_TARGET_FAILURE and DID_NEXUS_FAILURE) This patch fixes the host byte settings DID_TARGET_FAILURE and DID_NEXUS_FAILURE. The function __scsi_error_from_host_byte, tries to reset the host byte to DID_OK. But that does not happen because of the OR operation. Here is the flow. scsi_softirq_done-> scsi_decide_disposition -> __scsi_error_from_host_byte Let's take an example with DID_NEXUS_FAILURE. In scsi_decide_disposition, result will be set as DID_NEXUS_FAILURE (=0x11). Then in __scsi_error_from_host_byte, when we do OR with DID_OK. Purpose is to reset it back to DID_OK. But that does not happen. This patch fixes this issue. Signed-off-by: Babu Moger <babu.moger@netapp.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# 466c08c7	09-Jan-2012	Shaohua Li <shaohua.li@intel.com>	[SCSI] don't change sdev starvation list order without request dispatched The sdev is deleted from starved list and then try to dispatch from this device. It's quite possible the sdev can't eventually dispatch a request, then the sdev will be in starved list tail. This isn't fair. There are two cases here: 1. unplug path. scsi_request_fn() calls to scsi_target_queue_ready(), then the dev is removed from starved list, but quite possible host queue isn't ready, the dev is moved to starved list without dispatching any request. 2. scsi_run_queue path. It deletes the dev from starved list first (both global and local starved lists), then handles the dev. Then we could have the same process like case 1. This patch fixes the first case. Case 2 isn't fixed, because there is a rare case scsi_run_queue finds host isn't busy but scsi_request_fn finds host is busy (other CPU is faster to get host queue depth). Not deleting the dev from starved list in scsi_run_queue will keep scsi_run_queue looping (though this is very rare case, because host will become busy). Fortunately fixing case 1 already gives big improvement for starvation in my test. In a 12 disk JBOD setup, running file creation under EXT4, this gives 12% more throughput. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# 74571813	09-Nov-2011	Hannes Reinecke <hare@suse.de>	[SCSI] Silencing 'killing requests for dead queue' When we tear down a device we try to flush all outstanding commands in scsi_free_queue(). However the check in scsi_request_fn() is imperfect as it only signals that we _might start_ aborting commands, not that we've actually aborted some. So move the printk inside the scsi_kill_request function, this will also give us a hint about which commands are aborted. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# 09703660	27-May-2011	Paul Gortmaker <paul.gortmaker@windriver.com>	scsi: Add export.h for EXPORT_SYMBOL/THIS_MODULE as required For the basic SCSI infrastructure files that are exporting symbols but not modules themselves, add in the basic export.h header file to allow the exports. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
# 3308511c	23-Sep-2011	Bart Van Assche <bvanassche@acm.org>	[SCSI] Make scsi_free_queue() kill pending SCSI commands Make sure that SCSI device removal via scsi_remove_host() does finish all pending SCSI commands. Currently that's not the case and hence removal of a SCSI host during I/O can cause a deadlock. See also "blkdev_issue_discard() hangs forever if underlying storage device is removed" (http://bugzilla.kernel.org/show_bug.cgi?id=40472). See also http://lkml.org/lkml/2011/8/27/6. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: <stable@kernel.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# 573e5913	05-Jul-2011	James Smart <james.smart@emulex.com>	[SCSI] scsi_lib: pause between error retries During cable pull tests on our 16G FC adapter, we are seeing errors, typically reads to close targets, which fail due to CRC or framing errors caused by the cable being pull (return status DID_ERROR). The adapter detects the error on one of the first frames received, marks the FC exchange as dead (further frames go to bit bucket) and signals the host of the error. This action is so quick, and coupled with fast host CPUs, creates a scenario in which the midlayer sees the failure and retries the io almost immediately. We've seen link traces with the retry on the link while the original i/o is still being processed by the target. We're also seeing the time window for the "link to pull-apart" and the physical interface to report disconnected to be in the few millisecond range. Which means, we're encountering scenarios where the full retry count is exhausted (all with error) by the midlayer before the link disconnect state is detected. We looked at 8G FC behavior and occasionally see the same behavior, but as the link was slower, it rarely could exhaust all retries before the link reported disconnect. What is needed is a slight delay between io retries due to DID_ERROR to cover this error. It is inappropriate to put this delay in the driver, as the error is indistinguishable from other link-related errors, nor does the driver track whether the io is a retry or not. This is also easier than tracking between-io-error bursts that are seen in this scenario. The patch below updates the retry path so that it inserts a delay as if the target was busy. The busy delay is on the order of 6ms. This delay is sufficient to ensure the link down condition is reported before the retry count is exhausted (at most 1 retry is seen). Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com> Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# bfe159a5	07-Jul-2011	James Bottomley <James.Bottomley@HansenPartnership.com>	[SCSI] fix crash in scsi_dispatch_cmd() USB surprise removal of sr is triggering an oops in scsi_dispatch_command(). What seems to be happening is that USB is hanging on to a queue reference until the last close of the upper device, so the crash is caused by surprise remove of a mounted CD followed by attempted unmount. The problem is that USB doesn't issue its final commands as part of the SCSI teardown path, but on last close when the block queue is long gone. The long term fix is probably to make sr do the teardown in the same way as sd (so remove all the lower bits on ejection, but keep the upper disk alive until last close of user space). However, the current oops can be simply fixed by not allowing any commands to be sent to a dead queue. Cc: stable@kernel.org Signed-off-by: James Bottomley <JBottomley@Parallels.com>
# 9937a5e2	17-May-2011	Jens Axboe <jaxboe@fusionio.com>	scsi: remove performance regression due to async queue run Commit c21e6beb removed our queue request_fn re-enter protection, and defaulted to always running the queues from kblockd to be safe. This was a known potential slow down, but should be safe. Unfortunately this is causing big performance regressions for some, so we need to improve this logic. Looking into the details of the re-enter, the real issue is on requeue of requests. Requeue of requests upon seeing a BUSY condition from the device ends up re-running the queue, causing traces like this: scsi_request_fn() scsi_dispatch_cmd() scsi_queue_insert() __scsi_queue_insert() scsi_run_queue() scsi_request_fn() ... potentially causing the issue we want to avoid. So special case the requeue re-run of the queue, but improve it to offload the entire run of local queue and starved queue from a single workqueue callback. This is a lot better than potentially kicking off a workqueue run for each device seen. This also fixes the issue of the local device going into recursion, since the above mentioned commit never moved that queue run out of line. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
# c055f5b2	01-May-2011	James Bottomley <James.Bottomley@suse.de>	[SCSI] fix oops in scsi_run_queue() The recent commit closing the race window in device teardown: commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b Author: James Bottomley <James.Bottomley@suse.de> Date: Fri Apr 22 10:39:59 2011 -0500 [SCSI] put stricter guards on queue dead checks is causing a potential NULL deref in scsi_run_queue() because the q->queuedata may already be NULL by the time this function is called. Since we shouldn't be running a queue that is being torn down, simply add a NULL check in scsi_run_queue() to forestall this. Tested-by: Jim Schutt <jaschut@sandia.gov> Cc: stable@kernel.org Signed-off-by: James Bottomley <James.Bottomley@suse.de>
# c21e6beb	19-Apr-2011	Jens Axboe <jaxboe@fusionio.com>	block: get rid of QUEUE_FLAG_REENTER We are currently using this flag to check whether it's safe to call into ->request_fn(). If it is set, we punt to kblockd. But we get a lot of false positives and excessive punts to kblockd, which hurts performance. The only real abuser of this infrastructure is SCSI. So export the async queue run and convert SCSI over to use that. There's room for improvement in that SCSI need not always use the async call, but this fixes our performance issue and they can fix that up in due time. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
# 24ecfbe2	18-Apr-2011	Christoph Hellwig <hch@infradead.org>	block: add blk_run_queue_async Instead of overloading __blk_run_queue to force an offload to kblockd add a new blk_run_queue_async helper to do it explicitly. I've kept the blk_queue_stopped check for now, but I suspect it's not needed as the check we do when the workqueue items runs should be enough. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
# c98a0eb0	08-Mar-2011	Martin K. Petersen <martin.petersen@oracle.com>	[SCSI] sd: Logical Block Provisioning update SBC3r26 contains many changes to the Logical Block Provisioning interfaces (formerly known as Thin Provisioning ditto). This patch implements support for both the old and new schemes using the same heuristic as before (whether the LBP VPD page is present). The new code also allows the provisioning mode (i.e. choice of command) to be overridden on a per-device basis via sysfs. Two additional modes are supported in this version: - WRITE SAME(10) with the UNMAP bit set - WRITE SAME(10) without the UNMAP bit set. This allows us to support devices that predate the TP/LBP enhancements in SBC3 and which work by way zero-detection Switching between modes has been consolidated in a helper function that also updates the block layer topology according to the limitations of the chosen command. I experimented with trying WRITE SAME(16) if UNMAP fails, WRITE SAME(10) if WRITE SAME(16) fails, etc. but found several devices that got cranky. So for now we'll disable discard if one of the commands fail. The user still has the option of selecting a different mode in sysfs. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
# 72f7d322	08-Mar-2011	Martin K. Petersen <martin.petersen@oracle.com>	[SCSI] Include protection operation in SCSI command trace When debugging DIF/DIX it is very helpful to be able to see which DIX operation is associated with the scsi_cmnd. Include the protection op in the SCSI command trace. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
# a488e749	16-Apr-2010	Jens Axboe <jaxboe@fusionio.com>	scsi: convert to blk_delay_queue() It was always abuse to reuse the plugging infrastructure for this, convert it to the (new) real API for delaying queueing a bit. A default delay of 3 msec is defined, to match the previous behaviour. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
# 1654e741	02-Mar-2011	Tejun Heo <tj@kernel.org>	block: add @force_kblockd to __blk_run_queue() __blk_run_queue() automatically either calls q->request_fn() directly or schedules kblockd depending on whether the function is recursed. blk-flush implementation needs to be able to explicitly choose kblockd. Add @force_kblockd. All the current users are converted to specify %false for the parameter and this patch doesn't introduce any behavior change. stable: This is prerequisite for fixing ide oops caused by the new blk-flush implementation. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jan Beulich <JBeulich@novell.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
# 63583cca	18-Jan-2011	Hannes Reinecke <hare@suse.de>	[SCSI] Add detailed SCSI I/O errors Instead of just passing 'EIO' for any I/O error we should be notifying the upper layers with more details about the cause of this error. Update the possible I/O errors to: - ENOLINK: Link failure between host and target - EIO: Retryable I/O error - EREMOTEIO: Non-retryable I/O error - EBADE: I/O error restricted to the I_T_L nexus 'Retryable' in this context means that an I/O error _might_ be restricted to the I_T_L nexus (vulgo: path), so retrying on another nexus / path might succeed. 'Non-retryable' in general refers to a target failure, so this error will always be generated regardless of the I_T_L nexus it was send on. I/O errors restricted to the I_T_L nexus might be retried on another nexus / path, but they should _not_ be queued if no paths are available. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
# fd01a663	16-Dec-2010	Hillf Danton <dhillf@gmail.com>	[SCSI] fix the return value of scsi_target_queue_read() It seems that zero should be returned if scsi_target_is_busy(starget) is true, no matter if sdev is on the starved list. Signed-off-by: Hillf Danton <dhillf@gmail.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
# e692cb66	01-Dec-2010	Martin K. Petersen <martin.petersen@oracle.com>	block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead When stacking devices, a request_queue is not always available. This forced us to have a no_cluster flag in the queue_limits that could be used as a carrier until the request_queue had been set up for a metadevice. There were several problems with that approach. First of all it was up to the stacking device to remember to set queue flag after stacking had completed. Also, the queue flag and the queue limits had to be kept in sync at all times. We got that wrong, which could lead to us issuing commands that went beyond the max scatterlist limit set by the driver. The proper fix is to avoid having two flags for tracking the same thing. We deprecate QUEUE_FLAG_CLUSTER and use the queue limit directly in the block layer merging functions. The queue_limit 'no_cluster' is turned into 'cluster' to avoid double negatives and to ease stacking. Clustering defaults to being enabled as before. The queue flag logic is removed from the stacking function, and explicitly setting the cluster flag is no longer necessary in DM and MD. Reported-by: Ed Lin <ed.lin@promise.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
# 9f8a2c23	08-Dec-2010	Tejun Heo <tj@kernel.org>	scsi: replace sr_test_unit_ready() with scsi_test_unit_ready() The usage of TUR has been confusing involving several different commits updating different parts over time. Currently, the only differences between scsi_test_unit_ready() and sr_test_unit_ready() are, * scsi_test_unit_ready() also sets sdev->changed on NOT_READY. * scsi_test_unit_ready() returns 0 if TUR ended with UNIT_ATTENTION or NOT_READY. Due to the above two differences, sr is using its own sr_test_unit_ready(), but sd - the sole user of the above extra handling - doesn't even need them. Where scsi_test_unit_ready() is used in sd_media_changed(), the code is looking for device ready w/ media present state which is true iff TUR succeeds w/o sense data or UA, and when the device is not ready for whatever reason sd_media_changed() explicitly marks media as missing so there's no reason to set sdev->changed automatically from scsi_test_unit_ready() on NOT_READY. Drop both special handlings from scsi_test_unit_ready(), which makes it equivalant to sr_test_unit_ready(), and replace sr_test_unit_ready() with scsi_test_unit_ready(). Also, drop the unnecessary explicit NOT_READY check from sd_media_changed(). Checking return value is enough for testing device readiness. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
# 459dbf72	17-Nov-2010	James Bottomley <James.Bottomley@suse.de>	[SCSI] Eliminate error handler overload of the SCSI serial number The error handler is using the test cmd->serial_number == 0 in the abort routines to signal that the command to be aborted has already completed normally. This design was to close a race window in the original error handler where a command could go through the normal completion routines after it timed out but before error handling was started. Mike Anderson pointed out that when we converted our timeout and softirq completions, we picked up atomicity here because the block layer now mediates this with the REQ_ATOM_COMPLETE flag and guarantees that either the command times out or our done routine is called, but ensures we can't get both occurring. That makes the serial number zero check redundant and it can be removed. Signed-off-by: James Bottomley <James.Bottomley@suse.de>
# 986fe6c7	06-Oct-2010	Mike Christie <michaelc@cs.wisc.edu>	[SCSI] Fix regressions in scsi_internal_device_block Deleting a SCSI device on a blocked fc_remote_port (before fast_io_fail_tmo fires) results in a hanging thread: STACK: 0 schedule+1108 [0x5cac48] 1 schedule_timeout+528 [0x5cb7fc] 2 wait_for_common+266 [0x5ca6be] 3 blk_execute_rq+160 [0x354054] 4 scsi_execute+324 [0x3b7ef4] 5 scsi_execute_req+162 [0x3b80ca] 6 sd_sync_cache+138 [0x3cf662] 7 sd_shutdown+138 [0x3cf91a] 8 sd_remove+112 [0x3cfe4c] 9 __device_release_driver+124 [0x3a08b8] 10 device_release_driver+60 [0x3a0a5c] 11 bus_remove_device+266 [0x39fa76] 12 device_del+340 [0x39d818] 13 __scsi_remove_device+204 [0x3bcc48] 14 scsi_remove_device+66 [0x3bcc8e] 15 sysfs_schedule_callback_work+50 [0x260d66] 16 worker_thread+622 [0x162326] 17 kthread+160 [0x1680b0] 18 kernel_thread_starter+6 [0x10aaea] During the delete, the SCSI device is in moved to SDEV_CANCEL. When the FC transport class later calls scsi_target_unblock, this has no effect, since scsi_internal_device_unblock ignores SCSI devics in this state. It looks like all these are regressions caused by: 5c10e63c943b4c67561ddc6bf61e01d4141f881f [SCSI] limit state transitions in scsi_internal_device_unblock Fix by rejecting offline and cancel in the state transition. Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> [jejb: Original patch by Christof Schmitt, modified by Mike Christie] Cc: Stable Tree <stable@kernel.org> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
# 13f05c8d	10-Sep-2010	Martin K. Petersen <martin.petersen@oracle.com>	block/scsi: Provide a limit on the number of integrity segments Some controllers have a hardware limit on the number of protection information scatter-gather list segments they can handle. Introduce a max_integrity_segments limit in the block layer and provide a new scsi_host_template setting that allows HBA drivers to provide a value suitable for the hardware. Add support for honoring the integrity segment limit when merging both bios and requests. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>
# 3a5c19c2	16-Aug-2010	James Bottomley <James.Bottomley@suse.de>	[SCSI] fix use-after-free in scsi_init_io() we're using a pointer through a freed command to reset the request, which has shown up as an oops with slab poisoning: Reported-by: Tejun Heo <tj@kernel.org> Reported-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
# d6e9fb46	10-Aug-2010	Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	scsi: remove superfluous NULL pointer check from scsi_kill_request() Dan's list included: drivers/scsi/scsi_lib.c +1365 scsi_kill_request(9) warning: variable derefenced in initializer 'cmd' drivers/scsi/scsi_lib.c +1365 scsi_kill_request(9) warning: variable derefenced before check 'cmd' We dereference cmd (and possible OOPS if cmd == NULL) before starting the request so just remove the superfluous debugging code altogether. [ bart: the potential NULL pointer dereference was finally fixed in (much later than mine) commit 03b1470 but my patch is still valid ] Reported-by: Dan Carpenter <error27@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Eugene Teo <eteo@redhat.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# 610a6349	08-Jul-2010	FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>	scsi: fix discard page leak We leak a page allocated for discard on some error conditions (e.g. scsi_prep_state_check returns BLKPREP_DEFER in scsi_setup_blk_pc_cmnd). We unprep on requests that weren't prepped in the error path of scsi_init_io. It makes the error path to clean up scsi commands messy. Let's strictly apply the rule that we can't unprep on a request that wasn't prepped. Calling just scsi_put_command() in the error path of scsi_init_io() is enough. We don't set REQ_DONTPREP yet. scsi_setup_discard_cmnd can safely free a page on the error case with the above rule. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
# 28018c24	01-Jul-2010	James Bottomley <James.Bottomley@suse.de>	block: implement an unprep function corresponding directly to prep Reviewed-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
# 33659ebb	07-Aug-2010	Christoph Hellwig <hch@lst.de>	block: remove wrappers for request type/flags Remove all the trivial wrappers for the cmd_type and cmd_flags fields in struct requests. This allows much easier grepping for different request types instead of unwinding through macros. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
# 8a78362c	25-Feb-2010	Martin K. Petersen <martin.petersen@oracle.com>	block: Consolidate phys_segment and hw_segment limits Except for SCSI no device drivers distinguish between physical and hardware segment limits. Consolidate the two into a single segment limit. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# 086fa5ff	25-Feb-2010	Martin K. Petersen <martin.petersen@oracle.com>	block: Rename blk_queue_max_sectors to blk_queue_max_hw_sectors The block layer calling convention is blk_queue_<limit name>. blk_queue_max_sectors predates this practice, leading to some confusion. Rename the function to appropriately reflect that its intended use is to set max_hw_sectors. Also introduce a temporary wrapper for backwards compability. This can be removed after the merge window is closed. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# e7efe593	03-Jan-2010	Douglas Gilbert <dgilbert@interlog.com>	[SCSI] skip sense logging for some ATA PASS-THROUGH cdbs Further to the lsml thread titled: "does scsi_io_completion need to dump sense data for ata pass through (ck_cond = 1) ?" This is a patch to skip logging when the sense data is associated with a SENSE_KEY of "RECOVERED_ERROR" and the additional sense code is "ATA PASS-THROUGH INFORMATION AVAILABLE". This only occurs with the SAT ATA PASS-THROUGH commands when CK_COND=1 (in the cdb). It indicates that the sense data contains ATA registers. Smartmontools uses such commands on ATA disks connected via SAT. Periodic checks such as those done by smartd cause nuisance entries into logs that are: - neither errors nor warnings - pointless unless the cdb that caused them are also logged Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
# 63c43b0e	15-Dec-2009	Boaz Harrosh <bharrosh@panasas.com>	[SCSI] scsi_lib: Fix bug in completion of bidi commands Because of the terrible structuring of scsi-bidi-commands it breaks some of the life time rules of a scsi-command. It is now not allowed to free up the block-request before cleanup and partial deallocation of the scsi-command. (Which is not so for none bidi commands) The right fix to this problem would be to make bidi command a first citizen by allocating a scsi_sdb pointer at scsi command just like cmd->prot_sdb. The bidi sdb should be allocated/deallocated as part of the get/put_command (Again like the prot_sdb) and the current decoupling of scsi_cmnd and blk-request should be kept. For now make sure scsi_release_buffers() is called before the call to blk_end_request_all() which might cause the suicide of the block requests. At best the leak of bidi buffers, at worse a crash, as there is a race between the existence of the bidi_request and the free of the associated bidi_sdb. The reason this was never hit before is because only OSD has the potential of doing asynchronous bidi commands. (So does bsg but it is never used) And OSD clients just happen to do all their bidi commands synchronously, up until recently. CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
# d8705f11	25-Nov-2009	Martin K. Petersen <martin.petersen@oracle.com>	[SCSI] Correctly handle thin provisioning write error A thin provisioned device may temporarily be out of sufficient allocation units to fulfill a write request. In that case it will return a space allocation in progress error. Wait a bit and retry the write. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
# 03b14708	23-Sep-2009	Jiri Slaby <jirislaby@kernel.org>	[SCSI] scsi_lib: fix potential NULL dereference Stanse found a potential NULL dereference in scsi_kill_request. Instead of triggering BUG() in 'if (unlikely(cmd == NULL))' branch, the kernel will Oops earlier on cmd dereference. Move the dereferences after the if. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
# ad630826	28-Sep-2009	Mike Christie <michaelc@cs.wisc.edu>	[SCSI] fix propogation of integrity errors When the Integrity check is done in scsi_io_completion it will set error to -EILSEQ. However, at this point error is no longer used, and blk_end_request_err has -EIO hardcoded. It looks like there was just porting mistake with this patch http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3e695f89c5debb735e4ff051e9e58d8fb4e95110 and we meant to send error upwards, so this patch changes the hard coded EIO to the error variable. I have only boot tested this patch. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
# da6c5c72	11-Sep-2009	Tejun Heo <tj@kernel.org>	scsi,block: update SCSI to handle mixed merge failures Update scsi_io_completion() such that it only fails requests till the next error boundary and retry the leftover. This enables block layer to merge requests with different failfast settings and still behave correctly on errors. Allow merge of requests of different failfast settings. As SCSI is currently the only subsystem which follows failfast status, there's no need to worry about other block drivers for now. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Niel Lambrechts <niel.lambrechts@gmail.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# 002b1eb2	23-May-2009	Martin K. Petersen <martin.petersen@oracle.com>	[SCSI] Print failed commands When a request fails we print the sense data but not the actual command that failed. Add a printout of the operation + CDB for failed commands. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
# b391277a	18-Jun-2009	Hannes Reinecke <hare@suse.de>	sd, sr: fix Driver 'sd' needs updating message If a SCSI ULD driver sets blk_queue_prep_rq(), it should clean it up itself on remove(), and not from the bus callbacks. This removes the need to hook into bus->remove(), which should not be used at the same time as driver->remove(). [jejb: fix sdkp initialisation problem due to mismerge] Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 5c10e63c	28-Apr-2009	Takahiro Yasui <tyasui@redhat.com>	[SCSI] limit state transitions in scsi_internal_device_unblock scsi timeout on two or more devices may cause extremely long execution time for user applications because SDEV_OFFLINE state is changed to SDEV_RUNNING state during scsi error recovery procedures triggered by a bus reset or a host reset of scsi LLD, and scsi timeout can happens on the same devices many times. This happens because scsi_internal_device_unblock() changes device's state to SDEV_RUNNING even if a device in other states than SDEV_BLOCK, while the following two transitions are required in this function. SDEV_BLOCK -> SDEV_RUNNING SDEV_CREATED_BLOCK -> SDEV_CREATED Otherwise, it returns -EINVAL. Signed-off-by: Takahiro Yasui <tyasui@redhat.com> [matthew@wil.cx: supplied rewritten base for patch] Signed-off-by: Matthew Wilcox <matthew@wil.cx> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# ac36552a	19-May-2009	Boaz Harrosh <bharrosh@panasas.com>	scsi_lib: remove unused variable The last request completion cleanup in scsi_lib left an unused this_count variable in scsi_io_completion(). (It was used before in a code segment that now uses blk_end_request_all()) Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# e458824f	12-May-2009	Tejun Heo <tj@kernel.org>	scsi: fix resid_len mis-conversion in scsi_end_request() Commit c3a4d78c580de4edc9ef0f7c59812fb02ceb037f introduced rq->data_len and converted residual count users to it. While converting, it mistakenly converted scsi_end_request() to finish requests with residual count when it wants to do is fully complete the request. Fix it by using blk_end_request_all() instead. This bug was spotted by Boaz Harrosh. Signed-off-by: Tejun Heo <tj@kernel.org> Spotted-by: Boaz Harrosh <bharrosh@panasas.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# e6bb7a96	11-May-2009	FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>	scsi: simplify the bidi completion Let's use blk_end_request_all() instead of blk_end_bidi_request(). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# 9934c8c0	07-May-2009	Tejun Heo <tj@kernel.org>	block: implement and enforce request peek/start/fetch Till now block layer allowed two separate modes of request execution. A request is always acquired from the request queue via elv_next_request(). After that, drivers are free to either dequeue it or process it without dequeueing. Dequeue allows elv_next_request() to return the next request so that multiple requests can be in flight. Executing requests without dequeueing has its merits mostly in allowing drivers for simpler devices which can't do sg to deal with segments only without considering request boundary. However, the benefit this brings is dubious and declining while the cost of the API ambiguity is increasing. Segment based drivers are usually for very old or limited devices and as converting to dequeueing model isn't difficult, it doesn't justify the API overhead it puts on block layer and its more modern users. Previous patches converted all block low level drivers to dequeueing model. This patch completes the API transition by... * renaming elv_next_request() to blk_peek_request() * renaming blkdev_dequeue_request() to blk_start_request() * adding blk_fetch_request() which is combination of peek and start * disallowing completion of queued (not started) requests * applying new API to all LLDs Renamings are for consistency and to break out of tree code so that it's apparent that out of tree drivers need updating. [ Impact: block request issue API cleanup, no functional change ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Mike Miller <mike.miller@hp.com> Cc: unsik Kim <donari75@gmail.com> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Tim Waugh <tim@cyberelk.net> Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Cc: David S. Miller <davem@davemloft.net> Cc: Laurent Vivier <Laurent@lvivier.info> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Adrian McMenamin <adrian@mcmen.demon.co.uk> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Borislav Petkov <petkovbb@googlemail.com> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Alex Dubov <oakad@yahoo.com> Cc: Pierre Ossman <drzeus@drzeus.cx> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Markus Lidel <Markus.Lidel@shadowconnect.com> Cc: Stefan Weinhuber <wein@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Pete Zaitcev <zaitcev@redhat.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# 1011c1b9	07-May-2009	Tejun Heo <tj@kernel.org>	block: blk_rq_[cur_]_{sectors\|bytes}() usage cleanup With the previous changes, the followings are now guaranteed for all requests in any valid state. * blk_rq_sectors() == blk_rq_bytes() >> 9 * blk_rq_cur_sectors() == blk_rq_cur_bytes() >> 9 Clean up accessor usages. Notable changes are * nbd,i2o_block: end_all used instead of explicit byte count * scsi_lib: unnecessary conditional on request type removed [ Impact: cleanup ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Pete Zaitcev <zaitcev@redhat.com> Cc: Alex Dubov <oakad@yahoo.com> Cc: Markus Lidel <Markus.Lidel@shadowconnect.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# b0790410	07-May-2009	Tejun Heo <tj@kernel.org>	block: cleanup rq->data_len usages With recent unification of fields, it's now guaranteed that rq->data_len always equals blk_rq_bytes(). Convert all non-IDE direct users to accessors. IDE will be converted in a separate patch. Boaz: spotted incorrect data_len/resid_len conversion in osd. [ Impact: convert direct rq->data_len usages to blk_rq_bytes() ] Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Pete Zaitcev <zaitcev@redhat.com> Cc: Eric Moore <Eric.Moore@lsi.com> Cc: Markus Lidel <Markus.Lidel@shadowconnect.com> Cc: Darrick J. Wong <djwong@us.ibm.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Eric Moore <Eric.Moore@lsi.com> Cc: Boaz Harrosh <bharrosh@panasas.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# 83096ebf	07-May-2009	Tejun Heo <tj@kernel.org>	block: convert to pos and nr_sectors accessors With recent cleanups, there is no place where low level driver directly manipulates request fields. This means that the 'hard' request fields always equal the !hard fields. Convert all rq->sectors, nr_sectors and current_nr_sectors references to accessors. While at it, drop superflous blk_rq_pos() < 0 test in swim.c. [ Impact: use pos and nr_sectors accessors ] Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Tested-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Grant Likely <grant.likely@secretlab.ca> Tested-by: Adrian McMenamin <adrian@mcmen.demon.co.uk> Acked-by: Adrian McMenamin <adrian@mcmen.demon.co.uk> Acked-by: Mike Miller <mike.miller@hp.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Borislav Petkov <petkovbb@googlemail.com> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Eric Moore <Eric.Moore@lsi.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Pete Zaitcev <zaitcev@redhat.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Tim Waugh <tim@cyberelk.net> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Alex Dubov <oakad@yahoo.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Dario Ballabio <ballabio_dario@emc.com> Cc: David S. Miller <davem@davemloft.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: unsik Kim <donari75@gmail.com> Cc: Laurent Vivier <Laurent@lvivier.info> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# 5b93629b	07-May-2009	Tejun Heo <tj@kernel.org>	block: implement blk_rq_pos/[cur_]sectors() and convert obvious ones Implement accessors - blk_rq_pos(), blk_rq_sectors() and blk_rq_cur_sectors() which return rq->hard_sector, rq->hard_nr_sectors and rq->hard_cur_sectors respectively and convert direct references of the said fields to the accessors. This is in preparation of request data length handling cleanup. Geert : suggested adding const to struct request * parameter to accessors Sergei : spotted error in patch description [ Impact: cleanup ] Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Acked-by: Stephen Rothwell <sfr@canb.auug.org.au> Tested-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Grant Likely <grant.likely@secretlab.ca> Ackec-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Borislav Petkov <petkovbb@googlemail.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# c3a4d78c	07-May-2009	Tejun Heo <tj@kernel.org>	block: add rq->resid_len rq->data_len served two purposes - the length of data buffer on issue and the residual count on completion. This duality creates some headaches. First of all, block layer and low level drivers can't really determine what rq->data_len contains while a request is executing. It could be the total request length or it coulde be anything else one of the lower layers is using to keep track of residual count. This complicates things because blk_rq_bytes() and thus [__]blk_end_request_all() relies on rq->data_len for PC commands. Drivers which want to report residual count should first cache the total request length, update rq->data_len and then complete the request with the cached data length. Secondly, it makes requests default to reporting full residual count, ie. reporting that no data transfer occurred. The residual count is an exception not the norm; however, the driver should clear rq->data_len to zero to signify the normal cases while leaving it alone means no data transfer occurred at all. This reverse default behavior complicates code unnecessarily and renders block PC on some drivers (ide-tape/floppy) unuseable. This patch adds rq->resid_len which is used only for residual count. While at it, remove now unnecessasry blk_rq_bytes() caching in ide_pc_intr() as rq->data_len is not changed anymore. Boaz : spotted missing conversion in osd Sergei : spotted too early conversion to blk_rq_bytes() in ide-tape [ Impact: cleanup residual count handling, report 0 resid by default ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Borislav Petkov <petkovbb@googlemail.com> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Mike Miller <mike.miller@hp.com> Cc: Eric Moore <Eric.Moore@lsi.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Doug Gilbert <dgilbert@interlog.com> Cc: Mike Miller <mike.miller@hp.com> Cc: Eric Moore <Eric.Moore@lsi.com> Cc: Darrick J. Wong <djwong@us.ibm.com> Cc: Pete Zaitcev <zaitcev@redhat.com> Cc: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# 731ec497	22-Apr-2009	Tejun Heo <tj@kernel.org>	block: kill rq->data Now that all block request data transfer is done via bio, rq->data isn't used. Kill it. While at it, make the roles of rq->special and buffer clear. [ Impact: drop now unncessary field from struct request ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Boaz Harrosh <bharrosh@panasas.com>
# 40cbbb78	22-Apr-2009	Tejun Heo <tj@kernel.org>	block: implement and use [__]blk_end_request_all() There are many [__]blk_end_request() call sites which call it with full request length and expect full completion. Many of them ensure that the request actually completes by doing BUG_ON() the return value, which is awkward and error-prone. This patch adds [__]blk_end_request_all() which takes @rq and @error and fully completes the request. BUG_ON() is added to to ensure that this actually happens. Most conversions are simple but there are a few noteworthy ones. * cdrom/viocd: viocd_end_request() replaced with direct calls to __blk_end_request_all(). * s390/block/dasd: dasd_end_request() replaced with direct calls to __blk_end_request_all(). * s390/char/tape_block: tapeblock_end_request() replaced with direct calls to blk_end_request_all(). [ Impact: cleanup ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Mike Miller <mike.miller@hp.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Alex Dubov <oakad@yahoo.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
# b4efdd58	09-Apr-2009	Mike Christie <michaelc@cs.wisc.edu>	[SCSI] fix q->lock not held warning when target is busy We cannot call blk_plug_device from scsi_target_queue_ready because the q lock is not held. And we do not need to call it from there because when we return 0, the scsi_request_fn not_ready handling will plug the queue for us if needed. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# a9bddd74	30-Mar-2009	James Bottomley <James.Bottomley@HansenPartnership.com>	[SCSI] fix recovered error handling We have a problem with recovered error handling in that any command which goes down as BLOCK_PC but which returns a sense code of RECOVERED ERROR gets completed with -EIO. For actual SG_IO commands, this doesn't matter at all, since the error return code gets dropped in favour of req->errors which contain the SCSI completion code. However, if this command is part of the block system, then it will pay attention to the returned error code. In particularly if a SYNCHRONIZE CACHE from a barrier command completes with RECOVERED ERROR, the resulting -EIO on the barrier causes block to error the request and return it to the filesystem. Fix this by converting the -EIO for recovered error to zero, plus remove the printing of this from sd and sr so the message isn't double printed. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# f078727b	13-Dec-2008	FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>	[SCSI] remove scsi_req_map_sg No one uses scsi_execute_async with data transfer now. We can remove scsi_req_map_sg. Only scsi_eh_lock_door uses scsi_execute_async. scsi_eh_lock_door doesn't handle sense and the callback. So we can remove scsi_io_context too. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 126c0982	19-Feb-2009	James Bottomley <James.Bottomley@HansenPartnership.com>	[SCSI] fix ABORTED_COMMAND looping forever problem Instead of terminating after five retries, commands terminated by ABORTED_COMMAND sense are retrying forever. The problem was introduced by: commit b60af5b0adf0da24c673598c8d3fb4d4189a15ce Author: Alan Stern <stern@rowland.harvard.edu> Date: Mon Nov 3 15:56:47 2008 -0500 [SCSI] simplify scsi_io_completion() Which introduced an error whereby ABORTED_COMMAND now gets erroneously retried in scsi_io_completion. Fix this by returning the behaviour back to the default no retry. Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com> Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 79ed2429	06-Jan-2009	James Bottomley <James.Bottomley@HansenPartnership.com>	[SCSI] scsi_lib: fix DID_RESET status problems Andrew Vaszquez said: > There's a problem that is causing commands returned by the LLD with > a DID_RESET status to be reissued with cleared cmd->sdb data which > in our tests are manifesting in firmware detected overruns. Here's > a snippet of a READ_10 scsi_cmnd upon completion by the storage The problem is caused by: commit b60af5b0adf0da24c673598c8d3fb4d4189a15ce Author: Alan Stern <stern@rowland.harvard.edu> Date: Mon Nov 3 15:56:47 2008 -0500 [SCSI] simplify scsi_io_completion() Because scsi_release_buffers() is called before commands that go through the ACTION_RETRY and ACTION_DELAYED_RETRY legs are requeued. However, they're not re-prepared, so nothing ever reallocates the buffer resources to them. Fix this by releasing the buffers only if we're not going to go down these legs (but scsi_release_buffers() on all legs including two in scsi_end_request(); this latter needs a special version __scsi_release_buffers() because the final one can be called after the request has been freed, so the bidi test in scsi_release_buffers(), which touches the request has to be skipped). Reported-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 3e695f89	04-Jan-2009	Martin K. Petersen <martin.petersen@oracle.com>	[SCSI] Fix error handling for DIF/DIX patch commit b60af5b0adf0da24c673598c8d3fb4d4189a15ce Author: Alan Stern <stern@rowland.harvard.edu> Date: Mon Nov 3 15:56:47 2008 -0500 [SCSI] simplify scsi_io_completion() broke DIX error handling. Also, we are now using EILSEQ to indicate integrity errors to the upper layers (as opposed to regular EIO failures). This allows filesystems to inspect buffers and decide whether to retry the I/O. Update scsi_io_completion() accordingly. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 4f5299ac	02-Jan-2009	James Bottomley <James.Bottomley@HansenPartnership.com>	[SCSI] scsi_lib: don't decrement busy counters when inserting commands A bug was introduced by commit b60af5b0adf0da24c673598c8d3fb4d4189a15ce Author: Alan Stern <stern@rowland.harvard.edu> Date: Mon Nov 3 15:56:47 2008 -0500 [SCSI] simplify scsi_io_completion() because the simplification uses scsi_queue_insert(). The problem with this function is that it expects to be called from the completion path while the command is still outstanding, so it decrements the device and host busy counts to do the requeue. The problem is that scsi_io_completion() is a path executed well after these counts have already been decremented, leading to a double decrement if the command goes down any error path leading to ACTION_DELAYED_RETRY. The fix is to allow a private function __scsi_queue_insert() with a flag to say whether the busy counters should be decremented. This is made static to scsi_lib.c to discourage other use. Reported-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 3dbf6a54	15-Dec-2008	Alan Stern <stern@rowland.harvard.edu>	[SCSI] Fix uninitialized variable error in scsi_io_completion This patch (as1191) adds a missing "default" case in scsi_io_completion(), thereby fixing an "uninitialized variable" error. It also adds a missing newline to a log entry. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# f4f4e47e	03-Dec-2008	FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>	[SCSI] add residual argument to scsi_execute and scsi_execute_req scsi_execute() and scsi_execute_req() discard the residual length information. Some callers need it. This adds residual argument (optional) to scsi_execute and scsi_execute_req. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# b60af5b0	03-Nov-2008	Alan Stern <stern@rowland.harvard.edu>	[SCSI] simplify scsi_io_completion() This patch (as1142b) consolidates a lot of repetitious code in scsi_io_completion(). It also fixes a few comments. Most importantly, however, it clearly distinguishes among the three sorts of retries that can be done when a command fails to complete: Unprepare the request and resubmit it, so that a new command will be created for it. Requeue the request directly so that it will be retried immediately using the same command. Requeue the request so that it will be retried following a short delay. Complete the remainder of the request with an I/O error. [jejb: Updates 1. For several error conditions, we would now print the sense twice in slightly different ways, so unify the location of sense printing. 2. I added more descriptions to actual failure conditions for better debugging 3. according to spec, ABORTED_COMMAND is supposed to be retried (except on DIF failure). Our old behaviour of erroring it looks to be a bug. 4. I'd prefer not to default initialise the action variable because that ensures that every leg of the error handler has an associated action and the compiler will warn if someone later accidentally misses one or removes one. ] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 02bd3499	12-Dec-2008	James Bottomley <James.Bottomley@HansenPartnership.com>	[SCSI] scsi_lib: only call scsi_unprep_request() under queue lock It's called under that lock everywhere else and it does alter the request state, so it should be. This one occurance in scsi_requeue_command() could open a window where req->special is set to NULL while the requests is going through either timeout or completion processing leading to NULL pointer derefs of the sort complained of in bugzillas 12020 and 12195. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 2a3a59e5	11-Nov-2008	Mike Christie <michaelc@cs.wisc.edu>	[SCSI] Fix hang in starved list processing Close possible infinite loop with interrupts off when devices are added back to the starved list. Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=11898 Reported-by: <alex.shi@intel.com> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 6c5121b7	04-Oct-2008	Kiyoshi Ueda <k-ueda@ct.jp.nec.com>	[SCSI] export busy state via q->lld_busy_fn() This patch implements q->lld_busy_fn() for scsi mid layer to export its busy state for request stacking drivers. For efficiency, no lock is taken to check the busy state of shost/starget/sdev, since the returned value is not guaranteed and may be changed after request stacking drivers call the function, regardless of taking lock or not. When scsi can't dispatch I/Os anymore and needs to kill I/Os (e.g. !sdev), scsi needs to return 'not busy'. Otherwise, request stacking drivers may hold requests forever. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 9d112517	04-Oct-2008	Kiyoshi Ueda <k-ueda@ct.jp.nec.com>	[SCSI] refactor sdev/starget/shost busy checking This patch refactors the busy checking codes of scsi_device, Scsi_Host and scsi_target. There should be no functional change. This is a preparation for another patch which exports scsi's busy state to the block layer for request stacking drivers. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 32c356d7	19-Aug-2008	James Bottomley <James.Bottomley@HansenPartnership.com>	[SCSI] fix removable device inability to detect disk changes On Tue, 12 Aug 2008 15:08:14 +0200 Giuliano Pochini <pochini@shiny.it> wrote: > Fujitsu magneto-optical drive, Adaptec 29160 and > Linux Jay 2.6.26 #7 SMP Sun Aug 10 18:34:22 CEST 2008 ppc 7455, altivec supported PowerMac3,6 GNU/Linux > > When I insert a disk and I mount it, scsi_test_unit_ready() is called and > the do-while loop gets sshdr->sense_key == UNIT_ATTENTION in the first > cycle and 0 in the second one. So the if below misses the UNIT_ATTENTION > and sdev->changed = 1 is not executed. At this point bad things can > happen... I'm not sure how to fix this. Any clue ? The problem is essentially caused by us eating UNIT_ATTENTION conditions in scsi_test_unit_ready(). Fix by updating the ->changed flag when this happens if the media is removable. [pochini@shiny.it: updates to tidy up patch] Signed-off-by: Giuliano Pochini <pochini@shiny.it> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 4a27446f	19-Aug-2008	Mike Christie <michaelc@cs.wisc.edu>	[SCSI] modify scsi to handle new fail fast flags. This checks the errors the scsi-ml determined were retryable and returns if we should fast fail it based on the request fail fast flags. Without the patch, drivers like lpfc, qla2xxx and fcoe would return DID_ERROR for what it determines is a temporary communication problem. There is no loss of connectivity at that time and the driver thinks that it would be fast to retry at the driver level. SCSI-ml will however sees fast fail on the request and DID_ERROR and will fast fail the io. This will then cause dm-multipath to fail the path and possibley switch target controllers when we should be retrying at the scsi layer. We also were fast failing device errors to dm multiapth when unless the scsi_dh modules think otherwis we want to retry at the scsi layer because multipath can only retry the IO like scsi should have done. multipath is a little dumber though because it does not what the error was for and assumes that it should fail the paths. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# f0c0a376	17-Aug-2008	Mike Christie <michaelc@cs.wisc.edu>	[SCSI] Add helper code so transport classes/driver can control queueing (v3) SCSI-ml manages the queueing limits for the device and host, but does not do so at the target level. However something something similar can come in userful when a driver is transitioning a transport object to the the blocked state, becuase at that time we do not want to queue io and we do not want the queuecommand to be called again. The patch adds code similar to the exisiting SCSI_ML_*BUSY handlers. You can now return SCSI_MLQUEUE_TARGET_BUSY when we hit a transport level queueing issue like the hw cannot allocate some resource at the iscsi session/connection level, or the target has temporarily closed or shrunk the queueing window, or if we are transitioning to the blocked state. bnx2i, when they rework their firmware according to netdev developers requests, will also need to be able to limit queueing at this level. bnx2i will hook into libiscsi, but will allocate a scsi host per netdevice/hba, so unlike pure software iscsi/iser which is allocating a host per session, it cannot set the scsi_host->can_queue and return SCSI_MLQUEUE_HOST_BUSY to reflect queueing limits on the transport. The iscsi class/driver can also set a scsi_target->can_queue value which reflects the max commands the driver/class can support. For iscsi this reflects the number of commands we can support for each session due to session/connection hw limits, driver limits, and to also reflect the session/targets's queueing window. Changes: v1 - initial patch. v2 - Fix scsi_run_queue handling of multiple blocked targets. Previously we would break from the main loop if a device was added back on the starved list. We now run over the list and check if any target is blocked. v3 - Rediff for scsi-misc. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 242f9dcb	14-Sep-2008	Jens Axboe <jens.axboe@oracle.com>	block: unify request timeout handling Right now SCSI and others do their own command timeout handling. Move those bits to the block layer. Instead of having a timer per command, we try to be a bit more clever and simply have one per-queue. This avoids the overhead of having to tear down and setup a timer for each command, so it will result in a lot less timer fiddling. Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# 6f4267e3	22-Aug-2008	James Bottomley <James.Bottomley@HansenPartnership.com>	[SCSI] Update the SCSI state model to allow blocking in the created state Brian King <brking@linux.vnet.ibm.com> reported that fibre channel devices can oops during scanning if their ports block (because the device goes from CREATED -> BLOCK -> RUNNING rather than CREATED -> BLOCK -> CREATED). Fix this by adding a new state: CREATED_BLOCK which can only transition back to CREATED and disallow the CREATED -> BLOCK transition. Now both the created and blocked states that the mid-layer recognises can include CREATED_BLOCK. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 44ea91c5	19-Sep-2008	James Bottomley <James.Bottomley@HansenPartnership.com>	[SCSI] Fix hang with split requests Sometimes, particularly for USB devices with the last sector bug, requests get completed in chunks. There's a bug in this in that if one of the chunks gets an error, we complete that chunk with an error but never move on to the remaining ones, leading to the request hanging (because it's not fully completed). Fix this by completing all remaining chunks if an error is encountered. Cc: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# cadbd4a5	04-Jul-2008	Harvey Harrison <harvey.harrison@gmail.com>	[SCSI] replace __FUNCTION__ with __func__ [jejb: fixed up a ton of missed conversions. All of you are on notice this has happened, driver trees will now need to be rebased] Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: SCSI List <linux-scsi@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 6bd522f6	22-Jul-2008	Mike Christie <michaelc@cs.wisc.edu>	[SCSI] scsi_lib: use blk_rq_tagged in scsi_request_fn I goofed and did not see the macro for checking if a request is tagged. This patch has us use blk_rq_tagged instead of digging into the req->tag. Patch was made over scsi-misc. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 511e44f4	17-Jul-2008	Martin K. Petersen <martin.petersen@oracle.com>	[SCSI] Do not retry a request whose data integrity check failed If initiator or target reject the I/O due to DIF errors there is no point in retrying. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 7027ad72	17-Jul-2008	Martin K. Petersen <martin.petersen@oracle.com>	[SCSI] Support devices with protection information Implement support for DMA of protection information for devices that are data integrity capable. - Add support for mapping an extra scatter-gather list containing the protection information. - Allocate protection scsi_data_buffer if host is DIX (integrity DMA) capable. - Accessor function for checking whether a device has protection enabled. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# ecefe8a9	11-Jul-2008	Mike Christie <michaelc@cs.wisc.edu>	[SCSI] fix shared tag map tag allocation When drivers use a shared tag map we can end up with more requests than tags, because the tag map is shost->can_queue tags and there can be sdevs * sdev->queue_depth requests. In scsi_request_fn if tag allocation fails we just drop down to just dequeueing the tag without a tag. The problem is that drivers using the shared tag map rely on a valid tag always being set, because it will use the tag number to lookup commands later. This patch has us check if we got a valid tag when the host lock is held right before we check if the host queue is ready. We do the check here because to allocate the tag we need the q lock, but if the tag is bad we want to add the device/q onto the starved list which requires the host lock. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 2476b4d0	03-Jul-2008	James Bottomley <James.Bottomley@HansenPartnership.com>	[SCSI] fix locking in host use of blk_plug_device() scsi_lib.c:scsi_host_queue_ready() plugs the device with incorrect locking. It should actually have the queue lock held, but it's holding the host lock. Fix this by eliminating the call. The host ready has no need to plug the queue because if it returns 0 in scsi_request_function control transfers to not_ready which acquires the queue lock and plugs the device if its at zero depth. Reported-by: Elias Oltmanns <eo@nebensachen.de> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 6362abd3	05-Jun-2008	Martin K. Petersen <martin.petersen@oracle.com>	[SCSI] Rename scsi_bidi_sdb_cache The data integrity changes need to dynamically allocate scsi_data_buffers too. Rename scsi_bidi_sdb_cache for clarity. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# bdb2b8ca	24-Jun-2008	Alan Stern <stern@rowland.harvard.edu>	[SCSI] erase invalid data returned by device This patch (as1108) fixes a problem that can occur with certain USB mass-storage devices: They return invalid data together with a residue indicating that the data should be ignored. Rather than leave the invalid data in a transfer buffer, where it can get misinterpreted, the patch clears the invalid portion of the buffer. This solves a problem (wrong write-protect setting detected) reported by Maciej Rutecki and Peter Teoh. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Peter Teoh <htmldeveloper@gmail.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# a6a8d9f8	01-May-2008	Chandra Seetharaman <sekharan@us.ibm.com>	[SCSI] scsi_dh: add infrastructure for SCSI Device Handlers Some of the storage devices (that can be accessed through multiple paths), do need some special handling for 1. Activating the passive path of the storage access. 2. Decode and handle the special sense codes returned by the devices. 3. Handle the I/Os being sent to the passive path, especially during the device probe time. when accessed through multiple paths. As of today this special device handling is done at the dm-multipath layer using dm-handlers. That works well for (1); for (2) to be handled at dm layer, scsi sense information need to be exported from SCSI to dm-layer, which is not very attractive; (3) cannot be done at all at the dm layer. Device handler has been moved to SCSI mainly to handle (2) and (3) properly. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# db4742dd	30-Apr-2008	Boaz Harrosh <bharrosh@panasas.com>	[SCSI] add support for variable length extended commands Add support for variable-length, extended, and vendor specific CDBs to scsi-ml. It is now possible for initiators and ULD's to issue these types of commands. LLDs need not change much. All they need is to raise the .max_cmd_len to the longest command they support (see iscsi patch). - clean-up some code paths that did not expect commands to be larger than 16, and change cmd_len members' type to short as char is not enough. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 64a87b24	30-Apr-2008	Boaz Harrosh <bharrosh@panasas.com>	[SCSI] Let scsi_cmnd->cmnd use request->cmd buffer - struct scsi_cmnd had a 16 bytes command buffer of its own. This is an unnecessary duplication and copy of request's cmd. It is probably left overs from the time that scsi_cmnd could function without a request attached. So clean that up. - Once above is done, few places, apart from scsi-ml, needed adjustments due to changing the data type of scsi_cmnd->cmnd. - Lots of drivers still use MAX_COMMAND_SIZE. So I have left that #define but equate it to BLK_MAX_CDB. The way I see it and is reflected in the patch below is. MAX_COMMAND_SIZE - means: The longest fixed-length () SCSI CDB as per the SCSI standard and is not related to the implementation. BLK_MAX_CDB. - The allocated space at the request level - I have audit all ISA drivers and made sure none use ->cmnd in a DMA Operation. Same audit was done by Andi Kleen. ()fixed-length here means commands that their size can be determined by their opcode and the CDB does not carry a length specifier, (unlike the VARIABLE_LENGTH_CMD(0x7f) command). This is actually not exactly true and the SCSI standard also defines extended commands and vendor specific commands that can be bigger than 16 bytes. The kernel will support these using the same infrastructure used for VARLEN CDB's. So in effect MAX_COMMAND_SIZE means the maximum size command scsi-ml supports without specifying a cmd_len by ULD's Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 75ad23bc	29-Apr-2008	Nick Piggin <npiggin@suse.de>	block: make queue flags non-atomic We can save some atomic ops in the IO path, if we clearly define the rules of how to modify the queue flags. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# fa8e36c3	02-Apr-2008	James Bottomley <James.Bottomley@HansenPartnership.com>	[SCSI] fix barrier failure issue Currently, if the barrier command fails, the error return isn't seen by the block layer and it proceeds on regardless. The problem is that SCSI always returns no error for REQ_TYPE_BLOCK_PC ... it expects the submitter to pick the errors out of req->errors, which the block barrier functions don't do. Since it appears that the way SG_IO and scsi_execute_request() work they discard the block error return and always use req->errors, the best fix for this is to have the SCSI layer return an error to block if one actually occurred (this also allows us to filter out spurious errors, like deferred sense). This patch is a bug fix that will need backporting to stable, but it's also quite a big change and in need of testing, so we'll incubate in the main kernel tree and backport at the -rc2 or so stage if no problems turn up. Acked-by: Jens Axboe <jens.axboe@oracle.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 8c5e03d3	30-Mar-2008	Adrian Bunk <bunk@kernel.org>	[SCSI] make scsi_end_bidi_request() static This patch makes the needlessly global scsi_end_bidi_request() static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 4d1566ed	19-Mar-2008	Kay Sievers <kay.sievers@vrfy.org>	[SCSI] fix media change events for polled devices Commit: a341cd0f (SCSI: add asynchronous event notification API) breaks: 285e9670 (sr,sd: send media state change modification events) by introducing an event filter, which is removed here, to make events, we are depending on, happen again. Fix this by removing the event filter. It's pretty much broken at the moment, since a user can't set it (the attribute being read only). A proper fix will be to make the event discriminator distinguish between AN and Polled media change events. Cc: David Zeuthen <david@fubar.dk> Cc: kristen accardi <kaccardi@gmail.com> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 6b00769f	19-Feb-2008	Tejun Heo <htejun@gmail.com>	block: add request->raw_data_len With padding and draining moved into it, block layer now may extend requests as directed by queue parameters, so now a request has two sizes - the original request size and the extended size which matches the size of area pointed to by bios and later by sgs. The latter size is what lower layers are primarily interested in when allocating, filling up DMA tables and setting up the controller. Both padding and draining extend the data area to accomodate controller characteristics. As any controller which speaks SCSI can handle underflows, feeding larger data area is safe. So, this patch makes the primary data length field, request->data_len, indicate the size of full data area and add a separate length field, request->raw_data_len, for the unmodified request size. The latter is used to report to higher layer (userland) and where the original request size should be fed to the controller or device. Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# 4d2de3a5	05-Feb-2008	Tony Battersby <tonyb@cybernetics.com>	[SCSI] fix BUG when sum(scatterlist) > bufflen When sending a SCSI command to a tape drive via the SCSI Generic (sg) driver, if the command has a data transfer length more than scatter_elem_sz (32 KB default) and not a multiple of 512, then I either hit BUG_ON(!valid_dma_direction(direction)) in dma_unmap_sg() or else the command never completes (depending on the LLDD). When constructing scatterlists, the sg driver rounds up the scatterlist element sizes to be a multiple of 512. This can result in sum(scatterlist lengths) > bufflen. In this case, scsi_req_map_sg() incorrectly sets bio->bi_size to sum(scatterlist lengths) rather than to bufflen. When the command completes, req_bio_endio() detects that bio->bi_size != 0, and so it doesn't call bio_endio(). This causes the command to be resubmitted, resulting in BUG_ON or the command never completing. This patch makes scsi_req_map_sg() set bio->bi_size to bufflen rather than to sum(scatterlist lengths), which fixes the problem. Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Acked-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 99c84dbd	04-Feb-2008	FUJITA Tomonori <tomof@acm.org>	iommu sg merging: call dma_set_seg_boundary in __scsi_alloc_queue() This is a one-line patch to add the following to __scsi_alloc_queue(): dma_set_seg_boundary(dev, shost->dma_boundary); This is the simplest approach but the result looks odd, __scsi_alloc_queue() does: blk_queue_segment_boundary(q, shost->dma_boundary); dma_set_seg_boundary(dev, shost->dma_boundary); blk_queue_max_segment_size(q, dma_get_max_seg_size(dev)); I think that it would be better to set up segment boundary in the same way as we did for the maximum segment size. That is, removing shost->dma_boundary and LLDs call pci_set_dma_seg_boundary (or its friends). Then __scsi_alloc_queue() can set up both limits in the same way: blk_queue_segment_boundary(q, dma_get_seg_boundary(dev)); blk_queue_max_segment_size(q, dma_get_max_seg_size(dev)); killing dma_boundary in scsi_host_template needs a large patch for libata (dma_boundary is used by only libata and sym53c8xx). I'll send a patch to do that if it is acceptable. James and Jeff? Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Greg KH <greg@kroah.com> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# 860ac568	04-Feb-2008	FUJITA Tomonori <tomof@acm.org>	iommu sg merging: call blk_queue_segment_boundary in __scsi_alloc_queue request_queue and device struct must have the same value of a segment size limit. This patch adds blk_queue_segment_boundary in __scsi_alloc_queue so LLDs don't need to call both blk_queue_segment_boundary and set_dma_max_seg_size. A LLD can change the default value (64KB) can call device_dma_parameters accessors like pci_set_dma_max_seg_size when allocating scsi_host. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: Jeff Garzik <jeff@garzik.org> Cc: James Bottomley <James.Bottomley@steeleye.com> Acked-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# 3d9dd6ee	25-Jan-2008	FUJITA Tomonori <tomof@acm.org>	[SCSI] handle scsi_init_queue failure properly scsi_init_queue is expected to clean up allocated things when it fails. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# b172b6e9	25-Jan-2008	FUJITA Tomonori <tomof@acm.org>	[SCSI] destroy scsi_bidi_sdb_cache in scsi_exit_queue Needs to call kmem_cache_destroy for scsi_bidi_sdb_cache in scsi_exit_queue. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# d3f46f39	15-Jan-2008	James Bottomley <James.Bottomley@HansenPartnership.com>	[SCSI] remove use_sg_chaining With the sg table code, every SCSI driver is now either chain capable or broken (or has sg_tablesize set so chaining is never activated), so there's no need to have a check in the host template. Also tidy up the code by moving the scatterlist size defines into the SCSI includes and permit the last entry of the scatterlist pools not to be a power of two. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# b8de1631	17-Jan-2008	Kiyoshi Ueda <k-ueda@ct.jp.nec.com>	[SCSI] bidirectional: fix up for the new blk_end_request code Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 6f9a35e2	13-Dec-2007	Boaz Harrosh <bharrosh@panasas.com>	[SCSI] bidirectional command support At the block level bidi request uses req->next_rq pointer for a second bidi_read request. At Scsi-midlayer a second scsi_data_buffer structure is used for the bidi_read part. This bidi scsi_data_buffer is put on request->next_rq->special. Struct scsi_cmnd is not changed. - Define scsi_bidi_cmnd() to return true if it is a bidi request and a second sgtable was allocated. - Define scsi_in()/scsi_out() to return the in or out scsi_data_buffer from this command This API is to isolate users from the mechanics of bidi. - Define scsi_end_bidi_request() to do what scsi_end_request() does but for a bidi request. This is necessary because bidi commands are a bit tricky here. (See comments in body) - scsi_release_buffers() will also release the bidi_read scsi_data_buffer - scsi_io_completion() on bidi commands will now call scsi_end_bidi_request() and return. - The previous work done in scsi_init_io() is now done in a new scsi_init_sgtable() (which is 99% identical to old scsi_init_io()) The new scsi_init_io() will call the above twice if needed also for the bidi_read command. Only at this point is a command bidi. - In scsi_error.c at scsi_eh_prep/restore_cmnd() make sure bidi-lld is not confused by a get-sense command that looks like bidi. This is done by puting NULL at request->next_rq, and restoring. [jejb: update to sg_table and resolve conflicts also update to blk-end-request and resolve conflicts] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 30b0c37b	13-Dec-2007	Boaz Harrosh <bharrosh@panasas.com>	[SCSI] implement scsi_data_buffer In preparation for bidi we abstract all IO members of scsi_cmnd, that will need to duplicate, into a substructure. - Group all IO members of scsi_cmnd into a scsi_data_buffer structure. - Adjust accessors to new members. - scsi_{alloc,free}_sgtable receive a scsi_data_buffer instead of scsi_cmnd. And work on it. - Adjust scsi_init_io() and scsi_release_buffers() for above change. - Fix other parts of scsi_lib/scsi.c to members migration. Use accessors where appropriate. - fix Documentation about scsi_cmnd in scsi_host.h - scsi_error.c * Changed needed members of struct scsi_eh_save. * Careful considerations in scsi_eh_prep/restore_cmnd. - sd.c and sr.c * sd and sr would adjust IO size to align on device's block size so code needs to change once we move to scsi_data_buff implementation. * Convert code to use scsi_for_each_sg * Use data accessors where appropriate. - tgt: convert libsrp to use scsi_data_buffer - isd200: This driver still bangs on scsi_cmnd IO members, so need changing [jejb: rebased on top of sg_table patches fixed up conflicts and used the synergy to eliminate use_sg and sg_count] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# bb52d82f	13-Dec-2007	Boaz Harrosh <bharrosh@panasas.com>	[SCSI] tgt: use scsi_init_io instead of scsi_alloc_sgtable If we export scsi_init_io()/scsi_release_buffers() instead of scsi_{alloc,free}_sgtable() from scsi_lib than tgt code is much more insulated from scsi_lib changes. As a bonus it will also gain bidi capability when it comes. [jejb: rebase on to sg_table and fix up rejections] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 7cedb1f1	13-Jan-2008	James Bottomley <James.Bottomley@HansenPartnership.com>	SG: work with the SCSI fixed maximum allocations. SCSI sg table allocation has a maximum size (of SCSI_MAX_SG_SEGMENTS, currently 128) and this will cause a BUG_ON() in SCSI if something tries an allocation over it. This patch adds a size limit to the chaining allocator to allow the specification of the maximum allocation size for chaining, so we always chain in units of the maximum SCSI allocation size. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# 610d8b0c	11-Dec-2007	Kiyoshi Ueda <k-ueda@ct.jp.nec.com>	blk_end_request: changing scsi (take 4) This patch converts scsi mid-layer to use blk_end_request interfaces. Related 'uptodate' arguments are converted to 'error'. As a result, the interface of internal function, scsi_end_request(), is changed. Cc: James Bottomley <James.Bottomley@SteelEye.com> Cc: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# 5ed7959e	15-Nov-2007	Jens Axboe <jens.axboe@oracle.com>	SG: Convert SCSI to use scatterlist helpers for sg chaining Also change scsi_alloc_sgtable() to just return 0/failure, since it maps to the command passed in. ->request_buffer is now no longer needed, once drivers are adapted to use scsi_sglist() it can be killed. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# b80ca4f7	12-Jan-2008	FUJITA Tomonori <tomof@acm.org>	[SCSI] replace sizeof sense_buffer with SCSI_SENSE_BUFFERSIZE This replaces sizeof sense_buffer with SCSI_SENSE_BUFFERSIZE in several LLDs. It's a preparation for the future changes to remove sense_buffer array in scsi_cmnd structure. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 465ff318	01-Jan-2008	James Bottomley <James.Bottomley@HansenPartnership.com>	[SCSI] relax scsi dma alignment This patch relaxes the default SCSI DMA alignment from 512 bytes to 4 bytes. I remember from previous discussions that usb and firewire have sector size alignment requirements, so I upped their alignments in the respective slave allocs. The reason for doing this is so that we don't get such a huge amount of copy overhead in bio_copy_user() for udev. (basically all inquiries it issues can now be directly mapped). Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 001aac25	02-Dec-2007	James Bottomley <James.Bottomley@SteelEye.com>	[SCSI] sd,sr: add early detection of medium not present The current scsi_test_unit_ready() is updated to return sense code information (in struct scsi_sense_hdr). The sd and sr drivers are changed to interpret the sense code return asc 0x3a as no media and adjust the device status accordingly. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 4a03d90e	18-Nov-2007	Rusty Russell <rusty@rustcorp.com.au>	[SCSI] BUG_ON() impossible condition in sg list counting If blk_rq_map_sg wrote more than was allocated in the scatterlist, BUG_ON() is probably the right thing to do. [jejb: rejections fixed up] Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 25d7c363	12-Nov-2007	Tony Battersby <tonyb@cybernetics.com>	[SCSI] move single_lun flag from scsi_device to scsi_target Some SCSI tape medium changers that need the BLIST_SINGLELUN flag have the medium changer at one LUN and the tape drive at a different LUN. The inquiry string of the tape drive may be different from that of the medium changer. In order for single_lun to be effective, every scsi_device under a given scsi_target must have it set. This means that there needs to be a blacklist entry for BOTH the medium changer AND the tape drive, which is impractical because some medium changers may be paired with a variety of different tape drive models. It makes more sense to put the single_lun flag in scsi_target instead of scsi_device, which causes every device at a given target ID to inherit the single_lun flag from one LUN. This makes it possible to blacklist just the medium changer and not the tape drive. Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# eb44820c	03-Nov-2007	Rob Landley <rob@landley.net>	[SCSI] Add Documentation and integrate into docbook build Add Documentation/DocBook/scsi_midlayer.tmpl, add to Makefile, and update lots of kerneldoc comments in drivers/scsi/*. Updated with comments from Stefan Richter, Stephen M. Cameron, James Bottomley and Randy Dunlap. Signed-off-by: Rob Landley <rob@landley.net> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# 7b3d9545	06-Jan-2008	Linus Torvalds <torvalds@woody.linux-foundation.org>	Revert "scsi: revert "[SCSI] Get rid of scsi_cmnd->done"" This reverts commit ac40532ef0b8649e6f7f83859ea0de1c4ed08a19, which gets us back the original cleanup of 6f5391c283d7fdcf24bf40786ea79061919d1e1d. It turns out that the bug that was triggered by that commit was apparently not actually triggered by that commit at all, and just the testing conditions had changed enough to make it appear to be due to it. The real problem seems to have been found by Peter Osterlund: "pktcdvd sets it [block device size] when opening the /dev/pktcdvd device, but when the drive is later opened as /dev/scd0, there is nothing that sets it back. (Btw, 40944 is possible if the disk is a CDRW that was formatted with "cdrwtool -m 10236".) The problem is that pktcdvd opens the cd device in non-blocking mode when pktsetup is run, and doesn't close it again until pktsetup -d is run. The effect is that if you meanwhile open the cd device, blkdev.c:do_open() doesn't call bd_set_size() because bdev->bd_openers is non-zero." In particular, to repeat the bug (regardless of whether commit 6f5391c283d7fdcf24bf40786ea79061919d1e1d is applied or not): " 1. Start with an empty drive. 2. pktsetup 0 /dev/scd0 3. Insert a CD containing an isofs filesystem. 4. mount /dev/pktcdvd/0 /mnt/tmp 5. umount /mnt/tmp 6. Press the eject button. 7. Insert a DVD containing a non-writable filesystem. 8. mount /dev/scd0 /mnt/tmp 9. find /mnt/tmp -type f -print0 \| xargs -0 sha1sum >/dev/null 10. If the DVD contains data beyond the physical size of a CD, you get I/O errors in the terminal, and dmesg reports lots of "attempt to access beyond end of device" errors." which in turn is because the nested open after the media change won't cause the size to be set properly (because the original open still holds the block device, and we only do the bd_set_size() when we don't have other people holding the device open). The proper fix for that is probably to just do something like bdev->bd_inode->i_size = (loff_t)get_capacity(disk)<<9; in fs/block_dev.c:do_open() even for the cases where we're not the original opener (but not call bd_set_size(), since that will also change the block size of the device). Cc: Peter Osterlund <petero2@telia.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Ingo Molnar <mingo@elte.hu> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# ac40532e	02-Jan-2008	Ingo Molnar <mingo@elte.hu>	scsi: revert "[SCSI] Get rid of scsi_cmnd->done" This reverts commit 6f5391c283d7fdcf24bf40786ea79061919d1e1d ("[SCSI] Get rid of scsi_cmnd->done") that was supposed to be a cleanup commit, but apparently it causes regressions: Bug 9370 - v2.6.24-rc2-409-g9418d5d: attempt to access beyond end of device http://bugzilla.kernel.org/show_bug.cgi?id=9370 this patch should be reintroduced in a more split-up form to make testing of it easier. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Matthew Wilcox <matthew@wil.cx> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# 751bf4d7	02-Jan-2008	James Bottomley <James.Bottomley@HansenPartnership.com>	[SCSI] scsi_sysfs: restore prep_fn when ULD is removed A recent bug report: http://bugzilla.kernel.org/show_bug.cgi?id=9674 Was caused because the ULDs now set their own prep functions, but don't necessarily reset the prep function back to the SCSI default when they are removed. This leads to panics if commands are sent to the device after the module is removed because the prep_fn is still pointing to the old module code. The fix for this is to implement a bus remove method that resets the prep_fn pointer correctly before calling the ULD specific driver remove method. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
# a341cd0f	29-Oct-2007	Jeff Garzik <jeff@garzik.org>	SCSI: add asynchronous event notification API Originally based on a patch by Kristen Carlson Accardi @ Intel. Copious input from James Bottomley. Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
# c46f2334	30-Oct-2007	Jens Axboe <jens.axboe@oracle.com>	[SG] Get rid of __sg_mark_end() sg_mark_end() overwrites the page_link information, but all users want __sg_mark_end() behaviour where we just set the end bit. That is the most natural way to use the sg list, since you'll fill it in and then mark the end point. So change sg_mark_end() to only set the termination bit. Add a sg_magic debug check as well, and clear a chain pointer if it is set. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# 45711f1a	22-Oct-2007	Jens Axboe <jens.axboe@oracle.com>	[SG] Update drivers to use sg helpers Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# a3bec5c5	17-Oct-2007	Jens Axboe <axboe@carl.home.kernel.dk>	Revert "[SCSI] Remove full sg table memset()" A bit too eager - we definitely need to clear the sg table initially, so that we don't accidentally have ->page & 0x01 true and think that is a chain pointer. This reverts commit f5c0dde4c66421a3a2d7d6fa604a712c9b0744e5.
# f5c0dde4	17-Oct-2007	FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>	[SCSI] Remove full sg table memset() We don't need to do that anymore, since blk_rq_map_sg() clears individual entries. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# 2a7c59e7	17-Sep-2007	FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>	remove sglist_len Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# fd820f40	17-Sep-2007	FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>	revert sg segment size ifdefs This reverts sg segment size ifdefs that the current code has in order to provide a way to reduce sgpool memory consumption. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# 9cb83c75	16-Oct-2007	FUJITA Tomonori <tomof@acm.org>	[SCSI] add use_sg_chaining option to scsi_host_template This option is true if a low-level driver can support sg chaining. This will be removed eventually when all the drivers are converted to support sg chaining. q->max_phys_segments is set to SCSI_MAX_SG_SEGMENTS if false. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# a8474ce2	07-Aug-2007	Jens Axboe <jens.axboe@oracle.com>	SCSI: support for allocating large scatterlists This is what enables large commands. If we need to allocate an sgtable that doesn't fit in a single page, allocate several SCSI_MAX_SG_SEGMENTS sized tables and chain them together. SCSI defaults to large chained sg tables, if the arch supports it. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# 0cde8d95	16-Oct-2007	Jens Axboe <jens.axboe@oracle.com>	scsi: simplify scsi_free_sgtable() Just pass in the command, no point in passing in the scatterlist and scatterlist pool index seperately. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# c6132da1	16-Oct-2007	Jens Axboe <jens.axboe@oracle.com>	scsi: convert to using sg helpers This converts the SCSI mid layer to using the sg helpers for looking up sg elements, instead of doing it manually. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# 6f5391c2	24-Sep-2007	Matthew Wilcox <willy@infradead.org>	[SCSI] Get rid of scsi_cmnd->done The ULD ->done callback moves into the scsi_driver. By moving the call to scsi_io_completion() from scsi_blk_pc_done() to scsi_finish_command(), we can eliminate the latter entirely. By returning 'good_bytes' from the ->done callback (rather than invoking scsi_io_completion()), we can stop exporting scsi_io_completion(). Also move the prototypes from sd.h to sd.c as they're all internal anyway. Rename sd_rw_intr to sd_done and rw_intr to sr_done. Inspired-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 311b581e	23-Sep-2007	James Bottomley <James.Bottomley@steeleye.com>	[SCSI] Fix device not ready printk Because scsi_print_sense_hdr prefixes with KERN_INFO, the output from scsi_io_completion looks like: sd 0:0:0:0: [sdb] Device not ready: <6>: Sense Key : 0x2 [current] : ASC=0x4 ASCQ=0x3 By using scsi_show_sense_hdr, we can get the much more appealing output: sd 0:0:0:0: [sdb] Device not ready: Sense Key : 0x2 [current] sd 0:0:0:0: [sdb] Device not ready: ASC=0x4 ASCQ=0x3 Acked-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 7f9a6bc4	04-Aug-2007	James Bottomley <James.Bottomley@steeleye.com>	[SCSI] move ULD attachment into the prep function One of the intents of the block prep function was to allow ULDs to use it for preprocessing. The original SCSI model was to have a single prep function and add a pointer indirect filter to build the necessary commands. This patch reverses that, does away with the init_command field of the scsi_driver structure and makes ULDs attach directly to the prep function instead. The value is really that it allows us to begin to separate the ULDs from the SCSI mid layer (as long as they don't use any core functions---which is hard at the moment---a ULD doesn't even need SCSI to bind). Acked-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 52aeeca9	17-Sep-2007	Michael Reed <mdr@sgi.com>	[SCSI] stale residual returned on write following BUSY retry A BUSY status returned on a write request results in a stale residual being returned when the write ultimately successfully completes. This can be reproduced as follows: 1) issue immediate mode rewind to scsi tape drive 2) issue write request The tape drive returns busy. The low level driver detects underrun and sets the residual into the scsi command. The low level driver responds with (DID_OK << 16) \| scsi_status. scsi_status is 8, hence status_byte(result) == 4, i.e., BUSY. scsi_softirq_done() calls scsi_decide_disposition() which returns ADD_TO_MLQUEUE. scsi_softirq_done() then calls scsi_queue_insert() which, on the way to resubmitting the request to the driver, calls scsi_init_cmd_errh(). The attached patch modifies scsi_init_cmd_errh() to clear the resid field. This prevents a "stale" residual from being returned when the scsi command finally completes without a BUSY status. Signed-off-by: Michael Reed <mdr@sgi.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# bd441dea	12-Mar-2007	Mike Christie <michaelc@cs.wisc.edu>	[SCSI] fix write buffer length in scsi_req_map_sg() sg's may have setup a the buffer with a different length than the transfer length so we should be using the bufflen passed in as the request's data len. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 6712ecf8	26-Sep-2007	NeilBrown <neilb@suse.de>	Drop 'size' argument from bio_endio and bi_end_io As bi_end_io is only called once when the reqeust is complete, the 'size' argument is now redundant. Remove it. Now there is no need for bio_endio to subtract the size completed from bi_size. So don't do that either. While we are at it, change bi_end_io to return void. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# 3001ca77	16-Aug-2007	NeilBrown <neilb@suse.de>	New function blk_req_append_bio ll_back_merge_fn is currently exported to SCSI where is it used, together with blk_rq_bio_prep, in exactly the same way these functions are used in __blk_rq_map_user. So move the common code into a new function (blk_rq_append_bio), and don't export ll_back_merge_fn any longer. Signed-off-by: Neil Brown <neilb@suse.de> diff .prev/block/ll_rw_blk.c ./block/ll_rw_blk.c Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# 03a5743a	03-Aug-2007	James Bottomley <James.Bottomley@steeleye.com>	[SCSI] sd: disentangle barriers in SCSI Our current implementation has a generic set of barrier functions that go through the SCSI driver model. Realistically, this is unnecessary, because the only device that can use barriers (sd) can set the flush functions up at probe or revalidate time. This patch pulls the barrier functions out of the mid layer and scsi driver model and relocates them directly in sd. Acked-by: Tejun Heo <htejun@gmail.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 165125e1	24-Jul-2007	Jens Axboe <jens.axboe@oracle.com>	[BLOCK] Get rid of request_queue_t typedef Some of the code has been gradually transitioned to using the proper struct request_queue, but there's lots left. So do a full sweet of the kernel and get rid of this typedef and replace its uses with the proper type. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# 20c2df83	19-Jul-2007	Paul Mundt <lethal@linux-sh.org>	mm: Remove slab destructors from kmem_cache_create(). Slab destructors were no longer supported after Christoph's c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>
# 7689e82e	09-Jul-2007	Cornelia Huck <cornelia.huck@de.ibm.com>	[SCSI] don't build scsi_dma_{map,unmap} for !HAS_DMA With dma-mapping-prevent-dma-dependent-code-from-linking-on.patch scsi fails to build on !HAS_DMA architectures: drivers/built-in.o(.text+0x20af6): In function `scsi_dma_map': : undefined reference to `dma_map_sg' drivers/built-in.o(.text+0x20b5c): In function `scsi_dma_unmap': : undefined reference to `dma_unmap_sg' I split those functions out into a new file. Builds on s390 and i386. Move scsi_dma_{map,unmap} into scsi_lib_dma.c which is only build if HAS_DMA is set. Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: James Bottomley <James.Bottomley@SteelEye.com> Cc: Jeff Garzik <jeff@garzik.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 824d7b57	25-May-2007	FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>	[SCSI] scsi_lib: add scatter/gather data buffer accessors This adds a set of accessors for the scsi data buffer. This is in preparation for chaining sg lists and bidirectional requests (and possibly, the mid-layer dma mapping). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 59c51591	09-May-2007	Michael Opdenacker <michael@free-electrons.com>	Fix occurrences of "the the " Signed-off-by: Michael Opdenacker <michael@free-electrons.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
# 5972511b	02-Apr-2007	Jens Axboe <jens.axboe@oracle.com>	[BLOCK] Don't pin lots of memory in mempools Currently we scale the mempool sizes depending on memory installed in the machine, except for the bio pool itself which sits at a fixed 256 entry pre-allocation. There's really no point in "optimizing" this OOM path, we just need enough preallocated to make progress. A single unit is enough, lets scale it down to 2 just to be on the safe side. This patch saves ~150kb of pinned kernel memory on a 32-bit box. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# b22f687d	13-Mar-2007	Pete Wyckoff <pw@osc.edu>	[SCSI] set resid in scsi_io_completion() even for check condition Some targets can return both valid data and sense information. Always update the request data_len from the SCSI command residual. Callers should interpret sense data to determine what parts of the data are valid in case of a CHECK CONDITION status. Signed-off-by: Pete Wyckoff <pw@osc.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# a4d04a4c	27-Feb-2007	Martin K. Petersen <martin.petersen@oracle.com>	[SCSI] Make error printing more verbose This patch enhances SCSI error printing by: - Making use of scsi_print_result() in the completion functions. - Having scmd_printk() output the disk name (when applicable). Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# c3762229	10-Feb-2007	Robert P. J. Day <rpjday@mindspring.com>	[PATCH] Transform kmem_cache_alloc()+memset(0) -> kmem_cache_zalloc(). Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the corresponding "kmem_cache_zalloc()" call. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@muc.de> Cc: Roland McGrath <roland@redhat.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Greg KH <greg@kroah.com> Acked-by: Joel Becker <Joel.Becker@oracle.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Jan Kara <jack@ucw.cz> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# 22cfefb5	05-Feb-2007	Andrew Morton <akpm@osdl.org>	[SCSI] scsi_kmap_atomic_sg(): check that local irqs are disabled The KM_BIO_SRC_IRQ kmap slot must be taken with local irqs disabled. Add a check into scsi for this. Cc: James Bottomley <James.Bottomley@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 596f482a	01-Jan-2007	Christoph Hellwig <hch@lst.de>	[SCSI] kill scsi_rety_command scsi_retry_command only has a single caller, so there is no point in having this function. Additionally the memset of the sense buffer it does is entirely superflous as scsi_request_fn already calls scsi_init_cmd_errh to perform this memset before the command is reissued. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 1aa4f24f	19-Dec-2006	Jens Axboe <jens.axboe@oracle.com>	[PATCH] Remove queue merging hooks We have full flexibility of merging parameters now, so we can remove the hooks that define back/front/request merge strategies. Nobody is using them anymore. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# 2985259b	19-Dec-2006	Jens Axboe <jens.axboe@oracle.com>	[PATCH] ->nr_sectors and ->hard_nr_sectors are not used for BLOCK_PC requests It's a file system thing, for block requests the only size used in the io paths is ->data_len as it is in bytes, not sectors. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
# e18b890b	06-Dec-2006	Christoph Lameter <clameter@sgi.com>	[PATCH] slab: remove kmem_cache_t Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name ".c" -o -name ".h"\|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
# b58d9154	16-Nov-2006	FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>	[SCSI] export scsi-ml functions needed by tgt_scsi_lib and its LLDs This patch contains the needed changes to the scsi-ml for the target mode support. Note, per the last review we moved almost all the fields we added to the scsi_cmnd to our internal data structure which we are going to try and kill off when we can replace it with support from other parts of the kernel. The one field we left on was the offset variable. This is needed to handle the case where the target gets request that is so large that it cannot execute it in one dma operation. So max_secotors or a segment limit may limit the size of the transfer. In this case our tgt core code will break up the command into managable transfers and send them to the LLD one at a time. The offset is then used to tell the LLD where in the command we are at. Is there another field on the scsi_cmd for that? Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 097b8457	16-Nov-2006	Tejun Heo <htejun@gmail.com>	[PATCH] scsi: clear garbage after CDBs on SG_IO ATAPI devices transfer fixed number of bytes for CDBs (12 or 16). Some ATAPI devices choke when shorter CDB is used and the left bytes contain garbage. Block SG_IO cleared left bytes but SCSI SG_IO didn't. This patch makes SCSI SG_IO clear it and simplify CDB clearing in block SG_IO. Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Mathieu Fluhr <mfluhr@nero.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Douglas Gilbert <dougg@torque.net> Acked-by: Jens Axboe <jens.axboe@oracle.com> Cc: <stable@kernel.org> Acked-by: Jeff Garzik <jgarzik@pobox.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
# 3b003157	04-Nov-2006	Christoph Hellwig <hch@lst.de>	[SCSI] untangle scsi_prep_fn I wanted to add some BUG checks to scsi_prep_fn to make sure no one sends us a non-sg command, but this function is a horrible mess. So I decided to detangle the function and document what the valid cases are. While doing that I found that REQ_TYPE_SPECIAL commands aren't used by the SCSI layer anymore and we can get rid of the code handling them. The new structure of scsi_prep_fn is: (1) check if we're allowed to send this command (2) big switch on cmd_type. For the two valid types call into a function to set the command up, else error (3) code to handle error cases Because FS and BLOCK_PC commands are handled entirely separate after the patch this introduces a tiny amount of code duplication. This improves readabiulity though and will help to avoid the bidi command overhead for FS commands so it's a good thing. I've tested this on both sata and mptsas. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 46c43db1	08-Oct-2006	Alexey Dobriyan <adobriyan@gmail.com>	[SCSI] scsi_lib.c: use BUILD_BUG_ON Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 6470f2ba	30-Sep-2006	Arne Redlich <arne.redlich@xiranet.com>	[SCSI] trivial scsi_execute_async fix In scsi_execute_async()'s error path, a struct scsi_io_context allocated with kmem_cache_alloc() is kfree()'d. Obviously kmem_cache_free() should be used instead. Signed-off-by: Arne Redlich <arne.redlich@xiranet.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 4aff5e23	10-Aug-2006	Jens Axboe <axboe@suse.de>	[PATCH] Split struct request ->flags into two parts Right now ->flags is a bit of a mess: some are request types, and others are just modifiers. Clean this up by splitting it into ->cmd_type and ->cmd_flags. This allows introduction of generic Linux block message types, useful for sending generic Linux commands to block devices. Signed-off-by: Jens Axboe <axboe@suse.de>
# 04846f25	09-Aug-2006	Andreas Herrmann <aherrman@de.ibm.com>	[SCSI] limit recursion when flushing shost->starved_list Attached is a patch that should limit a possible recursion that can lead to a stack overflow like follows: Kernel stack overflow. CPU: 3 Not tainted Process zfcperp0.0.d819 (pid: 13897, task: 000000003e0d8cc8, ksp: 000000003499dbb8) Krnl PSW : 0404000180000000 000000000030f8b2 (get_device+0x12/0x48) Krnl GPRS: 00000000135a1980 000000000030f758 000000003ed6c1e8 0000000000000005 0000000000000000 000000000044a780 000000003dbf7000 0000000034e15800 000000003621c048 070000003499c108 000000003499c1a0 000000003ed6c000 0000000040895000 00000000408ab630 000000003499c0a0 000000003499c0a0 Krnl Code: a7 fb ff e8 a7 19 00 00 b9 02 00 22 e3 e0 f0 98 00 24 a7 84 Call Trace: ([<000000004089edc2>] scsi_request_fn+0x13e/0x650 [scsi_mod]) [<00000000002c5ff4>] blk_run_queue+0xd4/0x1a4 [<000000004089ff8c>] scsi_queue_insert+0x22c/0x2a4 [scsi_mod] [<000000004089779a>] scsi_dispatch_cmd+0x8a/0x3d0 [scsi_mod] [<000000004089f1ec>] scsi_request_fn+0x568/0x650 [scsi_mod] ... [<000000004089f1ec>] scsi_request_fn+0x568/0x650 [scsi_mod] [<00000000002c5ff4>] blk_run_queue+0xd4/0x1a4 [<000000004089ff8c>] scsi_queue_insert+0x22c/0x2a4 [scsi_mod] [<000000004089779a>] scsi_dispatch_cmd+0x8a/0x3d0 [scsi_mod] [<000000004089f1ec>] scsi_request_fn+0x568/0x650 [scsi_mod] [<00000000002c5ff4>] blk_run_queue+0xd4/0x1a4 [<000000004089fa9e>] scsi_run_host_queues+0x196/0x230 [scsi_mod] [<00000000409eba28>] zfcp_erp_thread+0x2638/0x3080 [zfcp] [<0000000000107462>] kernel_thread_starter+0x6/0xc [<000000000010745c>] kernel_thread_starter+0x0/0xc <0>Kernel panic - not syncing: Corrupt kernel stack, can't continue. This stack overflow occurred during tests on s390 using zfcp. Recursion depth for this panic was 19. Usually recursion between blk_run_queue and a request_fn is avoided using QUEUE_FLAG_REENTER. But this does not help if the scsi stack tries to flush the starved_list of a scsi_host. Limit recursion depth when flushing the starved_list of a scsi_host. Signed-off-by: Andreas Herrmann <aherrman@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 631c228c	08-Jul-2006	Christoph Hellwig <hch@lst.de>	[SCSI] hide EH backup data outside the scsi_cmnd Currently struct scsi_cmnd has various fields that are used to backup original data after the corresponding fields have been overridden for EH commands. This means drivers can easily get at it and misuse it. Due to the old_ naming this doesn't happen for most of them, but two that have different names have been used wrong a lot (see previous patch). Another downside is that they unessecarily bloat the scsi_cmnd size. This patch moves them onstack in scsi_send_eh_cmnd to fix those two issues aswell as allowing future EH fixes like moving the EH command submissions to use SG lists like everything else. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# d6b0c537	02-Jul-2006	James Bottomley <James.Bottomley@steeleye.com>	[SCSI] fix error handling in scsi_io_completion There was a logic fault in scsi_io_completion() where zero transfer commands that complete successfully were sent to the block layer as not up to date. This patch removes the if (good_bytes > 0) gate around the successful completion, since zero transfer commands do have good_bytes == 0. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 309bd271	27-Jun-2006	Brian King <brking@us.ibm.com>	[SCSI] scsi: Device scanning oops for offlined devices (resend) If a device gets offlined as a result of the Inquiry sent during scanning, the following oops can occur. After the disk gets put into the SDEV_OFFLINE state, the error handler sends back the failed inquiry, which wakes the thread doing the scan. This starts a race between the scanning thread freeing the scsi device and the error handler calling scsi_run_host_queues to restart the host. Since the disk is in the SDEV_OFFLINE state, scsi_device_get will still work, which results in __scsi_iterate_devices getting a reference to the scsi disk when it shouldn't. The following execution thread causes the oops: CPU 0 (scan) CPU 1 (eh) --------------------------------------------------------- scsi_probe_and_add_lun .... scsi_eh_offline_sdevs scsi_eh_flush_done_q scsi_destroy_sdev scsi_device_dev_release scsi_restart_operations scsi_run_host_queues __scsi_iterate_devices get_device scsi_device_dev_release_usercontext scsi_run_queue <---OOPS---> The patch fixes this by changing the state of the sdev to SDEV_DEL before doing the final put_device, which should prevent the race from occurring. Original oops follows: Badness in kref_get at lib/kref.c:32 Call Trace: [C00000002F4476D0] [C00000000000EE20] .show_stack+0x68/0x1b0 (unreliable) [C00000002F447770] [C00000000037515C] .program_check_exception+0x1cc/0x5a8 [C00000002F447840] [C00000000000446C] program_check_common+0xec/0x100 Exception: 700 at .kref_get+0x10/0x28 LR = .kobject_get+0x20/0x3c [C00000002F447B30] [C00000002F447BC0] 0xc00000002f447bc0 (unreliable) [C00000002F447BB0] [C000000000254BDC] .get_device+0x20/0x3c [C00000002F447C30] [D000000000063188] .scsi_device_get+0x34/0xdc [scsi_mod] [C00000002F447CC0] [D0000000000633EC] .__scsi_iterate_devices+0x50/0xbc [scsi_mod] [C00000002F447D60] [D00000000006A910] .scsi_run_host_queues+0x34/0x5c [scsi_mod] [C00000002F447DF0] [D000000000069054] .scsi_error_handler+0xdb4/0xe44 [scsi_mod] [C00000002F447EE0] [C00000000007B4E0] .kthread+0x128/0x178 [C00000002F447F90] [C000000000025E84] .kernel_thread+0x4c/0x68 Unable to handle kernel paging request for <7>PCI: Enabling device: (0002:41:01.1), cmd 143 data at address 0x000001b8 Faulting instruction address: 0xd0000000000698e4 sym1: <1010-66> rev 0x1 at pci 0002:41:01.1 irq 216 sym1: No NVRAM, ID 7, Fast-80, LVD, parity checking sym1: SCSI BUS has been reset. scsi2 : sym-2.2.2 cpu 0x0: Vector: 300 (Data Access) at [c00000002f447a30] pc: d0000000000698e4: .scsi_run_queue+0x2c/0x218 [scsi_mod] lr: d00000000006a904: .scsi_run_host_queues+0x28/0x5c [scsi_mod] sp: c00000002f447cb0 msr: 9000000000009032 dar: 1b8 dsisr: 40000000 current = 0xc0000000045fecd0 paca = 0xc00000000048ee80 pid = 1123, comm = scsi_eh_1 enter ? for help [c00000002f447d60] d00000000006a904 .scsi_run_host_queues+0x28/0x5c [scsi_mod] [c00000002f447df0] d000000000069054 .scsi_error_handler+0xdb4/0xe44 [scsi_mod] [c00000002f447ee0] c00000000007b4e0 .kthread+0x128/0x178 [c00000002f447f90] c000000000025e84 .kernel_thread+0x4c/0x68 Signed-off-by: Brian King <brking@us.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 9ea72909	23-Jun-2006	Alan Stern <stern@rowland.harvard.edu>	[SCSI] SCSI core: Allow QUIESCE -> CANCEL sdev transition We have to be able to remove SCSI devices even when they are suspended, so QUIESCE -> CANCEL must be a legal state transition. This patch (as727) adds the transition to the state machine. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 03aba2f7	23-Jun-2006	Luben Tuikov <ltuikov@yahoo.com>	[SCSI] sd/scsi_lib simplify sd_rw_intr and scsi_io_completion This patch simplifies "good_bytes" computation in sd_rw_intr(). sd: "good_bytes" computation is always done in terms of the resolution of the device's medium, since after that it is the number of good bytes we pass around and other layers/contexts (as opposed ot sd) can translate that to their own resolution (block layer:512). It also makes scsi_io_completion() processing more straightforward, eliminating the 3rd argument to the function. It also fixes a couple of bugs like not checking return value, using "break" instead of "return;", etc. I've been running with this patch for some time now on a test (do-it-all) system. Signed-off-by: Luben Tuikov <ltuikov@yahoo.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# beb40487	10-Jun-2006	Christoph Hellwig <hch@lst.de>	[SCSI] remove scsi_request infrastructure With Achim patch the last user (gdth) is switched away from scsi_request so we an kill it now. Also disables some code in i2o_scsi that was broken since the sg driver stopped using scsi_requests. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 6391a113	08-Jun-2006	Tobias Klauser <tklauser@nuerscht.ch>	[SCSI] drivers/scsi: Use ARRAY_SIZE macro Use ARRAY_SIZE macro instead of sizeof(x)/sizeof(x[0]) and remove duplicates of the macro. Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# f5235962	22-Mar-2006	Bryan Holty <lgeek@frontiernet.net>	[SCSI] scsi_lib.c: properly count the number of pages in scsi_req_map_sg() The calculation of nr_pages in scsi_req_map_sg() doesn't account for the fact that the first page could have an offset that pushes the end of the buffer onto a new page. Signed-off-by: Bryan Holty <lgeek@frontiernet.net> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# ee7863bc	15-May-2006	Tejun Heo <htejun@gmail.com>	[PATCH] SCSI: implement shost->host_eh_scheduled libata needs to invoke EH without scmd. This patch adds shost->host_eh_scheduled to implement such behavior. Currently the only user of this feature is libata and no general interface is defined. This patch simply adds handling for host_eh_scheduled where needed and exports scsi_eh_wakeup() to modules. The rest is upto libata. This is the result of the following discussion. http://thread.gmane.org/gmane.linux.scsi/23853/focus=9760 In short, SCSI host is not supposed to know about exceptions unrelated to specific device or command. Such exceptions should be handled by transport layer proper. However, the distinction is not essential to ATA and libata is planning to depart from SCSI, so, for the time being, libata will be using SCSI EH to handle such exceptions. Signed-off-by: Tejun Heo <htejun@gmail.com>
# f3e93f73	25-Apr-2006	James Bottomley <James.Bottomley@steeleye.com>	[SCSI] Fix DVD burning issues. Some pioneer DVDs are apparently returning odd "not ready" status codes that the mid-layer doesn't recognise and so passes back to the user as errors. This patch overhauls our not-ready handling and adds transparent retries for: format in progress rebuild in progress recalculation in progress operation in progress Long write in progress self test in progress The Pioneer was actually returning "long write in progress" Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 169e1a2a	18-Apr-2006	Andrew Morton <akpm@osdl.org>	[SCSI] scsi_lib.c: fix warning in scsi_kmap_atomic_sg drivers/scsi/scsi_lib.c: In function `scsi_kmap_atomic_sg': drivers/scsi/scsi_lib.c:2394: warning: unsigned int format, different type arg (arg 3) drivers/scsi/scsi_lib.c:2394: warning: unsigned int format, different type arg (arg 4) Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# cdb8c2a6	02-Apr-2006	Guennadi Liakhovetski <g.liakhovetski@gmx.de>	[SCSI] dc395x: dynamically map scatter-gather for PIO The current dc395x driver uses PIO to transfer up to 4 bytes which do not get transferred by DMA (under unclear circumstances). For this the driver uses page_address() which is broken on highmem. Apart from this the actual calculation of the virtual address is wrong (even without highmem). So, e.g., for reading it reads bytes from the driver to a wrong address and returns wrong data, I guess, for writing it would just output random data to the device. The proper fix, as suggested by many, is to dynamically map data using kmap_atomic(page, KM_BIO_SRC_IRQ) / kunmap_atomic(virt). The reason why it has not been done until now, although I've done some preliminary patches more than a year ago was that nobody interested in fixing this problem was able to reliably reproduce it. Now it changed - with the help from Sebastian Frei (CC'ed) I was able to trigger the PIO path. Thus, I was also able to test and debug it. There are 4 cases when PIO is used in dc395x - data-in / -out with and without scatter-gather. I was able to reproduce and test only data-in with and without SG. So, the data-out path is still untested, but it is also somewhat simpler than the data-in. Fredrik Roubert (also CC'ed) also had PIO triggering on his system, and in his case it was data-out without SG. It would be great if he could test the attached patch on his system, but even if he cannot, I would still request to apply the patch and just wait if anybody cries... Implementation: I put 2 new functions in scsi_lib.c and their declarations in scsi_cmnd.h. I exported them without _GPL, although, I don't feel strongly about that - not many drivers are likely to use them. But there is at least one more - I want to use them in tmscsim.c. Whether these are the right files for the functions and their declarations - not sure either. Actually, they are not scsi-specific, so, might go somewhere around other scattergather magic? They are not platform specific either, and most SG functions are defined under arch/*/... As these issues were discussed previously there were some more routines suggested to manipulate scattergather buffers, I think, some of them were needed around crypto code... So, might be a common place reasonable, like lib/scattergather.c? I am open here. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# e36e0c80	11-Apr-2006	Tejun Heo <htejun@gmail.com>	[SCSI] SCSI: fix scsi_kill_request() busy count handling scsi_kill_request() completes requests via normal SCSI completion path which decrements busy counts; however, requests which get passed to scsi_kill_request() aren't holding busy counts and scsi_kill_request() don't increment them before invoking completion path resulting in incorrect busy counts. Bump up busy counts before invoking completion path. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 93d2341c	26-Mar-2006	Matthew Dobson <colpatch@us.ibm.com>	[PATCH] mempool: use mempool_create_slab_pool() Modify well over a dozen mempool users to call mempool_create_slab_pool() rather than calling mempool_create() with extra arguments, saving about 30 lines of code and increasing readability. Signed-off-by: Matthew Dobson <colpatch@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
# 5baba830	18-Mar-2006	James Bottomley <James.Bottomley@steeleye.com>	[SCSI] add scsi_mode_select to scsi_lib.c This complements the scsi_mode_sense() function Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# ffedb452	23-Feb-2006	James Bottomley <James.Bottomley@steeleye.com>	[SCSI] fix scsi process problems and clean up the target reap issues In order to use the new execute_in_process_context() API, you have to provide it with the work storage, which I do in SCSI in scsi_device and scsi_target, but which also means that we can no longer queue up the target reaps, so instead I moved the target to a state model which allows target_alloc to detect if we've received a dying target and wait for it to be gone. Hopefully, this should also solve the target namespace race. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 6d73c851	22-Feb-2006	Al Viro <viro@ftp.linux.org.uk>	[SCSI] scsi_lib: fix recognition of cache type of Initio SBP-2 bridges Regardless what mode page was asked for, Initio INIC-14x0 and INIC-2430 always return page 6 without mode page headers. Try to recognise this as a special case in scsi_mode_sense and setting the mode sense headers accordingly. Signed-off-by: Al Viro <viro@ftp.linux.org.uk> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 24669f75	16-Jan-2006	Jes Sorensen <jes@sgi.com>	[SCSI] SCSI core kmalloc2kzalloc Change the core SCSI code to use kzalloc rather than kmalloc+memset where possible. Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 8884efab	24-Feb-2006	Brian King <brking@us.ibm.com>	[SCSI] scsi: scsi command retries off by one fix Fix up an off by one error in calculating retries for scsi commands. This bug was discovered when an SG_IO request was sent to scsi core with retries = 0, causing the overall timeout check to go off in scsi_softirq_done. Signed-off-by: Brian King <brking@us.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# faead26d	14-Feb-2006	James Bottomley <James.Bottomley@steeleye.com>	[PATCH] add scsi_execute_in_process_context() API We have several points in the SCSI stack (primarily for our device functions) where we need to guarantee process context, but (given the place where the last reference was released) we cannot guarantee this. This API gets around the issue by executing the function directly if the caller has process context, but scheduling a workqueue to execute in process context if the caller doesn't have it. Unfortunately, it requires memory allocation in interrupt context, but it's better than what we have previously. The true solution will require a bit of re-engineering, so isn't appropriate for 2.6.16. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# bb1d1073	23-Jan-2006	Brian King <brking@us.ibm.com>	[SCSI] Prevent scsi_execute_async from guessing cdb length When the scsi_execute_async interface was added it ended up reducing the flexibility of userspace to send arbitrary scsi commands through sg using SG_IO. The SG_IO interface allows userspace to specify the CDB length. This is now ignored in scsi_execute_async and it is guessed using the COMMAND_SIZE macro, which is not always correct, particularly for vendor specific commands. This patch adds a cmd_len parameter to the scsi_execute_async interface to allow the caller to specify the length of the CDB. Signed-off-by: Brian King <brking@us.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 776b23a0	06-Jan-2006	Christoph Hellwig <hch@lst.de>	[SCSI] always handle REQ_BLOCK_PC requests in common code LLDDs should never see REQ_BLOCK_PC requests, we can handle them just fine in the core code. There is a small behaviour change in that some check in sr's rw_intr are bypassed, but I consider the old behaviour a bug. Mike found this cleanup opportunity and provdided early patches, so all the credit goes to him, even if I redid the patches from scratch beause that was easier than forward-porting the old patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 1aea6434	09-Jan-2006	Jens Axboe <axboe@suse.de>	[SCSI] Kill the SCSI softirq handling This patch moves the SCSI softirq handling to the block layer version. There should be no functional changes. Signed-off-by: Jens Axboe <axboe@suse.de>
# e650c305	05-Jan-2006	Jens Axboe <axboe@suse.de>	[SCSI] scsi_end_async() needs to take an uptodate parameter Signed-off-by: Jens Axboe <axboe@suse.de>
# 461d4e90	06-Jan-2006	Tejun Heo <htejun@gmail.com>	[BLOCK] update SCSI to use new blk_ordered for barriers All ordered request related stuff delegated to HLD. Midlayer now doens't deal with ordered setting or prepare_flush callback. sd.c updated to deal with blk_queue_ordered setting. Currently, ordered tag isn't used as SCSI midlayer cannot guarantee request ordering. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jens Axboe <axboe@suse.de>
# 8ffdc655	06-Jan-2006	Tejun Heo <htejun@gmail.com>	[BLOCK] add @uptodate to end_that_request_last() and @error to rq_end_io_fn() add @uptodate argument to end_that_request_last() and @error to rq_end_io_fn(). there's no generic way to pass error code to request completion function, making generic error handling of non-fs request difficult (rq->errors is driver-specific and each driver uses it differently). this patch adds @uptodate to end_that_request_last() and @error to rq_end_io_fn(). for fs requests, this doesn't really matter, so just using the same uptodate argument used in the last call to end_that_request_first() should suffice. imho, this can also help the generic command-carrying request jens is working on. Signed-off-by: tejun heo <htejun@gmail.com> Signed-Off-By: Jens Axboe <axboe@suse.de>
# 7b16318d	15-Dec-2005	James Bottomley <jejb@titanic.(none)>	Fix up SCSI mismerge I forgot to do a git-update-cache on the merged files ...
# defd94b7	05-Dec-2005	Mike Christie <michaelc@cs.wisc.edu>	[SCSI] seperate max_sectors from max_hw_sectors - export __blk_put_request and blk_execute_rq_nowait needed for async REQ_BLOCK_PC requests - seperate max_hw_sectors and max_sectors for block/scsi_ioctl.c and SG_IO bio.c helpers per Jens's last comments. Since block/scsi_ioctl.c SG_IO was already testing against max_sectors and SCSI-ml was setting max_sectors and max_hw_sectors to the same value this does not change any scsi SG_IO behavior. It only prepares ll_rw_blk.c, scsi_ioctl.c and bio.c for when SCSI-ml begins to set a valid max_hw_sectors for all LLDs. Today if a LLD does not set it SCSI-ml sets it to a safe default and some LLDs set it to a artificial low value to overcome memory and feedback issues. Note: Since we now cap max_sectors to BLK_DEF_MAX_SECTORS, which is 1024, drivers that used to call blk_queue_max_sectors with a large value of max_sectors will now see the fs requests capped to BLK_DEF_MAX_SECTORS. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# aa7b5cd7	11-Nov-2005	Mike Christie <michaelc@cs.wisc.edu>	[SCSI] add kmemcache for scsi_io_context Add kmemcache of scsi io contexts. In the future when we finalize on where these functions will live we can add a mempool for it and do a bioset for out REQ_BLOCK_PC bios. This is needed becuase the dm-multipath handlers will want to use the scsi_exectute* functions for failover and we cannot have them and the bio device allocating from the same mempool. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 0d95716d	11-Nov-2005	Mike Christie <michaelc@cs.wisc.edu>	[SCSI] complete the whole command when it is REQ_BLOCK_PC sd does not allow scsi_io_completion to retry commands for SG_IO requests, and it make sense that it should not happen for st SG_IO commands too. If for st we hit the bottom of scsi_io_completion we will probably screw things up pretty bad. This patch returns to the block layer that the whole command completed and relies on the caller to check the request errors field. For initialization commands like in sd, this adds the previous behavior where scsi_io_completion did not process the error. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 17e01f21	11-Nov-2005	Mike Christie <michaelc@cs.wisc.edu>	[SCSI] add retries field to request for REQ_BLOCK_PC use For tape we need to control the retries. This patch adds a retries counter on the request for REQ_BLOCK_PC commands originating from scsi_execute* to use. REQ_BLOCK_PC commands comming from the block layer SG_IO path continue to use the retires set in the ULD init_command. (scsi_execute* does not set the gendisk so we do not execute the init_command in that path). Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 6e68af66	11-Nov-2005	Mike Christie <michaelc@cs.wisc.edu>	[SCSI] Convert SCSI mid-layer to scsi_execute_async Add scsi helpers to create really-large-requests and convert scsi-ml to scsi_execute_async(). Per Jens's previous comments, I placed this function in scsi_lib.c. I made it follow all the queue's limits - I think I did at least :), so I removed the warning on the function header. I think the scsi_execute_* functions should eventually take a request_queue and be placed some place where the dm-multipath hw_handler can use them if that failover code is going to stay in the kernel. That conversion patch will be sent in another mail though. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# c9526497	09-Dec-2005	James Bottomley <James.Bottomley@steeleye.com>	[SCSI] Consolidate REQ_BLOCK_PC handling path (fix ipod panic) This follows on from Jens' patch and consolidates all of the ULD separate handlers for REQ_BLOCK_PC into a single call which has his fix for our direction bug. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 26a68019	29-Nov-2005	Jens Axboe <axboe@suse.de>	[SCSI] scsi_lib: stricter checks for clearing use_10_for_rw Check the asc and ascq for being "invalid command opcode" as well. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 49d7bc64	12-Dec-2005	Linus Torvalds <torvalds@g5.osdl.org>	Revert revert of "[SCSI] fix usb storage oops" This reverts commit 1b0997f561bf46689cc6e0903f342e9bf2506bf1, which in turn reverted 34ea80ec6a02ad02e6b9c75c478c18e5880d6713 (which is thus re-instated). Quoth James Bottomley: "All it's doing is deferring the device_put() from the scsi_put_command() to after the scsi_run_queue(), which doesn't fix the sleep while atomic problem of the device release method. In both cases we still get the semaphore in atomic context problem which is caused by scsi_reap_target() doing a device_del(), which I assumed (wrongly) was valid from atomic context." who also promised to fix scsi_reap_target(). Signed-off-by: Linus Torvalds <torvalds@osdl.org>
# a8c730e8	09-Dec-2005	Jens Axboe <axboe@suse.de>	[SCSI] fix panic when ejecting ieee1394 ipod The scsi_library routines don't correctly set DMA_NONE when req->data_len is zero (instead they check the command type first, so if it's write, we end up with req->data_len == 0 and direction as DMA_TO_DEVICE which confuses some drivers) Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 1b0997f5	02-Dec-2005	Linus Torvalds <torvalds@g5.osdl.org>	Revert "[SCSI] fix usb storage oops" This reverts commit 34ea80ec6a02ad02e6b9c75c478c18e5880d6713. It does a put_device() from softirq context, which is bad since it gets a semaphore for reading. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
# 34ea80ec	08-Nov-2005	goggin, edward <egoggin@emc.com>	[SCSI] fix usb storage oops The problem is that scsi_run_queue is called from scsi_next_command() after doing a scsi_put_command. If the command was the only thing holding the reference on the scsi_device then the resulting device put will tear down the block queue. Fix this by taking a reference to the device and holding it around scsi_run_queue() Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 262eef66	28-Oct-2005	Christoph Hellwig <hch@lst.de>	[SCSI] remove scsi_wait_req This function has been superceeded by the block request based interfaces and is unused (except for the uncompilable cpqfc driver). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 3bf743e7	24-Oct-2005	Jeff Garzik <jgarzik@pobox.com>	[SCSI] use {sdev,scmd,starget,shost}_printk in generic code rejections fixed and Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 9ccfc756	02-Oct-2005	James Bottomley <James.Bottomley@steeleye.com>	[SCSI] move the mid-layer printk's over to shost/starget/sdev_printk This should eliminate (at least in the mid layer) to make numeric assumptions about any of the enumeration variables. As a side effect, it will also make all the messages consistent and line us up nicely for the error logging strategy (if it ever shows itself again). Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# c53033f6	21-Oct-2005	Al Viro <viro@zeniv.linux.org.uk>	[PATCH] gfp_t: drivers/scsi Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
# 7c72ce81	14-Oct-2005	Alan Stern <stern@rowland.harvard.edu>	[SCSI] Fix leak of Scsi_Cmnds When a request is deferred in scsi_init_io because the sg table could not be allocated, the associated scsi_cmnd is not released and the request is not marked with REQ_DONTPREP. When the command is retried, if scsi_prep_fn decides to kill it then the scsi_cmnd will never be released. This patch (as573) changes scsi_init_io so that it calls scsi_put_command before deferring a request. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 939647ee	18-Sep-2005	James Bottomley <James.Bottomley@steeleye.com>	[SCSI] fix oops on usb storage device disconnect We fix the oops by enforcing the host state model. There have also been two extra states added: SHOST_CANCEL_RECOVERY and SHOST_DEL_RECOVERY so we can take the model through host removal while the recovery thread is active. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# b95be99d	15-Sep-2005	Alan Stern <stern@rowland.harvard.edu>	[SCSI] fix oops in scsi_release_buffers() I found one other thing that needs to be fixed. The call to scsi_release_buffers in scsi_unprep_request causes an oops, because the sgtable has already been freed in scsi_io_completion. The following patch is needed. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 59897dad	13-Sep-2005	James Bottomley <James.Bottomley@steeleye.com>	[SCSI] fix sym scsi boot hang On Wed, 2005-09-14 at 18:06 +1000, Anton Blanchard wrote: > And in particular it looks like the scsi_unprep_request in > scsi_queue_insert is causing it. The following patch fixes the boot > problems on the vscsi machine: OK, my fault. Your fix is almost correct .. I was going to do this eventually, honest, because there's no need to unprep and reprep a command that comes in through scsi_queue_insert(). However, I decided to leave it in to exercise the scsi_unprep_request() path just to make sure it was working. What's happening, I think, is that we also use this path for retries. Since we kill and reget the command each time, the retries decrement is never seen, so we're retrying forever. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 186d330e	13-Sep-2005	Timothy Thelin <Timothy.Thelin@wdc.com>	[SCSI] scsi: sd, sr, st, and scsi_lib all fail to copy cmd_len to new cmd This fixes an issue in scsi command initialization from a request where sd, sr, st, and scsi_lib all fail to copy the request's cmd_len to the scsi command's cmd_len field. Signed-off-by: Timothy Thelin <timothy.thelin@wdc.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 6f16b535	10-Sep-2005	Mike Christie <michaelc@cs.wisc.edu>	[SCSI] set error value when failing commands in prep_fn set DID_NO_CONNECT for the BLKPREP_KILL case and correct a few BLKPREP_DEFER cases that weren't checking for the need to plug the queue. Signed-Off-By: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 788ce43a	09-Sep-2005	James Bottomley <James.Bottomley@steeleye.com>	[SCSI] SCSI core: fix leakage of scsi_cmnd's Actually, just one problem and one cosmetic fix: 1) We need to dequeue for the loop and kill case (it seems easiest simply to dequeue in the scsi_kill_request() routine) 2) There's no real need to drop the queue lock. __scsi_done() is lock agnostic, so since there's no requirement, let's just leave it in to avoid any locking issues. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# e91442b6	09-Sep-2005	James Bottomley <jejb@mulgrave.(none)>	[SCSI] SCSI core: fix leakage of scsi_cmnd's From: Alan Stern <stern@rowland.harvard.edu> This patch (as559b) adds a new routine, scsi_unprep_request, which gets called every place a request is requeued. (That includes scsi_queue_insert as well as scsi_requeue_command.) It also changes scsi_kill_requests to make it call __scsi_done with result equal to DID_NO_CONNECT << 16. (I'm not sure if it's necessary to call scsi_init_cmd_errh here; maybe you can check on that.) Finally, the patch changes the return value from scsi_end_request, to avoid returning a stale pointer in the case where the request was requeued. Fortunately the return value is used in only place, and the change actually simplified it. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Rejections fixed up and Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 286f3e13	01-Sep-2005	Neil Brown <neilb@suse.de>	[SCSI] fix possible deadlock in scsi_lib.c If a filesystem, while writing out data, decides that it is good to issue a cache flush on a SCSI drive (or other 'sd' device), it will call blkdev_issue_flush which calls ->issue_flush_fn which is scsi_issue_flush_fn. This calls sd_issue_flush which calls sd_sync_cache, which calls scsi_execute_request. This will (as sshdr != NULL) call kmalloc(SCSI_SENSE_BUFFERSIZE, GFP_KERNEL) If memory is tight, the presence of GFP_KERNEL may cause write requests to be sent to some filesystem to free up memory, however if that filesystem is waiting for the issue_flush_fn to complete, you could get a deadlock. I wonder if it might be more appropriate to use GFP_NOIO as in the following patch. I wonder if it might be even more appropriate to cope better with a kmalloc failure, especially as in this use, sd_sync_cache only will use the sense information to print out a more informative error message. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 3173d8c3	04-Sep-2005	James Bottomley <James.Bottomley@steeleye.com>	[SCSI] quieten messages on scsi_execute commands scsi_io_completion() can be a bit noisy about certain conditions. Previously this wasn't a problem for internally generated commands, since they never hit it. However, since we do all SCSI commands via bios, now they do. user CD testers like magicdev are now getting not ready messages every time they touch the CD to see if there's anything in it. Fix this by making all scsi_execute commands REQ_QUIET and making scsi_finish_io() not say anything for REQ_QUIET. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# e514385b	09-Aug-2005	James Bottomley <James.Bottomley@steeleye.com>	[SCSI] fix sense buffer length handling problem The new bio code was incorrectly converted from stack allocated to kmalloc'd buffer handling. There are two places where it incorrectly uses sizeof(*sense) to get the size of the sense buffer. This actually produces one, so no sense data was ever getting back, causing failure in things like disk spin up. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 1ccb48bb	26-Jun-2005	akpm@osdl.org <akpm@osdl.org>	[SCSI] fix C syntax problem in scsi_lib.c Older gcc's require variable definitions at the beginning of a block. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# ea73a9f2	28-Aug-2005	James Bottomley <jejb@titanic.(none)>	[SCSI] convert sd to scsi_execute_req (and update the scsi_execute_req API) This one removes struct scsi_request entirely from sd. In the process, I noticed we have no callers of scsi_wait_req who don't immediately normalise the sense, so I updated the API to make it take a struct scsi_sense_hdr instead of simply a big sense buffer. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 33aa687d	28-Aug-2005	James Bottomley <jejb@titanic.(none)>	[SCSI] convert SPI transport class to scsi_execute This one's slightly more difficult. The transport class uses REQ_FAILFAST, so another interface (scsi_execute) had to be invented to take the extra flag. Also, the sense functions are shifted around to allow spi_execute to place data directly into a struct scsi_sense_hdr. With this change, there's probably a lot of unnecessary sense buffer allocation going on which we can fix later. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 1cf72699	28-Aug-2005	James Bottomley <jejb@titanic.(none)>	[SCSI] convert the remaining mid-layer pieces to scsi_execute_req After this, we just have some drivers, all the ULDs and the SPI transport class using scsi_wait_req(). Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 39216033	15-Jun-2005	James Bottomley <jejb@titanic.(none)>	[SCSI] use scatter lists for all block pc requests and simplify hw handlers Original From: Mike Christie <michaelc@cs.wisc.edu> Add scsi_execute_req() as a replacement for scsi_wait_req() Fixed up various pieces (added REQ_SPECIAL and caught req use after free) Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 8e640118	15-Jun-2005	James Bottomley <jejb@titanic.(none)>	update scsi_wait_req to new format for blk_rq_map_kern() Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# e537a36d	05-Jun-2005	James Bottomley <James.Bottomley@steeleye.com>	[SCSI] use scatter lists for all block pc requests and simplify hw handlers Here's the proof of concept for this one. It converts scsi_wait_req to do correct REQ_BLOCK_PC submission (and works nicely in my setup). The final goal should be to eliminate struct scsi_request, but that can't be done until the character submission paths of sg and st are also modified. There's some loss of functionality to this: retries are no longer controllable (except by setting REQ_FASTFAIL) and the wait_req API needs to be altered, but it looks very nice. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# d3301874	16-Jun-2005	Mike Anderson <andmike@us.ibm.com>	[SCSI] host state model update: replace old host bitmap state Migrate the current SCSI host state model to a model like SCSI device is using. Signed-off-by: Mike Anderson <andmike@us.ibm.com> Rejections fixed up and Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 0f34e3f5	12-Jul-2005	Kenneth W Chen <kenneth.w.chen@intel.com>	[SCSI] Redundant memset in scsi_alloc_sgtable scsi_init_io calls scsi_alloc_sgtable and then calls blk_rq_map_sg to initialize the scatterlist structure. blk_rq_map_sg() already memset the structure for every new segment. That makes the memset in scsi_alloc_sgtable unnecessary. Patch to delete the extra memset in scsi_alloc_sgtable. Tested on a x86_64 machine. Looks stable to me. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# a77e3362	28-Jun-2005	KAMBAROV, ZAUR <kambarov@berkeley.edu>	[PATCH] coverity: i386: scsi_lib buffer overrun fix The check in 627 BUG_ON(index > SG_MEMPOOL_NR); with SG_MEMPOOL_NR defined in 32 #define SG_MEMPOOL_NR (sizeof(scsi_sg_pools)/sizeof(struct scsi_host_sg_pool)) was not sufficient. sgp, set in 629 sgp = scsi_sg_pools + index; is dereferenced in 630 mempool_free(sgl, sgp->pool); Signed-off-by: Zaur Kambarov <zkambarov@coverity.com> Cc: <linux-scsi@vger.kernel.org> Cc: James Bottomley <James.Bottomley@steeleye.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
# 8d115f84	19-Jun-2005	Christoph Hellwig <hch@lst.de>	[SCSI] remove scsi_cmnd->state We never look at it except for the old megaraid driver that abuses it for sending internal commands. That usage can be fixed easily because those internal commands are single-threaded by a mutex and we can easily use a completion there. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# b4edcbca	19-Jun-2005	Christoph Hellwig <hch@lst.de>	[SCSI] remove scsi_cmnd->owner never checked anywhere Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# f5ad5614	19-Jun-2005	Christoph Hellwig <hch@lst.de>	[SCSI] remove scsi_cmnd->abort_reason Never used for anything but printing it out in debug routines. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 52c1da39	23-Jun-2005	Adrian Bunk <bunk@stusta.de>	[PATCH] make various thing static Another rollup of patches which give various symbols static scope Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
# d8c37e7b	13-May-2005	Tejun Heo <htejun@gmail.com>	[SCSI] remove a timer race in scsi_queue_insert() scsi_queue_insert() has four callers. Three callers call with timer disabled and one (the second invocation in scsi_dispatch_cmd()) calls with timer activated. scsi_queue_insert() used to always call scsi_delete_timer() and ignore the return value. This results in race with timer expiration. Remove scsi_delete_timer() call from scsi_queue_insert() and make the caller delete timer and check the return value. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# a1bf9d1d	24-Apr-2005	Tejun Heo <htejun@gmail.com>	[SCSI] make scsi_queue_insert() use blk_requeue_request() scsi_queue_insert() used to use blk_insert_request() for requeueing requests. This depends on the unobvious behavior of blk_insert_request() setting REQ_SPECIAL and REQ_SOFTBARRIER when requeueing. This patch makes scsi_queue_insert() use blk_requeue_request(). As REQ_SPECIAL means special requests and REQ_SOFTBARRIER is automatically handled by blk layer now, no flag needs to be set. Note that scsi_queue_insert() now calls scsi_run_queue() itself, and the prototype of the function is added right above scsi_queue_insert(). This is temporary, as later requeue path consolidation patchset removes scsi_queue_insert(). By adding temporary prototype, we can do away with unnecessarily moving functions. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 283369cc	24-Apr-2005	Tejun Heo <htejun@gmail.com>	[SCSI] make scsi_requeue_request() use blk_requeue_request() scsi_requeue_request() used to use blk_insert_request() for requeueing requests. This depends on the unobvious behavior of blk_insert_request() setting REQ_SPECIAL and REQ_SOFTBARRIER when requeueing. This patch makes scsi_queue_insert() use blk_requeue_request(). As REQ_SPECIAL means special requests and REQ_SOFTBARRIER is automatically handled by blk layer now, no flag needs to be set. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 867d1191	24-Apr-2005	Tejun Heo <htejun@gmail.com>	[SCSI] remove requeue feature from blk_insert_request() blk_insert_request() has a unobivous feature of requeuing a request setting REQ_SPECIAL\|REQ_SOFTBARRIER. SCSI midlayer was the only user and as previous patches removed the usage, remove the feature from blk_insert_request(). Only special requests should be queued with blk_insert_request(). All requeueing should go through blk_requeue_request(). Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# beb6617d	24-Apr-2005	Tejun Heo <htejun@gmail.com>	[SCSI] remove REQ_SPECIAL in scsi_init_io() scsi_init_io() used to set REQ_SPECIAL when it fails sg allocation before requeueing the request by returning BLKPREP_DEFER. REQ_SPECIAL is being updated to mean special requests. So, remove REQ_SPECIAL setting. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# c6295cdf	03-Apr-2005	Tejun Heo <htejun@gmail.com>	[PATCH] scsi: remove meaningless scsi_cmnd->serial_number_at_timeout field scsi_cmnd->serial_number_at_timeout doesn't serve any purpose anymore. All serial_number == serial_number_at_timeout tests are always true in abort callbacks. Kill the field. Also, as ->pid always equals ->serial_number and ->serial_number doesn't have any special meaning anymore, update comments above ->serial_number accordingly. Once we remove all uses of this field from all lldd's, this field should go. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# d3a933dc	03-Apr-2005	Tejun Heo <htejun@gmail.com>	[PATCH] scsi: remove unused scsi_cmnd->internal_timeout field scsi_cmnd->internal_timeout field doesn't have any meaning anymore. Kill the field. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 84011ae8	03-Apr-2005	Tejun Heo <htejun@gmail.com>	[PATCH] scsi: remove meaningless scsi_cmnd->serial_number_at_timeout field scsi_cmnd->serial_number_at_timeout doesn't serve any purpose anymore. All serial_number == serial_number_at_timeout tests are always true in abort callbacks. Kill the field. Also, as ->pid always equals ->serial_number and ->serial_number doesn't have any special meaning anymore, update comments above ->serial_number accordingly. Once we remove all uses of this field from all lldd's, this field should go. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 97665e9c	03-Apr-2005	Tejun Heo <htejun@gmail.com>	[PATCH] scsi: remove unused scsi_cmnd->internal_timeout field scsi_cmnd->internal_timeout field doesn't have any meaning anymore. Kill the field. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 152587de	12-Apr-2005	Jens Axboe <axboe@suse.de>	[PATCH] fix NMI lockup with CFQ scheduler The current problem seen is that the queue lock is actually in the SCSI device structure, so when that structure is freed on device release, we go boom if the queue tries to access the lock again. The fix here is to move the lock from the scsi_device to the queue. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
# 1da177e4	16-Apr-2005	Linus Torvalds <torvalds@ppc970.osdl.org>	Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!