#
2f188828 |
|
19-Dec-2023 |
Sergey Gorenko <sergeygo@nvidia.com> |
IB/iser: Prevent invalidating wrong MR The iser_reg_resources structure has two pointers to MR but only one mr_valid field. The implementation assumes that we use only *sig_mr when pi_enable is true. Otherwise, we use only *mr. However, it is only sometimes correct. Read commands without protection information occur even when pi_enable is true. For example, the following SCSI commands have a Data-In buffer but never have protection information: READ CAPACITY (16), INQUIRY, MODE SENSE(6), MAINTENANCE IN. So, we use *sig_mr for some SCSI commands and *mr for the other SCSI commands. In most cases, it works fine because the remote invalidation is applied. However, there are two cases when the remote invalidation is not applicable. 1. Small write commands when all data is sent as an immediate. 2. The target does not support the remote invalidation feature. The lazy invalidation is used if the remote invalidation is impossible. Since, at the lazy invalidation, we always invalidate the MR we want to use, the wrong MR may be invalidated. To fix the issue, we need a field per MR that indicates the MR needs invalidation. Since the ib_mr structure already has such a field, let's use ib_mr.need_inval instead of iser_reg_resources.mr_valid. Fixes: b76a439982f8 ("IB/iser: Use IB_WR_REG_MR_INTEGRITY for PI handover") Link: https://lore.kernel.org/r/20231219072311.40989-1-sergeygo@nvidia.com Acked-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Sergey Gorenko <sergeygo@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
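The per-MR flag described in commit 2f188828 can be sketched in plain C. This is only a userspace model under stated assumptions: `sketch_mr`, `sketch_reg_resources` and the three helper functions are hypothetical names invented for illustration, and only the `need_inval` idea is taken from the commit message.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct ib_mr: only the per-MR flag is modeled. */
struct sketch_mr {
	bool need_inval;
};

/* Hypothetical stand-in for iser_reg_resources: two MRs, no shared flag. */
struct sketch_reg_resources {
	struct sketch_mr mr;      /* commands without protection information */
	struct sketch_mr sig_mr;  /* commands with protection information */
};

/* Pick the MR for one command; with_pi says whether this command carries PI. */
struct sketch_mr *sketch_select_mr(struct sketch_reg_resources *rsc, bool with_pi)
{
	return with_pi ? &rsc->sig_mr : &rsc->mr;
}

/*
 * Register a buffer for one command. Returns true when the selected MR
 * first had to be invalidated lazily (locally) because its own flag was set.
 */
bool sketch_register(struct sketch_reg_resources *rsc, bool with_pi)
{
	struct sketch_mr *mr = sketch_select_mr(rsc, with_pi);
	bool lazily_invalidated = mr->need_inval;

	mr->need_inval = true;  /* registered again: must be invalidated before reuse */
	return lazily_invalidated;
}

/* The target remotely invalidated the MR that the command used. */
void sketch_remote_invalidated(struct sketch_reg_resources *rsc, bool with_pi)
{
	sketch_select_mr(rsc, with_pi)->need_inval = false;
}
```

Because each MR carries its own flag, registering a non-PI command never consumes or clears the state that belongs to `sig_mr`, which is the mix-up a single shared flag allows.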
#
d42fafb8 |
|
22-Dec-2023 |
Randy Dunlap <rdunlap@infradead.org> |
IB/iser: iscsi_iser.h: fix kernel-doc warning and spellos Drop one kernel-doc comment to prevent a warning: iscsi_iser.h:313: warning: Excess struct member 'mr' description in 'iser_device' and spell 2 words correctly (buffer and deferred). Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Max Gurtovoy <mgurtovoy@nvidia.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Leon Romanovsky <leonro@nvidia.com> Cc: linux-rdma@vger.kernel.org Link: https://lore.kernel.org/r/20231222234623.25231-1-rdunlap@infradead.org Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
d0d4df06 |
|
21-May-2022 |
Julia Lawall <Julia.Lawall@inria.fr> |
IB/iser: Fix typo in comment Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Link: https://lore.kernel.org/r/20220521111145.81697-4-Julia.Lawall@inria.fr Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
80303ee2 |
|
08-Mar-2022 |
Max Gurtovoy <mgurtovoy@nvidia.com> |
IB/iser: Generalize map/unmap dma tasks Avoid code duplication and add the mapping/unmapping of the protection buffers to the iser_dma_map_task_data/iser_dma_unmap_task_data functions. Link: https://lore.kernel.org/r/20220308145546.8372-4-mgurtovoy@nvidia.com Reviewed-by: Sergey Gorenko <sergeygo@nvidia.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
ee4efeae |
|
08-Mar-2022 |
Max Gurtovoy <mgurtovoy@nvidia.com> |
IB/iser: Use iser_fr_desc as registration context After removing the FMR support in iSER, there is only one type of registration context. Replace the void pointer with the explicit structure for registration (struct iser_fr_desc). Link: https://lore.kernel.org/r/20220308145546.8372-3-mgurtovoy@nvidia.com Reviewed-by: Sergey Gorenko <sergeygo@nvidia.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
433dc0ef |
|
15-Dec-2021 |
Max Gurtovoy <mgurtovoy@nvidia.com> |
IB/iser: Don't suppress send completions In order to complete a scsi command and guarantee that the HCA will never perform an access violation when retrying a send operation we must complete a scsi request only when both send and receive completions have arrived. This is a preparation commit that removes the send completions suppression. Next step will be taking care of the local invalidation mechanism and adding a reference counter for commands. Currently, we don't do anything upon getting the send completion and just "consume" it. Link: https://lore.kernel.org/r/20211215135721.3662-5-mgurtovoy@nvidia.com Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Israel Rukshin <israelr@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
39b169ea |
|
15-Dec-2021 |
Sergey Gorenko <sergeygo@nvidia.com> |
IB/iser: Fix RNR errors Some users complain about RNR errors on the target when heavy high-priority tasks run on the initiator. After investigation, we found that the receive WRs were exhausted because the initiator could not post them on time. Receive work requests are posted in chunks to reduce the number of hits to the HCA. The WRs are posted in the receive completion handler when the number of free receive buffers reaches the threshold. But on a heavily loaded host, receive CQE processing can be delayed and all receive WRs will be exhausted. In this case, the target will get an RNR error. To avoid this, we post each receive WR as soon as possible and not in a batch. This increases the number of hits to the HCA, but it is also the common implementation in most Linux ULPs (e.g. NVMe-oF/RDMA). As a rule of thumb, performance improvements and heuristics are being added to the RDMA core layer or vendors' low-level drivers and it's about time to align iSER as well. Link: https://lore.kernel.org/r/20211215135721.3662-3-mgurtovoy@nvidia.com Signed-off-by: Sergey Gorenko <sergeygo@nvidia.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Israel Rukshin <israelr@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
b28801a0 |
|
15-Dec-2021 |
Max Gurtovoy <mgurtovoy@nvidia.com> |
IB/iser: Remove deprecated pi_guard module param No need for this dead code. This commit doesn't change any functionality since one can still run "modprobe ib_iser pi_guard=<type>". Link: https://lore.kernel.org/r/20211215135721.3662-2-mgurtovoy@nvidia.com Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
7f13e0be |
|
26-Mar-2021 |
Wan Jiabing <wanjiabing@vivo.com> |
RDMA/iser: struct iscsi_iser_task is declared twice struct iscsi_iser_task has been declared at 201st line. Remove the duplicate. Link: https://lore.kernel.org/r/20210326113347.903976-1-wanjiabing@vivo.com Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
d56a7852 |
|
22-Jul-2020 |
Yamin Friedman <yaminf@mellanox.com> |
IB/iser: use new shared CQ mechanism Have the driver use shared CQs provided by the rdma core driver. Since this provides similar functionality to iser_comp it has been removed. Now there is no reason to allocate very large CQs when the driver is loaded while gaining the advantage of shared CQs. Link: https://lore.kernel.org/r/20200722135629.49467-1-maxg@mellanox.com Signed-off-by: Yamin Friedman <yaminf@mellanox.com> Acked-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
11708142 |
|
20-Jun-2020 |
Colton Lewis <colton.w.lewis@protonmail.com> |
RDMA: Correct trivial kernel-doc inconsistencies Silence documentation build warnings by correcting kernel-doc comments.
./drivers/infiniband/core/verbs.c:1004: warning: Function parameter or member 'uobject' not described in 'ib_create_srq_user'
./drivers/infiniband/core/verbs.c:1004: warning: Function parameter or member 'udata' not described in 'ib_create_srq_user'
./drivers/infiniband/core/umem_odp.c:161: warning: Function parameter or member 'ops' not described in 'ib_umem_odp_alloc_child'
./drivers/infiniband/core/umem_odp.c:225: warning: Function parameter or member 'ops' not described in 'ib_umem_odp_get'
./drivers/infiniband/sw/rdmavt/ah.c:104: warning: Excess function parameter 'ah_attr' description in 'rvt_create_ah'
./drivers/infiniband/sw/rdmavt/ah.c:104: warning: Excess function parameter 'create_flags' description in 'rvt_create_ah'
./drivers/infiniband/ulp/iser/iscsi_iser.h:363: warning: Function parameter or member 'all_list' not described in 'iser_fr_desc'
./drivers/infiniband/ulp/iser/iscsi_iser.h:377: warning: Function parameter or member 'all_list' not described in 'iser_fr_pool'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:148: warning: Function parameter or member 'rsvd0' not described in 'opa_vesw_info'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:148: warning: Function parameter or member 'rsvd1' not described in 'opa_vesw_info'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:148: warning: Function parameter or member 'rsvd2' not described in 'opa_vesw_info'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:148: warning: Function parameter or member 'rsvd3' not described in 'opa_vesw_info'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:148: warning: Function parameter or member 'rsvd4' not described in 'opa_vesw_info'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:205: warning: Function parameter or member 'rsvd0' not described in 'opa_per_veswport_info'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:205: warning: Function parameter or member 'rsvd1' not described in 'opa_per_veswport_info'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:205: warning: Function parameter or member 'rsvd2' not described in 'opa_per_veswport_info'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:205: warning: Function parameter or member 'rsvd3' not described in 'opa_per_veswport_info'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:342: warning: Function parameter or member 'reserved' not described in 'opa_veswport_summary_counters'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd0' not described in 'opa_veswport_error_counters'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd1' not described in 'opa_veswport_error_counters'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd2' not described in 'opa_veswport_error_counters'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd3' not described in 'opa_veswport_error_counters'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd4' not described in 'opa_veswport_error_counters'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd5' not described in 'opa_veswport_error_counters'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd6' not described in 'opa_veswport_error_counters'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd7' not described in 'opa_veswport_error_counters'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd8' not described in 'opa_veswport_error_counters'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:394: warning: Function parameter or member 'rsvd9' not described in 'opa_veswport_error_counters'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:460: warning: Function parameter or member 'reserved' not described in 'opa_vnic_vema_mad'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:485: warning: Function parameter or member 'reserved' not described in 'opa_vnic_notice_attr'
./drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.h:500: warning: Function parameter or member 'reserved' not described in 'opa_vnic_vema_mad_trap'
Link: https://lore.kernel.org/r/5373936.DvuYhMxLoT@laptop.coltonlewis.name Signed-off-by: Colton Lewis <colton.w.lewis@protonmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
1fc43132 |
|
28-May-2020 |
Israel Rukshin <israelr@mellanox.com> |
RDMA/iser: Remove support for FMR memory registration FMR is not supported on most recent RDMA devices (that use fast memory registration mechanism). Also, FMR was recently removed from NFS/RDMA ULP. Link: https://lore.kernel.org/r/1-v3-f58e6669d5d3+2cf-fmr_removal_jgg@mellanox.com Signed-off-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
28f2a6ae |
|
09-Oct-2019 |
rd.dunlab@gmail.com <rd.dunlab@gmail.com> |
infiniband: fix ulp/iser/iscsi_iser.h kernel-doc warnings Fix kernel-doc warnings and typos/spellos.
../drivers/infiniband/ulp/iser/iscsi_iser.h:254: warning: Function parameter or member 'dma_addr' not described in 'iser_tx_desc'
../drivers/infiniband/ulp/iser/iscsi_iser.h:254: warning: Function parameter or member 'cqe' not described in 'iser_tx_desc'
../drivers/infiniband/ulp/iser/iscsi_iser.h:254: warning: Function parameter or member 'reg_wr' not described in 'iser_tx_desc'
../drivers/infiniband/ulp/iser/iscsi_iser.h:254: warning: Function parameter or member 'send_wr' not described in 'iser_tx_desc'
../drivers/infiniband/ulp/iser/iscsi_iser.h:254: warning: Function parameter or member 'inv_wr' not described in 'iser_tx_desc'
../drivers/infiniband/ulp/iser/iscsi_iser.h:277: warning: Function parameter or member 'cqe' not described in 'iser_rx_desc'
../drivers/infiniband/ulp/iser/iscsi_iser.h:296: warning: Function parameter or member 'rsp' not described in 'iser_login_desc'
../drivers/infiniband/ulp/iser/iscsi_iser.h:339: warning: Function parameter or member 'reg_mem' not described in 'iser_reg_ops'
../drivers/infiniband/ulp/iser/iscsi_iser.h:399: warning: Function parameter or member 'all_list' not described in 'iser_fr_desc'
../drivers/infiniband/ulp/iser/iscsi_iser.h:413: warning: Function parameter or member 'all_list' not described in 'iser_fr_pool'
../drivers/infiniband/ulp/iser/iscsi_iser.h:439: warning: Function parameter or member 'reg_cqe' not described in 'ib_conn'
../drivers/infiniband/ulp/iser/iscsi_iser.h:491: warning: Function parameter or member 'snd_w_inv' not described in 'iser_conn'
This leaves 2 "member not described" warnings that I don't know how to fix:
../drivers/infiniband/ulp/iser/iscsi_iser.h:401: warning: Function parameter or member 'all_list' not described in 'iser_fr_desc'
../drivers/infiniband/ulp/iser/iscsi_iser.h:415: warning: Function parameter or member 'all_list' not described in 'iser_fr_pool'
Link: https://lore.kernel.org/r/20191010035239.756365352@gmail.com Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
df130f87 |
|
09-Oct-2019 |
rd.dunlab@gmail.com <rd.dunlab@gmail.com> |
infiniband: fix ulp/iser/iscsi_iser.[hc] kernel-doc notation Fix struct name in kernel-doc notation to match the struct name below it. Fix one typo (spello). Fix formatting as expected for kernel-doc notation. Fix parameter name to match the function's parameter name to eliminate a kernel-doc warning. ../drivers/infiniband/ulp/iser/iscsi_iser.c:815: warning: Function parameter or member 'non_blocking' not described in 'iscsi_iser_ep_connect' Link: https://lore.kernel.org/r/20191010035239.623888112@gmail.com Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
6eeff06d |
|
24-Sep-2019 |
Max Gurtovoy <maxg@mellanox.com> |
IB/iser: remove redundant macro definitions Use the general linux definition for 4K and retrieve the rest from it. Link: https://lore.kernel.org/r/1569359148-12312-1-git-send-email-maxg@mellanox.com Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
1ba7c8f8 |
|
12-Sep-2019 |
Sergey Gorenko <sergeygo@mellanox.com> |
IB/iser: Support up to 16MB data transfer in a single command Maximum supported IO size is 8MB for the iSER driver. The current value is limited by the ISCSI_ISER_MAX_SG_TABLESIZE macro. But the driver is able to handle 16MB IOs without any significant changes. Increasing this limit can be useful for storage arrays that are fine tuned for IOs larger than 8MB. This commit allows configuring a maximum IO size of up to 16MB using the max_sectors module parameter. Link: https://lore.kernel.org/r/20190912103534.18210-1-sergeygo@mellanox.com Signed-off-by: Sergey Gorenko <sergeygo@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
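The size limits in commit 1ba7c8f8 can be checked with a little arithmetic. The sketch below assumes 512-byte sectors for max_sectors and the 4K page granularity mentioned elsewhere in this history; the macro and function names are illustrative and are not taken from the driver.

```c
#include <stdint.h>

#define SKETCH_SECTOR_BYTES 512u   /* one sector as counted by max_sectors */
#define SKETCH_PAGE_BYTES   4096u  /* 4K registration page granularity */

/* Bytes covered by a given max_sectors value. */
uint64_t sketch_sectors_to_bytes(uint32_t max_sectors)
{
	return (uint64_t)max_sectors * SKETCH_SECTOR_BYTES;
}

/* Number of 4K pages needed to describe a page-aligned transfer of that size. */
uint32_t sketch_sectors_to_pages(uint32_t max_sectors)
{
	uint64_t bytes = sketch_sectors_to_bytes(max_sectors);

	return (uint32_t)((bytes + SKETCH_PAGE_BYTES - 1) / SKETCH_PAGE_BYTES);
}
```

Under these assumptions, a 16MB transfer corresponds to max_sectors=32768 and 4096 page entries, twice the 2048 entries that an 8MB transfer needs.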
#
b9294f8b |
|
11-Jun-2019 |
Israel Rukshin <israelr@mellanox.com> |
IB/iser: Unwind WR union at iser_tx_desc After decreasing WRs array size from 7 to 3 it is more readable to give each WR a descriptive name. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
b76a4399 |
|
11-Jun-2019 |
Israel Rukshin <israelr@mellanox.com> |
IB/iser: Use IB_WR_REG_MR_INTEGRITY for PI handover Using this new API reduces iSER code complexity. It also reduces the maximum number of work requests per task and the need of dealing with multiple MRs (and their registrations and invalidations) per task. It is done by using a single WR and a special MR type (IB_MR_TYPE_INTEGRITY) for PI operation.
The setup of the tested benchmark:
- 2 servers with 24 cores (1 initiator and 1 target)
- 24 target sessions with 1 LUN each
- ramdisk backstore
- PI active
Performance results running fio (24 jobs, 128 iodepth) using write_generate=0 and read_verify=0 (w/w.o patch):
bs     IOPS(read)         IOPS(write)
----   ----------         ----------
512    1236.6K/1164.3K    1357.2K/1332.8K
1k     1196.5K/1163.8K    1348.4K/1262.7K
2k     1016.7K/921950     1003.7K/931230
4k     662728/600545      595423/501513
8k     385954/384345      333775/277090
16k    222864/222820      170317/170671
32k    116869/114896      82331/82244
64k    55205/54931        40264/40021
Using write_generate=1 and read_verify=1 (w/w.o patch):
bs     IOPS(read)         IOPS(write)
----   ----------         ----------
512    1090.1K/1030.9K    1303.9K/1101.4K
1k     1057.7K/904583     1318.4K/988085
2k     965226/638799      1008.6K/692514
4k     555479/410151      542414/414517
8k     298675/224964      264729/237508
16k    133485/122481      164625/138647
32k    74329/67615        80143/78743
64k    35716/35519        39294/37334
We get performance improvement at all block sizes. The most significant improvement is when writing 4k bs (almost 30% more iops). Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
19b1a294 |
|
24-Feb-2019 |
Erez Alfasi <ereza@mellanox.com> |
RDMA: Use __packed annotation instead of __attribute__ ((packed)) The "__attribute__" set of shortcut macros was standardized back in v2.6.21 by commit 82ddcb040 ("[PATCH] extend the set of "__attribute__" shortcut macros"), making code potentially more portable and consistent. Moreover, nowadays checkpatch.pl warns about using __attribute__((packed)) instead of __packed. This patch converts all the "__attribute__ ((packed))" annotations to "__packed" within the RDMA subsystem. Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
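The effect of the annotation in commit 19b1a294 can be seen in a small userspace sketch. The `__packed` definition below is a stand-in for the kernel shortcut macro and assumes GCC or Clang; the two structures are hypothetical and exist only to show the layout difference.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's __packed shortcut (GCC/Clang only). */
#ifndef __packed
#define __packed __attribute__((__packed__))
#endif

/* Hypothetical header with natural alignment: padding follows 'flags'. */
struct sketch_hdr_padded {
	uint8_t  flags;
	uint64_t addr;
};

/* Same fields with the short annotation: no padding, 9 bytes in total. */
struct sketch_hdr_packed {
	uint8_t  flags;
	uint64_t addr;
} __packed;
```

Both spellings produce the same layout; the commit only changes which spelling appears in the source.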
#
c1545f1a |
|
25-Feb-2019 |
Max Gurtovoy <maxg@mellanox.com> |
IB/iser: Fix dma_nents type definition The returned value from ib_dma_map_sg is saved in the dma_nents variable. To avoid future mismatch between types, define dma_nents as an integer instead of unsigned. Fixes: 57b26497fabe ("IB/iser: Pass the correct number of entries for dma mapped SGL") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
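The kind of type mismatch that commit c1545f1a guards against can be illustrated as follows. This sketch assumes a hypothetical mapping helper, `sketch_map_sg`, that may report failure with a non-positive `int`; it does not claim to reproduce how ib_dma_map_sg itself reports errors.

```c
#include <stdbool.h>

/* Hypothetical mapping helper: number of mapped entries, or -1 on failure. */
int sketch_map_sg(int nents, bool fail)
{
	return fail ? -1 : nents;
}

/* Storing the result in an unsigned variable hides a negative return value. */
bool sketch_failed_unsigned(int nents, bool fail)
{
	unsigned int dma_nents = sketch_map_sg(nents, fail); /* -1 wraps to UINT_MAX */

	return dma_nents == 0;  /* catches only the value 0 */
}

/* Keeping the same signed type as the helper preserves the failure value. */
bool sketch_failed_int(int nents, bool fail)
{
	int dma_nents = sketch_map_sg(nents, fail);

	return dma_nents <= 0;
}
```

Matching the variable type to the function's return type keeps any later comparison meaningful, which appears to be the intent of the one-line change.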
#
c6c2c03a |
|
31-May-2018 |
Max Gurtovoy <maxg@mellanox.com> |
IB/iser: use T10-PI check mask definitions from core layer No reason to re-define protection information check in ib_iser driver. Use check masks from RDMA core driver. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
434dda42 |
|
21-May-2018 |
Sergey Gorenko <sergeygo@mellanox.com> |
IB/iser: Do not reduce max_sectors The iSER driver reduces max_sectors. For example, if you load the ib_iser module with max_sectors=1024, you will see that /sys/class/block/<bdev>/queue/max_hw_sectors_kb is 508. It is an incorrect value. The expected value is (max_sectors * sector_size) / 1024 = 512. Reducing of max_sectors can cause performance degradation due to unnecessary splitting of IO requests. The number of pages per MR has been fixed here, so there is no longer any need to reduce max_sectors. Fixes: 9c674815d346 ("IB/iser: Fix max_sectors calculation") Signed-off-by: Sergey Gorenko <sergeygo@mellanox.com> Reviewed-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
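The numbers quoted in commit 434dda42 can be reproduced with a short calculation. The first function follows the formula given in the message; the second is purely an assumption made for illustration, modeling the old reduction as "one 4K page fewer", which happens to match the reported value of 508.

```c
/* Expected max_hw_sectors_kb: (max_sectors * sector_size) / 1024. */
unsigned int sketch_expected_kb(unsigned int max_sectors)
{
	return max_sectors * 512u / 1024u;
}

/*
 * Assumed model of the old behaviour: the transfer is counted in 4K pages
 * and one page is held back, shrinking the limit by 4 KB.
 */
unsigned int sketch_reduced_kb(unsigned int max_sectors)
{
	unsigned int pages = max_sectors * 512u / 4096u;

	return (pages - 1u) * 4096u / 1024u;
}
```

With max_sectors=1024, the first function gives 512 and the assumed model gives 508, so a 512 KB request would no longer fit in one command and would have to be split.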
#
ea174c95 |
|
27-Feb-2017 |
Sagi Grimberg <sagi@grimberg.me> |
RDMA/iser: Fix possible mr leak on device removal event When the rdma device is removed, we must cleanup all the rdma resources within the DEVICE_REMOVAL event handler to let the device teardown gracefully. When this happens with live I/O, some memory regions are occupied. Thus, track them too and dereg all the mr's. We are safe with mr access by iscsi_iser_cleanup_task. Reported-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
83236f01 |
|
17-Jan-2017 |
Max Gurtovoy <maxg@mellanox.com> |
IB/iser: remove unused variable from iser_conn struct max_sectors calculation was fixed in commit: 9c674815d346 ("IB/iser: Fix max_sectors calculation"). Thus, iser_conn variable scsi_max_sectors is not needed anymore. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
8e61212d |
|
04-Sep-2016 |
Christoph Hellwig <hch@lst.de> |
IB/iser: use IB_PD_UNSAFE_GLOBAL_RKEY Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
4c8ba94d |
|
17-Feb-2016 |
Steve Wise <larrystevenwise@gmail.com> |
IB/iser: Use ib_drain_sq() Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
59caaed7 |
|
23-Dec-2015 |
Jenny Derzhavetz <jennyf@mellanox.com> |
IB/iser: Support the remote invalidation exception Declare that we support remote invalidation in case we are: 1. using fastreg method 2. always registering memory Detect the invalidated rkey from the work completion info so we won't invalidate it locally. The spec mandates that we must not rely on the target to remotely invalidate our rkey, so we must check it upon a receive (scsi response) completion. Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
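The check that commit 59caaed7 describes, reading the invalidated rkey from the completion and skipping local invalidation only when it matches, might look roughly like this. All structures, the flag value and the field names are hypothetical userspace stand-ins loosely modeled on the verbs work completion; they are not the driver's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit meaning "this completion carries an invalidated rkey". */
#define SKETCH_WC_WITH_INVALIDATE 0x1u

/* Hypothetical, reduced work completion. */
struct sketch_wc {
	uint32_t wc_flags;
	uint32_t invalidate_rkey;
};

/* Hypothetical registration state for one command. */
struct sketch_reg {
	uint32_t rkey;
	bool     needs_local_inval;
};

/*
 * On a response completion, trust the remote invalidation only if the target
 * reported one AND it names exactly the rkey we handed out. Otherwise the
 * initiator still has to invalidate locally.
 */
void sketch_check_remote_inv(const struct sketch_wc *wc, struct sketch_reg *reg)
{
	if ((wc->wc_flags & SKETCH_WC_WITH_INVALIDATE) &&
	    wc->invalidate_rkey == reg->rkey)
		reg->needs_local_inval = false;
}
```

Leaving the flag set in every other case is the conservative choice: the initiator never assumes an invalidation that the completion did not confirm.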
#
d3cf81f9 |
|
09-Dec-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser,isert: Create and use new shared header The iser RDMA_CM negotiation protocol is shared by the initiator and the target, so have a shared header for the defines and structure. Move relevant items from the initiator and target headers. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
b5f04b00 |
|
09-Dec-2015 |
Jenny Derzhavetz <jennyf@mellanox.com> |
IB/iser: Don't register memory for all immediate data writes When all the task data is sent as immediate data, we are allowed to use the local_dma_lkey as it is not sent to the wire. Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
bfe066e2 |
|
09-Dec-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Reuse ib_sg_to_pages We have in iser iser_sg_to_page_vec which has exactly the same role as ib_sg_to_pages. Customize the page_vec to hold a fake MR so we can reuse ib_sg_to_pages. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
2392a4cd |
|
28-Nov-2015 |
Julia Lawall <Julia.Lawall@lip6.fr> |
IB/iser: constify iser_reg_ops structure The iser_reg_ops structures are never modified, so declare them as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Acked-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
4a061b28 |
|
18-Dec-2015 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB/ulps: Avoid calling ib_query_device Instead, use the cached copy of the attributes present on the device. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
cfeb91b3 |
|
11-Dec-2015 |
Christoph Hellwig <hch@lst.de> |
IB/iser: Convert to CQ abstraction Use the new CQ abstraction to simplify completions in the iSER initiator. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
7edc5a99 |
|
04-Nov-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Use helper for container_of Nicer this way. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
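For readers unfamiliar with the idiom named in commit 7edc5a99, here is a minimal userspace sketch of a container_of helper. The macro is a simplified equivalent of the kernel's, and the structure and helper names are hypothetical; the commit message does not say which structures the real helper covers.

```c
#include <stddef.h>

/* Simplified userspace equivalent of the kernel's container_of(). */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical inner object that callbacks receive a pointer to. */
struct sketch_ib_conn {
	int post_recv_count;
};

/* Hypothetical outer object that embeds the inner one. */
struct sketch_iser_conn {
	int state;
	struct sketch_ib_conn ib_conn;
};

/* Small named helper so call sites do not repeat the macro arguments. */
struct sketch_iser_conn *sketch_to_iser_conn(struct sketch_ib_conn *ib_conn)
{
	return sketch_container_of(ib_conn, struct sketch_iser_conn, ib_conn);
}
```

Wrapping the macro in a typed helper gives each call site a single argument and a checked return type, which is presumably what "nicer this way" refers to.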
#
0f512b34 |
|
04-Nov-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Use a dedicated descriptor for login We'll need it later with the new CQ abstraction. also switch login bufs to void pointers. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
39405885 |
|
13-Oct-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Port to new fast registration API Remove fastreg page list allocation as the page vector is now private to the provider. Instead of constructing the page list and fast_req work request, call ib_map_mr_sg and construct ib_reg_wr. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
dd0107a0 |
|
13-Oct-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: set block queue_virt_boundary The block layer can reliably guarantee that SG lists won't contain gaps (page unaligned) if a driver sets the queue virt_boundary. With this setting the block layer will: - refuse merges if bios are not aligned to the virtual boundary - split bios/requests that are not aligned to the virtual boundary - or, bounce buffer SG_IOs that are not aligned to the virtual boundary Since iser is working in 4K page size, set the virt_boundary to 4K pages. With this setting, we can now safely remove the bounce buffering logic in iser. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
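The "no gaps" property that commit dd0107a0 relies on can be expressed as a small check. This is a sketch under stated assumptions: `sketch_seg` is a hypothetical segment type, and the rule encoded here (every segment but the first starts on the boundary, every segment but the last ends on it) is the usual reading of a virtual boundary, not code taken from the block layer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scatter-gather segment: start address and length in bytes. */
struct sketch_seg {
	uint64_t addr;
	uint32_t len;
};

/*
 * Return true if the list has a gap with respect to boundary_mask
 * (for 4K pages the mask is 4096 - 1). Only the first segment may start
 * unaligned and only the last segment may end unaligned.
 */
bool sketch_has_gaps(const struct sketch_seg *sg, size_t n, uint64_t boundary_mask)
{
	for (size_t i = 0; i < n; i++) {
		if (i > 0 && (sg[i].addr & boundary_mask))
			return true;
		if (i + 1 < n && ((sg[i].addr + sg[i].len) & boundary_mask))
			return true;
	}
	return false;
}
```

A list that passes this check can be described by one contiguous page vector, which is why bounce buffering becomes unnecessary once the block layer enforces the boundary.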
#
e622f2f4 |
|
08-Oct-2015 |
Christoph Hellwig <hch@lst.de> |
IB: split struct ib_send_wr This patch splits up struct ib_send_wr so that all non-trivial verbs use their own structure which embeds struct ib_send_wr. This dramatically shrinks the size of a WR for most common operations:
sizeof(struct ib_send_wr) (old): 96
sizeof(struct ib_send_wr): 48
sizeof(struct ib_rdma_wr): 64
sizeof(struct ib_atomic_wr): 96
sizeof(struct ib_ud_wr): 88
sizeof(struct ib_fast_reg_wr): 88
sizeof(struct ib_bind_mw_wr): 96
sizeof(struct ib_sig_handover_wr): 80
And with Sagi's pending MR rework the fast registration WR will also be down to a reasonable size:
sizeof(struct ib_fastreg_wr): 64
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt] Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc] Tested-by: Haggai Eran <haggaie@mellanox.com> Tested-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Steve Wise <swise@opengridcomputing.com>
|
#
3cffd930 |
|
24-Sep-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Add module parameter for always register memory This module parameter forces memory registration even for a continuous memory region. It is true by default as sending an all-physical rkey with remote permissions might be insecure. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
7332bed0 |
|
06-Aug-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Chain all iser transaction send work requests Chaining of send work requests benefits performance by reducing the send queue lock contention (acquired in ib_post_send) and saves us HW doorbells, as the chain is posted only once. Currently, in normal IO flows iser does not chain the CDB send work request with the registration work request. Also in PI flows, signature work requests are not chained. Let's chain those and post only once. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
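The benefit claimed in commit 7332bed0 can be modeled with a linked list of work requests. The types and functions below are hypothetical mocks: `sketch_post_send` merely counts one "doorbell" per call, standing in for the lock acquisition and hardware notification that a real post would involve.

```c
#include <stddef.h>

/* Hypothetical work request: only the chaining pointer matters here. */
struct sketch_wr {
	int opcode;
	struct sketch_wr *next;
};

/* Hypothetical queue pair that counts posted WRs and doorbells. */
struct sketch_qp {
	int posted;
	int doorbells;
};

/* Mock post: walks the whole chain, but rings the doorbell once per call. */
void sketch_post_send(struct sketch_qp *qp, struct sketch_wr *first)
{
	for (struct sketch_wr *wr = first; wr != NULL; wr = wr->next)
		qp->posted++;
	qp->doorbells++;
}

/* Link the registration WR to the send WR and post both in one call. */
void sketch_post_chained(struct sketch_qp *qp, struct sketch_wr *reg,
			 struct sketch_wr *send)
{
	reg->next = send;
	send->next = NULL;
	sketch_post_send(qp, reg);
}
```

Posting the two requests separately would cost two calls and two doorbells in this model; chaining them halves both.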
#
df749cdc |
|
06-Aug-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Support up to 8MB data transfer in a single command iser supports up to 512KB data transfer in a single scsi command. This means that larger IOs will be split into different requests. While iser can easily saturate FDR/EDR wires, some arrays are fine tuned for 1MB (or larger) IO sizes, hence add an option to support larger transfers (up to 8MB) if the device allows it. Given that a few target implementations don't support data transfers of more than 512KB by default and the fact that larger IO sizes require more resources, we introduce a module parameter to determine the maximum number of 512B sectors in a single scsi command. Users that are interested in larger transfers can change this value given that the target supports larger transfers. At the moment, iser works in 4K page granularity. At a later stage we will get it to work with the system page size instead. IO operations that consist of N pages will need a page vector of size N+1 in case the first SG element contains an offset. Given that some devices allocate memory regions in powers of 2, this means that allocating a region with N+1 pages will result in region resources allocation of the next power of 2. Since we don't want that to happen, in case we are at the limit of the supported IO size and the first SG element has an offset, we align the SG list using a bounce buffer (which is OK given that this is not likely to happen a lot). Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
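The N+1 page-vector problem described in commit df749cdc is easy to verify numerically. The sketch below assumes 4K pages, as the message states; the function names are illustrative, and the power-of-two rounding models the device behaviour the message describes rather than any specific device.

```c
#include <stdint.h>

#define SKETCH_PAGE_4K 4096u

/* Page-vector entries needed for 'len' bytes starting 'first_offset' bytes into a 4K page. */
uint32_t sketch_pages_needed(uint64_t len, uint32_t first_offset)
{
	return (uint32_t)((first_offset + len + SKETCH_PAGE_4K - 1) / SKETCH_PAGE_4K);
}

/* Round up to the next power of two (valid for 1 <= n <= 2^31). */
uint32_t sketch_roundup_pow2(uint32_t n)
{
	uint32_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}
```

An aligned 8MB transfer needs 2048 entries, but the same transfer with a 512-byte offset in its first element needs 2049, which a power-of-two allocator would round up to 4096. That doubling is the cost the bounce buffer avoids.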
#
f8db651d |
|
06-Aug-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Pass registration pool a size parameter Hard coded for now. This will allow allocating different sized MRs depending on the IO size needed (and device capabilities). This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
32467c42 |
|
06-Aug-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Unify fast memory registration flows iser_reg_rdma_mem_[fastreg|fmr] share a lot of code, and logically do the same thing other than the buffer registration method itself (iser_fast_reg_mr vs. iser_fast_reg_fmr). The DIF logic is not implemented in the FMR flow as there is no existing device that supports FMRs and Signature feature. This patch unifies the flow in a single routine iser_reg_rdma_mem and just split to fmr/frwr for the buffer registration itself. Also, for symmetry reasons, unify iser_unreg_rdma_mem (which will call the relevant device specific unreg routine). Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
81722909 |
|
06-Aug-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Make reg_desc_get a per device routine For FMRs we will hold a single registration descriptor, since there is no need for multiple descriptors as in the frwr mode (a descriptor for each task). This change helps unify the duplicate registration code paths. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
2b3bf958 |
|
06-Aug-2015 |
Adir Lev <adirl@mellanox.com> |
IB/iser: Maintain connection fmr_pool under a single registration descriptor This will allow us to unify the memory registration code path between the various methods which vary by the device capabilities. This change will make it easier and less intrusive to remove fmr_pools from the code when we'd want to. The reason we use a single descriptor is to avoid taking a redundant spinlock when working with FMRs. We also change the signature of iser_reg_page_vec to make it match iser_fast_reg_mr (and the future indirect registration method). Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
385ad87d |
|
06-Aug-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Introduce iser registration pool struct Instead of having it a part of the connection structure, have it be under a dedicated (embedded) structure in the connection. A logical separation of the registration pool and the connection structure. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
48afbff6 |
|
06-Aug-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Introduce iser_reg_ops Move all the per-device function pointers to an easy extensible iser_reg_ops structure that contains all the iser registration operations. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
5190cc26 |
|
06-Aug-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Rename struct fast_reg_descriptor -> iser_fr_desc Avoid struct names without iser_ prefix. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
d711d81d |
|
06-Aug-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Introduce struct iser_reg_resources Have fast_reg_descriptor hold struct iser_reg_resources (mr, frpl, valid flag). This will be useful when the actual buffer registration routines will be passed with the needed registration resources (i.e. iser_reg_resources) without being aware of their nature (i.e. data or protection). In order to achieve this, we remove reg_indicators flags container and place specific flags (mr_valid) within iser_reg_resources struct. We also place the sig_mr_valid and sig_protcted flags in iser_pi_context. This patch also modifies iser_fast_reg_mr to receive the reg_resources instead of the fast_reg_descriptor and a data/protection indicator. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
8d5944d8 |
|
06-Aug-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Fix possible bogus DMA unmapping If iser_initialize_task_headers() routine failed before dma mapping, we should not attempt to unmap in cleanup_task(). Fixes: 7414dde0a6c3a958e (IB/iser: Fix race between iser connection ...) Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
ba943fb2 |
|
14-Apr-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Rewrite bounce buffer code path In some rare cases, IO operations may be not aligned to page boundaries. This prevents iser from performing fast memory registration. In order to overcome that iser uses a bounce buffer to carry the transaction. We basically allocate a buffer in the size of the transaction and perform a copy. The buffer allocation using kmalloc is too restrictive since it requires higher order (atomic) allocations for large transactions (which may result in memory exhaustion fairly fast for some workloads). We rewrite the bounce buffer code path to allocate scattered pages and perform a copy between the transaction sg and the bounce sg. Reported-by: Alex Lyakas <alex@zadarastorage.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
4fcd1470 |
|
14-Apr-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Bump version to 1.6 Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
90a6684c |
|
14-Apr-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Modify struct iser_mem_reg members No need to keep lkey, va, len variables, we can keep them as struct ib_sge. This will help when we change the memory registration logic. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
bd8b944e |
|
14-Apr-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Move fastreg descriptor pool get/put to helper functions Instead of open-coding connection fastreg pool get/put, we introduce iser_reg_desc[get|put] helpers. We aren't setting these static as this will be a per-device routine later on. Also, cleanup iser_unreg_rdma_mem_fastreg a bit. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
b130eded |
|
14-Apr-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Get rid of struct iser_rdma_regd The members of this struct other than struct iser_mem_reg are unused, so remove it altogether. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
d03e61d0 |
|
14-Apr-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Move memory reg/dereg routines to iser_memory.c These are memory registration/de-registration methods, so let's move them to their natural location. While we're at it, make the iser_reg_page_vec routine static. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
e3784bd1 |
|
14-Apr-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Remove a redundant struct iser_data_buf No need to keep two iser_data_buf structures just in case we use mem copy. We can avoid that just by adding a pointer to the original sg. So keep only two iser_data_buf per command (data and protection) and pass the relevant data_buf to bounce buffer routine. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
c6c95ef4 |
|
28-Dec-2014 |
Roi Dayan <roid@mellanox.com> |
IB/iser: Use correct dma direction when unmapping SGs We always unmap SGs with the same direction instead of unmapping with the direction the mapping was done, fix that. Fixes: 9a8b08fad2ef ("IB/iser: Generalize iser_unmap_task_data and [...]") Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
056da88f |
|
07-Dec-2014 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB/iser: Bump version to 1.5 Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
60e20908 |
|
07-Dec-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Micro-optimize iser logging And fix a checkpatch warning. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
da64bdb2 |
|
07-Dec-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Use more completion queues No reason to settle with four, can use the min between device max comp vectors and number of cores. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
7e1fd4d1 |
|
07-Dec-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Remove redundant is_mr indicator It is enough to check mem_h pointer assignment, mem_h == NULL will indicate that buffer is not registered using mr. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
6ec9d4d2 |
|
07-Dec-2014 |
Max Gurtovoy <maxg@mellanox.com> |
IB/iser: Fix possible SQ overflow Fix a regression that was introduced in commit 6df5a128f0fd ("IB/iser: Suppress scsi command send completions"). The sig_count was wrongly set to be a static variable, thus it is possible that we won't reach the (sig_count % ISER_SIGNAL_BATCH) == 0 condition (due to races) and the send queue will overflow. Instead keep sig_count per connection. We don't need it to be atomic as we are safe under the iscsi session frwd_lock taken by libiscsi on the queuecommand path. Fixes: 6df5a128f0fd ("IB/iser: Suppress scsi command send completions") Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
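A minimal sketch of the per-connection counter this fix describes, assuming a batch size of 32 as in the commit it repairs. The names `sketch_conn`, `SKETCH_SIGNAL_BATCH` and `sketch_need_signal` are hypothetical and do not come from the driver. The point is only that each connection counts its own posts, so another connection's traffic cannot make it skip the modulo condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed batch size: one signalled completion per 32 command sends. */
#define SKETCH_SIGNAL_BATCH 32

/* Hypothetical stand-in for the per-connection state holding the counter. */
struct sketch_conn {
    unsigned int sig_count;
};

/*
 * Returns true when this send should request a completion. The counter
 * lives in the connection, not in a function-level static, so it is only
 * advanced by posts on that connection. The caller is assumed to hold the
 * session lock, which is why no atomics are used.
 */
bool sketch_need_signal(struct sketch_conn *conn)
{
    assert(conn != NULL);
    return (++conn->sig_count % SKETCH_SIGNAL_BATCH) == 0;
}
```

With a shared static counter, concurrent connections could advance it between one connection's posts, so that connection might never observe a multiple of the batch size and would never reap its send queue entries.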
|
#
f4641ef7 |
|
07-Dec-2014 |
Minh Tran <minhduc.tran@emulex.com> |
IB/iser: Re-adjust CQ and QP send ring sizes to HW limits Re-adjust max CQEs per CQ and max send_wr per QP according to the resource limits supported by underlying hardware. Signed-off-by: Minh Tran <minhduc.tran@emulex.com> Signed-off-by: Jayamohan Kallickal <jayamohan.kallickal@emulex.com> Acked-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
b261aeaf |
|
01-Oct-2014 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB/iser: Bump version, add maintainer Update the driver version and add Sagi Grimberg as maintainer Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
cd88621a |
|
01-Oct-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Add/Fix kernel doc style descriptions in iscsi_iser.h - iser_hdr - iser_data_buf - iser_mem_reg - iser_regd_buf - iser_tx_desc - iser_rx_desc - iser_device - iser_pi_context - iser_conn - ib_conn - iser_comp - iscsi_iser_task - iser_global While we're at it, change nit alignments in this file This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
e9d49b82 |
|
01-Oct-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Nit - add space after __func__ in iser logging Change logging: "iser:XXXX" to "iser: XXXX" Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
6df5a128 |
|
01-Oct-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Suppress scsi command send completions Signal completion of every 32 scsi commands and suppress all the rest. We don't do anything upon getting the completion so no need to "just consume" it. Cleanup of scsi command is done in cleanup_task callback. Still keep dataout and control send completions as we may need to cleanup there. This helps reduce the number of interrupts/completions in the IO path. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
6e6fe2fb |
|
01-Oct-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Optimize completion polling Poll in batch of 16. Since we don't want it on the stack, keep under iser completion context (iser_comp). Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
ff3dd52d |
|
01-Oct-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Use beacon to indicate all completions were consumed Avoid post_send counting (atomic) in the IO path just to keep track of how many completions we need to consume. Use a beacon post to indicate that all prior posts completed. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
6aabfa76 |
|
01-Oct-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Use single CQ for RX and TX This will solve a possible condition where we might miss TX completion (flush error) during session teardown. Since we are using a single CQ, we don't need to actively drain the TX CQ, instead just wait for flush_completion (when counters reach zero) and remove iser_poll_for_flush_errors(). This patch might introduce a minor performance regression on its own, but the next patches will enhance performance using a single CQ for RX and TX. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
bf175540 |
|
01-Oct-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Centralize iser completion contexts Introduce iser_comp which centralizes all iser completion related items and is referenced by iser_device and each ib_conn. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
c47a3c9e |
|
01-Oct-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Fix DEVICE REMOVAL handling in the absence of iscsi daemon iscsi daemon is in user-space, thus we can't rely on it to be invoked at connection teardown (if not running or does not receive CPU time). This patch addresses the issue by re-structuring iSER connection teardown logic and CM events handling. The CM events will dictate the RDMA resources destruction (ib_conn) and iser_conn is kept around as long as iscsi_conn is left around allowing iscsi/iser callbacks to continue after RDMA transport was destroyed. This patch introduces a separation in logic when handling CM events: - DISCONNECTED_HANDLER, ADDR_CHANGED This events indicate the start of teardown process. Actions: 1. Terminate the connection: rdma_disconnect (send DREQ/DREP) 2. Notify iSCSI of connection failure 3. Change state to TERMINATING 4. Poll for all flush errors to be consumed - TIMEWAIT_EXIT, DEVICE_REMOVAL These events indicate the final stage of termination process and we can free RDMA related resources. Actions: 1. Call disconnected handler (we are not guaranteed that DISCONNECTED event was invoked in the past) 2. Cleanup RDMA related resources 3. For DEVICE_REMOVAL return non-zero rc from cma_handler to implicitly destroy the cm_id (Can't rely on user-space, make sure we have forward progress) We replace flush_completion (indicate all flushes were consumed) with ib_completion (rdma resources were cleaned up). The iser_conn_release_work will wait for teardown completions: - conn_stop was completed (tasks were cleaned-up) - stop_completion - RDMA resources were destroyed - ib_completion And then will continue to free iser connection representation (iser_conn). Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
6bb0279f |
|
01-Oct-2014 |
Roi Dayan <roid@mellanox.com> |
IB/iser: Remove unused variables and dead code Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
a4ee3539 |
|
01-Oct-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Re-introduce ib_conn Structure that describes the RDMA related connection objects. Static member of iser_conn. This patch does not change any functionality Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
5716af6e |
|
01-Oct-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Rename ib_conn -> iser_conn Two reasons why we choose to do this: 1. No point today calling struct iser_conn by another name ib_conn 2. In the next patches we will restructure iser control plane representation - struct iser_conn: connection logical representation - struct ib_conn: connection RDMA layout representation This patch does not change any functionality. Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
61aabb3c |
|
02-Sep-2014 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB/iser: Bump version to 1.4.1 Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
9a6d3234 |
|
31-Jul-2014 |
Ariel Nahum <arieln@mellanox.com> |
IB/iser: Replace connection waitqueue with completion object Instead of waiting for events and condition changes of the iser connection state, we wait for explicit completion of connection establishment and teardown. Separate connection establishment wait object from the teardown object to avoid a situation where racing connection establishment and teardown may concurrently wakeup each other. ep_poll will wait for up_completion invoked by iser_connected_handler() and iser release worker will wait for flush_completion before releasing the connection. Bound the completion wait with a 30 seconds timeout for cases where iscsid (the user space iscsi daemon) is too slow or gone. Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
504130c0 |
|
31-Jul-2014 |
Ariel Nahum <arieln@mellanox.com> |
IB/iser: Protect iser state machine with a mutex The iser connection state lookups and transitions are not fully protected. Some transitions are protected with a spinlock, and in some cases the state is accessed unprotected due to specific assumptions of the flow. Introduce a new mutex to protect the connection state access. We use a mutex since we also need to allow scheduling operations to be executed under the state lock. Each state transition/condition and its corresponding action will be protected with the state mutex. The rdma_cm events handler acquires the mutex when handling connection events. Since iser connection state can transition to DOWN concurrently during connection establishment, we bail out from addr/route resolution events when the state is not PENDING. This addresses a scenario where ep_poll retries expire during CMA connection establishment. In this case ep_disconnect is invoked while CMA events keep coming (address/route resolution, connected, etc...). Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
96ed02d4 |
|
31-Jul-2014 |
Roi Dayan <roid@mellanox.com> |
IB/iser: Support IPv6 address family Replace struct sockaddr_in with struct sockaddr which supports both IPv4 and IPv6, and print using the %pIS format directive. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
c7ca4b69 |
|
22-May-2014 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB/iser: Bump version to 1.4 Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
b73c3ada |
|
22-May-2014 |
Ariel Nahum <arieln@mellanox.com> |
IB/iser: Simplify connection management iSER relies on refcounting to manage iser connections establishment and teardown. Following commit 39ff05dbbbdb ("IB/iser: Enhance disconnection logic for multi-pathing"), an iser connection maintains 3 references: - iscsi_endpoint (at creation stage) - cma_id (at connection request stage) - iscsi_conn (at bind stage) We can avoid taking explicit refcounts by correctly serializing iser teardown flows (graceful and non-graceful). Our approach is to trigger a scheduled work to handle ordered teardown by gracefully waiting for 2 cleanup stages to complete: 1. Cleanup of live pending tasks indicated by iscsi_conn_stop completion 2. Flush errors processing Each completed stage will notify a waiting worker thread when it is done to allow teardown continuation. Since iSCSI connection establishment may trigger endpoint disconnect without a successful endpoint connect, we rely on the iscsi <-> iser binding (.conn_bind) to learn about the teardown policy we should take wrt cleanup stages. Since all cleanup worker threads are scheduled (release_wq) in .ep_disconnect it is safe to assume that when module_exit is called, all cleanup workers are already scheduled. Thus proper module unload shall flush all scheduled works before allowing safe exit, to guarantee no resources got left behind. Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
5de2ad98 |
|
01-Apr-2014 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB/iser: Bump driver version to 1.3 Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
3ee07d27 |
|
01-Apr-2014 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB/iser: Update Mellanox copyright note Update Mellanox copyrights for 2014 on the iser initiator driver. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
4667f5df |
|
01-Apr-2014 |
Ariel Nahum <arieln@mellanox.com> |
IB/iser: Remove struct iscsi_iser_conn The iscsi stack has existing mechanisms to link back and forth between the iscsi connection and the iscsi transport (e.g. iser/tcp) connection. This is done through a dd_data pointer field in struct iscsi_conn which can be set to point to the transport connection, etc. The iscsi_iser_conn structure was used to get this linking done in another way, which is unneeded and adds extra complication to the iser code, so we just remove it. Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
0a7a08ad |
|
05-Mar-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Implement check_protection Once the iSCSI transaction is completed we must implement check_protection in order to notify on DIF errors that may have occurred. The routine boils down to calling ib_check_mr_status to get the signature status of the transaction. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Alex Tabachnik <alext@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
177e31bd |
|
05-Mar-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Support T10-PI operations Add logic to initialize protection information entities. Upon each iSCSI task, we keep the scsi_cmnd in order to query the scsi protection operations and reference to protection buffers. Modify iser_fast_reg_mr to receive an indication whether it is registering the data or protection buffers. In addition introduce iser_reg_sig_mr which performs a fast registration work-request for a signature enabled memory region (IB_WR_REG_SIG_MR). In this routine we set all the protection-relevant attributes for the device to offload protection data-transfer and verification. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Alex Tabachnik <alext@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
6b5a8fb0 |
|
05-Mar-2014 |
Alex Tabachnik <alext@mellanox.com> |
IB/iser: Initialize T10-PI resources During connection establishment we also initialize T10-PI resources (QP, PI contexts) in order to support SCSI's protection operations. Signed-off-by: Alex Tabachnik <alext@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
7f733847 |
|
05-Mar-2014 |
Alex Tabachnik <alext@mellanox.com> |
IB/iser: Introduce pi_enable, pi_guard module parameters Use modparams to activate protection information support. pi_enable bool: Based on this parameter iSER will know if it should support T10-PI. We don't want to do this by default as it requires to allocate and initialize extra resources. In case pi_enable=N, iSER won't publish to SCSI midlayer any DIF capabilities. pi_guard int: Based on this parameter iSER will publish DIX guard type support to SCSI midlayer. 0 means CRC is allowed to be passed in DIX buffers, 1 (or non-zero) means IP-CSUM is allowed to be passed in DIX buffers. Note that over the wire, only CRC is allowed. In the next phase, it is worth considering passing these parameters from iscsid via nlmsg. This will allow these parameters to be connection based rather than global. Signed-off-by: Alex Tabachnik <alext@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
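The mapping from the two module parameters to what is published to the SCSI midlayer can be sketched as below. This is an illustration of the rules stated in the message only: the enum and the function `sketch_published_guard` are hypothetical names, not the driver's or the SCSI midlayer's API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the DIX guard type a host could publish. */
enum sketch_dix_guard {
    SKETCH_DIX_GUARD_NONE, /* pi_enable=N: no DIF/DIX capabilities published */
    SKETCH_DIX_GUARD_CRC,  /* pi_guard == 0: CRC allowed in DIX buffers */
    SKETCH_DIX_GUARD_IP,   /* pi_guard != 0: IP-CSUM allowed in DIX buffers */
};

/*
 * Sketch of the decision described in the message. It covers only the
 * guard type for DIX buffers; per the message, the guard used on the
 * wire is CRC in every case.
 */
enum sketch_dix_guard sketch_published_guard(bool pi_enable, int pi_guard)
{
    if (!pi_enable)
        return SKETCH_DIX_GUARD_NONE;
    return pi_guard ? SKETCH_DIX_GUARD_IP : SKETCH_DIX_GUARD_CRC;
}
```

Because both values are module parameters, the result is the same for every connection, which is the limitation the message proposes to lift by passing them from iscsid.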
|
#
9a8b08fa |
|
05-Mar-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Generalize iser_unmap_task_data and finalize_rdma_unaligned_sg These routines operate on data buffers and may also work with protection information buffers. So we generalize them to handle an iser_data_buf which can be the command data or command protection information. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
73bc06b7 |
|
05-Mar-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Replace fastreg descriptor valid bool with indicators container In T10-PI support we will have memory keys for protection buffers and signature transactions. We prefer to compact indicators rather than keeping multiple bools. This commit does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Alex Tabachnik <alext@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
65198d6b |
|
05-Mar-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Keep IB device attributes under iser_device For T10-PI offload support, we will need to know the device signature offload capability upon every connection establishment. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Alex Tabachnik <alext@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
7306b8fa |
|
05-Mar-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Avoid FRWR notation, use fastreg instead FRWR stands for "fast registration work request". We want to avoid calling the fastreg pool with that name, instead we name it fastreg which stands for "fast registration". This pool will include more elements in the future, so it is a good idea to generalize the name. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Alex Tabachnik <alext@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
db523b8d |
|
22-Jan-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Suppress completions for fast registration work requests In case iSER uses fast registration method, it should not request for successful completions on fast registration nor local invalidate requests. We color wr_id with ISER_FRWR_LI_WRID in order to correctly consume error completions. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
5587856c |
|
27-Jul-2013 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Introduce fast memory registration model (FRWR) Newer HCAs and Virtual functions may not support FMRs but rather a fast registration model, which we call FRWR - "Fast Registration Work Requests". This model was introduced in 00f7ec36c ("RDMA/core: Add memory management extensions support") and works when the IB device supports the IB_DEVICE_MEM_MGT_EXTENSIONS capability. Upon creating the iser device iser will test whether the HCA supports FMRs. If no support for FMRs, check if IB_DEVICE_MEM_MGT_EXTENSIONS is supported and assign function pointers that handle fast registration and allocation of appropriate resources (fast_reg descriptors). Registration is done using posting IB_WR_FAST_REG_MR to the QP and invalidations using posting IB_WR_LOCAL_INV. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
e657571b |
|
27-Jul-2013 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Place the fmr pool into a union in iser's IB conn struct This is preparation step for other memory registration methods to be added. In addition, change reg/unreg routines signature to indicate they use FMRs. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
b4e155ff |
|
27-Jul-2013 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Generalize rdma memory registration Currently the driver uses FMRs as the only means to register the memory pointed by SG provided by the SCSI mid-layer with the RDMA device. As preparation step for adding more methods for fast path memory registration, make the alloc/free and reg/unreg calls function pointers, which are for now just set to the existing FMR ones. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
b7f04513 |
|
27-Jul-2013 |
Shlomo Pongratz <shlomop@mellanox.com> |
IB/iser: Accept session->cmds_max from user space Use cmds_max passed from user space to be the number of PDUs to be supported for the session instead of hard-coded ISCSI_DEF_XMIT_CMDS_MAX. This allows controlling the max number of SCSI commands for the session. Also don't ignore the qdepth passed from user space. Derive from session->cmds_max the actual number of RX buffers and FMR pool size to allocate during the connection bind phase. Since the iser transport connection is established before the iscsi session/connection are created and bound, we still use one hard-coded quantity ISER_DEF_XMIT_CMDS_MAX to compute the maximum number of work-requests to be supported by the RC QP used for the connection. The above quantity is made to be a power of two between ISCSI_TOTAL_CMDS_MIN (16) and ISER_DEF_XMIT_CMDS_MAX (512) inclusive. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
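The constraint on the command count (a power of two between 16 and 512 inclusive) can be sketched as follows. The message does not say in which direction a non-power-of-two request is rounded, so rounding down is an assumption made here, and `sketch_clamp_cmds_max` and the `SKETCH_*` constants are hypothetical names that only mirror the limits quoted in the message.

```c
#include <assert.h>

/* Limits quoted in the commit message. */
#define SKETCH_CMDS_MIN 16u   /* ISCSI_TOTAL_CMDS_MIN */
#define SKETCH_CMDS_MAX 512u  /* ISER_DEF_XMIT_CMDS_MAX */

/*
 * Clamp a user-supplied command count into [16, 512] and force it to a
 * power of two. ASSUMPTION: values in between are rounded down to the
 * nearest power of two; the source does not specify the direction.
 */
unsigned int sketch_clamp_cmds_max(unsigned int requested)
{
    unsigned int p = SKETCH_CMDS_MIN;

    if (requested > SKETCH_CMDS_MAX)
        requested = SKETCH_CMDS_MAX;
    /* Grow p while the next power of two still fits in the request. */
    while (p * 2 <= requested)
        p *= 2;
    return p;
}
```

Under that rounding assumption, a request of 100 commands yields 64, anything below 16 yields 16, and anything above 512 yields 512. The RX buffer count and FMR pool size would then be derived from the resulting value at bind time.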
|
#
986db0d6 |
|
27-Jul-2013 |
Shlomo Pongratz <shlomop@mellanox.com> |
IB/iser: Restructure allocation/deallocation of connection resources This is a preparation step to a patch that accepts the number of max SCSI commands to be supported for a session from user space iSCSI tools. Move the allocation of the login buffer, FMR pool and its associated page vector from iser_create_ib_conn_res() (which is called before we actually know how many commands should be supported) to iser_alloc_rx_descriptors() (which is called during the iscsi connection bind step where this quantity is known). Also do small refactoring around the deallocation to make that path similar to the allocation one. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
f91424cf |
|
27-Jul-2013 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB/iser: Use proper debug level value for info prints Commit 4f363882612 ("IB/iser: Move informational messages from error to info level") set info prints to be emitted at a lower debug level than warning prints, which is a bit odd. Fix that. Also move the prints on unaligned SG from warning to debug level. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
28f292e8 |
|
07-May-2013 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB/iser: Add Mellanox copyright Add Mellanox copyright to the iser initiator source code which I maintain. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
8d8399de |
|
01-May-2013 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB/iser: Add support for iser CM REQ additional info Annex A12 of the IBTA spec defines additional information that needs to be provided through the CM exchange relating to usage of ZBVA (Zero Based VAs) and Send With Invalidate over an iSER connection. Currently, the initiator sets both to not supported, but does provide the header so that existing iSER targets can be patched to start looking on the private data carried by the CM. This is a preparation step to enable iSER with HW drivers for which FMRs are not supported, such as mlx4 VF instances or new HW devices which might support only FRWR (Fast Registration Work-Requests) along the details of the IB_DEVICE_MEM_MGT_EXTENSIONS device capability. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
4f363882 |
|
01-May-2013 |
Roi Dayan <roid@mellanox.com> |
IB/iser: Move informational messages from error to info level Introduce iser_info() and move informational messages that were printed as errors to use that macro. Also, cleanup printk leftovers to use the existing macros. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> [ Use pr_warn(... instead of printk(KERN_WARNING .... - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
c1d786e6 |
|
01-May-2013 |
Roi Dayan <roid@mellanox.com> |
IB/iser: Add module version Add displaying module version, update the version to 1.1, and remove the DRV_DATE define. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
b96e4aba |
|
21-Feb-2013 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB/iser: Use proper define for the commands per LUN value advertised to SCSI ML ISER_DEF_CMD_PER_LUN was meant to be ISCSI_DEF_XMIT_CMDS_MAX, not plain 128 Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
5a33a669 |
|
23-Sep-2012 |
Alex Tabachnik <alext@mellanox.com> |
IB/iser: Add more RX CQs to scale out processing of SCSI responses RX/TX CQs will now be selected from a per HCA pool. For the RX flow this has the effect of using different interrupt vectors when using low level drivers (such as mlx4) that map the "vector" param provided by the ULP on CQ creation to a dedicated IRQ/MSI-X vector. This allows the RX flow processing of IO responses to be distributed across multiple CPUs. QPs (--> iSER sessions) are assigned to CQs in round robin order using the CQ with the minimum number of sessions attached to it. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Alex Tabachnik <alext@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
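The CQ selection policy above (pick the CQ with the fewest attached sessions) might be sketched as follows. The function name and the per-device counter array are hypothetical and not taken from the driver.

```c
#include <stddef.h>

/*
 * Hypothetical per-device table: active_qps[i] counts the QPs
 * (iSER sessions) currently bound to CQ i. Returns the index of
 * the least-loaded CQ; ties go to the lowest index.
 */
size_t pick_least_loaded_cq(const int *active_qps, size_t ncqs)
{
	size_t best = 0;

	for (size_t i = 1; i < ncqs; i++)
		if (active_qps[i] < active_qps[best])
			best = i;
	return best;
}
```

The caller would increment the chosen counter when the QP is created and decrement it on teardown, which keeps new sessions spread across the interrupt vectors backing the CQs.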
|
#
89e984e2 |
|
05-Mar-2012 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB/iser: Post initial receive buffers before sending the final login request An iser target may send iscsi NO-OP PDUs as soon as it marks the iSER iSCSI session as fully operative. This means that there is a window where there are no posted receive buffers on the initiator side, so it's possible for the iSER RC connection to break because of RNR NAK / retry errors. To fix this, rely on the flags bits in the login request to have FFP (0x3) in the lower nibble as a marker for the final login request, and post an initial chunk of receive buffers before sending that login request instead of after getting the login response. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
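A minimal sketch of the "final login request" test follows. It assumes the check masks the login flags byte with the FFP value itself (the two next-stage bits), which is one reading of the commit message; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define FFP 0x03 /* full feature phase value quoted in the commit message */

/*
 * Hypothetical helper: true when the low bits of the login request
 * flags select the full feature phase, i.e. this is the last login
 * request and receive buffers should be posted before sending it.
 */
bool is_final_login_request(uint8_t flags)
{
	return (flags & FFP) == FFP;
}
```

With this test in place, the initiator can post its initial receive buffers before the request goes out, which closes the window in which a target NO-OP could arrive with nothing posted.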
|
#
52439540 |
|
03-Nov-2011 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB/iser: DMA unmap TX bufs used for iSCSI/iSER headers The current driver never does DMA unmapping on these buffers. Fix that by adding DMA unmapping to the task cleanup callback, and DMA mapping to the task init function (drop the headers_initialized micro-optimization). Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
2c4ce609 |
|
03-Nov-2011 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB/iser: Use separate buffers for the login request/response The driver counted on the transactional nature of iSCSI login/text flows and used the same buffer for both the request and the response. We also went further and did DMA mapping only once, with DMA_FROM_DEVICE, which violates the DMA mapping API. Fix that by using different buffers, one for requests and one for responses, and use the correct DMA mapping direction for each. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
0ace64b8 |
|
01-Aug-2011 |
Or Gerlitz <ogerlitz@mellanox.com> |
IBiser: Fix wrong mask when sizeof (dma_addr_t) > sizeof (unsigned long) The code that prepares the SG associated with SCSI command for FMR was buggy for systems with DMA addresses that don't fit in unsigned long, e.g under the 32-bit based XenServer dom0 sizeof(dma_addr_t) is 8. Fix that by casting to unsigned long long a masking constant used by the code. This resolves a crash in iser_sg_to_page_vec on this system. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>
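The truncation described above can be reproduced in isolation. The sketch below uses fixed-width types to stand in for a 32-bit `unsigned long` and a 64-bit `dma_addr_t`; the function names and the 4 KB page constant are illustrative, not the driver's.

```c
#include <stdint.h>

#define PAGE_SZ 4096u

/* Buggy: the mask is built in 32 bits, then zero-extended to 64 bits. */
uint64_t page_base_buggy(uint64_t dma_addr)
{
	uint32_t mask = ~(uint32_t)(PAGE_SZ - 1); /* 0xFFFFF000 */

	return dma_addr & mask; /* upper 32 address bits are cleared */
}

/* Fixed: the constant is widened to 64 bits before it is inverted. */
uint64_t page_base_fixed(uint64_t dma_addr)
{
	return dma_addr & ~((uint64_t)PAGE_SZ - 1);
}
```

For the address 0x123456789, the buggy variant returns 0x23456000 while the fixed one returns 0x123456000, so any DMA address above 4 GB is silently corrupted by the narrow mask.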
|
#
a6b7a407 |
|
06-Jun-2011 |
Alexey Dobriyan <adobriyan@gmail.com> |
net: remove interrupt.h inclusion from netdevice.h * remove interrupt.h inclusion from netdevice.h -- not needed * fixup fallout, add interrupt.h and hardirq.h back where needed. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
25985edc |
|
30-Mar-2011 |
Lucas De Marchi <lucas.demarchi@profusion.mobi> |
Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
|
#
39ff05db |
|
05-May-2010 |
Or Gerlitz <ogerlitz@voltaire.com> |
IB/iser: Enhance disconnection logic for multi-pathing The iser connection teardown flow isn't over until the underlying Connection Manager (e.g the IB CM) delivers a disconnected or timeout event through the RDMA-CM. When the remote (target) side isn't reachable, e.g when some HW e.g port/hca/switch isn't functioning or taken down administratively, the CM timeout flow is used and the event may be generated only after relatively long time -- on the order of tens of seconds. The current iser code exposes this possibly long delay to higher layers, specifically to the iscsid daemon and iscsi kernel stack. As a result, the iscsi stack doesn't respond well: this low-level CM delay is added to the fail-over time under HA schemes such as the one provided by DM multipath through the multipathd(8) service. This patch enhances the reference counting scheme on iser's IB connections so that the disconnect flow initiated by iscsid from user space (ep_disconnect) doesn't wait for the CM to deliver the disconnect/timeout event. (The connection teardown isn't done from iser's view point until the event is delivered) The iser ib (rdma) connection object is destroyed when its reference count reaches zero. When this happens on the RDMA-CM callback context, extra care is taken so that the RDMA-CM does the actual destroying of the associated ID, since doing it in the callback is prohibited. The reference count of iser ib connection normally reaches three, where the <ref, deref> relations are 1. conn <init, terminate> 2. conn <bind, stop/destroy> 3. cma id <create, disconnect/error/timeout callbacks> With this patch, multipath fail-over time is about 30 seconds, while without this patch, multipath fail-over time is about 130 seconds. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
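The three reference pairs listed above can be pictured with a small sketch. The struct and function names are hypothetical, and a plain int is used for brevity where kernel code would use an atomic counter.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the iser IB connection object. */
struct ib_conn_sketch {
	int refcount;   /* a real driver would use an atomic type */
	bool destroyed;
};

void conn_get(struct ib_conn_sketch *c)
{
	c->refcount++;
}

/* Returns true when this put dropped the last reference. */
bool conn_put(struct ib_conn_sketch *c)
{
	if (--c->refcount == 0) {
		c->destroyed = true; /* release resources here */
		return true;
	}
	return false;
}
```

In this picture, conn init, conn bind and cma id creation each take a reference; terminate, stop/destroy and the CM disconnect/error/timeout callback each drop one. Whichever of the three runs last destroys the object, so `ep_disconnect` no longer has to block until the CM event arrives.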
|
#
2110f9bf |
|
05-May-2010 |
Or Gerlitz <ogerlitz@voltaire.com> |
IB/iser: Add asynchronous event handler Add handler to handle events such as port up and down. This is useful when testing high-availability schemes such as multi-pathing. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
aae3c995 |
|
08-Feb-2010 |
Or Gerlitz <ogerlitz@voltaire.com> |
IB/iser: Remove unnecessary connection checks Remove unnecessary checks for the IB connection state and for QP overflow, as conn state changes are reported by iSER to libiscsi and handled there. QP overflow is theoretically possible only when unsolicited data-outs are used; anyway it's being checked and handled by HW drivers. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
f19624aa |
|
08-Feb-2010 |
Or Gerlitz <ogerlitz@voltaire.com> |
IB/iser: Simplify send flow/descriptors Simplify and shrink the logic/code used for the send descriptors. Changes include removing struct iser_dto (an unnecessary abstraction), using struct iser_regd_buf only for handling SCSI commands, using dma_sync instead of dma_map/unmap, etc. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
78ad0a34 |
|
08-Feb-2010 |
Or Gerlitz <ogerlitz@voltaire.com> |
IB/iser: Use different CQ for send completions Use a different CQ for send completions, where send completions are polled by the interrupt-driven receive completion handler. Therefore, interrupts aren't used for the send CQ. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
704315f0 |
|
08-Feb-2010 |
Or Gerlitz <ogerlitz@voltaire.com> |
IB/iser: Remove atomic counter for posted receive buffers Now that both the posting and reaping of receive buffers is done in the completion path, the counter of outstanding buffers need not be atomic. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
bcc60c38 |
|
08-Feb-2010 |
Or Gerlitz <ogerlitz@voltaire.com> |
IB/iser: New receive buffer posting logic Currently, the recv buffer posting logic is based on the transactional nature of iSER which allows for posting a buffer before sending a PDU. Change this to post only when the number of outstanding recv buffers is below a water mark and in a batched manner, thus simplifying and optimizing the data path. Use a pre-allocated ring of recv buffers instead of allocating from kmem cache. A special treatment is given to the login response buffer whose size must be 8K unlike the size of buffers used for any other purpose which is 128 bytes. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
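The watermark-and-batch policy described above might look like the following sketch. The struct layout, field names and thresholds are hypothetical; only the idea (repost in batches once the posted count drops below a low-water mark) comes from the commit message.

```c
/* Hypothetical ring state; names and values are illustrative only. */
struct rx_ring {
	int posted;   /* receive buffers currently posted to the QP */
	int low_mark; /* repost when posted drops below this */
	int batch;    /* how many buffers to post at once */
	int capacity; /* total pre-allocated buffers in the ring */
};

/*
 * Called once per receive completion. Returns how many buffers the
 * caller should repost; 0 while the ring is above the watermark.
 */
int rx_complete(struct rx_ring *r)
{
	int n = 0;

	r->posted--;
	if (r->posted < r->low_mark) {
		n = r->batch;
		if (r->posted + n > r->capacity)
			n = r->capacity - r->posted;
		r->posted += n;
	}
	return n;
}
```

Because both the decrement and the repost happen in the completion path, a plain integer counter is enough in this sketch.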
|
#
1cef4659 |
|
08-Feb-2010 |
Or Gerlitz <ogerlitz@voltaire.com> |
IB/iser: Revert commit bba7ebb "avoid recv buffer exhaustion" We will make a major change in the recv buffer posting logic, after which the problem commit bba7ebb "avoid recv buffer exhaustion caused by unexpected PDUs" comes to solve doesn't exist any more, so revert it. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
e28f3d5b |
|
05-Mar-2009 |
Mike Christie <michaelc@cs.wisc.edu> |
[SCSI] libiscsi: don't cap queue depth in iscsi modules There is no need to cap the queue depth in the modules. We set this in userspace and can do that there. For performance testing with ram based targets, this is helpful since we can have very high queue depths. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
|
#
bba7ebba |
|
21-Dec-2008 |
David Disseldorp <ddiss@sgi.com> |
IB/iser: Avoid recv buffer exhaustion caused by unexpected PDUs iSCSI/iSER targets may send PDUs without a prior request from the initiator. RFC 5046 refers to these PDUs as "unexpected". NOP-In PDUs with itt=RESERVED and Asynchronous Message PDUs occupy this category. The number of active "unexpected" PDUs an iSER target may have at any time is governed by the MaxOutstandingUnexpectedPDUs key, which is not yet supported. Currently when an iSER target sends an "unexpected" PDU, the initiator's recv buffer consumed by the PDU is not replaced. If over initial_post_recv_bufs_num "unexpected" PDUs are received then the receive queue will run out of receive work requests entirely. This patch ensures recv buffers consumed by "unexpected" PDUs are replaced in the next iser_post_receive_control() call. Signed-off-by: David Disseldorp <ddiss@sgi.com> Signed-off-by: Ken Sandars <ksandars@sgi.com> Acked-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
f3781d2e |
|
15-Jul-2008 |
Roland Dreier <rolandd@cisco.com> |
RDMA: Remove subversion $Id tags They don't get updated by git and so they're worse than useless. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
913e5bf4 |
|
21-May-2008 |
Mike Christie <michaelc@cs.wisc.edu> |
[SCSI] libiscsi, iser, tcp: remove recv_lock The recv lock was defined so the iscsi layer could block the recv path from processing IO during recovery. It turns out iser just set a lock to that pointer which was pointless. We now disconnect the transport connection before doing recovery so we do not need the recv lock. For iscsi_tcp we still stop the recv path in case older tools are being used. This patch also has iscsi_itt_to_ctask user grab the session lock and has the caller access the task with the lock or get a ref to it in case the target is broken and sends a tmf success response then sends data or a response for the command that was supposed to be affected by the tmf. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
|
#
412eeafa |
|
21-May-2008 |
Mike Christie <michaelc@cs.wisc.edu> |
[SCSI] iser: Modify iser to take a iscsi_endpoint struct in ep callouts and session setup This hooks iser into the iscsi endpoint code. Previously it handled the lookup and allocation. This has been made generic so bnx2i and iser can share it. It also allows us to pass iser the leading conn's ep, so we know the ib_device being used and can set it as the scsi_host's parent. And that allows scsi-ml to set the dma_mask based on those values. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
|
#
2261ec3d |
|
21-May-2008 |
Mike Christie <michaelc@cs.wisc.edu> |
[SCSI] iser: handle iscsi_cmd_task rename This handles the iscsi_cmd_task rename and renames the iser cmd task to iser task. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
|
#
2747fdb2 |
|
21-May-2008 |
Mike Christie <michaelc@cs.wisc.edu> |
[SCSI] iser: convert ib_iser to support merged tasks Convert ib_iser to support merged tasks. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
|
#
b40977d9 |
|
21-May-2008 |
Mike Christie <michaelc@cs.wisc.edu> |
[SCSI] iser: fix handling of scsi cmnds during recovery. After the stop_conn callback has returned the LLD should not touch the scsi cmds. iscsi_tcp and libiscsi use the conn->recv_lock and suspend_rx field to halt recv path processing, but iser does not have any protection. This patch modifies iser so that userspace can just call the ep_disconnect callback, which will halt all recv IO, before calling the stop_conn callback so we do not have to worry about the conn->recv_lock and suspend rx field. iser just needs to stop the send side from accessing the ib conn. Fixup to handle when the ep poll fails and ep disconnect is called from Erez. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
|
#
d3826721 |
|
21-May-2008 |
Mike Christie <michaelc@cs.wisc.edu> |
[SCSI] iscsi class, iscsi drivers: remove unused iscsi_transport attrs max_cmd_len and max_conn are not really used. max_cmd_len is always 16 and can be set by the LLD. max_conn is always one since we do not support MCS. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
|
#
6f735e36 |
|
29-Apr-2008 |
Eli Dorfman <dorfman.eli@gmail.com> |
IB/iser: Move high-volume debug output to higher debug level Add another level for debug. Signed-off-by: Eli Dorfman <elid@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
41179e2d |
|
17-Jul-2007 |
Roland Dreier <rolandd@cisco.com> |
IB/iser: Make a couple of functions static Make iser_conn_release() and iser_start_rdma_unaligned_sg() static, since they are only used in the .c file where they are defined. In addition to being a cleanup, this even shrinks the generated code by allowing the single call of iser_start_rdma_unaligned_sg() to be inlined into its callsite. On x86_64: add/remove: 0/1 grow/shrink: 1/0 up/down: 466/-533 (-67) function old new delta iser_reg_rdma_mem 1518 1984 +466 iser_start_rdma_unaligned_sg 533 - -533 Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
1548271e |
|
29-May-2007 |
Mike Christie <michaelc@cs.wisc.edu> |
[SCSI] libiscsi: make can_queue configurable This patch allows us to set can_queue and cmds_per_lun from userspace when we create the session/host. From there we can set it on a per target basis. The patch fully converts iscsi_tcp, but only hooks up ib_iser for cmd_per_lun since it currently has a lot of preallocations based on can_queue. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Cc: Roland Dreier <rdreier@cisco.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
|
#
1d426d64 |
|
31-Mar-2007 |
Erez Zilber <erezz@voltaire.com> |
IB/iser: Don't defer connection failure notification to workqueue When a connection is terminated asynchronously from the iSCSI layer's perspective, iSER needs to notify the iSCSI layer that the connection has failed. This is done using a workqueue (switched to from the iSER tasklet context). Meanwhile, the connection object (that holds the work struct) is released. If the workqueue function wasn't called yet, it will be called later with a NULL pointer, which will crash the kernel. The context switch (tasklet to workqueue) is not required, and everything can be done from the iSER tasklet. This eliminates the NULL work struct bug (and simplifies the code). Signed-off-by: Erez Zilber <erezz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
5180311f |
|
12-Dec-2006 |
Ralph Campbell <ralph.campbell@qlogic.com> |
IB/iser: Use the new verbs DMA mapping functions Convert iSER to use the new verbs DMA mapping functions for kernel verbs consumers. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
e54f8188 |
|
29-Nov-2006 |
Roland Dreier <rolandd@cisco.com> |
IB: Convert kmem_cache_t -> struct kmem_cache Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
74a20780 |
|
27-Sep-2006 |
Erez Zilber <erezz@voltaire.com> |
IB/iser: DMA unmap unaligned for RDMA data before touching it iSER uses the DMA mapping api to map the page holding the SCSI command data to the HCA DMA address space. When the command data is not aligned for RDMA, the data is copied to/from an allocated buffer which in turn is used for executing this command. The pages associated with the command must be unmapped before being touched. Signed-off-by: Erez Zilber <erezz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
87e8df7a |
|
27-Sep-2006 |
Erez Zilber <erezz@voltaire.com> |
IB/iser: Have iSER data transaction object point to iSER conn iSER uses a data transaction object (struct iser_dto) as part of its IB data descriptors (struct iser_desc) management. It also uses a hierarchy of connection structures pointing to each other. A DTO may exist even after the iscsi_iser connection pointed to by it is destroyed (e.g. one that is bound to a post receive buffer which was flushed by the IB HW). Hence DTOs need to point to the lowest connection, which is struct iser_conn. Signed-off-by: Erez Zilber <erezz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
d8111028 |
|
10-Sep-2006 |
Erez Zilber <erezz@voltaire.com> |
IB/iser: Do not use FMR for a single dma entry sg Fast Memory Registration (fmr) is used to register for rdma an sg whose elements are not linearly sequential after dma mapping. The IB verbs layer provides an "all dma memory MR (memory region)" which can be used for RDMA-ing a dma linearly sequential buffer. Change the code to use the dma mr instead of doing fmr when dma mapping produces a single dma entry sg. Signed-off-by: Erez Zilber <erezz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
8dfa0876 |
|
10-Sep-2006 |
Erez Zilber <erezz@voltaire.com> |
IB/iser: make FMR "page size" be 4K and not PAGE_SIZE As iser is able to use at most one rdma operation for the execution of a scsi command, and registration of the sg associated with scsi command has its restrictions, the code checks if an sg is "aligned for rdma". Alignment for rdma is measured in "fmr page" units whose possible resolutions are different between HCAs and can be smaller than, equal to, or bigger than the system page size. When the system page size is bigger than 4KB (e.g. the default with ia64 kernels) there is a bigger chance that an sg would be aligned for rdma if the fmr page size is 4KB. Change the code to create FMR whose pages are of size 4KB and to take that into account when processing the sg. Signed-off-by: Erez Zilber <erezz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
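A rough sketch of an "aligned for rdma" check in 4 KB units is shown here. The segment struct and function name are hypothetical, and the exact rule (all segments but the first start on an FMR page boundary, all but the last end on one) is an assumption about what the alignment test involves, not a quote from the driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FMR_PAGE_SIZE 4096u /* fixed 4 KB, independent of PAGE_SIZE */

/* Hypothetical DMA-mapped segment of a scatter-gather list. */
struct seg {
	uint64_t addr;
	uint32_t len;
};

bool sg_aligned_for_rdma(const struct seg *sg, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		/* Every segment but the first must start on a 4 KB boundary. */
		if (i > 0 && sg[i].addr % FMR_PAGE_SIZE)
			return false;
		/* Every segment but the last must end on a 4 KB boundary. */
		if (i + 1 < n && (sg[i].addr + sg[i].len) % FMR_PAGE_SIZE)
			return false;
	}
	return true;
}
```

With a 64 KB unit in place of 4 KB, a segment starting at 0x5000 would fail the same test, which is why a smaller FMR page size makes alignment more likely on large-page systems.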
|
#
ffd0436e |
|
31-Aug-2006 |
Mike Christie <michaelc@cs.wisc.edu> |
[SCSI] libiscsi, iscsi_tcp, iscsi_iser: check that burst lengths are valid. iSCSI RFC states that the first burst length must be smaller than the max burst length. We currently assume targets will be good, but that may not be the case, so this patch adds a check. This patch also moves the unsol data out offset to the lib so the LLDs do not have to track it. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
|
#
49cd5382 |
|
11-May-2006 |
Or Gerlitz <ogerlitz@voltaire.com> |
IB/iser: iSCSI iSER transport provider header file iSER (iSCSI Extensions for RDMA) transport provider driver for the iSCSI initiator, whose other parts (under drivers/scsi) are scsi_transport_iscsi - the transport management module, iscsi_tcp - the TCP transport provider module and libiscsi - a kernel library (module) implementing functionality needed by both TCP and iSER transports. iSER is both a provider of the iSCSI transport api and a SCSI low level driver. This file contains internal data structures and non static service functions. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|