#
c4d32e77 |
|
20-Nov-2023 |
Md Haris Iqbal <haris.iqbal@ionos.com> |
RDMA/rtrs-srv: Destroy path files after making sure no IOs in-flight Destroying path files may lead to the freeing of rdma_stats. This creates the following race. An IO is in-flight, or has just passed the session state check in process_read/process_write. The close_work gets triggered and the function rtrs_srv_close_work() starts and does destroy path which frees the rdma_stats. After this the function process_read/process_write resumes and tries to update the stats through the function rtrs_srv_update_rdma_stats This commit solves the problem by moving the destroy path function to a later point. This point makes sure any inflights are completed. This is done by qp drain, and waiting for all in-flights through ops_id. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Santosh Kumar Pradhan <santosh.pradhan@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://lore.kernel.org/r/20231120154146.920486-6-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
3a71cd6c |
|
20-Nov-2023 |
Md Haris Iqbal <haris.iqbal@ionos.com> |
RDMA/rtrs-srv: Free srv_mr iu only when always_invalidate is true Since srv_mr->iu is allocated and used only when always_invalidate is true, free it only when always_invalidate is true. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://lore.kernel.org/r/20231120154146.920486-5-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
ed1e52ae |
|
20-Nov-2023 |
Md Haris Iqbal <haris.iqbal@ionos.com> |
RDMA/rtrs-srv: Check return values while processing info request While processing info request, it could so happen that the srv_path goes to CLOSING state, cause of any of the error events from RDMA. That state change should be picked up while trying to change the state in process_info_req, by checking the return value. In case the state change call in process_info_req fails, we fail the processing. We should also check the return value for rtrs_srv_path_up, since it sends a link event to the client above, and the client can fail for any reason. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://lore.kernel.org/r/20231120154146.920486-4-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
3ee7ecd7 |
|
20-Nov-2023 |
Jack Wang <jinpu.wang@ionos.com> |
RDMA/rtrs-srv: Do not unconditionally enable irq When IO is completed, rtrs can be called in softirq context, unconditionally enabling irq could cause panic. To be on safe side, use spin_lock_irqsave and spin_unlock_irqrestore instread. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Florian-Ewald Mueller <florian-ewald.mueller@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://lore.kernel.org/r/20231120154146.920486-2-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
64917f4c |
|
04-Aug-2023 |
Ivan Orlov <ivan.orlov0322@gmail.com> |
RDMA: Make all 'class' structures const Now that the driver core allows for struct class to be in read-only memory, making all 'class' structures to be declared at build time placing them into read-only memory, instead of having to be dynamically allocated at load time. Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Cc: "Md. Haris Iqbal" <haris.iqbal@ionos.com> Cc: Jack Wang <jinpu.wang@ionos.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Yishai Hadas <yishaih@nvidia.com> Cc: Ivan Orlov <ivan.orlov0322@gmail.com> Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/2023080427-commuting-crewless-cbee@gregkh Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
1aaba11d |
|
13-Mar-2023 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
driver core: class: remove module * from class_create() The module pointer in class_create() never actually did anything, and it shouldn't have been requred to be set as a parameter even if it did something. So just remove it and fix up all callers of the function in the kernel tree at the same time. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Link: https://lore.kernel.org/r/20230313181843.1207845-4-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
a4399563 |
|
17-Nov-2022 |
Guoqing Jiang <guoqing.jiang@linux.dev> |
RDMA/rtrs-srv: Remove outdated comments from create_con Remove the orphan comments. Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20221117101945.6317-6-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
102d2f70 |
|
17-Nov-2022 |
Guoqing Jiang <guoqing.jiang@linux.dev> |
RDMA/rtrs-srv: Correct the checking of ib_map_mr_sg We should check with nr_sgt, also the only successful case is that all sg elements are mapped, so make it explicitly. Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20221117101945.6317-4-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
0f597ac6 |
|
17-Nov-2022 |
Guoqing Jiang <guoqing.jiang@linux.dev> |
RDMA/rtrs-srv: Refactor the handling of failure case in map_cont_bufs Let's call unmap_cont_bufs when failure happens, and also only update mrs_num after everything is settled which means we can remove 'mri'. Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20221117101945.6317-3-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
d7115727 |
|
17-Nov-2022 |
Guoqing Jiang <guoqing.jiang@linux.dev> |
RDMA/rtrs-srv: Refactor rtrs_srv_rdma_cm_handler The RDMA_CM_EVENT_CONNECT_REQUEST is quite different to other types, let's check it separately at the beginning of routine, then we can avoid the indentation accordingly. Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20221117101945.6317-2-guoqing.jiang@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
6edd86a2 |
|
26-Aug-2022 |
Guoqing Jiang <guoqing.jiang@linux.dev> |
RDMA/rtrs: Remove 'dir' argument from rnbd_srv_rdma_ev Since process_{read,write} already prints direction info if ctx->ops.rdma_ev fails, no need to pass 'dir'. Link: https://lore.kernel.org/r/20220826081117.21687-1-guoqing.jiang@linux.dev Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
c16762b7 |
|
17-Aug-2022 |
Santosh Pradhan <santosh.pradhan@ionos.com> |
RDMA/rtrs-srv: Add event tracing support Add event tracing mechanism for following routines: - send_io_resp_imm() How to use: 1. Load the rtrs_server module 2. cd /sys/kernel/debug/tracing 3. If all the events need to be enabled: echo 1 > events/rtrs_srv/enable 4. OR only speific routine/event needs to be enabled e.g. echo 1 > events/rtrs_srv/send_io_resp_imm/enable 5. cat trace 6. Run some I/O workload which can trigger send_io_resp_imm() Link: https://lore.kernel.org/r/20220818105240.110234-3-haris.iqbal@ionos.com Signed-off-by: Santosh Pradhan <santosh.pradhan@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
56c310de |
|
17-Aug-2022 |
Jack Wang <jinpu.wang@ionos.com> |
RDMA/rtrs-srv: Pass the correct number of entries for dma mapped SGL ib_dma_map_sg() augments the SGL into a 'dma mapped SGL'. This process may change the number of entries and the lengths of each entry. Code that touches dma_address is iterating over the 'dma mapped SGL' and must use dma_nents which returned from ib_dma_map_sg(). We should use the return count from ib_dma_map_sg for futher usage. Fixes: 9cb837480424e ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20220818105355.110344-4-haris.iqbal@ionos.com Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
46195de3 |
|
11-Jul-2022 |
Jack Wang <jinpu.wang@ionos.com> |
RDMA/rtrs-srv: Do not use mempool for page allocation The mempool is for guaranteed memory allocation during extreme VM load (see the header of mempool.c of the kernel). But rtrs-srv allocates pages only when creating new session. There is no need to use the mempool. With the removal of mempool, rtrs-server no longer need to reserve huge mount of memory, this will avoid error like this: https://lore.kernel.org/lkml/20220620020727.GA3669@xsang-OptiPlex-9020/ Link: https://lore.kernel.org/r/20220712103113.617754-6-haris.iqbal@ionos.com Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
f4e13571 |
|
11-Jul-2022 |
Santosh Kumar Pradhan <santosh.pradhan@ionos.com> |
RDMA/rtrs-srv: Use per-cpu variables for rdma stats Convert server stat counters from atomic to per-cpu variables. Link: https://lore.kernel.org/r/20220712103113.617754-4-haris.iqbal@ionos.com Signed-off-by: Santosh Kumar Pradhan <santosh.pradhan@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
f7ecac6a |
|
05-Jan-2022 |
Vaishali Thakkar <vaishali.thakkar@ionos.com> |
RDMA/rtrs-srv: Rename rtrs_srv to rtrs_srv_sess Structure rtrs_srv is used for sessions so in order to avoid confusions rename it to rtrs_srv_sess. All changes were done with the help of following Coccinelle script: @@ @@ struct - rtrs_srv + rtrs_srv_sess Link: https://lore.kernel.org/r/20220105180708.7774-5-jinpu.wang@ionos.com Signed-off-by: Vaishali Thakkar <vaishali.thakkar@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
ae4c8164 |
|
05-Jan-2022 |
Vaishali Thakkar <vaishali.thakkar@ionos.com> |
RDMA/rtrs-srv: Rename rtrs_srv_sess to rtrs_srv_path rtrs_srv_sess is used for paths and not sessions on the server side. This creates confusion so let's rename it to rtrs_srv_path. Also, rename related variables and functions. Coccinelle is used to do the transformations for most of the occurrences and remaining ones were handled manually. Link: https://lore.kernel.org/r/20220105180708.7774-3-jinpu.wang@ionos.com Signed-off-by: Vaishali Thakkar <vaishali.thakkar@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
d9372794 |
|
05-Jan-2022 |
Vaishali Thakkar <vaishali.thakkar@ionos.com> |
RDMA/rtrs: Rename rtrs_sess to rtrs_path rtrs_sess is in fact a path. This makes it confusing and difficult to get into the code. So let's rename the structure and related use cases of it. Coccinelle was used to do the transformation for most of the occurrences and remaining ones were handled manually. Link: https://lore.kernel.org/r/20220105180708.7774-2-jinpu.wang@ionos.com Signed-off-by: Vaishali Thakkar <vaishali.thakkar@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
dea7bb3a |
|
22-Sep-2021 |
Md Haris Iqbal <haris.iqbal@ionos.com> |
RDMA/rtrs: Do not allow sessname to contain special symbols / and . Allowing these characters in sessname can lead to unexpected results, particularly because / is used as a separator between files in a path, and . points to the current directory. Link: https://lore.kernel.org/r/20210922125333.351454-7-haris.iqbal@ionos.com Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com> Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
4b6afe9b |
|
22-Sep-2021 |
Jack Wang <jinpu.wang@ionos.com> |
RDMA/rtrs: Fix warning when use poll mode on client side. When testing with poll mode, it will fail and lead to warning below on client side: $ echo "sessname=bla path=gid:fe80::2:c903:4e:d0b3@gid:fe80::2:c903:8:ca17 device_path=/dev/nullb2 nr_poll_queues=-1" | \ sudo tee /sys/devices/virtual/rnbd-client/ctl/map_device rnbd_client L597: Mapping device /dev/nullb2 on session bla, (access_mode: rw, nr_poll_queues: 8) WARNING: CPU: 3 PID: 9886 at drivers/infiniband/core/cq.c:447 ib_cq_pool_get+0x26f/0x2a0 [ib_core] The problem is in case of poll queue, we need to still call ib_alloc_cq/ib_free_cq, we can't use cq_poll api for poll queue. As both client and server use shared function from rtrs, set irq_con_num to con_num on server side, which is number of total connection of the session, this way we can differ if the rtrs_con requires pollqueue. Following up patches will replace the duplicate code with helpers. Link: https://lore.kernel.org/r/20210922125333.351454-4-haris.iqbal@ionos.com Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
cbe2de39 |
|
06-Aug-2021 |
Gioh Kim <gi-oh.kim@ionos.com> |
RDMA/rtrs: Remove (void) casting for functions Casting to (void) does nothing, remove them. Link: https://lore.kernel.org/r/20210806112112.124313-7-haris.iqbal@ionos.com Suggested-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
4693d6b7 |
|
06-Aug-2021 |
Gioh Kim <gi-oh.kim@ionos.com> |
RDMA/rtrs: Remove all likely and unlikely The IO performance test with fio after swapping the likely and unlikely macros in all if-statement shows no difference. They do not help for the performance of rtrs. Thanks to Haakon Bugge for the test scenario. The fio test did random read on 32 rnbd devices and 64 processes. Test environment: - Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz - 376G memory - kernel version: 5.4.86 - gcc version: gcc (Debian 8.3.0-6) 8.3.0 - Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5] Test result: - before swapping: IOPS=829k, BW=3239MiB/s - after swapping: IOPS=829k, BW=3238MiB/s - remove all (un)likely: IOPS=829k, BW=3238MiB/s Link: https://lore.kernel.org/r/20210806112112.124313-5-haris.iqbal@ionos.com Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
cfcdbd9d |
|
12-Jul-2021 |
Jack Wang <jinpu.wang@ionos.com> |
RDMA/rtrs: Move sq_wr_avail to rtrs_con In order to account HB for sq_wr_avail properly, move sq_wr_avail from rtrs_srv_con to rtrs_con. Although rtrs-clt do not care sq_wr_avail, but still init it to max_send_wr. Fixes: b38041d50add ("RDMA/rtrs: Do not signal for heatbeat") Link: https://lore.kernel.org/r/20210712060750.16494-7-jinpu.wang@ionos.com Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com> Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
e2d98504 |
|
12-Jul-2021 |
Jack Wang <jinpu.wang@ionos.com> |
RDMA/rtrs: Enable the same selective signal for heartbeat and IO On idle session, because we do not do signal for heartbeat, it will overflow the send queue after sometime. To avoid that, we need to enable the signal for heartbeat. To do that, add a new member signal_interval in rtrs_path, which will set min of queue_depth and SERVICE_CON_QUEUE_DEPTH, and track it for both heartbeat and IO, so the sq queue full accounting is correct. Fixes: b38041d50add ("RDMA/rtrs: Do not signal for heatbeat") Link: https://lore.kernel.org/r/20210712060750.16494-4-jinpu.wang@ionos.com Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com> Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
a10431ef |
|
12-Jul-2021 |
Jack Wang <jinpu.wang@ionos.com> |
RDMA/rtrs: move wr_cnt from rtrs_srv_con to rtrs_con We need to track also the wr used for heatbeat. This is a preparation for that, will be used in later patch. The io_cnt in rtrs_clt is removed, use wr_cnt instead. Link: https://lore.kernel.org/r/20210712060750.16494-3-jinpu.wang@ionos.com Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com> Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
bf194997 |
|
16-Jun-2021 |
Leon Romanovsky <leon@kernel.org> |
RDMA: Fix kernel-doc warnings about wrong comment Compilation with W=1 produces warnings similar to the below. drivers/infiniband/ulp/ipoib/ipoib_main.c:320: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst All such occurrences were found with the following one line git grep -A 1 "\/\*\*" drivers/infiniband/ Link: https://lore.kernel.org/r/e57d5f4ddd08b7a19934635b44d6d632841b9ba7.1623823612.git.leonro@nvidia.com Reviewed-by: Jack Wang <jinpu.wang@ionos.com> #rtrs Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
a95fbe2a |
|
14-Jun-2021 |
Jack Wang <jinpu.wang@ionos.com> |
RDMA/rtrs: Check device max_qp_wr limit when create QP Currently we only check device max_qp_wr limit for IO connection, but not for service connection. We should check for both. So save the max_qp_wr device limit in wr_limit, and use it for both IO connections and service connections. While at it, also remove an outdated comments. Link: https://lore.kernel.org/r/20210614090337.29557-6-jinpu.wang@ionos.com Suggested-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
354462eb |
|
14-Jun-2021 |
Guoqing Jiang <guoqing.jiang@cloud.ionos.com> |
RDMA/rtrs: Rename cq_size/queue_size to cq_num/queue_num Those variables are passed to create_cq, create_qp, rtrs_iu_alloc and rtrs_iu_free, so these *_size means the num of unit. And cq_size also means number of cq element. Also move the setting of cq_num to common path. Link: https://lore.kernel.org/r/20210614090337.29557-5-jinpu.wang@ionos.com Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
b012f0ad |
|
14-Jun-2021 |
Md Haris Iqbal <haris.iqbal@cloud.ionos.com> |
RDMA/rtrs: RDMA_RXE requires more number of WR When using rdma_rxe, post_one_recv() returns ENOMEM error due to the full recv queue. This patch increase the number of WR for receive queue to support all devices. Link: https://lore.kernel.org/r/20210614090337.29557-4-jinpu.wang@ionos.com Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
5e91eabf |
|
14-Jun-2021 |
Jack Wang <jinpu.wang@cloud.ionos.com> |
RDMA/rtrs-srv: Set minimal max_send_wr and max_recv_wr Currently rtrs when create_qp use a coarse numbers (bigger in general), which leads to hardware create more resources which only waste memory with no benefits. For max_send_wr, we don't really need alway max_qp_wr size when creating qp, reduce it to cq_size. For max_recv_wr, cq_size is enough. With the patch when sess_queue_depth=128, per session (2 paths) memory consumption reduced from 188 MB to 65MB When always_invalidate is enabled, we need send more wr, so treat it special. Fixes: 9cb837480424e ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20210614090337.29557-2-jinpu.wang@ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
2371c403 |
|
28-May-2021 |
Gioh Kim <gi-oh.kim@cloud.ionos.com> |
RDMA/rtrs-srv: Fix memory leak of unfreed rtrs_srv_stats object When closing a session, currently the rtrs_srv_stats object in the closing session is freed by kobject release. But if it failed to create a session by various reasons, it must free the rtrs_srv_stats object directly because kobject is not created yet. This problem is found by kmemleak as below: 1. One client machine maps /dev/nullb0 with session name 'bla': root@test1:~# echo "sessname=bla path=ip:192.168.122.190 \ device_path=/dev/nullb0" > /sys/devices/virtual/rnbd-client/ctl/map_device 2. Another machine failed to create a session with the same name 'bla': root@test2:~# echo "sessname=bla path=ip:192.168.122.190 \ device_path=/dev/nullb1" > /sys/devices/virtual/rnbd-client/ctl/map_device -bash: echo: write error: Connection reset by peer 3. The kmemleak on server machine reported an error: unreferenced object 0xffff888033cdc800 (size 128): comm "kworker/2:1", pid 83, jiffies 4295086585 (age 2508.680s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000a72903b2>] __alloc_sess+0x1d4/0x1250 [rtrs_server] [<00000000d1e5321e>] rtrs_srv_rdma_cm_handler+0xc31/0xde0 [rtrs_server] [<00000000bb2f6e7e>] cma_ib_req_handler+0xdc5/0x2b50 [rdma_cm] [<00000000e896235d>] cm_process_work+0x2d/0x100 [ib_cm] [<00000000b6866c5f>] cm_req_handler+0x11bc/0x1c40 [ib_cm] [<000000005f5dd9aa>] cm_work_handler+0xe65/0x3cf2 [ib_cm] [<00000000610151e7>] process_one_work+0x4bc/0x980 [<00000000541e0f77>] worker_thread+0x78/0x5c0 [<00000000423898ca>] kthread+0x191/0x1e0 [<000000005a24b239>] ret_from_fork+0x3a/0x50 Fixes: 39c2d639ca183 ("RDMA/rtrs-srv: Set .release function for rtrs srv device during device init") Link: https://lore.kernel.org/r/20210528113018.52290-18-jinpu.wang@ionos.com Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
07c14027 |
|
28-May-2021 |
Gioh Kim <gi-oh.kim@cloud.ionos.com> |
RDMA/rtrs-srv: Duplicated session name is not allowed If two clients try to use the same session name, rtrs-server generates a kernel error that it failed to create the sysfs because the filename is duplicated. This patch adds code to check if there already exists the same session name with the different UUID. If a client tries to add more session, it sends the UUID and the session name. Therefore it is ok if there is already same session name with the same UUID. The rtrs-server must fail only-if there is the same session name with the different UUID. Link: https://lore.kernel.org/r/20210528113018.52290-17-jinpu.wang@ionos.com Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Aleksei Marov <aleksei.marov@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
0cdfb3b2 |
|
28-May-2021 |
Md Haris Iqbal <haris.iqbal@cloud.ionos.com> |
RDMA/rtrs-srv: Replace atomic_t with percpu_ref for ids_inflight ids_inflight is used to track the inflight IOs. But the use of atomic_t variable can cause performance drops and can also become a performance bottleneck. This commit replaces the use of atomic_t with a percpu_ref structure. The advantage it offers is, it doesn't check if the reference has fallen to 0, until the user explicitly signals it to; and that is done by the percpu_ref_kill() function call. After that, the percpu_ref structure behaves like an atomic_t and for every put call, checks whether the reference has fallen to 0 or not. rtrs_srv_stats_rdma_to_str shows the count of ids_inflight as 0 for user-mode tools not to be confused. Fixes: 9cb837480424e ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20210528113018.52290-14-jinpu.wang@ionos.com Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
0aedfb69 |
|
28-May-2021 |
Guoqing Jiang <guoqing.jiang@cloud.ionos.com> |
RDMA/rtrs-srv: Kill __rtrs_srv_change_state No need since the only user is rtrs_srv_change_state. Link: https://lore.kernel.org/r/20210528113018.52290-11-jinpu.wang@ionos.com Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
2d612f0d |
|
28-May-2021 |
Dima Stepanov <dmitrii.stepanov@cloud.ionos.com> |
RDMA/rtrs: Use strscpy instead of strlcpy During checkpatch analyzing the following warning message was found: WARNING:STRLCPY: Prefer strscpy over strlcpy - see: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Fix it by using strscpy calls instead of strlcpy. Link: https://lore.kernel.org/r/20210528113018.52290-8-jinpu.wang@ionos.com Signed-off-by: Dima Stepanov <dmitrii.stepanov@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
3f3d0eab |
|
28-May-2021 |
Gioh Kim <gi-oh.kim@cloud.ionos.com> |
RDMA/rtrs: Define MIN_CHUNK_SIZE Define MIN_CHUNK_SIZE to replace the hard-coding number. We need 4k for metadata, so MIN_CHUNK_SIZE should be at least 8k. Link: https://lore.kernel.org/r/20210528113018.52290-7-jinpu.wang@ionos.com Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
485f2fb1 |
|
28-May-2021 |
Guoqing Jiang <guoqing.jiang@cloud.ionos.com> |
RDMA/rtrs-srv: Clean up the code in __rtrs_srv_change_state No need to use double switch to check the change of state everywhere, let's change them to "if" to reduce size. Link: https://lore.kernel.org/r/20210528113018.52290-5-jinpu.wang@ionos.com Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
6564b110 |
|
28-May-2021 |
Md Haris Iqbal <haris.iqbal@ionos.com> |
RDMA/rtrs-srv: Add error messages for cases when failing RDMA connection It was difficult to find out why it failed to establish RDMA connection. This patch adds some messages to show which function has failed why. Link: https://lore.kernel.org/r/20210528113018.52290-4-jinpu.wang@ionos.com Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
cfbeb0b9 |
|
28-May-2021 |
Guoqing Jiang <guoqing.jiang@cloud.ionos.com> |
RDMA/rtrs-srv: Kill reject_w_econnreset label We can goto reject_w_err label after initialize err with -ECONNRESET. Link: https://lore.kernel.org/r/20210528113018.52290-2-jinpu.wang@ionos.com Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
42cdc190 |
|
06-Apr-2021 |
Gioh Kim <gi-oh.kim@cloud.ionos.com> |
RDMA/rtrs-srv: More debugging info when fail to send reply It does not help to debug if it only print error message without any debugging information which session and connection the error happened. Link: https://lore.kernel.org/r/20210406123639.202899-3-gi-oh.kim@ionos.com Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
11b74cbf |
|
25-Mar-2021 |
Jack Wang <jinpu.wang@cloud.ionos.com> |
RDMA/rtrs: Cleanup unused 's' variable in __alloc_sess Link: https://lore.kernel.org/r/20210325153308.1214057-18-gi-oh.kim@ionos.com Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
88e2f105 |
|
25-Mar-2021 |
Gioh Kim <gi-oh.kim@cloud.ionos.com> |
RDMA/rtrs-srv: Report temporary sessname for error message Before receiving the session name, the error message cannot include any information about which connection generates the error. This patch stores the addresses of source and target in the sessname field to show which generates the error. That field will be over-written when receiving the session name from client. Link: https://lore.kernel.org/r/20210325153308.1214057-17-gi-oh.kim@ionos.com Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
57dae8ba |
|
25-Mar-2021 |
Guoqing Jiang <guoqing.jiang@cloud.ionos.com> |
RDMA/rtrs: Cleanup the code in rtrs_srv_rdma_cm_handler Let these cases share the same path since all of them need to close session. Link: https://lore.kernel.org/r/20210325153308.1214057-11-gi-oh.kim@ionos.com Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com> Reviewed-by: Danil Kipnis <danil.kipnis@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
3b89e92c |
|
22-Feb-2021 |
Jack Wang <jinpu.wang@cloud.ionos.com> |
RDMA/rtrs: Use new shared CQ mechanism Have the driver use shared CQs which provids a ~10%-20% improvement during test. Instead of opening a CQ for each QP per connection, a CQ for each QP will be provided by the RDMA core driver that will be shared between the QPs on that core reducing interrupt overhead. Link: https://lore.kernel.org/r/20210222141551.54345-1-jinpu.wang@cloud.ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
c81cba85 |
|
19-Apr-2021 |
Gioh Kim <gi-oh.kim@cloud.ionos.com> |
block/rnbd-srv: Remove unused arguments of rnbd_srv_rdma_ev struct rtrs_srv is not used when handling rnbd_srv_rdma_ev messages, so cleaned up rdma_ev function pointer in rtrs_srv_ops also is changed. Cc: Leon Romanovsky <leonro@nvidia.com> Cc: linux-rdma@vger.kernel.org Signed-off-by: Aleksei Marov <aleksei.marov@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20210419073722.15351-16-gi-oh.kim@ionos.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
ed408529 |
|
16-Feb-2021 |
Jack Wang <jinpu.wang@cloud.ionos.com> |
RDMA/rtrs-srv: Do not pass a valid pointer to PTR_ERR() smatch gives the warning: drivers/infiniband/ulp/rtrs/rtrs-srv.c:1805 rtrs_rdma_connect() warn: passing zero to 'PTR_ERR' Which is trying to say smatch has shown that srv is not an error pointer and thus cannot be passed to PTR_ERR. The solution is to move the list_add() down after full initilization of rtrs_srv. To avoid holding the srv_mutex too long, only hold it during the list operation as suggested by Leon. Fixes: 03e9b33a0fd6 ("RDMA/rtrs: Only allow addition of path to an already established session") Link: https://lore.kernel.org/r/20210216143807.65923-1-jinpu.wang@cloud.ionos.com Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
f7452a7e |
|
12-Feb-2021 |
Gioh Kim <gi-oh.kim@cloud.ionos.com> |
RDMA/rtrs-srv: fix memory leak by missing kobject free kmemleak reported an error as below: unreferenced object 0xffff8880674b7640 (size 64): comm "kworker/4:1H", pid 113, jiffies 4296403507 (age 507.840s) hex dump (first 32 bytes): 69 70 3a 31 39 32 2e 31 36 38 2e 31 32 32 2e 31 ip:192.168.122.1 31 30 40 69 70 3a 31 39 32 2e 31 36 38 2e 31 32 10@ip:192.168.12 backtrace: [<0000000054413611>] kstrdup+0x2e/0x60 [<0000000078e3120a>] kobject_set_name_vargs+0x2f/0xb0 [<00000000ca2be3ee>] kobject_init_and_add+0xb0/0x120 [<0000000062ba5e78>] rtrs_srv_create_sess_files+0x14c/0x314 [rtrs_server] [<00000000b45b7217>] rtrs_srv_info_req_done+0x5b1/0x800 [rtrs_server] [<000000008fc5aa8f>] __ib_process_cq+0x94/0x100 [ib_core] [<00000000a9599cb4>] ib_cq_poll_work+0x32/0xc0 [ib_core] [<00000000cfc376be>] process_one_work+0x4bc/0x980 [<0000000016e5c96a>] worker_thread+0x78/0x5c0 [<00000000c20b8be0>] kthread+0x191/0x1e0 [<000000006c9c0003>] ret_from_fork+0x3a/0x50 It is caused by the not-freed kobject of rtrs_srv_sess. The kobject embedded in rtrs_srv_sess has ref-counter 2 after calling process_info_req(). Therefore it must call kobject_put twice. Currently it calls kobject_put only once at rtrs_srv_destroy_sess_files because kobject_del removes the state_in_sysfs flag and then kobject_put in free_sess() is not called. This patch moves kobject_del() into free_sess() so that the kobject of rtrs_srv_sess can be freed. And also this patch adds the missing call of sysfs_remove_group() to clean-up the sysfs directory. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20210212134525.103456-4-jinpu.wang@cloud.ionos.com Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
03e9b33a |
|
12-Feb-2021 |
Md Haris Iqbal <haris.iqbal@cloud.ionos.com> |
RDMA/rtrs: Only allow addition of path to an already established session While adding a path from the client side to an already established session, it was possible to provide the destination IP to a different server. This is dangerous. This commit adds an extra member to the rtrs_msg_conn_req structure, named first_conn; which is supposed to notify if the connection request is the first for that session or not. On the server side, if a session does not exist but the first_conn received inside the rtrs_msg_conn_req structure is 1, the connection request is failed. This signifies that the connection request is for an already existing session, and since the server did not find one, it is an wrong connection request. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20210212134525.103456-3-jinpu.wang@cloud.ionos.com Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Reviewed-by: Lutz Pogrell <lutz.pogrell@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
e6daa8f6 |
|
12-Feb-2021 |
Jack Wang <jinpu.wang@cloud.ionos.com> |
RDMA/rtrs-srv: Fix stack-out-of-bounds BUG: KASAN: stack-out-of-bounds in _mlx4_ib_post_send+0x1bd2/0x2770 [mlx4_ib] Read of size 4 at addr ffff8880d5a7f980 by task kworker/0:1H/565 CPU: 0 PID: 565 Comm: kworker/0:1H Tainted: G O 5.4.84-storage #5.4.84-1+feature+linux+5.4.y+dbg+20201216.1319+b6b887b~deb10 Hardware name: Supermicro H8QG6/H8QG6, BIOS 3.00 09/04/2012 Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] Call Trace: dump_stack+0x96/0xe0 print_address_description.constprop.4+0x1f/0x300 ? irq_work_claim+0x2e/0x50 __kasan_report.cold.8+0x78/0x92 ? _mlx4_ib_post_send+0x1bd2/0x2770 [mlx4_ib] kasan_report+0x10/0x20 _mlx4_ib_post_send+0x1bd2/0x2770 [mlx4_ib] ? check_chain_key+0x1d7/0x2e0 ? _mlx4_ib_post_recv+0x630/0x630 [mlx4_ib] ? lockdep_hardirqs_on+0x1a8/0x290 ? stack_depot_save+0x218/0x56e ? do_profile_hits.isra.6.cold.13+0x1d/0x1d ? check_chain_key+0x1d7/0x2e0 ? save_stack+0x4d/0x80 ? save_stack+0x19/0x80 ? __kasan_slab_free+0x125/0x170 ? kfree+0xe7/0x3b0 rdma_write_sg+0x5b0/0x950 [rtrs_server] The problem is when we send imm_wr, the type should be ib_rdma_wr, so hw driver like mlx4 can do rdma_wr(wr), so fix it by use the ib_rdma_wr as type for imm_wr. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20210212134525.103456-2-jinpu.wang@cloud.ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
6f5d1b30 |
|
17-Dec-2020 |
Jack Wang <jinpu.wang@cloud.ionos.com> |
RDMA/rtrs-srv: Init wr_cnt as 1 Fix up wr_avail accounting. if wr_cnt is 0, then we do SIGNAL for first wr, in completion we add queue_depth back, which is not right in the sense of tracking for available wr. So fix it by init wr_cnt to 1. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20201217141915.56989-19-jinpu.wang@cloud.ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
e8ae7ddb |
|
17-Dec-2020 |
Jack Wang <jinpu.wang@cloud.ionos.com> |
RDMA/rtrs-srv: Do not signal REG_MR We do not need to wait for REG_MR completion, so remove the SIGNAL flag. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20201217141915.56989-18-jinpu.wang@cloud.ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
b38041d5 |
|
17-Dec-2020 |
Jack Wang <jinpu.wang@cloud.ionos.com> |
RDMA/rtrs: Do not signal for heatbeat For HB, there is no need to generate signal for completion. Also remove a comment accordingly. Fixes: c0894b3ea69d ("RDMA/rtrs: core: lib functions shared between client and server modules") Link: https://lore.kernel.org/r/20201217141915.56989-16-jinpu.wang@cloud.ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reported-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
8537f2de |
|
17-Dec-2020 |
Jack Wang <jinpu.wang@cloud.ionos.com> |
RDMA/rtrs-srv: Fix missing wr_cqe We had a few places wr_cqe is not set, which could lead to NULL pointer deref or GPF in error case. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20201217141915.56989-14-jinpu.wang@cloud.ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
f77c4839 |
|
17-Dec-2020 |
Guoqing Jiang <guoqing.jiang@cloud.ionos.com> |
RDMA/rtrs-srv: Jump to dereg_mr label if allocate iu fails The rtrs_iu_free is called in rtrs_iu_alloc if memory is limited, so we don't need to free the same iu again. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20201217141915.56989-7-jinpu.wang@cloud.ionos.com Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Reviewed-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
99f0c380 |
|
17-Dec-2020 |
Jack Wang <jinpu.wang@cloud.ionos.com> |
RDMA/rtrs-srv: Release lock before call into close_sess In this error case, we don't need hold mutex to call close_sess. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20201217141915.56989-4-jinpu.wang@cloud.ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Tested-by: Lutz Pogrell <lutz.pogrell@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
7490fd1f |
|
17-Dec-2020 |
Jack Wang <jinpu.wang@cloud.ionos.com> |
RDMA/rtrs: Extend ibtrs_cq_qp_create rtrs does not have same limit for both max_send_wr and max_recv_wr, To allow client and server set different values, export in a separate parameter for rtrs_cq_qp_create. Also fix the type accordingly, u32 should be used instead of u16. Fixes: c0894b3ea69d ("RDMA/rtrs: core: lib functions shared between client and server modules") Link: https://lore.kernel.org/r/20201217141915.56989-2-jinpu.wang@cloud.ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
ffea6ad1 |
|
23-Oct-2020 |
Guoqing Jiang <guoqing.jiang@cloud.ionos.com> |
RDMA/rtrs-srv: Kill rtrs_srv_change_state_get_old This function isn't needed since no caller checks the old_state of sess. Link: https://lore.kernel.org/r/20201023074353.21946-11-jinpu.wang@cloud.ionos.com Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
8bd372ac |
|
23-Oct-2020 |
Gioh Kim <gi-oh.kim@cloud.ionos.com> |
RDMA/rtrs: Remove unnecessary argument dir of rtrs_iu_free The direction of DMA operation is already in the rtrs_iu Link: https://lore.kernel.org/r/20201023074353.21946-8-jinpu.wang@cloud.ionos.com Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
d715ff8a |
|
23-Oct-2020 |
Guoqing Jiang <guoqing.jiang@cloud.ionos.com> |
RDMA/rtrs-srv: Don't guard the whole __alloc_srv with srv_mutex The purpose of srv_mutex is to protect srv_list as in put_srv, so no need to hold it when allocate memory for srv since it could be time consuming. Otherwise if one machine has limited memory, rsrv_close_work could be blocked for a longer time due to the mutex is held by get_or_create_srv since it can't get memory in time. INFO: task kworker/1:1:27478 blocked for more than 120 seconds. Tainted: G O 4.14.171-1-storage #4.14.171-1.3~deb9 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/1:1 D 0 27478 2 0x80000000 Workqueue: rtrs_server_wq rtrs_srv_close_work [rtrs_server] Call Trace: ? __schedule+0x38c/0x7e0 schedule+0x32/0x80 schedule_preempt_disabled+0xa/0x10 __mutex_lock.isra.2+0x25e/0x4d0 ? put_srv+0x44/0x100 [rtrs_server] put_srv+0x44/0x100 [rtrs_server] rtrs_srv_close_work+0x16c/0x280 [rtrs_server] process_one_work+0x1c5/0x3c0 worker_thread+0x47/0x3e0 kthread+0xfc/0x130 ? trace_event_raw_event_workqueue_execute_start+0xa0/0xa0 ? kthread_create_on_node+0x70/0x70 ret_from_fork+0x1f/0x30 Let's move all the logics from __find_srv_and_get and __alloc_srv to get_or_create_srv, and remove the two functions. Then it should be safe for multiple processes to access the same srv since it is protected with srv_mutex. And since we don't want to allocate chunks with srv_mutex held, let's check the srv->refcount after get srv because the chunks could not be allocated yet. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20201023074353.21946-6-jinpu.wang@cloud.ionos.com Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
558d52b2 |
|
07-Sep-2020 |
Md Haris Iqbal <haris.iqbal@cloud.ionos.com> |
RDMA/rtrs-srv: Incorporate ib_register_client into rtrs server init The rnbd_server module's communication manager (cm) initialization depends on the registration of the "network namespace subsystem" of the RDMA CM agent module. As such, when the kernel is configured to load the rnbd_server and the RDMA cma module during initialization; and if the rnbd_server module is initialized before RDMA cma module, a null ptr dereference occurs during the RDMA bind operation. Call trace: Call Trace: ? xas_load+0xd/0x80 xa_load+0x47/0x80 cma_ps_find+0x44/0x70 rdma_bind_addr+0x782/0x8b0 ? get_random_bytes+0x35/0x40 rtrs_srv_cm_init+0x50/0x80 rtrs_srv_open+0x102/0x180 ? rnbd_client_init+0x6e/0x6e rnbd_srv_init_module+0x34/0x84 ? rnbd_client_init+0x6e/0x6e do_one_initcall+0x4a/0x200 kernel_init_freeable+0x1f1/0x26e ? rest_init+0xb0/0xb0 kernel_init+0xe/0x100 ret_from_fork+0x22/0x30 Modules linked in: CR2: 0000000000000015 All this happens cause the cm init is in the call chain of the module init, which is not a preferred practice. So remove the call to rdma_create_id() from the module init call chain. Instead register rtrs-srv as an ib client, which makes sure that the rdma_create_id() is called only when an ib device is added. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20200907103106.104530-1-haris.iqbal@cloud.ionos.com Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Reviewed-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
39c2d639 |
|
07-Sep-2020 |
Md Haris Iqbal <haris.iqbal@cloud.ionos.com> |
RDMA/rtrs-srv: Set .release function for rtrs srv device during device init The device .release function was not being set during the device initialization. This was leading to the below warning, in error cases when put_srv was called before device_add was called. Warning: Device '(null)' does not have a release() function, it is broken and must be fixed. See Documentation/kobject.txt. So, set the device .release function during device initialization in the __alloc_srv() function. Fixes: baa5b28b7a47 ("RDMA/rtrs-srv: Replace device_register with device_initialize and device_add") Link: https://lore.kernel.org/r/20200907102216.104041-1-haris.iqbal@cloud.ionos.com Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
baa5b28b |
|
11-Aug-2020 |
Md Haris Iqbal <haris.iqbal@cloud.ionos.com> |
RDMA/rtrs-srv: Replace device_register with device_initialize and device_add There are error cases when we will call free_srv before device kobject is initialized; in such cases calling put_device generates the following warning: kobject: '(null)' (000000009f5445ed): is not initialized, yet kobject_put() is being called. So call device_initialize() only once when the server is allocated. If we end up calling put_srv() and subsequently free_srv(), our call to put_device() would result in deletion of the obj. Call device_add() later when we actually have a connection. Correspondingly, call device_del() instead of device_unregister() when srv->dev_ref falls to 0. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20200811092722.2450-1-haris.iqbal@cloud.ionos.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Reviewed-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
03ed5a8c |
|
24-Jul-2020 |
Jack Wang <jinpu.wang@cloud.ionos.com> |
RDMA/rtrs: remove WQ_MEM_RECLAIM for rtrs_wq lockdep triggers a warning from time to time when running a regression test: rnbd_client L685: </dev/nullb0@bla> Device disconnected. rnbd_client L1756: Unloading module workqueue: WQ_MEM_RECLAIM rtrs_client_wq:rtrs_clt_reconnect_work [rtrs_client] is flushing !WQ_MEM_RECLAIM ib_addr:process_one_req [ib_core] WARNING: CPU: 2 PID: 18824 at kernel/workqueue.c:2517 check_flush_dependency+0xad/0x130 The root cause is workqueue core expect flushing should not be done for a !WQ_MEM_RECLAIM wq from a WQ_MEM_RECLAIM workqueue. In above case ib_addr workqueue without WQ_MEM_RECLAIM, but rtrs_wq WQ_MEM_RECLAIM. To avoid the warning, remove the WQ_MEM_RECLAIM flag. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Link: https://lore.kernel.org/r/20200724111508.15734-4-haris.iqbal@cloud.ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
8094ba0a |
|
26-May-2020 |
Leon Romanovsky <leon@kernel.org> |
RDMA/cma: Provide ECE reject reason IBTA declares "vendor option not supported" reject reason in REJ messages if passive side doesn't want to accept proposed ECE options. Due to the fact that ECE is managed by userspace, there is a need to let users to provide such rejected reason. Link: https://lore.kernel.org/r/20200526103304.196371-7-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
e172037b |
|
22-May-2020 |
Md Haris Iqbal <haris.phnx@gmail.com> |
RDMA/rtrs: server: Use already dereferenced rtrs_sess structure The rtrs_sess structure has already been extracted above from the rtrs_srv_sess structure. Use that to avoid redundant dereferencing. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20200522082833.1480551-1-haris.phnx@gmail.com Signed-off-by: Md Haris Iqbal <haris.phnx@gmail.com> Acked-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
6b31afce |
|
19-May-2020 |
Wei Yongjun <weiyongjun1@huawei.com> |
RDMA/rtrs: server: Fix some error return code Fix to return negative error code -ENOMEM from the some error handling cases instead of 0, as done elsewhere in this function. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Fixes: 91b11610af8d ("RDMA/rtrs: server: sysfs interface functions") Link: https://lore.kernel.org/r/20200519091912.134358-1-weiyongjun1@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
bf1d8edb |
|
19-May-2020 |
Dan Carpenter <dan.carpenter@oracle.com> |
RDMA/rtrs: Fix a couple off by one bugs in rtrs_srv_rdma_done() These > comparisons should be >= to prevent accessing one element beyond the end of the buffer. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20200519154525.GA66801@mwanda Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
b386cd65 |
|
19-May-2020 |
Dan Carpenter <dan.carpenter@oracle.com> |
RDMA/rtrs: Fix some signedness bugs in error handling The problem is that "req->sg_cnt" is an unsigned int so if "nr" is negative, it gets type promoted to a high positive value and the condition is false. This patch fixes it by handling negatives separately. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Link: https://lore.kernel.org/r/20200519133223.GN2078@kadam Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
9cb83748 |
|
11-May-2020 |
Jack Wang <jinpu.wang@cloud.ionos.com> |
RDMA/rtrs: server: main functionality This is main functionality of rtrs-server module, which accepts set of RDMA connections (so called rtrs session), creates/destroys sysfs entries associated with rtrs session and notifies upper layer (user of RTRS API) about RDMA requests or link events. Link: https://lore.kernel.org/r/20200511135131.27580-11-danil.kipnis@cloud.ionos.com Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|