#
b27696cd |
|
04-Feb-2024 |
Wen Gu <guwen@linux.alibaba.com> |
net/smc: change the term virtual ISM to Emulated-ISM According to latest release of SMCv2.1[1], the term 'virtual ISM' has been changed to 'Emulated-ISM' to avoid the ambiguity of the word 'virtual' in different contexts. So the names or comments in the code need be modified accordingly. [1] https://www.ibm.com/support/pages/node/7112343 Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Link: https://lore.kernel.org/r/20240205033317.127269-1-guwen@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
6cf9ff46 |
|
12-Feb-2024 |
Dmitry Antipov <dmantipov@yandex.ru> |
net: smc: fix spurious error message from __sock_release() Commit 67f562e3e147 ("net/smc: transfer fasync_list in case of fallback") leaves the socket's fasync list pointer within a container socket as well. When the latter is destroyed, '__sock_release()' warns about its non-empty fasync list, which is a dangling pointer to previously freed fasync list of an underlying TCP socket. Fix this spurious warning by nullifying fasync list of a container socket. Fixes: 67f562e3e147 ("net/smc: transfer fasync_list in case of fallback") Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b40584d1 |
|
19-Dec-2023 |
Wen Gu <guwen@linux.alibaba.com> |
net/smc: compatible with 128-bits extended GID of virtual ISM device According to virtual ISM support feature defined by SMCv2.1, GIDs of virtual ISM device are UUIDs defined by RFC4122, which are 128-bits long. So some adaptation work is required. And note that the GIDs of existing platform firmware ISM devices still remain 64-bits long. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9505450d |
|
19-Dec-2023 |
Wen Gu <guwen@linux.alibaba.com> |
net/smc: unify the structs of accept or confirm message for v1 and v2 The structs of CLC accept and confirm messages for SMCv1 and SMCv2 are separately defined and often casted to each other in the code, which may increase the risk of errors caused by future divergence of them. So unify them into one struct for better maintainability. Suggested-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f8e80fc4 |
|
22-Nov-2023 |
Guangguan Wang <guangguan.wang@linux.alibaba.com> |
net/smc: add sysctl for max links per lgr for SMC-R v2.1 Add a new sysctl: net.smc.smcr_max_links_per_lgr, which is used to control the preferred max links per lgr for SMC-R v2.1. The default value of this sysctl is 2, and the acceptable value ranges from 1 to 2. Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c5a10397 |
|
06-Dec-2023 |
Wen Gu <guwen@linux.alibaba.com> |
net/smc: fix missing byte order conversion in CLC handshake The byte order conversions of ISM GID and DMB token are missing in process of CLC accept and confirm. So fix it. Fixes: 3d9725a6a133 ("net/smc: common routine for CLC accept and confirm") Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Link: https://lore.kernel.org/r/1701882157-87956-1-git-send-email-guwen@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
e6d71b43 |
|
21-Nov-2023 |
D. Wythe <alibuda@linux.alibaba.com> |
net/smc: avoid data corruption caused by decline We found a data corruption issue during testing of SMC-R on Redis applications. The benchmark has a low probability of reporting a strange error as shown below. "Error: Protocol error, got "\xe2" as reply type byte" Finally, we found that the retrieved error data was as follows: 0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C 0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2 It is quite obvious that this is a SMC DECLINE message, which means that the applications received SMC protocol message. We found that this was caused by the following situations: client server ¦ clc proposal -------------> ¦ clc accept <------------- ¦ clc confirm -------------> wait llc confirm send llc confirm ¦failed llc confirm ¦ x------ (after 2s)timeout wait llc confirm rsp wait decline (after 1s) timeout (after 2s) timeout ¦ decline --------------> ¦ decline <-------------- As a result, a decline message was sent in the implementation, and this message was read from TCP by the already-fallback connection. This patch double the client timeout as 2x of the server value, With this simple change, the Decline messages should never cross or collide (during Confirm link timeout). This issue requires an immediate solution, since the protocol updates involve a more long-term solution. Fixes: 0fb0b02bd6fd ("net/smc: adapt SMC client code to use the LLC flow") Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5211c972 |
|
03-Nov-2023 |
D. Wythe <alibuda@linux.alibaba.com> |
net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT Considering scenario: smc_cdc_rx_handler __smc_release sock_set_flag smc_close_active() sock_set_flag __set_bit(DEAD) __set_bit(DONE) Dues to __set_bit is not atomic, the DEAD or DONE might be lost. if the DEAD flag lost, the state SMC_CLOSED will be never be reached in smc_close_passive_work: if (sock_flag(sk, SOCK_DEAD) && smc_close_sent_any_close(conn)) { sk->sk_state = SMC_CLOSED; } else { /* just shutdown, but not yet closed locally */ sk->sk_state = SMC_APPFINCLOSEWAIT; } Replace sock_set_flags or __set_bit to set_bit will fix this problem. Since set_bit is atomic. Fixes: b38d732477e4 ("smc: socket closing and linkgroup cleanup") Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
10bbf165 |
|
21-Sep-2023 |
Eric Dumazet <edumazet@google.com> |
net: implement lockless SO_PRIORITY This is a followup of 8bf43be799d4 ("net: annotate data-races around sk->sk_priority"). sk->sk_priority can be read and written without holding the socket lock. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4abbd2e3 |
|
12-Oct-2023 |
Dust Li <dust.li@linux.alibaba.com> |
net/smc: return the right falback reason when prefix checks fail In the smc_listen_work(), if smc_listen_prfx_check() failed, the real reason: SMC_CLC_DECL_DIFFPREFIX was dropped, and SMC_CLC_DECL_NOSMCDEV was returned. Althrough this is also kind of SMC_CLC_DECL_NOSMCDEV, but return the real reason is much friendly for debugging. Fixes: e49300a6bf62 ("net/smc: add listen processing for SMC-Rv2") Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Link: https://lore.kernel.org/r/20231012123729.29307-1-dust.li@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
c68681ae |
|
11-Oct-2023 |
Albert Huang <huangjie.albert@bytedance.com> |
net/smc: fix smc clc failed issue when netdevice not in init_net If the netdevice is within a container and communicates externally through network technologies such as VxLAN, we won't be able to find routing information in the init_net namespace. To address this issue, we need to add a struct net parameter to the smc_ib_find_route function. This allow us to locate the routing information within the corresponding net namespace, ensuring the correct completion of the SMC CLC interaction. Fixes: e5c4744cfb59 ("net/smc: add SMC-Rv2 connection establishment") Signed-off-by: Albert Huang <huangjie.albert@bytedance.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Link: https://lore.kernel.org/r/20231011074851.95280-1-huangjie.albert@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
bc1fb82a |
|
18-Aug-2023 |
Eric Dumazet <edumazet@google.com> |
net: annotate data-races around sk->sk_lingertime sk_getsockopt() runs locklessly. This means sk->sk_lingertime can be read while other threads are changing its value. Other reads also happen without socket lock being held, and must be annotated. Remove preprocessor logic using BITS_PER_LONG, compilers are smart enough to figure this by themselves. v2: fixed a clang W=1 (-Wtautological-constant-out-of-range-compare) warning (Jakub) Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
69b888e3 |
|
17-Aug-2023 |
Guangguan Wang <guangguan.wang@linux.alibaba.com> |
net/smc: support max links per lgr negotiation in clc handshake Support max links per lgr negotiation in clc handshake for SMCR v2.1, which is one of smc v2.1 features. Server makes decision for the final value of max links based on the client preferred max links and self-preferred max links. Here use the minimum value of the client preferred max links and server preferred max links. Client Server Proposal(max links(client preferred)) --------------------------------------> Accept(max links(accepted value)) accepted value=min(client preferred, server preferred) <------------------------------------- Confirm(max links(accepted value)) -------------------------------------> Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7f0620b9 |
|
17-Aug-2023 |
Guangguan Wang <guangguan.wang@linux.alibaba.com> |
net/smc: support max connections per lgr negotiation Support max connections per lgr negotiation for SMCR v2.1, which is one of smc v2.1 features. Server makes decision for the final value of max conns based on the client preferred max conns and self-preferred max conns. Here use the minimum value of client preferred max conns and server preferred max conns. Client Server Proposal(max conns(client preferred)) ------------------------------------> Accept(max conns(accepted value)) accepted value=min(client preferred, server preferred) <----------------------------------- Confirm(max conns(accepted value)) -----------------------------------> Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6ac1e656 |
|
17-Aug-2023 |
Guangguan Wang <guangguan.wang@linux.alibaba.com> |
net/smc: support smc v2.x features validate Support SMC v2.x features validate for SMC v2.1. This is the frame code for SMC v2.x features validate, and will take effects only when the negotiated release version is v2.1 or later. For Server, v2.x features' validation should be done in smc_clc_srv_ v2x_features_validate when receiving v2.1 or later CLC Proposal Message, such as max conns, max links negotiation, the decision of the final value of max conns and max links should be made in this function. And final check for server when receiving v2.1 or later CLC Confirm Message should be done in smc_clc_v2x_features_confirm_check. For client, v2.x features' validation should be done in smc_clc_clnt_ v2x_features_validate when receiving v2.1 or later CLC Accept Message, for example, the decision to accpt the accepted value or to decline should be made in this function. Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7290178a |
|
17-Aug-2023 |
Guangguan Wang <guangguan.wang@linux.alibaba.com> |
net/smc: add vendor unique experimental options area in clc handshake Add vendor unique experimental options area in clc handshake. In clc accept and confirm msg, vendor unique experimental options use the 16-Bytes reserved field, which defined in struct smc_clc_fce_gid_ext in previous version. Because of the struct smc_clc_first_contact_ext is widely used and limit the scope of modification, this patch moves the 16-Bytes reserved field out of struct smc_clc_fce_gid_ext, and followed with the struct smc_clc_first_contact_ext in a new struct names struct smc_clc_first_contact_ext_v2x. For SMC-R first connection, in previous version, the struct smc_clc_ first_contact_ext and the 16-Bytes reserved field has already been included in clc accept and confirm msg. Thus, this patch use struct smc_clc_first_contact_ext_v2x instead of the struct smc_clc_first_ contact_ext and the 16-Bytes reserved field in SMC-R clc accept and confirm msg is compatible with previous version. For SMC-D first connection, in previous version, only the struct smc_ clc_first_contact_ext is included in clc accept and confirm msg, and the 16-Bytes reserved field is not included. Thus, when the negotiated smc release version is the version before v2.1, we still use struct smc_clc_first_contact_ext for compatible consideration. If the negotiated smc release version is v2.1 or later, use struct smc_clc_first_contact_ ext_v2x instead. Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1e700948 |
|
17-Aug-2023 |
Guangguan Wang <guangguan.wang@linux.alibaba.com> |
net/smc: support smc release version negotiation in clc handshake Support smc release version negotiation in clc handshake based on SMC v2, where no negotiation process for different releases, but for different versions. The latest smc release version was updated to v2.1. And currently there are two release versions of SMCv2, v2.0 and v2.1. In the release version negotiation, client sends the preferred release version by CLC Proposal Message, server makes decision for which release version to use based on the client preferred release version and self-supported release version (here choose the minimum release version of the client preferred and server latest supported), then the decision returns to client by CLC Accept Message. Client confirms the decision by CLC Confirm Message. Client Server Proposal(preferred release version) ------------------------------------> Accept(accpeted release version) min(client preferred, server latest supported) <------------------------------------ Confirm(accpeted release version) ------------------------------------> Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
30c3c4a4 |
|
04-Aug-2023 |
Gerd Bayer <gbayer@linux.ibm.com> |
net/smc: Use correct buffer sizes when switching between TCP and SMC Tuning of the effective buffer size through setsockopts was working for SMC traffic only but not for TCP fall-back connections even before commit 0227f058aa29 ("net/smc: Unbind r/w buffer size from clcsock and make them tunable"). That change made it apparent that TCP fall-back connections would use net.smc.[rw]mem as buffer size instead of net.ipv4_tcp_[rw]mem. Amend the code that copies attributes between the (TCP) clcsock and the SMC socket and adjust buffer sizes appropriately: - Copy over sk_userlocks so that both sockets agree on whether tuning via setsockopt is active. - When falling back to TCP use sk_sndbuf or sk_rcvbuf as specified with setsockopt. Otherwise, use the sysctl value for TCP/IPv4. - Likewise, use either values from setsockopt or from sysctl for SMC (duplicated) on successful SMC connect. In smc_tcp_listen_work() drop the explicit copy of buffer sizes as that is taken care of by the attribute copy. Fixes: 0227f058aa29 ("net/smc: Unbind r/w buffer size from clcsock and make them tunable") Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
833bac7e |
|
04-Aug-2023 |
Gerd Bayer <gbayer@linux.ibm.com> |
net/smc: Fix setsockopt and sysctl to specify same buffer size again Commit 0227f058aa29 ("net/smc: Unbind r/w buffer size from clcsock and make them tunable") introduced the net.smc.rmem and net.smc.wmem sysctls to specify the size of buffers to be used for SMC type connections. This created a regression for users that specified the buffer size via setsockopt() as the effective buffer size was now doubled. Re-introduce the division by 2 in the SMC buffer create code and level this out by duplicating the net.smc.[rw]mem values used for initializing sk_rcvbuf/sk_sndbuf at socket creation time. This gives users of both methods (setsockopt or sysctl) the effective buffer size that they expect. Initialize net.smc.[rw]mem from its own constant of 64kB, respectively. Internal performance tests show that this value is a good compromise between throughput/latency and memory consumption. Also, this decouples it from any tuning that was done to net.ipv4.tcp_[rw]mem[1] before the module for SMC protocol was loaded. Check that no more than INT_MAX / 2 is assigned to net.smc.[rw]mem, in order to avoid any overflow condition when that is doubled for use in sk_sndbuf or sk_rcvbuf. While at it, drop the confusing sk_buf_size variable from __smc_buf_create and name "compressed" buffer size variables more consistently. Background: Before the commit mentioned above, SMC's buffer allocator in __smc_buf_create() always used half of the sockets' sk_rcvbuf/sk_sndbuf value as initial value to search for appropriate buffers. If the search resorted to using a bigger buffer when all buffers of the specified size were busy, the duplicate of the used effective buffer size is stored back to sk_rcvbuf/sk_sndbuf. When available, buffers of exactly the size that a user had specified as input to setsockopt() were used, despite setsockopt()'s documentation in "man 7 socket" talking of a mandatory duplication: [...] SO_SNDBUF Sets or gets the maximum socket send buffer in bytes. The kernel doubles this value (to allow space for book‐ keeping overhead) when it is set using setsockopt(2), and this doubled value is returned by getsockopt(2). The default value is set by the /proc/sys/net/core/wmem_default file and the maximum allowed value is set by the /proc/sys/net/core/wmem_max file. The minimum (doubled) value for this option is 2048. [...] Fixes: 0227f058aa29 ("net/smc: Unbind r/w buffer size from clcsock and make them tunable") Co-developed-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Jan Karcher <jaka@linux.ibm.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3c5b4d69 |
|
28-Jul-2023 |
Eric Dumazet <edumazet@google.com> |
net: annotate data-races around sk->sk_mark sk->sk_mark is often read while another thread could change the value. Fixes: 4a19ec5800fc ("[NET]: Introducing socket mark socket option.") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2f8bc2bb |
|
23-Jun-2023 |
David Howells <dhowells@redhat.com> |
smc: Drop smc_sendpage() in favour of smc_sendmsg() + MSG_SPLICE_PAGES Drop the smc_sendpage() code as smc_sendmsg() just passes the call down to the underlying TCP socket and smc_tx_sendpage() is just a wrapper around its sendmsg implementation. Signed-off-by: David Howells <dhowells@redhat.com> cc: Karsten Graul <kgraul@linux.ibm.com> cc: Wenjia Zhang <wenjia@linux.ibm.com> cc: Jan Karcher <jaka@linux.ibm.com> cc: "D. Wythe" <alibuda@linux.alibaba.com> cc: Tony Lu <tonylu@linux.alibaba.com> cc: Wen Gu <guwen@linux.alibaba.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Link: https://lore.kernel.org/r/20230623225513.2732256-10-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
35112271 |
|
17-May-2023 |
Wen Gu <guwen@linux.alibaba.com> |
net/smc: Reset connection when trying to use SMCRv2 fails. We found a crash when using SMCRv2 with 2 Mellanox ConnectX-4. It can be reproduced by: - smc_run nginx - smc_run wrk -t 32 -c 500 -d 30 http://<ip>:<port> BUG: kernel NULL pointer dereference, address: 0000000000000014 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 8000000108713067 P4D 8000000108713067 PUD 151127067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 4 PID: 2441 Comm: kworker/4:249 Kdump: loaded Tainted: G W E 6.4.0-rc1+ #42 Workqueue: smc_hs_wq smc_listen_work [smc] RIP: 0010:smc_clc_send_confirm_accept+0x284/0x580 [smc] RSP: 0018:ffffb8294b2d7c78 EFLAGS: 00010a06 RAX: ffff8f1873238880 RBX: ffffb8294b2d7dc8 RCX: 0000000000000000 RDX: 00000000000000b4 RSI: 0000000000000001 RDI: 0000000000b40c00 RBP: ffffb8294b2d7db8 R08: ffff8f1815c5860c R09: 0000000000000000 R10: 0000000000000400 R11: 0000000000000000 R12: ffff8f1846f56180 R13: ffff8f1815c5860c R14: 0000000000000001 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff8f1aefd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000014 CR3: 00000001027a0001 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? mlx5_ib_map_mr_sg+0xa1/0xd0 [mlx5_ib] ? smcr_buf_map_link+0x24b/0x290 [smc] ? __smc_buf_create+0x4ee/0x9b0 [smc] smc_clc_send_accept+0x4c/0xb0 [smc] smc_listen_work+0x346/0x650 [smc] ? __schedule+0x279/0x820 process_one_work+0x1e5/0x3f0 worker_thread+0x4d/0x2f0 ? __pfx_worker_thread+0x10/0x10 kthread+0xe5/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> During the CLC handshake, server sequentially tries available SMCRv2 and SMCRv1 devices in smc_listen_work(). If an SMCRv2 device is found. SMCv2 based link group and link will be assigned to the connection. Then assumed that some buffer assignment errors happen later in the CLC handshake, such as RMB registration failure, server will give up SMCRv2 and try SMCRv1 device instead. But the resources assigned to the connection won't be reset. When server tries SMCRv1 device, the connection creation process will be executed again. Since conn->lnk has been assigned when trying SMCRv2, it will not be set to the correct SMCRv1 link in smcr_lgr_conn_assign_link(). So in such situation, conn->lgr points to correct SMCRv1 link group but conn->lnk points to the SMCRv2 link mistakenly. Then in smc_clc_send_confirm_accept(), conn->rmb_desc->mr[link->link_idx] will be accessed. Since the link->link_idx is not correct, the related MR may not have been initialized, so crash happens. | Try SMCRv2 device first | |-> conn->lgr: assign existed SMCRv2 link group; | |-> conn->link: assign existed SMCRv2 link (link_idx may be 1 in SMC_LGR_SYMMETRIC); | |-> sndbuf & RMB creation fails, quit; | | Try SMCRv1 device then | |-> conn->lgr: create SMCRv1 link group and assign; | |-> conn->link: keep SMCRv2 link mistakenly; | |-> sndbuf & RMB creation succeed, only RMB->mr[link_idx = 0] | initialized. | | Then smc_clc_send_confirm_accept() accesses | conn->rmb_desc->mr[conn->link->link_idx, which is 1], then crash. v This patch tries to fix this by cleaning conn->lnk before assigning link. In addition, it is better to reset the connection and clean the resources assigned if trying SMCRv2 failed in buffer creation or registration. Fixes: e49300a6bf62 ("net/smc: add listen processing for SMC-Rv2") Link: https://lore.kernel.org/r/20220523055056.2078994-1-liuyacan@corp.netease.com/ Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9744d2bf |
|
08-Apr-2023 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
smc: Fix use-after-free in tcp_write_timer_handler(). With Eric's ref tracker, syzbot finally found a repro for use-after-free in tcp_write_timer_handler() by kernel TCP sockets. [0] If SMC creates a kernel socket in __smc_create(), the kernel socket is supposed to be freed in smc_clcsock_release() by calling sock_release() when we close() the parent SMC socket. However, at the end of smc_clcsock_release(), the kernel socket's sk_state might not be TCP_CLOSE. This means that we have not called inet_csk_destroy_sock() in __tcp_close() and have not stopped the TCP timers. The kernel socket's TCP timers can be fired later, so we need to hold a refcnt for net as we do for MPTCP subflows in mptcp_subflow_create_socket(). [0]: leaked reference. sk_alloc (./include/net/net_namespace.h:335 net/core/sock.c:2108) inet_create (net/ipv4/af_inet.c:319 net/ipv4/af_inet.c:244) __sock_create (net/socket.c:1546) smc_create (net/smc/af_smc.c:3269 net/smc/af_smc.c:3284) __sock_create (net/socket.c:1546) __sys_socket (net/socket.c:1634 net/socket.c:1618 net/socket.c:1661) __x64_sys_socket (net/socket.c:1672) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) ================================================================== BUG: KASAN: slab-use-after-free in tcp_write_timer_handler (net/ipv4/tcp_timer.c:378 net/ipv4/tcp_timer.c:624 net/ipv4/tcp_timer.c:594) Read of size 1 at addr ffff888052b65e0d by task syzrepro/18091 CPU: 0 PID: 18091 Comm: syzrepro Tainted: G W 6.3.0-rc4-01174-gb5d54eb5899a #7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.amzn2022.0.1 04/01/2014 Call Trace: <IRQ> dump_stack_lvl (lib/dump_stack.c:107) print_report (mm/kasan/report.c:320 mm/kasan/report.c:430) kasan_report (mm/kasan/report.c:538) tcp_write_timer_handler (net/ipv4/tcp_timer.c:378 net/ipv4/tcp_timer.c:624 net/ipv4/tcp_timer.c:594) tcp_write_timer (./include/linux/spinlock.h:390 net/ipv4/tcp_timer.c:643) call_timer_fn (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/timer.h:127 kernel/time/timer.c:1701) __run_timers.part.0 (kernel/time/timer.c:1752 kernel/time/timer.c:2022) run_timer_softirq (kernel/time/timer.c:2037) __do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:572) __irq_exit_rcu (kernel/softirq.c:445 kernel/softirq.c:650) irq_exit_rcu (kernel/softirq.c:664) sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1107 (discriminator 14)) </IRQ> Fixes: ac7138746e14 ("smc: establish new socket family") Reported-by: syzbot+7e1e1bdb852961150198@syzkaller.appspotmail.com Link: https://lore.kernel.org/netdev/000000000000a3f51805f8bcc43a@google.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9d876d3e |
|
13-Mar-2023 |
Stefan Raspl <raspl@linux.ibm.com> |
net/smc: Fix device de-init sequence CLC message initialization was not properly reversed in error handling path. Reported-and-suggested-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ce7ca794 |
|
06-Mar-2023 |
D. Wythe <alibuda@linux.alibaba.com> |
net/smc: fix fallback failed while sendmsg with fastopen Before determining whether the msg has unsupported options, it has been prematurely terminated by the wrong status check. For the application, the general usages of MSG_FASTOPEN likes fd = socket(...) /* rather than connect */ sendto(fd, data, len, MSG_FASTOPEN) Hence, We need to check the flag before state check, because the sock state here is always SMC_INIT when applications tries MSG_FASTOPEN. Once we found unsupported options, fallback it to TCP. Fixes: ee9dfbef02d1 ("net/smc: handle sockopts forcing fallback") Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> v2 -> v1: Optimize code style Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e40b801b |
|
15-Feb-2023 |
D. Wythe <alibuda@linux.alibaba.com> |
net/smc: fix potential panic dues to unprotected smc_llc_srv_add_link() There is a certain chance to trigger the following panic: PID: 5900 TASK: ffff88c1c8af4100 CPU: 1 COMMAND: "kworker/1:48" #0 [ffff9456c1cc79a0] machine_kexec at ffffffff870665b7 #1 [ffff9456c1cc79f0] __crash_kexec at ffffffff871b4c7a #2 [ffff9456c1cc7ab0] crash_kexec at ffffffff871b5b60 #3 [ffff9456c1cc7ac0] oops_end at ffffffff87026ce7 #4 [ffff9456c1cc7ae0] page_fault_oops at ffffffff87075715 #5 [ffff9456c1cc7b58] exc_page_fault at ffffffff87ad0654 #6 [ffff9456c1cc7b80] asm_exc_page_fault at ffffffff87c00b62 [exception RIP: ib_alloc_mr+19] RIP: ffffffffc0c9cce3 RSP: ffff9456c1cc7c38 RFLAGS: 00010202 RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000004 RDX: 0000000000000010 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88c1ea281d00 R8: 000000020a34ffff R9: ffff88c1350bbb20 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000010 R14: ffff88c1ab040a50 R15: ffff88c1ea281d00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffff9456c1cc7c60] smc_ib_get_memory_region at ffffffffc0aff6df [smc] #8 [ffff9456c1cc7c88] smcr_buf_map_link at ffffffffc0b0278c [smc] #9 [ffff9456c1cc7ce0] __smc_buf_create at ffffffffc0b03586 [smc] The reason here is that when the server tries to create a second link, smc_llc_srv_add_link() has no protection and may add a new link to link group. This breaks the security environment protected by llc_conf_mutex. Fixes: 2d2209f20189 ("net/smc: first part of add link processing as SMC server") Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fe33311c |
|
13-Feb-2023 |
Jason Xing <kernelxing@tencent.com> |
net: no longer support SOCK_REFCNT_DEBUG feature Commit e48c414ee61f ("[INET]: Generalise the TCP sock ID lookup routines") commented out the definition of SOCK_REFCNT_DEBUG in 2005 and later another commit 463c84b97f24 ("[NET]: Introduce inet_connection_sock") removed it. Since we could track all of them through bpf and kprobe related tools and the feature could print loads of information which might not be that helpful even under a little bit pressure, the whole feature which has been inactive for many years is no longer supported. Link: https://lore.kernel.org/lkml/20230211065153.54116-1-kerneljasonxing@gmail.com/ Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4da68744 |
|
02-Feb-2023 |
D. Wythe <alibuda@linux.alibaba.com> |
net/smc: reduce unnecessary blocking in smcr_lgr_reg_rmbs() Unlike smc_buf_create() and smcr_buf_unuse(), smcr_lgr_reg_rmbs() is exclusive when assigned rmb_desc was not registered, although it can be executed in parallel when assigned rmb_desc was registered already and only performs read semtamics on it. Hence, we can not simply replace it with read semaphore. The idea here is that if the assigned rmb_desc was registered already, use read semaphore to protect the critical section, once the assigned rmb_desc was not registered, keep using keep write semaphore still to keep its exclusivity. Thanks to the reusable features of rmb_desc, which allows us to execute in parallel in most cases. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b5dd4d69 |
|
02-Feb-2023 |
D. Wythe <alibuda@linux.alibaba.com> |
net/smc: llc_conf_mutex refactor, replace it with rw_semaphore llc_conf_mutex was used to protect links and link related configurations in the same link group, for example, add or delete links. However, in most cases, the protected critical area has only read semantics and with no write semantics at all, such as obtaining a usable link or an available rmb_desc. This patch do simply code refactoring, replace mutex with rw_semaphore, replace mutex_lock with down_write and replace mutex_unlock with up_write. Theoretically, this replacement is equivalent, but after this patch, we can distinguish lock granularity according to different semantics of critical areas. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
509f15b9 |
|
26-Jan-2023 |
Jakub Kicinski <kuba@kernel.org> |
net: add missing includes of linux/splice.h Number of files depend on linux/splice.h getting included by linux/skbuff.h which soon will no longer be the case. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8c81ba20 |
|
23-Jan-2023 |
Stefan Raspl <raspl@linux.ibm.com> |
net/smc: De-tangle ism and smc device initialization The struct device for ISM devices was part of struct smcd_dev. Move to struct ism_dev, provide a new API call in struct smcd_ops, and convert existing SMCD code accordingly. Furthermore, remove struct smcd_dev from struct ism_dev. This is the final part of a bigger overhaul of the interfaces between SMC and ISM. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8747716f |
|
23-Jan-2023 |
Stefan Raspl <raspl@linux.ibm.com> |
net/smc: Register SMC-D as ISM client Register the smc module with the new ism device driver API. This is the second part of a bigger overhaul of the interfaces between SMC and ISM. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
62ff373d |
|
01-Nov-2022 |
Chen Zhongjin <chenzhongjin@huawei.com> |
net/smc: Fix possible leaked pernet namespace in smc_init() In smc_init(), register_pernet_subsys(&smc_net_stat_ops) is called without any error handling. If it fails, registering of &smc_net_ops won't be reverted. And if smc_nl_init() fails, &smc_net_stat_ops itself won't be reverted. This leaves wild ops in subsystem linkedlist and when another module tries to call register_pernet_operations() it triggers page fault: BUG: unable to handle page fault for address: fffffbfff81b964c RIP: 0010:register_pernet_operations+0x1b9/0x5f0 Call Trace: <TASK> register_pernet_subsys+0x29/0x40 ebtables_init+0x58/0x1000 [ebtables] ... Fixes: 194730a9beb5 ("net/smc: Make SMC statistics network namespace aware") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Link: https://lore.kernel.org/r/20221101093722.127223-1-chenzhongjin@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
6627a207 |
|
22-Sep-2022 |
Tony Lu <tonylu@linux.alibaba.com> |
net/smc: Support SO_REUSEPORT This enables SO_REUSEPORT [1] for clcsock when it is set on smc socket, so that some applications which uses it can be transparently replaced with SMC. Also, this helps improve load distribution. Here is a simple test of NGINX + wrk with SMC. The CPU usage is collected on NGINX (server) side as below. Disable SO_REUSEPORT: 05:15:33 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 05:15:34 PM all 7.02 0.00 11.86 0.00 2.04 8.93 0.00 0.00 0.00 70.15 05:15:34 PM 0 0.00 0.00 0.00 0.00 16.00 70.00 0.00 0.00 0.00 14.00 05:15:34 PM 1 11.58 0.00 22.11 0.00 0.00 0.00 0.00 0.00 0.00 66.32 05:15:34 PM 2 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 98.00 05:15:34 PM 3 16.84 0.00 30.53 0.00 0.00 0.00 0.00 0.00 0.00 52.63 05:15:34 PM 4 28.72 0.00 44.68 0.00 0.00 0.00 0.00 0.00 0.00 26.60 05:15:34 PM 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 05:15:34 PM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 05:15:34 PM 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00 Enable SO_REUSEPORT: 05:15:20 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 05:15:21 PM all 8.56 0.00 14.40 0.00 2.20 9.86 0.00 0.00 0.00 64.98 05:15:21 PM 0 0.00 0.00 4.08 0.00 14.29 76.53 0.00 0.00 0.00 5.10 05:15:21 PM 1 9.09 0.00 16.16 0.00 1.01 0.00 0.00 0.00 0.00 73.74 05:15:21 PM 2 9.38 0.00 16.67 0.00 1.04 0.00 0.00 0.00 0.00 72.92 05:15:21 PM 3 10.42 0.00 17.71 0.00 1.04 0.00 0.00 0.00 0.00 70.83 05:15:21 PM 4 9.57 0.00 15.96 0.00 0.00 0.00 0.00 0.00 0.00 74.47 05:15:21 PM 5 9.18 0.00 15.31 0.00 0.00 1.02 0.00 0.00 0.00 74.49 05:15:21 PM 6 8.60 0.00 15.05 0.00 0.00 0.00 0.00 0.00 0.00 76.34 05:15:21 PM 7 12.37 0.00 14.43 0.00 0.00 0.00 0.00 0.00 0.00 73.20 Using SO_REUSEPORT helps the load distribution of NGINX be more balanced. [1] https://man7.org/linux/man-pages/man7/socket.7.html Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Acked-by: Wenjia Zhang <wenjia@linux.ibm.com> Link: https://lore.kernel.org/r/20220922121906.72406-1-tonylu@linux.alibaba.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
0227f058 |
|
20-Sep-2022 |
Tony Lu <tonylu@linux.alibaba.com> |
net/smc: Unbind r/w buffer size from clcsock and make them tunable Currently, SMC uses smc->sk.sk_{rcv|snd}buf to create buffers for send buffer and RMB. And the values of buffer size are from tcp_{w|r}mem in clcsock. The buffer size from TCP socket doesn't fit SMC well. Generally, buffers are usually larger than TCP for SMC-R/-D to get higher performance, for they are different underlay devices and paths. So this patch unbinds buffer size from TCP, and introduces two sysctl knobs to tune them independently. Also, these knobs are per net namespace and work for containers. Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
a8424a9b |
|
30-Aug-2022 |
Yacan Liu <liuyacan@corp.netease.com> |
net/smc: Remove redundant refcount increase For passive connections, the refcount increment has been done in smc_clcsock_accept()-->smc_sock_alloc(). Fixes: 3b2dec2603d5 ("net/smc: restructure client and server code in af_smc") Signed-off-by: Yacan Liu <liuyacan@corp.netease.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Link: https://lore.kernel.org/r/20220830152314.838736-1-liuyacan@corp.netease.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
28ec53f3 |
|
25-Jul-2022 |
Stefan Raspl <raspl@linux.ibm.com> |
net/smc: Enable module load on netlink usage Previously, the smc and smc_diag modules were automatically loaded as dependencies of the ism module whenever an ISM device was present. With the pending rework of the ISM API, the smc module will no longer automatically be loaded in presence of an ISM device. Usage of an AF_SMC socket will still trigger loading of the smc modules, but usage of a netlink socket will not. This is addressed by setting the correct module aliases. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Wenjia Zhang < wenjia@linux.ibm.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b8d19945 |
|
14-Jul-2022 |
Wen Gu <guwen@linux.alibaba.com> |
net/smc: Allow virtually contiguous sndbufs or RMBs for SMC-R On long-running enterprise production servers, high-order contiguous memory pages are usually very rare and in most cases we can only get fragmented pages. When replacing TCP with SMC-R in such production scenarios, attempting to allocate high-order physically contiguous sndbufs and RMBs may result in frequent memory compaction, which will cause unexpected hung issue and further stability risks. So this patch is aimed to allow SMC-R link group to use virtually contiguous sndbufs and RMBs to avoid potential issues mentioned above. Whether to use physically or virtually contiguous buffers can be set by sysctl smcr_buf_type. Note that using virtually contiguous buffers will bring an acceptable performance regression, which can be mainly divided into two parts: 1) regression in data path, which is brought by additional address translation of sndbuf by RNIC in Tx. But in general, translating address through MTT is fast. Taking 256KB sndbuf and RMB as an example, the comparisons in qperf latency and bandwidth test with physically and virtually contiguous buffers are as follows: - client: smc_run taskset -c <cpu> qperf <server> -oo msg_size:1:64K:*2\ -t 5 -vu tcp_{bw|lat} - server: smc_run taskset -c <cpu> qperf [latency] msgsize tcp smcr smcr-use-virt-buf 1 11.17 us 7.56 us 7.51 us (-0.67%) 2 10.65 us 7.74 us 7.56 us (-2.31%) 4 11.11 us 7.52 us 7.59 us ( 0.84%) 8 10.83 us 7.55 us 7.51 us (-0.48%) 16 11.21 us 7.46 us 7.51 us ( 0.71%) 32 10.65 us 7.53 us 7.58 us ( 0.61%) 64 10.95 us 7.74 us 7.80 us ( 0.76%) 128 11.14 us 7.83 us 7.87 us ( 0.47%) 256 10.97 us 7.94 us 7.92 us (-0.28%) 512 11.23 us 7.94 us 8.20 us ( 3.25%) 1024 11.60 us 8.12 us 8.20 us ( 0.96%) 2048 14.04 us 8.30 us 8.51 us ( 2.49%) 4096 16.88 us 9.13 us 9.07 us (-0.64%) 8192 22.50 us 10.56 us 11.22 us ( 6.26%) 16384 28.99 us 12.88 us 13.83 us ( 7.37%) 32768 40.13 us 16.76 us 16.95 us ( 1.16%) 65536 68.70 us 24.68 us 24.85 us ( 0.68%) [bandwidth] msgsize tcp smcr smcr-use-virt-buf 1 1.65 MB/s 1.59 MB/s 1.53 MB/s (-3.88%) 2 3.32 MB/s 3.17 MB/s 3.08 MB/s (-2.67%) 4 6.66 MB/s 6.33 MB/s 6.09 MB/s (-3.85%) 8 13.67 MB/s 13.45 MB/s 11.97 MB/s (-10.99%) 16 25.36 MB/s 27.15 MB/s 24.16 MB/s (-11.01%) 32 48.22 MB/s 54.24 MB/s 49.41 MB/s (-8.89%) 64 106.79 MB/s 107.32 MB/s 99.05 MB/s (-7.71%) 128 210.21 MB/s 202.46 MB/s 201.02 MB/s (-0.71%) 256 400.81 MB/s 416.81 MB/s 393.52 MB/s (-5.59%) 512 746.49 MB/s 834.12 MB/s 809.99 MB/s (-2.89%) 1024 1292.33 MB/s 1641.96 MB/s 1571.82 MB/s (-4.27%) 2048 2007.64 MB/s 2760.44 MB/s 2717.68 MB/s (-1.55%) 4096 2665.17 MB/s 4157.44 MB/s 4070.76 MB/s (-2.09%) 8192 3159.72 MB/s 4361.57 MB/s 4270.65 MB/s (-2.08%) 16384 4186.70 MB/s 4574.13 MB/s 4501.17 MB/s (-1.60%) 32768 4093.21 MB/s 4487.42 MB/s 4322.43 MB/s (-3.68%) 65536 4057.14 MB/s 4735.61 MB/s 4555.17 MB/s (-3.81%) 2) regression in buffer initialization and destruction path, which is brought by additional MR operations of sndbufs. But thanks to link group buffer reuse mechanism, the impact of this kind of regression decreases as times of buffer reuse increases. Taking 256KB sndbuf and RMB as an example, latency of some key SMC-R buffer-related function obtained by bpftrace are as follows: Function Phys-bufs Virt-bufs smcr_new_buf_create() 67154 ns 79164 ns smc_ib_buf_map_sg() 525 ns 928 ns smc_ib_get_memory_region() 162294 ns 161191 ns smc_wr_reg_send() 9957 ns 9635 ns smc_ib_put_memory_region() 203548 ns 198374 ns smc_ib_buf_unmap_sg() 508 ns 1158 ns ------------ Test environment notes: 1. Above tests run on 2 VMs within the same Host. 2. The NIC is ConnectX-4Lx, using SRIOV and passing through 2 VFs to the each VM respectively. 3. VMs' vCPUs are binded to different physical CPUs, and the binded physical CPUs are isolated by `isolcpus=xxx` cmdline. 4. NICs' queue number are set to 1. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6d52e2de |
|
14-Jul-2022 |
Guangguan Wang <guangguan.wang@linux.alibaba.com> |
net/smc: remove redundant dma sync ops smc_ib_sync_sg_for_cpu/device are the ops used for dma memory cache consistency. Smc sndbufs are dma buffers, where CPU writes data to it and PCIE device reads data from it. So for sndbufs, smc_ib_sync_sg_for_device is needed and smc_ib_sync_sg_for_cpu is redundant as PCIE device will not write the buffers. Smc rmbs are dma buffers, where PCIE device write data to it and CPU read data from it. So for rmbs, smc_ib_sync_sg_for_cpu is needed and smc_ib_sync_sg_for_device is redundant as CPU will not write the buffers. Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b3b1a175 |
|
25-May-2022 |
liuyacan <liuyacan@corp.netease.com> |
net/smc: set ini->smcrv2.ib_dev_v2 to NULL if SMC-Rv2 is unavailable In the process of checking whether RDMAv2 is available, the current implementation first sets ini->smcrv2.ib_dev_v2, and then allocates smc buf desc and register rmb, but the latter may fail. In this case, the pointer should be reset. Fixes: e49300a6bf62 ("net/smc: add listen processing for SMC-Rv2") Signed-off-by: liuyacan <liuyacan@corp.netease.com> Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Link: https://lore.kernel.org/r/20220525085408.812273-1-liuyacan@corp.netease.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
9029ac03 |
|
24-May-2022 |
liuyacan <liuyacan@corp.netease.com> |
Revert "net/smc: fix listen processing for SMC-Rv2" This reverts commit 8c3b8dc5cc9bf6d273ebe18b16e2d6882bcfb36d. Some rollback issue will be fixed in other patches in the future. Link: https://lore.kernel.org/all/20220523055056.2078994-1-liuyacan@corp.netease.com/ Fixes: 8c3b8dc5cc9b ("net/smc: fix listen processing for SMC-Rv2") Signed-off-by: liuyacan <liuyacan@corp.netease.com> Link: https://lore.kernel.org/r/20220524090230.2140302-1-liuyacan@corp.netease.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
8c3b8dc5 |
|
22-May-2022 |
liuyacan <liuyacan@corp.netease.com> |
net/smc: fix listen processing for SMC-Rv2 In the process of checking whether RDMAv2 is available, the current implementation first sets ini->smcrv2.ib_dev_v2, and then allocates smc buf desc, but the latter may fail. Unfortunately, the caller will only check the former. In this case, a NULL pointer reference will occur in smc_clc_send_confirm_accept() when accessing conn->rmb_desc. This patch does two things: 1. Use the return code to determine whether V2 is available. 2. If the return code is NODEV, continue to check whether V1 is available. Fixes: e49300a6bf62 ("net/smc: add listen processing for SMC-Rv2") Signed-off-by: liuyacan <liuyacan@corp.netease.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
75c1edf2 |
|
22-May-2022 |
liuyacan <liuyacan@corp.netease.com> |
net/smc: postpone sk_refcnt increment in connect() Same trigger condition as commit 86434744. When setsockopt runs in parallel to a connect(), and switch the socket into fallback mode. Then the sk_refcnt is incremented in smc_connect(), but its state stay in SMC_INIT (NOT SMC_ACTIVE). This cause the corresponding sk_refcnt decrement in __smc_release() will not be performed. Fixes: 86434744fedf ("net/smc: add fallback check to connect()") Signed-off-by: liuyacan <liuyacan@corp.netease.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3aba1030 |
|
12-May-2022 |
Guangguan Wang <guangguan.wang@linux.alibaba.com> |
net/smc: align the connect behaviour with TCP Connect with O_NONBLOCK will not be completed immediately and returns -EINPROGRESS. It is possible to use selector/poll for completion by selecting the socket for writing. After select indicates writability, a second connect function call will return 0 to indicate connected successfully as TCP does, but smc returns -EISCONN. Use socket state for smc to indicate connect state, which can help smc aligning the connect behaviour with TCP. Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0558226c |
|
22-Apr-2022 |
Wen Gu <guwen@linux.alibaba.com> |
net/smc: Fix slab-out-of-bounds issue in fallback syzbot reported a slab-out-of-bounds/use-after-free issue, which was caused by accessing an already freed smc sock in fallback-specific callback functions of clcsock. This patch fixes the issue by restoring fallback-specific callback functions to original ones and resetting clcsock sk_user_data to NULL before freeing smc sock. Meanwhile, this patch introduces sk_callback_lock to make the access and assignment to sk_user_data mutually exclusive. Reported-by: syzbot+b425899ed22c6943e00b@syzkaller.appspotmail.com Fixes: 341adeec9ada ("net/smc: Forward wakeup to smc socket waitqueue after fallback") Link: https://lore.kernel.org/r/00000000000013ca8105d7ae3ada@google.com/ Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
97b9af7a |
|
22-Apr-2022 |
Wen Gu <guwen@linux.alibaba.com> |
net/smc: Only save the original clcsock callback functions Both listen and fallback process will save the current clcsock callback functions and establish new ones. But if both of them happen, the saved callback functions will be overwritten. So this patch introduces some helpers to ensure that only save the original callback functions of clcsock. Fixes: 341adeec9ada ("net/smc: Forward wakeup to smc socket waitqueue after fallback") Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
4e2e65e2 |
|
21-Apr-2022 |
liuyacan <liuyacan@corp.netease.com> |
net/smc: sync err code when tcp connection was refused In the current implementation, when TCP initiates a connection to an unavailable [ip,port], ECONNREFUSED will be stored in the TCP socket, but SMC will not. However, some apps (like curl) use getsockopt(,,SO_ERROR,,) to get the error information, which makes them miss the error message and behave strangely. Fixes: 50717a37db03 ("net/smc: nonblocking connect rework") Signed-off-by: liuyacan <liuyacan@corp.netease.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1a74e993 |
|
14-Apr-2022 |
Tony Lu <tonylu@linux.alibaba.com> |
net/smc: Fix sock leak when release after smc_shutdown() Since commit e5d5aadcf3cd ("net/smc: fix sk_refcnt underflow on linkdown and fallback"), for a fallback connection, __smc_release() does not call sock_put() if its state is already SMC_CLOSED. When calling smc_shutdown() after falling back, its state is set to SMC_CLOSED but does not call sock_put(), so this patch calls it. Reported-and-tested-by: syzbot+6e29a053eb165bd50de5@syzkaller.appspotmail.com Fixes: e5d5aadcf3cd ("net/smc: fix sk_refcnt underflow on linkdown and fallback") Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
49b7d376 |
|
08-Apr-2022 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: Fix af_ops of child socket pointing to released memory Child sockets may inherit the af_ops from the parent listen socket. When the listen socket is released then the af_ops of the child socket points to released memory. Solve that by restoring the original af_ops for child sockets which inherited the parent af_ops. And clear any inherited user_data of the parent socket. Fixes: 8270d9c21041 ("net/smc: Limit backlog connections") Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
7de8eb0d |
|
06-Mar-2022 |
Dust Li <dust.li@linux.alibaba.com> |
net/smc: fix compile warning for smc_sysctl kernel test robot reports multiple warning for smc_sysctl: In file included from net/smc/smc_sysctl.c:17: >> net/smc/smc_sysctl.h:23:5: warning: no previous prototype \ for function 'smc_sysctl_init' [-Wmissing-prototypes] int smc_sysctl_init(void) ^ and >> WARNING: modpost: vmlinux.o(.text+0x12ced2d): Section mismatch \ in reference from the function smc_sysctl_exit() to the variable .init.data:smc_sysctl_ops The function smc_sysctl_exit() references the variable __initdata smc_sysctl_ops. This is often because smc_sysctl_exit lacks a __initdata annotation or the annotation of smc_sysctl_ops is wrong. and net/smc/smc_sysctl.c: In function 'smc_sysctl_init_net': net/smc/smc_sysctl.c:47:17: error: 'struct netns_smc' has no member named 'smc_hdr' 47 | net->smc.smc_hdr = register_net_sysctl(net, "net/smc", table); Since we don't need global sysctl initialization. To make things clean and simple, remove the global pernet_operations and smc_sysctl_{init|exit}. Call smc_sysctl_net_{init|exit} directly from smc_net_{init|exit}. Also initialized sysctl_autocorking_size if CONFIG_SYSCTL it not set, this make sure SMC autocorking is enabled by default if CONFIG_SYSCTL is not set. Fixes: 462791bbfa35 ("net/smc: add sysctl interface for SMC") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6b88af83 |
|
01-Mar-2022 |
Dust Li <dust.li@linux.alibaba.com> |
net/smc: don't send in the BH context if sock_owned_by_user Send data all the way down to the RDMA device is a time consuming operation(get a new slot, maybe do RDMA Write and send a CDC, etc). Moving those operations from BH to user context is good for performance. If the sock_lock is hold by user, we don't try to send data out in the BH context, but just mark we should send. Since the user will release the sock_lock soon, we can do the sending there. Add smc_release_cb() which will be called in release_sock() and try send in the callback if needed. This patch moves the sending part out from BH if sock lock is hold by user. In my testing environment, this saves about 20% softirq in the qperf 4K tcp_bw test in the sender side with no noticeable throughput drop. Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b70a5cc0 |
|
01-Mar-2022 |
Dust Li <dust.li@linux.alibaba.com> |
net/smc: send directly on setting TCP_NODELAY In commit ea785a1a573b("net/smc: Send directly when TCP_CORK is cleared"), we don't use delayed work to implement cork. This patch use the same algorithm, removes the delayed work when setting TCP_NODELAY and send directly in setsockopt(). This also makes the TCP_NODELAY the same as TCP. Cc: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
462791bb |
|
01-Mar-2022 |
Dust Li <dust.li@linux.alibaba.com> |
net/smc: add sysctl interface for SMC This patch add sysctl interface to support container environment for SMC as we talk in the mail list. Link: https://lore.kernel.org/netdev/20220224020253.GF5443@linux.alibaba.com Co-developed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7a11455f |
|
18-Feb-2022 |
Dan Carpenter <dan.carpenter@oracle.com> |
net/smc: unlock on error paths in __smc_setsockopt() These two error paths need to release_sock(sk) before returning. Fixes: a6a6fe27bab4 ("net/smc: Dynamic control handshake limitation by socket options") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1ce22047 |
|
15-Feb-2022 |
D. Wythe <alibuda@linux.alibaba.com> |
net/smc: return ETIMEDOUT when smc_connect_clc() timeout When smc_connect_clc() times out, it will return -EAGAIN(tcp_recvmsg retuns -EAGAIN while timeout), then this value will passed to the application, which is quite confusing to the applications, makes inconsistency with TCP. From the manual of connect, ETIMEDOUT is more suitable, and this patch try convert EAGAIN to ETIMEDOUT in that case. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Link: https://lore.kernel.org/r/1644913490-21594-1-git-send-email-alibuda@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
f9496b7c |
|
10-Feb-2022 |
D. Wythe <alibuda@linux.alibaba.com> |
net/smc: Add global configure for handshake limitation by netlink Although we can control SMC handshake limitation through socket options, which means that applications who need it must modify their code. It's quite troublesome for many existing applications. This patch modifies the global default value of SMC handshake limitation through netlink, providing a way to put constraint on handshake without modifies any code for applications. Suggested-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a6a6fe27 |
|
10-Feb-2022 |
D. Wythe <alibuda@linux.alibaba.com> |
net/smc: Dynamic control handshake limitation by socket options This patch aims to add dynamic control for SMC handshake limitation for every smc sockets, in production environment, it is possible for the same applications to handle different service types, and may have different opinion on SMC handshake limitation. This patch try socket options to complete it, since we don't have socket option level for SMC yet, which requires us to implement it at the same time. This patch does the following: - add new socket option level: SOL_SMC. - add new SMC socket option: SMC_LIMIT_HS. - provide getter/setter for SMC socket options. Link: https://lore.kernel.org/all/20f504f961e1a803f85d64229ad84260434203bd.1644323503.git.alibuda@linux.alibaba.com/ Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
48b6190a |
|
10-Feb-2022 |
D. Wythe <alibuda@linux.alibaba.com> |
net/smc: Limit SMC visits when handshake workqueue congested This patch intends to provide a mechanism to put constraint on SMC connections visit according to the pressure of SMC handshake process. At present, frequent visits will cause the incoming connections to be backlogged in SMC handshake queue, raise the connections established time. Which is quite unacceptable for those applications who base on short lived connections. There are two ways to implement this mechanism: 1. Put limitation after TCP established. 2. Put limitation before TCP established. In the first way, we need to wait and receive CLC messages that the client will potentially send, and then actively reply with a decline message, in a sense, which is also a sort of SMC handshake, affect the connections established time on its way. In the second way, the only problem is that we need to inject SMC logic into TCP when it is about to reply the incoming SYN, since we already do that, it's seems not a problem anymore. And advantage is obvious, few additional processes are required to complete the constraint. This patch use the second way. After this patch, connections who beyond constraint will not informed any SMC indication, and SMC will not be involved in any of its subsequent processes. Link: https://lore.kernel.org/all/1641301961-59331-1-git-send-email-alibuda@linux.alibaba.com/ Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8270d9c2 |
|
10-Feb-2022 |
D. Wythe <alibuda@linux.alibaba.com> |
net/smc: Limit backlog connections Current implementation does not handling backlog semantics, one potential risk is that server will be flooded by infinite amount connections, even if client was SMC-incapable. This patch works to put a limit on backlog connections, referring to the TCP implementation, we divides SMC connections into two categories: 1. Half SMC connection, which includes all TCP established while SMC not connections. 2. Full SMC connection, which includes all SMC established connections. For half SMC connection, since all half SMC connections starts with TCP established, we can achieve our goal by put a limit before TCP established. Refer to the implementation of TCP, this limits will based on not only the half SMC connections but also the full connections, which is also a constraint on full SMC connections. For full SMC connections, although we know exactly where it starts, it's quite hard to put a limit before it. The easiest way is to block wait before receive SMC confirm CLC message, while it's under protection by smc_server_lgr_pending, a global lock, which leads this limit to the entire host instead of a single listen socket. Another way is to drop the full connections, but considering the cast of SMC connections, we prefer to keep full SMC connections. Even so, the limits of full SMC connections still exists, see commits about half SMC connection below. After this patch, the limits of backend connection shows like: For SMC: 1. Client with SMC-capability can makes 2 * backlog full SMC connections or 1 * backlog half SMC connections and 1 * backlog full SMC connections at most. 2. Client without SMC-capability can only makes 1 * backlog half TCP connections and 1 * backlog full TCP connections. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3079e342 |
|
10-Feb-2022 |
D. Wythe <alibuda@linux.alibaba.com> |
net/smc: Make smc_tcp_listen_work() independent In multithread and 10K connections benchmark, the backend TCP connection established very slowly, and lots of TCP connections stay in SYN_SENT state. Client: smc_run wrk -c 10000 -t 4 http://server the netstate of server host shows like: 145042 times the listen queue of a socket overflowed 145042 SYNs to LISTEN sockets dropped One reason of this issue is that, since the smc_tcp_listen_work() shared the same workqueue (smc_hs_wq) with smc_listen_work(), while the smc_listen_work() do blocking wait for smc connection established. Once the workqueue became congested, it's will block the accept() from TCP listen. This patch creates a independent workqueue(smc_tcp_ls_wq) for smc_tcp_listen_work(), separate it from smc_listen_work(), which is quite acceptable considering that smc_tcp_listen_work() runs very fast. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
be9a16cc |
|
30-Jan-2022 |
Tony Lu <tonylu@linux.alibaba.com> |
net/smc: Cork when sendpage with MSG_SENDPAGE_NOTLAST flag This introduces a new corked flag, MSG_SENDPAGE_NOTLAST, which is involved in syscall sendfile() [1], it indicates this is not the last page. So we can cork the data until the page is not specify this flag. It has the same effect as MSG_MORE, but existed in sendfile() only. This patch adds a option MSG_SENDPAGE_NOTLAST for corking data, try to cork more data before sending when using sendfile(), which acts like TCP's behaviour. Also, this reimplements the default sendpage to inform that it is supported to some extent. [1] https://man7.org/linux/man-pages/man2/sendfile.2.html Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ea785a1a |
|
30-Jan-2022 |
Tony Lu <tonylu@linux.alibaba.com> |
net/smc: Send directly when TCP_CORK is cleared According to the man page of TCP_CORK [1], if set, don't send out partial frames. All queued partial frames are sent when option is cleared again. When applications call setsockopt to disable TCP_CORK, this call is protected by lock_sock(), and tries to mod_delayed_work() to 0, in order to send pending data right now. However, the delayed work smc_tx_work is also protected by lock_sock(). There introduces lock contention for sending data. To fix it, send pending data directly which acts like TCP, without lock_sock() protected in the context of setsockopt (already lock_sock()ed), and cancel unnecessary dealyed work, which is protected by lock. [1] https://linux.die.net/man/7/tcp Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4d08b7b5 |
|
24-Feb-2022 |
Tony Lu <tonylu@linux.alibaba.com> |
net/smc: Fix cleanup when register ULP fails This patch calls smc_ib_unregister_client() when tcp_register_ulp() fails, and make sure to clean it up. Fixes: d7cd421da9da ("net/smc: Introduce TCP ULP support") Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9f1c50cf |
|
24-Feb-2022 |
D. Wythe <alibuda@linux.alibaba.com> |
net/smc: fix connection leak There's a potential leak issue under following execution sequence : smc_release smc_connect_work if (sk->sk_state == SMC_INIT) send_clc_confirim tcp_abort(); ... sk.sk_state = SMC_ACTIVE smc_close_active switch(sk->sk_state) { ... case SMC_ACTIVE: smc_close_final() // then wait peer closed Unfortunately, tcp_abort() may discard CLC CONFIRM messages that are still in the tcp send buffer, in which case our connection token cannot be delivered to the server side, which means that we cannot get a passive close message at all. Therefore, it is impossible for the to be disconnected at all. This patch tries a very simple way to avoid this issue, once the state has changed to SMC_ACTIVE after tcp_abort(), we can actively abort the smc connection, considering that the state is SMC_INIT before tcp_abort(), abandoning the complete disconnection process should not cause too much problem. In fact, this problem may exist as long as the CLC CONFIRM message is not received by the server. Whether a timer should be added after smc_close_final() needs to be discussed in the future. But even so, this patch provides a faster release for connection in above case, it should also be valuable. Fixes: 39f41f367b08 ("net/smc: common release code for non-accepted sockets") Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1de9770d |
|
09-Feb-2022 |
Wen Gu <guwen@linux.alibaba.com> |
net/smc: Avoid overwriting the copies of clcsock callback functions The callback functions of clcsock will be saved and replaced during the fallback. But if the fallback happens more than once, then the copies of these callback functions will be overwritten incorrectly, resulting in a loop call issue: clcsk->sk_error_report |- smc_fback_error_report() <------------------------------| |- smc_fback_forward_wakeup() | (loop) |- clcsock_callback() (incorrectly overwritten) | |- smc->clcsk_error_report() ------------------| So this patch fixes the issue by saving these function pointers only once in the fallback and avoiding overwriting. Reported-by: syzbot+4de3c0e8a263e1e499bc@syzkaller.appspotmail.com Fixes: 341adeec9ada ("net/smc: Forward wakeup to smc socket waitqueue after fallback") Link: https://lore.kernel.org/r/0000000000006d045e05d78776f6@google.com Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
341adeec |
|
26-Jan-2022 |
Wen Gu <guwen@linux.alibaba.com> |
net/smc: Forward wakeup to smc socket waitqueue after fallback When we replace TCP with SMC and a fallback occurs, there may be some socket waitqueue entries remaining in smc socket->wq, such as eppoll_entries inserted by userspace applications. After the fallback, data flows over TCP/IP and only clcsocket->wq will be woken up. Applications can't be notified by the entries which were inserted in smc socket->wq before fallback. So we need a mechanism to wake up smc socket->wq at the same time if some entries remaining in it. The current workaround is to transfer the entries from smc socket->wq to clcsock->wq during the fallback. But this may cause a crash like this: general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] PREEMPT SMP PTI CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G E 5.16.0+ #107 RIP: 0010:__wake_up_common+0x65/0x170 Call Trace: <IRQ> __wake_up_common_lock+0x7a/0xc0 sock_def_readable+0x3c/0x70 tcp_data_queue+0x4a7/0xc40 tcp_rcv_established+0x32f/0x660 ? sk_filter_trim_cap+0xcb/0x2e0 tcp_v4_do_rcv+0x10b/0x260 tcp_v4_rcv+0xd2a/0xde0 ip_protocol_deliver_rcu+0x3b/0x1d0 ip_local_deliver_finish+0x54/0x60 ip_local_deliver+0x6a/0x110 ? tcp_v4_early_demux+0xa2/0x140 ? tcp_v4_early_demux+0x10d/0x140 ip_sublist_rcv_finish+0x49/0x60 ip_sublist_rcv+0x19d/0x230 ip_list_rcv+0x13e/0x170 __netif_receive_skb_list_core+0x1c2/0x240 netif_receive_skb_list_internal+0x1e6/0x320 napi_complete_done+0x11d/0x190 mlx5e_napi_poll+0x163/0x6b0 [mlx5_core] __napi_poll+0x3c/0x1b0 net_rx_action+0x27c/0x300 __do_softirq+0x114/0x2d2 irq_exit_rcu+0xb4/0xe0 common_interrupt+0xba/0xe0 </IRQ> <TASK> The crash is caused by privately transferring waitqueue entries from smc socket->wq to clcsock->wq. The owners of these entries, such as epoll, have no idea that the entries have been transferred to a different socket wait queue and still use original waitqueue spinlock (smc socket->wq.wait.lock) to make the entries operation exclusive, but it doesn't work. The operations to the entries, such as removing from the waitqueue (now is clcsock->wq after fallback), may cause a crash when clcsock waitqueue is being iterated over at the moment. This patch tries to fix this by no longer transferring wait queue entries privately, but introducing own implementations of clcsock's callback functions in fallback situation. The callback functions will forward the wakeup to smc socket->wq if clcsock->wq is actually woken up and smc socket->wq has remaining entries. Fixes: 2153bd1e3d3d ("net/smc: Transfer remaining wait queue entries during fallback") Suggested-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c0bf3d8a |
|
22-Jan-2022 |
Wen Gu <guwen@linux.alibaba.com> |
net/smc: Transitional solution for clcsock race issue We encountered a crash in smc_setsockopt() and it is caused by accessing smc->clcsock after clcsock was released. BUG: kernel NULL pointer dereference, address: 0000000000000020 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 50309 Comm: nginx Kdump: loaded Tainted: G E 5.16.0-rc4+ #53 RIP: 0010:smc_setsockopt+0x59/0x280 [smc] Call Trace: <TASK> __sys_setsockopt+0xfc/0x190 __x64_sys_setsockopt+0x20/0x30 do_syscall_64+0x34/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f16ba83918e </TASK> This patch tries to fix it by holding clcsock_release_lock and checking whether clcsock has already been released before access. In case that a crash of the same reason happens in smc_getsockopt() or smc_switch_to_fallback(), this patch also checkes smc->clcsock in them too. And the caller of smc_switch_to_fallback() will identify whether fallback succeeds according to the return value. Fixes: fd57770dd198 ("net/smc: wait for pending work before clcsock release_sock") Link: https://lore.kernel.org/lkml/5dd7ffd1-28e2-24cc-9442-1defec27375e@linux.ibm.com/T/ Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ea89c6c0 |
|
13-Jan-2022 |
Wen Gu <guwen@linux.alibaba.com> |
net/smc: Introduce a new conn->lgr validity check helper It is no longer suitable to identify whether a smc connection is registered in a link group through checking if conn->lgr is NULL, because conn->lgr won't be reset even the connection is unregistered from a link group. So this patch introduces a new helper smc_conn_lgr_valid() and replaces all the check of conn->lgr in original implementation with the new helper to judge if conn->lgr is valid to use. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
36595d8a |
|
06-Jan-2022 |
Wen Gu <guwen@linux.alibaba.com> |
net/smc: Reset conn->lgr when link group registration fails SMC connections might fail to be registered in a link group due to unable to find a usable link during its creation. As a result, smc_conn_create() will return a failure and most resources related to the connection won't be applied or initialized, such as conn->abort_work or conn->lnk. If smc_conn_free() is invoked later, it will try to access the uninitialized resources related to the connection, thus causing a warning or crash. This patch tries to fix this by resetting conn->lgr to NULL if an abnormal exit occurs in smc_lgr_register_conn(), thus avoiding the access to uninitialized resources in smc_conn_free(). Meanwhile, the new created link group should be terminated if smc connections can't be registered in it. So smc_lgr_cleanup_early() is modified to take care of link group only and invoked to terminate unusable link group by smc_conn_create(). The call to smc_conn_free() is moved out from smc_lgr_cleanup_early() to smc_conn_abort(). Fixes: 56bc3b2094b4 ("net/smc: assign link to a new connection") Suggested-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d7cd421d |
|
28-Dec-2021 |
Tony Lu <tonylu@linux.alibaba.com> |
net/smc: Introduce TCP ULP support This implements TCP ULP for SMC, helps applications to replace TCP with SMC protocol in place. And we use it to implement transparent replacement. This replaces original TCP sockets with SMC, reuse TCP as clcsock when calling setsockopt with TCP_ULP option, and without any overhead. To replace TCP sockets with SMC, there are two approaches: - use setsockopt() syscall with TCP_ULP option, if error, it would fallback to TCP. - use BPF prog with types BPF_CGROUP_INET_SOCK_CREATE or others to replace transparently. BPF hooks some points in create socket, bind and others, users can inject their BPF logics without modifying their applications, and choose which connections should be replaced with SMC by calling setsockopt() in BPF prog, based on rules, such as TCP tuples, PID, cgroup, etc... BPF doesn't support calling setsockopt with TCP_ULP now, I will send the patches after this accepted. Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b3cb764a |
|
15-Nov-2021 |
Eric Dumazet <edumazet@google.com> |
net: drop nopreempt requirement on sock_prot_inuse_add() This is distracting really, let's make this simpler, because many callers had to take care of this by themselves, even if on x86 this adds more code than really needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5c15b312 |
|
15-Dec-2021 |
D. Wythe <alibuda@linux.alibaba.com> |
net/smc: Prevent smc_release() from long blocking In nginx/wrk benchmark, there's a hung problem with high probability on case likes that: (client will last several minutes to exit) server: smc_run nginx client: smc_run wrk -c 10000 -t 1 http://server Client hangs with the following backtrace: 0 [ffffa7ce8Of3bbf8] __schedule at ffffffff9f9eOd5f 1 [ffffa7ce8Of3bc88] schedule at ffffffff9f9eløe6 2 [ffffa7ce8Of3bcaO] schedule_timeout at ffffffff9f9e3f3c 3 [ffffa7ce8Of3bd2O] wait_for_common at ffffffff9f9el9de 4 [ffffa7ce8Of3bd8O] __flush_work at ffffffff9fOfeOl3 5 [ffffa7ce8øf3bdfO] smc_release at ffffffffcO697d24 [smc] 6 [ffffa7ce8Of3be2O] __sock_release at ffffffff9f8O2e2d 7 [ffffa7ce8Of3be4ø] sock_close at ffffffff9f8ø2ebl 8 [ffffa7ce8øf3be48] __fput at ffffffff9f334f93 9 [ffffa7ce8Of3be78] task_work_run at ffffffff9flOlff5 10 [ffffa7ce8Of3beaO] do_exit at ffffffff9fOe5Ol2 11 [ffffa7ce8Of3bflO] do_group_exit at ffffffff9fOe592a 12 [ffffa7ce8Of3bf38] __x64_sys_exit_group at ffffffff9fOe5994 13 [ffffa7ce8Of3bf4O] do_syscall_64 at ffffffff9f9d4373 14 [ffffa7ce8Of3bfsO] entry_SYSCALL_64_after_hwframe at ffffffff9fa0007c This issue dues to flush_work(), which is used to wait for smc_connect_work() to finish in smc_release(). Once lots of smc_connect_work() was pending or all executing work dangling, smc_release() has to block until one worker comes to free, which is equivalent to wait another smc_connnect_work() to finish. In order to fix this, There are two changes: 1. For those idle smc_connect_work(), cancel it from the workqueue; for executing smc_connect_work(), waiting for it to finish. For that purpose, replace flush_work() with cancel_work_sync(). 2. Since smc_connect() hold a reference for passive closing, if smc_connect_work() has been cancelled, release the reference. Fixes: 24ac3a08e658 ("net/smc: rebuild nonblocking connect") Reported-by: Tony Lu <tonylu@linux.alibaba.com> Tested-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Link: https://lore.kernel.org/r/1639571361-101128-1-git-send-email-alibuda@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
bacb6c1e |
|
25-Nov-2021 |
Tony Lu <tonylu@linux.alibaba.com> |
net/smc: Don't call clcsock shutdown twice when smc shutdown When applications call shutdown() with SHUT_RDWR in userspace, smc_close_active() calls kernel_sock_shutdown(), and it is called twice in smc_shutdown(). This fixes this by checking sk_state before do clcsock shutdown, and avoids missing the application's call of smc_shutdown(). Link: https://lore.kernel.org/linux-s390/1f67548e-cbf6-0dce-82b5-10288a4583bd@linux.ibm.com/ Fixes: 606a63c9783a ("net/smc: Ensure the active closing peer first closes clcsock") Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Link: https://lore.kernel.org/r/20211126024134.45693-1-tonylu@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
9ebb0c4b |
|
24-Nov-2021 |
Guo DaXing <guodaxing@huawei.com> |
net/smc: Fix loop in smc_listen The kernel_listen function in smc_listen will fail when all the available ports are occupied. At this point smc->clcsock->sk->sk_data_ready has been changed to smc_clcsock_data_ready. When we call smc_listen again, now both smc->clcsock->sk->sk_data_ready and smc->clcsk_data_ready point to the smc_clcsock_data_ready function. The smc_clcsock_data_ready() function calls lsmc->clcsk_data_ready which now points to itself resulting in an infinite loop. This patch restores smc->clcsock->sk->sk_data_ready with the old value. Fixes: a60a2b1e0af1 ("net/smc: reduce active tcp_listen workers") Signed-off-by: Guo DaXing <guodaxing@huawei.com> Acked-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
7a61432d |
|
22-Nov-2021 |
Wen Gu <guwen@linux.alibaba.com> |
net/smc: Avoid warning of possible recursive locking Possible recursive locking is detected by lockdep when SMC falls back to TCP. The corresponding warnings are as follows: ============================================ WARNING: possible recursive locking detected 5.16.0-rc1+ #18 Tainted: G E -------------------------------------------- wrk/1391 is trying to acquire lock: ffff975246c8e7d8 (&ei->socket.wq.wait){..-.}-{3:3}, at: smc_switch_to_fallback+0x109/0x250 [smc] but task is already holding lock: ffff975246c8f918 (&ei->socket.wq.wait){..-.}-{3:3}, at: smc_switch_to_fallback+0xfe/0x250 [smc] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&ei->socket.wq.wait); lock(&ei->socket.wq.wait); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by wrk/1391: #0: ffff975246040130 (sk_lock-AF_SMC){+.+.}-{0:0}, at: smc_connect+0x43/0x150 [smc] #1: ffff975246c8f918 (&ei->socket.wq.wait){..-.}-{3:3}, at: smc_switch_to_fallback+0xfe/0x250 [smc] stack backtrace: Call Trace: <TASK> dump_stack_lvl+0x56/0x7b __lock_acquire+0x951/0x11f0 lock_acquire+0x27a/0x320 ? smc_switch_to_fallback+0x109/0x250 [smc] ? smc_switch_to_fallback+0xfe/0x250 [smc] _raw_spin_lock_irq+0x3b/0x80 ? smc_switch_to_fallback+0x109/0x250 [smc] smc_switch_to_fallback+0x109/0x250 [smc] smc_connect_fallback+0xe/0x30 [smc] __smc_connect+0xcf/0x1090 [smc] ? mark_held_locks+0x61/0x80 ? __local_bh_enable_ip+0x77/0xe0 ? lockdep_hardirqs_on+0xbf/0x130 ? smc_connect+0x12a/0x150 [smc] smc_connect+0x12a/0x150 [smc] __sys_connect+0x8a/0xc0 ? syscall_enter_from_user_mode+0x20/0x70 __x64_sys_connect+0x16/0x20 do_syscall_64+0x34/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae The nested locking in smc_switch_to_fallback() is considered to possibly cause a deadlock because smc_wait->lock and clc_wait->lock are the same type of lock. But actually it is safe so far since there is no other place trying to obtain smc_wait->lock when clc_wait->lock is held. So the patch replaces spin_lock() with spin_lock_nested() to avoid false report by lockdep. Link: https://lkml.org/lkml/2021/11/19/962 Fixes: 2153bd1e3d3d ("Transfer remaining wait queue entries during fallback") Reported-by: syzbot+e979d3597f48262cb4ee@syzkaller.appspotmail.com Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Acked-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2153bd1e |
|
13-Nov-2021 |
Wen Gu <guwen@linux.alibaba.com> |
net/smc: Transfer remaining wait queue entries during fallback The SMC fallback is incomplete currently. There may be some wait queue entries remaining in smc socket->wq, which should be removed to clcsocket->wq during the fallback. For example, in nginx/wrk benchmark, this issue causes an all-zeros test result: server: nginx -g 'daemon off;' client: smc_run wrk -c 1 -t 1 -d 5 http://11.200.15.93/index.html Running 5s test @ http://11.200.15.93/index.html 1 threads and 1 connections Thread Stats Avg Stdev Max ± Stdev Latency 0.00us 0.00us 0.00us -nan% Req/Sec 0.00 0.00 0.00 -nan% 0 requests in 5.00s, 0.00B read Requests/sec: 0.00 Transfer/sec: 0.00B The reason for this all-zeros result is that when wrk used SMC to replace TCP, it added an eppoll_entry into smc socket->wq and expected to be notified if epoll events like EPOLL_IN/ EPOLL_OUT occurred on the smc socket. However, once a fallback occurred, wrk switches to use clcsocket. Now it is clcsocket->wq instead of smc socket->wq which will be woken up. The eppoll_entry remaining in smc socket->wq does not work anymore and wrk stops the test. This patch fixes this issue by removing remaining wait queue entries from smc socket->wq to clcsocket->wq during the fallback. Link: https://www.spinics.net/lists/netdev/msg779769.html Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e5d5aadc |
|
10-Nov-2021 |
Dust Li <dust.li@linux.alibaba.com> |
net/smc: fix sk_refcnt underflow on linkdown and fallback We got the following WARNING when running ab/nginx test with RDMA link flapping (up-down-up). The reason is when smc_sock fallback and at linkdown happens simultaneously, we may got the following situation: __smc_lgr_terminate() --> smc_conn_kill() --> smc_close_active_abort() smc_sock->sk_state = SMC_CLOSED sock_put(smc_sock) smc_sock was set to SMC_CLOSED and sock_put() been called when terminate the link group. But later application call close() on the socket, then we got: __smc_release(): if (smc_sock->fallback) smc_sock->sk_state = SMC_CLOSED sock_put(smc_sock) Again we set the smc_sock to CLOSED through it's already in CLOSED state, and double put the refcnt, so the following warning happens: refcount_t: underflow; use-after-free. WARNING: CPU: 5 PID: 860 at lib/refcount.c:28 refcount_warn_saturate+0x8d/0xf0 Modules linked in: CPU: 5 PID: 860 Comm: nginx Not tainted 5.10.46+ #403 Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/01/2014 RIP: 0010:refcount_warn_saturate+0x8d/0xf0 Code: 05 5c 1e b5 01 01 e8 52 25 bc ff 0f 0b c3 80 3d 4f 1e b5 01 00 75 ad 48 RSP: 0018:ffffc90000527e50 EFLAGS: 00010286 RAX: 0000000000000026 RBX: ffff8881300df2c0 RCX: 0000000000000027 RDX: 0000000000000000 RSI: ffff88813bd58040 RDI: ffff88813bd58048 RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000001 R10: ffff8881300df2c0 R11: ffffc90000527c78 R12: ffff8881300df340 R13: ffff8881300df930 R14: ffff88810b3dad80 R15: ffff8881300df4f8 FS: 00007f739de8fb80(0000) GS:ffff88813bd40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000a01b008 CR3: 0000000111b64003 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: smc_release+0x353/0x3f0 __sock_release+0x3d/0xb0 sock_close+0x11/0x20 __fput+0x93/0x230 task_work_run+0x65/0xa0 exit_to_user_mode_prepare+0xf9/0x100 syscall_exit_to_user_mode+0x27/0x190 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This patch adds check in __smc_release() to make sure we won't do an extra sock_put() and set the socket to CLOSED when its already in CLOSED state. Fixes: 51f1de79ad8e (net/smc: replace sock_put worker by socket refcounting) Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
48262608 |
|
01-Nov-2021 |
Tony Lu <tonylu@linux.alibaba.com> |
net/smc: Introduce tracepoint for fallback This introduces tracepoint for smc fallback to TCP, so that we can track which connection and why it fallbacks, and map the clcsocks' pointer with /proc/net/tcp to find more details about TCP connections. Compared with kprobe or other dynamic tracing, tracepoints are stable and easy to use. Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b4ba4652 |
|
16-Oct-2021 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: extend LLC layer for SMC-Rv2 Add support for large v2 LLC control messages in smc_llc.c. The new large work request buffer allows to combine control messages into one packet that had to be spread over several packets before. Add handling of the new v2 LLC messages. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e49300a6 |
|
16-Oct-2021 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: add listen processing for SMC-Rv2 Implement the server side of the SMC-Rv2 processing. Process incoming CLC messages, find eligible devices and check for a valid route to the remote peer. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e5c4744c |
|
16-Oct-2021 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: add SMC-Rv2 connection establishment Send a CLC proposal message, and the remote side process this type of message and determine the target GID. Check for a valid route to this GID, and complete the connection establishment. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
42042dbb |
|
16-Oct-2021 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: prepare for SMC-Rv2 connection Prepare the connection establishment with SMC-Rv2. Detect eligible RoCE cards and indicate all supported SMC modes for the connection. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
11a26c59 |
|
14-Sep-2021 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: keep static copy of system EID The system EID is retrieved using an registered ISM device each time when needed. This adds some unnecessary complexity at all places where the system EID is needed, but no ISM device is at hand. Simplify the code and save the system EID in a static variable in smc_ism.c. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fa086662 |
|
14-Sep-2021 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: add support for user defined EIDs SMC-Dv2 allows users to define EIDs which allows to create separate name spaces enabling users to cluster their SMC-Dv2 connections. Add support for user defined EIDs and extent the generic netlink interface so users can add, remove and dump EIDs. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f3a3a0fe |
|
28-Oct-2021 |
Wen Gu <guwen@linux.alibaba.com> |
net/smc: Correct spelling mistake to TCPF_SYN_RECV There should use TCPF_SYN_RECV instead of TCP_SYN_RECV. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
64513d26 |
|
09-Aug-2021 |
Guvenc Gulce <guvenc@linux.ibm.com> |
net/smc: Correct smc link connection counter in case of smc client SMC clients may be assigned to a different link after the initial connection between two peers was established. In such a case, the connection counter was not correctly set. Update the connection counter correctly when a smc client connection is assigned to a different smc link. Fixes: 07d51580ff65 ("net/smc: Add connection counters for links") Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Tested-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e3ae2365 |
|
27-Jun-2021 |
Alexander Aring <aahringo@redhat.com> |
net: sock: introduce sk_error_report This patch introduces a function wrapper to call the sk_error_report callback. That will prepare to add additional handling whenever sk_error_report is called, for example to trace socket errors. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
194730a9 |
|
16-Jun-2021 |
Guvenc Gulce <guvenc@linux.ibm.com> |
net/smc: Make SMC statistics network namespace aware Make the gathered SMC statistics network namespace aware, for each namespace collect an own set of statistic information. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e0e4b8fa |
|
16-Jun-2021 |
Guvenc Gulce <guvenc@linux.ibm.com> |
net/smc: Add SMC statistics support Add the ability to collect SMC statistics information. Per-cpu variables are used to collect the statistic information for better performance and for reducing concurrency pitfalls. The code that is collecting statistic data is implemented in macros to increase code reuse and readability. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
86214366 |
|
05-May-2021 |
Cong Wang <cong.wang@bytedance.com> |
smc: disallow TCP_ULP in smc_setsockopt() syzbot is able to setup kTLS on an SMC socket which coincidentally uses sk_user_data too. Later, kTLS treats it as psock so triggers a refcnt warning. The root cause is that smc_setsockopt() simply calls TCP setsockopt() which includes TCP_ULP. I do not think it makes sense to setup kTLS on top of SMC sockets, so we should just disallow this setup. It is hard to find a commit to blame, but we can apply this patch since the beginning of TCP_ULP. Reported-and-tested-by: syzbot+b54a1ce86ba4a623b7f0@syzkaller.appspotmail.com Fixes: 734942cc4ea6 ("tcp: ULP infrastructure") Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6fd6c483 |
|
27-Apr-2021 |
Jiapeng Chong <jiapeng.chong@linux.alibaba.com> |
net/smc: Remove redundant assignment to rc Variable rc is set to zero but this value is never read as it is overwritten with a new value later on, hence it is a redundant assignment and can be removed. Cleans up the following clang-analyzer warning: net/smc/af_smc.c:1079:3: warning: Value stored to 'rc' is never read [clang-analyzer-deadcode.DeadStores]. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e8372d9d |
|
01-Dec-2020 |
Guvenc Gulce <guvenc@linux.ibm.com> |
net/smc: Introduce generic netlink interface for diagnostic purposes Introduce generic netlink interface infrastructure to expose the diagnostic information regarding smc linkgroups, links and devices. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
49407ae2 |
|
01-Dec-2020 |
Guvenc Gulce <guvenc@linux.ibm.com> |
net/smc: Refactor smc ism v2 capability handling Encapsulate the smc ism v2 capability boolean value in a function for better information hiding. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
8cf3f3e4 |
|
01-Dec-2020 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: use helper smc_conn_abort() in listen processing The helper smc_connect_abort() can be used by the listen processing functions, too. And rename this helper to smc_conn_abort() to make the purpose clearer. No functional change. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
0530bd6e |
|
18-Nov-2020 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: fix matching of existing link groups With the multi-subnet support of SMC-Dv2 the match for existing link groups should not include the vlanid of the network device. Set ini->smcd_version accordingly before the call to smc_conn_create() and use this value in smc_conn_create() to skip the vlanid check. Fixes: 5c21c4ccafe8 ("net/smc: determine accepted ISM devices") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
3752404a |
|
31-Oct-2020 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: improve return codes for SMC-Dv2 To allow better problem diagnosis the return codes for SMC-Dv2 are improved by this patch. A few more CLC DECLINE codes are defined and sent to the peer when an SMC connection cannot be established. There are now multiple SMC variations that are offered by the client and the server may encounter problems to initialize all of them. Because only one diagnosis code can be sent to the client the decision was made to send the first code that was encountered. Because the server tries the variations in the order of importance (SMC-Dv2, SMC-D, SMC-R) this makes sure that the diagnosis code of the most important variation is sent. v2: initialize rc in smc_listen_v2_check(). Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Link: https://lore.kernel.org/r/20201031181938.69903-1-kgraul@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
4a9baf45 |
|
23-Oct-2020 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: fix null pointer dereference in smc_listen_decline() smc_listen_work() calls smc_listen_decline() on label out_decl, providing the ini pointer variable. But this pointer can still be null when the label out_decl is reached. Fix this by checking the ini variable in smc_listen_work() and call smc_listen_decline() with the result directly. Fixes: a7c9c5f4af7f ("net/smc: CLC accept / confirm V2") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
f29fa003 |
|
07-Oct-2020 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: restore smcd_version when all ISM V2 devices failed to init Field ini->smcd_version is set to SMC_V2 before calling smc_listen_ism_init(). This clears the V1 bit that may be set. When all matching ISM V2 devices fail to initialize then the smcd_version field needs to get restored to allow any possible V1 devices to initialize. And be consistent, always go to the not_found label when no device was found. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
9047a617dc |
|
07-Oct-2020 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: cleanup buffer usage in smc_listen_work() coccinelle informs about net/smc/af_smc.c:1770:10-11: WARNING: opportunity for kzfree/kvfree_sensitive Its not that kzfree() would help here, the memset() is done to prepare the buffer for another socket receive. Fix that warning message by reordering the calls, while at it eliminate the unneeded variable cclc2 and use sizeof(*buf) as above in the same function. No functional changes. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
c60a2cef |
|
07-Oct-2020 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: consolidate unlocking in same function Static code checkers warn of inconsistent returns because the lgr mutex is locked in one function and unlocked in a function called by the locking function: net/smc/af_smc.c:823 smc_connect_rdma() warn: inconsistent returns 'smc_client_lgr_pending'. net/smc/af_smc.c:897 smc_connect_ism() warn: inconsistent returns 'smc_server_lgr_pending'. Make the code consistent by doing the unlock in the same function that fetches the lock. No functional changes. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
839d696f |
|
02-Oct-2020 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: send ISM devices with unique chid in CLC proposal When building a CLC proposal message then the list of ISM devices does not need to contain multiple devices that have the same chid value, all these devices use the same function at the end. Improve smc_find_ism_v2_device_clnt() to collect only ISM devices that have unique chid values. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e8d726c8 |
|
25-Sep-2020 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: CLC decline - V2 enhancements This patch covers the small SMCD version 2 changes for CLC decline. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b81a5eb7 |
|
25-Sep-2020 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: introduce CLC first contact extension SMC Version 2 defines a first contact extension for CLC accept and CLC confirm. This patch covers sending and receiving of the CLC first contact extension. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a7c9c5f4 |
|
25-Sep-2020 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: CLC accept / confirm V2 The new format of SMCD V2 CLC accept and confirm is introduced, and building and checking of SMCD V2 CLC accepts / confirms is adapted accordingly. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5c21c4cc |
|
25-Sep-2020 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: determine accepted ISM devices SMCD Version 2 allows to propose up to 8 additional ISM devices offered to the peer as candidates for SMCD communication. This patch covers the server side, i.e. selection of an ISM device matching one of the proposed ISM devices, that will be used for CLC accept Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8c3dca34 |
|
25-Sep-2020 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: build and send V2 CLC proposal The new format of an SMCD V2 CLC proposal is introduced, and building and checking of SMCD V2 CLC proposals is adapted accordingly. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d70bf4f7 |
|
25-Sep-2020 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: determine proposed ISM devices SMCD Version 2 allows to propose up to 8 additional ISM devices offered to the peer as candidates for SMCD communication. This patch covers determination of the ISM devices to be proposed. ISM devices without PNETID are preferred, since ISM devices with PNETID are a V1 leftover and will disappear over the time. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8caaccf5 |
|
25-Sep-2020 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: introduce CHID callback for ISM devices With SMCD version 2 the CHIDs of ISM devices are needed for the CLC handshake. This patch provides the new callback to retrieve the CHID of an ISM device. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
201091eb |
|
25-Sep-2020 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: introduce System Enterprise ID (SEID) SMCD version 2 defines a System Enterprise ID (short SEID). This patch contains the SEID creation and adds the callback to retrieve the created SEID. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3fc64937 |
|
25-Sep-2020 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: prepare for more proposed ISM devices SMCD Version 2 allows proposing of up to 8 ISM devices in addition to the native ISM device of SMCD Version 1. This patch prepares the struct smc_init_info to deal with these additional 8 ISM devices. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7affc809 |
|
25-Sep-2020 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: separate find device functions This patch provides better separation of device determinations in function smc_listen_work(). No functional change. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f1eb02f9 |
|
25-Sep-2020 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: CLC header fields renaming SMCD version 2 defines 2 more bits in the CLC header to specify version 2 types. This patch prepares better naming of the CLC header fields. No functional change. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ac679364 |
|
17-Sep-2020 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: fix double kfree in smc_listen_work() If smc_listen_rmda_finish() returns with an error, the storage addressed by 'buf' is freed a second time. Consolidate freeing under a common label and jump to that label. Fixes: 6bb14e48ee8d ("net/smc: dynamic allocation of CLC proposal buffer") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
22ef473d |
|
10-Sep-2020 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: use separate work queues for different worker types There are 6 types of workers which exist per smc connection. 3 of them are used for listen and handshake processing, another 2 are used for close and abort processing and 1 is the tx worker that moves calls to sleeping functions into a worker. To prevent flooding of the system work queue when many connections are opened or closed at the same time (some pattern uperf implements), move those workers to one of 3 smc-specific work queues. Two work queues are module-global and used for handshake and close workers. The third work queue is defined per link group and used by the tx workers that may sleep waiting for resources of this link group. And in smc_llc_enqueue() queue the llc_event_work work to the system prio work queue because its critical that this work is started fast. Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0c881ada |
|
10-Sep-2020 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: reduce smc_listen_decline() calls smc_listen_work() contains already an smc_listen_decline() exit. Use this exit for smc_listen_rdma_finish() problems as well. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7b2977d0 |
|
10-Sep-2020 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: improve server ISM device determination Move check whether peer can be reached into smc_pnet_find_ism_by_pnetid(). Thus searching continues for another ism device, if check fails. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3d9725a6 |
|
10-Sep-2020 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: common routine for CLC accept and confirm smc_clc_send_accept() and smc_clc_send_confirm() are quite similar. Move common code into a separate function smc_clc_send_confirm_accept(). And introduce separate SMCD and SMCR struct definitions for CLC accept resp. confirm. No functional change. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6bb14e48 |
|
10-Sep-2020 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: dynamic allocation of CLC proposal buffer Reduce stack size for smc_listen_work() and smc_clc_send_proposal() by dynamic allocation of the CLC buffer to be received or sent. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5ac54d87 |
|
10-Sep-2020 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: introduce better field names Field names "srv_first_contact" and "cln_first_contact" are misleading, since they apply to both, server and client. Rename them to "first_contact_peer" and "first_contact_local". Rename "ism_gid" by the more precise name "ism_peer_gid". Rename version constant "SMC_CLC_V1" into "SMC_V1". No functional change. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a60a2b1e |
|
10-Sep-2020 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: reduce active tcp_listen workers SMC starts a separate tcp_listen worker for every SMC socket in state SMC_LISTEN, and can accept an incoming connection request only, if this worker is really running and waiting in kernel_accept(). But the number of running workers is limited. This patch reworks the listening SMC code and starts a tcp_listen worker after the SYN-ACK handshake on the internal clc-socket only. Suggested-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Reviewed-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
72b7f6c4 |
|
26-Jul-2020 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: unique reason code for exceeded max dmb count When the maximum dmb buffer limit for an ism device is reached no more dmb buffers can be registered. When this happens the reason code is set to SMC_CLC_DECL_MEM indicating out-of-memory. This is the same reason code that is used when no memory could be allocated for the new dmb buffer. This is confusing for users when they see this error but there is more memory available. To solve this set a separate new reason code when the maximum dmb limit exceeded. Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a7b75c5a |
|
23-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
net: pass a sockptr_t into ->setsockopt Rework the remaining setsockopt code to pass a sockptr_t instead of a plain user pointer. This removes the last remaining set_fs(KERNEL_DS) outside of architecture specific code. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> [ieee802154] Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a44d9e72 |
|
17-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
net: make ->{get,set}sockopt in proto_ops optional Just check for a NULL method instead of wiring up sock_no_{get,set}sockopt. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1ad24058 |
|
18-Jul-2020 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: fix restoring of fallback changes When a listen socket is closed then all non-accepted sockets in its accept queue are to be released. Inside __smc_release() the helper smc_restore_fallback_changes() restores the changes done to the socket without to check if the clcsocket has a file set. This can result in a crash. Fix this by checking the file pointer first. Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Fixes: f536dffc0b79 ("net/smc: fix closing of fallback SMC sockets") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
741a49a4 |
|
18-Jul-2020 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: do not call dma sync for unmapped memory The dma related ...sync_sg... functions check the link state before the dma function is actually called. But the check in smc_link_usable() allows links in ACTIVATING state which are not yet mapped to dma memory. Under high load it may happen that the sync_sg functions are called for such a link which results in an debug output like DMA-API: mlx5_core 0002:00:00.0: device driver tries to sync DMA memory it has not allocated [device address=0x0000000103370000] [size=65536 bytes] To fix that introduce a helper to check for the link state ACTIVE and use it where appropriate. And move the link state update to ACTIVATING to the end of smcr_link_init() when most initial setup is done. Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Fixes: d854fcbfaeda ("net/smc: add new link state and related helpers") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7df8bcb5 |
|
18-Jul-2020 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: fix link lookup for new rdma connections For new rdma connections the SMC server assigns the link and sends the link data in the clc accept message. To match the correct link use not only the qp_num but also the gid and the mac of the links. If there are equal qp_nums for different links the wrong link would be chosen. Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Fixes: 0fb0b02bd6fd ("net/smc: adapt SMC client code to use the LLC flow") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0a99be43 |
|
05-May-2020 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: log important pnetid and state change events Print to system log when SMC links are available or go down, link group state changes or pnetids are applied to and removed from devices. The log entries are triggered by either user configuration actions or adapter activation/deactivation events and are not expected to happen often. The entries help SMC users to keep track of the SMC link group status and to detect when actions are needed (like to add replacements for failed adapters). Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
649758ff |
|
04-May-2020 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: save SMC-R peer link_uid During SMC-R link establishment the peers exchange the link_uid that is used for debugging purposes. Save the peer link_uid in smc_link so it can be retrieved by the smc_diag netlink interface. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2d2209f2 |
|
03-May-2020 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: first part of add link processing as SMC server First set of functions to process an ADD_LINK LLC request as an SMC server. Find an alternate IB device, determine the new link group type and get the index for the new link. Then initialize the link and send the ADD_LINK LLC message to the peer. Save the contents of the response, ready the link, map all used buffers and register the buffers with the IB device. If any error occurs, stop the processing and clear the link. And call smc_llc_srv_add_link() in af_smc.c to start second link establishment after the initial link of a link group was created. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b1570a87 |
|
03-May-2020 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: final part of add link processing as SMC client This patch finalizes the ADD_LINK processing of new links. Receive the CONFIRM_LINK request from peer, complete the link initialization, register all used buffers with the IB device and finally send the CONFIRM_LINK response, which completes the ADD_LINK processing. And activate smc_llc_cli_add_link() in af_smc.c. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d5500667 |
|
30-Apr-2020 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: mutex to protect the lgr against parallel reconfigurations Introduce llc_conf_mutex in the link group which is used to protect the buffers and lgr states against parallel link reconfiguration. This ensures that new connections do not start to register buffers with the links of a link group when link creation or termination is running. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7562a13d |
|
30-Apr-2020 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: multiple link support for rmb buffer registration The CONFIRM_RKEY LLC processing handles all links in one LLC message. Move the call to this processing out of smcr_link_reg_rmb() which does processing per link, into smcr_lgr_reg_rmbs() which is responsible for link group level processing. Move smcr_link_reg_rmb() into module smc_core.c. >From af_smc.c now call smcr_lgr_reg_rmbs() to register new rmbs on all available links. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0fb0b02b |
|
30-Apr-2020 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: adapt SMC client code to use the LLC flow Change the code that processes the SMC client part of connection establishment to use the LLC flow framework (CONFIRM_LINK request messages). Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4667bb4a |
|
30-Apr-2020 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: adapt SMC server code to use the LLC flow Change the code that processes the SMC server part of connection establishment to use the LLC flow framework (CONFIRM_LINK response messages). Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
00a049cf |
|
29-Apr-2020 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: move llc layer related init and clear into smc_llc.c Introduce smc_llc_lgr_init() and smc_llc_lgr_clear() to implement all llc layer specific initialization and cleanup in module smc_llc.c. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e07d31dc |
|
29-Apr-2020 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: multi-link support for smc_rmb_rtoken_handling() Extend smc_rmb_rtoken_handling() and smc_rtoken_delete() to support multiple links. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b9247544 |
|
29-Apr-2020 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: convert static link ID instances to support multiple links As a preparation for the support of multiple links remove the usage of a static link id (SMC_SINGLE_LINK) and allow dynamic link ids. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
387707fd |
|
29-Apr-2020 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: convert static link ID to dynamic references As a preparation for the support of multiple links remove the usage of a static link id (SMC_SINGLE_LINK) and allow dynamic link ids. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
51e3dfa8 |
|
25-Feb-2020 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: fix cleanup for linkgroup setup failures If an SMC connection to a certain peer is setup the first time, a new linkgroup is created. In case of setup failures, such a linkgroup is unusable and should disappear. As a first step the linkgroup is removed from the linkgroup list in smc_lgr_forget(). There are 2 problems: smc_listen_decline() might be called before linkgroup creation resulting in a crash due to calling smc_lgr_forget() with parameter NULL. If a setup failure occurs after linkgroup creation, the connection is never unregistered from the linkgroup, preventing linkgroup freeing. This patch introduces an enhanced smc_lgr_cleanup_early() function which * contains a linkgroup check for early smc_listen_decline() invocations * invokes smc_conn_free() to guarantee unregistering of the connection. * schedules fast linkgroup removal of the unusable linkgroup And the unused function smcd_conn_free() is removed from smc_core.h. Fixes: 3b2dec2603d5b ("net/smc: restructure client and server code in af_smc") Fixes: 2a0674fffb6bc ("net/smc: improve abnormal termination of link groups") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
67f562e3 |
|
14-Feb-2020 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: transfer fasync_list in case of fallback SMC does not work together with FASTOPEN. If sendmsg() is called with flag MSG_FASTOPEN in SMC_INIT state, the SMC-socket switches to fallback mode. To handle the previous ioctl FIOASYNC call correctly in this case, it is necessary to transfer the socket wait queue fasync_list to the internal TCP socket. Reported-by: syzbot+4b1fe8105f8044a26162@syzkaller.appspotmail.com Fixes: ee9dfbef02d18 ("net/smc: handle sockopts forcing fallback") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
86434744 |
|
12-Dec-2019 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: add fallback check to connect() FASTOPEN setsockopt() or sendmsg() may switch the SMC socket to fallback mode. Once fallback mode is active, the native TCP socket functions are called. Nevertheless there is a small race window, when FASTOPEN setsockopt/sendmsg runs in parallel to a connect(), and switch the socket into fallback mode before connect() takes the sock lock. Make sure the SMC-specific connect setup is omitted in this case. This way a syzbot-reported refcount problem is fixed, triggered by different threads running non-blocking connect() and FASTOPEN_KEY setsockopt. Reported-by: syzbot+96d3f9ff6a86d37e44c8@syzkaller.appspotmail.com Fixes: 6d6dd528d5af ("net/smc: fix refcount non-blocking connect() -part 2") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
#
8204df72 |
|
14-Nov-2019 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: fix fastopen for non-blocking connect() FASTOPEN does not work with SMC-sockets. Since SMC allows fallback to TCP native during connection start, the FASTOPEN setsockopts trigger this fallback, if the SMC-socket is still in state SMC_INIT. But if a FASTOPEN setsockopt is called after a non-blocking connect(), this is broken, and fallback does not make sense. This change complements commit cd2063604ea6 ("net/smc: avoid fallback in case of non-blocking connect") and fixes the syzbot reported problem "WARNING in smc_unhash_sk". Reported-by: syzbot+8488cc4cf1c9e09b8b86@syzkaller.appspotmail.com Fixes: e1bbdd570474 ("net/smc: reduce sock_put() for fallback sockets") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4ead9c96 |
|
16-Nov-2019 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: use rcu_barrier() on module unload Add rcu_barrier() to make sure no RCU readers or callbacks are pending when the module is unloaded. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6dabd405 |
|
16-Nov-2019 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: introduce bookkeeping of SMCR link groups If the smc module is unloaded return control from exit routine only, if all link groups are freed. If an IB device is thrown away return control from device removal only, if all link groups belonging to this device are freed. Counters for the total number of SMCR link groups and for the total number of SMCR links per IB device are introduced. smc module unloading continues only if the total number of SMCR link groups is zero. IB device removal continues only it the total number of SMCR links per IB device has decreased to zero. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6d6dd528 |
|
12-Nov-2019 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: fix refcount non-blocking connect() -part 2 If an SMC socket is immediately terminated after a non-blocking connect() has been called, a memory leak is possible. Due to the sock_hold move in commit 301428ea3708 ("net/smc: fix refcounting for non-blocking connect()") an extra sock_put() is needed in smc_connect_work(), if the internal TCP socket is aborted and cancels the sk_stream_wait_connect() of the connect worker. Reported-by: syzbot+4b73ad6fc767e576e275@syzkaller.appspotmail.com Fixes: 301428ea3708 ("net/smc: fix refcounting for non-blocking connect()") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
301428ea |
|
28-Oct-2019 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: fix refcounting for non-blocking connect() If a nonblocking socket is immediately closed after connect(), the connect worker may not have started. This results in a refcount problem, since sock_hold() is called from the connect worker. This patch moves the sock_hold in front of the connect worker scheduling. Reported-by: syzbot+4c063e6dea39e4b79f29@syzkaller.appspotmail.com Fixes: 50717a37db03 ("net/smc: nonblocking connect rework") Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ca5f8d2d |
|
23-Oct-2019 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: keep vlan_id for SMC-R in smc_listen_work() Creating of an SMC-R connection with vlan-id fails, because smc_listen_work() determines the vlan_id of the connection, saves it in struct smc_init_info ini, but clears the ini area again if SMC-D is not applicable. This patch just resets the ISM device before investigating SMC-R availability. Fixes: bc36d2fc93eb ("net/smc: consolidate function parameters") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f536dffc |
|
23-Oct-2019 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: fix closing of fallback SMC sockets For SMC sockets forced to fallback to TCP, the file is propagated from the outer SMC to the internal TCP socket. When closing the SMC socket, the internal TCP socket file pointer must be restored to the original NULL value, otherwise memory leaks may show up (found with CONFIG_DEBUG_KMEMLEAK). The internal TCP socket is released in smc_clcsock_release(), which calls __sock_release() function in net/socket.c. This calls the needed iput(SOCK_INODE(sock)) only, if the file pointer has been reset to the original NULL-value. Fixes: 07603b230895 ("net/smc: propagate file from SMC to TCP socket") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
81cf4f47 |
|
21-Oct-2019 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: remove close abort worker With the introduction of the link group termination worker there is no longer a need to postpone smc_close_active_abort() to a worker. To protect socket destruction due to normal and abnormal socket closing, the socket refcount is increased. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
#
cd206360 |
|
02-Aug-2019 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: avoid fallback in case of non-blocking connect FASTOPEN is not possible with SMC. sendmsg() with msg_flag MSG_FASTOPEN triggers a fallback to TCP if the socket is in state SMC_INIT. But if a nonblocking connect is already started, fallback to TCP is no longer possible, even though the socket may still be in state SMC_INIT. And if a nonblocking connect is already started, a listen() call does not make sense. Reported-by: syzbot+bd8cc73d665590a1fcad@syzkaller.appspotmail.com Fixes: 50717a37db032 ("net/smc: nonblocking connect rework") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f9cedf1a |
|
02-Aug-2019 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: do not schedule tx_work in SMC_CLOSED state The setsockopts options TCP_NODELAY and TCP_CORK may schedule the tx worker. Make sure the socket is not yet moved into SMC_CLOSED state (for instance by a shutdown SHUT_RDWR call). Reported-by: syzbot+92209502e7aab127c75f@syzkaller.appspotmail.com Reported-by: syzbot+b972214bb803a343f4fe@syzkaller.appspotmail.com Fixes: 01d2f7e2cdd31 ("net/smc: sockopts TCP_NODELAY and TCP_CORK") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
39f41f36 |
|
27-Jun-2019 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: common release code for non-accepted sockets There are common steps when releasing an accepted or unaccepted socket. Move this code into a common routine. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8c33bf1b |
|
26-Jun-2019 |
YueHaibing <yuehaibing@huawei.com> |
net/smc: Fix error path in smc_init If register_pernet_subsys success in smc_init, we should cleanup it in case any other error. Fixes: 64e28b52c7a6 (net/smc: add pnet table namespace support") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
09c434b8 |
|
19-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Add SPDX license identifier for more missed files Add SPDX license identifiers to all files which: - Have no license information of any form - Have MODULE_LICENCE("GPL*") inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
7a62725a |
|
11-Apr-2019 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: improve smc_conn_create reason codes Rework smc_conn_create() to always return a valid DECLINE reason code. This removes the need to translate the return codes on 4 different places and allows to easily add more detailed return codes by changing smc_conn_create() only. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9aa68d29 |
|
11-Apr-2019 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: improve smc_listen_work reason codes Rework smc_listen_work() to provide improved reason codes when an SMC connection is declined. This allows better debugging on user side. This also adds 3 more detailed reason codes in smc_clc.h to indicate what type of device was not found (ism or rdma or both), or if ism cannot talk to the peer. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
228bae05 |
|
11-Apr-2019 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: code cleanup smc_listen_work In smc_listen_work() the variables rc and reason_code are defined which have the same meaning. Eliminate reason_code in favor of the shorter name rc. No functional changes. Rename the functions smc_check_ism() and smc_check_rdma() into smc_find_ism_device() and smc_find_rdma_device() to make there purpose more clear. No functional changes. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fba7e8ef |
|
11-Apr-2019 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: cleanup of get vlan id The vlan_id of the underlying CLC socket was retrieved two times during processing of the listen handshaking. Change this to get the vlan id one time in connect and in listen processing, and reuse the id. And add a new CLC DECLINE return code for the case when the retrieval of the vlan id failed. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bc36d2fc |
|
11-Apr-2019 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: consolidate function parameters During initialization of an SMC socket a lot of function parameters need to get passed down the function call path. Consolidate the parameters in a helper struct so there are less enough parameters to get all passed by register. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
59886697 |
|
11-Apr-2019 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: check for ip prefix and subnet The check for a matching ip prefix and subnet was only done for SMC-R in smc_listen_rdma_check() but not when an SMC-D connection was possible. Rename the function into smc_listen_prfx_check() and move its call to a place where it is called for both SMC variants. And add a new CLC DECLINE reason for the case when the IP prefix or subnet check fails so the reason for the failing SMC connection can be found out more easily. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
50717a37 |
|
11-Apr-2019 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: nonblocking connect rework For nonblocking sockets move the kernel_connect() from the connect worker into the initial smc_connect part to return kernel_connect() errors other than -EINPROGRESS to user space. Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f61bca58 |
|
11-Apr-2019 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: move unhash before release of clcsock Commit <26d92e951fe0> ("net/smc: move unhash as early as possible in smc_release()") fixes one occurrence in the smc code, but the same pattern exists in other places. This patch covers the remaining occurrences and makes sure, the unhash operation is done before the smc->clcsock is released. This avoids a potential use-after-free in smc_diag_dump(). Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
07603b23 |
|
11-Apr-2019 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: propagate file from SMC to TCP socket fcntl(fd, F_SETOWN, getpid()) selects the recipient of SIGURG signals that are delivered when out-of-band data arrives on socket fd. If an SMC socket program makes use of such an fcntl() call, it fails in case of fallback to TCP-mode. In case of fallback the traffic is processed with the internal TCP socket. Propagating field "file" from the SMC socket to the internal TCP socket fixes the issue. Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fd57770d |
|
11-Apr-2019 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: wait for pending work before clcsock release_sock When the clcsock is already released using sock_release() and a pending smc_listen_work accesses the clcsock than that will fail. Solve this by canceling and waiting for the work to complete first. Because the work holds the sock_lock it must make sure that the lock is not hold before the new helper smc_clcsock_release() is invoked. And before the smc_listen_work starts working check if the parent listen socket is still valid, otherwise stop the work early. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
64e28b52 |
|
21-Feb-2019 |
Hans Wippel <hwippel@linux.ibm.com> |
net/smc: add pnet table namespace support This patch adds namespace support to the pnet table code. Each network namespace gets its own pnet table. Infiniband and smcd device pnetids can only be modified in the initial namespace. In other namespaces they can still be used as if they were set by the underlying hardware. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
72a36a8a |
|
07-Feb-2019 |
Hans Wippel <hwippel@linux.ibm.com> |
net/smc: use client and server LGR pending locks for SMC-R If SMC client and server connections are both established at the same time, smc_connect_rdma() cannot send a CLC confirm message while smc_listen_work() is waiting for one due to lock contention. This can result in timeouts in smc_clc_wait_msg() and failed SMC connections. In case of SMC-R, there are two types of LGRs (client and server LGRs) which can be protected by separate locks. So, this patch splits the LGR pending lock into two separate locks for client and server to avoid the locking issue for SMC-R. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
62c7139f |
|
07-Feb-2019 |
Hans Wippel <hwippel@linux.ibm.com> |
net/smc: unlock LGR pending lock earlier for SMC-D If SMC client and server connections are both established at the same time, smc_connect_ism() cannot send a CLC confirm message while smc_listen_work() is waiting for one due to lock contention. This can result in timeouts in smc_clc_wait_msg() and failed SMC connections. In case of SMC-D, the LGR pending lock is not needed while smc_listen_work() is waiting for the CLC confirm message. So, this patch releases the lock earlier for SMC-D to avoid the locking issue. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b03faa1f |
|
07-Feb-2019 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: postpone release of clcsock According to RFC7609 (http://www.rfc-editor.org/info/rfc7609) first the SMC-R connection is shut down and then the normal TCP connection FIN processing drives cleanup of the internal TCP connection. The unconditional release of the clcsock during active socket closing has to be postponed if the peer has not yet signalled socket closing. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9718475e |
|
02-Feb-2019 |
Deepa Dinamani <deepa.kernel@gmail.com> |
socket: Add SO_TIMESTAMPING_NEW Add SO_TIMESTAMPING_NEW variant of socket timestamp options. This is the y2038 safe versions of the SO_TIMESTAMPING_OLD for all architectures. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: chris@zankel.net Cc: fenghua.yu@intel.com Cc: rth@twiddle.net Cc: tglx@linutronix.de Cc: ubraun@linux.ibm.com Cc: linux-alpha@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-ia64@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-s390@vger.kernel.org Cc: linux-xtensa@linux-xtensa.org Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
51c5aba3 |
|
30-Jan-2019 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: recvmsg and splice_read should return 0 after shutdown When a socket was connected and is now shut down for read, return 0 to indicate end of data in recvmsg and splice_read (like TCP) and do not return ENOTCONN. This behavior is required by the socket api. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
26d92e95 |
|
06-Jan-2019 |
Cong Wang <xiyou.wangcong@gmail.com> |
smc: move unhash as early as possible in smc_release() In smc_release() we release smc->clcsock before unhash the smc sock, but a parallel smc_diag_dump() may be still reading smc->clcsock, therefore this could cause a use-after-free as reported by syzbot. Reported-and-tested-by: syzbot+fbd1e5476e4c94c7b34e@syzkaller.appspotmail.com Fixes: 51f1de79ad8e ("net/smc: replace sock_put worker by socket refcounting") Cc: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reported-by: syzbot+0bf2e01269f1274b4b03@syzkaller.appspotmail.com Reported-by: syzbot+e3132895630f957306bc@syzkaller.appspotmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
78abe3d0 |
|
18-Dec-2018 |
Myungho Jung <mhjungk@gmail.com> |
net/smc: fix TCP fallback socket release clcsock can be released while kernel_accept() references it in TCP listen worker. Also, clcsock needs to wake up before released if TCP fallback is used and the clcsock is blocked by accept. Add a lock to safely release clcsock and call kernel_sock_shutdown() to wake up clcsock from accept in smc_release(). Reported-by: syzbot+0bf2e01269f1274b4b03@syzkaller.appspotmail.com Reported-by: syzbot+e3132895630f957306bc@syzkaller.appspotmail.com Signed-off-by: Myungho Jung <mhjungk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c7674c00 |
|
22-Nov-2018 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: unregister rkeys of unused buffer When an rmb is no longer in use by a connection, unregister its rkey at the remote peer with an LLC DELETE RKEY message. With this change, unused buffers held in the buffer pool are no longer registered at the remote peer. They are registered before the buffer is actually used and unregistered when they are no longer used by a connection. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
587e41dc |
|
22-Nov-2018 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: cleanup listen worker mutex unlocking For easier reading move the unlock of mutex smc_create_lgr_pending into smc_listen_work(), i.e. into the function the mutex has been locked. No functional change. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2b59f58e |
|
22-Nov-2018 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: short wait for late smc_clc_wait_msg After sending one of the initial LLC messages CONFIRM LINK or ADD LINK, there is already a wait for the LLC response. It does not make sense to wait another long time for a CLC DECLINE. Thus this patch introduces a shorter wait time for these cases. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9ed28556 |
|
22-Nov-2018 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: allow fallback after clc timeouts If connection initialization fails for the LLC CONFIRM LINK or the LLC ADD LINK step, fallback to TCP should be enabled. Thus the negative return code -EAGAIN should switch to a positive timeout reason code in these cases, and the internal CLC socket should not have a set sk_err. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
263ffaee |
|
22-Nov-2018 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: cleanup tcp_listen_worker initialization The tcp_listen_worker is already initialized when socket is created (in smc_sock_alloc()). Get rid of the duplicate initialization in smc_listen(). No functional change. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ee05ff7a |
|
20-Nov-2018 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: use queue pair number when matching link group When searching for an existing link group the queue pair number is also to be taken into consideration. When the SMC server sends a new number in a CLC packet (keeping all other values equal) then a new link group is to be created on the SMC client side. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f07920ad |
|
20-Nov-2018 |
Hans Wippel <hwippel@linux.ibm.com> |
net/smc: abort CLC connection in smc_release In case of a non-blocking SMC socket, the initial CLC handshake is performed over a blocking TCP connection in a worker. If the SMC socket is released, smc_release has to wait for the blocking CLC socket operations (e.g., kernel_connect) inside the worker. This patch aborts a CLC connection when the respective non-blocking SMC socket is released to avoid waiting on socket operations or timeouts. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
89ab066d |
|
23-Oct-2018 |
Karsten Graul <kgraul@linux.ibm.com> |
Revert "net: simplify sock_poll_wait" This reverts commit dd979b4df817e9976f18fb6f9d134d6bc4a3c317. This broke tcp_poll for SMC fallback: An AF_SMC socket establishes an internal TCP socket for the initial handshake with the remote peer. Whenever the SMC connection can not be established this TCP socket is used as a fallback. All socket operations on the SMC socket are then forwarded to the TCP socket. In case of poll, the file->private_data pointer references the SMC socket because the TCP socket has no file assigned. This causes tcp_poll to wait on the wrong socket. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
71d117f5 |
|
18-Sep-2018 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: no urgent data check for listen sockets Don't check a listen socket for pending urgent data in smc_poll(). Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1ca52fcf |
|
18-Sep-2018 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: remove duplicate mutex_unlock For a failing smc_listen_rdma_finish() smc_listen_decline() is called. If fallback is possible, the new socket is already enqueued to be accepted in smc_listen_decline(). Avoid enqueuing a second time afterwards in this case, otherwise the smc_create_lgr_pending lock is released twice: [ 373.463976] WARNING: bad unlock balance detected! [ 373.463978] 4.18.0-rc7+ #123 Tainted: G O [ 373.463979] ------------------------------------- [ 373.463980] kworker/1:1/30 is trying to release lock (smc_create_lgr_pending) at: [ 373.463990] [<000003ff801205fc>] smc_listen_work+0x22c/0x5d0 [smc] [ 373.463991] but there are no more locks to release! [ 373.463991] other info that might help us debug this: [ 373.463993] 2 locks held by kworker/1:1/30: [ 373.463994] #0: 00000000772cbaed ((wq_completion)"events"){+.+.}, at: process_one_work+0x1ec/0x6b0 [ 373.464000] #1: 000000003ad0894a ((work_completion)(&new_smc->smc_listen_work)){+.+.}, at: process_one_work+0x1ec/0x6b0 [ 373.464003] stack backtrace: [ 373.464005] CPU: 1 PID: 30 Comm: kworker/1:1 Kdump: loaded Tainted: G O 4.18.0-rc7uschi+ #123 [ 373.464007] Hardware name: IBM 2827 H43 738 (LPAR) [ 373.464010] Workqueue: events smc_listen_work [smc] [ 373.464011] Call Trace: [ 373.464015] ([<0000000000114100>] show_stack+0x60/0xd8) [ 373.464019] [<0000000000a8c9bc>] dump_stack+0x9c/0xd8 [ 373.464021] [<00000000001dcaf8>] print_unlock_imbalance_bug+0xf8/0x108 [ 373.464022] [<00000000001e045c>] lock_release+0x114/0x4f8 [ 373.464025] [<0000000000aa87fa>] __mutex_unlock_slowpath+0x4a/0x300 [ 373.464027] [<000003ff801205fc>] smc_listen_work+0x22c/0x5d0 [smc] [ 373.464029] [<0000000000197a68>] process_one_work+0x2a8/0x6b0 [ 373.464030] [<0000000000197ec2>] worker_thread+0x52/0x410 [ 373.464033] [<000000000019fd0e>] kthread+0x15e/0x178 [ 373.464035] [<0000000000aaf58a>] kernel_thread_starter+0x6/0xc [ 373.464052] [<0000000000aaf584>] kernel_thread_starter+0x0/0xc [ 373.464054] INFO: lockdep is turned off. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
648a5a7a |
|
18-Sep-2018 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: fix non-blocking connect problem In state SMC_INIT smc_poll() delegates polling to the internal CLC socket. This means, once the connect worker has finished its kernel_connect() step, the poll wake-up may occur. This is not intended. The wake-up should occur from the wake up call in smc_connect_work() after __smc_connect() has finished. Thus in state SMC_INIT this patch now calls sock_poll_wait() on the main SMC socket. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7311d665 |
|
08-Aug-2018 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: move sock lock in smc_ioctl() When an SMC socket is connecting it is decided whether fallback to TCP is needed. To avoid races between connect and ioctl move the sock lock before the use_fallback check. Reported-by: syzbot+5b2cece1a8ecb2ca77d8@syzkaller.appspotmail.com Reported-by: syzbot+19557374321ca3710990@syzkaller.appspotmail.com Fixes: 1992d99882af ("net/smc: take sock lock in smc_ioctl()") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bd58c7e0 |
|
08-Aug-2018 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: allow sysctl rmem and wmem defaults for servers Without setsockopt SO_SNDBUF and SO_RCVBUF settings, the sysctl defaults net.ipv4.tcp_wmem and net.ipv4.tcp_rmem should be the base for the sizes of the SMC sndbuf and rcvbuf. Any TCP buffer size optimizations for servers should be ignored. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
caa21e19 |
|
08-Aug-2018 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: no shutdown in state SMC_LISTEN Invoking shutdown for a socket in state SMC_LISTEN does not make sense. Nevertheless programs like syzbot fuzzing the kernel may try to do this. For SMC this means a socket refcounting problem. This patch makes sure a shutdown call for an SMC socket in state SMC_LISTEN simply returns with -ENOTCONN. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dd979b4d |
|
30-Jul-2018 |
Christoph Hellwig <hch@lst.de> |
net: simplify sock_poll_wait The wait_address argument is always directly derived from the filp argument, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
603cc149 |
|
25-Jul-2018 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: provide fallback reason code Remember the fallback reason code and the peer diagnosis code for smc sockets, and provide them in smc_diag.c to the netlink interface. And add more detailed reason codes. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7005ada6 |
|
25-Jul-2018 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: use correct vlan gid of RoCE device SMC code uses the base gid for VLAN traffic. The gids exchanged in the CLC handshake and the gid index used for the QP have to switch from the base gid to the appropriate vlan gid. When searching for a matching IB device port for a certain vlan device, it does not make sense to return an IB device port, which is not enabled for the used vlan_id. Add another check whether a vlan gid exists for a certain IB device port. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
947541f3 |
|
25-Jul-2018 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: fewer parameters for smc_llc_send_confirm_link() Link confirmation will always be sent across the new link being confirmed. This allows to shrink the parameter list. No functional change. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bac6de7b |
|
23-Jul-2018 |
Stefan Raspl <raspl@linux.ibm.com> |
net/smc: eliminate cursor read and write calls The functions to read and write cursors are exclusively used to copy cursors. Therefore switch to a respective function instead. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ac0107ed |
|
18-Jul-2018 |
Ursula Braun <ursula.braun@linux.ibm.com> |
net/smc: add error handling for get_user() For security reasons the return code of get_user() should always be checked. Fixes: 01d2f7e2cdd31 ("net/smc: sockopts TCP_NODELAY and TCP_CORK") Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1992d998 |
|
16-Jul-2018 |
Ursula Braun <ursula.braun@linux.ibm.com> |
net/smc: take sock lock in smc_ioctl() SMC ioctl processing requires the sock lock to work properly in all thinkable scenarios. Problem has been found with RaceFuzzer and fixes: KASAN: null-ptr-deref Read in smc_ioctl Reported-by: Byoungyoung Lee <lifeasageek@gmail.com> Reported-by: syzbot+35b2c5aa76fd398b9fd4@syzkaller.appspotmail.com Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e1bbdd57 |
|
05-Jul-2018 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: reduce sock_put() for fallback sockets smc_release() calls a sock_put() for smc fallback sockets to cover the passive closing sock_hold() in __smc_connect() and smc_tcp_listen_work(). This does not make sense for sockets in state SMC_LISTEN and SMC_INIT. An SMC socket stays in state SMC_INIT if connect fails. The sock_put in smc_connect_abort() does not cover all failures. Move it into smc_connect_decline_fallback(). Fixes: ee9dfbef02d18 ("net/smc: handle sockopts forcing fallback") Reported-by: syzbot+3a0748c8f2f210c0ef9b@syzkaller.appspotmail.com Reported-by: syzbot+9e60d2428a42049a592a@syzkaller.appspotmail.com Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
410da1e1 |
|
03-Jul-2018 |
Linus Torvalds <torvalds@linux-foundation.org> |
net/smc: fix up merge error with poll changes My networking merge (commit 4e33d7d47943: "Pull networking fixes from David Miller") got the poll() handling conflict wrong for af_smc. The conflict between my a11e1d432b51 ("Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL") and Ursula Braun's 24ac3a08e658 ("net/smc: rebuild nonblocking connect") should have left the call to sock_poll_wait() in place, just without the socket lock release/retake. And I really should have realized that. But happily, I at least asked Ursula to double-check the merge, and she set me right. This also fixes an incidental whitespace issue nearby that annoyed me while looking at this. Pointed-out-by: Ursula Braun <ubraun@linux.ibm.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
41349844 |
|
28-Jun-2018 |
Hans Wippel <hwippel@linux.ibm.com> |
net/smc: add SMC-D support in af_smc This patch ties together the previous SMC-D patches. It adds support for SMC-D to the listen and connect functions and, thus, enables SMC-D support in the SMC code. If a connection supports both SMC-R and SMC-D, SMC-D is preferred. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Suggested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c758dfdd |
|
28-Jun-2018 |
Hans Wippel <hwippel@linux.ibm.com> |
net/smc: add SMC-D support in CLC messages There are two types of SMC: SMC-R and SMC-D. These types are signaled within the CLC messages during the CLC handshake. This patch adds support for and checks of the SMC type. Also, SMC-R and SMC-D need to exchange different information during the CLC handshake. So, this patch extends the current message formats to support the SMC-D header fields. The Proposal message can contain both SMC-R and SMC-D information. The Accept and Confirm messages contain either SMC-R or SMC-D information. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Suggested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c6ba7c9b |
|
28-Jun-2018 |
Hans Wippel <hwippel@linux.ibm.com> |
net/smc: add base infrastructure for SMC-D and ISM SMC supports two variants: SMC-R and SMC-D. For data transport, SMC-R uses RDMA devices, SMC-D uses so-called Internal Shared Memory (ISM) devices. An ISM device only allows shared memory communication between SMC instances on the same machine. For example, this allows virtual machines on the same host to communicate via SMC without RDMA devices. This patch adds the base infrastructure for SMC-D and ISM devices to the existing SMC code. It contains the following: * ISM driver interface: This interface allows an ISM driver to register ISM devices in SMC. In the process, the driver provides a set of device ops for each device. SMC uses these ops to execute SMC specific operations on or transfer data over the device. * Core SMC-D link group, connection, and buffer support: Link groups, SMC connections and SMC buffers (in smc_core) are extended to support SMC-D. * SMC type checks: Some type checks are added to prevent using SMC-R specific code for SMC-D and vice versa. To actually use SMC-D, additional changes to pnetid, CLC, CDC, etc. are required. These are added in follow-up patches. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Suggested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a11e1d43 |
|
28-Jun-2018 |
Linus Torvalds <torvalds@linux-foundation.org> |
Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL The poll() changes were not well thought out, and completely unexplained. They also caused a huge performance regression, because "->poll()" was no longer a trivial file operation that just called down to the underlying file operations, but instead did at least two indirect calls. Indirect calls are sadly slow now with the Spectre mitigation, but the performance problem could at least be largely mitigated by changing the "->get_poll_head()" operation to just have a per-file-descriptor pointer to the poll head instead. That gets rid of one of the new indirections. But that doesn't fix the new complexity that is completely unwarranted for the regular case. The (undocumented) reason for the poll() changes was some alleged AIO poll race fixing, but we don't make the common case slower and more complex for some uncommon special case, so this all really needs way more explanations and most likely a fundamental redesign. [ This revert is a revert of about 30 different commits, not reverted individually because that would just be unnecessarily messy - Linus ] Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
24ac3a08 |
|
27-Jun-2018 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: rebuild nonblocking connect The recent poll change may lead to stalls for non-blocking connecting SMC sockets, since sock_poll_wait is no longer performed on the internal CLC socket, but on the outer SMC socket. kernel_connect() on the internal CLC socket returns with -EINPROGRESS, but the wake up logic does not work in all cases. If the internal CLC socket is still in state TCP_SYN_SENT when polled, sock_poll_wait() from sock_poll() does not sleep. It is supposed to sleep till the state of the internal CLC socket switches to TCP_ESTABLISHED. This problem triggered a redesign of the SMC nonblocking connect logic. This patch introduces a connect worker covering all connect steps followed by a wake up of socket waiters. It allows to get rid of all delays and locks in smc_poll(). Fixes: c0129a061442 ("smc: convert to ->poll_mask") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c0129a06 |
|
11-Jun-2018 |
Cong Wang <xiyou.wangcong@gmail.com> |
smc: convert to ->poll_mask smc->clcsock is an internal TCP socket, after TCP socket converts to ->poll_mask, ->poll doesn't exist any more. So just convert smc socket to ->poll_mask too. Fixes: 2c7d3dacebd4 ("net/tcp: convert to ->poll_mask") Reported-by: syzbot+f5066e369b2d5fff630f@syzkaller.appspotmail.com Cc: Christoph Hellwig <hch@lst.de> Cc: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3dc9f558 |
|
30-May-2018 |
Wei Yongjun <weiyongjun1@huawei.com> |
net/smc: fix error return code in smc_setsockopt() Fix to return error code -EINVAL instead of 0 if optlen is invalid. Fixes: 01d2f7e2cdd3 ("net/smc: sockopts TCP_NODELAY and TCP_CORK") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
de8474eb |
|
23-May-2018 |
Stefan Raspl <raspl@linux.ibm.com> |
net/smc: urgent data support Add support for out of band data send and receive. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2351abe6 |
|
23-May-2018 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: return 0 for ioctl calls in states INIT and CLOSED A connected SMC-socket contains addresses of descriptors for the send buffer and the rmb (receive buffer). Fields of these descriptors are used to determine the answer for certain ioctl requests. Add extra handling for unconnected SMC socket states without valid buffer descriptor addresses. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Reported-by: syzbot+e6714328fda813fc670f@syzkaller.appspotmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3b2dec26 |
|
18-May-2018 |
Hans Wippel <hwippel@linux.ibm.com> |
net/smc: restructure client and server code in af_smc This patch splits up the functions smc_connect_rdma and smc_listen_work into smaller functions. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
95d8d263 |
|
18-May-2018 |
Hans Wippel <hwippel@linux.ibm.com> |
net/smc: calculate write offset in RMB only once per connection Currently, the write offset within the RMB is calculated on each write operation although it is fixed for each connection. With this patch, the offset is calculated once and stored in a connection specific variable. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
92a138e3 |
|
18-May-2018 |
Hans Wippel <hwippel@linux.ibm.com> |
net/smc: rename connection index to RMBE index The connection index is actually a RMBE index. So, this patch changes the name accordingly. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9fda3510 |
|
18-May-2018 |
Hans Wippel <hwippel@linux.ibm.com> |
net/smc: move link group list to smc_core This patch moves the global link group list to smc_core where the link group functions are. To make this work, it moves code in af_smc and smc_ib that operates on the link group list to smc_core as well. While at it, the link group counter is integrated into the list structure and initialized to zero. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
69cb7dc0 |
|
18-May-2018 |
Hans Wippel <hwippel@linux.ibm.com> |
net/smc: add common buffer size in send and receive buffer descriptors In addition to the buffer references, SMC currently stores the sizes of the receive and send buffers in each connection as separate variables. This patch introduces a buffer length variable in the common buffer descriptor and uses this length instead. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
be7f3e59 |
|
17-May-2018 |
Eric Dumazet <edumazet@google.com> |
net/smc: init conn.tx_work & conn.send_lock sooner syzkaller found that following program crashes the host : { int fd = socket(AF_SMC, SOCK_STREAM, 0); int val = 1; listen(fd, 0); shutdown(fd, SHUT_RDWR); setsockopt(fd, 6, TCP_NODELAY, &val, 4); } Simply initialize conn.tx_work & conn.send_lock at socket creation, rather than deeper in the stack. ODEBUG: assert_init not available (active state 0) object type: timer_list hint: (null) WARNING: CPU: 1 PID: 13988 at lib/debugobjects.c:329 debug_print_object+0x16a/0x210 lib/debugobjects.c:326 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 13988 Comm: syz-executor0 Not tainted 4.17.0-rc4+ #46 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 panic+0x22f/0x4de kernel/panic.c:184 __warn.cold.8+0x163/0x1b3 kernel/panic.c:536 report_bug+0x252/0x2d0 lib/bug.c:186 fixup_bug arch/x86/kernel/traps.c:178 [inline] do_error_trap+0x1de/0x490 arch/x86/kernel/traps.c:296 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992 RIP: 0010:debug_print_object+0x16a/0x210 lib/debugobjects.c:326 RSP: 0018:ffff880197a37880 EFLAGS: 00010086 RAX: 0000000000000061 RBX: 0000000000000005 RCX: ffffc90001ed0000 RDX: 0000000000004aaf RSI: ffffffff8160f6f1 RDI: 0000000000000001 RBP: ffff880197a378c0 R08: ffff8801aa7a0080 R09: ffffed003b5e3eb2 R10: ffffed003b5e3eb2 R11: ffff8801daf1f597 R12: 0000000000000001 R13: ffffffff88d96980 R14: ffffffff87fa19a0 R15: ffffffff81666ec0 debug_object_assert_init+0x309/0x500 lib/debugobjects.c:692 debug_timer_assert_init kernel/time/timer.c:724 [inline] debug_assert_init kernel/time/timer.c:776 [inline] del_timer+0x74/0x140 kernel/time/timer.c:1198 try_to_grab_pending+0x439/0x9a0 kernel/workqueue.c:1223 mod_delayed_work_on+0x91/0x250 kernel/workqueue.c:1592 mod_delayed_work include/linux/workqueue.h:541 [inline] smc_setsockopt+0x387/0x6d0 net/smc/af_smc.c:1367 __sys_setsockopt+0x1bd/0x390 net/socket.c:1903 __do_sys_setsockopt net/socket.c:1914 [inline] __se_sys_setsockopt net/socket.c:1911 [inline] __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1911 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: 01d2f7e2cdd3 ("net/smc: sockopts TCP_NODELAY and TCP_CORK") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ursula Braun <ubraun@linux.ibm.com> Cc: linux-s390@vger.kernel.org Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3cf52eb1 |
|
15-May-2018 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: set link inactive before calling smc_lgr_free() Before smc_lgr_free() is called the link must be set inactive by calling smc_llc_link_inactive(). Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1401ea04 |
|
15-May-2018 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: handle all error codes from smc_conn_create() Always set a reason_code when smc_conn_create() returns an error code. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
44aa81ce |
|
15-May-2018 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: register new rmbs with the peer Register new rmb buffers with the remote peer by exchanging a confirm_rkey llc message. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
569bc643 |
|
15-May-2018 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: no tx work trigger for fallback sockets If TCP_NODELAY is set or TCP_CORK is reset, setsockopt triggers the tx worker. This does not make sense, if the SMC socket switched to the TCP fallback when the connection is created. This patch adds the additional check for the fallback case. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9014db20 |
|
03-May-2018 |
Stefan Raspl <stefan.raspl@linux.ibm.com> |
smc: add support for splice() Provide an implementation for splice() when we are using SMC. See smc_splice_read() for further details. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>< Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b51fa1b1 |
|
03-May-2018 |
Stefan Raspl <stefan.raspl@linux.ibm.com> |
smc: make smc_rx_wait_data() generic Turn smc_rx_wait_data into a generic function that can be used at various instances to wait on traffic to complete with varying criteria. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>< Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bda27ff5 |
|
03-May-2018 |
Stefan Raspl <stefan.raspl@linux.ibm.com> |
smc: fix sendpage() call The sendpage() call grabs the sock lock before calling the default implementation - which tries to grab it once again. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>< Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a6920d1d |
|
03-May-2018 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: handle unregistered buffers When smc_wr_reg_send() fails then tag (regerr) the affected buffer and free it in smc_buf_unuse(). Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e63a5f8c |
|
03-May-2018 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: call consolidation Consolidate the call to smc_wr_reg_send() in a new function. No functional changes. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9b67e26f |
|
02-May-2018 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: handle ioctls SIOCINQ, SIOCOUTQ, and SIOCOUTQNSD SIOCINQ returns the amount of unread data in the RMB. SIOCOUTQ returns the amount of unsent or unacked sent data in the send buffer. SIOCOUTQNSD returns the amount of data prepared for sending, but not yet sent. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
877ae5be |
|
02-May-2018 |
Karsten Graul <kgraul@linux.ibm.com> |
net/smc: periodic testlink support Add periodic LLC testlink support to ensure the link is still active. The interval time is initialized using the value of sysctl_tcp_keepalive_time. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
784813ae |
|
02-May-2018 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: restrict non-blocking connect finish The smc_poll code tries to finish connect() if the socket is in state SMC_INIT and polling of the internal CLC-socket returns with EPOLLOUT. This makes sense for a select/poll call following a connect call, but not without preceding connect(). With this patch smc_poll starts connect logic only, if the CLC-socket is no longer in its initial state TCP_CLOSE. In addition, a poll error on the internal CLC-socket is always propagated to the SMC socket. With this patch the code path mentioned by syzbot https://syzkaller.appspot.com/bug?extid=03faa2dc16b8b64be396 is no longer possible. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Reported-by: syzbot+03faa2dc16b8b64be396@syzkaller.appspotmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
abb190f1 |
|
26-Apr-2018 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: handle sockopt TCP_DEFER_ACCEPT If sockopt TCP_DEFER_ACCEPT is set, the accept is delayed till data is available. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
01d2f7e2 |
|
26-Apr-2018 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: sockopts TCP_NODELAY and TCP_CORK Setting sockopt TCP_NODELAY or resetting sockopt TCP_CORK triggers data transfer. For a corked SMC socket RDMA writes are deferred, if there is still sufficient send buffer space available. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ee9dfbef |
|
26-Apr-2018 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: handle sockopts forcing fallback Several TCP sockopts do not work for SMC. One example are the TCP_FASTOPEN sockopts, since SMC-connection setup is based on the TCP three-way-handshake. If the SMC socket is still in state SMC_INIT, such sockopts trigger fallback to TCP. Otherwise an error is returned. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
070204a3 |
|
24-Apr-2018 |
Ursula Braun <ubraun@linux.ibm.com> |
net/smc: keep clcsock reference in smc_tcp_listen_work() The internal CLC socket should exist till the SMC-socket is released. Function tcp_listen_worker() releases the internal CLC socket of a listen socket, if an smc_close_active() is called. This function is called for the final release(), but it is called for shutdown SHUT_RDWR as well. This opens a door for protection faults, if socket calls using the internal CLC socket are called for a shutdown listen socket. With the changes of commit 3d502067599f ("net/smc: simplify wait when closing listen socket") there is no need anymore to release the internal CLC socket in function tcp_listen_worker((). It is sufficient to release it in smc_release(). Fixes: 127f49705823 ("net/smc: release clcsock from tcp_listen_worker") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Reported-by: syzbot+9045fc589fcd196ef522@syzkaller.appspotmail.com Reported-by: syzbot+28a2c86cf19c81d871fa@syzkaller.appspotmail.com Reported-by: syzbot+9605e6cace1b5efd4a0a@syzkaller.appspotmail.com Reported-by: syzbot+cf9012c597c8379d535c@syzkaller.appspotmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1255fcb2 |
|
19-Apr-2018 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
net/smc: fix shutdown in state SMC_LISTEN Calling shutdown with SHUT_RD and SHUT_RDWR for a listening SMC socket crashes, because commit 127f49705823 ("net/smc: release clcsock from tcp_listen_worker") releases the internal clcsock in smc_close_active() and sets smc->clcsock to NULL. For SHUT_RD the smc_close_active() call is removed. For SHUT_RDWR the kernel_sock_shutdown() call is omitted, since the clcsock is already released. Fixes: 127f49705823 ("net/smc: release clcsock from tcp_listen_worker") Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Reported-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
aaa4d33f |
|
16-Mar-2018 |
Karsten Graul <kgraul@linux.vnet.ibm.com> |
net/smc: enable ipv6 support for smc Add ipv6 support to the smc socket layer functions. Make use of the updated clc layer functions to retrieve and match ipv6 information. The indicator for ipv4 or ipv6 is the protocol constant that is provided in the socket() call with address family AF_SMC. Based-on-patch-by: Takanori Ueda <tkueda@jp.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c246d942 |
|
16-Mar-2018 |
Karsten Graul <kgraul@linux.vnet.ibm.com> |
net/smc: restructure netinfo for CLC proposal msgs Introduce functions smc_clc_prfx_set to retrieve IP information for the CLC proposal msg and smc_clc_prfx_match to match the contents of a proposal message against the IP addresses of the net device. The new functions replace the functionality provided by smc_clc_netinfo_by_tcpsk, which is removed by this patch. The match functionality is extended to scan all ipv4 addresses of the net device for a match against the ipv4 subnet from the proposal msg. Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3d502067 |
|
13-Mar-2018 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
net/smc: simplify wait when closing listen socket Closing of a listen socket wakes up kernel_accept() of smc_tcp_listen_worker(), and then has to wait till smc_tcp_listen_worker() gives up the internal clcsock. The wait logic introduced with commit 127f49705823 ("net/smc: release clcsock from tcp_listen_worker") might wait longer than necessary. This patch implements the idea to implement the wait just with flush_work(), and gets rid of the extra smc_close_wait_listen_clcsock() function. Fixes: 127f49705823 ("net/smc: release clcsock from tcp_listen_worker") Reported-by: Hans Wippel <hwippel@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
268ffcc4 |
|
14-Mar-2018 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
net/smc: free link group without pending free_work only Make sure there is no pending or running free_work worker for the link group when freeing the link group. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9651b934 |
|
01-Mar-2018 |
Karsten Graul <kgraul@linux.vnet.ibm.com> |
net/smc: prevent new connections on link group When the processing of a DELETE LINK message has started, new connections should not be added to the link group that is about to terminate. Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
52bedf37 |
|
01-Mar-2018 |
Karsten Graul <kgraul@linux.vnet.ibm.com> |
net/smc: process add/delete link messages Add initial support for the LLC messages ADD LINK and DELETE LINK. Introduce a link state field. Extend the initial LLC handshake with ADD LINK processing. Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
75d320d6 |
|
01-Mar-2018 |
Karsten Graul <kgraul@linux.vnet.ibm.com> |
net/smc: do not allow eyecatchers in rmbe SMC does not support eyecatchers in RMB elements, decline peers requesting this support. Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
be6d467b |
|
01-Mar-2018 |
Karsten Graul <kgraul@linux.vnet.ibm.com> |
net/smc: remove unused fields from smc structures The daddr field holds the destination IPv4 address. The field was set but never used and can be removed. The addr field was a left-over from an earlier version of non-blocking connects and can be removed. The result of the call to kernel_getpeername is not used, the call can be removed. Non-blocking connects are working, so remove restriction comment. Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
696cd301 |
|
01-Mar-2018 |
Karsten Graul <kgraul@linux.vnet.ibm.com> |
net/smc: move netinfo function to file smc_clc.c The function smc_netinfo_by_tcpsk() belongs to CLC handling. Move it to smc_clc.c and rename to smc_clc_netinfo_by_tcpsk. Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0f627126 |
|
01-Mar-2018 |
Stefan Raspl <stefan.raspl@de.ibm.com> |
net/smc: cleanup smc_llc.h and smc_clc.h headers Remove structures used internal only from headers. And remove an extra function parameter. Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a5dcb73b |
|
27-Feb-2018 |
Davide Caratti <dcaratti@redhat.com> |
net/smc: fix NULL pointer dereference on sock_create_kern() error path when sock_create_kern(..., a) returns an error, 'a' might not be a valid pointer, so it shouldn't be dereferenced to read a->sk->sk_sndbuf and and a->sk->sk_rcvbuf; not doing that caused the following crash: general protection fault: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 4254 Comm: syzkaller919713 Not tainted 4.16.0-rc1+ #18 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:smc_create+0x14e/0x300 net/smc/af_smc.c:1410 RSP: 0018:ffff8801b06afbc8 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffff8801b63457c0 RCX: ffffffff85a3e746 RDX: 0000000000000004 RSI: 00000000ffffffff RDI: 0000000000000020 RBP: ffff8801b06afbf0 R08: 00000000000007c0 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8801b6345c08 R14: 00000000ffffffe9 R15: ffffffff8695ced0 FS: 0000000001afb880(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000040 CR3: 00000001b0721004 CR4: 00000000001606f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __sock_create+0x4d4/0x850 net/socket.c:1285 sock_create net/socket.c:1325 [inline] SYSC_socketpair net/socket.c:1409 [inline] SyS_socketpair+0x1c0/0x6f0 net/socket.c:1366 do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x26/0x9b RIP: 0033:0x4404b9 RSP: 002b:00007fff44ab6908 EFLAGS: 00000246 ORIG_RAX: 0000000000000035 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004404b9 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000000000002b RBP: 00007fff44ab6910 R08: 0000000000000002 R09: 00007fff44003031 R10: 0000000020000040 R11: 0000000000000246 R12: ffffffffffffffff R13: 0000000000000006 R14: 0000000000000000 R15: 0000000000000000 Code: 48 c1 ea 03 80 3c 02 00 0f 85 b3 01 00 00 4c 8b a3 48 04 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 20 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 82 01 00 00 4d 8b 7c 24 20 48 b8 00 00 00 00 RIP: smc_create+0x14e/0x300 net/smc/af_smc.c:1410 RSP: ffff8801b06afbc8 Fixes: cd6851f30386 smc: remote memory buffers (RMBs) Reported-and-tested-by: syzbot+aa0227369be2dcc26ebe@syzkaller.appspotmail.com Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9b2c45d4 |
|
12-Feb-2018 |
Denys Vlasenko <dvlasenk@redhat.com> |
net: make getname() functions return length rather than use int* parameter Changes since v1: Added changes in these files: drivers/infiniband/hw/usnic/usnic_transport.c drivers/staging/lustre/lnet/lnet/lib-socket.c drivers/target/iscsi/iscsi_target_login.c drivers/vhost/net.c fs/dlm/lowcomms.c fs/ocfs2/cluster/tcp.c security/tomoyo/network.c Before: All these functions either return a negative error indicator, or store length of sockaddr into "int *socklen" parameter and return zero on success. "int *socklen" parameter is awkward. For example, if caller does not care, it still needs to provide on-stack storage for the value it does not need. None of the many FOO_getname() functions of various protocols ever used old value of *socklen. They always just overwrite it. This change drops this parameter, and makes all these functions, on success, return length of sockaddr. It's always >= 0 and can be differentiated from an error. Tests in callers are changed from "if (err)" to "if (err < 0)", where needed. rpc_sockname() lost "int buflen" parameter, since its only use was to be passed to kernel_getsockname() as &buflen and subsequently not used in any way. Userspace API is not changed. text data bss dec hex filename 30108430 2633624 873672 33615726 200ef6e vmlinux.before.o 30108109 2633612 873672 33615393 200ee21 vmlinux.o Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: David S. Miller <davem@davemloft.net> CC: linux-kernel@vger.kernel.org CC: netdev@vger.kernel.org CC: linux-bluetooth@vger.kernel.org CC: linux-decnet-user@lists.sourceforge.net CC: linux-wireless@vger.kernel.org CC: linux-rdma@vger.kernel.org CC: linux-sctp@vger.kernel.org CC: linux-nfs@vger.kernel.org CC: linux-x25@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a9a08845 |
|
11-Feb-2018 |
Linus Torvalds <torvalds@linux-foundation.org> |
vfs: do bulk POLL* -> EPOLL* replacement This is the mindless scripted replacement of kernel use of POLL* variables as described by Al, done by this script: for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'` for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done done with de-mangling cleanups yet to come. NOTE! On almost all architectures, the EPOLL* constants have the same values as the POLL* constants do. But they keyword here is "almost". For various bad reasons they aren't the same, and epoll() doesn't actually work quite correctly in some cases due to this on Sparc et al. The next patch from Al will sort out the final differences, and we should be all done. Scripted-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
63e2480c |
|
01-Feb-2018 |
Al Viro <viro@zeniv.linux.org.uk> |
smc: missing poll annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
127f4970 |
|
26-Jan-2018 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
net/smc: release clcsock from tcp_listen_worker Closing a listen socket may hit the warning WARN_ON(sock_owned_by_user(sk)) of tcp_close(), if the wake up of the smc_tcp_listen_worker has not yet finished. This patch introduces smc_close_wait_listen_clcsock() making sure the listening internal clcsock has been closed in smc_tcp_listen_work(), before the listening external SMC socket finishes closing. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
51f1de79 |
|
26-Jan-2018 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
net/smc: replace sock_put worker by socket refcounting Proper socket refcounting makes the sock_put worker obsolete. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8dce2786 |
|
26-Jan-2018 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
net/smc: smc_poll improvements Increase the socket refcount during poll wait. Take the socket lock before checking socket state. For a listening socket return a mask independent of state SMC_ACTIVE and cover errors or closed state as well. Get rid of the accept_q loop in smc_accept_poll(). Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
610db66f |
|
25-Jan-2018 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
net/smc: do not reuse a linkgroup with setup problems Once a linkgroup is created successfully, it stays alive for a certain time to service more connections potentially created. If one of the initialization steps for a new linkgroup fails, the linkgroup should not be reused by other connections following. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
35a6b178 |
|
24-Jan-2018 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
net/smc: simplify function smc_clcsock_accept() Cleanup to avoid duplicate code in smc_clcsock_accept(). No functional change. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3163c507 |
|
24-Jan-2018 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
net/smc: use local struct sock variables consistently Cleanup to consistently exploit the local struct sock definitions. No functional change. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e7b7a64a |
|
07-Dec-2017 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
smc: support variable CLC proposal messages According to RFC7609 [1] the CLC proposal message contains an area of unknown length for future growth. Additionally it may contain up to 8 IPv6 prefixes. The current version of the SMC-code does not understand CLC proposal messages using these variable length fields and, thus, is incompatible with SMC implementations in other operating systems. This patch makes sure, SMC understands incoming CLC proposals * with arbitrary length values for future growth * with up to 8 IPv6 prefixes [1] SMC-R Informational RFC: http://www.rfc-editor.org/info/rfc7609 Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Reviewed-by: Hans Wippel <hwippel@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0c9f1515 |
|
07-Dec-2017 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
smc: improve smc_clc_send_decline() error handling Let smc_clc_send_decline() return with an error, if the amount sent is smaller than the length of an smc decline message. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ade994f4 |
|
02-Jul-2017 |
Al Viro <viro@zeniv.linux.org.uk> |
net: annotate ->poll() instances Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
e6c8adca |
|
03-Jul-2017 |
Al Viro <viro@zeniv.linux.org.uk> |
anntotate the places where ->poll() return values go Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
c5c1cc9c |
|
25-Oct-2017 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
smc: add SMC rendezvous protocol The SMC protocol [1] uses a rendezvous protocol to negotiate SMC capability between peers. The current Linux implementation does not yet use this rendezvous protocol and, thus, is not compliant to RFC7609 and incompatible with other SMC implementations like in zOS. This patch adds support for the SMC rendezvous protocol. It uses a new TCP experimental option. With this option, SMC capabilities are exchanged between the peers during the TCP three way handshake. [1] SMC-R Informational RFC: http://www.rfc-editor.org/info/rfc7609 Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
145686ba |
|
25-Oct-2017 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
smc: fix mutex unlocks during link group creation Link group creation is synchronized with the smc_create_lgr_pending lock. In smc_listen_work() this mutex is sometimes unlocked, even though it has not been locked before. This issue will surface in presence of the SMC rendezvous code. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bfbedfd3 |
|
21-Sep-2017 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
net/smc: terminate link group if out-of-sync is received An out-of-sync condition can just be detected by the client. If the server receives a CLC DECLINE message indicating an out-of-sync condition for the link groups, the server must clean up the out-of-sync link group. There is no need for an extra third parameter in smc_clc_send_decline(). Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
731b0085 |
|
21-Sep-2017 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
net/smc: take RCU read lock for routing cache lookup smc_netinfo_by_tcpsk() looks up the routing cache. Such a lookup requires protection by an RCU read lock. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
10428dd8 |
|
28-Jul-2017 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
net/smc: synchronize buffer usage with device Usage of send buffer "sndbuf" is synced (a) before filling sndbuf for cpu access (b) after filling sndbuf for device access Usage of receive buffer "RMB" is synced (a) before reading RMB content for cpu access (b) after reading RMB content for device access Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3e034725 |
|
28-Jul-2017 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
net/smc: common functions for RMBs and send buffers Creation and deletion of SMC receive and send buffers shares a high amount of common code . This patch introduces common functions to get rid of duplicate code. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
652a1e41 |
|
28-Jul-2017 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
net/smc: register RMB-related memory region A memory region created for a new RMB must be registered explicitly, before the peer can make use of it for remote DMA transfer. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
977bb324 |
|
28-Jul-2017 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
net/smc: serialize connection creation in all cases If a link group for a new server connection exists already, the mutex serializing the determination of link groups is given up early. The coming registration of memory regions benefits from the serialization as well, if the mutex is held till connection creation is finished. This patch postpones the unlocking of the link group creation mutex. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5f0d5a3a |
|
18-Jan-2017 |
Paul E. McKenney <paulmck@kernel.org> |
mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU A group of Linux kernel hackers reported chasing a bug that resulted from their assumption that SLAB_DESTROY_BY_RCU provided an existence guarantee, that is, that no block from such a slab would be reallocated during an RCU read-side critical section. Of course, that is not the case. Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire slab of blocks. However, there is a phrase for this, namely "type safety". This commit therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order to avoid future instances of this sort of confusion. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <linux-mm@kvack.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> [ paulmck: Add comments mentioning the old name, as requested by Eric Dumazet, in order to help people familiar with the old name find the new one. ] Acked-by: David Rientjes <rientjes@google.com>
|
#
288c8390 |
|
10-Apr-2017 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
net/smc: destruct non-accepted sockets Make sure sockets never accepted are removed cleanly. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f5227cd9 |
|
10-Apr-2017 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
net/smc: remove duplicate unhash unhash is already called in sock_put_work. Remove the second call. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
46c28dbd |
|
10-Apr-2017 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
net/smc: no socket state changes in tasklet context Several state changes occur during SMC socket closing. Currently state changes triggered locally occur in process context with lock_sock() taken while state changes triggered by peer occur in tasklet context with bh_lock_sock() taken. bh_lock_sock() does not wait till a lock_sock(() task in process context is finished. This may lead to races in socket state transitions resulting in dangling SMC-sockets, or it may lead to duplicate SMC socket freeing. This patch introduces a closing worker to run all state changes under lock_sock(). Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cdfbabfb |
|
09-Mar-2017 |
David Howells <dhowells@redhat.com> |
net: Work around lockdep limitation in sockets that use sockets Lockdep issues a circular dependency warning when AFS issues an operation through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem. The theory lockdep comes up with is as follows: (1) If the pagefault handler decides it needs to read pages from AFS, it calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but creating a call requires the socket lock: mmap_sem must be taken before sk_lock-AF_RXRPC (2) afs_open_socket() opens an AF_RXRPC socket and binds it. rxrpc_bind() binds the underlying UDP socket whilst holding its socket lock. inet_bind() takes its own socket lock: sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET (3) Reading from a TCP socket into a userspace buffer might cause a fault and thus cause the kernel to take the mmap_sem, but the TCP socket is locked whilst doing this: sk_lock-AF_INET must be taken before mmap_sem However, lockdep's theory is wrong in this instance because it deals only with lock classes and not individual locks. The AF_INET lock in (2) isn't really equivalent to the AF_INET lock in (3) as the former deals with a socket entirely internal to the kernel that never sees userspace. This is a limitation in the design of lockdep. Fix the general case by: (1) Double up all the locking keys used in sockets so that one set are used if the socket is created by userspace and the other set is used if the socket is created by the kernel. (2) Store the kern parameter passed to sk_alloc() in a variable in the sock struct (sk_kern_sock). This informs sock_lock_init(), sock_init_data() and sk_clone_lock() as to the lock keys to be used. Note that the child created by sk_clone_lock() inherits the parent's kern setting. (3) Add a 'kern' parameter to ->accept() that is analogous to the one passed in to ->create() that distinguishes whether kernel_accept() or sys_accept4() was the caller and can be passed to sk_alloc(). Note that a lot of accept functions merely dequeue an already allocated socket. I haven't touched these as the new socket already exists before we get the parameter. Note also that there are a couple of places where I've made the accepted socket unconditionally kernel-based: irda_accept() rds_rcp_accept_one() tcp_accept_from_sock() because they follow a sock_create_kern() and accept off of that. Whilst creating this, I noticed that lustre and ocfs don't create sockets through sock_create_kern() and thus they aren't marked as for-kernel, though they appear to be internal. I wonder if these should do that so that they use the new set of lock keys. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c3edc401 |
|
02-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Move task_struct::signal and task_struct::sighand types and accessors into <linux/sched/signal.h> task_struct::signal and task_struct::sighand are pointers, which would normally make it straightforward to not define those types in sched.h. That is not so, because the types are accompanied by a myriad of APIs (macros and inline functions) that dereference them. Split the types and the APIs out of sched.h and move them into a new header, <linux/sched/signal.h>. With this change sched.h does not know about 'struct signal' and 'struct sighand' anymore, trying to put accessors into sched.h as a test fails the following way: ./include/linux/sched.h: In function ‘test_signal_types’: ./include/linux/sched.h:2461:18: error: dereferencing pointer to incomplete type ‘struct signal_struct’ ^ This reduces the size and complexity of sched.h significantly. Update all headers and .c code that relied on getting the signal handling functionality from <linux/sched.h> to include <linux/sched/signal.h>. The list of affected files in the preparatory patch was partly generated by grepping for the APIs, and partly by doing coverage build testing, both all[yes|mod|def|no]config builds on 64-bit and 32-bit x86, and an array of cross-architecture builds. Nevertheless some (trivial) build breakage is still expected related to rare Kconfig combinations and in-flight patches to various kernel code, but most of it should be handled by this patch. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
f16a7dd5 |
|
09-Jan-2017 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
smc: netlink interface for SMC sockets Support for SMC socket monitoring via netlink sockets of protocol NETLINK_SOCK_DIAG. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b38d7324 |
|
09-Jan-2017 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
smc: socket closing and linkgroup cleanup smc_shutdown() and smc_release() handling delayed linkgroup cleanup for linkgroups without connections Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
952310cc |
|
09-Jan-2017 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
smc: receive data from RMBE move RMBE data into user space buffer and update managing cursors Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e6727f39 |
|
09-Jan-2017 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
smc: send data (through RDMA) copy data to kernel send buffer, and trigger RDMA write Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5f08318f |
|
09-Jan-2017 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
smc: connection data control (CDC) send and receive CDC messages (via IB message send and CQE) Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9bf9abea |
|
09-Jan-2017 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
smc: link layer control (LLC) send and receive LLC messages CONFIRM_LINK (via IB message send and CQE) Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bd4ad577 |
|
09-Jan-2017 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
smc: initialize IB transport incl. PD, MR, QP, CQ, event, WR Prepare the link for RDMA transport: Create a queue pair (QP) and move it into the state Ready-To-Receive (RTR). Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cd6851f3 |
|
09-Jan-2017 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
smc: remote memory buffers (RMBs) * allocate data RMB memory for sending and receiving * size depends on the maximum socket send and receive buffers * allocated RMBs are kept during life time of the owning link group * map the allocated RMBs to DMA Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0cfdd8f9 |
|
09-Jan-2017 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
smc: connection and link group creation * create smc_connection for SMC-sockets * determine suitable link group for a connection * create a new link group if necessary Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a046d57d |
|
09-Jan-2017 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
smc: CLC handshake (incl. preparation steps) * CLC (Connection Layer Control) handshake Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6812baab |
|
09-Jan-2017 |
Thomas Richter <tmricht@linux.vnet.ibm.com> |
smc: establish pnet table management Connection creation with SMC-R starts through an internal TCP-connection. The Ethernet interface for this TCP-connection is not restricted to the Ethernet interface of a RoCE device. Any existing Ethernet interface belonging to the same physical net can be used, as long as there is a defined relation between the Ethernet interface and some RoCE devices. This relation is defined with the help of an identification string called "Physical Net ID" or short "pnet ID". Information about defined pnet IDs and their related Ethernet interfaces and RoCE devices is stored in the SMC-R pnet table. A pnet table entry consists of the identifying pnet ID and the associated network and IB device. This patch adds pnet table configuration support using the generic netlink message interface referring to network and IB device by their names. Commands exist to add, delete, and display pnet table entries, and to flush or display the entire pnet table. There are cross-checks to verify whether the ethernet interfaces or infiniband devices really exist in the system. If either device is not available, the pnet ID entry is not created. Loss of network devices and IB devices is also monitored; a pnet ID entry is removed when an associated network or IB device is removed. Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a4cf0443 |
|
09-Jan-2017 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
smc: introduce SMC as an IB-client * create a list of SMC IB-devices Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ac713874 |
|
09-Jan-2017 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
smc: establish new socket family * enable smc module loading and unloading * register new socket family * basic smc socket creation and deletion * use backing TCP socket to run CLC (Connection Layer Control) handshake of SMC protocol * Setup for infiniband traffic is implemented in follow-on patches. For now fallback to TCP socket is always used. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Reviewed-by: Utz Bacher <utz.bacher@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|