History log of /linux-master/net/smc/smc_wr.c
Revision Date Author Comments
# 79a22238 16-Mar-2023 Kai Shen <KaiShen@linux.alibaba.com>

net/smc: Use percpu ref for wr tx reference

The refcount wr_tx_refcnt may cause cache thrashing problems among
cores and we can use percpu ref to mitigate this issue here. We
gain some performance improvement with percpu ref here on our
customized smc-r verion. Applying cache alignment may also mitigate
this problem but it seem more reasonable to use percpu ref here.
We can also replace wr_reg_refcnt with one percpu reference like
wr_tx_refcnt.

redis-benchmark on smc-r with atomic wr_tx_refcnt:
SET: 525707.06 requests per second, p50=0.087 msec
GET: 554877.38 requests per second, p50=0.087 msec

redis-benchmark on the percpu_ref version:
SET: 540482.06 requests per second, p50=0.087 msec
GET: 570711.12 requests per second, p50=0.079 msec

Cases are like "redis-benchmark -h x.x.x.x -q -t set,get -P 1 -n
5000000 -c 50 -d 10 --threads 4".

Signed-off-by: Kai Shen <KaiShen@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e9b1a4f8 06-Sep-2022 Yacan Liu <liuyacan@corp.netease.com>

net/smc: Fix possible access to freed memory in link clear

After modifying the QP to the Error state, all RX WR would be completed
with WC in IB_WC_WR_FLUSH_ERR status. Current implementation does not
wait for it is done, but destroy the QP and free the link group directly.
So there is a risk that accessing the freed memory in tasklet context.

Here is a crash example:

BUG: unable to handle page fault for address: ffffffff8f220860
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD f7300e067 P4D f7300e067 PUD f7300f063 PMD 8c4e45063 PTE 800ffff08c9df060
Oops: 0002 [#1] SMP PTI
CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Tainted: G S OE 5.10.0-0607+ #23
Hardware name: Inspur NF5280M4/YZMB-00689-101, BIOS 4.1.20 07/09/2018
RIP: 0010:native_queued_spin_lock_slowpath+0x176/0x1b0
Code: f3 90 48 8b 32 48 85 f6 74 f6 eb d5 c1 ee 12 83 e0 03 83 ee 01 48 c1 e0 05 48 63 f6 48 05 00 c8 02 00 48 03 04 f5 00 09 98 8e <48> 89 10 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 32
RSP: 0018:ffffb3b6c001ebd8 EFLAGS: 00010086
RAX: ffffffff8f220860 RBX: 0000000000000246 RCX: 0000000000080000
RDX: ffff91db1f86c800 RSI: 000000000000173c RDI: ffff91db62bace00
RBP: ffff91db62bacc00 R08: 0000000000000000 R09: c00000010000028b
R10: 0000000000055198 R11: ffffb3b6c001ea58 R12: ffff91db80e05010
R13: 000000000000000a R14: 0000000000000006 R15: 0000000000000040
FS: 0000000000000000(0000) GS:ffff91db1f840000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffff8f220860 CR3: 00000001f9580004 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
_raw_spin_lock_irqsave+0x30/0x40
mlx5_ib_poll_cq+0x4c/0xc50 [mlx5_ib]
smc_wr_rx_tasklet_fn+0x56/0xa0 [smc]
tasklet_action_common.isra.21+0x66/0x100
__do_softirq+0xd5/0x29c
asm_call_irq_on_stack+0x12/0x20
</IRQ>
do_softirq_own_stack+0x37/0x40
irq_exit_rcu+0x9d/0xa0
sysvec_call_function_single+0x34/0x80
asm_sysvec_call_function_single+0x12/0x20

Fixes: bd4ad57718cc ("smc: initialize IB transport incl. PD, MR, QP, CQ, event, WR")
Signed-off-by: Yacan Liu <liuyacan@corp.netease.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b632eb06 15-May-2022 Guangguan Wang <guangguan.wang@linux.alibaba.com>

net/smc: send cdc msg inline if qp has sufficient inline space

As cdc msg's length is 44B, cdc msgs can be sent inline in
most rdma devices, which can help reducing sending latency.

In my test environment, which are 2 VMs running on the same
physical host and whose NICs(ConnectX-4Lx) are working on
SR-IOV mode, qperf shows 0.4us-0.7us improvement in latency.

Test command:
server: smc_run taskset -c 1 qperf
client: smc_run taskset -c 1 qperf <server ip> -oo \
msg_size:1:2K:*2 -t 30 -vu tcp_lat

The results shown below:
msgsize before after
1B 11.9 us 11.2 us (-0.7 us)
2B 11.7 us 11.2 us (-0.5 us)
4B 11.7 us 11.3 us (-0.4 us)
8B 11.6 us 11.2 us (-0.4 us)
16B 11.7 us 11.3 us (-0.4 us)
32B 11.7 us 11.3 us (-0.4 us)
64B 11.7 us 11.2 us (-0.5 us)
128B 11.6 us 11.2 us (-0.4 us)
256B 11.8 us 11.2 us (-0.6 us)
512B 11.8 us 11.4 us (-0.4 us)
1KB 11.9 us 11.4 us (-0.5 us)
2KB 12.1 us 11.5 us (-0.6 us)

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Tested-by: kernel test robot <lkp@intel.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 925a2421 04-Mar-2022 Dust Li <dust.li@linux.alibaba.com>

Revert "net/smc: don't req_notify until all CQEs drained"

This reverts commit a505cce6f7cfaf2aa2385aab7286063c96444526.

Leon says:
We already discussed that. SMC should be changed to use
RDMA CQ pool API
drivers/infiniband/core/cq.c.
ib_poll_handler() has much better implementation (tracing,
IRQ rescheduling, proper error handling) than this SMC variant.

Since we will switch to ib_poll_handler() in the future,
revert this patch.

Link: https://lore.kernel.org/netdev/20220301105332.GA9417@linux.alibaba.com/
Suggested-by: Leon Romanovsky <leon@kernel.org>
Suggested-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a505cce6 01-Mar-2022 Dust Li <dust.li@linux.alibaba.com>

net/smc: don't req_notify until all CQEs drained

When we are handling softirq workload, enable hardirq may
again interrupt the current routine of softirq, and then
try to raise softirq again. This only wastes CPU cycles
and won't have any real gain.

Since IB_CQ_REPORT_MISSED_EVENTS already make sure if
ib_req_notify_cq() returns 0, it is safe to wait for the
next event, with no need to poll the CQ again in this case.

This patch disables hardirq during the processing of softirq,
and re-arm the CQ after softirq is done. Somehow like NAPI.

Co-developed-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 49dc9013 30-Dec-2021 Christophe JAILLET <christophe.jaillet@wanadoo.fr>

net/smc: Use the bitmap API when applicable

Using the bitmap API is less verbose than hand writing them.
It also improves the semantic.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 349d4312 28-Dec-2021 Dust Li <dust.li@linux.alibaba.com>

net/smc: fix kernel panic caused by race of smc_sock

A crash occurs when smc_cdc_tx_handler() tries to access smc_sock
but smc_release() has already freed it.

[ 4570.695099] BUG: unable to handle page fault for address: 000000002eae9e88
[ 4570.696048] #PF: supervisor write access in kernel mode
[ 4570.696728] #PF: error_code(0x0002) - not-present page
[ 4570.697401] PGD 0 P4D 0
[ 4570.697716] Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 4570.698228] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc4+ #111
[ 4570.699013] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/0
[ 4570.699933] RIP: 0010:_raw_spin_lock+0x1a/0x30
<...>
[ 4570.711446] Call Trace:
[ 4570.711746] <IRQ>
[ 4570.711992] smc_cdc_tx_handler+0x41/0xc0
[ 4570.712470] smc_wr_tx_tasklet_fn+0x213/0x560
[ 4570.712981] ? smc_cdc_tx_dismisser+0x10/0x10
[ 4570.713489] tasklet_action_common.isra.17+0x66/0x140
[ 4570.714083] __do_softirq+0x123/0x2f4
[ 4570.714521] irq_exit_rcu+0xc4/0xf0
[ 4570.714934] common_interrupt+0xba/0xe0

Though smc_cdc_tx_handler() checked the existence of smc connection,
smc_release() may have already dismissed and released the smc socket
before smc_cdc_tx_handler() further visits it.

smc_cdc_tx_handler() |smc_release()
if (!conn) |
|
|smc_cdc_tx_dismiss_slots()
| smc_cdc_tx_dismisser()
|
|sock_put(&smc->sk) <- last sock_put,
| smc_sock freed
bh_lock_sock(&smc->sk) (panic) |

To make sure we won't receive any CDC messages after we free the
smc_sock, add a refcount on the smc_connection for inflight CDC
message(posted to the QP but haven't received related CQE), and
don't release the smc_connection until all the inflight CDC messages
haven been done, for both success or failed ones.

Using refcount on CDC messages brings another problem: when the link
is going to be destroyed, smcr_link_clear() will reset the QP, which
then remove all the pending CQEs related to the QP in the CQ. To make
sure all the CQEs will always come back so the refcount on the
smc_connection can always reach 0, smc_ib_modify_qp_reset() was replaced
by smc_ib_modify_qp_error().
And remove the timeout in smc_wr_tx_wait_no_pending_sends() since we
need to wait for all pending WQEs done, or we may encounter use-after-
free when handling CQEs.

For IB device removal routine, we need to wait for all the QPs on that
device been destroyed before we can destroy CQs on the device, or
the refcount on smc_connection won't reach 0 and smc_sock cannot be
released.

Fixes: 5f08318f617b ("smc: connection data control (CDC)")
Reported-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 90cee52f 28-Dec-2021 Dust Li <dust.li@linux.alibaba.com>

net/smc: don't send CDC/LLC message if link not ready

We found smc_llc_send_link_delete_all() sometimes wait
for 2s timeout when testing with RDMA link up/down.
It is possible when a smc_link is in ACTIVATING state,
the underlaying QP is still in RESET or RTR state, which
cannot send any messages out.

smc_llc_send_link_delete_all() use smc_link_usable() to
checks whether the link is usable, if the QP is still in
RESET or RTR state, but the smc_link is in ACTIVATING, this
LLC message will always fail without any CQE entering the
CQ, and we will always wait 2s before timeout.

Since we cannot send any messages through the QP before
the QP enter RTS. I add a wrapper smc_link_sendable()
which checks the state of QP along with the link state.
And replace smc_link_usable() with smc_link_sendable()
in all LLC & CDC message sending routine.

Fixes: 5f08318f617b ("smc: connection data control (CDC)")
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6d7373da 27-Dec-2021 Karsten Graul <kgraul@linux.ibm.com>

net/smc: fix using of uninitialized completions

In smc_wr_tx_send_wait() the completion on index specified by
pend->idx is initialized and after smc_wr_tx_send() was called the wait
for completion starts. pend->idx is used to get the correct index for
the wait, but the pend structure could already be cleared in
smc_wr_tx_process_cqe().
Introduce pnd_idx to hold and use a local copy of the correct index.

Fixes: 09c61d24f96d ("net/smc: wait for departure of an IB message")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b4ba4652 16-Oct-2021 Karsten Graul <kgraul@linux.ibm.com>

net/smc: extend LLC layer for SMC-Rv2

Add support for large v2 LLC control messages in smc_llc.c.
The new large work request buffer allows to combine control
messages into one packet that had to be spread over several
packets before.
Add handling of the new v2 LLC messages.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8799e310 16-Oct-2021 Karsten Graul <kgraul@linux.ibm.com>

net/smc: add v2 support to the work request layer

In the work request layer define one large v2 buffer for each link group
that is used to transmit and receive large LLC control messages.
Add the completion queue handling for this buffer.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8f3d65c1 09-Aug-2021 Karsten Graul <kgraul@linux.ibm.com>

net/smc: fix wait on already cleared link

There can be a race between the waiters for a tx work request buffer
and the link down processing that finally clears the link. Although
all waiters are woken up before the link is cleared there might be
waiters which did not yet get back control and are still waiting.
This results in an access to a cleared wait queue head.

Fix this by introducing atomic reference counting around the wait calls,
and wait with the link clear processing until all waiters have finished.
Move the work request layer related calls into smc_wr.c and set the
link state to INACTIVE before calling smcr_link_clear() in
smc_llc_srv_add_link().

Fixes: 15e1b99aadfb ("net/smc: no WR buffer wait for terminating link group")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fcb8e3a3 03-Nov-2020 Allen Pais <apais@linux.microsoft.com>

net: smc: convert tasklets to use new tasklet_setup() API

In preparation for unconditionally passing the
struct tasklet_struct pointer to all tasklet
callbacks, switch to using the new tasklet_setup()
and from_tasklet() to pass the tasklet pointer explicitly.

Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Allen Pais <apais@linux.microsoft.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# b7eede75 08-Jul-2020 Karsten Graul <kgraul@linux.ibm.com>

net/smc: fix work request handling

Wait for pending sends only when smc_switch_conns() found a link to move
the connections to. Do not wait during link freeing, this can lead to
permanent hang situations. And refuse to provide a new tx slot on an
unusable link.

Fixes: c6f02ebeea3a ("net/smc: switch connections to alternate link")
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 09c61d24 04-May-2020 Karsten Graul <kgraul@linux.ibm.com>

net/smc: wait for departure of an IB message

Introduce smc_wr_tx_send_wait() to send an IB message and wait for the
tx completion event of the message. This makes sure that the message is
no longer in-flight when the function returns.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c9a5d243 03-May-2020 Karsten Graul <kgraul@linux.ibm.com>

net/smc: delete an asymmetric link as SMC server

When a link group moved from asymmetric to symmetric state then the
dangling asymmetric link can be deleted. Add smc_llc_find_asym_link() to
find the respective link and add smc_llc_delete_asym_link() to delete
it.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 87523930 30-Apr-2020 Karsten Graul <kgraul@linux.ibm.com>

net/smc: take link down instead of terminating the link group

Use the introduced link down processing in all places where the link
group is terminated and take down the affected link only.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d854fcbf 29-Apr-2020 Karsten Graul <kgraul@linux.ibm.com>

net/smc: add new link state and related helpers

Before a link can be reused it must have been cleared. Lowest current
link state is INACTIVE, which does not mean that the link is already
cleared.
Add a new state UNUSED that is set when the link is cleared and can be
reused.
Add helper smc_llc_usable_link() to find an active link in a link group,
and smc_link_usable() to determine if a link is usable.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6a37ad3d 14-Nov-2019 Ursula Braun <ubraun@linux.ibm.com>

net/smc: wait for tx completions before link freeing

Make sure all pending work requests are completed before freeing
a link.
Dismiss tx pending slots already when terminating a link group to
exploit termination shortcut in tx completion queue handler.

And kill the completion queue tasklets after destroy of the
completion queues, otherwise there is a time window for another
tasklet schedule of an already killed tasklet.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 15e1b99a 14-Nov-2019 Ursula Braun <ubraun@linux.ibm.com>

net/smc: no WR buffer wait for terminating link group

Avoid waiting for a free work request buffer, if the link group
is already terminating.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f528ba24 21-Oct-2019 Ursula Braun <ubraun@linux.ibm.com>

net/smc: introduce link group termination worker

Use a worker for link group termination to guarantee process context.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>


# ad6f317f 04-Feb-2019 Ursula Braun <ubraun@linux.ibm.com>

net/smc: preallocated memory for rdma work requests

The work requests for rdma writes are built in local variables within
function smc_tx_rdma_write(). This violates the rule that the work
request storage has to stay till the work request is confirmed by
a completion queue response.
This patch introduces preallocated memory for these work requests.
The storage is allocated, once a link (and thus a queue pair) is
established.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 46ad0222 30-Jan-2019 Karsten Graul <kgraul@linux.ibm.com>

net/smc: fix use of variable in cleared area

Do not use pend->idx as index for the arrays because its value is
located in the cleared area. Use the existing local variable instead.
Without this fix the wrong area might be cleared.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e438bae4 20-Nov-2018 Ursula Braun <ursula.braun@linux.ibm.com>

net/smc: use after free fix in smc_wr_tx_put_slot()

In smc_wr_tx_put_slot() field pend->idx is used after being
cleared. That means always idx 0 is cleared in the wr_tx_mask.
This results in a broken administration of available WR send
payload buffers.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0d18a0cb 25-Jul-2018 Karsten Graul <kgraul@linux.ibm.com>

net/smc: improve delete link processing

Send an orderly DELETE LINK request before termination of a link group,
add support for client triggered DELETE LINK processing. And send a
disorderly DELETE LINK before module is unloaded.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2e3bbe46 18-Jul-2018 Bart Van Assche <bvanassche@acm.org>

net/smc: Simplify ib_post_(send|recv|srq_recv)() calls

Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 12c36dcf 18-Jul-2018 Bart Van Assche <bvanassche@acm.org>

net/smc: Remove a WARN_ON() statement

Remove a WARN_ON() statement that verifies something that is guaranteed
by the RDMA API, namely that the failed_wr pointer is not touched if an
ib_post_send() call succeeds and that it points at the failed wr if an
ib_post_send() call fails.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 00e5fb26 23-Jul-2018 Stefan Raspl <raspl@linux.ibm.com>

net/smc: add function to get link group from link

Replace a frequently used construct with a more readable variant,
reducing the code. Also might come handy when we start to support
more than a single per link group.

Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6396bb22 12-Jun-2018 Kees Cook <keescook@chromium.org>

treewide: kzalloc() -> kcalloc()

The kzalloc() function has a 2-factor argument form, kcalloc(). This
patch replaces cases of:

kzalloc(a * b, gfp)

with:
kcalloc(a * b, gfp)

as well as handling cases of:

kzalloc(a * b * c, gfp)

with:

kzalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

kzalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

kzalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
kzalloc(
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
kzalloc(
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
kzalloc(
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(char) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (COUNT_ID)
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * COUNT_ID
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (COUNT_CONST)
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * COUNT_CONST
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (COUNT_ID)
+ COUNT_ID, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * COUNT_ID
+ COUNT_ID, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (COUNT_CONST)
+ COUNT_CONST, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * COUNT_CONST
+ COUNT_CONST, sizeof(THING)
, ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kzalloc
+ kcalloc
(
- SIZE * COUNT
+ COUNT, SIZE
, ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
kzalloc(
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
kzalloc(
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kzalloc(
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
kzalloc(
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
kzalloc(C1 * C2 * C3, ...)
|
kzalloc(
- (E1) * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- (E1) * (E2) * E3
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- (E1) * (E2) * (E3)
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
kzalloc(sizeof(THING) * C2, ...)
|
kzalloc(sizeof(TYPE) * C2, ...)
|
kzalloc(C1 * C2 * C3, ...)
|
kzalloc(C1 * C2, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (E2)
+ E2, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * E2
+ E2, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (E2)
+ E2, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * E2
+ E2, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- (E1) * E2
+ E1, E2
, ...)
|
- kzalloc
+ kcalloc
(
- (E1) * (E2)
+ E1, E2
, ...)
|
- kzalloc
+ kcalloc
(
- E1 * E2
+ E1, E2
, ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>


# 877ae5be 02-May-2018 Karsten Graul <kgraul@linux.ibm.com>

net/smc: periodic testlink support

Add periodic LLC testlink support to ensure the link is still active.
The interval time is initialized using the value of
sysctl_tcp_keepalive_time.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1a0a04c7 25-Jan-2018 Ursula Braun <ubraun@linux.vnet.ibm.com>

net/smc: check for healthy link group resp. connections

If a problem for at least one connection of a link group is detected,
the whole link group and all its connections are terminated.
This patch adds a check for healthy link group when trying to reserve
a work request, and checks for healthy connections before starting
a tx worker.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b4772b3a 25-Jan-2018 Ursula Braun <ubraun@linux.vnet.ibm.com>

net/smc: terminate link group for ib_post_send problems

If ib_post_send() fails, terminate all connections of this
link group.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 86e780d3 24-Jan-2018 Ursula Braun <ubraun@linux.vnet.ibm.com>

net/smc: make wait for work request uninterruptible

Work requests are needed for every ib_post_send(), among them the
ib_post_send() to signal closing. If an smc socket program is cancelled,
the smc connections should be cleaned up, and require sending of closing
signals to the peer. This may fail, if a wait for
a free work request is needed, but is cancelled immediately due to the
cancel interrupt. To guarantee notification of the peer, the wait for
a work request is changed to uninterruptible.

And the area to receive work request completion info with
ib_poll_cq() is cleared first.
And _tx_ variable names are used in the _tx_routines for the
demultiplexing common type in the header.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8429c134 24-Jan-2018 Ursula Braun <ubraun@linux.vnet.ibm.com>

net/smc: get rid of tx_pend waits in socket closing

There is no need to wait for confirmation of pending tx requests
for a closing connection, since pending tx slots are dismissed
when finishing a connection.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b2441318 01-Nov-2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.

For non */uapi/* files that summary was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139

and resulted in the first patch in this series.

If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930

and resulted in the second patch in this series.

- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:

SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1

and that resulted in the third patch in this series.

- when the two scanners agreed on the detected license(s), that became
the concluded license(s).

- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.

- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).

- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.

- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct

This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 8301fa44 21-Sep-2017 Ursula Braun <ubraun@linux.vnet.ibm.com>

net/smc: adapt send request completion notification

The solicited flag is meaningful for the receive completion queue.
Ask for next work completion of any type on the send queue.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 652a1e41 28-Jul-2017 Ursula Braun <ubraun@linux.vnet.ibm.com>

net/smc: register RMB-related memory region

A memory region created for a new RMB must be registered explicitly,
before the peer can make use of it for remote DMA transfer.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2c9c1682 10-Apr-2017 Ursula Braun <ubraun@linux.vnet.ibm.com>

net/smc: do not use IB_SEND_INLINE together with mapped data

smc specifies IB_SEND_INLINE for IB_WR_SEND ib_post_send calls, but
provides a mapped buffer to be sent. This is inconsistent, since
IB_SEND_INLINE works without mapped buffer. Problem has not been
detected in the past, because tests had been limited to Connect X3 cards
from Mellanox, whose mlx4 driver just ignored the IB_SEND_INLINE flag.
For now, the IB_SEND_INLINE flag is removed.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b38d7324 09-Jan-2017 Ursula Braun <ubraun@linux.vnet.ibm.com>

smc: socket closing and linkgroup cleanup

smc_shutdown() and smc_release() handling
delayed linkgroup cleanup for linkgroups without connections

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5f08318f 09-Jan-2017 Ursula Braun <ubraun@linux.vnet.ibm.com>

smc: connection data control (CDC)

send and receive CDC messages (via IB message send and CQE)

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bd4ad577 09-Jan-2017 Ursula Braun <ubraun@linux.vnet.ibm.com>

smc: initialize IB transport incl. PD, MR, QP, CQ, event, WR

Prepare the link for RDMA transport:
Create a queue pair (QP) and move it into the state Ready-To-Receive (RTR).

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f38ba179 09-Jan-2017 Ursula Braun <ubraun@linux.vnet.ibm.com>

smc: work request (WR) base for use by LLC and CDC

The base containers for RDMA transport are work requests and completion
queue entries processed through Infiniband verbs:
* allocate and initialize these areas
* map these areas to DMA
* implement the basic communication consisting of work request posting
and receival of completion queue events

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>